How to Achieve High Availability Nirvana: Cluster or Load Balance?

Everyone wants their systems to be highly available, and for good reason. The age-old adage “time is money” has never been truer than in today’s interconnected world. Companies clamor to keep their systems online, working around the clock to get as close as possible to the high availability nirvana of 99.9999 percent uptime. Load balanced and clustered setups both promise high availability. But how exactly do they work?

To answer this question, we first need a good understanding of what it means for a system to be highly available. High availability is the idea that a system will maintain continuous, unrestricted operation over a prolonged period of time. The goal of a highly available system is to be able to accept user input and deliver a successful response at any moment.
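To make the 99.9999 percent figure concrete, it helps to translate an availability target into a yearly downtime budget. The minimal Python sketch below does that arithmetic, assuming a 365-day year (leap years ignored); the function name is our own, chosen for illustration only.

```python
# Seconds in a year, assuming 365 days (leap years ignored for simplicity).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def downtime_budget_seconds(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in seconds, allowed by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return SECONDS_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999, 99.9999):
    seconds = downtime_budget_seconds(target)
    print(f"{target}% -> {seconds / 60:.2f} minutes ({seconds:.1f} s) of downtime per year")
```

At 99.9999 percent, the budget works out to roughly 31.5 seconds of downtime per year, compared with almost nine hours at 99.9 percent.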

Much of the confusion surrounding high availability comes from two components of its implementation: load balancing and clustering. Administrators often believe that simply placing a load balancer in front of their system will make it highly available. This is far from the truth.

What is Load Balancing?

The purpose of a load balancer is simply to distribute user requests among multiple resources (such as servers). Load balancers are often configured to maximize resource utilization: using algorithms such as least connections, they direct each request to the available resource with the lowest current load. Load balancing allows administrators to run multiple instances of a system and spread traffic among them, so that no single instance is a single point of failure. If one instance goes down, the load balancer redirects traffic to the remaining instances until it is brought back up.
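The routing behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product is implemented: the `Server` and `LeastConnectionsBalancer` classes and the server names are hypothetical, and the `healthy` flag stands in for the periodic health checks a real load balancer performs.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True          # set by a (hypothetical) health check
    active_connections: int = 0   # requests currently being served


class LeastConnectionsBalancer:
    """Route each request to the healthy server with the fewest active connections."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self) -> Server:
        candidates = [s for s in self.servers if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        # min() returns the first server with the lowest count, so ties go to list order.
        chosen = min(candidates, key=lambda s: s.active_connections)
        chosen.active_connections += 1
        return chosen

    def release(self, server: Server) -> None:
        """Call when a request finishes so the server's load count drops again."""
        server.active_connections -= 1


servers = [Server("app-1"), Server("app-2"), Server("app-3")]
lb = LeastConnectionsBalancer(servers)
first = [lb.pick().name for _ in range(3)]   # one request lands on each server
servers[1].healthy = False                   # a health check marks app-2 as down
after_failure = [lb.pick().name for _ in range(2)]
print(first, after_failure)
```

After app-2 is marked unhealthy, new requests flow only to app-1 and app-3. Note that nothing here moves any data between servers; the balancer only chooses where each request goes.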

Though it is a necessary part, load balancing by itself does not make a system highly available. A load balancer may direct traffic among resources, but it does not ensure a consistent user experience between them, because load balancing has nothing to do with resources sharing information with one another. Load balancing alone does not ensure that user sessions are maintained across servers. Information such as system caches and search indexes can also drift out of sync between servers, effectively rendering some resources unusable and making system processes unreliable.
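The session problem is easy to reproduce in a toy model. In this minimal Python sketch (all class, server, and session names are hypothetical), each server keeps sessions only in its own memory and a simple round-robin balancer alternates between them, so a shopping cart built on one server is invisible on the other.

```python
class AppServer:
    """A server that keeps user sessions only in its own memory (nothing is shared)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.local_sessions = {}  # session_id -> cart items, visible to this server only

    def handle(self, session_id, item=None):
        cart = self.local_sessions.setdefault(session_id, [])
        if item is not None:
            cart.append(item)
        return list(cart)


class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.next_index = 0

    def route(self):
        # Try each server once, skipping any that are down; raise if all are down.
        for _ in range(len(self.servers)):
            server = self.servers[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.servers)
            if server.up:
                return server
        raise RuntimeError("no servers available")


a, b = AppServer("app-1"), AppServer("app-2")
lb = RoundRobinBalancer([a, b])
lb.route().handle("user-42", "book")   # lands on app-1
cart = lb.route().handle("user-42")    # lands on app-2, which has never seen this session
print(cart)  # [] : the item added on app-1 is invisible here
```

Sticky sessions would keep user-42 on app-1, but the cart would still disappear if app-1 failed, because the data never left that one server.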

Figure A: Load Balancing Setup

What is Clustering?

Clustering is the practice of using multiple instances of a system (known as nodes) that interact with users as if they were one large, singular system. This means the user experience is the same across all nodes. That consistency is achieved by collectively maintaining shared information: data such as caches, database content, and search indexes are either communicated between nodes directly or stored in a centralized location that all nodes can access. Much like load balancing, clustering removes a single point of failure, as each node is a point of access to the system.
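Both sharing strategies mentioned above, a centralized store and direct node-to-node communication, can be illustrated together. In this minimal Python sketch, every node reads and writes a shared store (standing in for a database) and, after each write, tells its peers to invalidate their cached copy. The class and key names are hypothetical, and plain method calls stand in for the network messages a real cluster would send.

```python
class SharedStore:
    """Stand-in for a centralized data store (such as a database) that every node can reach."""

    def __init__(self):
        self.data = {}


class ClusterNode:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.cache = {}   # per-node cache of values read from the shared store
        self.peers = []   # the other nodes in the cluster

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.store.data.get(key)
        return self.cache[key]

    def write(self, key, value):
        self.store.data[key] = value
        self.cache[key] = value
        # Tell every other node to drop its stale copy so its next read hits the store.
        for peer in self.peers:
            peer.invalidate(key)

    def invalidate(self, key):
        self.cache.pop(key, None)


def make_cluster(names):
    store = SharedStore()
    nodes = [ClusterNode(n, store) for n in names]
    for node in nodes:
        node.peers = [p for p in nodes if p is not node]
    return nodes


n1, n2 = make_cluster(["node-1", "node-2"])
n1.write("price:widget", 10)
first_read = n2.read("price:widget")    # 10, now cached on node-2
n1.write("price:widget", 12)            # node-2's cached 10 is invalidated
second_read = n2.read("price:widget")   # 12, not the stale 10
print(first_read, second_read)
```

Without the invalidation loop in `write`, node-2 would keep returning the stale value 10 from its cache, which is exactly the inconsistency a cluster exists to prevent.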

While clustering your system is very important, clustering alone, like load balancing alone, does not make a system highly available. Clustering without a load balancer requires users to explicitly request access to the system from a specific node, which means entering a URL specific to that node. This removes the ability to have a single, easily recognizable point of entry (URL) to the system. Even more concerning, with this approach users choose the node from which they access the system. This can cause a specific node to receive most of the user traffic while other nodes are never used. This is horribly inefficient, degrading system performance or causing ill-equipped nodes to crash.

Figure B: Cluster Setup

Working Together

Looking at load balancing and clustering separately, we can see that on their own they fit very specific, niche use cases. In reality, the question is rarely whether to use load balancing or clustering. Making your system highly available most likely means using both, because load balancers and clusters complement one another. Load balancers are built to spread load and increase resource utilization, addressing the main problem of clustering alone. Clusters are built to keep data consistent between nodes for a cohesive experience, addressing the main drawback of using only a load balancer.

By positioning a load balancer in front of the cluster, the entire system moves toward high availability. With this setup, all user requests are sent to a single location or URL, and the load balancer forwards each one to the least utilized node in the cluster, improving the system’s performance and efficiency. Changes a user makes through one node are carried throughout the entire system, as each node communicates with the other nodes and any centralized data stores. This means every user sees the same consistent data and system state on every node in the cluster. If a node were to crash and shut down during a user’s session, the load balancer would automatically redirect the user’s next request to another available node, where their session would be maintained with no progress lost. Ultimately, the system’s availability increases, because the failure of a single node causes no downtime from the user’s perspective.
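The failover scenario above can be sketched by combining the two ideas: a least-loaded balancer in front of nodes that keep sessions in a shared store. This minimal Python model is hypothetical throughout (class names, node names, and the `add_to_cart` operation are ours), and the `active` counter only counts requests routed so far as a rough proxy for load.

```python
class SessionStore:
    """Centralized session storage shared by every node in the cluster."""

    def __init__(self):
        self.sessions = {}


class Node:
    def __init__(self, name, store):
        self.name = name
        self.up = True
        self.active = 0      # requests routed here so far (a rough stand-in for load)
        self.store = store   # every node points at the same SessionStore

    def add_to_cart(self, session_id, item):
        cart = self.store.sessions.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


class LoadBalancer:
    def __init__(self, nodes):
        self.nodes = nodes

    def route(self):
        healthy = [n for n in self.nodes if n.up]
        if not healthy:
            raise RuntimeError("no healthy nodes")
        node = min(healthy, key=lambda n: n.active)  # least utilized healthy node
        node.active += 1
        return node


store = SessionStore()
nodes = [Node(f"node-{i}", store) for i in (1, 2, 3)]
lb = LoadBalancer(nodes)

first = lb.route()
first.add_to_cart("user-42", "book")
first.up = False                        # node-1 crashes mid-session
second = lb.route()                     # the balancer skips the failed node
cart = second.add_to_cart("user-42", "pen")
print(first.name, "->", second.name, cart)
```

Because the cart lives in the shared store rather than in node-1's memory, the user's next request on node-2 still sees the book added before the crash.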

Figure C: Load Balancing with a Cluster Setup

Open Source Software: The High Availability Advantage

It is easy to see how the principles of clustering and load balancing work together to create a highly available system. However, the real test is implementing this paradigm within your system’s architecture, because many applications require vendor-specific software to be clustered or load balanced, software that may not fit into your company’s system architecture. This vendor-specific software must be purchased and integrated into the system architecture, dramatically increasing total cost of ownership and requiring many hours of work to install and configure.

Open source applications often avoid this issue by following open standards. These standards allow an application to connect and integrate easily with other systems that comply with the same standards. Examples include SOAP and JSON for web services, HTML5 and CSS3 for web content, WebDAV (Web-based Distributed Authoring and Versioning) for remote content authoring, and more. Software that follows this approach is called “vendor-agnostic”: because these open standards are so pervasively adopted, the application does not require vendor-specific software at any level. Vendor-agnostic, open source software can fit into almost any existing system architecture, greatly reducing total cost of ownership and making it much easier to create highly available systems.

Critics of open source software often argue that the costs of adoption outweigh the benefits of this flexibility. These costs may include the loss of support and updates as well as lower reliability compared to proprietary software. However, this could not be further from the truth, as many open source software companies have adopted subscription-based pricing models to provide consistent, reliable support alongside regular updates.

Going Further

All single points of failure must be removed for a system to be truly highly available. As noted in the comments, the diagrams above show a single load balancer between user requests and the application servers, as well as a single controller between the application and database servers. These single points of failure can cause quite a headache if they fail. If the single load balancer above were to crash, the entire system would be unreachable for all user requests, rendering it completely unavailable. Taking this even further, imagine that all of the hardware above were housed in a single physical space, or data center. If that data center were to experience a power or network outage, the entire system would again become unavailable.
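A little arithmetic shows why a single load balancer caps availability. Under the simplifying assumption that components fail independently, components in series (every one must be up) multiply their availabilities, while redundant components in parallel are down only when all of them are down. The per-component figures in this Python sketch are hypothetical, chosen only to illustrate the effect.

```python
from math import prod


def parallel(*availabilities):
    """Redundant components: the group is down only if every member is down."""
    return 1 - prod(1 - a for a in availabilities)


def series(*availabilities):
    """Chained components: every one must be up for a request to succeed."""
    return prod(availabilities)


node = 0.99   # hypothetical availability of one application node
lb = 0.999    # hypothetical availability of one load balancer

cluster = parallel(node, node, node)               # three-node cluster
single_lb = series(lb, cluster)                    # one load balancer in front
redundant_lb = series(parallel(lb, lb), cluster)   # a redundant pair in front

print(f"cluster alone:        {cluster:.6f}")
print(f"single load balancer: {single_lb:.6f}")
print(f"redundant pair:       {redundant_lb:.6f}")
```

With these numbers the three-node cluster alone reaches about 99.9999 percent, but a single 99.9 percent load balancer drags the whole system down to roughly 99.9 percent; a redundant pair restores it to about 99.9998 percent. The independence assumption is exactly what a single data center breaks: one power or network outage takes down every "redundant" component at once.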

The goal of high availability does not end once you cluster and load balance your system. Your entire architecture must be evaluated, and every single point of failure should be identified and addressed.

Additional Resources

Many posts go in depth into the nuances of high availability, including clustering and load balancing. For a closer look at what is involved in clustering and load balancing open source software like Liferay, read Why Load Balancing != Clustering in Liferay. In another post, Common Pitfalls when Clustering Liferay, we look at some of the pitfalls of clustering open source software like Liferay.

Open source software offers many advantages beyond high availability, such as cost effectiveness and greater security. Learn more about the benefits of open source in our whitepaper Open vs. Closed Source.


As you state that time is money: It's 23,28€ now (that's 11,28$ for the US guys, not sure how to calculate the "pm") and I shouldn't be commenting any more. However, it looks like clustering implies session replication, but it can very well run without it - and in most cases it should. The aspect clustering can't go without is cache invalidation (or cache synchronization) - the diagrams would be more accurate with this label. Do you agree?
Java Pairing-Based Cryptography (JPBC)? I think you were looking for JDBC. So we get HA nirvana with a single LB? I'd call that a single point of failure. What about running this thing in multiple data centers so that the data center isn't a single point of failure either? I think that at least deserves a mention when talking about HA nirvana.
Thank you for your feedback, Mika. You are correct, the diagrams should read "JDBC" as the connections highlighted in the diagram are between the application server and database. The diagrams should be updated to reflect this soon. However, the diagrams show a single data center for simplicity. Your point about the necessity of multiple data centers for high availability is completely valid and the article should now be updated to show this.
Very informative blog, Eric. I was looking into implementing the same solutions for our enterprise-level application. We are going with active-passive mode. We believe we can achieve clustering through the OS cluster manager, and for this we are planning to purchase Red Hat Enterprise Linux.