Active-Active Load Balancing Configurations

This article covers load balancing with an active-active configuration. Active-active clusters are commonly used to serve high-traffic websites, databases, and mail servers. Another common load balancing configuration is active-passive.

Overview

Active-active (which I will refer to as AA throughout this article) is commonly used to distribute load among 2 or more servers. AA configurations can be used for websites, databases, mail servers, and more. Another common form of load balancing is active-passive, which is commonly used for redundancy. Servers in a cluster are referred to as nodes.

Active-Active Configurations

Active-active configurations can be set up using several different methods. Each method works a little differently, and some methods require dedicated hardware or software.

Round robin is the cheap and simple way to run an AA configuration. This is done by simply adding 2 or more DNS A records for the host that needs to run as AA. The diagram below shows a cluster of 4 servers running an AA configuration with round robin:

While this method does work, it can have several problems. Let us take a look at a sample of A records:

Shown above you can see 4 A records for a web cluster. In this configuration, the A records will be returned to requesters in rotation, in the order you see them. Meaning, visitor 1 will resolve cluster.skullbox.net as 172.17.55.100, visitor 2 will resolve it as 172.17.55.101, and so on and so forth. While this seems good enough, the following events can occur:

DNS caching

Cluster node failure

Session overload

DNS caching can occur on a local machine or at an ISP. If visitor 1 and visitor 2 have the same ISP, that ISP may cache the record once visitor 1 performs a lookup. This will result in both visitor 1 and visitor 2 requesting data from the same server. Cached DNS records therefore defeat the round robin operation. Excessive caching may cause 1 or more servers to become overloaded. I have seen this happen in production environments before: servers 1 and 2 had heavy loads while servers 3 and 4 had almost none.

Depending on the configuration of the actual web server software, excessive traffic can cause session overloads. When this happens, visitors may receive a number of different error messages. Most commonly, the server will be too busy to handle the request and simply time out. The browser will display a timeout or similar error (some servers return a 503 Service Unavailable instead), and the visitor will simply think the site is down. In fact, the site is not down; the server they are requesting data from is overloaded. If they were to manually use another IP address from the cluster, the site would display fine.
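The effect of caching on round robin can be illustrated with a small simulation. This is only a sketch under stated assumptions: the last two addresses in `NODES` are made up for illustration (the article only names the first two), the class names are hypothetical, and the cache lifetime is counted in lookups rather than seconds to keep the example deterministic.

```python
from collections import Counter
from itertools import cycle

# Assumption: only the first two addresses appear in the article;
# the last two are invented for this illustration.
NODES = ["172.17.55.100", "172.17.55.101", "172.17.55.102", "172.17.55.103"]


class RoundRobinDNS:
    """Stand-in for a DNS server that hands out A records in rotation."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def resolve(self):
        return next(self._rotation)


class CachingResolver:
    """Stand-in for an ISP resolver that reuses one answer for `ttl` lookups."""

    def __init__(self, upstream, ttl):
        self.upstream = upstream
        self.ttl = ttl
        self._cached = None
        self._remaining = 0

    def resolve(self):
        # Only ask the upstream server when the cached answer has expired.
        if self._remaining == 0:
            self._cached = self.upstream.resolve()
            self._remaining = self.ttl
        self._remaining -= 1
        return self._cached


def simulate(visitors, resolver):
    """Count how many visitors end up on each node."""
    return Counter(resolver.resolve() for _ in range(visitors))


# Eight visitors asking the round robin server directly: evenly spread.
direct = simulate(8, RoundRobinDNS(NODES))

# Eight visitors behind one caching resolver: all land on the same node.
cached = simulate(8, CachingResolver(RoundRobinDNS(NODES), ttl=8))

print("direct:", dict(direct))
print("cached:", dict(cached))
```

Without the cache, each of the 4 nodes receives 2 visitors. With the cache, all 8 visitors are sent to the first address, which is the kind of uneven load described above.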

Node failures are another problem when using round robin. Round robin has no way of knowing if nodes in the cluster are online or if they even exist. There is no intelligent hardware or software at work, and because of this, visitors will receive a connection error if the A record of a failed node is returned to them. From an administrative side, site operators must either make sure all servers are running 24x7 or add the IP address of a failed node to an active one. If you have bad hardware, this can become a cat and mouse game, and it always seems to happen at the worst time.
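The missing piece in round robin is a health check. As a rough sketch of what that intelligence might look like, the following probes each node with a plain TCP connection and keeps only the ones that answer. The function names, the port, and the timeout are assumptions for illustration; a real deployment would likely check the application itself, not just the port.

```python
import socket


def tcp_probe(address, port=80, timeout=2.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def healthy_nodes(nodes, probe=tcp_probe):
    """Return only the nodes whose probe succeeds, preserving order."""
    return [node for node in nodes if probe(node)]
```

A script could call `healthy_nodes()` on the cluster's addresses at a regular interval and alert the operator, or update the published records, when a node drops out of the list. The `probe` parameter is there so a different check can be swapped in.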

Hardware-based devices are more common, but they come with a price. These devices can be tricky to configure, but they allow more flexibility for active-active configurations. Unlike round robin, you do not need multiple A records. A hardware load balancer is usually configured with 1 IP address. The same device also knows the IP addresses of every server in the cluster. It makes intelligent decisions based on the existing traffic and decides which server in the cluster to route the next request to. This is more efficient and effective than round robin. See the example below:

The active-active configuration above is commonly done with hardware devices. The big players in this market include F5, Foundry, Radware, and others. Cisco and Juniper do have products for load balancing, although they never launched as well as hoped. The Juniper DX line had a very short life before being discontinued. Cisco's Global Site Selector and Load Director did not sell as expected either. Other vendors such as Zeus offer software that can be loaded on a generic server, making it a "software-based appliance." These and other solutions are primarily Linux-based and range from free to outrageously expensive. Barracuda, a company that originally produced spam filtering appliances, also provides load balancing, SSL VPN, and other network solutions. Some vendors are even offering virtual appliances for use on platforms like VMware, Xen, and others.
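To make the routing decision of such a device more concrete, here is a minimal sketch of one possible policy, "least connections", where the next request goes to the node with the fewest requests in flight. This is an assumption chosen for illustration, not a description of how any particular vendor's product works, and the class and method names are hypothetical.

```python
class LeastConnectionsBalancer:
    """Route each request to the node with the fewest active connections."""

    def __init__(self, nodes):
        # Track the number of in-flight requests per node.
        self.active = {node: 0 for node in nodes}

    def acquire(self):
        """Pick the least-loaded node and count the new connection."""
        # Ties go to the node that was listed first.
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def release(self, node):
        """Record that a request on `node` has finished."""
        self.active[node] -= 1


balancer = LeastConnectionsBalancer(["node1", "node2", "node3"])
first = balancer.acquire()   # node1
second = balancer.acquire()  # node2
third = balancer.acquire()   # node3
balancer.release(second)     # node2 finishes its request
fourth = balancer.acquire()  # node2 again, since it is now the least loaded
```

Visitors only ever see the balancer's single IP address; the choice of node happens behind it, per request, which is why DNS caching no longer skews the load.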
