Set Up Global Server Load Balancing for Your Site Using Open Source Nginx

Nginx (pronounced “engine-x”) is a high-performance HTTP server and reverse proxy, with proxy capabilities for IMAP/POP3/SMTP. Nginx is the creation of Russian developer Igor Sysoev and has been running in production for over two years. The latest stable release at the time of writing is Nginx 0.5.30, which is the focus of this article. While Nginx is capable of proxying non-HTTP protocols, we’re going to focus on HTTP and HTTPS.

High Performance, Yet Lightweight

Nginx uses one master process and a configurable number of worker processes. Even so, its memory footprint and resource usage are far lower than Apache’s. On Linux, Nginx uses the epoll() event notification interface. In our lab, Nginx handled hundreds of requests per second while using about 16 MB of RAM, with a consistent load average of about 1.00. This is considerably better than Apache 2.2, and Pound doesn’t scale well under this type of usage (high memory usage, lots of threads). In general, Nginx offers a very cost-effective solution.

Lighttpd

Lighttpd is a great lightweight option, but it has a couple of drawbacks. The first is proxying. Nginx has very good reverse proxy capabilities with integrated basic load balancing, which makes it a very good front end for dynamic web applications, such as Rails applications running under Mongrel. Lighttpd, on the other hand, has an old and unmaintained proxy module. Lighttpd 1.5.x does include a new proxy module, but that points to the second problem: where Lighttpd is heading. Lighttpd 1.4 is lightweight, relies on very few external libraries, and is fast. Lighttpd 1.5.x, on the other hand, requires many more external libraries, including GLib, and I don’t know about you, but anything that depends on GLib is far from “lightweight”.

Basic Configuration

The basic configuration of Nginx specifies the unprivileged user to run as, the number of worker processes, the error log, the PID file, and the events block. After this basic configuration come the per-protocol blocks (http, for example).

user nobody;
worker_processes 4;
error_log logs/error.log;
pid logs/nginx.pid;

events {
    worker_connections 1024;
}
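To put the two numbers above in perspective, here is a small back-of-the-envelope sketch in Python. It assumes the common rule of thumb that the theoretical ceiling on simultaneous clients is worker_processes multiplied by worker_connections, halved when Nginx acts as a reverse proxy because each proxied client also needs a connection to an upstream server. Real limits also depend on the operating system’s open file descriptor limits, which this sketch ignores, and the function name max_clients is hypothetical.

```python
# Rough capacity estimate for the basic configuration above.
# Assumption: every worker can actually reach its worker_connections limit.

WORKER_PROCESSES = 4       # worker_processes 4;
WORKER_CONNECTIONS = 1024  # worker_connections 1024;


def max_clients(workers: int, connections: int, reverse_proxy: bool = False) -> int:
    """Theoretical upper bound on simultaneous clients.

    When proxying, each client holds one connection to Nginx and Nginx holds
    another to the upstream server, so each client uses two connections.
    """
    total = workers * connections
    return total // 2 if reverse_proxy else total


if __name__ == "__main__":
    print(max_clients(WORKER_PROCESSES, WORKER_CONNECTIONS))        # 4096
    print(max_clients(WORKER_PROCESSES, WORKER_CONNECTIONS, True))  # 2048
```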

Basic HTTP server

Nginx is relatively easy to configure as a basic web server. It supports IP-based and name-based virtual hosts, and it uses PCRE (Perl Compatible Regular Expressions) for URI matching. Configuring static hosting is very easy; you just specify a new server block:

server {
    listen 10.10.10.100:80;
    server_name www.foocorp.com foocorp.com;
    access_log logs/foocorp.com.log main;

    location / {
        index index.html index.htm;
        root /var/www/static/foocorp.com/htdocs;
    }
}

Here we are listening on port 80 on 10.10.10.100, with name-based virtual hosting for www.foocorp.com and foocorp.com. The server_name directive also supports wildcards, so you can specify *.foocorp.com and have every subdomain handled by this configuration. The access_log directive writes the usual access log (using a log format named main, which must be defined with log_format), and root points to the htdocs directory. If you have a large number of name-based virtual hosts, you’ll need to increase the size of the hash bucket with server_names_hash_bucket_size 128;
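The following Python sketch is a simplified model, not Nginx’s actual code, of how a request’s Host header is matched against server_name entries. It only covers exact names and leading wildcards such as *.foocorp.com; Nginx also supports trailing wildcards and regular expressions, which are left out here. The function pick_server and the labels “static” and “wildcard” are hypothetical. It shows two rules: an exact name beats a wildcard, and *.foocorp.com matches subdomains at any depth but not foocorp.com itself.

```python
from typing import Dict, Optional


def pick_server(host: str, servers: Dict[str, str]) -> Optional[str]:
    """Return the label of the server block that would handle `host`.

    `servers` maps each server_name entry to a label for its server block.
    None means no name matched, so the default server for the listening
    address would handle the request.
    """
    host = host.lower().rstrip(".")
    # 1. An exact name always wins.
    if host in servers:
        return servers[host]
    # 2. Otherwise, the longest matching leading wildcard wins.
    best_suffix, best_label = "", None
    for name, label in servers.items():
        if name.startswith("*."):
            suffix = name[1:]  # "*.foocorp.com" -> ".foocorp.com"
            if host.endswith(suffix) and len(suffix) > len(best_suffix):
                best_suffix, best_label = suffix, label
    return best_label


if __name__ == "__main__":
    servers = {
        "www.foocorp.com": "static",
        "foocorp.com": "static",
        "*.foocorp.com": "wildcard",
    }
    print(pick_server("www.foocorp.com", servers))   # static
    print(pick_server("shop.foocorp.com", servers))  # wildcard
    print(pick_server("example.org", servers))       # None
```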

Gzip compression

Like many other web servers, Nginx can compress content using gzip:

gzip on;
gzip_min_length 1100;
gzip_buffers 4 8k;
gzip_types text/plain text/css application/x-javascript;

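These directives enable gzip, set the minimum response length (in bytes) to compress, set the number and size of the compression buffers, and list the MIME types Nginx will compress in addition to text/html, which is always compressed when gzip is on. Gzip compression is supported by all modern browsers.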
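As a rough illustration of how these settings interact, here is a simplified Python model of the decision made for each response: the client must accept gzip, the MIME type must be in the list, and the length must be at least gzip_min_length. It is an approximation under stated assumptions: it ignores q-values in Accept-Encoding and other conditions Nginx checks, it assumes JavaScript is served as application/x-javascript, and should_gzip is a hypothetical name. The standard gzip module is used only to show the size reduction on a sample body.

```python
import gzip

GZIP_MIN_LENGTH = 1100  # gzip_min_length 1100;
# gzip_types from the example; text/html is always compressed, so add it too.
GZIP_TYPES = {"text/html", "text/plain", "text/css", "application/x-javascript"}


def should_gzip(content_type: str, content_length: int, accept_encoding: str) -> bool:
    """Simplified version of the checks applied before compressing a response."""
    mime = content_type.split(";")[0].strip().lower()  # drop "; charset=..."
    client_accepts = "gzip" in accept_encoding.lower()  # ignores q-values
    return client_accepts and mime in GZIP_TYPES and content_length >= GZIP_MIN_LENGTH


if __name__ == "__main__":
    body = b"<p>Hello from foocorp.com</p>\n" * 100  # 3000 bytes
    print(should_gzip("text/html; charset=utf-8", len(body), "gzip, deflate"))  # True
    print(len(body), len(gzip.compress(body)))  # original vs. compressed size
```

Responses shorter than 1100 bytes are sent uncompressed, since the gzip header and CPU cost outweigh the savings on very small bodies.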

HTTP Load Balancing

Nginx can be used as a simple HTTP load balancer. In this configuration, you place Nginx in front of your existing web servers, which can themselves be running Nginx. In HTTP load balancer mode, you simply add an upstream block to the http section of the configuration:

upstream a.serverpool.foocorp.com {
    server 10.80.10.10:80;
    server 10.80.10.20:80;
    server 10.80.10.30:80;
}

upstream b.serverpool.foocorp.com {
    server 10.80.20.10:80;
    server 10.80.20.20:80;
    server 10.80.20.30:80;
}

Then, in a location block inside the server block, add the line:

proxy_pass http://a.serverpool.foocorp.com;
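With equal weights, spreading requests across an upstream block behaves like simple round-robin. The Python sketch below models that behaviour for the a.serverpool.foocorp.com pool. It illustrates the algorithm rather than Nginx’s implementation, does not model server weights or skipping failed servers, and the class name RoundRobin is hypothetical.

```python
from itertools import cycle
from typing import Iterable


class RoundRobin:
    """Hand out upstream servers in turn, like an equal-weight upstream block."""

    def __init__(self, servers: Iterable[str]):
        self._servers = cycle(list(servers))

    def next_server(self) -> str:
        return next(self._servers)


# Servers from "upstream a.serverpool.foocorp.com" above.
POOL_A = ["10.80.10.10:80", "10.80.10.20:80", "10.80.10.30:80"]

if __name__ == "__main__":
    balancer = RoundRobin(POOL_A)
    # The fourth request wraps around to the first server again.
    for request in range(4):
        print(request, balancer.next_server())
```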

Health Check Limitations

Nginx has only simple load balancing capabilities: it uses a simple round-robin algorithm and doesn’t have health checking capabilities. However, Nginx is a relatively new project, so one would expect various load balancing algorithms and health checking support to be added over time. While it might not be wise to replace your commercial load balancer with Nginx anytime soon, Nginx is close to being a very competitive solution. Monit and other monitoring applications are good options to compensate for the lack of health checking in Nginx.
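To give an idea of what such an external check looks like, here is a minimal Python sketch of a TCP-level health check of the kind a monitoring tool such as Monit can run. It only verifies that each backend accepts connections on its port, not that the application behind it returns correct pages. The functions is_up and check_pool are hypothetical names, and acting on a failure (alerting, or removing the server from the upstream block and reloading Nginx) is left out.

```python
import socket
from typing import Dict, Iterable


def is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_pool(servers: Iterable[str], timeout: float = 2.0) -> Dict[str, bool]:
    """Map each 'host:port' entry (as written in an upstream block) to its status."""
    status = {}
    for entry in servers:
        host, port = entry.rsplit(":", 1)
        status[entry] = is_up(host, int(port), timeout)
    return status
```

For example, check_pool(["10.80.10.10:80", "10.80.10.20:80", "10.80.10.30:80"]) returns a dictionary mapping each backend to True or False; each unreachable server can delay the result by up to the timeout.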

Global Server Load Balancing

Nginx has a very interesting capability: with a little configuration, it can provide Global Server Load Balancing (GSLB). GSLB is a feature you’ll find on high-end load balancing switches such as those from F5, Radware, Nortel, and Cisco. It is typically an additional license costing a few thousand dollars, on top of a switch that typically starts at around US$10,000.

GSLB assumes you have multiple sites distributed around the world: for example, a site in Europe, a site in Asia, and a site in North America. Traditionally, you would direct traffic by region using different country-specific domains, so www.foocorp.com might go to North America, www.foocorp.co.uk to Europe, and www.foocorp.com.cn to the server in Asia. This isn’t a very effective solution because it relies on users visiting the proper domain. A user in Asia might see a print advertisement aimed at the North American market and visit the .com address, which means they aren’t reaching the closest and fastest server.

GSLB looks at the source IP address of each request and determines which site is closest to that address. The simplest method is to divide the Internet address space by region and route traffic to the local site in that region. By region, we mean North America, South America, EMEA (Europe, Middle East and Africa), and APAC (Asia-Pacific).

Configuring Nginx for GSLB

The geo block is used to configure GSLB in Nginx. It makes Nginx look at the client’s source IP address and set a variable based on the matching entry in the configuration. A nice feature is that you can set a default value for addresses that match no entry.

geo $gslb {
    default na;
    include conf/gslb.conf;
}

Here we set the default to na (North America) and then include gslb.conf, a simple file in which each line maps a subnet to a value for the variable. Here is an excerpt from gslb.conf:

32.0.0.0/8 emea;
41.0.0.0/8 emea;
43.0.0.0/8 apac;

When Nginx receives a request from a source IP in 32.0.0.0/8 (for those unfamiliar with slash notation, this is the entire Class A, 32.0.0.0 through 32.255.255.255), it sets the variable $gslb to emea. We then use that variable later in the configuration to redirect the client.
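To make the slash notation concrete, the Python sketch below uses the standard ipaddress module to compute the range covered by 32.0.0.0/8 and to look up which value $gslb would receive for a given address, using only the three excerpt entries and the na default. The helper names describe and lookup are hypothetical, and the simple first-match scan is only valid because these entries do not overlap.

```python
import ipaddress
from typing import Dict, Tuple

# The three entries from the gslb.conf excerpt above.
GSLB_EXCERPT: Dict[str, str] = {
    "32.0.0.0/8": "emea",
    "41.0.0.0/8": "emea",
    "43.0.0.0/8": "apac",
}


def describe(cidr: str) -> Tuple[str, str, int]:
    """Return the first address, last address and size of a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.broadcast_address), net.num_addresses


def lookup(ip: str, table: Dict[str, str] = GSLB_EXCERPT, default: str = "na") -> str:
    """Value $gslb would get; first match is fine because entries don't overlap."""
    addr = ipaddress.ip_address(ip)
    for cidr, region in table.items():
        if addr in ipaddress.ip_network(cidr):
            return region
    return default


if __name__ == "__main__":
    print(describe("32.0.0.0/8"))  # ('32.0.0.0', '32.255.255.255', 16777216)
    print(lookup("32.10.20.30"))   # emea
    print(lookup("8.8.8.8"))       # na (falls through to the default)
```

A /8 leaves 24 bits for hosts, hence 2^24 = 16,777,216 addresses.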

Inside the location block of our server configuration, we add a number of if statements before the proxy_pass statement (if one is used). These instruct the server to issue an HTTP 302 (temporary) redirect:

if ($gslb = emea) {
    rewrite ^(.*) http://europe.foocorp.com$1 redirect;
}

if ($gslb = apac) {
    rewrite ^(.*) http://asia.foocorp.com$1 redirect;
}

These rules are configured under the www.foocorp.com name-based virtual server. If someone from North America requests www.foocorp.com, $gslb keeps its default value and the page is simply served by the same server. If the user is in Europe, the source address should match one of the subnets listed in gslb.conf, which sets $gslb to emea. The North American site hosting the .com domain then redirects the client to the server(s) at the European site.

On the European server, the configuration is slightly different. Instead of checking for emea, you check for na and redirect to the US site. This handles the case where someone in North America visits the .eu or .co.uk site:

if ($gslb = na) {
    rewrite ^(.*) http://www.foocorp.com$1 redirect;
}
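The per-site logic can be summarised as a small lookup table. The Python sketch below mirrors the if/rewrite blocks for the North American site (www.foocorp.com) and the European site. Using europe.foocorp.com as the European site’s key and the function name redirect_for are assumptions for illustration; returning None stands for serving the request locally.

```python
from typing import Dict, Optional

# Hypothetical rule tables mirroring the if/rewrite blocks on each site.
SITE_RULES: Dict[str, Dict[str, str]] = {
    "www.foocorp.com": {
        "emea": "http://europe.foocorp.com",
        "apac": "http://asia.foocorp.com",
    },
    "europe.foocorp.com": {
        "na": "http://www.foocorp.com",
    },
}


def redirect_for(site: str, gslb: str, uri: str) -> Optional[str]:
    """Return the 302 Location for this request, or None to serve it locally."""
    target = SITE_RULES.get(site, {}).get(gslb)
    # rewrite ^(.*) http://host$1 keeps the original URI, so append it.
    return target + uri if target is not None else None


if __name__ == "__main__":
    print(redirect_for("www.foocorp.com", "emea", "/products/"))
    print(redirect_for("www.foocorp.com", "na", "/products/"))  # None: served locally
```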

Traffic Control: In-Region Is Not Always Faster

The problem with commercial solutions is that they are too generalized. Our example configurations so far make some pretty wild assumptions. On the Internet, a user in Asia, for example, might not have a faster connection to servers in Asia. India and Pakistan are good examples. A server hosted in Hong Kong or Singapore is in Asia and would be considered “in region” for customers in India and Pakistan. In reality, though, traffic from those countries to Hong Kong is routed through Europe: packets from India to Hong Kong travel through Europe, across the United States, and reach Hong Kong from the Pacific. Meanwhile, customers in Australia, whose addresses fall in the same regional address block, are only a few hops away from Hong Kong.

With commercial solutions you are simply out of luck in such a situation, but with Nginx you can fine-tune how traffic is directed. Here we know that 120.0.0.0/6 is mainly APAC, but 122.162.0.0/16 and 122.163.0.0/16 have faster connections to Europe. So we simply add these subnets to the configuration, mapped to emea. Nginx uses the most specific (longest-prefix) match for the source IP, and since 122.162.0.0/16 is finer grained than 120.0.0.0/6, Nginx will use it.
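The longest-prefix rule is what makes these overrides work. The Python sketch below models it with the standard ipaddress module: among all entries that contain the address, the one with the longest prefix wins, and the default applies when nothing matches. The emea value for the two /16 overrides follows from the text above, gslb_lookup is a hypothetical name, and the linear scan is for illustration only; Nginx uses a more efficient structure internally.

```python
import ipaddress
from typing import Dict

# A broad APAC block plus two more specific overrides that route better to Europe.
GSLB_TABLE: Dict[str, str] = {
    "120.0.0.0/6": "apac",      # covers 120.0.0.0 - 123.255.255.255
    "122.162.0.0/16": "emea",
    "122.163.0.0/16": "emea",
}


def gslb_lookup(ip: str, table: Dict[str, str] = GSLB_TABLE, default: str = "na") -> str:
    """Return the value of the most specific (longest-prefix) matching entry."""
    addr = ipaddress.ip_address(ip)
    best_prefix, best_value = -1, default
    for cidr, value in table.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and net.prefixlen > best_prefix:
            best_prefix, best_value = net.prefixlen, value
    return best_value


if __name__ == "__main__":
    # Prints emea, apac and na respectively.
    for ip in ("122.162.10.1", "121.1.1.1", "124.0.0.1"):
        print(ip, gslb_lookup(ip))
```

Because the most specific entry wins, the order of lines in gslb.conf does not matter for overlapping subnets.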

Manual Tuning

You can do the initial tuning with the whois command; for example, whois 120.0.0.0 will tell you which regional Internet registry (RIR) the address belongs to, such as ARIN or RIPE. ARIN, RIPE, APNIC, AFRINIC, and LACNIC are the RIRs, organizations that oversee the allocation and registration of Internet number resources, both IPv4 and IPv6 addresses, within a particular region of the world. However, as our previous example shows, you’re going to need to fine-tune the gslb configuration with traceroute and ping information. Probably the best approach is to start with a general configuration and then fine-tune it based on feedback from customers.
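Once you have looked up the registry for each address block, turning the results into gslb.conf lines is mechanical. The Python sketch below assumes a hypothetical list of (CIDR, RIR) pairs that you have collected yourself and a hypothetical RIR-to-region mapping; LACNIC is sent to na here on the assumption that there is no South American site. Blocks from an unknown registry are skipped so they fall through to the geo default, and gslb_conf_lines is a hypothetical name.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from regional Internet registry to the $gslb values.
RIR_TO_REGION: Dict[str, str] = {
    "ARIN": "na",
    "RIPE": "emea",
    "AFRINIC": "emea",
    "APNIC": "apac",
    "LACNIC": "na",  # assumption: no South American site, so use North America
}


def gslb_conf_lines(assignments: Iterable[Tuple[str, str]],
                    rir_to_region: Dict[str, str] = RIR_TO_REGION) -> List[str]:
    """Turn (cidr, rir) pairs into 'cidr region;' lines for gslb.conf."""
    lines = []
    for cidr, rir in assignments:
        net = ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR
        region = rir_to_region.get(rir.upper())
        if region is None:
            continue  # unknown registry: leave it to the geo default
        lines.append(f"{net} {region};")
    return lines


if __name__ == "__main__":
    # Hypothetical input, e.g. collected by hand from whois lookups.
    pairs = [("32.0.0.0/8", "RIPE"), ("43.0.0.0/8", "APNIC")]
    print("\n".join(gslb_conf_lines(pairs)))
```

The generated lines are only a starting point; more specific overrides based on traceroute and ping results can then be added by hand.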

Cost Savings vs. Features

Looking at a well-known Layer 4-7 switching solution, you would need a minimum of US$15k per site to purchase the necessary equipment and licensing. Commercial solutions do have some additional fault-tolerance measures, such as the ability to measure the load and availability of servers at remote sites. However, with Nginx offering a very close solution that is available free of charge with its source code, it is only a matter of time before such features become part of Nginx or are available through other projects.

gslb.conf

The following is an initial example of gslb.conf; it should be sufficient for most users.