Recently I started to tackle a load problem on one of my personal sites, the issue was that of a poorly written but exceedingly MySQL heavy application and the load it would induce on the SQL server when 400-500 people were hammering the site at once. Further compounding this was Apache’s horrible ability to gracefully handle excessive requests on object heavy pages (i.e: images). This left me with a site that was almost unusable during peak hours — or worse — would crash the MySQL server and take Apache with it by frenzied F5ing from users.

I went through all the usual rituals in an effort to better the situation, from PHP APC then Eaccelerator, to mod_proxy+mod_cache, to tuning Apache timeouts/prefork settings and adjusting MySQL cache/buffer options. The extreme was setting up a MySQL replication cluster with MySQL-Proxy doing RW splitting/load balancing across the cluster and memcached, but this quickly turned into a beast to manage and memcached was eating memory at phenomenal rates.

Although I did improve things a bit, I had done so at the expense of vastly increased hardware demand and complexity. However, the site was still choking during peak hours and in a situation where switching applications and/or getting it reprogrammed is not at all an option, I had to start thinking outside the box or more to the point, outside Apache.

I have experience with lighttpd and pound reverse proxy, they are both phenomenal applications but neither directly handles caching in a graceful fashion (in pounds case not at all). This is when I took a look a nginx which to date I had never tried but heard many great things about. I fired up a new Xen guest running CentOS 5.4, 2GB RAM & 2 CPU cores….. an hour later I had nginx installed, configured and proxy-caching traffic for the site in question.

The impact was immediate and significant — the SQL server loads dropped from an average of 4-5 down to 0.5-1.0 and the web server loads were near non-existent from previously being on the brink of crashing every afternoon.

Enough with my ramblings, lets get into nginx. You can download the latest release from http://nginx.org and although I could not find a binary version of it, compiling was straight forward with no real issues.

First up we need to satisfy some requirements for the configure options we will be using, I encourage you to look at ‘./configure –help’ list of available options as there are some nice features at your disposal.

This will install nginx into ‘/usr/local/nginx’, if you would like to relocate it you can use ‘–prefix=/path’ on the configure options. The path layout for nginx is very straight forward, for the purpose of this post we are assuming the defaults:

The layout will be very familiar to anyone that has worked with Apache and true to that, nginx breaks the configuration down into a global set of options and then the individual web site virtual host options. The ‘conf/’ folder might look a little intimidating but you only need to be concerned with the nginx.conf file which we are going to go ahead and overwrite, a copy of the defaults is already saved for you as nginx.conf.default.

Lets take a moment to review some of the more important options in nginx.conf before we move along…

user nobody nobody;
If you are running this on a server with an apache install or other software using the user ‘nobody’, it might be wise to create a user specifically for nginx (i.e: useradd nginx -d /usr/local/nginx -s /bin/false)

worker_processes 4;
This should reflect the number of CPU cores which you can find out by running ‘cat /proc/cpuinfo | grep processor‘ — I recommend a setting of at least 2 but no more than 6, nginx is VERY efficient.

proxy_cache_path /usr/local/nginx/proxy … inactive=7d max_size=1000m;
The ‘inactive’ option is the maximum age of content in the cache path and the ‘max_size’ is the maximum on disk size of the cache path. If you are serving up lots of object heavy content such as images, you are going to want to increase this.

proxy_send|read_timeout 60;
These timeout values are important, if you run any scripts through admin interfaces or other maintenance URL’s, these values will cause the proxy to time them out — that said increase them to sane values as appropriate, anything more than 300 is probably excessive and you should consider running such tasks from cronjobs.

Lets go ahead and get our initial vhosts file created, my template is available from http://www.rfxn.com/downloads/nginx.vhost.conf and should be saved to ‘/usr/local/nginx/vhosts/myforums.com.conf’, the contents of which are as follows:

The obvious changes you want to make are ‘myforums.com’ to whatever domain you are serving, you can append multiple aliases to the server_name string such as ‘server_name domain.com alias www.domain.com alias sub.domain.com;‘. Now, lets take a look at some of the important options in the vhosts configuration:

listen 80;
This is the port which nginx will listen on for this vhost, by default unless you specify an IP address with it, you will bind port 80 on all local IP’s for nginx — you can limit this by setting the value as ‘listen 10.10.3.5:80;‘.

proxy_pass http://10.10.6.230;
Here we are telling nginx where to find our content aka the backend server, this should be an IP and it is also important to not forget setting the ‘proxy_set_header Host’ option so that the backend server knows what vhost to serve.

proxy_cache_valid
This allows us to define cache times based on HTTP status codes for our content, for 99% of traffic it will fall under the ‘200 301 302 20m’ value. If you are running allot of dynamic content you may want to lower this from 20m to 10m or 5m, any lower defeats the purpose of caching. The ‘404 1m’ value ensures that not found pages are not stored for long in case you are updating the site/have a temporary error but also prevent 404’s from choking up the backend server. Then the ‘any 15m’ value grabs all other content and caches it for 15m, again if you are running a very dynamic site you may want to lower this.

proxy_cache_use_stale
When the cache has stale content, that is content which has expired but not yet been updated, nginx can serve this content in the event errors are encountered. Here we are telling nginx to serve stale cache data if there is an error/timeout/invalid header talking to the backend servers or if another nginx worker process is busy updating the cache. This is really useful in the event your web server crashes, as to clients they will receive data from the cache.

location /admin
With this location statement we are telling nginx to take all requests to ‘http://myforums.com/admin’ and pass it off directly to our backend server with no further interaction — no caching.

That’s it! You can start nginx by running ‘/usr/local/nginx/sbin/nginx’, it should not generate any errors if you did everything right! To start nginx on boot you can append the command into ‘/etc/rc.local’. All you have to do now is point the respective domain DNS records to the IP of the server running nginx and it will start proxy-caching for you. If you wanted to run nginx on the same host as your Apache server you could set Apache to listen on port 8080 and then adjust the ‘proxy_pass’ options accordingly as ‘proxy_pass http://127.0.0.1:8080;’.

Extended Usage:
If you wanted to have nginx serve static content instead of Apache, since it is so horrible at it, we need to declare a new location option in our vhosts/*.conf file. We have two options here, we can either point nginx to a local path with our static content or have nginx cache our static content then retain it for longer periods of time — the later is far simpler.

In the above, we are telling nginx that our static content is located at ‘/home/myuser/public_html’, paths must be relative!! When a user requests ‘http://www.mydomain.com/img/flyingpigs.jpg’, nginx will look for it at ‘/home/myuser/public_html/img/flyingpigs.jpg’. The expires option can have values in seconds, minutes, hours or days — if you have allot of dynamic images on your site then you might consider an option like 2h or 30m, anything lower defeats the purpose. Using this method has a slight performance benefit over the cache option below.

With this setup we are telling nginx to cache our static content just like we did with the parent site itself, except that we are defining an extended time period for which the content is valid/cached. The time values are, content is valid for 2h (nginx updates cache) and every 2 days the content expires (client browsers cache expires and requests again). Using this method is simple and does not require copying static content to a dedicated nginx host.

We can also do load balancing very easily with nginx, this is done by setting an alias for a group of servers, we then define this alias in place of addresses in our ‘proxy_pass’ settings. In the ‘upstream’ option shown below, we want to list all of our web servers that load should be distributed across:

This must be placed in the ‘http { }’ section of the ‘conf/nginx.conf’ file, then the server group can be used in any vhost. To do this we would replace ‘proxy_pass http://208.76.83.135;’ with ‘proxy_pass http://my_server_group;’. The requests will be distributed across the server group in a round-robin fashion with respect to the weighted values, if any. If a request to one of the servers fails, nginx will try the next server until it finds a working server. In the event no working servers can be found, nginx will fall back to stale cache data and ultimately an error if that’s not available.

Conclusion:
This has turned into a longer post than I had planned but oh well, I hope it proves to be useful. If you need any help on the configuration options, please check out http://wiki.nginx.org, it covers just about everything one could need.

Although I noted this nginx setup is deployed on a Xen guest (CentOS 5.4, 2GB RAM & 2 CPU cores), it proved to be so efficient, that these specs were overkill for it. You could easily run nginx on a 1GB guest with a single core, a recycled server or locally on the Apache server. I should also mention that I took apart the MySQL replication cluster and am now running with a single MySQL server without issue — down from 4.

5 Comments for this entry

You’d also have to enlarge your limits in /etc/limits.conf (or where ever it is on your distro of choice), if you’d have 1000s of users simultaneous, you could have a bottleneck if there are too many “files” open.

You only need to change the apache port if you are running nginx locally, if thats the case then yes move it to something like port 81 or 8080 and in the nginx vhosts config simply set proxy_pass to something like http://127.0.0.1:81;