
I have previously talked about some of the most common nginx questions; not surprisingly, one such question is how to optimize nginx for high performance. This is not overly surprising, since most new nginx users are migrating over from Apache and are thus used to tuning settings and performing voodoo magic to ensure their servers perform as well as possible.

Well, I’ve got some bad news for you: you can’t really optimize nginx very much. There are no magic settings that will reduce your load by half or make PHP run twice as fast. Thankfully, the good news is that nginx doesn’t require any tuning because it is already optimized out of the box. The biggest optimization happened when you decided to use nginx and ran that apt-get install, yum install or make install. (Please note that repositories are often out of date. The wiki install page usually has a more up-to-date repository.)

That said, there are a lot of options in nginx that affect its behaviour, and not all of their default values are completely optimized for high traffic situations. We also need to consider the platform that nginx runs on and optimize our OS, as there are limitations in place there as well.

So in short, while we cannot optimize the load time of individual connections, we can ensure that nginx has the ideal environment for handling high traffic situations. Of course, by high traffic I mean several hundred requests per second; the vast majority of people don’t need to mess around with this, but if you are curious or want to be prepared then read on.

First of all we need to consider the platform, as nginx is available on Linux, MacOS, FreeBSD, Solaris and Windows as well as some more esoteric systems. They all implement high performance event based polling methods; sadly, nginx only supports four of them. I tend to favour FreeBSD of the four, but you should not see huge differences, and it’s more important that you are comfortable with your OS of choice than that you pick the absolute most optimized one.

In case you hadn’t guessed it already, the odd one out is Windows. Nginx on Windows is really not an option for anything you’re going to put into production. Windows has a different way of handling event polling and the nginx author has chosen not to support it; as such nginx falls back to using select(), which isn’t overly efficient, and your performance will suffer quite quickly as a result.

The second biggest limitation that most people run into is also related to your OS. Open up a shell, su to the user nginx runs as and then run the command `ulimit -a`. Those values are all limitations nginx cannot exceed. On many default systems the open files value is rather limited; on a system I just checked it was set to 1024. If nginx runs into a situation where it hits this limit it will log the error (24: Too many open files) and return an error to the client. Naturally nginx can handle a lot more than 1024 file descriptors and chances are your OS can as well. You can safely increase this value.

To do this you can either set the limit with ulimit or you can use worker_rlimit_nofile to define your desired open file descriptor limit. (This requires that nginx be started as root before dropping its privileges.)
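As a minimal sketch, the directive lives in the main (top-level) context of nginx.conf; the value below is purely illustrative, so size it to your expected connection count:

```nginx
# Main context of nginx.conf -- raise the per-worker file descriptor
# limit beyond the shell default (65536 is an illustrative value).
worker_rlimit_nofile 65536;
```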

Nginx Limitations
With the OS taken care of it’s time to dive into nginx itself and have a look at some of the directives and methods we can use to tune things.

Worker Processes
The worker process is the backbone of nginx; once the master has bound to the required IP/ports it will spawn workers as the specified user and they’ll then handle all the work. Workers are not multi-threaded, so a single worker cannot spread its connections across CPU cores. Thus it makes sense for us to run multiple workers, usually 1 worker per CPU core. For most workloads anything above 2-4 workers is overkill, as nginx will hit other bottlenecks before the CPU becomes an issue and usually you’ll just have idle processes. If your nginx instances are still CPU bound after 4 workers then you hopefully don’t need me to tell you to add more.

An argument for more worker processes can be made when you’re dealing with situations involving a lot of blocking disk IO. You will need to test your specific setup to check the wait time on static files, and if it’s large then try increasing the number of worker processes.

Worker Connections
Worker connections effectively limit how many connections each worker can maintain at a time. This directive is most likely designed to prevent run-away processes, and to help in case your OS is configured to allow more connections than your hardware can handle. As nginx developer Valentine points out on the nginx mailing list, nginx can close keep-alive connections if it hits the limit, so we don’t have to worry about our keep-alive value here. Instead we’re concerned with the number of currently active connections that nginx is handling. The formula for the maximum number of clients we can serve then becomes:

(worker_processes * worker_connections) / K = maximum simultaneous clients

Where K is the number of connections each client opens. When reverse proxying, each request also opens an additional connection to your backend, so add one to K in that case. Dividing further by the average $request_time gives a rough clients-per-second figure.

In the default configuration file the worker_connections directive is set to 1024. If we consider that browsers normally open up 2 connections for pipelining site assets, that leaves us with a maximum of 512 users handled simultaneously. With proxying this is even lower, though your backend hopefully responds quickly enough to free up the connection.

All things considered, it should be fairly clear that if you grow in traffic you’ll eventually want to increase the number of connections each worker can handle. 2048 should do for most people but honestly, if you have this kind of traffic you should not have any doubt how high you need this number to be.
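Putting the two directives together, a minimal sketch (the numbers are illustrative starting points, not recommendations for your hardware):

```nginx
# Main context: one worker per CPU core is a sensible default.
worker_processes  4;

events {
    # Per-worker connection ceiling; with 4 workers this allows
    # up to 8192 concurrent connections in total.
    worker_connections  2048;
}
```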

CPU Affinity
Setting CPU affinity basically means you tell each worker which CPU core to use; they’ll then use only that CPU core. I’m not going to cover this much except to say that you should be really careful doing it. Chances are your OS CPU scheduler is far, far better at handling load balancing than you are. If you think you have issues with CPU load balancing then optimize this at the scheduler level, or potentially find an alternative scheduler, but unless you know what you’re doing don’t touch this.

Keep Alive
Keep alive is an HTTP feature which allows user agents to keep the connection to your server open for a number of requests or until the specified timeout is reached. This won’t actually change the performance of our nginx server very much as it handles idle connections very well. The author of nginx claims that 10,000 idle connections will use only 2.5 MB of memory, and from what I’ve seen this seems to be correct.

The reason I cover this in a performance guide is pretty simple. Keep alive has a huge effect on the perceived load time for the end user, and perceived load time is the most important measurement you can ever optimize: if your website seems to load fast, your users are happy. Studies done by Amazon and other large online retailers show a direct correlation between perceived load time and sales completed.

It should be somewhat obvious why keep alive connections have such a huge impact: you avoid the whole HTTP connection setup, which is not insignificant. You probably don’t need a keep alive timeout value of 65, but 10-20 seconds is definitely recommended, and as previously stated, nginx can easily handle the idle connections.
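In configuration terms that’s a single directive in the http context; 15 here is just one value from the suggested 10-20 second range:

```nginx
http {
    # Keep idle client connections open for 15 seconds.
    keepalive_timeout  15;
}
```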

tcp_nodelay and tcp_nopush
These two directives are probably some of the most difficult to understand, as they affect nginx at a very low networking level. The very short and superficial explanation is that they determine how the OS handles the network buffers and when to flush them to the end user. I can only recommend that if you do not already know about these, you shouldn’t mess with them. They won’t significantly improve or change anything, so it’s best to just leave them at their default values.

Hardware Limitations
Since we’ve now dealt with all the possible limitations imposed by nginx, it’s time to figure out how to push the most out of our server. To do this we need to look at the hardware level, as this is the most likely place to find our bottleneck.

With servers we have primarily 3 potential bottleneck areas: the CPU, the memory and the IO layers. nginx is very efficient with its CPU usage, so I can tell you straight up that this is not going to be your bottleneck, ever. Likewise, it’s also very efficient with its memory usage, so this is very unlikely to be your bottleneck either. This leaves IO as the primary culprit.

If you’re used to dealing with servers then you’ve probably experienced this before. Hard drives are really, really slow. Reading from the hard drive is probably one of the most expensive operations you can do in a server and therefore the natural conclusion is that to avoid an IO bottleneck we need to reduce the amount of hard drive reading and writing nginx does.

To do this we can modify the behaviour of nginx to minimize disk writes as well as make sure the memory constraints imposed on nginx allows it to avoid disk access.

Access Logs
By default nginx will write every request to a file on disk for logging purposes. You can use this for statistics, security checks and such; however, it comes at the cost of IO usage. If you don’t use access logs for anything you can simply turn them off and avoid the disk writes. However, if you do require access logs then consider saving them to a memory partition instead. This will be much faster than writing to the disk and will reduce IO usage significantly.

If you only use access logs for statistics then consider whether you can use something like Google Analytics instead, or whether you can log only a subset of requests instead of all of them.
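In configuration terms, either of the earlier options is a one-line change; the tmpfs path below is an assumption, so mount a memory-backed partition there first if you go that route:

```nginx
# Option 1: skip access logging entirely.
access_log  off;

# Option 2: log to a memory-backed partition instead of disk.
# access_log  /var/log/nginx-ram/access.log;
```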

Error Logs
I sort of debated internally whether I should even cover this directive, as you really don’t want to disable error logging, especially considering how low volume the error log actually is. That said, there is one gotcha with this directive, namely the log level parameter you can supply; if set too verbose it will log 404 errors and possibly even debug info. Setting it to the warn level in production environments should be more than sufficient and keep the IO low.
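For instance (the path shown is the conventional one and may differ on your system):

```nginx
# Log only warnings and above in production.
error_log  /var/log/nginx/error.log  warn;
```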

Open File Cache
A part of reading from the file system consists of opening and closing files; considering that this is a blocking operation, it is a not insignificant part. Thus it makes good sense for us to cache open file descriptors, and this is where the open file cache comes in. The nginx wiki has a pretty decent explanation of how to enable and configure it, so I suggest you go read that.
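As a hedged sketch of what such a configuration might look like (all values here are illustrative assumptions, not tuned recommendations):

```nginx
http {
    # Cache up to 1024 open file descriptors; drop entries idle for 30s.
    open_file_cache           max=1024 inactive=30s;
    # Re-validate cached entries every 60 seconds.
    open_file_cache_valid     60s;
    # Only cache descriptors used at least twice within the inactive window.
    open_file_cache_min_uses  2;
    # Cache file lookup errors (e.g. missing files) as well.
    open_file_cache_errors    on;
}
```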

Buffers
One of the most important things you need to tune is the buffer sizes you allow nginx to use. If the buffer sizes are set too low, nginx will have to store the responses from upstreams in a temporary file, which causes both write and read IO; the more traffic you get, the more of a problem this becomes.

client_body_buffer_size is the directive which handles the client request buffer size, meaning the incoming request body. This is used to handle POST data, meaning form submissions, file uploads etc. You’ll want to make sure that the buffer is large enough if you handle a lot of large POST data submissions.

fastcgi_buffers and proxy_buffers are the directives which deal with the response from your upstream, meaning PHP, Apache or whatever you use. The concept is exactly the same as above: if the buffers aren’t large enough the data will be saved to disk before being served to the user. Notice that there is an upper limit for what nginx will buffer, even on disk, before it transfers it synchronously to the client. This limit is governed by fastcgi_max_temp_file_size and proxy_max_temp_file_size. In addition, you can turn buffering off entirely for proxy connections with proxy_buffering set to off. (Usually not a good idea!)
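A sketch of the relevant directives in the http context; the sizes are illustrative and should be matched to your typical request and response sizes:

```nginx
http {
    # Keep incoming request bodies (POST data, uploads) up to 16k in memory.
    client_body_buffer_size  16k;

    # Per-connection upstream response buffers: 8 buffers of 32k each.
    fastcgi_buffers  8 32k;
    proxy_buffers    8 32k;
}
```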

Removing Disk IO Entirely
The best way to remove disk IO is of course to not use the disks at all. If you have only a small amount of data, chances are you can fit it all in memory and thus remove the limitation of disk IO entirely. By default your OS will also cache frequently accessed disk sectors, so the more memory you have, the less IO you will do. What this means is that you can buy your way out of this limitation by just adding more memory. The more data you have, the more memory you’ll need, of course.

Network IO
For the sake of fun we will assume that you’ve managed to get enough memory to fit your entire data set in there. This means you can theoretically do around 3-6 Gbps of read IO. Chances are, though, that you do not have that fast a network pipe. Sadly, there is a limit to how much we can actually optimize network IO, as we need to transfer the data somehow. The only real options are to minimize the amount of data in the first place or to compress it.

Thankfully nginx has a gzip module which allows us to compress the data before it’s sent to the client; this can drastically reduce its size. Generally, the gzip_comp_level at which you stop getting further improvements is around 4-5; there’s no point in increasing it beyond that, as you will just waste CPU cycles.

Side note: It’s also possible to pre-compress the static files with gzip_static so that you’re not recompressing on every request, thus saving CPU.
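A minimal sketch combining both ideas (the gzip_types list is an illustrative assumption, and gzip_static requires that module to be compiled in):

```nginx
http {
    gzip             on;
    # Compression gains flatten out around level 4-5.
    gzip_comp_level  4;
    # text/html is always compressed; add other text types explicitly.
    gzip_types       text/css application/javascript;

    # Serve pre-compressed .gz files when present, saving CPU per request.
    gzip_static      on;
}
```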

You can also minimize the data by using various javascript and css minimizers. This is not really nginx related so I will trust that you can find enough information on this using Google.

Phew
And with that we’ve reached the end of this subject. If you still require additional optimization then it’s time to consider using extra servers to scale your service instead of wasting time micro optimizing nginx further, but that’s a topic for another time as I’ve been running on for quite a while now. In case you’re curious the end word count was just above 2400, so best to take a small break before you go explore my blog further!

Mark Rose

Posted: April 28, 2011

Martin,

Excellent article! I learned a few things.

I have a small clarification to make with regards to keep alive. It's not just a huge difference in perceived load time, but actual load time, due to lowered or even completely eliminated latency between requests. Dumb browsers will hold the connection open, and make an additional request once the first request finishes. Smart browsers (Opera) will use HTTP pipelining and make several requests in one go, eliminating the latency completely. The establishment of a new HTTP connection is a fast and lightweight operation, except for latency. Except, of course, on legacy servers that spawn new threads/processes to handle incoming requests.

There is one thing to tweak on the OS level that will come in handy on a high traffic site, and that is the TCP Maximum Segment Lifetime. The MSL is essentially how long to wait around for stray packets after the connection has been closed, and it effectively blocks the same IP+remote port combination from being reused. 2*MSL is considered safe by RFC, and is 60 by default on Linux (/proc/sys/net/ipv4/tcp_fin_timeout is 2MSL). This won't likely be an issue on a public facing machine, but if you're running behind a proxy that doesn't support keep alive to the backend, such as haproxy, you'll be limited to at most 2^16/2MSL requests per second, or a little under 1100 req/s. And since an internal network is unlikely to have stray packets living more than a second or two, you can safely reduce the MSL to a few seconds.
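The tweak the commenter describes could look like this in /etc/sysctl.conf (the value is an illustrative assumption for an internal network; think twice before applying it to a public-facing box):

```ini
# /etc/sysctl.conf -- shorten the FIN/TIME_WAIT timeout (2*MSL) on Linux.
net.ipv4.tcp_fin_timeout = 5
```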

Mark Maunder

Posted: April 30, 2011

Great summary. A few things I've observed:

Be careful making certain buffers larger with a large number of nginx processes, e.g. the buffers that store headers and request bodies. They can grow fairly quickly, and if you don't have much RAM you may end up swapping [or having processes killed if you don't have swap].

I've spent some time trying to find the optimal number of nginx processes to use the 8 CPU cores in my front end machine. Roughly double (16) processes seems to work quite well. Increasing more than that just uses more memory.

fjordvald

Posted: April 30, 2011

Not swapping is a given for any high traffic site, I should hope people don't need me to tell them that. :)

As for your worker processes, what's your actual bottleneck? 16 Nginx processes seems like a lot of processes to use. If I use just 2 to 4 processes I never actually run into a Nginx bottleneck but rather a backend or IO bottleneck.

I actually have one case of a server serving primarily static files from memory. It's on a quad core machine with HT, so 4 physical cores and 8 virtual, and above 4 workers I saw literally no improvement in throughput.

Martin,
Great article. Some of the bottlenecks that you list are fairly common across most web architectures (not just nginx). Here's the blog I wrote debugging the performance of a fast-cgi app running behind nginx.

You can write your access/error logs in chunks, defining how large a single chunk can be (8KB, 32KB, 128KB, ...).

That will buffer those access/error logs in memory and flush them to the disk in a single, larger write request. That'll help if you're hitting a few thousand requests per second, so that you don't have to write an access log entry on every request.
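A hedged sketch of the buffered logging the commenter describes (path and buffer size are illustrative):

```nginx
# Collect log lines in a 32k in-memory buffer and flush them in one write.
access_log  /var/log/nginx/access.log  combined  buffer=32k;
```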

Hugh Fraser

Posted: June 21, 2011

Hi there, a really useful article. Just turning off the access log made a really big difference to my site, so thank you for that tip. I have a question: given that the hard disk is the most likely bottleneck, would you recommend caching with static files or with something like APC? Most people seem to think that static files are fastest, but I rather think the latter.

fjordvald

Posted: June 21, 2011

Thank you.

As for static files versus APC: in this specific case static files will usually win, as the access layer for APC is PHP while the access layer for static files is a system call and the file system. PHP is quite a heavy process and invoking it will be slower than accessing the file system.

The reason people like to use APC is that you can then handle the accessing logic in PHP as opposed to needing to do it in your web server via rewrites.

If you generalize your case a bit and say static files versus memory, then you are correct that memory will be much faster (at least until your OS caches the static files).
This means, though, that you'll again need to configure your webserver to look in memory. Thankfully, Nginx can do this via the memcached module for a fairly clean config.

Hugh

Posted: June 22, 2011

Hi Martin, many thanks for your reply. I can see there are arguments on both sides, but since speed is the most important in my case, I will switch back to static files. If I have understood your reply correctly, you recommend static files + the memcached module.

fjordvald

Posted: June 22, 2011

I didn't recommend anything at all, I just laid out the facts.

I personally use the APC style as I value a clean configuration more than a few extra requests per second. Let's be realistic here for a second, you'll never actually get more than those 500 requests per second that caching with APC gives you. Also, the memcached module is not related to static files, it takes the response of your web application and stores it directly in memcached so that it never touches the disk.

We are using NGINX as a component for a bank of websites we host. One of the sites has a form submission that emails name, email address and info request to a designated email recipient. Pretty simple. We need to capture a user's IP address and send it along with the email. NGINX limits this because it uses a single IP address, because of the way traffic and submissions are processed. Is there a way to capture the form submitter's IP address on an NGINX platform?

fjordvald

Posted: December 1, 2011

Nginx imposes no such limit unless you have severely misconfigured your stack. If you're using nginx as a backend then you have to use http://wiki.nginx.org/HttpRealIpModule. Similarly, if you're using some other backend behind nginx then you need the real ip (sometimes called rpaf) module for it.

Ethan

Posted: April 10, 2012

Hi, Thanks for the good informative article.

I notice that you haven't mentioned UNIX sockets for when nginx and the backend app are on the same machine. Any reason why you avoided that?

Wasn't really a conscious decision to avoid it. While Unix sockets avoid the TCP overhead, they also limit your scalability, and that's far more important than any speed optimization you can do. First you make it scalable, then you optimize.

Dean

Posted: May 10, 2012

Very good post on Nginx performance tuning. One caveat from our experience with Nginx - the statement that CPU/worker threads are never a bottleneck is only true for non-SSL traffic. Once SSL traffic is added to the mix, Nginx worker threads can spend substantial CPU time in OpenSSL.

From our testing, we've found that on a 2x quad core box, setting processor affinity to a single die (4 cores) and running 1.5 to 2x as many workers as logical cores (with hyperthreading, 8 logical cores per die = 12-16 workers) works best. The other CPU die (4 physical cores, 8 logical cores) is allocated to the co-resident application. I can't reconcile why this is the optimal config, as more workers than logical cores shouldn't be faster than a 1:1 ratio, but our latency and throughput metrics tell a pretty compelling story.

SSL in Nginx uses a CPU heavy set of ciphers by default. In fact, the default SSL ciphers are at the heavy end of PCI compliance, and it's possible to use a quicker set of ciphers and still be PCI compliant.

The one I ended up using (I have no need for PCI-compliance so I didn't research that part) was the following: ssl_ciphers RC4:HIGH:!aNULL:!MD5:!kEDH;

The important thing is that you turn off Diffie-Hellman as this one will slow Nginx down a lot.

I will make a note to edit my post and include a section on ssl ciphers.

akarmon

Posted: May 17, 2012

Great post!
You mentioned that "... Nginx does not support persistent connections to backends". Does that mean that nginx cannot use memcached persistent connections the same way Apache does?
Thanks!

At the time of posting that was totally true. There was a 3rd party module which added keep-alive for memcached backends but nothing else. However, the latest stable version has HTTP/1.1 proxying built in and therefore also supports keep-alive to backends. I'm not sure if the 3rd party module is still needed for memcached proxying; my guess would be no, but please test it first.

I disagree. You can remain scalable using another instance of nginx as a proxy load-balancer in front of machines using nginx and sockets...
This also wouldn't apply to cloud-based systems, where you can scale by increasing the power of the system, not the number of systems.

Rasmus


Posted: November 13, 2012

Not sure if you still post here, but I recently started using nginx. Right now I just have it for personal use and plan on leaving it at that, but the box has only 1 core and 256MB of memory, and current memory use is at 105MB. What would you suggest I put for worker_connections? Right now it's at 768, but as mentioned I'm trying to lower memory use as much as I possibly can.

Lowering worker_connections won't matter much. The important thing for low memory situations is that you have only 1 worker process. If you need nginx to use even less memory after that, you will need to compile a custom binary with the modules you don't need disabled.

owais sheikh

Posted: December 2, 2012

Excellent article, and you explain every option in great depth. Thanks a lot for your kind briefing.

For people reading this, please note that syncookies is a kernel level TCP setting that will turn off SYN flood protection. Think carefully before you do this; it's a calculated risk that might not be worth it.

Very nice article, and I have a question. We're running a video stream website just like YouTube, and our mp4 videos are usually around 500MB in size. HDD IO is constantly at 80+% and IO wait is also usually high, due to which users complain about fluctuating video buffering. We're using 12x3TB SATA hardware RAID-10 drives on our servers with nginx+php-fpm. We can't afford faster drives like SAS/SSD right now. Could you suggest whether nginx has some module that can help me reduce the high disk IO?

Also, our users upload videos up to 1GB in size. client_max_body_size is already set to 3G, but sometimes users report that they received an HTTP error while uploading a file. As per your article, should I increase client_body_buffer_size to 3G too, for large POST data?

First of all you should use separate servers for uploading. Write IO will kill your read IO very quickly so mixing the two together is never good.

While you cannot easily optimize your IO load, you can optimize your IO wait a bit by using asynchronous IO. Nginx has the following directive to enable it: http://nginx.org/en/docs/http/ngx_http_core_module.html#aio
It will work on Linux but supposedly works even better on FreeBSD.
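A hedged sketch of enabling it for large static files (the location path and sizes are illustrative assumptions; note that on Linux, aio for reads only applies together with directio):

```nginx
location /videos/ {
    # Asynchronous file IO so workers don't block on slow disk reads.
    aio       on;
    # On Linux, reads go through aio only once directio kicks in;
    # the 512k threshold here is an example value.
    directio  512k;
    output_buffers  1 128k;
}
```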

Can nginx survive heavy load with default configurations? In my default configuration, worker_processes is set to 2; however, my server has only a 2GHz single core processor. Should I modify it? My web hosting company says there's nothing to worry about.

Hello Martin, thanks for all the technical information. I am new to nginx and not familiar with optimization. I found your article really interesting and it helped me a lot to understand some basics. Though it's not nginx optimization related, those who want to speed up their website can store their images on Amazon S3 or any other CDN. It helps the website load faster and increases performance.