Introduction

Apache is the de facto web server on Unix systems. Nginx is nowadays a popular and performant web server for serving static files (i.e. static HTML pages, CSS files, JavaScript files, pictures, …). Varnish Cache, on the other hand, is increasingly used to make websites “fly” by caching static content in memory. Recently, I came across a new application server called G-WAN. I’m only interested here in serving static content, even though G-WAN is also able to serve dynamic content using ANSI C scripting. Finally, I also included Cherokee in the benchmark.

Doing a correct benchmark is clearly not an easy task. There are many pitfalls (the TCP/IP stack, OS settings, the client, …) that may corrupt the results, and there is always the risk of comparing apples with oranges (e.g. benchmarking the TCP/IP stack instead of the server itself).

In this benchmark, every server is tested with its default settings. The same applies to the OS. Of course, in a production environment, each setting would be optimized. This has been done in a second benchmark. If you have comments, improvements, or ideas, please feel free to contact me; I’m always open to improving and learning new things.

Yes, it was already planned. In this first benchmark, I considered only the “out of the box” choices. On Ubuntu, installing Apache2 pulls in the MPM worker version by default. Thanks for your comment.
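
For anyone who wants to double-check which MPM their Ubuntu install ships, something like this should do (a sketch; the package glob assumes Ubuntu’s Apache 2 packaging):
apache2ctl -V | grep -i mpm    # prints e.g. "Server MPM: Worker"
dpkg -l 'apache2-mpm-*'        # shows which MPM package is installed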

The upstream server (i.e. Nginx) received only 1 request every 120 seconds (which is the default_ttl of Varnish). Moreover, I checked with varnishlog that the requests hit the cache. So, I can safely assume that Varnish actually served the requests.
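
For reference, this is roughly how such a check can be done (counter and tag names as in Varnish 2.x; adjust for your version):
varnishstat -1 | egrep 'cache_hit|cache_miss'   # hit counter should grow during the run, misses stay flat
varnishlog -c -i TxStatus                        # watch client-side response codes while the benchmark runs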

As I’m about to do a second series of benchmarks where the servers are optimized, could you please give me some advice or pointers to optimize the configuration of Varnish? Thanks a lot!

This looks plain wrong. I have tested Varnish (and nginx) to a great extent, and this does not match any reality I’m familiar with. Given the tiny difference between nginx, Apache, and Varnish in that result, it’s very hard to believe that some elementary mistake hasn’t been made in the testing process.

Given the claims made, I’m inclined to ask for output of varnishstat -1, top and netstat -n.

Keep in mind that I’m testing the default settings of each server locally (without any optimization, such as tuning the thread_pools in the case of Varnish) on a laptop with a relatively slow CPU (Intel Core i3-370M @ 2.4 GHz).

Can you please point out the elementary mistake? I’ll fix it and redo all the tests. Thanks again for your help!

I ended up trying to replicate your test on a virtual server we have set up to benchmark Varnish. I created the same 100.html file on our production web site (Varnish is configured to pass through to our normal site for load-time comparison). This virtual server is configured with a 1G malloc store, and I ran the same benchmark script with a slight change to shorten the run:
#define FROM 800   /* starting concurrency level */
#define TO   1000  /* ending concurrency level */
#define STEP 10    /* concurrency increment between rounds */
#define ITER 10    /* iterations per concurrency level */
From a geographically distant data center: min:102648 avg:145234 max:170035 Time:218 second(s) [00:03:38]
From the same data center: min:175157 avg:187689 max:194601 Time:232 second(s) [00:03:52]

Based on my results, I am thinking something is happening in your Varnish benchmark that is invalidating the results. Maybe your hard drive is the limiting factor.

Actually, I was also a bit disappointed with the poor performance of Varnish. I’ve tried many different setups, and I still get the same behavior. However, I cannot exclude that something is biased in my settings.

Essentially: after testing this myself, I was able to generate pretty much any result I wanted. This test needs a lot of work if it’s presented as a comparison. The only way I got results similar to yours was when the test tool itself was failing due to the different performance patterns.

Thanks for sharing :-) I’m always happy to learn new things. So, as you are an expert in benchmarking, could you please come up with a “benchmark suite” (i.e. reproducible test steps) or a recipe to compare those servers in a fair manner (again, I’m only interested in serving static content)? It would be interesting for many people to have a proper and “unified” way to benchmark web servers…

Thanks for posting the varnish stats. From those it’s quite clear that you are running out of threads. Also, since this is a pure memory workload you should use the malloc allocator rather than the file one.
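
Concretely, something along these lines is what is meant here (a sketch; the listen/backend addresses, cache size, and thread counts are only illustrative, not recommendations):
varnishd -a :6081 -b localhost:8080 \
    -s malloc,1G \
    -p thread_pools=2 \
    -p thread_pool_min=200 \
    -p thread_pool_max=4000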

As for testing tools, I’d recommend httperf over ab. Taking a look at http://www.web-polygraph.org/ might also be useful. It also looks like you’re sending a massive number of requests per connection (about 40); if you want something a bit more realistic, you should limit the number of resources fetched over a single connection to about five (which is what we’re seeing on live traffic).
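
For example, a run that caps each connection at five requests could look like this (host, port, and counts are placeholders):
httperf --server 127.0.0.1 --port 80 --uri /100.html \
        --num-conns 20000 --num-calls 5 --rate 1000 --timeout 5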

As explained in the setup, I didn’t tune the settings of the different servers. And clearly, Varnish can do much better than that! This is left for a second benchmark.
So, regarding the tuning of Varnish, besides the thread pools and the allocator, what else can I optimize?
I will also consider using httperf, as you mentioned.
Thanks a lot for your helpful comment!

It should be a good start, at least. While you can certainly tune it further, that requires a bit of experience and isn’t merely a matter of adding a go-faster option; it depends on how your test is set up. For the next test, it’d be useful if you ran the settings and the numbers you’re getting past the developers of the various projects, as that will help pick up any errors or weird configuration settings that impact performance.

One thing you should probably do is make /var/lib/varnish a tmpfs, since your disk is quite slow and varnish logs a fair bit to the shared memory log.
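
Something like this, for instance (the size is a guess; pick one comfortably larger than what the shared memory log needs):
mount -t tmpfs -o size=128m tmpfs /var/lib/varnish
or, to make it persistent, an /etc/fstab line such as:
tmpfs /var/lib/varnish tmpfs size=128m 0 0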

Hi, nice benchmark. I have some suggestions:
1. How about showing the response time in a line chart and comparing it too? That way we could compare the response time at every concurrency level.
2. How about measuring CPU usage (kernel CPU time vs. the server’s own CPU time) at every concurrency level? This would be useful for anyone who uses a VPS to deploy their web server (see the sketch after this list).
3. How about measuring memory usage at every concurrency level? The motivation is the same as for #2.
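
For instance, assuming sysstat is installed and the measured process is varnishd, a sketch covering #2 and #3 could be:
pidstat -u -r -C varnishd 1    # per-process CPU and memory, sampled every second
sar -u 1                       # system-wide user vs. kernel CPU time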

I’m doing some tests with Funkload, which provides all the interesting metrics you mentioned (response time, CPU/memory/network usage, …). However, Funkload saturates way before the tested server does. So, I need to put in place a distributed setup with several clients.

Nginx (a Web server) and Varnish (a Proxy server) serve different needs, so there is no surprise that their design is different – and this leads to different choices (Varnish is heavily relying on virtual memory because – as a “Web server accelerator” – it has the goal of storing a HUGE cache).

As I understand it, the point of this benchmark was to focus on the respective ability of those different server technologies to serve a small static file.

As small static files (< 100 KB) account for 90% of all the traffic served by today's Web infrastructure, I am thankful that someone had the idea of checking what solution works best for this specific need.

It would be interesting to test the _tuned_ settings.
In fact, no one is going to run them at 35,000 requests per second with the default settings, you know.

People who are just starting their own web server won’t need more than 20 req/s at best.

What we’re interested in is more the complete picture when servers are optimized. It’s a difficult task, of course; in fact, it’s probably more of a process that you fine-tune as you go. But it would be a LOT more interesting.

How about Lighttpd? I’m wondering how well it would perform against the others. It’s my personal go-to choice anyway, but I expect it to be slightly slower than Nginx. Actually, simple feedback on people’s experience would be fine as well.

While it is true we don’t want to end up “testing the TCP stack”, one cannot benchmark a bevy of web servers from the same machine and expect meaningful results.

First and foremost, your benchmarking scripts/tools are competing for resources with said webserver processes. This skews the results and causes weird “bumps” in the graphs.

Secondly, in practical applications, users will ALWAYS come over the network, so it is absolutely necessary to test from one or several satellite servers that hit the server being benchmarked over the network.
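
For instance (the hostname and counts are placeholders), each satellite could run something like:
ab -k -n 100000 -c 100 http://server-under-test/100.html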

I would like to see this benchmark redone from satellite servers, with a note made of the differences from local testing.