Benchmarking Load Balancers in the Cloud

Load balancing is one of the technologies that virtually all our customers use within EC2, and there is an increasing set of options available for doing it. We've been giving advice to our customers for years on what we've seen work, but we finally decided to spend some time on a real A-B benchmark comparison of a number of solutions. The result is a white paper we just published, Load Balancing in the Cloud: Tools, Tips, and Techniques.

We focused purely on request rate scalability - that is, how many requests per second the load balancer can sustain. We didn't focus on feature set, bandwidth, or other metrics. So we ended up requesting a tiny web page over and over, as fast as the system under test would serve it, and we measured requests/second, or rather, responses/second. We also didn't take advantage of more advanced features, such as caching in the load balancers, so we ran aiCache in a pure LB mode, for example.

Cutting to the chase, we ran the HAProxy, Zeus, and aiCache tests on an m1.large EC2 instance. After chasing down all kinds of options, trying to tune the kernel, trying other instance types, and finally conferring with AWS, the result is 100,000 packets per second in+out! I know, that's not requests/sec or responses/sec, so let me explain. Basically, with the current virtualization technology implemented in EC2, the network "speed of light" on an instance is about 100,000 packets per second through the two networking stacks of the host OS and the guest OS. Your load balancing solution and the tests you run may spend these 100K pps on requests and responses in various ways, which gives you slightly different performance as measured in resp/sec.

On average you can get about 5000 requests/sec through a load balancer. If you use HTTP/1.1 persistent connections you get a few more resp/sec because there are a couple fewer packets per request, but the difference is not all that dramatic. If you turn some form of caching on you can roughly double the resp/sec because you're eliminating the packets to the back-end servers. Tuning kernel params has very little effect, but pinning the load balancer process to a specific core does help quite a bit and makes the performance a lot more even. But in the end it's all about pps (packets per second).
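The packet budget arithmetic above can be sketched in a few lines of Python. This is only an illustration, not something taken from the white paper: the function name is made up, and the packets-per-request figures are not measurements. The value 20 is simply what 100,000 pps divided by about 5000 requests/sec implies, and 10 reflects the "roughly double" observation with caching turned on.

```python
def requests_per_second(pps_budget: float, packets_per_request: float) -> float:
    """Upper bound on the request rate when packets/sec is the bottleneck."""
    if packets_per_request <= 0:
        raise ValueError("packets_per_request must be positive")
    return pps_budget / packets_per_request


# Figure reported in the post: about 100,000 packets/sec in+out per instance.
PPS_BUDGET = 100_000

# Assumed packet counts per proxied request (client side plus back-end side,
# both directions). These are inferred from the post's numbers, not measured.
ASSUMED_PACKETS_PER_REQUEST = {
    "plain load balancing": 20,       # implied by 100,000 pps / ~5000 req/s
    "caching in the balancer": 10,    # implied by "roughly double the resp/sec"
}


def summarize(pps_budget: float = PPS_BUDGET) -> dict:
    """Map each assumed scenario to its request-rate ceiling."""
    return {
        name: requests_per_second(pps_budget, packets)
        for name, packets in ASSUMED_PACKETS_PER_REQUEST.items()
    }
```

Calling `requests_per_second(100_000, 20)` returns `5000.0`, which matches the average quoted above. Plugging in your own packet count per request (from a packet capture, say) gives a quick ceiling for a given setup.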

We then turned to ELB, Amazon's Elastic Load Balancing service. It operates differently from the other solutions in that it is a scalable service provided by Amazon. Everything ELB does for you can be replicated using the above solutions running on EC2 instances, but of course that requires extra work. Unfortunately, benchmarking ELB is really tricky. One has to use many clients applying load, ensure that they requery DNS frequently, and run tests for a long time so ELB scales up. In the end we produced some pretty graphs like this, showing the requests/sec handled over time:

This shows how our tests ramped up to around 20K requests/sec over the course of about three hours. (Note that we ramped up the load slowly to see the progress, so this is not all time taken by ELB to ramp up.) We could have continued higher but we lost interest :-). I would prefer it if ELB were more transparent and easier to test, but it certainly delivers real-world performance!
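The "requery DNS frequently" requirement for load clients could look roughly like the sketch below. It is a minimal illustration, not the tooling we used: the class name, the `max_age` parameter, and the default of 5 seconds are all assumptions, and it only refreshes what the system resolver reports, so any caching in the OS or an upstream resolver still applies.

```python
import itertools
import socket
import time
from typing import Callable, List


def resolve_ipv4(host: str, port: int = 80) -> List[str]:
    """Return the distinct IPv4 addresses the system resolver reports for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


class RequeryingTargets:
    """Hand out target IPs round-robin, re-resolving the name every max_age seconds.

    Hypothetical helper for a load-generating client; resolver and clock are
    injectable so the refresh logic can be exercised without network access.
    """

    def __init__(
        self,
        host: str,
        max_age: float = 5.0,
        resolver: Callable[[str], List[str]] = resolve_ipv4,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.host = host
        self.max_age = max_age
        self._resolver = resolver
        self._clock = clock
        self._refresh()

    def _refresh(self) -> None:
        addresses = self._resolver(self.host)
        if not addresses:
            raise RuntimeError(f"no addresses returned for {self.host}")
        self._addresses = addresses
        self._cycle = itertools.cycle(addresses)
        self._resolved_at = self._clock()

    def next_target(self) -> str:
        """Return the next IP to send a request to, refreshing DNS if stale."""
        if self._clock() - self._resolved_at >= self.max_age:
            self._refresh()
        return next(self._cycle)
```

A load client would construct one `RequeryingTargets` with the load balancer's DNS name and call `next_target()` before opening each new connection, so that addresses added as the service scales up start receiving traffic.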

The whole benchmarking project was interesting in that it once again showed that until you really understand what is going on, your benchmark is not done. We chased down more supposed performance bottlenecks than we care to remember, and we drove the helpful folks at aiCache batty because they expected to see better performance given their results on non-virtualized machines. But in the end the results make a lot of sense and 100K pps is easy to remember.

Hey, I'm very glad to see that people now care about *real* issues in virtualized environments! I spend my time explaining to some people that those are fine until you need to achieve high PPS rates. Another issue that you didn't mention / experience here is that when you're hitting the 100k PPS limit, the physical machine is on its knees, and other VMs running on the same host will be similarly impacted. So if you want to run a VM with a load balancer and another one with your web server, the limit will probably be around 50-75k PPS (assuming you're alone on that host, which is obviously not the case).
For these reasons, the only way to achieve high network data rates is to group together in the same instances the services which need to exchange a lot of data, and to use a large number of instances in the hope of keeping some bandwidth on some physical hosts, in the event that someone else's instance sharing the same physical host as yours makes heavy use of the network.

Compute cycles have become a commodity. Need more, just grab another box. I used to work day'n'night optimizing hardware utilization. No more. Now I'd rather optimize programmer time or operational efficiency. This shift is something that is hard to internalize without experiencing it for a while. But I do appreciate your comment and certainly appreciate your software - thanks!

Thorsten,
I have just downloaded the whole PDF to quickly review it. It would have been nice to see what you could get out of haproxy 1.4, which has keep-alive enabled. You're typically in an environment where this difference matters. Also, 1.4 has a few options to save some TCP packets (tcp-smart-accept and tcp-smart-connect), which can reduce a session from 10 to 7 packets. This could potentially mean 2 packets per request with keep-alive on the client, and 7 packets per response without keep-alive on the server, or about 9 packets. We could dream of something like 9000 requests per second if you're limited to 100kpps. Do not hesitate to drop me a mail when you plan other tests, I'm very interested in helping with the tuning!

[...] (you can get it from here, however registration is required). I found out about it from this post on the Rightscale blog which is a good introduction to the paper. Here are some things I found [...]

[...] how many requests per second I could expect a single load balancer to support. I found this article Benchmarking Load Balancers in the Cloud which says that on average you can get about 5000 requests/sec through a load [...]

[...] 12. Amazon ELB can easily support more than 20K+ Concurrent reqs/sec - Amazon ELB is designed to handle unlimited concurrent requests per second. ELB is inherently scalable and it can elastically increase /decrease its capacity depending upon the traffic. According to a benchmark done by RightScale, Amazon ELB was easily able to scale out and handle 20K or more concurrent requests /sec. Refer URL: http://blog.rightscale.com/2010/04/01/benchmarking-load-balancers-in-the-cloud/ [...]