Benchmarking impact of batching in Hystrix

In the previous article, "Batching (collapsing) requests in Hystrix", we looked at the collapsing API in Hystrix. Check it out before proceeding with this article. The example presented there was rather artificial, merely demonstrating the API. Today let's look at a semi-real-life example and do some benchmarking. We already used the random.org API some time ago as an example (see: Your first message - discovering Akka), so let's use it again. Imagine our application calls the following API facade in order to fetch exactly one random number per request (generateIntegers(1)):
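A minimal sketch of such a facade, with the actual call wrapped in a HystrixCommand; only generateIntegers() and the 100-thread pool come from the article, while the HTTP plumbing and the remaining names are illustrative assumptions:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixThreadPoolProperties;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RandomOrgFacade {

    public RandomIntegers generateIntegers(int howMany) {
        // Every call to random.org is isolated in a Hystrix command
        return new GenerateIntegersCommand(howMany).execute();
    }

    public static class GenerateIntegersCommand extends HystrixCommand<RandomIntegers> {

        private final int howMany;

        public GenerateIntegersCommand(int howMany) {
            super(Setter
                    .withGroupKey(HystrixCommandGroupKey.Factory.asKey("random.org"))
                    .andThreadPoolPropertiesDefaults(
                            // enormously big pool, see discussion below
                            HystrixThreadPoolProperties.Setter().withCoreSize(100)));
            this.howMany = howMany;
        }

        @Override
        protected RandomIntegers run() throws Exception {
            // random.org's plain-text integer generator
            final URL url = new URL("https://www.random.org/integers/" +
                    "?num=" + howMany +
                    "&min=0&max=1000&col=1&base=10&format=plain&rnd=new");
            final List<Integer> numbers = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    numbers.add(Integer.parseInt(line.trim()));
                }
            }
            return new RandomIntegers(numbers);
        }
    }
}
```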

As you can see, this method can easily fetch more than one number. You might wonder why it returns some fancy RandomIntegers class rather than, say, List<Integer>? Well, a list of integers is just a data structure; it doesn't represent any business concept, while *random integers* leaves no room for speculation. Still, unsurprisingly, the implementation is just a wrapper/decorator over a list:
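A sketch of that wrapper (the getNumbers() accessor is an assumption, used throughout the snippets below):

```java
import java.util.Collections;
import java.util.List;

public class RandomIntegers {

    private final List<Integer> numbers;

    public RandomIntegers(List<Integer> numbers) {
        this.numbers = numbers;
    }

    public List<Integer> getNumbers() {
        return Collections.unmodifiableList(numbers);
    }
}
```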

We use an enormously big thread pool (100 threads); Netflix claims 10-20 threads are enough in their use cases. We'll see later how to fix this. Now imagine we have a simple endpoint that needs to call this API on each request:
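For example, a minimal Spring MVC sketch (the framework choice and endpoint name are assumptions; any servlet would do):

```java
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RandomController {

    private final RandomOrgFacade randomOrg = new RandomOrgFacade();

    // Each incoming HTTP request translates into exactly one call
    // to random.org, asking for exactly one number
    @RequestMapping("/random")
    public int oneRandomNumber() {
        return randomOrg.generateIntegers(1).getNumbers().get(0);
    }
}
```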

Now imagine random.org has an average latency of 500 ms (a totally made up number; see here for real data). When load testing our simple application with 100 concurrent clients, we can expect about 200 transactions per second with half a second average response time (Little's law: 100 clients / 0.5 s per request ≈ 200 requests per second).

Collapsing (batching) is about collecting similar small requests into one bigger request. This reduces downstream dependencies' load and network traffic. However, this feature comes at the price of increased transaction time: before calling random.org directly, we must now wait a bit, just in case some other client wants to call random.org at the same time. The first step to batching requests is to implement HystrixCollapser:
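Something along these lines (a sketch based on the description that follows; mapResponseToRequests() is shown here as a straightforward mutable loop, the reduce variant comes later):

```java
import com.netflix.hystrix.HystrixCollapser;
import com.netflix.hystrix.HystrixCollapserKey;
import com.netflix.hystrix.HystrixCollapserProperties;
import com.netflix.hystrix.HystrixCommand;

import java.util.Collection;

public class GenerateIntegersCollapser
        extends HystrixCollapser<RandomIntegers, RandomIntegers, Integer> {

    private final int count;

    public GenerateIntegersCollapser(int count) {
        super(Setter
                .withCollapserKey(HystrixCollapserKey.Factory.asKey("GenerateIntegers"))
                .andScope(Scope.GLOBAL)
                .andCollapserPropertiesDefaults(
                        HystrixCollapserProperties.Setter()
                                // collapsing window: wait up to 100 ms for others
                                .withTimerDelayInMilliseconds(100)));
        this.count = count;
    }

    @Override
    public Integer getRequestArgument() {
        return count;
    }

    @Override
    protected HystrixCommand<RandomIntegers> createCommand(
            Collection<CollapsedRequest<RandomIntegers, Integer>> requests) {
        // Total number of integers needed by all requests in this batch
        final int total = requests.stream()
                .mapToInt(CollapsedRequest::getArgument)
                .sum();
        // One big request instead of many small ones
        return new RandomOrgFacade.GenerateIntegersCommand(total);
    }

    @Override
    protected void mapResponseToRequests(RandomIntegers batchResponse,
            Collection<CollapsedRequest<RandomIntegers, Integer>> requests) {
        // Chop the batch response into consecutive slices, one per request
        int offset = 0;
        for (CollapsedRequest<RandomIntegers, Integer> request : requests) {
            final int size = request.getArgument();
            request.setResponse(new RandomIntegers(
                    batchResponse.getNumbers().subList(offset, offset + size)));
            offset += size;
        }
    }
}
```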

GenerateIntegersCollapser is used twice by Hystrix. First, when a bunch of requests come in at roughly the same time: after the configured window (100 ms in our example) all requests are taken together and Hystrix asks us to create one batch command (see createCommand()). All we do is calculate how many random integers we need in total and ask for all of them in one go. The second time GenerateIntegersCollapser is used is when the batch response arrives and we need to split it back into the individual, small requests. That's the responsibility of mapResponseToRequests(). See how we chop batchResponse into smaller pieces? At first I used a mind-bending implementation with reduce, just to avoid mutability:
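A sketch of what such a reduce-based variant might look like, threading the offset through the accumulator instead of mutating a local variable:

```java
@Override
protected void mapResponseToRequests(RandomIntegers batchResponse,
        Collection<CollapsedRequest<RandomIntegers, Integer>> requests) {
    requests.stream().reduce(
            0,  // offset of the first slice within batchResponse
            (offset, request) -> {
                final int size = request.getArgument();
                request.setResponse(new RandomIntegers(
                        batchResponse.getNumbers().subList(offset, offset + size)));
                return offset + size;
            },
            (left, right) -> left + right);  // combiner, never used sequentially
}
```

Side effects inside reduce are generally frowned upon, which is why the plain loop shown earlier is arguably the saner choice.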

One last remark concerns Scope.GLOBAL. Without it, collapsing would happen only within the scope of a single HTTP request (one HystrixRequestContext) rather than across the whole application. We definitely want global collapsing. Having the collapser in place, we can use it instead of the normal command:
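Hypothetical client code; note that collapsing relies on an initialized HystrixRequestContext, normally set up once per request in a servlet filter, initialized inline here only to keep the sketch self-contained:

```java
import com.netflix.hystrix.strategy.concurrency.HystrixRequestContext;

public class Client {

    public int fetchOneRandomNumber() {
        // Normally done once per request in a servlet filter
        final HystrixRequestContext context = HystrixRequestContext.initializeContext();
        try {
            // GenerateIntegersCollapser instead of GenerateIntegersCommand
            return new GenerateIntegersCollapser(1).execute()
                    .getNumbers().get(0);
        } finally {
            context.shutdown();
        }
    }
}
```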

So not much changes from the client code's perspective. Now let's benchmark a bit:
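The measurements below come from a proper load test; purely to illustrate the setup (100 concurrent clients hammering the service while completed transactions are counted), a crude harness might look like this:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class CrudeBenchmark {

    public static void main(String[] args) throws InterruptedException {
        final int clients = 100;
        final long durationMillis = 60_000;
        final long deadline = System.currentTimeMillis() + durationMillis;
        final AtomicLong completed = new AtomicLong();
        final ExecutorService pool = Executors.newFixedThreadPool(clients);

        for (int i = 0; i < clients; i++) {
            pool.submit(() -> {
                final Client client = new Client();
                while (System.currentTimeMillis() < deadline) {
                    client.fetchOneRandomNumber();  // one transaction
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(2 * durationMillis, TimeUnit.MILLISECONDS);
        System.out.printf("TPS: %.1f%n", 1000.0 * completed.get() / durationMillis);
    }
}
```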

Ouch, this is quite disappointing. Average response time grew from 500 to 600 milliseconds. At the same time, throughput measured in transactions per second (TPS, generated by 100 concurrent threads) dropped to about 160-170/s, with significant variance. These numbers shouldn't really come as a surprise: enabling batching requires almost every request to wait a little bit, just in case another request comes by in the near future. The less stable measurements are also understandable: the latency of each individual request depends on whether the collapsing window has just opened or is about to close.

OK, so what's the big deal, why do we bother with collapsing? Remember the previous example: with 200 TPS we made 200 network calls and requests to the external service every second, and 100 concurrent clients meant 100 concurrent requests to the backing service. The biggest win of batching/collapsing is the reduction of downstream load generated by our code. It means we put less load on our dependencies and flatten peaks. Just compare the number of queries we made to the random.org API with and without batching:

200 requests per second (each asking for one random number) collapsed into 5-7 requests per second, though much larger in size, on average asking for about 30 random numbers rather than... one. It means that within the 100 millisecond window we captured about 30 requests and collapsed them together. And this, my dear friends, is a big improvement. We sacrificed our performance slightly to reduce the generated traffic by an order of magnitude. Of course a 100 millisecond batch window is quite extreme, but even 10 milliseconds (barely noticeable under normal circumstances) significantly reduces the generated load, to about 20 queries per second (rather than 200). This experiment shows that even a 10 millisecond window captures (on average) 9 individual requests and collapses them, reducing downstream load. So collapsing is very powerful; just remember what type of optimization you really want to achieve.
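For reference, the window size is the collapser's timer delay, so shrinking it is a one-line change in the constructor sketched earlier:

```java
public GenerateIntegersCollapser(int count) {
    super(Setter
            .withCollapserKey(HystrixCollapserKey.Factory.asKey("GenerateIntegers"))
            .andScope(Scope.GLOBAL)
            .andCollapserPropertiesDefaults(
                    HystrixCollapserProperties.Setter()
                            // 10 ms window instead of 100 ms: barely noticeable
                            // extra latency, still an order of magnitude fewer
                            // downstream calls at this load
                            .withTimerDelayInMilliseconds(10)));
    this.count = count;
}
```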