Basic Micro-Optimizations Part 2

28 Nov 2014

Following yesterday's post on basic micro-optimizations, I wanted to look at a concrete example I recently came across. A couple of weeks ago, I was profiling code for a typical HTTP API service written in Go and noticed that a significant amount of memory was being allocated for http.Header instances. Even if you aren't familiar with Go, the following code should be obvious:
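(What follows is a representative sketch rather than the exact original; the function name and the specific headers are placeholders for whatever the real handlers set.)

```go
package headers

import "net/http"

// build allocates a fresh header map and sets a few typical
// response values, as an API handler would on every request
func build() http.Header {
	h := make(http.Header)
	h.Set("Content-Type", "application/json")
	h.Set("Content-Length", "9001")
	h.Set("Cache-Control", "private, max-age=0")
	return h
}
```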

Calling the above 1 million times takes 432ms and allocates 48MB of memory. On my machine, it triggered 753 garbage collections, which took 175ms - a significant percentage of the total time.

What's the above code doing? Go's http.Header type is a map[string][]string. In other words, it's a dictionary where each key is a string and the value is an array of strings. Why an array of strings? Because HTTP allows a key to have multiple values. Knowing this, think about what the above code is doing. First, it allocates a map. From an API point of view, a map makes sense, but given that 99% of all cases involve very few keys (<20), it's far from the most efficient solution. Next, for each key, it allocates a dynamic array. I don't know what the starting size of that array is, but if it's anything other than 1, in most cases, that's a lot of wasted space.
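To see the array-per-key cost concretely, note what Add does: it appends to the slice stored under the key.

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	h := make(http.Header)
	h.Add("Vary", "Accept")
	h.Add("Vary", "Accept-Encoding")
	// the map holds a slice per key, even when, as is typical,
	// a key only ever gets a single value
	fmt.Println(h["Vary"]) // [Accept Accept-Encoding]
}
```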

There's little we can do about the fact that the built-in http.ResponseWriter requires a map, but can we reduce the memory footprint caused by the arrays and, if so, what impact would it have?

When I'm testing performance and memory alternatives, I always try to keep things focused: isolate the components and data, and run the simplest possible test to see whether the idea makes any sense. This is what I came up with:
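The code below is a sketch of the idea rather than the exact code (the type, names and sizes are mine): keep one pre-allocated backing array of strings and hand out length-1 sub-slices of it, so that setting a header doesn't allocate a fresh []string per key.

```go
package headers

import "net/http"

// header wraps http.Header with a pre-allocated backing array;
// Set hands out length-1 sub-slices of it instead of letting the
// map allocate a new []string for every key
type header struct {
	http.Header
	values []string
	pos    int
}

func newHeader(size int) *header {
	return &header{
		Header: make(http.Header, size),
		values: make([]string, size),
	}
}

func (h *header) Set(key, value string) {
	if h.pos == len(h.values) {
		// backing array exhausted; fall back to the allocating path
		h.Header.Set(key, value)
		return
	}
	h.values[h.pos] = value
	// the three-index slice caps the sub-slice at length 1 so a
	// later append can't overwrite a neighboring value; note that
	// assigning into the map directly skips Go's key
	// canonicalization, so keys must already be canonical
	// (e.g. "Content-Type")
	h.Header[key] = h.values[h.pos : h.pos+1 : h.pos+1]
	h.pos++
}
```

As long as a response sets size or fewer headers, every Set hits the pre-allocated path and the per-key allocations disappear.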

The above code isn't production ready. Specifically, we [probably] need to make it concurrent-friendly, but that's a solvable problem (using a pool of []string, for example, as sketched below).
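A minimal sketch of that direction, assuming sync.Pool (available since Go 1.3); the size and names are illustrative:

```go
package headers

import "sync"

// a shared pool of backing arrays; each request checks one out
// and returns it once the response has been written
var valuePool = sync.Pool{
	New: func() interface{} {
		return make([]string, 20)
	},
}

func getValues() []string  { return valuePool.Get().([]string) }
func putValues(v []string) { valuePool.Put(v) }
```

Each request would wrap a checked-out array in the header type above and return it to the pool after the response is flushed.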

It's a significant difference and, I believe, it's the type of attention to detail that lets a few machines handle tens of thousands of requests per second instead of a few hundred. This kind of change makes the code more complicated and more fragile, and it won't make sense in most cases, but it's always good to be familiar with what's possible.