Monday, August 18, 2008

I wanted to find out the fastest way to return multiple values in the Microsoft CLR. In order to measure this, I needed to have two pieces of code that `do the same thing' except for the way values are returned. For a baseline, I used the Takeuchi function:

But the version using an out variable has an additional argument. I wanted to separate the cost of passing an additional argument from the total cost of using an out parameter, so I wrote another version of Tak that took an extra dummy argument:

These run pretty quick, so I put them in a loop that runs them ten times and measures the total time. I ran this loop a thousand times and recorded the results.

I didn't expect to find Tak1 to be substantially faster than Tak, so I did some investigating. Look at the graph for Tak:
You can clearly see a change in performance (there are two vertical segments). I had forgotten that my machine was in power save mode and was dynamically tuning the CPU clock speed. Somewhere in there the machine decided to hit the gas. I put the machine in performance mode to max out the clock and got a much more expected result.
In this graph, Tak is the normal Tak function. Tak1a and Tak1b are defined as follows:

We can see from the graph that it takes an additional 8-10 ns to process the additional argument, depending on whether we change the value or keep it the same.
I'll speculate that this is because the processor can notice the re-use of the argument and avoid a register rename or something along those lines.

So what about passing an out variable?
From this graph we can see that returning an answer in an out variable makes the code a factor of 2 slower. Using a ref variable is slightly worse, but that may be because we have to initialize the ref variable before we use it. In either case, that's quite a hit.

It isn't likely that returning an object from the heap will be fast, but we ought to actually mesaure it. This is easily done by simply declaring the return value to be of type object. The CLR will be forced to box the object on the heap.
Surprisingly, this isn't all that much slower than using an out or ref variable. What about returning a struct on the stack?
It seems that this is quicker than using an out variable.

If you look closely at these benchmarks you will see that the baseline performance of Tak varies a bit from run to run. This could be because of a number of reasons. It doesn't matter all that much because I was interested in the difference between the various mechanisms of returning values.