Saturday, 5 May 2012

Performance comparison of code invocation methods in .NET

Introduction

.NET has become more of a functional programming platform since C# 3.0 (.NET 3.5). There are now many ways to invoke code: direct calls, reflection, delegates, compiled expressions, dynamic, and so on. In this post I compare the performance of these methods. Part of the inspiration for this comparison was reviewing the ASP.NET Web API code.

Readability/maintainability vs. Performance

Functional programming is rising in popularity. Why now, when the idea is decades old?

Part of the popularity goes back to the fact that we now run much faster machines with more cores. We need to solve more complex problems, so there is a need to write more readable code and to express it more succinctly. On top of all this, we need to invoke code in parallel to get the most out of our machines' horsepower.

Compared to 20 years ago, the code we write is quite inefficient and uses more processor cycles. C was slower than Assembly, and C# is slower than C, but it certainly does not make sense to write in Assembly. Language is only part of the story, though: we get choices within a single language. In .NET, we sometimes call the Count() extension method on an IEnumerable<T> instead of using the Count property of IList<T>. In the case of Count(), .NET first tries to cast the IEnumerable<T> to ICollection<T> and only loops through the sequence to get the count if that fails; in other cases the framework cannot improve the performance for us.
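The difference between the two ways of counting can be seen in a small sketch. The CountDemo class and its LazyNumbers helper are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Hypothetical helper: a lazy sequence that does not implement ICollection<T>,
    // so Enumerable.Count() has no shortcut and must enumerate every element.
    public static IEnumerable<int> LazyNumbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            yield return i;
        }
    }

    public static void Main()
    {
        IEnumerable<int> list = new List<int> { 1, 2, 3 };

        // List<T> implements ICollection<T>, so Count() can read its Count property.
        Console.WriteLine(list.Count());           // 3

        // The iterator is not a collection, so Count() walks all five elements.
        Console.WriteLine(LazyNumbers(5).Count()); // 5
    }
}
```

Both calls look identical at the call site, which is exactly why the cost difference is easy to miss.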

Our code is commonly a compromise between readability and performance. I for one would pick the former over the latter, but that needs to be an informed decision. If a piece of code is 1000 times slower (and it is called many times), I would definitely sacrifice readability, but if it is only twice as slow, I would pick readability.

Context is also important. Poor performance on the server can be costly, but on the client it usually goes unnoticed. Real-time image processing code has to squeeze out every drop of performance it can (having done real-time image processing in C++, I am pretty familiar with it), while an asynchronous batch process can take a more liberal approach. In any case, micro-optimisation is a common pitfall that I try not to fall into.

Code Invocation

In .NET we can invoke code in different ways:

Static method call

Instance method call

Instance method call on a virtual method, where the CLR has to resolve at run time which override to run

Invocation using reflection

Invocation using a previously bound reflected object

Invoking a delegate

Invoking a Func or a compiled expression (which itself is a delegate)

DynamicInvoke on a delegate

Compiling a lambda expression and executing it

Invocation on a dynamic object

For our test, I am using a simple class which calculates the tangent of an angle. I have done all I could to make sure the conditions for all these scenarios are similar, so that the difference in performance is only related to the call method.

The first three call types do not need much explanation. There is not much difference between a static method and an instance method: internally in the CLR, an instance method takes an additional parameter through which the instance itself (this) is passed. Jeffrey Richter explains the difference between IL's call and callvirt; callvirt is slower than call but, as we will see, the difference is minimal.
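As a rough illustration of the three direct call types, here is a minimal sketch. The class and method names are assumptions made for this example and are not taken from the original test code:

```csharp
using System;

// Hypothetical test class; names are illustrative only.
public class TangentCalculator
{
    public static double CalculateStatic(double angle)
    {
        return Math.Tan(angle);
    }

    public double Calculate(double angle)
    {
        return Math.Tan(angle);
    }

    public virtual double CalculateVirtual(double angle)
    {
        return Math.Tan(angle);
    }
}

public class DerivedTangentCalculator : TangentCalculator
{
    public override double CalculateVirtual(double angle)
    {
        return Math.Tan(angle);
    }
}

public static class DirectCallDemo
{
    public static void Main()
    {
        double angle = Math.PI / 4;
        TangentCalculator calculator = new DerivedTangentCalculator();

        double a = TangentCalculator.CalculateStatic(angle); // static call, no instance involved
        double b = calculator.Calculate(angle);              // instance call, 'this' passed implicitly
        double c = calculator.CalculateVirtual(angle);       // resolved at run time to the override

        Console.WriteLine(a == b && b == c); // True
    }
}
```

All three methods do the same work, so any timing difference between them comes from the call mechanism alone.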

With reflection, we have two steps. The first is binding, where we get the MethodInfo from the type object. The second is the actual execution, where we use Invoke to execute the method. The second reflection method is passed an already bound MethodInfo, while the first one binds every time. In our example, this incurs the overhead of boxing, since both our input and output are double while the parameters passed in and out of Invoke are object. However, I have decided to keep it that way since this could also happen in a real scenario.
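The two reflection variants might look roughly like the sketch below. It assumes a hypothetical TangentCalculator class with a public Calculate method; the original test code may have been structured differently:

```csharp
using System;
using System.Reflection;

// Hypothetical test class; names are illustrative only.
public class TangentCalculator
{
    public double Calculate(double angle)
    {
        return Math.Tan(angle);
    }
}

public static class ReflectionDemo
{
    // Binds on every call: GetMethod looks the method up by name each time.
    public static double InvokeWithBinding(TangentCalculator target, double angle)
    {
        MethodInfo method = typeof(TangentCalculator).GetMethod("Calculate");
        return (double)method.Invoke(target, new object[] { angle });
    }

    // Reuses a MethodInfo that the caller bound once.
    public static double InvokePreBound(MethodInfo method, TangentCalculator target, double angle)
    {
        // The double argument is boxed into the object[] and the result is unboxed.
        return (double)method.Invoke(target, new object[] { angle });
    }

    public static void Main()
    {
        TangentCalculator calculator = new TangentCalculator();
        MethodInfo bound = typeof(TangentCalculator).GetMethod("Calculate");

        double a = InvokeWithBinding(calculator, 0.5);
        double b = InvokePreBound(bound, calculator, 0.5);

        Console.WriteLine(a == b); // True
    }
}
```

In a loop, only InvokeWithBinding pays the lookup cost on each iteration; both variants pay for boxing.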

CalculateIt is a delegate defined in the test code and called in the first delegate method. The difference between the first delegate call and the third is that the first uses a strongly typed delegate, while the third uses a weakly typed delegate that can only be called using DynamicInvoke.
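A minimal sketch of the three delegate-based calls follows. Only the delegate name CalculateIt comes from the post; its signature and the DelegateDemo class are assumptions for illustration:

```csharp
using System;

// Assumed signature: takes an angle and returns its tangent.
public delegate double CalculateIt(double angle);

public static class DelegateDemo
{
    public static double Tangent(double angle)
    {
        return Math.Tan(angle);
    }

    // Strongly typed delegate: invoked like a normal method call.
    public static double ViaDelegate(double angle)
    {
        CalculateIt typed = Tangent;
        return typed(angle);
    }

    // Func<double, double> is also a strongly typed delegate.
    public static double ViaFunc(double angle)
    {
        Func<double, double> func = Tangent;
        return func(angle);
    }

    // Weakly typed: System.Delegate can only be called through DynamicInvoke,
    // which takes object arguments and returns object.
    public static double ViaDynamicInvoke(double angle)
    {
        Delegate untyped = new CalculateIt(Tangent);
        return (double)untyped.DynamicInvoke(angle);
    }

    public static void Main()
    {
        double a = ViaDelegate(0.5);
        double b = ViaFunc(0.5);
        double c = ViaDynamicInvoke(0.5);

        Console.WriteLine(a == b && b == c); // True
    }
}
```

In a benchmark the delegates would be created once outside the loop, so that only the invocation itself is measured.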

NOTE: The metrics for running a compiled expression are the same as those for Func, so they are not calculated separately. My point here is to show that compiling an expression every time is expensive. I have seen cases where people do it, in case you have not; in particular, I saw an example where it was used for null guard expressions.
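To make the distinction concrete, the sketch below contrasts compiling on every call with compiling once and reusing the resulting Func. The ExpressionDemo class and its member names are hypothetical:

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // The expression tree itself is cheap to hold on to; Compile() is the costly step.
    private static readonly Expression<Func<double, double>> TangentExpression =
        angle => Math.Tan(angle);

    // Compiled once; after this it is an ordinary Func<double, double>.
    private static readonly Func<double, double> CompiledTangent =
        TangentExpression.Compile();

    // Pays the compilation cost on every single call.
    public static double CompileEveryTime(double angle)
    {
        return TangentExpression.Compile()(angle);
    }

    // Pays only the cost of a delegate invocation.
    public static double CompileOnce(double angle)
    {
        return CompiledTangent(angle);
    }

    public static void Main()
    {
        Console.WriteLine(CompileEveryTime(0.5) == CompileOnce(0.5)); // True
    }
}
```

Both methods return the same value; they differ only in how often the expression tree is turned into executable code.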

Test execution

Each method was called 10,000,000 times and the results were compared. The code I used is very verbose, but I was trying to eliminate anything that could skew the results.

There are other factors, such as garbage collection and other processes running on the machine, but I have run the code quite a few times and the results were more or less consistent, with only 1-2% variation.
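A timing loop along these lines can be sketched with Stopwatch. This is a simplified stand-in and not the original harness: the Benchmark class is hypothetical, and wrapping every call in a Func adds a constant delegate cost to each measurement, which the deliberately verbose original would avoid:

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Times 'iterations' calls of 'call' and returns the elapsed time.
    public static TimeSpan Measure(Func<double, double> call, int iterations)
    {
        // Warm-up call so that JIT compilation is not included in the timing.
        call(0.5);

        // Start from a clean heap to reduce garbage collection noise.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        double sink = 0;
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += call(0.5);
        }
        stopwatch.Stop();

        // Use the accumulated result so the loop body cannot be discarded.
        GC.KeepAlive(sink);
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        TimeSpan elapsed = Measure(Math.Tan, 10000000);
        Console.WriteLine("Direct call: {0} ms", elapsed.TotalMilliseconds);
    }
}
```

Running each measurement several times and comparing the spread gives a sense of how much of the difference is noise.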

Results review

As can be seen, compiling an expression is 10 times slower than most other call types, including Func<T> delegates (the result of expression compilation). This is particularly important since some of the new frameworks (e.g. ASP.NET Web API) use expression compilation heavily.

Also of importance is that binding is a significant overhead in reflection. By binding once and invoking many times we can improve the performance.

Another important insight is that dynamic, contrary to its reputation, is not that much slower than strongly typed direct calls (less than twice as slow).
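For reference, a call on a dynamic object looks like the sketch below. The class names are hypothetical, and the comment about call-site caching describes the general behaviour of the DLR rather than anything measured in this post:

```csharp
using System;

// Hypothetical test class; names are illustrative only.
public class TangentCalculator
{
    public double Calculate(double angle)
    {
        return Math.Tan(angle);
    }
}

public static class DynamicDemo
{
    // Requires a reference to Microsoft.CSharp, which standard project templates include.
    public static double InvokeDynamic(object target, double angle)
    {
        dynamic calculator = target;

        // The member is looked up at run time. The binding is cached per call site,
        // so repeated calls with the same types avoid most of the lookup cost.
        return calculator.Calculate(angle);
    }

    public static void Main()
    {
        double direct = new TangentCalculator().Calculate(0.5);
        double viaDynamic = InvokeDynamic(new TangentCalculator(), 0.5);

        Console.WriteLine(direct == viaDynamic); // True
    }
}
```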

Conclusion

Compiling a lambda expression and running it is the slowest type of code invocation. There is no meaningful difference between calling a static, instance or virtual method and calling a Func/delegate. Calling a method on a dynamic object is less than twice as slow as making a direct call.

Source code

As I said, my code is very verbose in order to eliminate anything that could skew the results. In case you need to review, modify or run the source code, I have included it here:

Usually you wouldn't. The point of the article is to compare the overhead in terms of orders of magnitude.

Yet there are cases where you might do this, especially when you dynamically compose an expression tree or load an expression tree from its persisted state. Although you might be able to cache based on the hash of the persisted stream, it is important to quantify the overhead.
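One possible shape for such a cache is sketched below, assuming the caller can supply a string key (for example a hash of the persisted expression). CompiledExpressionCache and GetOrCompile are hypothetical names:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

public static class CompiledExpressionCache
{
    private static readonly ConcurrentDictionary<string, Func<double, double>> Cache =
        new ConcurrentDictionary<string, Func<double, double>>();

    // Returns the cached delegate for 'key', building and compiling the expression
    // only when the key has not been seen before. Under contention the factory may
    // run more than once, but only one result is stored and returned.
    public static Func<double, double> GetOrCompile(
        string key, Func<Expression<Func<double, double>>> build)
    {
        return Cache.GetOrAdd(key, k => build().Compile());
    }

    public static void Main()
    {
        Expression<Func<double, double>> tangent = x => Math.Tan(x);

        Func<double, double> first = GetOrCompile("tan", () => tangent);
        Func<double, double> second = GetOrCompile("tan", () => tangent);

        // The second lookup returns the delegate compiled by the first.
        Console.WriteLine(ReferenceEquals(first, second)); // True
    }
}
```

The cache only helps when the same expression recurs; the one-off compilation cost for each new key remains.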