.Net DI Container Speed Test

Mon, Aug 08 2011

You probably don’t care why I’ve done this, but if you don’t even care about most of the details of this test, let me lay it on you short and sweet: Autofac, Castle.Windsor, and StructureMap put out some excellent, consistent numbers; Spring.Net is middle of the road; Ninject is consistently the slowest of the pack by several orders of magnitude; and finally Unity showed such a weird behavior that made me question both the validity of my approach and the sanity of its developers (mostly kidding, ctavares is awesome). Note: there’s an update which examines some of the problems with Unity; you’ll probably want to read it.

Methodology

My tests (source code available on github) concern themselves almost exclusively with the speed with which objects are retrieved from the containers. Memory footprint was not even a secondary concern and, to be honest, I don’t even know how to measure it with any kind of precision. The code contains a modest attempt at memory measurements - if you believe it to be correct, there’s a setting in app.config to print it out as you run the tests.

I believe there are two scenarios that are relevant to observing the performance of these containers: retrieving the same instance each Resolve<T>() call, i.e., singleton scope, and retrieving a new instance on each call a.k.a. transient scope. I don’t think either solution is what you would see in real life situations; most likely you’d encounter a mix of the two, with transient objects being injected with singletons and transients alike, but I couldn’t come up with a recipe that wouldn’t be easily dismissible on account of lacking balance in proportion of singleton to transient objects. At least taking extremes only leaves you to defend from one direction.

Resolving singletons tests the container’s raw retrieval speed and its registration internals. Creating new objects on each call tests the container’s build plan and the strategies it uses to match up objects.

All containers allow more advanced object lifestyles, e.g., single instance per thread or per HTTP request, but I don’t have enough confidence in my abilities (I’m also lazy) to come up with a clear strategy to produce measurements that are a) accurate, b) relevant, and c) somewhat equivalent across containers. However, I think the singleton and transient stories are somewhat indicative of other usage scenarios.

In both cases my program will load and configure all the containers then perform 10 runs of a loop requesting the same object a large number of times (10k and 100k) then average those 10 runs. I will refer to these two as the 10k run and respectively the 100k run. So that’s four combinations so far: singleton 10k, singleton 100k, transient 10k, transient 100k.

Finally, to add a fake real world flavor to the test, I will repeat the four test scenarios with the containers loaded with some significant number classes in order to simulate a container under some pressure and make 10k and, respectively, 100k requests of a number of objects randomly selected. I don’t think any container performs any kind of heuristic analysis of the requests with intent to provide a quicker resolution path for those objects requested more/most often, though I wish they did.

I am running this in a virtual machine that I will restart before each of the eight scenarios and then I will run the test program three times. I will post the numbers from both the first and third run, but will favor the third run for figures and charts.

Results

Individual results are boring, but are presented in the next sections for your perusal, so let us take a look at pretty charts presenting the overall situation.

I had to force a max value on the Y-axis (milliseconds) seeing how Unity completely blew the scales. As such, please pay attention to the scale on the left and occasionally to the number on top of the bars.

The chart of timings by task gives us a focused view and paints the relative behavior of containers within those groups:

Plotting the same numbers by container shows the areas of strength and weakness of each participant, but also how the containers behaved overall relative to each other:

We also get to see quirks, such as four out of six containers showing a dip in the 100k singleton loaded test relative to their other numbers.

All containers are roughly on parity syntactically and feature-wise - one can easily move from one to the next with little training, the only significant differentiator that I observed was elegance of configuration, but that’s hardly something to hang your project onto.

Let us look at individual numbers in ugly tables next. Feel free to jump to my take if this is not of interest to you.

Retrieving Singletons

I believe this is the most common case in construction of DI trees and it’s after all just an optimization of the transient scenario. It’s suitable for classes whose context remains unchanged through the execution of the program, be they complex or not. If you rely (solely? largely?) on the constructor flavor of dependency injection, chances are everything behaves as if it was in singleton mode anyway. However, if you use a Service Locator approach, whether exclusively or not, the difference between singleton and transient becomes something you have to pay more attention to.

Retrieving singletons is a case of configuring the containers to indicate that once an object is created, the same instance should be returned for each subsequent Resolve call.

The median across containers for Resolve()-ing the same object ten thousand times is around 20 milliseconds, with four out of six containers being close around that number. Of the two offenders, Ninject was 8.6 times slower than the fastest container, Autofac, and Unity close on its heels at 7x the speed. The 100k run only served to make the gap more obvious with Ninject clocking in a whooping 12.5x slower. Surprisingly, the second fastest container in the 10k run, Spring.Net was now the second slowest (9.4x slower).

Transient Objects

The transient scope is most common with classes that control access to limited resources: database connection, web service requests, etc., or whose context changes on every call. Or if you just want to be safer at the expense of a bit performance. My gut feeling is that in most cases these type of classes will have either creation or use costs that outweigh the container’s, but nevertheless I thought it makes for an interesting test case because it tells us how efficient each container’s build plan is. I don’t expect any of the containers to use the very slowActivator.CreateInstance, but there are various strategies that can be employed to create new objects and calculate their dependencies, all which can use different lifestyles.

As the test focuses on object creation I chose the route of an object with no dependencies, rather than the more complex but more realistic case of a class that gets injected with other dependencies, which themselves might be dynamically resolved at that point.

I expected the 100k test to repeat the story of the singleton and to increase in the same proportion from the 10k figures, yet instead the numbers came close to those of the 100k singleton run, for all containers.

Ninject sticked to its guns and managed again to be 12x slower than the fastest container in the 10k race - StructureMap, and almost 14x slower in the 100k run. Worthy mention of suckage: Spring.Net, which managed slowness factors of 8.8 and 9.8 relative to StructureMap.

Retrieving Singletons - Loaded Containers

Based on what I’ve seen of their respective source code, each container takes different approaches to storing class registrations and as such when a resolution is performed, each container will use a different strategy to either retrieve an existing instance or create a new one (Unity calls them build plans, which I like a lot). I believe this test will attempt to show how well each container’s approach scales when the container has a good deal of objects registered and created.

I didn’t know what was supposed to be a realistic number of classes to saddle a container with, so I’ve picked pretty numbers. I’ve loaded each container with 666 classes registered for 222 interfaces (three name discriminators for each interface). I made then 10k and respectively 100k random requests, pairs of interface/discriminator, to resolve those registrations.

// StructureMap - registering named objectsObjectFactory.Initialize(x=>{x.For<IDummy0>().Singleton().Use<SimpleDummy0>().Named("0");x.For<IDummy0>().Singleton().Use<SimpleDummy1>().Named("1");x.For<IDummy0>().Singleton().Use<SimpleDummy2>().Named("2");x.For<IDummy1>().Singleton().Use<SimpleDummy3>().Named("0");...x.For<IDummy221>().Singleton().Use<SimpleDummy664>().Named("1");x.For<IDummy221>().Singleton().Use<SimpleDummy665>().Named("2");});// Retrieval of named objects - SM doesn't have an IsRegisteredIDummyd;if((d=(ObjectFactory.TryGetInstance(t,name)asIDummy))!=null)d.Do();

On one hand I expected the figures to be a bit slower, but not by much. The containers should have efficient ways to sift through the registrations to find the proper resolutions.

What the!? What just happened? Are those Unity numbers real? 258.5 and respectively a whooping 363.8 times slower than Autofac. By contrast, it makes the usual suspect, Ninject, look like Formula-1 car.

Unfortunately it is true. I was so stupefied by the result that I had to try two different approaches on three different machines. They all posted similar numbers.

Transient Objects - Loaded Containers

You can probably anticipate how quickly I moved on to the next test. Unity has put an okay fight in the transient test, recording numbers on par with the other two usually fast containers: Autofac and Windsor.

Alas, even as the test was in progress (and it took forever and three cookies), I could tell there was no redemption. And sure enough:

I’m not going to dwell on those numbers. I don’t know how to explain them. I’ll just quickly mention Ninject posted only 3.3x in the 10k, but jumped back to 7.5x in the 100k. Spring.Net was the second slowest in the 100k, being 6.9 times slower than StructureMap.

My Take

From seeing these numbers, working with the containers (not just for the purpose of this test), and being somewhat aware of what tells them apart: my favorite is Autofac, which is small, incredibly powerful, and yet very elegant to use. In stark contrast to Spring.Net, I think I spent the least amount of time trying to figure out how to use Autofac - mostly everything seemed obvious. Even the more advanced features, some of which the other containers are either barely now getting them or still dreaming of them, are very straightforward and easy to pick up.

StructureMap is fast (see, I made it italic so it seems fast even as it stands still), but its documentation is a few versions behind, which makes it hard at times to figure out what’s the right approach to a given problem. It might be friendlier to introduce in legacy code, given it provides from the get go a Service Locator through its ObjectFactory static root, but given your level of comfort or abstraction needs you might choose to roll your own Service Locator. At any rate, thanks to the Common Service Locator project, you can abstract your DI container quite well. You might find StructureMap suitable if you need a very fast container when loaded with a lot of classes, but if that was a very important criteria I would definitely devise more suitable tests before jumping in.

Castle.Windsor is pretty much the veteran, yet it managed to stay nimble. It has one of the smartest guys in the industry behind it. It also has a few quirks here and there, and it’s not the most elegant to work with (name registration and retrieval in particular). I honestly cannot conjure a good reason to pick Windsor over Autofac, unless you are using the rest of the Castle framework.

At the opposite side of the spectrum: Spring.Net. A very powerful framework, especially if you travelled here from the land of Java, its DI/IoC container is just horrible to work with. I’ve always left it till the very last and punished myself for eating too many brownies by forcing me to complete the wrapper classes for the tests. I am aware of the CodeConfig project and its attempt to bring some sanity and an alternative to the hideous XML configuration files that Spring.Net otherwise employs (don’t even try to configure generics). It not only has spotty and slow performance, it’s actually quite unpleasant to work with.

Microsoft’s Unity is not a bad DI container; it used to be meh… at version 1.2, but starting with 2.0 - which took a very long time to come to light - it’s quite nice to work with. That is until you see these numbers and that’s seriously off-putting, no matter how many injection and extension features it offers. If you think about picking it up because it integrates well with ASP.Net MVC/Prism/EF/etc, well, so does Autofac.

Finally, Ninject - cool name, awesome website, bad performance. I would use it if the alternative was Spring.Net - but frankly can’t see why anyone would voluntarily choose it. Except if you want to be different, which sounds to me like a perfectly good reason.

Alright, we’re done. The source code is on github and once you read it you’ll probably want to contact me and laugh at something stupid I did in there. Or call me names. Or maybe just to talk about those hard to believe Unity numbers.

So? Liked it? Hated it? Thought it was stupid? Thought I was stupid? Deemed it to be informative? Want to lavish excessive praise or cast fiery insults? Contact me and have at it.