Monday, August 13, 2007

I've been avoiding .NET for quite a long time. It's not that I'm an anti-Microsoft zealot, it's just that I used to code in Java at work and I figured that one platform full of frustrating limitations was more than enough. Recently, however, I decided to learn XNA on my own free time and that means getting involved with .NET and C#. It turned out to be just as frustrating as I feared, but at least it provided me with some nice blog material. Thus is the .NUTS "column" born in my blog, to document all that is weird in the .NET world. Expect stuff that might make you giggle, goggle and groan.

Enums Revisited

I first encountered the concept of enumerated types when I learned Pascal. Before that I owned a Commodore, so I dabbled in Simons' BASIC and 6502 assembler. Therefore, I didn't approach enums the way a C programmer might -- as a cleaner way of defining constants -- but as a way of defining a completely new type that accepts only those identifiers I define. The difference is subtle, but important: it shapes your attitude.

In .NET, enums have, obviously, inherited a lot of baggage from COM, which, in turn, comes from C. Furthermore, that baggage has been squeezed to fit the .NET luggage rack, where everything is an object, even the value types. Of course, value types are not really first-class objects, because they have to be boxed first. All in all, .NET enums are a part of a very leaky abstraction.

Starting Point

My journey of discovery started with an innocent little detail inside an excellent post on Shawn Hargreaves' blog. If you haven't read it yet, I strongly recommend it. I found it quite enlightening, especially the following advice:

If you use an enum type as a dictionary key, internal dictionary operations will cause boxing. You can avoid this by using integer keys, and casting your enum values to ints before adding them to the dictionary.

I found it strange that ints don't get boxed, while enums do. However, I was in a hurry to cobble my prototype together, so I just wrote it off as another one of those language eccentricities one encounters occasionally.

A couple of weeks later, I mentioned this to a friend of mine, who has been working in .NET since it first came out. He was even more flabbergasted: "That's weird. I mean, Microsoft keeps harping on how generic collections allow you to avoid unnecessary boxing. Seeing as how enum is one of the most common types for dictionary keys, the whole thing doesn't make sense at all."

That got me even more intrigued, enough to overcome my laziness and make me start poking around. This is where things started getting really interesting.

Now, those of you who don't care about the why and want the solution, skip to the part called "Finish Line", down below, near the end of this post.

Still reading? Okay, let's roll up those sleeves...

Stage 1: Observation

The obvious first step was to confirm Shawn's claim and work from there. For that, I armed myself with CLR Profiler (thanks to Mike Stall's blog) and I hammered out the following piece of code:

The results were as bad as Shawn predicted: the code allocated 4,825,246 bytes (more than 4.5 MB), out of which 74.61% were instances of Sandbox.TestEnum type. If an enum is winding up on the heap like that, it must be getting boxed.

The profiler reported 24,692 allocated bytes, with nothing sticking out as especially wasteful. It was clear that .NET handled ints and enums differently, even though they are both value types.

CLR Profiler has a very neat feature that allows you to find out which part(s) of the code allocated the memory, by right-clicking on the offending bar in the histogram view and selecting the "Show Who Allocated" option. That will open the allocation graph view, which in my enum case looked like this:

(click on the image to enlarge)

Apparently, my enum was being boxed by ObjectEqualityComparer, inside the Insert method in the generic Dictionary class. Culprit identified, time to move on to the next stage.

Stage 2: Under the Hood

Contrary to my first impulse -- whip out the IL Disassembler -- I decided to first check whether the help files mentioned ObjectEqualityComparer. No such luck. ILDASM it is, then. The disassembly of Insert method in the generic Dictionary class revealed the following interesting snippet:

So my suspect, ObjectEqualityComparer, is an implementation of an IEqualityComparer interface, contained in the comparer field inside my Dictionary object. Here's what help file says about IEqualityComparer:

This interface allows the implementation of customized equality comparison for collections. That is, you can create your own definition of equality for type T, and specify that this definition be used with a collection type that accepts the IEqualityComparer generic interface. In the .NET Framework, constructors of the Dictionary generic collection type accept this interface.

A default implementation of this interface is provided by the Default property of the EqualityComparer generic class. The StringComparer class implements IEqualityComparer of type String.

A bit more digging in help files revealed this:

Dictionary requires an equality implementation to determine whether keys are equal. You can specify an implementation of the IEqualityComparer generic interface by using a constructor that accepts a comparer parameter; if you do not specify an implementation, the default generic equality comparer EqualityComparer.Default is used. If type TKey implements the System.IEquatable generic interface, the default equality comparer uses that implementation.

So how does the EqualityComparer.Default get defined? According to the help file, like this:

The Default property checks whether type T implements the System.IEquatable generic interface and if so returns an EqualityComparer that uses that implementation. Otherwise it returns an EqualityComparer that uses the overrides of Object.Equals and Object.GetHashCode provided by T.

In other words, the boxing happens because my enum doesn't implement IEquatable interface. The funny thing is that Int32 value type implements IEquatable, as does every other primitive numeric type. Yet enums don't. When you stop to think of it, it sounds logical: the Enum value type can't implement IEquatable because that would theoretically allow you to compare any enum to any other enum and obtain bogus results. Not that I'm convinced by that argument, but it's really a moot point, seeing as how I can't change that myself. That leaves looking for a workaround, which brings us to the last stage.

Stage 3: The Lesser Evil

Once you're reduced to looking for a workaround, your job is to look for the least dirty and the least ugly solution. The first thing I tried was to make my enum implement the IEquatable interface. Naturally, there was no way to make C# swallow that syntax. Next I tried setting the EqualityComparer<TestEnum>.Default property, but unfortunately it's read-only. Finally, grudgingly, I conceded to write my own implementation of IEqualityComparer. The first attempt was to write it as a generic class, but when C# complained about not being able to explicitly cast an unknown value type to int (within my GetHashCode method) it finally drove the lesson home: .NET generics are definitely not the same thing as C++ templates.

To sum it up, for each enum type you intend to use as a dictionary key, you'll have to write your implementation of IEqualityComparer and pass it to Dictionary constructor. For each enum type you'll have to type 18 lines of boilerplate code -- you can squeeze it into less, I know -- but that's a relatively small price to pay for avoiding the memory waste caused by boxing your keys. I just wish it wasn't that ugly, but at least it's simple enough. Stay tuned for more .NUTS in the future.