Anonymous Types Unify Within An Assembly, Part Two

Last time I noted that any two usages of “the same” anonymous type within an assembly actually unify to be the same type. By “the same” we mean that the two anonymous types have the same property names and types, and that the properties appear in the same order. For example, new { X = 1, Y = 2 } and new { Y = 2, X = 1 } do not unify to a single type.
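Here is a minimal sketch of that rule. It assumes both expressions are compiled into the same assembly, and the class and method names are hypothetical. Comparing the runtime types shows that matching names, types and order produce one type, and reversing the order produces another.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class UnificationDemo
{
    // Same property names, same types, same order: one anonymous type.
    public static bool SameOrderUnifies()
    {
        var a = new { X = 1, Y = 2 };
        var b = new { X = 3, Y = 4 };
        // An implicitly typed array needs a single element type,
        // so this line compiles only because a and b share a type.
        var pair = new[] { a, b };
        return pair[0].GetType() == pair[1].GetType();
    }

    // Same names and types, but a different order: two distinct types.
    public static bool DifferentOrderUnifies()
    {
        var a = new { X = 1, Y = 2 };
        var c = new { Y = 2, X = 1 };
        // var pair = new[] { a, c }; // would not compile: no common element type
        return a.GetType() == c.GetType();
    }
}
```

Calling these from any Main method, SameOrderUnifies() returns true and DifferentOrderUnifies() returns false.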

Why is that? You’d think that we could make these unify.

The trouble is that doing so would cause more problems than it would solve. An anonymous type gives you a convenient place to store a small immutable set of name/value pairs, but it gives you more than that. It also gives you an implementation of Equals, GetHashCode and, most germane to this discussion, ToString. (*)
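To make those generated members concrete, here is a small sketch (the class and method names are hypothetical) of what you get without writing any code yourself: value-based Equals and GetHashCode, a ToString that lists the properties, and == still meaning reference equality, because anonymous types are classes that do not overload the operator.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class GeneratedMembersDemo
{
    // The generated ToString lists each property as "Name = value".
    public static string Format() => new { X = 1, Y = 2 }.ToString();

    // The generated Equals and GetHashCode compare property values.
    public static bool ValueEquals()
    {
        var p = new { X = 1, Y = 2 };
        var q = new { X = 1, Y = 2 };
        return p.Equals(q) && p.GetHashCode() == q.GetHashCode();
    }

    // == is not overloaded, so it compares references: two separate
    // allocations are never == even when their values match.
    public static bool OperatorEqualsComparesReferences()
    {
        var p = new { X = 1, Y = 2 };
        var q = new { X = 1, Y = 2 };
        return p == q;
    }
}
```

Here Format() returns "{ X = 1, Y = 2 }", ValueEquals() returns true, and OperatorEqualsComparesReferences() returns false.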

Imagine, for instance, that you have written a bunch of LINQ queries in your code that extract data from a table using an anonymous type. As part of your unit testing, you dump the results of the query as a string out to a file and compare it against a known baseline. Maybe you have hundreds of such tests. And then one day, someone in a completely different part of the code happens to write a LINQ query that has “the same” anonymous type, but with the properties in a different order. We have to pick some order for the properties to be written out by “ToString”, and there is no telling which one we’d pick if forced to choose. It seems very strange that using an anonymous type in one part of the program would cause tests to fail in a completely different part of the program.
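Here is the hazard in miniature, as a sketch with hypothetical names. Two values carry the same data but declare their properties in different orders, so they print differently, and a text baseline captured from one does not match the other. If the two types were unified, one of these strings would have to change.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class BaselineDemo
{
    // Two "queries" projecting the same data in different property orders.
    public static string FromQueryA() => new { X = 1, Y = 2 }.ToString();
    public static string FromQueryB() => new { Y = 2, X = 1 }.ToString();

    // A naive textual baseline check of the kind described above
    // (hypothetical helper, not part of any test framework).
    public static bool MatchesBaseline(string actual, string baseline) =>
        string.Equals(actual, baseline, StringComparison.Ordinal);
}
```

FromQueryA() yields "{ X = 1, Y = 2 }" and FromQueryB() yields "{ Y = 2, X = 1 }", so each order has its own stable output only because the two types are kept separate.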

Well then, you might say, you could solve this problem by canonicalizing the implementation of ToString: always write out the properties in alphabetical order, say. But that is hardly an attractive solution. First off, whose alphabetical order? There are dozens of different alphabetical orders, depending on where in the world you are. Should we choose the alphabetical order of the developer? The current user? The “culture neutral” order? Even if we could solve that problem satisfactorily, we’d still be disappointing most users. Developers have a reasonable expectation that ToString will give them the properties in the order they appear in the source code.

Another option would be to not implement ToString for you at all. That is, we would remove a useful and relatively commonly used feature (dumping data to a string for testing or debugging) in order to implement a less useful, rarely used feature (unification of types).

(*) We give you Equals and GetHashCode so that you can use instances of anonymous types in LINQ queries as keys upon which to perform joins. LINQ to Objects implements joins using a hash table for performance reasons, and therefore we need correct implementations of Equals and GetHashCode.
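As a sketch of the join scenario in that footnote, here is a LINQ to Objects join on a composite anonymous-type key. The record types, data and names are hypothetical, and the code assumes C# 9 or later for the record declarations. Both key expressions must produce the same anonymous type (same names, types and order) for the query to compile. The join then relies on the generated GetHashCode and Equals to match keys.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the records stand in for two tables.
public static class CompositeKeyJoinDemo
{
    public sealed record Order(int CustomerId, int Year, string Item);
    public sealed record Budget(int CustomerId, int Year, int Limit);

    public static List<string> JoinOnCompositeKey()
    {
        var orders = new[]
        {
            new Order(1, 2023, "bolts"),
            new Order(1, 2024, "nuts"),
            new Order(2, 2024, "gears"),
        };
        var budgets = new[]
        {
            new Budget(1, 2024, 100),
            new Budget(2, 2024, 250),
            new Budget(2, 2023, 50),
        };

        // new { o.CustomerId, o.Year } and new { b.CustomerId, b.Year } unify
        // to one anonymous type, so they can serve as the join key; matching
        // uses that type's generated GetHashCode and Equals.
        var query =
            from o in orders
            join b in budgets
                on new { o.CustomerId, o.Year } equals new { b.CustomerId, b.Year }
            select $"{o.Item}:{b.Limit}";

        return query.ToList();
    }
}
```

With this data the result is "nuts:100" followed by "gears:250"; the 2023 order for customer 1 has no matching budget and drops out. Writing one key as new { o.Year, o.CustomerId } would produce a different anonymous type, and the query would no longer compile.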

Why not just unify types based on members and their order? That would not break existing code, it would allow a deterministic order of fields in ToString(), and order mismatches could easily be reported by the compiler.

It's not remarkable that they share the same generic type definition. I would expect that *all* anonymous types with the same number of properties can share a common generic type definition. However, that does not mean that they are the same type any more than List<int> and List<Widget> are the same type…
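A small sketch of that distinction, with hypothetical names. On current Microsoft C# compilers, anonymous types that have the same property names in the same order are emitted as constructions of one generic class. This is an implementation detail, not a language guarantee. The constructed types still differ whenever the property types differ.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class GenericDefinitionDemo
{
    // Same names and order, different property types: distinct types.
    public static bool DistinctConstructedTypes()
    {
        var a = new { X = 1, Y = 2 };
        var b = new { X = "one", Y = 2.0 };
        return a.GetType() != b.GetType();
    }

    // ...which are nonetheless built from one generic type definition
    // (compiler implementation detail; may differ on other compilers).
    public static bool SharedGenericDefinition()
    {
        Type ta = new { X = 1, Y = 2 }.GetType();
        Type tb = new { X = "one", Y = 2.0 }.GetType();
        return ta.IsGenericType
            && tb.IsGenericType
            && ta.GetGenericTypeDefinition() == tb.GetGenericTypeDefinition();
    }
}
```

This mirrors the List<int> versus List<Widget> comparison: one definition, two different types.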

Random832: You have to remember that the compiler is not the only consumer of anonymous types. Presumably one of the main reasons they were created was to allow arbitrary projections into objects from LINQ-based ORM queries. In other words, the code constructing anonymous objects isn't just the C# and VB compilers, but also EF, LINQ2SQL, and so on.

Specifying an extra constructor parameter would make each object perhaps 4 bytes bigger, and everything that generates anonymous objects would have to know how to supply that extra parameter in each constructor call. That seems like a lot of overhead for something so trivial.

I have my reservations about how smart it is to depend on compiler-generated code not changing: if it's hard to handle changes between runs, it will be hard to handle changes between compiler updates. Regardless, I find it weird that someone would expect you to unify different orders. It seems magical enough to me that anonymous types are unified at all!

How about a family of generic anonymous classes with an explicitly specified "ToString" format, each inheriting from a corresponding class that uses the default format? It's certainly not hard to imagine use cases for being able to specify how ToString should format data. What extra run-time cost would there be if only the anonymous objects that specified a "ToString" format had a field to hold it?