Comparing .NET Generics and C++ Templates

Many times, I've heard developers refer to the upcoming generics feature of .NET 2.0 (also known as Whidbey) as "templates for .NET." While often a useful shorthand expression, in many ways, it's a very misleading expression. I'd like to discuss a number of the ways in which .NET generics differ from C++ templates.

Probably the most noticeable difference, if not the most significant, is that specialization of a .NET generic class or method occurs at runtime, whereas specialization of a C++ template occurs at compile time. Let's look at C++ templates first.

For example, let's assume the compiler encounters a specialization of the List<T> template class, such as List<int>, for the first time in a compilation unit.

The compiler processes the source code for the List<T> template class, replacing the type parameter placeholder with the specialization type of int and, in effect, creates a new data type called List<int> within the assembly. Basically, in C++, the compiler translates the source code for a template class into the appropriate binary specialization of that class within the output object module.
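As a minimal sketch of this compile-time specialization, assume a deliberately tiny List<T> template. The members Add, Count, and At and the helper SumOfInts are hypothetical names chosen for illustration, not the actual List<T> source discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal List<T>; the member names are assumptions.
template <typename T>
class List {
public:
    void Add(const T& item) { items_.push_back(item); }
    std::size_t Count() const { return items_.size(); }
    const T& At(std::size_t index) const {
        assert(index < items_.size());  // guard against out-of-range access
        return items_[index];
    }

private:
    std::vector<T> items_;
};

// Naming List<int> here makes the compiler generate the List<int> class,
// with every T replaced by int, while compiling this translation unit.
inline int SumOfInts() {
    List<int> numbers;
    numbers.Add(1);
    numbers.Add(2);
    numbers.Add(3);
    int sum = 0;
    for (std::size_t i = 0; i < numbers.Count(); ++i) {
        sum += numbers.At(i);
    }
    return sum;
}
```

Nothing is generated for List<double> or any other specialization unless some code in the compilation unit actually uses it.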

Here's a subtle point, but it's a direct result of how types are identified in .NET. Let's say that C++ code in assembly A has used a List<int> template specialization and that C++ code in assembly B expects to use that exact same specialized type. Code in assembly A creates an instance of the List<int> type and passes a reference to that instance to code in assembly B that wishes to use its List<int> type. Boom! At the very least, you will get a runtime type mismatch exception. Depending on how you pass the reference, you might even get the error at compile time. The error is because the List<int> type in assembly A is not the same type as List<int> in assembly B.

In .NET, all types are qualified by the assembly in which they are defined. (You generally don't notice this when programming using C# or VB.NET, but it's very explicit if you ever look at the IL version of your code.) Therefore, type Foo in assembly A is not the same as type Foo in assembly B even if Foo is defined in exactly the same way in both assemblies. So, [A]List<int> is a different type from [B]List<int>.
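Standard C++ has no notion of assemblies, so the following is only an analogy. Two namespaces, AssemblyA and AssemblyB (both hypothetical names), stand in for the two assemblies, and each holds a textually identical List<T>. The compiler still treats the two specializations as unrelated types, which mirrors why [A]List<int> and [B]List<int> do not match.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

namespace AssemblyA {  // stand-in for assembly A
template <typename T>
struct List { std::vector<T> items; };
}

namespace AssemblyB {  // stand-in for assembly B
template <typename T>
struct List { std::vector<T> items; };

// Accepts only AssemblyB's List<int>.
inline std::size_t CountItems(const List<int>& list) { return list.items.size(); }
}

// Same definition, different qualifying scope: the types are distinct.
static_assert(!std::is_same<AssemblyA::List<int>, AssemblyB::List<int>>::value,
              "identically defined types in different scopes are different types");

// AssemblyB::CountItems(AssemblyA::List<int>()) would not compile, because
// there is no conversion from AssemblyA::List<int> to AssemblyB::List<int>.
inline std::size_t CountOneItemInB() {
    AssemblyB::List<int> list;
    list.items.push_back(42);
    std::size_t count = AssemblyB::CountItems(list);
    assert(count == 1);
    return count;
}
```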

Now, let's compare the above behavior to how .NET generics behave. In C++, a template exists as source code and specialization of a template type causes the compiler to create the specialized type from the template. With .NET generics, a compiler expects to read metadata describing the generic type. In other words, you must first compile the source code for a generic type to an assembly. Subsequent specializations of the generic type refer to the metadata definition of the generic type. Let's look at the above example in the context of generics.

First, to use the generic List type, you must compile it into an assembly. Let's call that assembly C (for Collection). Now, to compile the source code for assembly A, you will reference assembly C so that the compiler is aware of the definition of the generic List type that the code for A uses.

At runtime, a method that references a generic type with a set of type arguments causes the runtime to create a specialization of the generic type based on the generic definition's metadata. Therefore, when the compiler encounters the List<int> specialization in the source code for assembly A, it emits IL and metadata in assembly A that says, in effect, to the .NET runtime "Go to assembly C, find the definition of the generic type called List<T>, and, based on its definition, create a new specialization of that class called List<int>." So, when assembly A creates an instance of List<int>, it is creating a specialization of [C]List<T>, not [A]List<T> as templates did.

Similarly, when the compiler encounters a parameter of type List<int> when creating assembly B, it generates code that says, in effect, "Expect to receive an instance of the List<int> specialization of the [C]List<T> class." Another way of saying all this is that the definition of a generic type is unique across the application and the single definition can be used by all assemblies. In contrast, a C++ template type definition is local to the assembly into which you compile it.
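The single shared definition can be pictured with a rough C++ analogy, again using namespaces as stand-ins for assemblies. The names AssemblyA, AssemblyB, AssemblyC, MakeList, and CountItems are hypothetical. Because both A and B refer to the one List<T> that lives in C, the List<int> that A creates is exactly the type that B expects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for assembly C: the single shared definition of List<T>.
namespace AssemblyC {
template <typename T>
struct List { std::vector<T> items; };
}

// Stand-in for assembly A: creates a List<int> from C's definition.
namespace AssemblyA {
inline AssemblyC::List<int> MakeList() {
    AssemblyC::List<int> list;
    list.items.push_back(10);
    list.items.push_back(20);
    return list;
}
}

// Stand-in for assembly B: expects the List<int> specialization of C's List<T>.
namespace AssemblyB {
inline std::size_t CountItems(const AssemblyC::List<int>& list) {
    return list.items.size();
}
}

// A hands its list to B; both sides agree on the type because both name C's List<T>.
inline std::size_t PassFromAToB() {
    std::size_t count = AssemblyB::CountItems(AssemblyA::MakeList());
    assert(count == 2);
    return count;
}
```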

As a minor point, in the bad ol' days of C++ templates, for each source file in which you used List<int>, the compiler produced a definition of the specialized class in the resulting object file. This often resulted in many duplicate copies of the same template class definition—one per object module. However, modern compilers/linkers typically remove all these duplicate classes so it's not really an issue anymore for C++.

Again, .NET works a little differently here. As the .NET runtime encounters a request to use a specialization of a generic class, it can determine if the specialization, for example, [C]List<int>, has previously been created and, if so, simply reuse the existing class rather than generate a new, duplicate, class. But, the initial request for a different specialization, for example, List<DateTime>, may produce another type definition. In fact, in Whidbey, the runtime creates a new specialization of a generic type or method whenever you specify a different value type for a substitutable type parameter.

However, the .NET runtime optimizes the case where the replaceable type parameters are reference types. Generally speaking, the .NET runtime can produce a single specialization of a generic type that can be reused for all specializations where the replaceable types are reference types. For example, when the runtime produces the specialization of List<System.String>, the resulting specialized type properly handles all possible reference types. Therefore, a subsequent specialization request for List<Employee>, where Employee is a reference type, can reuse the existing code for List<System.String>. After all, in both cases the code is simply manipulating a reference and all references are conceptually used the same way and occupy the same amount of space—unlike the case for value types. Using one shared type for all reference type specializations of a particular generic class or method reduces code bloat and working set.
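The idea of one body of code serving every reference type has a loose C++ counterpart, sometimes called the thin template idiom. The sketch below is an analogy under that assumption, not a description of how the .NET runtime is implemented. PointerListCore, RefList, and Employee are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One non-template implementation that stores untyped pointers.
// It is compiled once, no matter how many element types use it.
class PointerListCore {
public:
    void Add(void* item) { items_.push_back(item); }
    void* At(std::size_t index) const {
        assert(index < items_.size());
        return items_[index];
    }
    std::size_t Count() const { return items_.size(); }

private:
    std::vector<void*> items_;
};

// Thin typed wrapper: it only casts, and all real work is in the shared core.
template <typename T>
class RefList {
public:
    void Add(T* item) { core_.Add(item); }
    T* At(std::size_t index) const { return static_cast<T*>(core_.At(index)); }
    std::size_t Count() const { return core_.Count(); }

private:
    PointerListCore core_;
};

// Hypothetical type standing in for a .NET reference type.
struct Employee { std::string name; };
```

RefList<std::string> and RefList<Employee> both delegate to the same PointerListCore code, because a pointer is handled the same way whatever it points to. A list of values such as int or a date structure could not share the core this way, since the element sizes differ.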

Another subtle point, also related to the compile-time versus runtime specialization of templates and generics, concerns what you can actually do in the template code. For example, let's say a method in a template/generic class calls CompareTo on a value of type T, passing another value of type T as the argument.

In a C++ template, this code is processed during template specialization. If the type specified for T during specialization contains a CompareTo method that accepts a single argument of the same type, the compiler knows how to generate the code to call the method. If the type specified for T does not contain such a method, you get a compile-time error. When compiling a generic type definition such as the above, however, the compiler has a problem.
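The C++ side of this can be sketched as follows. The function template Compare and the types Version and Blob are hypothetical stand-ins, since the original method listing is not available. The point is that the call to CompareTo is checked only once a concrete T is supplied.

```cpp
#include <cassert>

// Hypothetical stand-in for the method under discussion: it compares two
// values by calling a CompareTo member on the first.
template <typename T>
int Compare(const T& lhs, const T& rhs) {
    return lhs.CompareTo(rhs);  // validated only when T is known
}

// Has a suitable CompareTo, so Compare<Version> compiles.
struct Version {
    int number;
    int CompareTo(const Version& other) const {
        // Yields -1, 0, or 1 depending on the ordering of the two numbers.
        return (number > other.number) - (number < other.number);
    }
};

// Has no CompareTo. Declaring it is fine, but any use of Compare<Blob>
// would be rejected by the compiler at the point of specialization.
struct Blob { int payload; };

inline int CompareOlderToNewer() {
    int result = Compare(Version{1}, Version{2});
    assert(result < 0);
    return result;
}
```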

A .NET generic type definition is compiled into an assembly typically long before any specializations are known. In addition, a .NET generic type definition should be type safe (assuming no use of type unsafe features). There is no way, as the example currently stands, for the compiler to generate type safe code to call the CompareTo method on any arbitrary, yet to be specified, class. This ultimately requires the generic constraint feature.

A constraint is a restriction on the set of possible types that may be specified for a replaceable generic parameter. For example, I can rewrite the above example so that T is constrained to types that implement the IComparable interface.

The constraint clause restricts the types that may be used for parameter T during specialization of the generic method. In this example, I've informed the compiler that whatever type is used for T during a subsequent specialization, that type must implement the IComparable interface. This implies that any type used for T must provide an implementation of the IComparable::CompareTo virtual method, which is what allows the compiler to generate type-safe code for the call.
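In C#, a constraint like this is written as a where clause, for example where T : IComparable. C++ templates have no such clause, so the sketch below only approximates the idea with a static_assert. The IComparable struct, Compare, and Version are hypothetical C++ renderings. One difference remains: the C++ check still runs when the template is specialized, whereas the .NET compiler verifies the generic body against the constraint when the generic definition itself is compiled.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

// Hypothetical C++ rendering of an IComparable-style interface.
struct IComparable {
    virtual ~IComparable() = default;
    virtual int CompareTo(const IComparable& other) const = 0;
};

// The static_assert plays the role of the constraint: T must derive
// from IComparable, so the CompareTo call is known to be available.
template <typename T>
int Compare(const T& lhs, const T& rhs) {
    static_assert(std::is_base_of<IComparable, T>::value,
                  "T must implement IComparable");
    return lhs.CompareTo(rhs);
}

struct Version : IComparable {
    int number;
    explicit Version(int n) : number(n) {}

    int CompareTo(const IComparable& other) const override {
        // Throws std::bad_cast if other is not a Version.
        const Version& v = dynamic_cast<const Version&>(other);
        return (number > v.number) - (number < v.number);
    }
};

inline int CompareNewerToOlder() {
    int result = Compare(Version(2), Version(1));
    assert(result > 0);
    return result;
}
```

A type that does not derive from IComparable now fails with the message in the static_assert, rather than with an error about a missing CompareTo member deep inside the template body.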