Generics in C#

Bill Venners: How do generics work in C#?

Anders Hejlsberg: In C# without generics, you are basically able to say class
List {...}. In C# with generics, you can say class List<T>
{...}, where T is the type parameter. Within
List<T> you can use T as if it were a type. When it
actually comes time to create a List object, you say
List<int> or List<Customer>. You construct new
types from that List<T>, and it is truly as if your type arguments get
substituted for the type parameter. All of the Ts become ints or
Customers, you don't have to downcast, and there is strong type checking
everywhere.
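
As a minimal sketch of the description above: the `SimpleList<T>` class below is a hypothetical stand-in (named to avoid clashing with the framework's own `List<T>`), and `Customer` and `BasicsDemo` are likewise illustrative names that do not come from the interview. It shows `T` used as a type inside the class, and two constructed types that need no downcasts.

```csharp
using System;

// Hypothetical minimal list; T is the type parameter.
public class SimpleList<T>
{
    private T[] items = new T[4];   // T is used as if it were a type
    private int count;

    public int Count { get { return count; } }

    public void Add(T item)
    {
        if (count == items.Length)
        {
            Array.Resize(ref items, items.Length * 2);
        }
        items[count++] = item;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= count)
            {
                throw new ArgumentOutOfRangeException("index");
            }
            return items[index];
        }
    }
}

// Hypothetical element type for the example.
public class Customer
{
    public string Name;
    public Customer(string name) { Name = name; }
}

public static class BasicsDemo
{
    public static int SumOfInts()
    {
        SimpleList<int> numbers = new SimpleList<int>();
        numbers.Add(1);
        numbers.Add(2);
        int sum = 0;
        for (int i = 0; i < numbers.Count; i++)
        {
            sum += numbers[i];   // no cast: the indexer returns int
        }
        return sum;
    }

    public static string FirstCustomerName()
    {
        SimpleList<Customer> customers = new SimpleList<Customer>();
        customers.Add(new Customer("Ada"));
        // customers.Add(42);    // would not compile: int is not a Customer
        return customers[0].Name; // no downcast from object
    }

    public static void Main()
    {
        Console.WriteLine(SumOfInts());
        Console.WriteLine(FirstCustomerName());
    }
}
```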

In the CLR [Common Language Runtime], when you compile List<T>,
or any other generic type, it compiles down to IL [Intermediate Language] and metadata just
like any normal type. The IL and metadata contain additional information recording that
there's a type parameter, of course, but in principle, a generic type compiles just the way that
any other type would compile. At runtime, when your application makes its first reference to
List<int>, the system looks to see if anyone already asked for
List<int>. If not, it feeds into the JIT the IL and metadata for
List<T> and the type argument int. The JITer, in
the process of JITing the IL, also substitutes the type parameter.

Bruce Eckel: So it's instantiating at runtime.

Anders Hejlsberg: It's instantiating at runtime, exactly. It's producing native code
specifically for that type at the point it is needed. And literally when you say
List<int>, you will get a List of int. If the
code in the generic type uses an array of T, that becomes an array of
int.
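
One way to observe the substitution at runtime is to ask a generic type what its internal array really is. The sketch below assumes a hypothetical `Slots<T>` class (not from the interview) that holds an array of `T` and reports the runtime type of that array.

```csharp
using System;

// Hypothetical generic type that stores its elements in an array of T.
public class Slots<T>
{
    private T[] storage = new T[8];

    // The runtime type of the backing array for this instantiation.
    public Type StorageType { get { return storage.GetType(); } }

    // The runtime type argument that was substituted for T.
    public Type ElementType { get { return typeof(T); } }
}

public static class RuntimeInstantiationDemo
{
    public static bool IntSlotsUseIntArray()
    {
        return new Slots<int>().StorageType == typeof(int[]);
    }

    public static bool StringSlotsUseStringArray()
    {
        return new Slots<string>().StorageType == typeof(string[]);
    }

    public static void Main()
    {
        Console.WriteLine(new Slots<int>().StorageType);
        Console.WriteLine(new Slots<string>().StorageType);
    }
}
```

The array inside `Slots<int>` is a genuine `int[]`, not an array of boxed objects.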

Bruce Eckel: Does it garbage collect that class at some point?

Anders Hejlsberg: Yes and no, but that's an orthogonal issue. It creates the class in
that app domain, and then the class lives forever in that app domain. If you kill the app
domain, the class goes away, like any other class.

Bruce Eckel: But if I have an application that uses a List<int>
and a List<Cat>, but it never goes down the branch that uses
List<Cat>,...

Anders Hejlsberg: ...then the system won't instantiate a
List<Cat>. Now, there are exceptions to that rule. If you're NGENing an
image, that is, if you're generating a native image up front, you can generate instantiations
early. But if you're running under normal circumstances, the instantiations are purely demand
driven, and they are deferred to as late as possible.

Now, what we then do is for all type instantiations that are value types—such as
List<int>, List<long>,
List<double>, List<float>—we create a
unique copy of the executable native code. So List<int> gets its own
code. List<long> gets its own code. List<float> gets
its own code. For all reference types we share the code, because they are representationally
identical. It's just pointers.

Bruce Eckel: And you just need to cast.

Anders Hejlsberg: No, you don't actually. We can share the native image, but they
actually have separate VTables. I'm just pointing out that we do fairly aggressive code sharing
where it makes sense, but we are also very conscious about not sharing where you want the
performance. Typically with value types, you really do care that
List<int> really stores ints. You don't want them to be boxed as
Objects. Boxing value types is one way we could share, but boy it would be an
expensive way.

Bill Venners: In the reference case, you actually have different classes.
List<Elephant> is different from List<Orangutan>,
but they actually share all the same method code.

Anders Hejlsberg: Yes. As an implementation detail, they actually share the same
native code.
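
The sharing of native code is an implementation detail and cannot be observed directly from C#, but the "different classes" half of this exchange can. The sketch below uses hypothetical empty `Elephant` and `Orangutan` classes to show that the two instantiations are distinct runtime types built from the same generic type definition.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference types for the example.
public class Elephant { }
public class Orangutan { }

public static class DistinctTypesDemo
{
    // The two constructed types are different System.Type objects.
    public static bool AreDistinctTypes()
    {
        return typeof(List<Elephant>) != typeof(List<Orangutan>);
    }

    // Both are constructed from the same open generic definition, List<>.
    public static bool ShareDefinition()
    {
        return typeof(List<Elephant>).GetGenericTypeDefinition() == typeof(List<>)
            && typeof(List<Orangutan>).GetGenericTypeDefinition() == typeof(List<>);
    }

    // A List<Elephant> instance is not a List<Orangutan>.
    public static bool ElephantListIsOrangutanList()
    {
        object herd = new List<Elephant>();
        return herd is List<Orangutan>;
    }

    public static void Main()
    {
        Console.WriteLine(AreDistinctTypes());
        Console.WriteLine(ShareDefinition());
        Console.WriteLine(ElephantListIsOrangutanList());
    }
}
```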

Comparing C# and Java Generics

Bruce Eckel: How do C# generics compare with Java generics?

Anders Hejlsberg: Java's generics implementation was based on a project originally
called Pizza, which was done by Martin Odersky and others. Pizza was renamed GJ, then it
turned into a JSR and ended up being adopted into the Java language. And this particular generics
proposal had as a key design goal that it could run on an unmodified VM [Virtual Machine].
It is, of course, great that you don't have to modify your VM, but it also brings about a whole
bunch of odd limitations. The limitations are not necessarily directly apparent, but you very
quickly go, "Hmm, that's strange."

For example, with Java generics, you don't actually get any of the execution efficiency that I
talked about, because when you compile a generic class in Java, the compiler takes away the
type parameter and substitutes Object everywhere. So the compiled image for
List<T> is like a List where you use the type
Object everywhere. Of course, if you now try to make a
List<int>, you get boxing of all the ints. So there's a
bunch of overhead there. Furthermore, to keep the VM happy, the compiler actually has to
insert all of the type casts you didn't write. If it's a List of
Object and you're trying to treat those Objects as
Customers, at some point the Objects must be cast to
Customers to keep the verifier happy. And really all they're doing in their
implementation is automatically inserting those type casts for you. So you get the syntactic
sugar, or some of it at least, but you don't get any of the execution efficiency. So that's issue
number one I have with Java's solution.
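
The cost described here can be sketched in C# itself by comparing an Object-based list with a generic one. This is only an analogy, not Java's actual compiled output: the non-generic `ArrayList` from `System.Collections` stores everything as `object`, so it stands in for the erased form, and the class and method names below are made up for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ObjectListContrast
{
    // Object-based list: every int is boxed on the way in,
    // and a cast is needed on the way out.
    public static int SumObjectBased()
    {
        ArrayList list = new ArrayList();
        list.Add(1);   // boxed to object
        list.Add(2);   // boxed to object
        int sum = 0;
        foreach (object o in list)
        {
            sum += (int)o;   // explicit cast (unboxing)
        }
        return sum;
    }

    // Generic list: elements are stored as ints, and no cast is written.
    public static int SumGeneric()
    {
        List<int> list = new List<int>();
        list.Add(1);
        list.Add(2);
        int sum = 0;
        foreach (int n in list)
        {
            sum += n;
        }
        return sum;
    }

    // With an Object-based list a wrong element type is only caught at runtime.
    public static bool WrongElementFailsAtRuntime()
    {
        ArrayList list = new ArrayList();
        list.Add("not a number");
        try
        {
            int n = (int)list[0];
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumObjectBased());
        Console.WriteLine(SumGeneric());
        Console.WriteLine(WrongElementFailsAtRuntime());
    }
}
```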

Issue number two, and I think this is probably an even bigger issue, is that because Java's generics
implementation relies on erasure of the type parameter, when you get to runtime, you don't
actually have a faithful representation of what you had at compile time. When you apply
reflection to a generic List in Java, you can't tell what the List
is a List of. It's just a List. Because you've lost the type
information, any type of dynamic code-generation scenario, or reflection-based scenario,
simply doesn't work. If there's one trend that's pretty clear to me, it's that there's more and
more of that. And it just doesn't work, because you've lost the type information. Whereas in
our implementation, all of that information is available. You can use reflection to get the
System.Type object for List<T>. You cannot actually
create an instance of it yet, because you don't know what T is. But then you
can use reflection to get the System.Type for int. You can then
ask reflection to please put these two together and create a List<int>,
and you get another System.Type for List<int>. So
representationally, anything you can do at compile time you can also do at runtime.
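
The reflection steps described above can be sketched with `Type.MakeGenericType` and `Activator.CreateInstance`, both standard .NET APIs. The `ReflectionDemo` class and its method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

public static class ReflectionDemo
{
    // Put the open type List<> and a type argument together at runtime.
    public static Type BuildListOf(Type elementType)
    {
        Type open = typeof(List<>);            // System.Type for List<T>, T not yet known
        return open.MakeGenericType(elementType);
    }

    // Create an instance of the constructed type and use it.
    public static int CreateAndCount()
    {
        Type closed = BuildListOf(typeof(int));
        List<int> list = (List<int>)Activator.CreateInstance(closed);
        list.Add(10);
        list.Add(20);
        return list.Count;
    }

    // Unlike an erased list, the instance can report its element type.
    public static Type ElementTypeOf(object list)
    {
        return list.GetType().GetGenericArguments()[0];
    }

    public static void Main()
    {
        Console.WriteLine(typeof(List<>).IsGenericTypeDefinition);
        Console.WriteLine(BuildListOf(typeof(int)) == typeof(List<int>));
        Console.WriteLine(CreateAndCount());
        Console.WriteLine(ElementTypeOf(new List<string>()));
    }
}
```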

Comparing C# Generics to C++ Templates

Bruce Eckel: How do C# generics compare with C++ templates?

Anders Hejlsberg: To me the best way to understand the distinction between C#
generics and C++ templates is this: C# generics are really just like classes, except they have a
type parameter. C++ templates are really just like macros, except they look like classes.

The big difference between C# generics and C++ templates shows up in when the type
checking occurs and how the instantiation occurs. First of all, C# does the instantiation at
runtime. C++ does it at compile time, or perhaps at link time. But regardless, the
instantiation happens in C++ before the program runs. That's difference number one.
Difference number two is C# does strong type checking when you compile the generic type.
For an unconstrained type parameter, like List<T>, the only methods
available on values of type T are those that are found on type
Object, because those are the only methods we can generally guarantee will
exist. So in C# generics, we guarantee that any operation you do on a type parameter will
succeed.
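
A small sketch of this rule, with hypothetical method names: on an unconstrained `T` only the members of `Object` are available, while a constraint such as `where T : IComparable<T>` makes more operations available and lets the compiler check them when the generic method itself is compiled.

```csharp
using System;

public static class ConstraintDemo
{
    // Unconstrained T: only System.Object members such as ToString,
    // GetType, Equals, and GetHashCode may be used on value.
    // (Assumes a non-null argument.)
    public static string Describe<T>(T value)
    {
        return value.ToString() + " (" + value.GetType().Name + ")";
    }

    // Constrained T: CompareTo is guaranteed to exist, so this call
    // is checked once, here, rather than at each instantiation.
    public static T Max<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(42));
        Console.WriteLine(Max(3, 7));
        Console.WriteLine(Max("apple", "pear"));
        // Max(new object(), new object()); // would not compile:
        // object does not implement IComparable<object>
    }
}
```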

C++ is the opposite. In C++, you can do anything you damn well please on a variable of a
type parameter type. But then once you instantiate it, it may not work, and you'll get some
cryptic error messages. For example, if you have a type parameter T, and
variables x and y of type T, and you say
x + y, well you had better have an operator+ defined for
two Ts, or you'll get some cryptic error message.
So in a sense, C++ templates are actually untyped, or loosely
typed. Whereas C# generics are strongly typed.
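
For contrast, here is a sketch of how the same `x + y` situation looks on the C# side. Writing `x + y` on an unconstrained `T` is rejected when the generic method is compiled, so one common workaround (an assumption of this example, not something the interview prescribes) is to have the caller supply the addition as a delegate. The names `AdditionDemo` and `Sum` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class AdditionDemo
{
    // The following would not compile, because the compiler cannot
    // guarantee that every possible T has an operator+:
    //
    //     public static T Add<T>(T x, T y) { return x + y; }

    // Workaround: the caller passes in the operation, so every use of T
    // inside the method is type-checked up front.
    public static T Sum<T>(IEnumerable<T> values, T zero, Func<T, T, T> add)
    {
        T total = zero;
        foreach (T value in values)
        {
            total = add(total, value);
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new[] { 1, 2, 3 }, 0, (x, y) => x + y));
        Console.WriteLine(Sum(new[] { "a", "b" }, "", (x, y) => x + y));
    }
}
```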