
Overview

Short intro

Generics in .NET

.NET object memory layout

.NET Generics under the hood

The dessert – a bug in JITter

Moral

Intro

I was going to start with a comparison to Java "generics" and C++ templates just to show that .NET is "better". But I decided not to do that because we already know that .NET is wonderful. Don't we? So let's leave it as a statement. We will recall how .NET objects are laid out in memory and what the Method Table and EEClass are. We will look at how generics affect them, how they work under the hood, and which optimizations the CLR performs to keep them efficient. Then there's a dessert: a performance degradation and a bug in the CLR. Stay tuned.

Yes, Java and a couple of swear words. When I was developing for .NET, I always thought that it’d be cooler somewhere else, in another world, stack or language. That everything would be interesting and easy there. Hey, Scala has pattern matching, they shouted. Once we introduce Kafka, we can process millions of events easily. Or Akka Streams, that's bleeding edge and would solve all our stream-processing problems. The interest took root and I moved to the JVM. For more than half a year now I have been writing code in Scala. I have noticed that I curse more often, I don't sleep well, and sometimes I come home and cry into my pillow. I no longer have the familiar things and tools that I had in .NET. And generics, of course, which don't exist on the JVM. People here in Lithuania say: "Šuo ir kariamas pripranta." That means a dog can even get used to the gallows. I have started to like it but that's another story. Sooo...

Generics in .NET

And they are awesome! Probably there are no developers who don’t use them or love them. Are there any? They have a lot of advantages and benefits. The CLR documentation says that they make programming easier; yes, it's in bold there. They reduce code duplication. They are smart and support constraints such as class and struct, or a required base class or interface. They can preserve inheritance through covariance and contravariance. They improve performance: no more boxing and unboxing, no casts. And all that is checked during compilation. How cool is that?! But nothing is free and we'll figure out the price.
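To make a few of those claims concrete, here is a small sketch. GenericsDemo and its method names are made up for illustration; the constraint syntax, the covariance of IEnumerable<out T>, and the unboxed storage of List<int> are standard C#.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of any library.
public static class GenericsDemo
{
    // Constraint: T must implement IComparable<T>, so CompareTo is verified at compile time.
    public static T Max<T>(T a, T b) where T : IComparable<T>
        => a.CompareTo(b) >= 0 ? a : b;

    // Covariance: IEnumerable<out T> lets a sequence of strings be passed as a sequence of objects.
    public static int CountObjects(IEnumerable<object> items)
    {
        var count = 0;
        foreach (var item in items) count++;
        return count;
    }

    // No boxing and no casts: List<int> stores the ints themselves.
    public static int SumWithoutBoxing()
    {
        var numbers = new List<int> { 1, 2, 3 };
        var sum = 0;
        foreach (int n in numbers) sum += n;
        return sum;
    }
}
```

Calling GenericsDemo.CountObjects(new List<string> { "x", "y" }) compiles only because of covariance; with a List<int> it would not, since variance applies to reference types only.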

.NET memory layout

First, let's recall how objects are stored in memory. When we create an instance of an object, the following structure (array) is allocated on the heap:

The first element is called the "header" and contains a hashcode or an address in the lock table. The second element contains the Method Table address. Next are the fields of the object. So, the variable o is just a pointer to the second element, the one that holds the Method Table address. And the Method Table is ...
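If you don't have a debugger at hand, you can still get a feeling for that Method Table address from managed code. The sketch below assumes that RuntimeTypeHandle.Value is the Method Table address for ordinary classes. As far as I know that is how current CLR implementations behave, but it is an implementation detail, not a documented contract. Sample and LayoutDemo are hypothetical names.

```csharp
using System;

// Hypothetical type used only for this illustration.
public sealed class Sample
{
    public int Field;
}

public static class LayoutDemo
{
    // Assumption: for ordinary classes the type handle value is the Method Table address.
    public static IntPtr MethodTableOf(object o) => o.GetType().TypeHandle.Value;
}
```

Two instances of Sample report the same value, because every object of a type stores the same Method Table address in its second element; an instance of another type reports a different one.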

EEClass

Let me start with EEClass. EEClass is a class, and it knows everything about the type it represents. It gives access to its data through getters and setters. It's quite a complex class. It consists of more than 2000 lines of code and contains other classes and structs, which are not so small either. For example, there is EEClassOptionalFields, which is like a dictionary that stores optional data. Or EEClassPackedFields, which optimizes memory use. EEClass stores a lot of numeric data, such as the number of methods, fields, static methods, static fields, etc. EEClassPackedFields drops the leading zeros of those numbers and packs them into one array with access by index. EEClass is also called "cold data". So, getting back to Method Table…

Method table

Method Table is used for optimization! Everything that the runtime needs is extracted from EEClass to Method Table. It's an array with access by index. It is also called "hot data". It may contain the following data:

WinDbg

To take a look at how they are represented in the CLR, WinDbg comes to the rescue, the great and powerful. It's a very powerful tool for debugging any application running on Windows, but it has an awful user interface and user experience. There are plugins: SOS (Son of Strike) from the CLR team and SOSEX from a third-party developer.

The Son of Strike isn't just a nice name. When the CLR team was formed, it had an informal name "Lightning". They created a tool for debugging the runtime and called it "Strike". It was a very powerful tool that could probably do everything in CLR. When the time came for the first release, they limited it and called it "Son of Strike". True story.

The name has the type arity "`1", which shows the number of generic type parameters, and !T as a placeholder for the type. It's a template that tells the JIT that the type is generic, unknown at compile time, and will be defined later. Miracle! The CLR knows about generics. Let's create an instance of our generic with type object and take a look at the Method Table:

The name now carries the string type, but the methods have the same strange signature with System.__Canon. If we take a closer look, we'll see that the addresses are the same as in the previous example with the object type. So, the EEClass is the same for the string-typed generic and it's shared with the object-typed generic. However, their Method Tables are different. Let's take a look at value types:

So how does it work under the hood then?

Value types do not share anything and each value type has its own Method Table and EEClass and its own JITted code. In other words, for each value type used as a generic type parameter, CLR will produce a different piece of code. This could lead to what is known as "code bloat" or code explosion, and increase the memory footprint of the program. But that's inevitable because the compiler has to know the size of the value type and the layout of its fields.

Reference types have their own Method Tables. And we can say that a Method Table uniquely describes a type. But all reference types of a generic share one EEClass and share JITted code of its methods between each other. In other words, for each reference type used as a generic type parameter, CLR will use one piece of code. That's an optimization for the memory that greatly reduces the footprint used for generics. That's possible because reference types have the same "word" size. System.__Canon is an internal type and acts as a placeholder. Its main goal is to tell JIT that the type will be found during runtime.

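The rules are the same for generics with more than one type parameter. If all type parameters are reference types, the code is shared. As soon as a value type is involved, each distinct value type argument gets its own code, and sharing happens only between instantiations that differ in their reference type arguments.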
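The "own Method Table per instantiation" part can be observed from managed code. This is a minimal sketch, assuming RuntimeTypeHandle.Value reflects the Method Table address (an implementation detail, not a documented guarantee). Box<T> and SharingDemo are hypothetical names. The sharing of EEClass and JITted code is not visible this way; that is what WinDbg is for.

```csharp
using System;

// Hypothetical generic type used only for this illustration.
public class Box<T>
{
    public T Value;

    // Even when the JITted code is shared, typeof(T) yields the exact type argument.
    public Type TypeArgument() => typeof(T);
}

public static class SharingDemo
{
    // Assumption: the handle value is the Method Table address of Box<T>.
    public static IntPtr MethodTableOf<T>() => typeof(Box<T>).TypeHandle.Value;
}
```

SharingDemo.MethodTableOf<object>(), MethodTableOf<string>() and MethodTableOf<int>() all return different values, while new Box<string>().TypeArgument() still returns typeof(string) even though its code is shared with Box<object>.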

Everything is pretty straightforward when you call a specialized (typed) generic method from a regular method. All checks and type lookups can be done during the compilation (including JIT) phase. But things get tricky when you call a generic method from another generic method where you don't know the type. The code for the reference types is shared, remember? When a shared method is executed, any use of generics in its body has to be looked up to get the concrete runtime type. The CLR calls this process "runtime handle lookup". This process is the most important aspect of making shared generic code nearly as efficient as regular methods. Because of the critical performance needs of this feature, both the JIT and the runtime cooperate through a series of sophisticated techniques to reduce the overhead.
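Here is the smallest case I can think of where such a lookup is needed. Outer<T> is compiled once for all reference types, yet it has to hand the exact T over to ListTypeOf<T>, which in turn needs the exact List<T>. The names LookupDemo, Outer and ListTypeOf are hypothetical; this only shows the observable behaviour, not how the CLR implements the lookup.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class.
public static class LookupDemo
{
    // Creating a List<T> requires the exact runtime type of T.
    public static Type ListTypeOf<T>() => new List<T>().GetType();

    // A generic method calling another generic method: for reference types this body is shared,
    // so the concrete T must be resolved while the code runs.
    public static Type Outer<T>() => ListTypeOf<T>();
}
```

LookupDemo.Outer<string>() returns typeof(List<string>) and LookupDemo.Outer<object>() returns typeof(List<object>), although both calls run the same shared machine code.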

Let’s talk about how the runtime optimizes these lookups. There is essentially a series of caches to avoid the ultimately expensive lookup of types at runtime via the class loader. Without going into too much detail, you can abstractly look at the lookup costs like this:

1. "Class loader" – This walks through the entire hierarchy of objects and their methods and tries to find out which method fits the application. Obviously, this is the slowest way to do it (300 clocks).

2. Type hierarchy walk with global cache lookup – This is a hierarchy walk but it looks in the global cache using the declaring type (think about 50 to 60 clocks for a hit).

3. Global cache lookup – This is a lookup in the global cache using the current and the declaring types (think about 30 clocks for a hit).

4. Method Table slot – This adds a slot to the declaring type with a code sequence that can look up the exact type within a few levels of indirection (think 10 clocks for a hit).

The source for this info is given a bit later.
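As a mental model only, the idea of "try the cheapest place first, fall back to the expensive one and remember the answer" can be sketched like this. It is a toy with made-up names (GlobalCache, LookupSite) and it collapses the four levels into three; it does not mirror the CLR's data structures. Type.GetType stands in for the slow class loader path.

```csharp
using System;
using System.Collections.Generic;

// Toy model: a shared cache in front of a slow resolver.
public sealed class GlobalCache
{
    private readonly Dictionary<string, Type> _entries = new Dictionary<string, Type>();

    // Counts how often the slow path was taken.
    public int SlowLoads { get; private set; }

    public Type Get(string typeName)
    {
        if (_entries.TryGetValue(typeName, out var cached)) return cached; // middle tier hit

        SlowLoads++;                                                        // slowest tier
        var loaded = Type.GetType(typeName, throwOnError: true);
        _entries[typeName] = loaded;
        return loaded;
    }
}

// Toy model: one remembered answer per call site, checked before the shared cache.
public sealed class LookupSite
{
    private readonly GlobalCache _global;
    private string _slotKey;
    private Type _slotValue;

    public LookupSite(GlobalCache global) { _global = global; }

    public Type Resolve(string typeName)
    {
        if (_slotKey == typeName) return _slotValue; // fastest tier: the local slot

        _slotValue = _global.Get(typeName);
        _slotKey = typeName;
        return _slotValue;
    }
}
```

Two LookupSite instances sharing one GlobalCache resolve "System.String" with a single slow load: the first site pays for it, the second gets a global cache hit, and repeated calls on either site are answered from the local slot.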

The DESSERT

This is the most interesting part for me. I work on high-load, low-latency and other fancy-schmancy systems. At that time, I worked on a real-time bidding system that handled ~500K RPS with latencies below 5ms. After some changes, we encountered a performance degradation in one of our modules that parsed the user-agent header and extracted some data from it. I have simplified the code as much as I can to reproduce the issue:

We have a generic class BaseClass<T>, which has a generic field and a method Run to perform some logic. We call a generic method in the constructor and in Run() too. And we have an empty class DerivedClass, which inherits from BaseClass<T>. And a benchmark:
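A minimal sketch of a shape that matches this description. Only the names BaseClass<T>, DerivedClass and Run come from the text; the List<T> field, the choice of Enumerable.Empty<T>() and Any() as the generic calls, the object type argument, the iteration count and the Stopwatch-based ReproBenchmark are my assumptions, so treat the timing it prints as illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class BaseClass<T>
{
    private const int Iterations = 1_000_000;          // assumed loop count
    private readonly List<T> _items = new List<T>();   // the generic field

    public BaseClass()
    {
        Enumerable.Empty<T>();                          // a generic method call in the constructor
    }

    public void Run()
    {
        for (var i = 0; i < Iterations; i++)
        {
            if (_items.Any()) return;                   // a generic method call in the hot loop
        }
    }
}

// Empty derived class; closing the base over object is an assumption.
public class DerivedClass : BaseClass<object>
{
}

// Hypothetical timing helper standing in for a real benchmark harness.
public static class ReproBenchmark
{
    public static TimeSpan Measure()
    {
        var instance = new DerivedClass();
        var watch = Stopwatch.StartNew();
        instance.Run();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For real measurements a proper harness with warm-up and multiple runs is a better choice than a single Stopwatch reading.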

My first thoughts were like “What?” What kind of programming is that, when you add two empty methods and it performs faster? Then I got an answer from Microsoft with the same workaround, saying that the reason is the JIT heuristic algorithm. I felt relieved. There was no magic there. Later, the CLR sources were opened and I raised an issue on GitHub. I got a really great explanation from @cmckinsey, one of the CLR’s engineers/managers, who explained everything in detail and admitted that it's a bug in the JITter. Go and read it! It's worth it. I'll wait.

Basically, it says that point #3 "Global cache lookup" in the list of optimizations mentioned above doesn’t work as expected (or at all). Take a look at the comment above the changed lines - it wasn't changed because it was right. That rare moment...

Moral

Is your code slow? Just add two empty methods. Everyone has probably been living with this bug for years. So far it has been fixed in .NET Core only. I was just lucky to find it, and I asked and pushed the CLR team to fix it. Actually, .NET Framework is coded by developers like you and me. They make bugs too. And that's normal. Just for fun. If there wasn’t any interest then nothing would happen.