IL2CPP Internals: Generic sharing implementation

This is the fifth post in the IL2CPP Internals series. In the last post, we looked at how methods are called in the C++ code generated for the IL2CPP scripting backend. In this post, we will explore how they are implemented. Specifically, we will try to better understand one of the most important features of code generated with IL2CPP – generic sharing. Generic sharing allows many generic methods to share one common implementation. This leads to significant decreases in executable size for the IL2CPP scripting backend.

Note that generic sharing is not a new idea; both the Mono and .NET runtimes use generic sharing as well. Initially, IL2CPP did not perform generic sharing. Recent improvements have made it even more robust and beneficial. Since il2cpp.exe generates C++ code, we can see where the method implementations are shared.

We will explore how generic method implementations are shared (or not) for reference types and value types. We will also investigate how generic parameter constraints affect generic sharing.

Keep in mind that everything discussed in this series is an implementation detail. The topics and code discussed here are likely to change in the future. We like to expose and discuss details like this when it is possible though!

What is generic sharing?

Imagine you are writing the implementation for the List<T> class in C#. Would that implementation depend on what type T is? Could you use the same implementation of the Add method for List<string> and List<object>? How about List<DateTime>?

In fact, the power of generics is just that these C# implementations can be shared, and the generic class List<T> will work for any T. But what happens when List is translated from C# to something executable, like assembly code (as Mono does) or C++ code (as IL2CPP does)? Can we still share the implementation of the Add method?

Yes, we can share it most of the time. As we’ll discover in this post, the ability to share the implementation of a generic method depends almost entirely on the size of that type T. If T is any reference type (like string or object), then it will always be the size of a pointer. If T is a value type (like int or DateTime), its size may vary, and things get a bit more complex. The more method implementations which can be shared, the smaller the resulting executable code is.
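To make the size argument concrete, here is a minimal C++ sketch. The type names (Object_t, String_t, AnyClass_t, SixteenBytes_t) and the ElementSize helper are stand-ins invented for illustration; they are not the types il2cpp.exe actually emits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for managed reference types.
struct Object_t {};
struct String_t : public Object_t { int32_t length; };
struct AnyClass_t : public Object_t {};

// Hypothetical stand-in for a 16-byte value type.
struct SixteenBytes_t { uint64_t a; uint64_t b; };

// A reference to any managed object is stored as a pointer,
// so every reference type needs the same amount of storage.
static_assert(sizeof(String_t*) == sizeof(Object_t*), "reference types share a size");
static_assert(sizeof(AnyClass_t*) == sizeof(Object_t*), "reference types share a size");

// Value types are stored inline, so the storage needed depends on T.
static_assert(sizeof(int32_t) != sizeof(SixteenBytes_t), "value types differ in size");

// Storage a List<T>-style backing array would need per element.
template <typename T>
constexpr std::size_t ElementSize() { return sizeof(T); }
```

Code that only moves pointer-sized slots around can be written once; code that copies elements of different sizes cannot.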

Mark Probst, the developer who implemented generic sharing in Mono, has an excellent series of posts on how Mono performs generic sharing. We won’t go into that much depth about generic sharing here. Instead, we will see how and when IL2CPP performs generic sharing. Hopefully this information will help you better analyze and understand the executable size of your project.

IL2CPP does not share generic method implementations when T is a value type because the size of each value type will differ (based on the size of its fields).

Practically, this means that adding a new usage of SomeGenericType<T>, where T is a reference type, will have a minimal impact on the executable size. However, if T is a value type, the executable size will be impacted. This behavior is the same for both the Mono and IL2CPP scripting backends. If you want to know more, read on. It’s time to dig into some implementation details!

The setup

I’ll be using Unity 5.0.2p1 on Windows, and building for the WebGL platform. I’ve enabled the “Development Player” option in the build settings, and the “Enable Exceptions” option is set to a value of “None”. The script code for this post starts with a driver method to create instances of the generic types we will investigate:

All of the code is nested in a class named HelloWorld derived from MonoBehaviour.

If you view the command line for il2cpp.exe, note that it does not contain the --enable-generic-sharing option, as described in the first post in this series. However, generic sharing is still occurring. It is no longer optional, and happens in all cases now.

Generic sharing for reference types

We’ll start by looking at the most often occurring generic sharing case: reference types. Since all reference types in managed code derive from System.Object, all reference types in the generated C++ code derive from the Object_t type. All reference types can then be represented in C++ code using the type Object_t* as a placeholder. We’ll see why this is important in a moment.

Let’s search for the generated version of the DemonstrateGenericSharing method. In my project it is named HelloWorld_DemonstrateGenericSharing_m4. We’re looking for the method definitions for the four methods in the GenericType class. Using Ctags, we can jump to the method declaration for the GenericType<string> constructor, GenericType_1__ctor_m8. Note that this method declaration is actually a #define statement, mapping the method to another method, GenericType_1__ctor_m10447_gshared.

Let’s jump back and then find the method declarations for the GenericType<AnyClass> type. If we jump to the declaration of the constructor, GenericType_1__ctor_m9, we can see that it is also a #define statement, mapped to the same function, GenericType_1__ctor_m10447_gshared!

If we jump to the definition of GenericType_1__ctor_m10447_gshared, we can see from the code comment on the method definition that this method corresponds to the managed method name HelloWorld/GenericType`1<System.Object>::.ctor(). This is the constructor for the GenericType<object> type. This type is called the fully shared type, meaning that given a type GenericType<T>, for any T that is a reference type, the implementation of all methods will use this version, where T is object.
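The #define mapping can be sketched in a few lines of C++. This is a simplified model, not the generated code: the names are shortened, the double underscores and numeric suffixes are dropped, and the field layout and call counter are assumptions added so the sharing can be observed.

```cpp
// Hypothetical, simplified stand-ins for the generated types.
struct Object_t {};
struct GenericType_1_t : public Object_t { Object_t* field; };

// Counts calls into the shared body (illustration only).
static int g_sharedCtorCalls = 0;

// The single constructor body, written once for T = System.Object.
inline void GenericType_1_ctor_gshared(GenericType_1_t* self)
{
    self->field = nullptr;
    ++g_sharedCtorCalls;
}

// Per-instantiation names are only aliases for the shared function.
#define GenericType_1_ctor_String(self)   GenericType_1_ctor_gshared(self)
#define GenericType_1_ctor_AnyClass(self) GenericType_1_ctor_gshared(self)

// Constructs a GenericType<string> and a GenericType<AnyClass>, then
// reports how many times the shared constructor body ran.
inline int ConstructBoth()
{
    g_sharedCtorCalls = 0;
    GenericType_1_t ofString;
    GenericType_1_t ofAnyClass;
    GenericType_1_ctor_String(&ofString);
    GenericType_1_ctor_AnyClass(&ofAnyClass);
    return g_sharedCtorCalls;
}
```

Both aliases expand to the same call, so only one function body ends up in the executable no matter how many reference types are used for T.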

Look just below the constructor in the generated code, and you should see the C++ code for the UsesGenericParameter method:

In both places where the generic parameter T is used (the return type and the type of the single managed argument), the generated code uses the Object_t* type. Since all reference types can be represented in the generated code by Object_t*, we can call this single method implementation for any T that is a reference type.

In the second blog post in this series (about generated code), we mentioned that all method definitions are free functions in C++. The il2cpp.exe utility does not use C++ inheritance to implement overridden C# methods. However, il2cpp.exe does use C++ inheritance for types. If we search the generated code for the string “AnyClass_t” we can find the C++ representation of the C# type AnyClass:

struct AnyClass_t1 : public Object_t
{
};

Since AnyClass_t1 derives from Object_t, we can pass a pointer to AnyClass_t1 as the argument to the GenericType_1_UsesGenericParameter_m10449_gshared function without problems.

What about the return value though? We can’t return a pointer to a base class where a pointer to a derived class is expected, right? Take a look at the declaration of the GenericType<AnyClass>::UsesGenericParameter method:

The generated code is actually casting the return value (type Object_t*) to the derived type AnyClass_t1*. So here IL2CPP is lying to the C++ compiler to avoid the C++ type system. Since the C# compiler has already enforced that no code in UsesGenericParameter does anything unreasonable with type T, then IL2CPP is safe to lie to the C++ compiler here.
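Here is a minimal sketch of that argument and return value handling, assuming simplified names and a marker field on AnyClass_t1 that the real generated type does not have. It uses static_cast for the downcast; treat the exact form of the cast in the generated code as something to check in your own project.

```cpp
// Hypothetical, simplified stand-ins for the generated types.
struct Object_t {};
struct AnyClass_t1 : public Object_t { int marker; };  // marker is for illustration
struct GenericType_1_t : public Object_t {};

// Fully shared body: T is represented by Object_t* for both
// the argument and the return value.
inline Object_t* GenericType_1_UsesGenericParameter_gshared(GenericType_1_t* self, Object_t* value)
{
    (void)self;  // unused in this sketch
    return value;
}

// Alias for GenericType<AnyClass>: the argument converts to Object_t*
// implicitly, and the result is cast back to the derived pointer type.
#define GenericType_1_UsesGenericParameter_AnyClass(self, value) \
    (static_cast<AnyClass_t1*>(GenericType_1_UsesGenericParameter_gshared((self), (value))))

// Passes an AnyClass_t1 through the shared method and reads it back.
inline int RoundTrip()
{
    GenericType_1_t instance;
    AnyClass_t1 input;
    input.marker = 42;
    AnyClass_t1* output = GenericType_1_UsesGenericParameter_AnyClass(&instance, &input);
    return output->marker;
}
```

The downcast is only safe because the object really is an AnyClass_t1, which is exactly the guarantee the C# compiler has already provided.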

Generic sharing with constraints

Suppose that we want to call some methods on an object of type T. Won’t the use of Object_t* prevent that, since we don’t have many methods on System.Object? Yes, this is correct. But we first need to express this idea to the C# compiler using generic constraints.

Take another look at the script code for this post, at the type named InterfaceConstrainedGenericType. This generic type uses a where clause to require that type T be derived from a given interface, AnswerFinderInterface. This allows the ComputeAnswer method to be called. Recall from the previous blog post about method invocation that calls on interface methods require a lookup in a vtable structure. Since the FindTheAnswer method makes its call through that lookup rather than through a direct function call on the constrained instance of type T, the C++ code can still use the fully shared method implementation, with the type T represented by Object_t*.

If we start at the implementation of the HelloWorld_DemonstrateGenericSharing_m4 function, then jump to the definition of the InterfaceConstrainedGenericType_1__ctor_m11 function, we can see that this method is again a #define, mapping to the InterfaceConstrainedGenericType_1__ctor_m10456_gshared function. If we look just below that function for the implementation of the InterfaceConstrainedGenericType_1_FindTheAnswer_m10458_gshared function, we can see that indeed, this is the fully shared version of the function, taking an Object_t* argument. It calls the InterfaceFuncInvoker0::Invoke function to actually make the call to the managed ComputeAnswer method.

This all hangs together in the generated C++ code because IL2CPP treats all managed interfaces like System.Object. This is a useful rule of thumb to help understand the code generated by il2cpp.exe in other cases as well.
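The idea can be modeled with a small C++ sketch. Everything here is a simplification made up for illustration: the one-slot ClassInfo table, the InvokeComputeAnswer helper (a stand-in for the role InterfaceFuncInvoker0::Invoke plays), and the two finder types are assumptions, not the real IL2CPP runtime structures.

```cpp
#include <cstdint>

struct Object_t;
typedef int32_t (*ComputeAnswerFn)(Object_t* self);

// Hypothetical per-class data holding a single interface method slot.
struct ClassInfo { ComputeAnswerFn computeAnswer; };

// Every object carries a pointer to its class data.
struct Object_t
{
    const ClassInfo* klass;
    explicit Object_t(const ClassInfo* k) : klass(k) {}
};

// Two unrelated implementations of the interface method (made up).
inline int32_t FinderA_ComputeAnswer(Object_t*) { return 42; }
inline int32_t FinderB_ComputeAnswer(Object_t*) { return 7; }

static const ClassInfo kFinderAClass = { &FinderA_ComputeAnswer };
static const ClassInfo kFinderBClass = { &FinderB_ComputeAnswer };

struct FinderA_t : public Object_t { FinderA_t() : Object_t(&kFinderAClass) {} };
struct FinderB_t : public Object_t { FinderB_t() : Object_t(&kFinderBClass) {} };

// Simplified invoker: look up the slot on the object, then call it.
inline int32_t InvokeComputeAnswer(Object_t* obj)
{
    return obj->klass->computeAnswer(obj);
}

// Fully shared FindTheAnswer: it only needs an Object_t*, because the
// concrete method is found at run time through the lookup.
inline int32_t InterfaceConstrainedGenericType_1_FindTheAnswer_gshared(Object_t* experiment)
{
    return InvokeComputeAnswer(experiment);
}
```

The shared function never names FinderA_t or FinderB_t, yet it reaches the right ComputeAnswer for each, which is why one implementation is enough for every T that satisfies the interface constraint.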

Constraints with a base class

In addition to interface constraints, C# allows constraints to be a base class. IL2CPP does not treat all base classes like System.Object, so how does generic sharing work for base class constraints?

Since base classes are always reference types, IL2CPP uses the fully shared version of the generic methods for these types. Any code which needs to use a field or call a method on the constrained type performs a cast in C++ to the proper type. Again, here we rely on the C# compiler to correctly enforce the generic constraint, and we lie to the C++ compiler about the type.
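A minimal sketch of that cast, assuming a hypothetical constraint "where T : BaseClass" and made-up type and method names (BaseClass_t, DerivedClass_t, ReadAnswer) that do not appear in the script code for this post:

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the generated types.
struct Object_t {};
struct BaseClass_t : public Object_t { int32_t answer; };  // the constraint type
struct DerivedClass_t : public BaseClass_t {};

// Shared body for a method on a type constrained with "where T : BaseClass".
// T still arrives as Object_t*, so the body casts before touching the field.
inline int32_t ConstrainedGenericType_1_ReadAnswer_gshared(Object_t* value)
{
    BaseClass_t* typed = static_cast<BaseClass_t*>(value);
    return typed->answer;
}
```

The C++ compiler cannot verify that the pointer really refers to a BaseClass_t; the sketch, like the generated code, leans on the C# compiler having checked the constraint at every call site.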

Generic sharing with value types

Let’s jump back now to the HelloWorld_DemonstrateGenericSharing_m4 function and look at the implementation for GenericType<DateTime>. The DateTime type is a value type, so GenericType<DateTime> is not shared. We can jump to the declaration of the constructor for this type, GenericType_1__ctor_m10. There we see a #define, as in the other cases, but the #define maps to the GenericType_1__ctor_m10_gshared function, which is specific to the GenericType<DateTime> class, and is not used by any other class.

Thinking about generic sharing conceptually

The implementation of generic sharing can be difficult to understand and follow. The problem space itself is fraught with pathological cases (e.g. the curiously recurring template pattern). It can help to think about a few concepts:

Generic types with a reference type generic parameter are fully shared – they always use the implementation with System.Object for all type parameters.

Generic types with two or more type parameters can be partially shared if at least one of those type parameters is a reference type.

The il2cpp.exe utility always generates the fully shared method implementations for any generic type. It generates other method implementations only when they are used.
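Partial sharing can be pictured with a hypothetical two-parameter type, Pair<TFirst, TSecond>, used as Pair<string, int> and Pair<AnyClass, int>. The sketch below assumes the shared form replaces only the reference type parameter with Object_t* and keeps the value type parameter concrete; the type and function names are invented for illustration.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for managed reference types.
struct Object_t {};
struct String_t : public Object_t {};
struct AnyClass_t : public Object_t {};

// One layout serves Pair<string, int> and Pair<AnyClass, int>:
// the reference slot is generalized, the int32_t slot stays concrete.
struct Pair_2_Object_Int32_t
{
    Object_t* first;
    int32_t second;
};

// Shared across every reference type used for the first parameter.
inline void Pair_2_Set_gshared(Pair_2_Object_Int32_t* self, Object_t* first, int32_t second)
{
    self->first = first;
    self->second = second;
}

inline int32_t Pair_2_GetSecond_gshared(const Pair_2_Object_Int32_t* self)
{
    return self->second;
}
```

A Pair<string, DateTime> would need its own implementation, since its second slot has a different value type.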

Sharing of generic methods

Just as method implementations on generic types can be shared, so can method implementations for generic methods. In the original script code, notice that the UsesDifferentGenericParameter method uses a different type parameter than the GenericType class. When we looked at the shared method implementations for the GenericType class, we did not see the UsesDifferentGenericParameter method. If I search the generated code for “UsesDifferentGenericParameter” I see that the implementation of this method is in the GenericMethods0.cpp file:

Notice that this is the fully shared version of the method implementation, accepting the type Object_t*. Although this method is in a generic type, the behavior would be the same for a generic method in a non-generic type as well. Effectively, il2cpp.exe attempts to always generate the least code possible for method implementations involving generic parameters.

Conclusion

Generic sharing has been one of the most important improvements to the IL2CPP scripting backend since its initial release. It allows the generated C++ code to be as small as possible, sharing method implementations where they do not differ in behavior. As we look to continue to decrease binary size, we will work to take advantage of more opportunities to share method implementations.

In the next post, we will explore how p/invoke wrappers are generated, and how types are marshaled from managed to native code. We will be able to see the cost of marshaling various types, and debug problems with marshaling code.

