Some time ago I had to write my own reflection engine for .NET because I needed it to build a .NET documentation generator. After a few weeks I discovered Mono.Cecil, which (after some modifications) is now ideal for my purposes, so I stopped developing my own reflection engine. I thought this was a great opportunity to write an article and thank all CodeProject community members for sharing the hard-earned knowledge they have gathered in recent years.

This two-part article covers signatures, which are the second most important part of a .NET file after metadata. Daniel Pistelli wrote an excellent article about metadata that can be found here. It is strongly recommended to read that article before progressing. Additionally, you can read An In-Depth Look into the Win32 Portable Executable File Format in MSDN Magazine, which describes the PE file format that forms the foundation for .NET metadata, signatures and Intermediate Language (IL) code. Of course, almost everything can be found in the Partition II Metadata specification, but as usual, specifications sacrifice readability for completeness, which is another reason why I wrote this article.

English is not my native language, so please forgive my mistakes, especially misuse of the articles a/an/the.

In brief, signatures store data that cannot be compactly stored in metadata tables, for instance parameter types, arguments supplied to custom attributes, marshalling descriptors, etc. Storing information such as parameter types in tables would result in excessive data fragmentation, make the data harder to read, and impose a performance penalty. Hence the CLI/CLR engineers invented signatures, which store this kind of data in a compact and orderly manner. In the next chapters you will clearly see why they are so important.

In this chapter you will learn a few things needed to understand the rest of the article, so do not underestimate it: the information contained here will be used extensively in the following sections. Terms associated with signatures but not covered here will be explained later, along the way.

For viewing .NET metadata and signature bytes, we will be using CFF Explorer, also written by Daniel Pistelli. CFF Explorer is a freeware tool that can view and edit PE headers, resources, and some fields and flags of .NET metadata; you can download it at this site. In the picture below you can see CFF Explorer running with a sample assembly loaded. Signatures are usually indexed by a Signature column. The red circle marks the location of the MethodDefSig signature, indexed by Method.Signature, in the #Blob heap, which can be explored by clicking on the green circle.

"In computing, endianness is the byte (and sometimes bit) ordering used to represent some kind of data. Typical cases are the order in which integer values are stored as bytes in computer memory (relative to a given memory addressing scheme) and the transmission order over a network or other medium. When specifically talking about bytes, endianness is also referred to simply as byte order."

In our case, endianness means the byte order of data (usually integers) stored in a file. There are two byte orders, big-endian and little-endian. A PE/.NET file uses both, so we discuss each of them below.

Big endian

In this ordering method, the most significant byte is stored at the file location with the lowest offset, the next byte is stored at the following file offset, and so on. In the example below we want to store the value 0x1B5680DA at offset 100; the file would then look like this:
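As a small sketch of the two byte orders, the snippet below lays out the value 0x1B5680DA at offset 100 both ways. The helper name `byte_layout` is made up for this illustration and is not part of any tool mentioned in the article.

```python
# Illustrative helper (hypothetical name): list (offset, byte) pairs for a
# 32-bit value written at a given file offset in the chosen byte order.
def byte_layout(value, offset, byteorder):
    data = value.to_bytes(4, byteorder)
    return [(offset + i, f"{b:02X}") for i, b in enumerate(data)]

# Big endian: most significant byte (1B) at the lowest offset.
big = byte_layout(0x1B5680DA, 100, "big")
# Little endian: least significant byte (DA) at the lowest offset.
little = byte_layout(0x1B5680DA, 100, "little")

for name, layout in (("big", big), ("little", little)):
    print(name, " ".join(f"{off}:{byte}" for off, byte in layout))
```

In big-endian order the bytes at offsets 100 to 103 are 1B 56 80 DA; in little-endian order they are DA 80 56 1B.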

Signatures are compressed before being stored in the #Blob heap: the integers embedded in the signature are compressed. In contrast to normal integers, which have a fixed size, compressed integers use only as much space as needed. Almost all signatures use integer compression instead of normal, fixed-size integers. Because the vast majority of numbers in signatures are below 128, the space saving is significant. Below you can see the encoding algorithm copied from the specification:

If the value lies between 0 (0x00) and 127 (0x7F), inclusive, encode as a one-byte integer (bit 7 is clear, value held in bits 6 through 0)

If the value lies between 2<sup>7</sup> (0x80) and 2<sup>14</sup> - 1 (0x3FFF), inclusive, encode as a 2-byte integer with bit 15 set, bit 14 clear (value held in bits 13 through 0)

Otherwise, encode as a 4-byte integer, with bit 31 set, bit 30 set, bit 29 clear (value held in bits 28 through 0)

A null string should be represented with the reserved single byte 0xFF, and no following data
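The three encoding rules can be sketched in a few lines of Python. This is only an illustration of the algorithm quoted above; the function name `compress_uint` is my own and does not come from the specification or from any library.

```python
# Minimal sketch of the unsigned compressed-integer encoding.
# The function name is hypothetical; the bit patterns follow the rules above.
def compress_uint(value: int) -> bytes:
    if value < 0 or value > 0x1FFFFFFF:
        raise ValueError("value does not fit in 29 bits")
    if value <= 0x7F:
        # One byte, bit 7 clear.
        return bytes([value])
    if value <= 0x3FFF:
        # Two bytes, bit 15 set, bit 14 clear.
        return (value | 0x8000).to_bytes(2, "big")
    # Four bytes, bits 31 and 30 set, bit 29 clear.
    return (value | 0xC0000000).to_bytes(4, "big")

for sample in (0x03, 0x7F, 0x80, 0x2E57, 0x4000):
    print(f"0x{sample:X} -> {compress_uint(sample).hex(' ').upper()}")
```

The sample values 0x03, 0x7F, 0x80 and 0x2E57 are the same ones used in the worked examples of this section.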

Example 1

Value is less than 0x80, so this is the first case; we cut three unnecessary bytes.

| | Original value (32-bit) | Compressed value | Saved bytes |
|---|---|---|---|
| Hex | 00 00 00 03 | 03 | 3 |
| Binary | 00000000 00000000 00000000 00000011 | 00000011 | - |

Example 2

The same as Example 1.

| | Original value (32-bit) | Compressed value | Saved bytes |
|---|---|---|---|
| Hex | 00 00 00 7F | 7F | 3 |
| Binary | 00000000 00000000 00000000 01111111 | 01111111 | - |

Example 3

In this example the original value equals 0x80. Although one byte is enough to hold 0x80 as a plain integer, a one-byte compressed integer must have its highest bit (bit 7) clear, so storing 0x80 as a compressed integer requires an additional byte.

| | Original value (32-bit) | Compressed value | Saved bytes |
|---|---|---|---|
| Hex | 00 00 00 80 | 80 80 | 2 |
| Binary | 00000000 00000000 00000000 10000000 | 10000000 10000000 | - |

Example 4

We cut two unnecessary bytes.

| | Original value (32-bit) | Compressed value | Saved bytes |
|---|---|---|---|
| Hex | 00 00 2E 57 | AE 57 | 2 |
| Binary | 00000000 00000000 00101110 01010111 | 10101110 01010111 | - |

Obviously, compression comes at a cost: some bits must be reserved to indicate how many bytes the compressed integer occupies, so the largest encodable integer is 29 bits long, with the value 0x1FFFFFFF. Compressed integers are physically encoded using big-endian byte order.
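Reading a compressed integer back is the mirror image: the top bits of the first byte tell how many bytes follow. The sketch below assumes a hypothetical helper name, `decompress_uint`, and returns both the value and the number of bytes consumed so that a caller can keep walking the signature.

```python
# Minimal sketch of decoding an unsigned compressed integer.
# Returns (value, number_of_bytes_consumed). The name is hypothetical.
def decompress_uint(data: bytes, pos: int = 0):
    first = data[pos]
    if first & 0x80 == 0x00:
        # 0xxxxxxx: one byte, value in bits 6..0.
        return first, 1
    if first & 0xC0 == 0x80:
        # 10xxxxxx: two bytes, value in bits 13..0.
        if pos + 2 > len(data):
            raise ValueError("truncated 2-byte compressed integer")
        return ((first & 0x3F) << 8) | data[pos + 1], 2
    if first & 0xE0 == 0xC0:
        # 110xxxxx: four bytes, value in bits 28..0.
        if pos + 4 > len(data):
            raise ValueError("truncated 4-byte compressed integer")
        return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, 4
    raise ValueError("not a valid compressed integer prefix")

print(decompress_uint(bytes.fromhex("AE57")))
print(decompress_uint(bytes.fromhex("DFFFFFFF")))
```

The second call decodes the largest encodable value, 0x1FFFFFFF, from its four-byte form.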

The following list contains common constants that are used in almost all signatures. In the next parts of the article we will refer to them very often by abbreviations, using only the last part of the name, for example ELEMENT_TYPE_I8 as I8, ELEMENT_TYPE_STRING as STRING, and so on.
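For readers who prefer code to tables, a subset of these constants can be written as an enumeration. The numeric values below are the ones given in the ECMA-335 Partition II specification; the subset chosen and the helper `full_name` are only illustrative.

```python
from enum import IntEnum

# Illustrative subset of the ELEMENT_TYPE_* constants (values per ECMA-335).
# Member names are the abbreviations used in this article.
class ElementType(IntEnum):
    VOID = 0x01
    BOOLEAN = 0x02
    CHAR = 0x03
    I1 = 0x04
    U1 = 0x05
    I2 = 0x06
    U2 = 0x07
    I4 = 0x08
    U4 = 0x09
    I8 = 0x0A
    U8 = 0x0B
    R4 = 0x0C
    R8 = 0x0D
    STRING = 0x0E
    OBJECT = 0x1C
    SZARRAY = 0x1D

# Hypothetical helper: expand an abbreviation back to the full constant name.
def full_name(element: ElementType) -> str:
    return f"ELEMENT_TYPE_{element.name}"

print(ElementType(0x0A).name, full_name(ElementType.STRING))
```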

We have almost all the preparation behind us and can now start talking about signatures, but there are still a few things worth mentioning that may not be obvious to everybody. First, almost all integers in signatures are compressed. Second, every signature begins with the size (in bytes) that it occupies on the #Blob heap; of course, this value is stored using integer compression. Last but not least, the values that locate a signature on the #Blob heap are absolute, i.e. you do not have to add or subtract anything from the value (such as the one in the red circle in Picture 1) to find the signature on the heap.
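Put together, locating a signature means jumping to the absolute offset in the #Blob heap, decoding the compressed size, and taking that many bytes. The sketch below uses a tiny hand-made heap; the heap contents and the function names are assumptions made for the illustration, not data from the sample assemblies.

```python
# Decode an unsigned compressed integer; returns (value, bytes_consumed).
def decompress_uint(data: bytes, pos: int = 0):
    first = data[pos]
    if first & 0x80 == 0x00:
        return first, 1
    if first & 0xC0 == 0x80:
        return ((first & 0x3F) << 8) | data[pos + 1], 2
    return int.from_bytes(data[pos:pos + 4], "big") & 0x1FFFFFFF, 4

# Hypothetical helper: return the signature body stored at an absolute offset.
def read_blob(heap: bytes, offset: int) -> bytes:
    size, used = decompress_uint(heap, offset)
    start = offset + used
    if start + size > len(heap):
        raise ValueError("blob runs past the end of the heap")
    return heap[start:start + size]

# Made-up heap: an empty blob at offset 0, then two small signatures.
heap = bytes.fromhex("00" "020608" "03280008")

print(read_blob(heap, 1).hex(" "))
print(read_blob(heap, 4).hex(" "))
```

The offsets 1 and 4 play the role of the values found in a Signature column.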

Also keep in mind that when you recompile the attached source code (even without modifying it), the offsets of signatures in the resulting assembly may change.

Because this article is meant as a guide, in this chapter we will discuss all signatures byte by byte, beginning with the simplest and ending with the most advanced. Each signature is accompanied by a description, a diagram or syntax copied from the specification, and a set of examples whose complete binaries and sources can be downloaded at the top of this article. Where possible the applications are written in C#, otherwise in CIL (formerly MSIL).

As stated above, we begin with the simplest signatures. One of them is the FieldSig signature, which mainly describes a field's type and the custom modifiers attached to the field; it is indexed by the Field.Signature column. A field's signature starts with the size of the entire signature, followed by the FIELD prolog, which has the constant value 0x6, zero or more custom modifiers, and the field's type. The syntax diagram for FieldSig is shown below, in Picture 2.

NOTE: Please do not confuse custom modifiers with custom attributes! These are completely different things. Because custom modifiers form part of several signatures, they will be discussed in the next chapter. The examples in the current chapter do not use any custom modifiers.

Picture 2, The FieldSig signature syntax diagram

Example 1

This example is straightforward: we have created a simple field of type int32, as below.

Now we have to load the binary assembly FieldSig\1.dll into CFF Explorer and go to the Field table in order to find the row associated with our field (there should be only one); the picture below should help you a little.
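To make the FieldSig layout concrete, here is a small decoder for the simplest case: no custom modifiers and a primitive field type. The bytes 02 06 08 are what the rules above predict for an int32 field (size 2, FIELD prolog 0x06, I4 = 0x08); they are derived by hand rather than copied from the screenshot, and the function name is hypothetical.

```python
FIELD = 0x06  # FieldSig prolog

# Small illustrative subset of element types.
PRIMITIVES = {0x02: "bool", 0x08: "int32", 0x0A: "int64", 0x0E: "string"}

# Hypothetical helper: decode a FieldSig blob (size prefix included).
# Assumes the size fits in one byte and there are no custom modifiers.
def parse_field_sig(blob: bytes) -> str:
    size = blob[0]
    body = blob[1:1 + size]
    if body[0] != FIELD:
        raise ValueError("not a FieldSig: prolog is not 0x06")
    if body[1] in (0x1F, 0x20):  # CMOD_REQD / CMOD_OPT
        raise NotImplementedError("custom modifiers are not handled here")
    return PRIMITIVES[body[1]]

print(parse_field_sig(bytes.fromhex("020608")))
```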

The PropertySig signature is indexed by the Property.Type column. It stores information about a property: the number of parameters supplied to the property in order to get data, zero or more custom modifiers, the type of the returned value, and the type of each supplied parameter. There is also one new thing in the PropertySig signature, namely the HASTHIS flag (constant value 0x20). It indicates whether, at run time, the called method is passed a pointer to the target object as its first argument (the this pointer). As you can deduce, the HASTHIS flag is set when the property (in fact its getter and setter) is instance or virtual, and is not set when the property (getter and setter) is static. The flag (if set) is ORed together with the signature's prolog value. Below you can see the full syntax diagram for this signature.

Picture 5, The PropertySig signature syntax diagram

Example 1

The first example is trivial: we have created one instance property of type int32, as shown below.

Example 2

This example is a little more complicated because it uses an indexed property, which returns a different value depending on the parameters supplied to it. As you can see below, this kind of property does not have a name in C#, but in the metadata Property table it is declared as Item by default. You can define only one indexed property per class/structure, but you can overload it.
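The two property examples can be decoded with a short sketch. The PROPERTY prolog value 0x08 comes from the specification; the two byte strings are built by hand from the rules in this subsection, and the indexer's types (an int32 index returning a string) are assumed purely for illustration.

```python
PROPERTY = 0x08  # PropertySig prolog (per ECMA-335)
HASTHIS = 0x20

TYPES = {0x08: "int32", 0x0E: "string"}  # illustrative subset

# Hypothetical helper: decode a PropertySig body (size prefix already removed).
# Assumes no custom modifiers and a parameter count below 128.
def parse_property_sig(body: bytes) -> dict:
    prolog = body[0]
    if prolog not in (PROPERTY, PROPERTY | HASTHIS):
        raise ValueError("not a PropertySig")
    param_count = body[1]
    return {
        "has_this": bool(prolog & HASTHIS),
        "param_count": param_count,
        "type": TYPES[body[2]],
        "params": [TYPES[b] for b in body[3:3 + param_count]],
    }

simple = parse_property_sig(bytes.fromhex("280008"))     # instance int32 property
indexer = parse_property_sig(bytes.fromhex("28010E08"))  # string Item[int32]
print(simple)
print(indexer)
```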

As the name implies, this signature stores information related to methods defined in the current assembly, such as the calling convention, the number of generic parameters, the number of normal parameters, the return type, and the type of each parameter supplied to the method. It is indexed by the MethodDef.Signature column.

Picture 6, The MethodDefSig signature syntax diagram

Additionally, some flags are used (listed in the table below); they are ORed together and placed in the second byte of the signature (the first is the size of the signature).

| Name | Value | Meaning |
|---|---|---|
| HASTHIS | 0x20 | First argument passed to a method is the this pointer, this flag is set when method is instance or virtual. You can also see explanation of the HASTHIS flag in the previous subsection. |
| EXPLICITTHIS | 0x40 | Specification says: "Normally, a parameter list (which always follows the calling convention) does not provide information about the type of the this pointer, since this can be deduced from other information. When the combination instance explicit is specified, however, the first type in the subsequent parameter list specifies the type of the this pointer and subsequent entries specify the types of the parameters themselves." Please note that if EXPLICITTHIS is set HASTHIS must also be set. |
| DEFAULT | 0x00 | Let the Common Language Runtime determine calling convention, this flag is set when calling static methods. |
| VARARG | 0x05 | Specifies the calling convention for methods with variable arguments. |
| GENERIC | 0x10 | Method has one or more generic parameters. |

Example 1

As usual, let us start with a simple example. This time we have created an instance method that has two generic parameters and two normal parameters; for clarity, the method does not have any body.

The only flag set is DEFAULT (0x00): HASTHIS is absent, so the method is static, and the CLR determines the calling convention used. The method is also not a generic method, because the GENERIC flag is not set, so the next byte specifies the number of normal (non-generic) parameters supplied to the method.
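The way the flag byte steers the rest of a MethodDefSig can be sketched as follows. Both sample signatures are constructed by hand from the rules in this subsection (they are not dumps of the attached assemblies), the helper names are hypothetical, and only a few element types are handled, including MVAR (0x1E), which per the specification refers to a method's generic parameter by index.

```python
HASTHIS, EXPLICITTHIS, GENERIC, VARARG = 0x20, 0x40, 0x10, 0x05
MVAR = 0x1E  # generic method parameter, followed by its index
PRIMITIVES = {0x01: "void", 0x08: "int32", 0x0E: "string"}

# Simplified: only one-byte compressed integers (values below 128).
def read_small_uint(body, pos):
    if body[pos] & 0x80:
        raise NotImplementedError("multi-byte compressed integer")
    return body[pos], pos + 1

def read_type(body, pos):
    code = body[pos]
    if code == MVAR:
        index, pos = read_small_uint(body, pos + 1)
        return f"!!{index}", pos
    return PRIMITIVES[code], pos + 1

# Hypothetical helper: decode a MethodDefSig body (size prefix removed).
def parse_method_def_sig(body: bytes) -> dict:
    flags, pos = body[0], 1
    sig = {
        "has_this": bool(flags & HASTHIS),
        "explicit_this": bool(flags & EXPLICITTHIS),
        "generic": bool(flags & GENERIC),
        "vararg": (flags & 0x0F) == VARARG,
    }
    if sig["generic"]:
        # GenParamCount is present only when GENERIC is set.
        sig["gen_param_count"], pos = read_small_uint(body, pos)
    param_count, pos = read_small_uint(body, pos)
    sig["return_type"], pos = read_type(body, pos)
    sig["params"] = []
    for _ in range(param_count):
        param, pos = read_type(body, pos)
        sig["params"].append(param)
    return sig

static_sig = parse_method_def_sig(bytes.fromhex("000201080E"))         # static void M(int32, string)
generic_sig = parse_method_def_sig(bytes.fromhex("300202011E001E01"))  # instance void M<T,U>(T, U)
print(static_sig)
print(generic_sig)
```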

Example 4

In this example we have created a method that accepts variable arguments, i.e. in addition to the normal parameters in its declaration, it accepts a variable number of parameters of varying types. Adding the vararg keyword to the method definition in CIL makes the method accept variable arguments, as you can see in the code listing below.

IMPORTANT: Using the params keyword in C# does not set the VARARG flag in the associated method's signature. The result of my investigation is that a method which uses the params keyword in C# is just decorated by the C# compiler with the ParamArray attribute, and the additional parameters are treated as a normal array. You can also make a method truly VARARG in C# by following this instruction, but this is not CLS compliant.

This signature is very similar (if not identical) to the previously mentioned MethodDefSig, but in contrast to it, the MethodRefSig describes a method's calling convention, parameters, etc. at the point where the method is called (also known as the call site). The signature is indexed by the MemberRef.Signature column. If the method does not accept variable arguments, it is identical to MethodDefSig and shall match exactly the signature specified in the definition of the target method; otherwise it is as below.

Picture 7, The MethodRefSig signature syntax diagram

As you can see, when you call a VARARG method, its associated MethodRefSig contains one additional constant, namely SENTINEL. This value has only one simple purpose: it marks the end of the required parameters supplied to the method and the beginning of the additional (variable) parameters. You can find more information about sentinel values here. Also notice that the ParamCount integer indicates the total number of parameters supplied to the method. The table below lists the abbreviations used in the MethodRefSig signature when it differs from MethodDefSig.

| Name | Value | Meaning |
|---|---|---|
| HASTHIS | 0x20 | First argument passed to a method is the this pointer, this flag is set when method is instance or virtual. You can also see explanation of the HASTHIS flag in the subsection 4.2. |
| EXPLICITTHIS | 0x40 | Specification says: "Normally, a parameter list (which always follows the calling convention) does not provide information about the type of the this pointer, since this can be deduced from other information. When the combination instance explicit is specified, however, the first type in the subsequent parameter list specifies the type of the this pointer and subsequent entries specify the types of the parameters themselves." Please note that if EXPLICITTHIS is set HASTHIS must also be set. |
| VARARG | 0x05 | Specifies the calling convention for methods with variable arguments. |
| SENTINEL | 0x41 | Denotes end of required parameters. |
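The role of SENTINEL is easiest to see when a call-site signature is split into required and additional parameters. The sketch below uses a hand-built signature for a hypothetical static vararg method that declares one int32 parameter and is called with one extra string argument; the function name and the sample bytes are assumptions, not output from the sample assemblies.

```python
VARARG = 0x05
SENTINEL = 0x41
TYPES = {0x01: "void", 0x08: "int32", 0x0E: "string"}  # illustrative subset

# Hypothetical helper: split the parameters of a VARARG MethodRefSig body
# (size prefix removed) into (required, additional).
# Assumes primitive parameter types and a parameter count below 128.
def split_vararg_params(body: bytes):
    flags = body[0]
    if (flags & 0x0F) != VARARG:
        raise ValueError("not a VARARG call-site signature")
    param_count = body[1]      # total: required + additional
    pos = 3                    # skip flags, ParamCount and RetType
    required, additional = [], []
    target = required
    seen = 0
    while seen < param_count:
        if body[pos] == SENTINEL:
            # Everything after the sentinel is a variable argument.
            target = additional
            pos += 1
            continue
        target.append(TYPES[body[pos]])
        pos += 1
        seen += 1
    return required, additional

required, additional = split_vararg_params(bytes.fromhex("05020108410E"))
print(required, additional)
```

Note that ParamCount (2 here) counts both groups, while the SENTINEL byte itself is not counted.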

Example 1

To convince you that when calling a non-VARARG method there is no difference between the MethodDefSig and its associated MethodRefSig signature, I have created the following code.

Example 2

In this example we will demonstrate how the MethodRefSig signature deals with calls to VARARG methods. For this purpose we have created a truly VARARG method that takes one required parameter followed by variable parameters. Remember that using the params keyword in C# does not set the VARARG flag in the associated method's signature, because params just decorates a method with the ParamArray attribute, and the additional parameters are treated like an array of objects of some type. In order to set the VARARG flag in the signature, you have to add __arglist to the method definition as the last parameter, but this is not CLS compliant; for more information go here.

I have discovered that for the above call there are two rows in the MemberRef table. I do not know why this is so, but I do know that the signature from the first row has the HASTHIS flag set and does not contain any information about the variable arguments that were supplied to the method; the specification does not say anything about this strange behaviour. The signature indexed by the second row is fine, so let us look at it.

This signature type is very similar to the MethodRefSig: it provides a call site signature for a method, but with two key differences. The first is that it can specify an unmanaged target method; it is usually created as preparation for executing a calli instruction, which invokes either managed or unmanaged code. The second important difference is that the signature is indexed by the StandAloneSig.Signature column, which is the only column in the StandAloneSig metadata table. What is more, rows in this table are not referenced by any other table (that is why it is called "stand alone"); the table is filled by code generators. The signature at the StandAloneSig.Signature column shall be either a StandAloneMethodSig signature, one for each execution of a calli instruction, or a LocalVarSig signature, which describes the local variables of a method and will be explained in the next subsection. The syntax diagram for the StandAloneMethodSig signature is as follows.

Picture 8, The StandAloneMethodSig signature syntax diagram

Because this signature differs from the MethodRefSig signature only in that StandAloneMethodSig can call unmanaged methods, a few other constants were added that describe the calling conventions used to invoke unmanaged methods.

IMPORTANT: As you will see soon, there are different calling conventions for invoking methods that accept variable parameters in managed and in unmanaged code. The diagram may look different for each case. For example, the VARARG calling convention invokes managed methods accepting variable parameters; in this case the signature has additional elements, SENTINEL and one or more Param (shaded boxes). The C calling convention also invokes methods accepting variable parameters (unmanaged code), but the signature in this case ends just before the Param element. From my observations, the compiler generates signatures as stated above. Unfortunately my sample code compiles but throws an exception, and I do not know where the problem is, so I cannot say with certainty that my observations are correct. Moreover, the specification is not clear: "Two separate diagrams have been combined into one in this diagram, using shading to distinguish between them. Thus, for the following calling conventions: DEFAULT (managed), STDCALL, THISCALL and FASTCALL (unmanaged), the signature ends just before the SENTINEL item (these are all non vararg signatures). However, for the managed and unmanaged vararg calling conventions: VARARG (managed) and C (unmanaged), the signature can include the SENTINEL and final Param items (they are not required, however). These options are indicated by the shading of boxes in the syntax diagram.". Do you see that? Why is the C box not shaded if using the C calling convention may add SENTINEL and Param elements when calling an unmanaged method which accepts variable arguments? Under what circumstances are the Param elements not required? The calli instruction occurs very rarely in fully working code that calls an unmanaged method (the 392 assemblies in my GAC execute the calli instruction only twice, and only against managed methods!), so I cannot say that my explanations for the following sample code in this subsection are absolutely true.
If somebody knows how the StandAloneMethodSig signature looks when correctly calling an unmanaged method (either accepting or not accepting variable arguments; in both cases my code throws an exception), please let me know. I would be very grateful.

| Name | Value | Meaning |
|---|---|---|
| HASTHIS | 0x20 | First argument passed to a method is the this pointer, this flag is set when method is instance or virtual. You can also see explanation of the HASTHIS flag in the subsection 4.2. |
| EXPLICITTHIS | 0x40 | Specification says: "Normally, a parameter list (which always follows the calling convention) does not provide information about the type of the this pointer, since this can be deduced from other information. When the combination instance explicit is specified, however, the first type in the subsequent parameter list specifies the type of the this pointer and subsequent entries specify the types of the parameters themselves." Please note that if EXPLICITTHIS is set HASTHIS must also be set. |
| DEFAULT | 0x00 | Let the Common Language Runtime determine calling convention, this flag is set when calling static methods. |
| VARARG | 0x05 | Specifies the calling convention for managed methods with variable arguments. |
| FASTCALL | 0x04 | Some parameters are placed in ECX and EDX registers, the rest of the arguments are placed (pushed) onto the stack from right to left. Called method performs stack cleanup. You can use this calling convention by adding the unmanaged fastcall keyword to a method definition in the CIL language. |
| SENTINEL | 0x41 | Denotes end of required parameters. |
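A quick way to read the first byte of a StandAloneMethodSig is to separate the flag bits from the calling convention stored in the low four bits. In the sketch below, the values for C, STDCALL, THISCALL and FASTCALL (0x01 to 0x04) are taken from the ECMA-335 specification; the function name is made up for the illustration.

```python
HASTHIS = 0x20
EXPLICITTHIS = 0x40

# Calling conventions stored in the low nibble (values per ECMA-335).
CALLING_CONVENTIONS = {
    0x00: "DEFAULT",
    0x01: "C",
    0x02: "STDCALL",
    0x03: "THISCALL",
    0x04: "FASTCALL",
    0x05: "VARARG",
}

# Hypothetical helper: name the flags and calling convention of the first
# byte of a StandAloneMethodSig body.
def describe_first_byte(value: int) -> list:
    names = []
    if value & HASTHIS:
        names.append("HASTHIS")
    if value & EXPLICITTHIS:
        names.append("EXPLICITTHIS")
    names.append(CALLING_CONVENTIONS[value & 0x0F])
    return names

for sample in (0x00, 0x04, 0x25, 0x60):
    print(f"0x{sample:02X}", describe_first_byte(sample))
```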

NOTE: One thing worth mentioning here is that, in contrast to CL (the Microsoft C/C++ compiler), ILASM (the Microsoft CIL compiler) does not add any special characters (such as "@", "_", "?", etc.) to a method name when any of the calling conventions for unmanaged targets is used. The CIL compiler does not decorate method names with special characters because it just generates bytecode, which is later compiled into machine code by the CLR's just-in-time compiler. So when you choose a calling convention while coding in CIL, the ILASM compiler does not determine who (the caller or the called method) cleans the stack, does not determine in what order arguments are passed to a method, and does not change method names; this is done during JIT compilation / optimization. If you do not know what I am talking about, you can read Nemanja Trifunovic's article entitled Calling Conventions Demystified, which thoroughly describes the different calling conventions for C and C++, their meaning, how they work, etc.

Example 1

In the sample code listing we have two managed methods. The first method has one fixed parameter of type int32 and is also declared to return int32 (in fact it does not return anything, since no data is pushed onto the evaluation stack). The second method just executes the first one, as you can see below.

Before the method TestRunMethod executes TestMethod, it pushes one int32 value (the argument) onto the evaluation stack using the ldc.i4.1 instruction, then pushes a pointer to the first method onto the evaluation stack with the ldftn instruction, and finally calls our "do nothing" managed test method by executing calli. This last instruction generates the StandAloneMethodSig signature, which is explained in the table below.

| Offset | Value | Meaning |
|---|---|---|
| 0x01 | 0x04 | Signature size. |
| 0x02 | 0x00 | The method does not use any specific calling convention, the method is not instance method, since there is no HASTHIS flag set. |
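The whole signature of this example can be walked in the same offset/value/meaning style. In the sketch below the first two bytes are the ones from the table above; the remaining three bytes (ParamCount 0x01, RetType I4 0x08, Param I4 0x08) are inferred from the method's description (one int32 parameter, int32 return type) and from the stated size of 4, not read from the sample binary, so treat them as an assumption. The function name is hypothetical.

```python
TYPES = {0x08: "int32"}  # only the type needed for this example

# Hypothetical helper: annotate a simple StandAloneMethodSig blob
# (size prefix included) as (offset, value, meaning) rows.
# Assumes one-byte size and count, and primitive types only.
def annotate(signature: bytes, first_offset: int = 0x01) -> list:
    size = signature[0]
    body = signature[1:1 + size]
    param_count = body[1]
    rows = [
        (first_offset, size, "signature size"),
        (first_offset + 1, body[0], "calling convention and flags"),
        (first_offset + 2, param_count, "ParamCount"),
        (first_offset + 3, body[2], f"RetType: {TYPES[body[2]]}"),
    ]
    for i in range(param_count):
        code = body[3 + i]
        rows.append((first_offset + 4 + i, code, f"Param {i + 1}: {TYPES[code]}"))
    return rows

rows = annotate(bytes.fromhex("0400010808"))
for offset, value, meaning in rows:
    print(f"0x{offset:02X}  0x{value:02X}  {meaning}")
```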

Example 2

In this example we will create a sample method that accepts variable arguments and call it with calli, passing one required and one additional parameter. The fixed parameters are separated from the additional parameters by an ellipsis (...), as seen below.

Example 3

This is the most problematic sample in the entire article. The method in the sample code below calls an unmanaged method that accepts variable arguments. The code compiles but throws a TypeLoadException ("The signature is incorrect"); unfortunately the specification is not clear about this case (see the important note at the beginning of this subsection). Like the previous example, the sample code shown below calls a method that accepts variable arguments, but this time the called method is unmanaged.


About the Author

Przemek was born in 1988 and lives in a small town near Warsaw in Poland, Europe. Currently he codes some C# stuff and J2EE as well, and occasionally he uses C++ for fun. Przemek is a cycling fan; if the weather permits, he rides a bike.

Thank you very much for your appreciation. I had been working on a documentation generator based on Mono.Cecil, something like NDoc but much more complex; unfortunately I didn't have enough time to continue developing the project. It works, but it still leaves much to be desired, and it does not have any documentation, so I cannot publish on the web what I have done so far. But in the future I will certainly finish this project.

You may ask why I do not use the standard, built-in reflection engine (System.Reflection). The answer is simple: once you load an assembly into an application domain, you cannot unload it when you are finished. Of course, you can create a separate application domain, load the assembly into it, and unload the whole application domain, but this requires inheriting from the MarshalByRefObject class when sending data across domains, and this creates a horrible bottleneck.

Mono.Cecil is superb library but it also has some drawbacks, the key is that it consumes pretty much memory (for mscorlib it is around 20MB), this is problem because, my documentation generator is built on a different philosophy, tools such as Sandcastle, first stores data gathered from reflection, then merges it with user comments and MSDN documentation, my project just loads all assemblies to memory, sorts namespaces and types within them, and then creates documentation - without saving any reflection data to file, this makes documentation generating very fast, but it comes at a cost. When you have 5-10 big assemblies being documented there is no problem, but if each of these assemblies references another 5 assemblies - they also must be loaded in order to document members from base classes. That makes a serious problem, memory consumption can be huge. I've removed some code from Cecil that was responsible for handling method's body, but in fact, it didn't help too much. I have some ideas to hop-over this problem, but currently I don't have enough time.

I have just seen your second article. Now I know what you were up to. I am using Mono Cecil also for some tool to check for API changes. My approach is different from yours in the way that I am loading only one assembly at a time into memory and try not to resolve the base types declared in other assemblies. It limits of course the ability to gather related information. There should be ways to unload assemblies when they are no longer needed. The easiest would be with a memory profiler to look where the objects are rooted to let the GC do its work.

This is an option, but from my point of view the best way is to store reflection data obtained by Mono.Cecil in some serverless database, such as SQLite. Then objects would be constructed only when necessary, with complex types loaded on-demand, good design of table/indexes, and caching already constructed instances, this solution would be pretty fast. Unfortunately, as far as I know, NHibernate doesn't support SQLite database dialect, and ADO.NET Entity Framework is rather young and immature, moreover it is widely criticised for lack of lazy-initialization, so the only one option is to create O/R mapping code manually and using LINQ to SQLite.

I am not sure if you really need a database. NDepend, for example, stores all its data as XML files, which also tend to become very big and memory hungry. The all-in-memory approach does not scale for big projects with several hundred up to thousands of assemblies. I would do the resolution process in passes. In the first pass, read all assemblies one by one and store the base class and interface dependencies for all types. The next pass can then build upon that knowledge, loading only one assembly at a time plus the ones needed for its types. Suck the data out and make sure all the Mono.Cecil data is ready to be garbage collected. This way a rather slim system could be built. If you like, you could store the type relationships in an XML index file just for fun, but do not fall into the trap of storing the dependencies as XML nodes, since they are rather memory hungry.

This would be a good solution if documentation were generated assembly by assembly rather than namespace by namespace, which is how my documentation generator works and how the MSDN class reference for .NET is arranged. Since many assemblies can contain types which reside in the same namespace, there is a chance that different assemblies would be constantly loaded and unloaded while documenting a single namespace. Let's consider the following example: we have two assemblies, A and B. Assembly A has the TypeA and TypeC classes, assembly B has the TypeB and TypeD classes, and all of them are members of the MyNamespace namespace. Below you can see what would happen if we wanted to generate documentation for this namespace, assuming that the members of the namespace are sorted in ascending order:

Many namespaces are divided between two or more assemblies, as you can see above delays in loading/unloading assemblies would seriously lengthen the time of generating documentation. I want to let user choose how documentation is arranged (namespaces as a root or assemblies as a root), so this solution will not satisfy me either. Anyhow, thanks for your tips.

Thank you. No I was not member of soviet union , Poland was part of soviet union till 1988. I put "formerly member of soviet union" because most readers of CodeProject are americans, in states, still some people don't know that Poland is now independent country with free-market economy.

Yes, you are right, theoretically Poland was not member of soviet union, but Poland along with other countries and Russia formed the Eastern Block, countries that were members of it had no influence on any key decisions, each country in Eastern Block has its own so-called secretary, but economy and army was controlled by the leader of the soviet union. In fact you are right, I have removed doubtful element from my CP profile, it was deceptive.

No, no, I don't feel offended, I was misunderstood. I wanted to clarify this issue for others too, because my sister was in the USA two years ago and she told me that some people asked her whether Poland is still in the Soviet Union, or even whether Poland is in Russia.