This article is the second in a two-part series about the .NET Framework internals and the protections available for .NET assemblies. It analyzes the .NET internals in more depth, so the reader should be familiar with the previous article; otherwise certain paragraphs of this one may seem obscure. As the JIT's inner workings haven't been analyzed yet, .NET protections are quite naïve nowadays. This situation will change rapidly as soon as the reverse engineering community focuses its attention on this technology. These two articles aim to raise awareness of the current state of .NET protections and of what is possible to achieve but hasn't been done yet. In particular, the previous article about .NET code injection represents, let's say, the present, whereas the current one about .NET native compiling represents the future. What I'm presenting in these two articles is new at the time of writing, but I expect it to become obsolete in less than a year. Of course, this is obvious, as I'm taking the first steps away from current .NET protections in the direction of better ones. But this article isn't really about protections: exploring the .NET Framework internals can be useful for many purposes. So, talking about protections is just a means to an end.

Strictly speaking, native compiling means converting the MSIL code of a .NET assembly to native machine code and then removing the MSIL code from that assembly, making it impossible to decompile it in a straightforward way. The only existing tool to native compile .NET assemblies is the Salamander.NET linker, which relies on native images to do its job. The "native images" technique (which in this article I call "Native Framework Deployment") is quite distant from .NET internals: one doesn't need a good knowledge of .NET internals to implement it. But, as the topic is, I might say, quite popular, I'm going to show the reader how to write his own Native Framework Deployment tool if he wishes to. However, the article will go further than that by introducing Native Injection, which means nothing else than taking the JIT's place. Even though this is not useful for commercial protections (or whatever), it's a good way to play with JIT internals. I'm also going to introduce Native Decompiling, which is the result of an understanding of .NET internals. Finally, I'm going to address another topic: .NET Virtual Machine Protections.

The internal format of native images is still undocumented. It would also be quite hard to document, as it constantly changes. For instance, it completely changed from version 1 to version 2 of the .NET framework. And, as the new Framework 3.5 SP1 was released a few days ago, it changed once more. I'm not sure to what extent it changed in the last version, but one change can be noticed immediately. The original MetaData is now directly available, without changing the entry in the .NET directory to the MetaData RVA found in the Native Header. If you do change that entry, you'll end up with the native image MetaData, which isn't very interesting. Also, in earlier native images (prior to the 3.5 SP1 framework), to obtain the original MSIL code of a method, one had to add the RVA found in the MethodDef table to the Original MSIL Code RVA entry in the native header. This is no longer necessary, as the MethodDef RVA entry now points directly to the method's MSIL code.
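The difference between the two layouts can be summed up in a few lines. What follows is only a sketch: the structure and function names are hypothetical, and the header structure models just the single native header entry mentioned above, not the real (undocumented) header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the native header: only the entry discussed above.
struct NativeHeaderInfo {
    std::uint32_t originalMsilCodeRva; // "Original MSIL Code RVA" entry
};

// Returns the RVA of a method's original MSIL body inside a native image.
// legacyLayout == true  : images generated before framework 3.5 SP1
// legacyLayout == false : 3.5 SP1 images, where the MethodDef RVA is used as-is
std::uint32_t resolveMsilRva(std::uint32_t methodDefRva,
                             const NativeHeaderInfo& header,
                             bool legacyLayout)
{
    if (methodDefRva == 0)
        return 0; // a zero RVA means the method has no MSIL body
    return legacyLayout ? header.originalMsilCodeRva + methodDefRva
                        : methodDefRva;
}
```

A tool which has to support both formats would need a reliable way to tell them apart; how to detect the layout is not covered by this sketch.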

This is important, since protections like the Salamander Linker need to remove the original MSIL code from a native image before they can deploy it. Otherwise the whole protection becomes useless, since MetaData and MSIL code are all that is necessary to rebuild a fully decompilable .NET assembly. The stripping of MSIL code was easier in the "old" format, because one only needed the Original MSIL Code RVA and Size entries to know which part of the native image had to be erased with a simple memset.
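For the "old" format, the stripping step could look roughly like the following sketch. It assumes that the native image has already been mapped in memory with its sections at their virtual addresses, so that an RVA can be used directly as an offset into the buffer; the function name is my own.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Zeroes the original MSIL region of a mapped native image.
// msilRva / msilSize stand for the "Original MSIL Code RVA" and "Size"
// entries of the native header. Returns false if the range is out of bounds.
bool stripOriginalMsil(std::vector<std::uint8_t>& mappedImage,
                       std::uint32_t msilRva, std::uint32_t msilSize)
{
    if (msilRva > mappedImage.size() ||
        msilSize > mappedImage.size() - msilRva)
        return false;
    if (msilSize != 0)
        std::memset(mappedImage.data() + msilRva, 0, msilSize);
    return true;
}
```

When working on the raw file instead of a mapped image, the RVA would first have to be converted to a file offset through the section table.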

All we need to know about the native images' format in order to write a Native Framework Deployment tool is how to strip the MSIL code from it. Even the Salamander Linker will need time to adapt to the new native image format in order to work with the framework 3.5 SP1. And, as there isn't currently any protection which works with 3.5 SP1 native images, what I'm writing in this article has been only tested against earlier images.

Another reason why it is difficult to document native images is the lack of the code which handles them in the Rotor project. It was a deliberate choice made by Microsoft to exclude this part of the framework from the Rotor project.

The name I gave to this sort of protection may appear a bit strange, but it will appear quite obvious as soon as I have explained how it actually works. As already said, there's no protection system other than the Salamander Linker which removes the MSIL and ships only native machine code. And, in order to do that, the Salamander Linker relies on native images generated by ngen. The Salamander Linker offers a downloadable demonstration on its home page and we will take a look at that without, of course, analyzing its code, as I don't intend to violate any licensing terms it may imply. In this paragraph I'm going to show how it is technically quite easy to write a Native Framework Deployment tool, but I doubt that the reader will want to write one after reading this. Don't get me wrong, the Salamander Linker absolutely holds its promise and actually removes the MSIL code from one's application, but the method used faces many problems and in my opinion is not a real solution.

The Salamander Linker's demonstration is called scribble and it's a simple MDI application. Let's look at the application's main directory:

The v2.0.50727 directory corresponds to the framework directory which can be found inside C:\Windows\Microsoft.NET\, although it comes with only a limited number of files inside:

I'll explain in a moment why some important assemblies like System or System.Windows.Forms are missing. Meanwhile, the C directory leads to a series of other directories. The main path it produces looks something like this: C\WINDOWS\assembly\. The last directory of this path contains two more directories. One is called GAC_32 and contains the mscorlib assembly. The other is called NativeImages_v2.0.50727_32 and is the directory where native images are stored. This directory contains only two native images: the mscorlib one and the scribble one. The scribble native image is gigantic because, before ngening, scribble was merged with its dependencies: System, System.Windows.Forms, etc. The only dependency which can't be merged into another assembly is mscorlib. The reasons for that are many. The reader can imagine one of them if he has read the previous article: mscorlib is a low level assembly strictly connected to the framework; among other things, it provides the implementation of internal calls. If a non-system assembly tries to call an internal function, it will only result in the framework displaying a privileges error.

The Salamander Linker deploys a subset of the framework. Thus, the name Native Framework Deployment I gave to this technique. Native images are bound to the framework in a rather complicated way. In fact, native images are highly framework dependent. But let's for a second focus only on the relationship between an assembly and its native image on the local system. One can modify an assembly all he wants, but by just leaving its #GUID stream and some data in the MetaData tables unchanged, the same native image will be loaded for that assembly. This means that one can even bind a totally different assembly to a native image. This is quite easy to achieve: first, let's ngen a random assembly. Assemblies are bound to their native images through the registry. The registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Fusion\NativeImagesIndex\v2.0.50727_32 is where the binding between assemblies and native images happens:

This key has two subkeys: "IL" and "NI". The "IL" key contains a series of subkeys which represent the ngened assemblies and the information needed to bind them to their native images:

Keep in mind the DisplayName, as it will be needed later. The SIG value contains the assembly's GUID and its SHA1 hash:

The selected bytes represent the SHA1 hash. Ironically, this hash isn't used to bind the actual assembly to its native image. But this behaviour might change in the future, so it's worth mentioning.

The "NI" key's subkeys tell the framework where it can find the native image for a given assembly:

The MVID value specifies the path of the native image. In this case it'll be: C:\Windows\assembly\NativeImages_v2.0.50727_32\rebtest\0f12d8560d3b72df51b3471002c911a0. Also, it should be noted that the "511072a1" subkey references the appropriate "IL" subkey.
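Judging from the path shown above, the location of a native image is made of the native images root, the display name and a directory named after the MVID. The following sketch rebuilds such a path; it assumes that the directory name is simply the lowercase hexadecimal rendering of the 16 MVID bytes in the order in which they are stored, which I have not verified beyond this example. The function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Builds <root>\<displayName>\<mvid as 32 hex digits>.
// Assumption: the bytes are rendered in stored order, lowercase.
std::string nativeImagePath(const std::string& nativeImagesRoot,
                            const std::string& displayName,
                            const std::array<std::uint8_t, 16>& mvid)
{
    static const char digits[] = "0123456789abcdef";
    std::string dir;
    for (std::uint8_t b : mvid) {
        dir += digits[b >> 4];   // high nibble
        dir += digits[b & 0x0F]; // low nibble
    }
    return nativeImagesRoot + "\\" + displayName + "\\" + dir;
}
```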

So, in order to bind another assembly to this assembly's native image, it is necessary to change its GUID and also the Assembly MetaData table:

The Name in the Assembly MetaData table should be changed to the display name (in this case: "rebtest"). Also, change the MajorVersion, MinorVersion, BuildNumber and RevisionNumber accordingly. I showed the Module Table in the image just because it would be logical to change that as well, but the framework doesn't care about it. Thus, neither do we.

This is all it takes to bind a local image and it works with the framework 3.5 SP1 as well. Of course, binding a native image on another computer isn't as easy, since native images are framework / system dependent. It is also not guaranteed to work, since, as mentioned earlier, native images may change along with newer versions of the framework. This problem can be "solved" by shipping the whole framework along with the native images.

Let's go back to the Salamander Linker demonstration's main directory. The Scribble.exe is a native exe which loads the Scribble.rsm. Scribble.rsm is an empty assembly used to load a native image. The binding between this empty assembly and a native image is done as I described above. By shipping its own framework version, the Salamander Linker only has to worry about local binding. Of course, it is not sufficient to put the framework files in a folder in order to deploy it. Virtualization has to be provided as well. The mdepoy.registry is a text file which contains the registry keys to virtualize. It looks like this:

The actual file is much bigger (31 KB). rsdeploy.dll is the part of the Salamander Linker which does most of the work: it hooks all the APIs it needs to virtualize the framework. This can be easily verified without analyzing its code. Among the APIs it needs to hook there's LoadLibrary, of course, and all registry functions. It also needs to hook some other functions, which I'm going to discuss in the next paragraph.

When virtualizing an application there's not only the file system and the registry to consider. Environment variables have to be considered as well. If we look at the environment of the Scribble process with Russinovich's Process Explorer we will notice something:

The Salamander Linker sets the COMPLUS_InstallRoot variable to its own main directory. Since this variable is not used and the framework is loaded even without it, my guess is that it's a deprecated variable of the framework 1.0.

This is about everything one has to know in order to develop his own Native Framework Deployment tool. One might be asking where the merging part comes in. Actually, the merging is not really necessary. It only makes things easier and also, since the whole framework is shipped, it improves performance. I could easily adapt the Rebel.NET code to write an assembly merger (it would be a two-week job), but I'm not interested in anything that can be achieved through merging assemblies: like, for instance, writing a protection like this one. As an alternative, one might consider using ILMerge, a Microsoft utility which can also be used in commercial applications. The only drawback is that it is extremely slow (it's a .NET assembly) and I have already experienced cases where it doesn't work, but this may improve in time. In the next sub-paragraphs I'm going to address some aspects of the possible development of a Native Framework Deployment service.

Let's see what a possible loader for a Native Framework Deployment service may look like. What follows is only a first draft of the loader: I'm not introducing the complete loader yet, because I'm proceeding gradually.

There are a few things to say about this code. For one, it may not seem obvious to the reader why I'm fixing the IAT and the relocations. Usually, LoadLibrary (which I'm using to load the assembly) performs this task, but on systems which have the .NET framework installed it doesn't do so for .NET assemblies. After fixing the PE, I jump to the assembly's entry point (which is just a jump to _CorExeMain in mscoree). Actually, I could have called _CorExeMain directly without jumping to the original entry point, which would have made the code that fixes the IAT and the relocations unnecessary. I just did it this way in order to avoid any incompatibilities in the future. The key to loading an assembly is understanding how _CorExeMain retrieves the base address of the main assembly in the current address space. The code of _CorExeMain, after doing some checks to load the correct .NET runtime, calls the function of the same name inside mscorwks. Here's the code inside mscorwks:
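To give an idea of what fixing relocations involves, here is a minimal, portable sketch which works on plain byte buffers rather than on a module loaded in the current process. It is not the loader's code: the function name is hypothetical, only the 32-bit HIGHLOW relocation type is handled, and a little-endian host is assumed. The relocation data follows the standard PE layout: a sequence of blocks, each made of a page RVA, a block size and a list of 16-bit entries whose upper 4 bits hold the type and whose lower 12 bits hold the offset inside the page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Applies base relocations to `image` (sections at their virtual addresses).
// `reloc` is the raw content of the base relocation directory.
// `delta` is (actual base address - preferred ImageBase).
bool applyRelocations(std::vector<std::uint8_t>& image,
                      const std::vector<std::uint8_t>& reloc,
                      std::uint32_t delta)
{
    const unsigned kAbsolute = 0; // padding entry, nothing to patch
    const unsigned kHighLow  = 3; // patch a full 32-bit address

    std::size_t pos = 0;
    while (pos + 8 <= reloc.size()) {
        std::uint32_t pageRva, blockSize;
        std::memcpy(&pageRva, &reloc[pos], 4);
        std::memcpy(&blockSize, &reloc[pos + 4], 4);
        if (blockSize < 8 || blockSize > reloc.size() - pos)
            return false; // malformed block

        for (std::size_t e = pos + 8; e + 2 <= pos + blockSize; e += 2) {
            std::uint16_t entry;
            std::memcpy(&entry, &reloc[e], 2);
            unsigned type = entry >> 12;
            std::size_t target = pageRva + (entry & 0x0FFFu);
            if (type == kAbsolute)
                continue;
            if (type != kHighLow)
                return false; // other types are not handled in this sketch
            if (target > image.size() || image.size() - target < 4)
                return false;
            std::uint32_t value;
            std::memcpy(&value, &image[target], 4);
            value += delta; // rebase the stored address
            std::memcpy(&image[target], &value, 4);
        }
        pos += blockSize;
    }
    return true;
}
```

Fixing the IAT is conceptually similar: each import descriptor is walked and every thunk is overwritten with the address of the imported function.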

The _CorExeMain function in mscorwks retrieves the main assembly through a call to GetModuleHandleA/W(NULL) made inside WszGetModuleHandle. Not only that: before GetModuleHandle, GetModuleFileName gets called inside mscoree. This API accepts the same NULL syntax as GetModuleHandle to obtain information about the main module in the current address space. So, the easiest way to tell the framework which assembly is the main one is to hook both GetModuleHandleA/W and GetModuleFileNameA/W. I decided to use Microsoft's Detours to implement the hooking, since its licensing is free for research projects and it is guaranteed to work on every Windows platform. Here's the code of the actual loader:
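The logic of the two hooks can be modelled without any Windows dependency. The following is a simplified sketch, not the loader itself: the types and function pointers are stand-ins for HMODULE and for the real APIs, all names are hypothetical, and in a real loader the "real" pointers would be the trampolines provided by the hooking library.

```cpp
#include <cassert>
#include <string>

// Stand-ins for HMODULE and for the two hooked APIs (simplified signatures).
using ModuleHandle = const void*;
using GetModuleHandleFn   = ModuleHandle (*)(const char* name);
using GetModuleFileNameFn = std::string (*)(ModuleHandle module);

struct LoaderState {
    ModuleHandle assemblyBase = nullptr; // where the .NET assembly was loaded
    std::string  assemblyPath;           // its path on disk
    GetModuleHandleFn   realGetModuleHandle   = nullptr;
    GetModuleFileNameFn realGetModuleFileName = nullptr;
};

LoaderState g_loader;

// A NULL module name normally means "the main executable": answer with the
// loaded assembly instead, and forward every other request untouched.
ModuleHandle hookedGetModuleHandle(const char* name)
{
    if (name == nullptr && g_loader.assemblyBase != nullptr)
        return g_loader.assemblyBase;
    return g_loader.realGetModuleHandle(name);
}

// Same idea for the file name: NULL (or the assembly's own handle) yields
// the assembly's path rather than the loader's path.
std::string hookedGetModuleFileName(ModuleHandle module)
{
    bool asksForMain = (module == nullptr || module == g_loader.assemblyBase);
    if (asksForMain && !g_loader.assemblyPath.empty())
        return g_loader.assemblyPath;
    return g_loader.realGetModuleFileName(module);
}
```

The important detail is that the hooks only lie about the main module: any request for a named module goes through unchanged.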

This code just loads a .NET assembly. In order to achieve the deployment of a .NET framework, it is necessary to hook registry APIs and file system ones such as LoadLibrary as well. In the next paragraph I'm going to address registry virtualization which brings us one step forward.

I wouldn't have written this paragraph if I hadn't already had the material which I'm going to present. One of my unfinished (due to the lack of time) articles is related to virtualization. Many months ago I wrote a registry virtualizer.

The main form (VirtualReg Manager) of this tool provides the visual interface to create a virtual registry. This can also be achieved through command line, as we'll see later. One can decide whether to virtualize a key along with its subkeys or not.

The virtual registry is an XML database. The format of this XML file looks like this:

Numbers are stored in hex format, whereas all other data is base64 encoded. The virtual registry file can be edited with VirtualReg Editor (vregedit), which is very user-friendly as its interface is identical to regedit's one.
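The two encodings can be sketched as follows. This is not the code of the virtualizer: the function names are hypothetical and the exact formatting of numbers (prefix, padding, case) is an assumption, since it depends on the XML schema used by the tool. The base64 routine is the standard one with "=" padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Numbers: plain lowercase hex, no prefix (assumed formatting).
std::string encodeNumber(std::uint32_t value)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "%x", static_cast<unsigned>(value));
    return buf;
}

// Everything else (strings, binary data): standard base64.
std::string encodeBase64(const std::vector<std::uint8_t>& data)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    for (; i + 3 <= data.size(); i += 3) { // full 3-byte groups
        std::uint32_t n = (std::uint32_t(data[i]) << 16) |
                          (std::uint32_t(data[i + 1]) << 8) | data[i + 2];
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }
    std::size_t rest = data.size() - i;
    if (rest == 1) {                       // one trailing byte: two pads
        std::uint32_t n = std::uint32_t(data[i]) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {                // two trailing bytes: one pad
        std::uint32_t n = (std::uint32_t(data[i]) << 16) |
                          (std::uint32_t(data[i + 1]) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

Encoding every non-numeric value in base64 avoids having to escape characters which are not valid inside XML text.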

Creating a virtual registry from the GUI is okay for manual tasks, but tools can use the program's command line to generate a virtual registry. In order to do that, a ".tovreg" file has to be passed on the command line to the program. A tovreg file has this syntax:

As one can see, it's a simple ini file. If the "subkeys" parameter is missing, then subkeys are not virtualized.

As this is part of an unfinished article, I have not written the monitor to retrieve the keys to virtualize yet. However, it's quite easy to write one or, being very lazy, using the log generated by Russinovich's Process Monitor is also an option. The captured keys should be virtualized without their subkeys, as including the subkeys might in some cases result in a much too big virtual registry with unnecessary keys.

Since the code generation for native images is platform specific, it might as well involve optimizations which cannot work on other CPUs. An example of this could be the use of a specific version of SSE instructions which is not available on every architecture. This problem could be "solved" by making ngen believe that it is running on an older (or different) CPU, but this is just a mess.

I'm not in favor of personal opinions inside technical articles, but it is necessary to say something about this, since one might ask me why I'm not writing a Native Framework Deployment service myself. With the information provided in this article it would take no longer than a month to provide a commercial product. The reason why I don't do it is simply because I believe it is unprofessional and technically speaking a mess. It might as well always work, but no one in his right mind would deploy every .NET assembly with a subset of the .NET framework. Deploying 40 MBs or more of data for a simple assembly is not a real solution. In fact, it's not a solution at all.

I was tempted to write a complete demonstration of such a protection (without the merging part, of course) for this article and it would have taken me no longer than a few days, but it has some drawbacks. Since I'm not interested in developing a commercial solution around this concept, someone else might simply re-use the code. Even now there's not much to do, but at least one's got to work on it a bit before having something to make money out of. However, I am all in favour of reversers writing a demonstration just for fun and giving it away for free. Yes, it ought to be free. It is not technically complicated and shouldn't be commercialized at all.

In this paragraph I'm going to show how it is possible to do the work which is being done when native images are being loaded by taking the JIT's place. The code contained in native images needs to be fixed: many references have to be solved at runtime like, for instance, external calls. I'm not showing a method to actually native compile .NET assemblies, since taking the place of the JIT is not only complicated, but also unlikely to work in future versions of the .NET framework. In fact, what I'm writing works on the .NET Framework 2 and 3, but it seems that the new Framework 3.5 SP1 changed lots of things and I already noticed that what I'm doing doesn't work on that version installed on Vista x64. This is rather unimportant and I'm not interested in digging to solve the problem, since what I'm doing here is only a hack to give a better understanding of how the JIT works, which will turn out useful in the next paragraphs. It will also prove the point of my final conclusions about .NET native compiling.

The test assembly used in this paragraph is rebtest.exe: an assembly I already used to test Rebel.NET. The application is very simple: it's just a form with a text box and a button. When the user clicks the button, it checks whether the password inserted in the text box is right or not. If not, it shows the message box: "Wrong password!". Here's the MSIL code of the button click event:

Even in this small method many things are solved at runtime. In this particular case we have a ldfld, a callvirt, a ldstr and a call. One thing that should be noted is that this assembly code is using fastcalls storing the first argument in ecx and the second one in edx.

In order to understand how to solve these references, it is necessary to understand how the JIT works internally. In the first article, I introduced the compileMethod function, but I only focused on its first two arguments: ICorJitInfo and CORINFO_METHOD_INFO. What I have not discussed yet are its last two: nativeEntry and nativeSizeOfCode. Two pointers used to retrieve the native code's address and size. One could, of course, hook the compileMethod to retrieve the native code of a method after having called the original compileMethod function (which isn't very useful) or one could actually use these two arguments to inject his own native code. And that's exactly what I'm going to do. But I'm not injecting any kind of code. No, I'm going to inject native .NET code by solving internal references.
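The role of the last two arguments can be illustrated with a reduced model of the hook. This is only a sketch under several assumptions: the types are simplified stand-ins for the real JIT interfaces, the method is identified through a hypothetical token field (a real hook would have to identify it through the information provided by the JIT interface), and 0 stands for the success return value. In a real process the injected buffer would also have to reside in executable memory.

```cpp
#include <cassert>
#include <cstdint>

// Opaque / simplified stand-ins for the real JIT types.
struct ICorJitInfo;
struct MethodInfo { std::uint32_t methodToken; }; // hypothetical field

using CompileMethodFn = int (*)(ICorJitInfo* jitInfo, MethodInfo* info,
                                unsigned flags,
                                std::uint8_t** nativeEntry,
                                std::uint32_t* nativeSizeOfCode);

// Describes the native code we want to hand over for one given method.
struct Injection {
    std::uint32_t targetToken;
    std::uint8_t* code;
    std::uint32_t size;
};

CompileMethodFn g_originalCompileMethod = nullptr;
Injection       g_injection = {0, nullptr, 0};

int hookedCompileMethod(ICorJitInfo* jitInfo, MethodInfo* info, unsigned flags,
                        std::uint8_t** nativeEntry,
                        std::uint32_t* nativeSizeOfCode)
{
    // Target method: skip the JIT and return our own native code.
    if (info != nullptr && g_injection.code != nullptr &&
        info->methodToken == g_injection.targetToken) {
        *nativeEntry = g_injection.code;
        *nativeSizeOfCode = g_injection.size;
        return 0; // success in this model
    }
    // Every other method is jitted normally.
    return g_originalCompileMethod(jitInfo, info, flags,
                                   nativeEntry, nativeSizeOfCode);
}
```

The hard part is not the hook itself but producing native code whose references (strings, fields, calls) are valid inside the running process.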

The function is actually much bigger, but I only pasted the part which is interesting for us. Among the last lines of code I pasted you can see that compileMethod is calling the function jitCompile. This is the main function of the JIT. It's a huge function, since it contains the switch to handle every MSIL opcode. I'm going to paste a "small" part of the function here to give you an idea of the magnitude.

Only in the last lines of code do we encounter the switch I was talking about. The switch is inside a loop (naturally) which goes on until the last opcode has been jitted. As one can notice, the switch doesn't come directly after the beginning of the jitting loop. That's because, before handling each instruction, the JIT performs many checks. For instance, it checks that the maximum stack size hasn't been exceeded or that the current offset isn't the beginning of a try block. However, we don't care about all those things, since we don't have to perform validity checks nor implement exception handlers.

Note: the GET macro should be briefly discussed for better understanding. This macro reads a value type from the current MSIL opcode stream pointer and puts it in a variable (first argument), then it increments the stream pointer.
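The behaviour of the macro can be reproduced with a small function template. This is a sketch which mimics what the macro does, not the macro's actual definition; the names are hypothetical and a little-endian host is assumed. As an example of its use, the second function decodes a ldstr instruction, which is made of the opcode byte 0x72 followed by a 4-byte string token.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads one value of type T at `ip` and advances `ip` past it.
// memcpy is used because operands in the MSIL stream are not aligned.
template <typename T>
T readAndAdvance(const std::uint8_t*& ip)
{
    T value;
    std::memcpy(&value, ip, sizeof(T));
    ip += sizeof(T);
    return value;
}

// Decodes "ldstr <token>". On success stores the token, leaves `ip` on the
// next instruction and returns true; otherwise `ip` is left untouched.
bool decodeLdstr(const std::uint8_t*& ip, std::uint32_t& token)
{
    const std::uint8_t kLdstr = 0x72;
    if (*ip != kLdstr)
        return false;
    ++ip;                                      // skip the opcode byte
    token = readAndAdvance<std::uint32_t>(ip); // read the string token
    return true;
}
```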

What I'm going to do is to inject the .NET message box displaying "Right password!". Thus, we'll have to analyze how the JIT handles the opcodes ldstr and call. This is a good way to proceed, as the ldstr opcode is very easy and gives the reader the time to adapt to the JIT logic. So, let's look at the ldstr case in the switch:

case CEE_LDSTR:
JitResult = compileCEE_LDSTR();
break;

This is the usual syntax used to handle opcodes: a call to compileCEE_OpcodeName. Let's look at this function:

When looking at this function it is necessary to define what we need in order to get a string reference. We're already familiar with the GET macro and its use. We already have a string token and also a scope. We don't need to do any sort of verification. So, it all comes down to the function constructStringLiteral which is declared in dynamicmethod.cpp:

I pasted the function only to show how the reference to the string is retrieved internally. It wasn't necessary for the demonstration, but I thought it's interesting since it involves GetDynamicResolver and the module handle. I have already introduced CORINFO handles in the past article, showing how they are nothing else than class pointers. In fact, GetDynamicResolver is basically just a cast:

To conclude the analysis of compileCEE_LDSTR, the "emit_" macros are used to generate the platform specific native code, whereas the pushOp function is part of a series of functions to handle the MSIL stack necessary for jitting to native code. I'll discuss later the MSIL stack.

As I said earlier, ldstr was a very easy opcode to handle. The call instruction is a bit more complex, but don't be intimidated: it's simple to understand. The size of the code is mainly the result of the many validity checks. compileCEE_CALL first calls getCallInfo which is, as it seems, misused to activate the assembly in which the code is contained. Then findMethod is called to retrieve the handle of the method which is being called. After that, the compileHelperCEE_CALL function is called. This function performs lots of checks: we can skip those and focus on the latter part. Among the last calls, a getFunctionEntryPoint function can be spotted and that's exactly what we were looking for. The buildCall, emit_callnonvirt and compileDO_PUSH_CALL_RESULT functions only build the native code calling syntax and emit the native opcodes.

The only description of getFunctionEntryPoint can be found in corinfo.h:

// return a callable address of the function (native code). This function
// may return a different value (depending on whether the method has
// been JITed or not. pAccessType is an in-out parameter. The JIT
// specifies what level of indirection it desires, and the EE sets it
// to what it can provide (which may not be the same).
virtual void __stdcall getFunctionEntryPoint(
    CORINFO_METHOD_HANDLE ftn,                /* IN */
    InfoAccessType requestedAccessType,       /* IN */
    CORINFO_CONST_LOOKUP * pResult,           /* OUT */
    CORINFO_ACCESS_FLAGS accessFlags = CORINFO_ACCESS_ANY) = 0;

Basically, this function retrieves the callable native code of the target function. Before calling getFunctionEntryPoint it is necessary to retrieve the target method's handle. This can be achieved with findMethod.

It's now possible to write a little demonstration. As in the past article, I'm using a .NET loader to hook the JIT before loading the victim assembly. The nvcoree.dll hooks compileMethod and injects the native code which shows a .NET message box with the text "Right password!". Here's the code of nvcoree.dll:

The two instructions I handled were rather simple. Other opcodes like ldfld and callvirt are a bit more complicated, since they also make use of the MSIL stack, which I mentioned earlier. ldfld pops out a value from the stack which is the object whose field it is going to reference. Here's a bit of the code which jits ldfld:

The actual data contained in this class fits into a qword. The main value of this class is the type member. In some cases (depending on the type), additional information, such as a handle, is needed. For instance, if the type is typeMethod, a CORINFO_METHOD_HANDLE is also needed. The reason why I pasted this code is that understanding the MSIL stack might turn useful for the next two paragraphs.
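To make the idea more concrete, here is a reduced model of such a stack. It is not the Rotor code: the names are hypothetical, the set of operand kinds is arbitrary, handles are modelled as 32-bit values (as on x86) and ldfld is only simulated for the case in which the field is accessed through an object reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical operand kinds; the real JIT distinguishes many more.
enum OperandKind : std::uint32_t { kindInt, kindRef, kindMethod };

// One entry of the MSIL stack: a type plus an optional handle.
struct Operand {
    OperandKind   kind;
    std::uint32_t handle; // e.g. a method handle when kind == kindMethod
};
static_assert(sizeof(Operand) == 8, "an operand should fit into a qword");

class OperandStack {
public:
    void push(Operand op) { items_.push_back(op); }
    bool pop(Operand& out)
    {
        if (items_.empty())
            return false; // stack underflow
        out = items_.back();
        items_.pop_back();
        return true;
    }
    std::size_t depth() const { return items_.size(); }
private:
    std::vector<Operand> items_;
};

// Stack effect of ldfld: pop the object whose field is referenced,
// then push an operand describing the loaded field value.
bool simulateLdfld(OperandStack& stack, Operand fieldValue)
{
    Operand object;
    if (!stack.pop(object) || object.kind != kindRef)
        return false;
    stack.push(fieldValue);
    return true;
}
```

While jitting, the stack doesn't hold values but only the description of what each stack slot will contain at runtime, which is what allows the JIT to emit the correct native code for each opcode.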

This topic has not yet been discussed in the .NET context. What I mean by native decompiling is not going from machine code to C# (to name one), but going from machine code to MSIL. The MSIL can then be decompiled into C#. Converting machine code to MSIL is not only easier, but the only logical decompiling method. The procedure is nonetheless difficult: I'm only discussing the possibility. The most important thing is stack interpretation. Let's take for instance part of the code seen in the Native Injection paragraph:

Since I know that the call at offset 38h calls MessageBox.Show(String), I also know that the first argument on the stack or, in this case, since it's a fastcall, the data in ecx represents a String class. However, this is rather normal, because MessageBox is a public API. Public APIs could be solved in the same way in native C++ applications. The difference can be noted when considering the CheckPassword(String) method called in this code. CheckPassword is a private method; nonetheless I can retrieve its arguments, its return type and, if it hasn't been obfuscated, even its name. Thus, I perfectly know that the data moved in ecx represents an instance, since CheckPassword is a non-static class member, and that the data moved in edx represents a String class. I also know that this call returns a boolean value and can interpret the instructions below accordingly.
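The reasoning applied to the two calls can be expressed as a small function which, given a method signature taken from the MetaData, tells what ecx and edx hold at the call site. It's a sketch with hypothetical names, and it ignores arguments which are not register-sized as well as any argument beyond the second one, which would go on the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The part of a method signature which matters here.
struct MethodSignature {
    bool isStatic;
    std::vector<std::string> parameterTypes;
};

struct RegisterArgs {
    std::string ecx;
    std::string edx;
};

// Instance methods receive the instance as hidden first argument, so the
// declared parameters shift by one register.
RegisterArgs describeRegisterArgs(const MethodSignature& sig)
{
    std::vector<std::string> slots;
    if (!sig.isStatic)
        slots.push_back("this");
    for (std::size_t i = 0; i < sig.parameterTypes.size(); ++i)
        slots.push_back(sig.parameterTypes[i]);

    RegisterArgs regs;
    regs.ecx = slots.size() > 0 ? slots[0] : "(unused)";
    regs.edx = slots.size() > 1 ? slots[1] : "(unused)";
    return regs;
}
```

For the static MessageBox.Show(String) this yields a String in ecx, while for the non-static CheckPassword(String) it yields the instance in ecx and the String in edx.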

I have to make a small comparison with native C++ applications, because many people minimize the fact that MSIL code can be decompiled by saying that even C/C++ code can be decompiled. This is a completely incorrect statement, as it compares apples to oranges. Speaking about C/C++ applications, rough decompiled C code can sometimes be obtained. In some cases, the decompiler is not even able to generate any C code at all. And even if it is able to, in many cases the decompiled code is wrong. And even in those cases where the decompiled C code is actually right (meaning it correctly represents what the machine code is doing), it is not guaranteed to be easier to understand for the reader than the machine code, since the decompiled C code is mostly a mess. And last but not least, the C decompiler has no clue of how to interpret data. For example, when I'm referencing a member in a structure, the resulting decompiled C code will only produce a reference to pointer + N, where N is the offset to the referenced member. This means that "info.bValue = TRUE" generates something like "*((int *) (ptr + N)) = 1;" in C code. The same applies to the method's arguments, return value, calls, etc. Although the decompiled C code may sometimes be recompilable, it is absolutely no threat to intellectual property. At least, no more than analyzing the machine code is.
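The structure example can be verified with a few lines of code. The structure layout below is invented for the demonstration; the point is only that the typed assignment and the "pointer + N" store have exactly the same effect, while the second form carries no information about what is being set.

```cpp
#include <cassert>
#include <cstddef>

// Invented structure, just for the demonstration.
struct Info {
    int id;
    int bValue;
};

// What the original source code says.
void setFlagTyped(Info& info)
{
    info.bValue = 1;
}

// What a C decompiler typically recovers: a store at "pointer + N".
void setFlagRaw(unsigned char* ptr)
{
    const std::size_t N = offsetof(Info, bValue); // offset of the member
    *reinterpret_cast<int*>(ptr + N) = 1;
}
```

With the MetaData at hand, the .NET equivalent of the second function can always be turned back into the first one, names included.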

When talking about protecting .NET applications, the root of the problem is the MetaData. The MetaData is useful for many purposes, but I'm analyzing it from the point of view of a reverser. The MetaData leaves nothing uncovered, making it impossible to hide something.

Although .NET native decompiling need not be considered an important issue right now, it's interesting to evaluate the possibility, since it would make an attempt such as a Native Framework Deployment service useless. Native images themselves have to hold enough information for the execution engine to solve the references within the native code. This information could be exploited by a reverser for decompiling. Even if the information was missing, as in the case when one manually injects native code, it would still be possible (although not easy) to communicate with the JIT to solve the references.

The machine code could, in theory, also be obfuscated in order to further complicate decompiling, but it would be still possible to solve the references in the code, making it much easier to understand it than its C/C++ equivalent.

Virtual machines have been a big hit in the area of native code. It was only a matter of time, before someone tried to bring the concept to .NET code. I don't know how many protections rely on this technology, but I can say that Microsoft itself invested in it with its SLP (Software Licensing & Protection) services. I can't analyze the code of their product as it would in some way violate their licensing terms, but I can discuss it.

SLP provides a per method protection. This means the user can choose which methods to protect. A protected method when disassembled looks like this:

The method does nothing else than invoking the virtual machine by passing the class instance, the method's arguments and a string that represents the method being called.

The protection's runtime is made of three .NET assemblies. The runtime creates its own virtual machine on top of the .NET Framework. .NET virtual machines use the reflection to solve external references. If I reference a private variable inside, let's say, the current class, the virtual machine will do the following:

As one can see, MetaData turns out to be quite useful when combined with reflection. However, I leave it to the reader to imagine how slow a .NET virtual machine built on top of reflection will be at execution time. That's why even the SLP guide warns its users:

In the earlier analogy about baking a cake from a recipe, it was assumed that you had to protect the entire recipe. Of course, there is a lot of similarity between cake recipes, and it is unnecessary to protect the entire recipe, just those parts of it that make it unique. This would do little to reduce the security of the recipe, but makes it much faster to read–only those secret ingredients need to be decrypted.

Similarly, because the SVM needs to interpret the SVML code, and runs on top of the CLR, there is a performance element to the equation that needs to be addressed. You do not want to protect the entire code base, because it would slow the whole application down and add little to overall security. Instead, you want to protect only what is necessary: the secret ingredient.

In this text, they make it sound like it is something good that only a few methods are being protected, though this isn't realistic. Granted, the .NET virtual machine approach is quite good and much more professional than Native Framework Deployment services, but it has some significant flaws. This approach might be the best one regarding the licensing of a .NET application, but it really can't help much to protect intellectual property. If one's entire application relies on a bunch of non execution-time critical methods, then what it is hiding really isn't a great secret anyway. There are also some restrictions regarding the virtualization of methods:

Methods with the following constructs cannot be transformed in Code Protector.

Methods within generic classes.

Methods containing explicit instantiations of generic types.

Methods with generic parameters.

Non-static methods of a structure.

Methods with “out” or “ref” parameters.

Methods that invoke other methods with “out” or “ref” parameters.

Methods that modify any method parameter, even if the parameter is defined as a “by value”.

Methods with a variable number of parameters (e.g., using the “params” keyword in C#).

Methods with too many local variables or parameters (> 254).

Methods that contain calls to Reflection.Assembly.GetExecutingAssembly(), Reflection.MethodInfo.GetCurrentMethod(), or Reflection.Assembly.GetCallingAssembly().

CLR 1.1 Framework only: Methods that create objects using constructors that have a variable number of parameters. This restriction does not exist when a non-constructor method is invoked.
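Several of the restrictions quoted above can be detected from MetaData alone. The following is only a sketch of how one might pre-screen a method with reflection; it is not how Code Protector does it, and `RestrictionCheck`, `Violations` and the `Point` sample are hypothetical names. Two readings are assumptions of mine: "generic parameters" is taken to mean a generic method, and the 254 limit is applied to locals and parameters separately. Restrictions that need IL analysis (parameter modification, calls to other methods) are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical pre-screening helper, not part of any real protection tool.
static class RestrictionCheck
{
    // Returns the quoted restrictions that the method violates,
    // limited to those visible through MetaData.
    public static List<string> Violations(MethodInfo method)
    {
        var reasons = new List<string>();
        Type owner = method.DeclaringType;

        if (owner != null && owner.IsGenericType)
            reasons.Add("method within a generic class");
        if (method.IsGenericMethod)
            reasons.Add("generic method");
        if (owner != null && owner.IsValueType && !method.IsStatic)
            reasons.Add("non-static method of a structure");

        ParameterInfo[] parameters = method.GetParameters();
        foreach (ParameterInfo p in parameters)
        {
            if (p.ParameterType.IsByRef)
                reasons.Add("'out' or 'ref' parameter: " + p.Name);
            if (p.IsDefined(typeof(ParamArrayAttribute), false))
                reasons.Add("variable number of parameters: " + p.Name);
        }

        // GetMethodBody returns null for abstract or extern methods.
        MethodBody body = method.GetMethodBody();
        int locals = body != null ? body.LocalVariables.Count : 0;
        if (locals > 254 || parameters.Length > 254)
            reasons.Add("too many local variables or parameters");

        return reasons;
    }
}

// Hypothetical sample type that trips three of the checks.
struct Point
{
    public int X;
    public void Shift(ref int dx, params int[] extra) { X += dx; }
}

class Program
{
    static void Main()
    {
        MethodInfo shift = typeof(Point).GetMethod("Shift");
        foreach (string reason in RestrictionCheck.Violations(shift))
            Console.WriteLine(reason);
    }
}
```

For `Point.Shift` the check reports the structure, `ref` and `params` restrictions.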

This list is also interesting for those who might consider writing a .NET virtual machine themselves. I have given the reader my opinion about this protection technique, but let's examine how one could overcome it.

If one is really interested in what a protected method does, it is necessary to analyze the virtual machine's code. The first approach which comes to my mind is using the .NET profiling API to inject logging code in order to retrieve the methods called inside the virtual machine. This would provide an execution flow log which can be used to analyze the virtual machine's code executed for a particular method.

The second technique to overcome this kind of protection is based on substitution. If one isn't interested in what the code does, since he knows it or knows what the code should do, then he can replace the code with his own. This can be easily accomplished through Sebastien Lebreton's Reflexil. This approach addresses cracking, not reversing. But since SLP is also a licensing system, this must be taken into account. Let's say that the method F sets up the initialization settings for an application. This method is protected through SLP, which won't execute it unless one has a valid license for the program. One could reimplement the F method and completely detach the SLP runtime from the protected assembly. This might be difficult in some cases, but that's what reversing is all about. However, SLP is terribly slow, and protecting many methods results in an unacceptable performance loss. The performance problem could be significantly mitigated by automatically generating native images during the setup process.

Sometimes, the virtual machine protection is combined with code obfuscation to provide security for all the methods which have not been virtualized. In this case, if one is interested in decompiling the MSIL code, the first step is removing the code obfuscation. This can only be done by analyzing the obfuscation algorithm and understanding how to reverse it. The rebuilding of the de-obfuscated assembly can be easily achieved through Rebel.NET.

As I've never read a book or an article about the CLR infrastructure, what has been presented in this article are the .NET internals from the perspective of a reverser. This was the second part of a two-article series about .NET internals and protections. I hope I have given the reader an idea of the problems surrounding .NET protection systems. As the .NET technology is still very young, it might change significantly. I don't know if intellectual property will be taken into account in the next versions of the framework. I also hope that these problems will be taken into account when new frameworks are developed in the future. As the .NET framework has been a new playground for reversing, I can only guess that many problems were not too obvious at the beginning of its development (although the Java experience should've been a lesson). A possible evolution of the .NET framework could rely on offering native compiling as an alternative to MSIL and drastically reducing the MetaData information by preserving it only for public types / members.

Maybe, I'm totally wrong and we will soon see most major applications being deployed as MSIL assemblies. I strongly doubt it.


About the Author

The languages I know best are: C, C++, C#, Assembly (x86, x64, ARM), MSIL, Python, Lua. The environments I frequently use are: Qt, Win32, MFC, .NET, WDK. I'm a developer and a reverse engineer and I like playing around with internals.

Comments and Discussions

Oh I wouldn't worry about it at all. It is clear from your writing you have a scientific and engineering method behind your arguments.

And it is the same people that vote all the .NET criticism type of articles or postings (whether well substantiated, fishing or not) with 1s as well.

The saddest thing is that they do not even realise that the purpose of pointing out the CLR/Java flaws is to make their jobs better and safer, to make the computing a better science, not something W*F easy or bloated but hey.. only lack of knowledge produces that level of ignorance and 'evangelistic' adoption and there's only one type of crowd to blame for that.

Mostly, when you see programmers, they aren't doing anything. One of the attractive things about programmers is that you cannot tell whether or not they are working simply by looking at them. Very often they're sitting there seemingly drinking coffee and gossiping, or just staring into space. What the programmer is trying to do is get a handle on all the individual and unrelated ideas that are scampering around in his head. (Charles M Strauss)

I do not know about others, but if you can pull it off on paper as a business and technical proposal, and for 3.5, I am pretty confident you would get the funding, people with internals knowledge, and support from a number of industries to push this beyond Salamander + dotfuscator guys and more.

Of course it would have quirks initially but the basic premise seems viable, and optimisation doesn't stop with taking out metadata. It is also very useful info for the future generations.

FWIW, the dollar amount wasted/given away on/by .NET apps is huge, and licence at a fraction of each product price would more than compensate initial investment. I am not looking at this from yet another obfuscator product perspective but extending it to a mass market product.

If the above sounds too strong, then again I do not see why hesitate or what you have to lose in attempting the investment route and to see the interest it would generate.

And looking at MS's SLP, what an utter waste of money and resources..

If MS wanted .NET to be open, well it wouldn't constantly produce lock-in and bloated type of junk. Technical smchadery...

But, it isn't all that bad, there is a very simple route out of it all (a bit of extra work, so what). An option that anyone with any brains will realise eventually, even the C# blind: write your .NET app, then use tools and rewrite it in native, properly optimisable, reflection and 'object' free, language OUT of their control and deploy as much as you can on nothing but: free LINUX and derivatives.

No wonder more viable alternatives are appearing all over the place by the day, and that is one trend that will never stop, dumping that dependency on MS and Windows 7, 8, 9 and 10 (that will give nothing in return and disappoint with yet another proprietary, reversing friendly, bloat-tech).

Well, it pretty much depends on MS. The only *clean* way to bring .NET out of the framework concept is writing native compilers for it. This is surely a hard task and I don't know if it would partially violate MS rights on the .NET framework (which include deployment). But even if this was all legal, it is very difficult to achieve since the .NET framework is changing every day and by the time one's got a working compiler for .NET 2 assemblies, WPF, Silverlight etc. will have been introduced and the compiler might be outdated already. So, I think .NET can't really be safe for intellectual property if Microsoft doesn't change its mind.

I'm glad you think I would get the financing, but I'm not an "all .NET guy". I have other interests and don't want to spend years writing a protection for a framework which isn't meant to be protected.

If someone's looking for multiplatform and ease of use AND protection, then I'd advise using Trolltech's Qt. The best framework ever, in my opinion. It's native C++, so protection comes easy and the licensing isn't more expensive than a license of these so-called protections for .NET.

In my opinion, right now Microsoft is making A LOT of bad decisions, generally speaking. .NET started off pretty cool, but I don't like how it is evolving at all. But that's just my opinion. I'm pretty sure there are lots of other people who just love the way it is proceeding.

Yes, I am a C++ enthusiast. There are some new things I'd like to introduce in the C++ standard, but all in all I believe it's the best language out there.

I discovered Qt (4) not long ago. It was an incredible relief coming from 6 or more years of MFC. I also believe that Qt4 beats Winforms in GUI programming (I'm not talking about WPF because I don't think it's a practical way for standard GUI programming). .NET has some good things when considering GUI programming, but the refreshing Qt approach is something new for me.