A Possible Enhancement of the MSIL

Microsoft Intermediate Language (MSIL) may be improved by adding a few new instructions

Abstract

Microsoft’s .NET Framework is a new platform designed to simplify the
development of distributed applications. At the bottom of .NET lies the
CLR, the so-called common language runtime. The CLR provides an environment
that manages .NET code, handling memory management, thread execution, code
checking, security services and garbage collection. The CLR also implements a strict
infrastructure called the CTS (common type system) for checking code
and types, so that types created in one language can be safely used and referred
to in another language. The managed execution environment solves certain problems
related to traditional native code execution. The CLR automatically handles
the layout of objects and their references, freeing them when they are no
longer necessary, thus avoiding invalid references and memory leaks. The CLR
does not rely on the native x86 instruction set. Instead, Microsoft has created
an intermediate language (IL) that is generated by all compilers with support
for the .NET platform. Microsoft Intermediate Language (MSIL) is a processor-independent
instruction set and includes instructions for execution control,
direct memory access, exception handling, logic and arithmetic operations, as
well as instructions for loading, storing and initializing objects. MSIL
code cannot be executed directly; it must be converted to processor-specific
code by a JIT (just-in-time) compiler. Every supported architecture must
provide its own JIT compiler.

When I was writing the code generator for the Delta Forth .NET compiler (http://www.dataman.ro/dforth)
I discovered that the IL could be improved to a certain extent. This paper is
about a proposed enhancement for the Microsoft Intermediate Language.

Microsoft Intermediate Language

MSIL is different from traditional assembly languages.
Although endowed with low-level instructions, the language also works with
high-level concepts such as exception handling and memory management. The
IL also has a virtual evaluation stack that can hold objects of different types,
such as strings, integers, floats, pointers, etc. The stack discipline is
stricter, meaning that when an instruction executes, the stack must
hold exactly the number of operands it needs. By contrast, the x86
assembly language allows an arbitrary number of parameters on the stack and even
provides an instruction for cleaning the stack upon leaving a procedure (see
ret n).

For more information about the IL instruction set, have a look at [2].

Although sufficient for its purpose, the IL instruction set seems
to lead to the generation of inefficient code. While developing
a Forth compiler for the .NET platform I discovered that the
generated code is often longer than it should be, due to the lack of some simple
instructions. In order for the reader to understand the code sequences that
follow, it is appropriate to get to know some of the stack-handling primitives
of the Forth programming language:

Forth Primitive   Stack Transition     Explanation

DUP               ( A – A A )          Duplicates the element on top of stack
OVER              ( A B – A B A )      Duplicates the second element on top of stack
SWAP              ( A B – B A )        Swaps the two top-most elements on the stack
ROT               ( A B C – B C A )    Rotates the three top-most elements on the stack
1+                ( A – A+1 )          Increments by 1 the element on top of stack
2+                ( A – A+2 )          Increments by 2 the element on top of stack

The stack transition column shows the stack status before and after the execution
of the primitive. The two statuses are delimited by the dash sign.

Evaluating mathematical expressions in Forth may look complicated to a programmer
used to traditional imperative languages. However, there is nothing special about it
except for the use of Reverse Polish Notation (RPN). For further reading
I recommend [3].

Let’s take for example the expression:

(A + B) * (A – B)

The postfix equivalent is:

A B + A B - *

The sequence of Forth primitives that solves the expression (assuming that
the values A and B are already on the stack in this order) is:
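One sequence that works, using only the primitives from the table, is OVER OVER + ROT ROT - *. The Python sketch below simulates the stack to check it. This is an illustration under the stack transitions given in the table, not necessarily the exact listing the author had in mind; the helper names (run, evaluate, WORDS) are made up for the example.

```python
# Minimal simulation of the Forth stack primitives from the table.
# The top of the stack is the END of the Python list.

def dup(s):
    s.append(s[-1])              # ( A -- A A )

def over(s):
    s.append(s[-2])              # ( A B -- A B A )

def swap(s):
    s[-1], s[-2] = s[-2], s[-1]  # ( A B -- B A )

def rot(s):
    s.append(s.pop(-3))          # ( A B C -- B C A )

def add(s):
    b = s.pop(); a = s.pop(); s.append(a + b)

def sub(s):
    b = s.pop(); a = s.pop(); s.append(a - b)

def mul(s):
    b = s.pop(); a = s.pop(); s.append(a * b)

WORDS = {"DUP": dup, "OVER": over, "SWAP": swap, "ROT": rot,
         "+": add, "-": sub, "*": mul}

def run(program, stack):
    """Execute a whitespace-separated sequence of words on the given stack."""
    for word in program.split():
        WORDS[word](stack)
    return stack

def evaluate(a, b):
    """Compute (A + B) * (A - B) with A and B already on the stack."""
    return run("OVER OVER + ROT ROT - *", [a, b])

print(evaluate(7, 3))  # [40], i.e. (7 + 3) * (7 - 3)
```

Tracing the stack for A=7, B=3: 7 3, then 7 3 7 3 after the two OVERs, 7 3 10 after +, 10 7 3 after the two ROTs, 10 4 after -, and 40 after *.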

Understanding the code above will help you understand the IL code snippets
that follow. Let’s move on and look…

Under the Hood

It is interesting to see the IL code generated at compilation of a program
written in a language with support for .NET. My language of choice is C# which
seems very promising. To repeat the experiments below you need Visual Studio
.NET and .NET Framework IL Disassembler.

The class we will use for testing is very simple and is easily understood
even by beginners:

The first and second lines push onto the stack the values of the static
variables x and i respectively. The value of i gets incremented (by pushing
1 and performing the addition). The increment operation (this time by 2)
is repeated in lines IL_000c through IL_0012. Eventually, the result is stored
at its destination in line IL_0013.
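The offsets quoted above follow from the instruction encodings: ldsfld takes 5 bytes (a one-byte opcode plus a four-byte metadata token), while ldc.i4.1, ldc.i4.2, add and stelem.i4 take one byte each. The Python sketch below recomputes the offsets for an assumed instruction sequence for x[i + 1] = i + 2, with x and i as static fields. The sequence is inferred from the description and from the statement quoted later in the discussion, so treat it as a plausible reconstruction, not the original disassembly.

```python
# Encoded size in bytes of each IL instruction used here.
SIZES = {"ldsfld": 5, "ldc.i4.1": 1, "ldc.i4.2": 1, "add": 1, "stelem.i4": 1}

# ASSUMED sequence for: x[i + 1] = i + 2  (x and i are static fields).
SEQUENCE = [
    ("ldsfld", "x"),      # push the array reference
    ("ldsfld", "i"),      # push i
    ("ldc.i4.1", None),   # push 1
    ("add", None),        # i + 1  (the index)
    ("ldsfld", "i"),      # push i again
    ("ldc.i4.2", None),   # push 2
    ("add", None),        # i + 2  (the value)
    ("stelem.i4", None),  # x[i + 1] = i + 2
]

def layout(sequence):
    """Return ([(label, opcode, operand)], total_size) with ILDASM-style labels."""
    offset = 0
    rows = []
    for opcode, operand in sequence:
        rows.append((f"IL_{offset:04x}", opcode, operand))
        offset += SIZES[opcode]
    return rows, offset

rows, total = layout(SEQUENCE)
for label, opcode, operand in rows:
    print(f"{label}:  {opcode} {operand or ''}".rstrip())
print("total bytes:", total)
```

Under these assumptions the second increment starts at IL_000c and ends at IL_0012, and the store lands on IL_0013, which agrees with the line labels mentioned in the text.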

At first glance, we see that the code sequence computes the two values i + 1
and i + 2 respectively. If we recall the 1+ Forth primitive from the previous
section, we see that an equivalent IL instruction would be more efficient, especially
since most modern processors have INC and DEC instructions, so translation from
the IL to native code would be immediate. The modified sequence could look as
follows:
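As a rough sketch of the size effect, assume two one-byte opcodes, here called inc and inc2, that replace the pairs ldc.i4.1/add and ldc.i4.2/add. Both the names and the one-byte encoding are illustrative assumptions; no such instructions exist in IL. Python is used only to do the arithmetic.

```python
# Encoded sizes in bytes. ldsfld carries a 4-byte metadata token; the others
# are single-byte opcodes. "inc" and "inc2" are HYPOTHETICAL instructions.
SIZES = {"ldsfld": 5, "ldc.i4.1": 1, "ldc.i4.2": 1, "add": 1,
         "stelem.i4": 1, "inc": 1, "inc2": 1}

# Assumed sequence for x[i + 1] = i + 2 as it is generated today...
CURRENT = ["ldsfld", "ldsfld", "ldc.i4.1", "add",
           "ldsfld", "ldc.i4.2", "add", "stelem.i4"]

# ...and with the hypothetical increment instructions.
PROPOSED = ["ldsfld", "ldsfld", "inc", "ldsfld", "inc2", "stelem.i4"]

def code_size(sequence):
    """Total encoded size of an instruction sequence, in bytes."""
    return sum(SIZES[op] for op in sequence)

saved = code_size(CURRENT) - code_size(PROPOSED)
print(code_size(CURRENT), code_size(PROPOSED), saved)  # 20 18 2
```

With these assumed encodings each increment saves one byte, so this particular statement shrinks from 20 to 18 bytes.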

We see that the compiler has automatically generated two anonymous local
variables to keep the values for x (line IL_0006) and i+1 (line IL_000f).
This sequence can be substantially optimized using the OVER Forth primitive,
which duplicates the second top-most element on the stack. The rewritten IL
code looks like this:

Note that the IL sequence follows the C# program closely. It would be interesting
if the intermediate language natively provided a SWAP instruction,
so that the previous code snippet could be rewritten as follows:

The gain here is 9 bytes, without taking into account the space saved by
not declaring an intermediate temporary variable (t). The code can be further
refined by eliminating the SWAP operation completely:
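One plausible reading of this refinement is that on a stack machine a swap needs neither a temporary nor a SWAP: load both values, then store them back in the opposite order. The Python sketch below models that idea with a plain list as the evaluation stack. It is an interpretation offered for illustration, not the author's listing, and swap_via_stack is a made-up name.

```python
def swap_via_stack(i, j):
    """Swap two values using only loads and stores on an evaluation stack."""
    stack = []
    stack.append(i)   # load i
    stack.append(j)   # load j   (stack: i j, with j on top)
    i = stack.pop()   # store to i: i receives the old j
    j = stack.pop()   # store to j: j receives the old i
    return i, j

print(swap_via_stack(1, 2))  # (2, 1)
```

Because the second load ends up on top, the first store pops the other variable's value, which is exactly the exchange that t = i; i = j; j = t performs.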

Conclusion

Even though at first glance the gain in space is small, in reality things
are slightly different. Such code sequences are relatively frequent in everyday
applications. If we assume that a medium-sized application contains approximately
200 such optimizable constructs and that the average gain is around 8 bytes
per construct, we save 1.6 KB of code space, which is not negligible for embedded
systems.

The space gain is not the only argument in favor of the new IL instructions.
The native code generated by the JIT may also benefit, since these instructions
map more directly onto dedicated processor instructions and so allow
a more efficient conversion.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

I am a software engineer based in Timișoara, Romania and currently hold the position of software architect for one of the largest companies in the world.

I invented a dialect of the Forth programming language and implemented the first Forth compiler for the .NET platform. I reverse-engineered the communication protocol of some GPS trackers and wrote from scratch a vehicle tracking system that is currently used to track my two cars. I hold a PhD in computer science and I am the author of several papers and a book chapter. In the 90s I wrote several computer viruses in assembly language for my own research and I was the first to devise a technique to deter heuristic virus scanners. In short, a humble man.

Optimization is an interesting issue, and I don't think that MS has put enough effort into it with regards to IL. On the one hand, one has to produce fairly generic syntax, on the other hand, it seems that there is no "context" optimization going on. For example, one of the optimization switches for C++ projects in VS6 was to detect previously loaded index registers and re-use them.

I think one of the problems with optimization in the case of IL is that it's supposed to be processor independent. Unfortunately, an optimization that would work with one processor's instruction set might not work with another. That said, there's no excuse for MS not to have an optimization layer somewhere, either in front of the IL generator or between the IL and the final assembly language output.

Having looked at IL briefly, I wish that it had provided a mirror of the CPU's instruction set (yes, this makes it processor specific). I really liked the _asm capability in C++, allowing me to create some really optimized code. But then again, this defeats the "managed" concept, and I can always link to a piece of unmanaged code. Oh well, the world seems to be full of unhappy compromises with IL. It's a great idea, to be able to unify languages into the CLR, but at a cost.

Also, correct me if I'm wrong, but Forth, being so stack based, shouldn't an implementation really maintain two stacks--one for the "data" and one for all the other stuff--return addresses, register saves, etc?

Marc Clifton wrote:it seems that there is no "context" optimization going on. For example, one of the optimization switches for C++ projects in VS6 was to detect previously loaded index registers and re-use them.

That's not true. .NET optimizes ASM when generating it from IL. And it can do a better job than the VC6.0 optimization phase because:
1. It can see the whole application, not only a single .OBJ (I know, vc7...)
2. It can do environment-specific optimizations, like using specific processor opcodes, since it's done on the user's machine, not on the developer's one. It can even change calling conventions based on the registers it has available.

Marc Clifton wrote:there's no excuse for MS not to have an optimization layer somewhere
And that's why they put it on the JIT compiler. Why do you think they call it "Intermediate" language????

Marc Clifton wrote:I really liked the _asm capability in C++, allowing me to create some really optimized code. But then again, this defeats the "managed" concept, and I can always link to a piece of unmanaged code. Oh well, the world seems to be full of unhappy compromises with IL. It's a great idea, to be able to unify languages into the CLR, but at a cost.

Marc Clifton wrote:
>> there's no excuse for MS not to have an optimization layer somewhere
> And that's why they put it on the JIT compiler. Why do you think they call it "Intermediate" language????

Although I agree completely with your comments, this one is wrong - high level optimisations and trivial low level ones (jump elimination, unnecessary instructions elimination, etc.) should be done at compile time - trivial lowlevel ones since they save time at runtime, while being benign enough and high level ones since they are simply not possible in JIT.

Currently no .NET language features any optimisations, even if there is a switch for it. However, if I remember correctly, that will change with the next MC++ version (not CS, for example).

And why? This would make no difference on the generated native code, only could affect IL code size a bit. Even if you do such optimizations at compile time, on "JIT-time", you still would need to create a control flow graph again, and apply again such optimizations...

Marc Clifton wrote:I think one of the problems with optimization in the case of IL is that it's supposed to be processor independent. Unfortunately, an optimization that would work with one processor's instruction set might not work with another.
In my article I only discussed the advantages in terms of space. With the optimizations I described it is possible to reduce the size of the PE executable to a certain extent. That has nothing to do with processor independence. It's all related to the JIT itself.

Marc Clifton wrote:Also, correct me if I'm wrong, but Forth, being so stack based, shouldn't an implementation really maintain two stacks--one for the "data" and one for all the other stuff--return addresses, register saves, etc?
That's correct. Forth has a value stack and a return stack. They're independent in every aspect except for a couple of primitives which move values from one to another.

Can't this be done by the JIT compiler? When generating code, couldn't the JIT optimize this code and eliminate some of these sequences? There's no necessary 1:1 relationship between IL instructions and native ASM instructions. The IL is meant to be a platform-independent ASM, so what assures that you'll have, e.g., a SWAP-on-stack ASM instruction on the target platform? Isn't it better to let the JIT optimization phase decide this? A simple flow-optimizing compiler could take care of this. Besides this, are you assuming ldsfld and ldc will translate to push instructions? No, the JIT compiler will allocate registers for this.
And these constructions seem more suitable to Forth than to other languages, like C#, VB.NET, J#, COBOL, since constructions like this are highly frequent in Forth, because, correct me if I'm wrong, Forth is a stack-oriented language (oh, my good old forgotten ZX81 days), but in newer structured or OOP languages this kind of construction is much more rare. Isn't this extending the .NET IL to suit a specific language? If MS does this with every language, soon the IL will become bloated and the generated code will become larger.
Digging into .NET generated code, you could always find one more IL instruction to create, but there must be an optimal balance between the number of instructions and the size of generated code.

Digging into .NET generated code, you could always find one more IL instruction to create, but there must be an optimal balance between the number of instructions and the size of generated code.

This reminds me of the argument for more instructions in the olden days of microprocessors: more instructions balanced with the number of transistors required. Before the days of micro-code, I guess. Remember when we could have the microprocessor execute "illegal" instructions and sometimes they actually did useful things?

Daniel Turini wrote:Can't this be done by the JIT compiler? When generating code, couldn't the JIT optimize this code and eliminate some of these sequences? There's no necessary 1:1 relationship between IL instructions and native ASM instructions. The IL is meant to be a platform-independent ASM, so what assures that you'll have, e.g., a SWAP-on-stack ASM instruction on the target platform?
As I mentioned in a previous post, the article deals with shrinking the size of the PE executable, i.e. the size of the IL code itself. Indeed, we need not have a 1:1 relationship between IL and native ASM, however having a few more (and simple) IL instructions leads to a shorter IL code.

Daniel Turini wrote:And these constructions seem more suitable to Forth than to other languages, like C#, VB.NET, J#, COBOL, since constructions like this are highly frequent in Forth, because, correct me if I'm wrong, Forth is a stack-oriented language (oh, my good old forgotten ZX81 days), but in newer structured or OOP languages this kind of construction is much more rare.
We're talking about constructions like:
    x[i + 1] = i + 2;
    x[i + 1] += 1;
    t = i; i = j; j = t;
... and similar. I don't know about you, but I've seen this kind of sequence in many programs written in languages such as C and C++.

Daniel Turini wrote:About the TUCOWS issue: be careful, some people here hate console applications, too:
Then they surely hate the csc.exe utility which by the way is the very C# compiler ;P This is no excuse for the "Two Mad Cows" team, they could have simply said they will not accept a console app.

I went to your site because I'm interested in compilers and .NET IL generation, and your compiler seems a nice piece of work. Then I read this: http://www.dataman.ro/dforth/tucows_story/story.htm. I was shocked by the Tucows staff, what morons !!!!!
They don't respect the hard work of the programmers.
DOS application , hein , geeeeeeez .... ouch

Cheers,
Joao Vaz

And if your dream is to care for your family, to put food on the table, to provide them with an education and a good home, then maybe suffering through an endless, pointless, boring job will seem to have purpose. And you will realize how even a rock can change the world, simply by remaining obstinately stationary. -Shog9

James T. Johnson wrote:It would probably nullify any attempts of getting your compiler added to Tucows.

After this, I doubt that he wants it ...

James T. Johnson wrote:[Edit]Changed my wording a little, can't have people thinking I'm a lawyer [/Edit]

You lyer, you are a .NET lawyer,shame on you

Cheers,
Joao Vaz

Thanks for being with me. I've received some similar letters before:

Sam Henry, Microsoft Corp.: Just wanted to let you know that your Forth story about Tucows cracked me up. I’m the Visual Studio .NET Technical Product Manager.

Brad Merrill, Microsoft Corp.: Sorry I didn’t hear about your problem with Tucows earlier. We are going to contact them to see if we can understand what the issue is…

Tucows is not the only shareware directory out there, so I list my software some place else.

Perhaps Brad Merrill could help you out; after all, he's from Microsoft.

Valer BOCAN wrote:so I list my software some place else.

There are certainly a lot of other places to put your good work

Cheers,
Joao Vaz

While the reviewers might have to review heaps of software, you would expect more. But then again most non-programmer/database people I know at work still refer to cmd.exe as the 'DOS Box'. They look at me blankly when I mention the 'console'. Of course all the *nix guys and gals, know exactly what I'm on about.

Nice explanations of MSIL. Perhaps Microsoft will optimize the IL instruction set in .NET 2.

BTW, do you know if the upcoming generics extensions in Rotor have more optimized IL instructions like yours?

Cheers,
Joao Vaz