An Architectural Tour of Rotor

The Microsoft Shared Source CLI Implementation (aka "Rotor") is a source
code distribution that includes fully functional implementations of both
the ECMA-334 C# language standard and the ECMA-335 Common Language
Infrastructure standard. These standards together represent a substantial
subset of what is available in the Microsoft .NET Framework.
The source code will build and run under Windows XP or FreeBSD 4.5,
and the distribution contains numerous additional goodies, including a
JScript compiler written entirely in C#, an IL assembler, a disassembler,
a debugger, tools for examining metadata, and other samples and utilities. To complement this article, we've also published "Get Your Rotor Running", which takes you through the steps of installing, building, and running Rotor.

Downloading and expanding the Rotor tarball reveals a bewildering
collection of scripts, license files, specifications, and
subdirectories jammed full of mysterious source code.
The READFIRST file may also seem daunting, as it refers to the distribution
as "experimental" and "beta-quality code with known defects..."
The prospective Rotor enthusiast may have several reservations at
this point: "Hmmm, is it ready for primetime? What are the
most interesting directories to browse?" This article will strive to
answer these questions, dispel any such reservations, and help
with the most interesting question of all: "How do you make Rotor build and run?"

A View from the Top (the sscli directory)

Before doing anything else, it is a good idea to take a look at the license
for Rotor, which you will find in the root of the distribution in a file named
license.txt. Rotor is liberally licensed for non-commercial
use -- you are free to make modifications and to share these with other folks,
for example -- but it is licensed. Before browsing the code, as with
any source-code distribution, you'll want to ensure that you're comfortable
with the terms that come attached to that code!

The distribution breaks down into four broad areas: the execution engine, the
class frameworks, the compilers and tools, and the platform adaptation layer
with its build infrastructure. These four areas are spread, scattershot,
across the source tree. As with any project of this scope, both history and
build dependencies have conspired to make navigation less than perfect.
Fortunately, there is documentation
to help. Whether you want to learn, to tinker, or to experiment
with Rotor's infrastructure, you are likely to find that the files in the
docs directory make a valuable first stop (along with the
file named readfirst.html found in the root of the distribution).

Execution Engine (the clr/src directory and environs)

Conceptually, the execution engine
is the heart and soul of the CLI runtime, and as such, it contains a large
quantity of fascinating code. Compilers and tools that target this engine,
including the C# and JScript compilers that come as a part of Rotor, create
and manipulate executable files that contain metadata tables,
resource blobs, and code in the form of abstract CIL opcodes. (CIL, the
Common Intermediate Language, is an intermediate representation of program
instructions that can be shared by tools targeting the CLI.) Executables of
this form are commonly referred to as "managed executables," and the code
contained in them, when running under the control of the execution engine,
is called "managed code."

Dave Stutz is coauthoring O'Reilly's upcoming book on Rotor, Shared Source CLI Essentials. This book will provide a roadmap for anyone trying to navigate or manipulate the Shared Source CLI code, and will include a CD-ROM that contains all the source code and files.

The loading of a managed program is a miracle of self-assembly, during which
those inert blobs of metadata, resources, and CIL are transformed into instructions
executing directly on the microprocessor. The vm sub-directory contains
the main core of the life-support system for managed components that accomplishes
this transformation, including the CLI's sophisticated automatic heap and stack
management, its object-capable type system, and its mechanisms for dynamically
loading code, safely. The fusion and md (metadata)
subdirectories are also important; they comprise important parts of the
data-driven process, and have code for resolving references to external types,
and metadata manipulation and validation code, respectively.

One instructive way to find your way through the execution engine is to take a
very simple managed program and trace its execution in your debugger of choice.
By doing this, you can see the initial load sequence used, and familiarize
yourself with the C++ classes used to implement the execution engine itself.
AppDomains, the ClassLoader, EEClasses, MethodTables, and finally, the various
classes that represent managed objects directly, are all worth browsing.

Frameworks (fx, managedlibraries, bcl, and classlibnative)

In addition to the exotic machinery of managed code, you'll also find more familiar
programming support infrastructure in the Rotor CLI, wrapped up as a set of class
frameworks. The specification for these frameworks is part of ECMA-335, and includes
a "base class library" (commonly referred to as "the BCL"), runtime infrastructure and
reflection classes, networking and XML classes, and floating point and extended array
libraries. All of these are in Rotor, in source code form. There are also a few
additional libraries included in this distribution, most notably support for regular
expressions and an extensive framework for type serialization, object remoting, and
automatic type marshaling.


Unlike some virtualized execution platforms, the CLI was never designed to obscure
the details of whatever system runs beneath it. Much as the implementers of the C
programming language took a minimalist approach to exposing portable runtime services,
the CLI likewise tries to provide only what is absolutely necessary. Of course, in
2002, the list of services that programmers take for granted is quite a bit larger
than it was in the early 1970s: verified typesafety, support for Web services, and
support for interop between managed and unmanaged code all fall on the list of today's
"minimal subset."

Compilers and Tools (jscript and clr/src/csharp)

One of the unique features of the CLI is the depth to which
components written in many different languages can share their representation and
runtime behavior. This seamless interoperability is one of the most
compelling reasons to use the services of the CLI; component builders can exploit
the unique characteristics of the platform on which their components
are running, while still enjoying the benefits of shared infrastructure. Furthermore,
tools written against the CLI will automatically complement pre-existing tools,
languages, and runtimes because of the CLI's built-in capability for interoperation.
To see this aspect of the CLI in action, examine the implementations of language
compilers and tools found in the Rotor distribution.

The JScript compiler, found in the jscript directory, is completely
written in C#. The language itself is quite interesting from the perspective of
a compiler writer, since it supports dynamic reshaping of classes, as well as the
runtime evaluation of arbitrary fragments of code. For those who would like to
implement dynamic languages such as Python or Scheme on top of the CLI runtime, or
understand how this could be done, this code will prove instructive.
In particular, note the heavy use of runtime reflection and the dynamic emission of metadata.

In order to build the sources for the JScript compiler, there must be a C#
compiler in the Rotor distribution. Not surprisingly, there is, and it can be
found in the clr/src/csharp directory. C# is a new language that has
been developed in parallel with the CLI to highlight the features of its environment.
Not only does C# reflect these capabilities, but the language was also standardized
by the same ECMA technical group that worked on the CLI. The Rotor C# implementation
should be a useful guide to anyone building their own C# compiler and/or frameworks.

Besides the C# compiler, the sources to a managed code debugger, an assembler and
disassembler (ILASM and ILDASM), an assembly linker, and a stand-alone verification
tool reside in subdirectories of clr/src. These tools will be indispensable
as you look through Rotor and work with the code; they will serve as both implementation
examples and everyday programming tools.

The final compiler to point out during this leg of the tour is the combination JIT
compiler and verifier that lives in the clr/src/fjit directory. Large
parts of Rotor are written in C#, and because the Rotor C# compiler outputs CIL opcodes
rather than native code, this means that large parts of Rotor are compiled by the JIT
compiler when the types are loaded, at the last possible moment. The "just-in-time"
approach to loading allows for an executable format that is portable from platform to
platform -- those of you with both FreeBSD and Windows can compile a C# program on one OS
and then run it on the other. Of course, in this case, both systems are built for x86
microprocessors, and so the same low-level instruction set can be used without changes.
What is much more interesting is that the Rotor JIT has been designed to be portable to
other microprocessors. Because its design is deliberately simple, it is relatively easy to build new versions
of the compiler that target other chips. Which is a fine way to move on to the topic of
portability ...

The Platform Adaptation Layer and Build (pal, palrt, tests and tools)

Our final stop on this whirlwind tour is Rotor's portability layer, tests, and build
tools, which together enable moving this distribution to alternate platforms.
It is relatively simple to target new microprocessor architectures in Rotor's JIT,
but remember, there is more to a platform than just the instruction set on which it relies.
The native operating system APIs and the toolchain through which they are programmed
also present large porting issues! In order to facilitate moving Rotor's codebase,
which was originally written for Windows, our development team adopted a very common
porting strategy: the use of a "Platform Adaptation Layer" (PAL).

The code for the FreeBSD PAL, which is found in the pal/unix directory, is
well worth a peek. This code was written to conform with a subset of the Win32 API,
outlined in docs/techinfo/pal_guide.html and declared in
pal/rotor_pal.h. By mimicking Win32's semantics for structured
exception handling, threading, synchronization primitives, file and network I/O,
debugger support, and other similar system-level services, porting the Rotor codebase
became more of an exercise in finding PAL bugs than in re-implementing existing code.
Furthermore, anyone who wishes to move Rotor to new platforms should find repeating the
same exercise straightforward. There is a small amount of platform-specific assembler
code in the execution engine, but besides this, the PAL and JIT make up the bulk of
porting work.

The bootstrap sequence for the Rotor build process is interesting, and shows that the
PAL is important for more than just the CLI runtime itself. The first thing to build
on any platform is the PAL itself; obviously, this must be done using native libraries
and tools. After the PAL has been successfully built, Rotor's own build tools are then
compiled against the PAL, after which the CLI C# compiler can be built. By this point,
we have a working C# compiler, and so the large number of C# files that are a part of this
distribution can be compiled. And since C# uses the managed execution environment,
the last step of the build process actually occurs when you run any of the programs that
contain managed code -- the JIT compiler is invoked on your behalf!

The build tools used in this bootstrap sequence can be found in the
tools directory, and are documented in the
docs/buildtools directory. Once the build has been successfully
executed and you are making modifications, you'll want to pay a visit to the
tests directory, in order to take advantage of its PAL suite, as well as
the general Rotor quality suites, which currently contain base IL tests, base verification
tests, and some JIT verification tests.

Have fun!

That's it for the tour. There is no better way to learn about the
CLI and C# standards than by browsing and building the Rotor sources. For those
ready to take the next step -- modifying or extending the code -- the depth of the Rotor
codebase will not disappoint. The entire Rotor team sincerely hopes that this project
will provide a great foundation for whatever programming itches you want to scratch!

David Stutz
is a tenured member of the Microsoft Research team, and is currently working on the team that is implementing the
Microsoft Shared Source CLI.