Introduction

This article is about loading type metadata from .NET
assemblies using Reflection or Mono Cecil, with the specific goal of supporting
the resolution of symbolic references in a codeDOM. Sources are included. This is Part 6 of a series on codeDOMs, but
it may be useful for anyone who wishes to retrieve type metadata from .NET assemblies
for any reason. In the previous parts,
I’ve discussed “CodeDOMs”, provided a C# codeDOM, a WPF UI and IDE, a C#
parser, and solution/project codeDOM classes.

Background

The codeDOM discussed in this series handles symbolic
references to named objects with various classes derived from the SymbolicRef
base class. These references might be resolved in a manually generated object
tree, but when parsed from a C# source file, most of them will be unresolved
(represented by UnresolvedRef objects). Only references to built-in types will
be resolved, since they are parsed from keywords. Symbolic references in a
codeDOM that refer to declarations inside the codeDOM (local variables, members
of the same type, types declared within the codeDOM, etc.) can be resolved by
searching for the declaration within the codeDOM. However, symbolic references
to external declarations in other projects or assemblies require a Project
object to be resolved, because it holds the collection of references to those
external sources of types. (These project and assembly references derive from
the Reference class, not SymbolicRef, and are unrelated to symbolic
references.) This article covers only the loading of assemblies and the types
in them; the actual resolving of symbolic references using this metadata will
be covered in the next article.

Loading Types From References

The first step in loading type data referenced by a Project
is to validate and locate (or “resolve”) the project and assembly
references. For project references,
public types are simply imported from the referenced project into the
namespaces of the local project (if an “InternalsVisibleTo”
attribute names the local project, internal types must also be imported). For assembly references, the referenced assemblies
must be located and loaded before the type metadata is retrieved, and this can
be a rather complicated process. A
referenced assembly can have a “hint path”, but this is not a guaranteed
location – a search has to be done to locate the assembly. Some assemblies might be located in the GAC,
and .NET framework (BCL) assemblies have their own locations which must be
determined with the help of the targeted framework version specified for the
project.
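The search order above can be illustrated with a minimal sketch. It is written in Python for brevity, while the real classes are C#. The function name `locate_assembly` is hypothetical, and so is the policy of trying the hint path first and then walking a caller-supplied list of directories (for example the project output folder, GAC folders and the framework folder). The real GAC layout is more involved and is handled by the GACUtil class.

```python
import os
from typing import Iterable, Optional


def locate_assembly(file_name: str,
                    hint_path: Optional[str],
                    search_dirs: Iterable[str]) -> Optional[str]:
    """Return the first existing path for an assembly reference, or None.

    Hypothetical policy: the hint path is tried first because it is usually
    right, but it is only a hint, so fallback directories are searched next,
    in the order given.
    """
    if hint_path and os.path.isfile(hint_path):
        return os.path.abspath(hint_path)
    for directory in search_dirs:
        candidate = os.path.join(directory, file_name)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    return None  # unresolved: the caller would record an error for this reference
```

A stale hint path simply falls through to the directory search, which is why the hint can't be trusted on its own.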

As assemblies are located and loaded, they must be tracked, both to prevent
duplicate loads and, if possible, to unload them at some point. This
functionality has been placed in a class named ApplicationContext, which is
used to load and track all assemblies loaded by the current application (into
the primary AppDomain). A class named FrameworkContext is used to track all
assemblies for each framework version or profile (such as .NET 2.0,
Silverlight, Client Profiles, etc.). Some frameworks are partial, meaning they
build on a previous release (such as .NET 3.0 and 3.5), and so their
FrameworkContext instances are chained together. Each project has a targeted
framework version, and assemblies are loaded with Project.LoadAssembly(),
which calls FrameworkContext.LoadAssembly() for the targeted framework, which
in turn loads them through the global ApplicationContext instance. Loaded
assemblies are represented by the LoadedAssembly class, a base class that is
subclassed for each loading method; it also has an ErrorLoadedAssembly
subclass to represent assemblies that failed to load for some reason. When the
context objects look for assemblies in the GAC, they use the GACUtil helper
class, which also includes a method for comparing assembly versions.
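The following Python sketch shows the tracking and chaining ideas. It is not the actual C# implementation, and every name in it is a hypothetical stand-in for the classes described above. A global cache guarantees that each (name, version) pair is loaded at most once, and a load failure is remembered as an error record, much like ErrorLoadedAssembly. Each framework context walks its parent chain (3.5, then 3.0, then 2.0) until it finds the assembly. Version comparison is done component by component on numbers rather than on strings.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a .NET-style 'major.minor[.build[.revision]]' string."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1; numeric, so '10.0' sorts after '9.9'."""
    va, vb = parse_version(a), parse_version(b)
    return (va > vb) - (va < vb)


@dataclass
class LoadedAssembly:
    name: str
    version: str
    error: Optional[str] = None  # set when loading failed (the "error" subclass case)


class ApplicationContext:
    """Hypothetical global cache: each (name, version) is loaded at most once."""

    def __init__(self, loader):
        self._loader = loader  # callable(name, version) -> LoadedAssembly
        self._loaded: Dict[Tuple[str, str], LoadedAssembly] = {}
        self.load_count = 0

    def load(self, name: str, version: str) -> LoadedAssembly:
        key = (name.lower(), version)
        if key not in self._loaded:
            self.load_count += 1
            try:
                self._loaded[key] = self._loader(name, version)
            except Exception as ex:  # remember the failure instead of retrying it
                self._loaded[key] = LoadedAssembly(name, version, error=str(ex))
        return self._loaded[key]


class FrameworkContext:
    """Hypothetical per-framework context, chained to the release it builds on."""

    def __init__(self, version: str, assemblies: Dict[str, str],
                 app: ApplicationContext,
                 parent: Optional["FrameworkContext"] = None):
        self.version = version
        self._assemblies = assemblies  # assembly name -> version shipped here
        self._app = app
        self.parent = parent

    def load_assembly(self, name: str) -> Optional[LoadedAssembly]:
        context = self
        while context is not None:  # e.g. 3.5 -> 3.0 -> 2.0
            if name in context._assemblies:
                return self._app.load(name, context._assemblies[name])
            context = context.parent
        return None
```

With this design, two projects that target .NET 2.0 and .NET 3.5 both get the same cached System assembly, because the 3.5 context defers to the 2.0 context for it.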

When a
Project is going to be resolved,
ResolveReferences()
is called on it first, which calls Resolve() on each
Reference. For an
AssemblyReference, this
attempts to find the referenced “.dll” and also verifies that it’s valid for
the targeted framework. For a ProjectReference,
the referenced project is located in the solution by filename. If the project isn’t found in the codeDOM or
the project type isn’t supported, then the output filename is determined and
the project reference is treated as an assembly reference in an attempt to still
resolve the types. Once the references
are resolved, LoadReferencedAssemblies()
is called on the project to load all of the referenced assemblies into memory
(by calling Load()
on each Reference),
and finally the metadata for public types is loaded from each assembly (by
calling LoadTypes()
on each LoadedAssembly).
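The three-step order (resolve every reference, then load the referenced assemblies, then read their type metadata) can be sketched as follows. The sketch is in Python, not the article's C#. The classes are simplified, hypothetical stand-ins that cover assembly references only. The locating and type-reading functions are passed in, so the sequencing is the only thing being shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class AssemblyReference:
    """Hypothetical stand-in for an assembly Reference."""
    name: str
    resolved_path: Optional[str] = None
    loaded: bool = False

    def resolve(self, locate: Callable[[str], Optional[str]]) -> bool:
        # Validate and locate the referenced file; None means unresolved.
        self.resolved_path = locate(self.name)
        return self.resolved_path is not None

    def load(self) -> bool:
        # Only references that resolved can be loaded.
        self.loaded = self.resolved_path is not None
        return self.loaded


@dataclass
class Project:
    references: List[AssemblyReference] = field(default_factory=list)
    types: List[str] = field(default_factory=list)

    def resolve_references(self, locate) -> List[str]:
        """Step 1: resolve every reference; return the names that failed."""
        failed = []
        for ref in self.references:
            if not ref.resolve(locate):
                failed.append(ref.name)
        return failed

    def load_referenced_assemblies(self) -> None:
        """Step 2: load every resolved reference into memory."""
        for ref in self.references:
            ref.load()

    def load_types(self, load_public_types: Callable[[str], Iterable[str]]) -> None:
        """Step 3: read public type metadata from each loaded assembly."""
        for ref in self.references:
            if ref.loaded:
                self.types.extend(load_public_types(ref.resolved_path))
```

Keeping resolution separate from loading lets every missing reference be reported up front, before any assembly is read.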

When an entire
Solution is going to be resolved,
ResolveReferences()
is called on each project, and then UpdateProjectDependencies() is called on
the solution, which determines the order in which the projects should be
resolved based upon their references to each other. Then, LoadReferencedAssemblies()
is called on each project in dependency
order to load the referenced assemblies, and finally the type metadata is
loaded from each assembly.
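Determining a project order in which every project comes after the projects it references is a topological sort. The article doesn't say which algorithm UpdateProjectDependencies() uses. The Python sketch below uses the standard library's `graphlib` (Python 3.9+) as one common way to do it. The input shape, a dict from project name to the names of referenced projects, is an assumption.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def project_dependency_order(references: Dict[str, List[str]]) -> List[str]:
    """Order projects so each one follows every project it references.

    'references' maps a project name to the names of the projects it
    references (hypothetical input shape). Raises CycleError if projects
    reference each other circularly.
    """
    return list(TopologicalSorter(references).static_order())
```

For example, with `{"App": ["Core", "Data"], "Data": ["Core"], "Core": [], "Tests": ["App"]}`, Core always comes first and Tests always comes last.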

Loading Type Metadata With Reflection

The most obvious method of retrieving type metadata from
assemblies in .NET is using Reflection. This is done by calling Assembly.Load() to load an assembly, which
returns an Assembly
instance. The LoadedAssembly class
is subclassed as ReflectionLoadedAssembly for such assemblies. Then,
GetTypes()
is called on the resulting Assembly object to get an array of Type
objects. Members of types are
represented by MethodInfo,
PropertyInfo,
FieldInfo,
etc. Helper classes providing some
useful static methods for working with these classes are located in Utilities/Reflection.

Feature            Reflection Class
Assemblies         Assembly
Members of types   MemberInfo
Types              Type
All Methods        MethodBase
Methods            MethodInfo
Constructors       ConstructorInfo
Properties         PropertyInfo
Events             EventInfo
Fields             FieldInfo
Parameters         ParameterInfo

Reflection doesn’t just load the metadata for examination –
it loads assemblies with the intention of allowing them to execute, and this
means that they must pass various security and validation checks, or the assembly
will fail to load. Some of these checks
occur while browsing the types and their members, and can cause exceptions to
be thrown.

In using Reflection for this project, I experienced the following
issues:

1. You can’t load different versions of an assembly into the same AppDomain,
   and different projects can and do reference different versions of the same
   assembly (this can even occur within a single project due to chained
   references). This also means an app running on .NET 4 can’t load older
   versions of the .NET framework assemblies. These are serious problems for a
   tool like Nova CodeDOM.

2. You can’t unload an assembly from an AppDomain once it’s loaded. You can
   only unload an entire AppDomain, and the primary AppDomain can’t be
   unloaded. This leads to continuous memory growth when loading a series of
   different projects that reference different assemblies.

3. Trying to work around #1 and/or #2 by using multiple AppDomains is a lot
   more difficult than might be expected. If code in one AppDomain references
   types loaded into another one, those types are silently loaded into the
   referencing AppDomain as well. You would have to locate all code that
   references them locally, or create your own types to marshal the data
   between the domains. This is not obvious: I’ve seen code where people think
   they have solved this problem, but they actually haven’t, because they
   still reference the types across the domains.

4. Exceptions can be thrown when loading assemblies or browsing types,
   including security (CAS) exceptions and various other exceptions for many
   different reasons. This can be a major problem that prevents the loading of
   certain types or entire assemblies.

5. Performance is not very good.
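One way to limit the damage from such exceptions is to record failures and keep going, so that one bad type or assembly doesn't lose everything else. .NET itself follows this pattern partly: Assembly.GetTypes() throws ReflectionTypeLoadException, whose Types property still holds the types that did load. The Python sketch below shows the general pattern. The function name and the callback parameters are hypothetical stand-ins for real loading calls.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def load_types_tolerantly(
        assemblies: Iterable[str],
        get_type_names: Callable[[str], Iterable[str]],
        load_type: Callable[[str, str], object],
) -> Tuple[Dict[str, List[object]], List[str]]:
    """Load as many types as possible, recording failures instead of aborting."""
    loaded: Dict[str, List[object]] = {}
    errors: List[str] = []
    for assembly in assemblies:
        try:
            names = list(get_type_names(assembly))
        except Exception as ex:  # the whole assembly failed (e.g. a security error)
            errors.append(f"{assembly}: {ex}")
            continue
        types = loaded.setdefault(assembly, [])
        for name in names:
            try:
                types.append(load_type(assembly, name))
            except Exception as ex:  # one bad type shouldn't lose the rest
                errors.append(f"{assembly}/{name}: {ex}")
    return loaded, errors
```

The collected error strings can then be reported as output messages instead of stopping the load.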

It turns out that while using Reflection might be fine for a
program to analyze its own assemblies at runtime, it’s actually not a good
choice at all for loading types from unrelated assemblies in order to do static
analysis. So, what other option is there?

Loading Type Metadata With Reflection-Only Loads

Back in .NET 2.0, a “reflection-only” load capability was added to
Reflection to get around many of the problems with using it for static
analysis of assemblies. The Assembly.ReflectionOnlyLoad() method is
specifically designed for static analysis of metadata: you do not intend to
execute any code in the assembly, only to inspect it. This seemed to be
exactly what this project needed, so I added a
CodeDOM.ApplicationContext.UseReflectionOnlyLoad config file option to enable
this mode, and set it to be on by default.

Using Reflection with assemblies that were loaded in reflection-only mode gets
around the really big problem of not being able to load different versions of
the same assembly in the same AppDomain. It also avoids many possible
exceptions. It bypasses strong-name verification, CAS policy checks,
processor-architecture loading rules, and binding policies. It also doesn’t
execute any initialization code, and it doesn’t automatically load dependent
assemblies.

However, reflection-only loads still have some issues:

1. You still can’t load a different version of the ‘mscorlib’ assembly into
   the primary AppDomain. You can try, and it will pretend to work, but it
   won’t: only the version already loaded by the running app will be in
   memory. This can be worked around by “hiding” newer types in mscorlib when
   an older version was desired, in order to prevent resolve conflicts.

2. You still can’t unload an assembly from an AppDomain once it’s loaded.

3. So the version problem (#1 in the previous list) is mostly fixed, but the
   unloading problem (#2) is the same, and using multiple AppDomains to get
   around these issues is just as difficult as before.

4. Although many exceptions are avoided, new problems can occur that are
   specific to reflection-only mode. Bypassing the binding policies can cause
   attempts to load older framework assemblies that aren’t compatible with
   newer and/or 64-bit OSes. Cross-references between normally loaded
   assemblies (such as the resident ‘mscorlib’ that you can’t replace) and
   reflection-only loaded assemblies can cause problems. Exceptions are
   reduced, but they are far from eliminated.

5. Just like normal Reflection, performance is not very good.

6. Unlike normal Reflection, reflection-only mode doesn’t handle dependent
   assemblies automatically. A callback is fired whenever an assembly
   references another one (see OnReflectionOnlyAssemblyResolve), and the
   referenced assembly must be manually located and loaded. This provides some
   flexibility, but finding the correct assembly can take a lot of work,
   especially because the callback can occur at any time (such as while type
   metadata is being browsed) and there isn’t any easy way to determine the
   precise context (such as which project the callback relates to when an
   entire solution is being loaded).

7. Because no code in the assembly can be executed, anything that
   instantiates types will throw an exception. For example, you can’t retrieve
   custom attributes using the normal GetCustomAttributes() method (on
   Assembly, MemberInfo, or ParameterInfo), because it instantiates them. This
   particular issue can be worked around by using the CustomAttributeData
   class, which provides static methods for retrieving custom attribute data
   from these objects without instantiating the attributes.
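The callback-context problem in item 6 can be partially addressed by remembering which project is being processed while the callback may fire. The Python sketch below shows that idea only. It is not the article's OnReflectionOnlyAssemblyResolve logic, and the class, its methods and the "<name>.dll" file naming are all hypothetical. A stack of per-project search directories is pushed around each project's loading, and lookups are cached per context. The sketch also shows the limitation: a callback that fires later, outside any project scope, can only fall back to the shared directories.

```python
import os
from contextlib import contextmanager
from typing import Dict, List, Optional, Tuple


class DependencyResolver:
    """Hypothetical handler for 'please locate assembly X' callbacks."""

    def __init__(self, fallback_dirs: List[str]):
        self.fallback_dirs = list(fallback_dirs)
        self._scopes: List[List[str]] = []  # stack of per-project search dirs
        self._cache: Dict[Tuple[Tuple[str, ...], str], Optional[str]] = {}

    @contextmanager
    def project_scope(self, search_dirs: List[str]):
        """Make a project's directories the current context while loading it."""
        self._scopes.append(list(search_dirs))
        try:
            yield self
        finally:
            self._scopes.pop()

    def on_resolve(self, simple_name: str) -> Optional[str]:
        dirs = (self._scopes[-1] if self._scopes else []) + self.fallback_dirs
        key = (tuple(dirs), simple_name.lower())  # cache per context, not globally
        if key not in self._cache:
            self._cache[key] = self._search(dirs, simple_name)
        return self._cache[key]

    @staticmethod
    def _search(dirs: List[str], simple_name: str) -> Optional[str]:
        for d in dirs:
            path = os.path.join(d, simple_name + ".dll")
            if os.path.isfile(path):
                return path
        return None
```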

It’s possible to work around some of these issues, and I’ve
added logic to do that where I could, but it’s still not too hard to come
across a project that you just can’t load without some problems.

In summary, although using Reflection in reflection-only mode avoids some
problems, it still has some serious issues in certain cases. It’s still likely
to fail to load some assemblies or types, so it’s not a great choice for
loading metadata from assemblies for static analysis. An alternative is
needed, and one such possibility is Mono Cecil.

Loading Type Metadata With Mono Cecil

Mono Cecil is an open source library for reading metadata
from .NET assemblies (it’s part of the Mono project). Based upon my problems with using Reflection,
I decided to add support for using Mono Cecil (version 0.9.5) to load metadata
and see how it compared, and so I added a CodeDOM.ApplicationContext.UseMonoCecilLoads
config file option (which is on by default). In this case, assemblies are loaded by calling AssemblyDefinition.ReadAssembly(),
which returns an AssemblyDefinition
instance. The LoadedAssembly class is subclassed as MonoCecilLoadedAssembly
for such assemblies, and a MonoCecilAssemblyResolver
class is used to resolve dependent assemblies during loading or type browsing. Then,
TypeDefinition objects are loaded from the assembly definition
object. Members of types are represented
by MethodDefinition,
PropertyDefinition,
FieldDefinition,
etc. Helper classes providing some
useful static methods for working with these classes are located in Utilities/Mono.Cecil.

Feature            Reflection Class   Mono Cecil Class
Assemblies         Assembly           AssemblyDefinition
Members of types   MemberInfo         IMemberDefinition
Types              Type               TypeDefinition
Generic types      Type               GenericInstanceType
Type parameters    Type               GenericParameter
All Methods        MethodBase         MethodDefinition
Methods            MethodInfo         MethodDefinition
Constructors       ConstructorInfo    MethodDefinition
Properties         PropertyInfo       PropertyDefinition
Events             EventInfo          EventDefinition
Fields             FieldInfo          FieldDefinition
Parameters         ParameterInfo      ParameterDefinition
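Two config options now affect how assemblies are loaded, and both are on by default. The article doesn't state how they interact. The small Python sketch below shows one plausible precedence, and it is an assumption: Mono Cecil wins when enabled, and otherwise Reflection is used, in reflection-only mode if that option is on. The enum and function names are hypothetical.

```python
from enum import Enum


class LoadMode(Enum):
    MONO_CECIL = "Mono Cecil (AssemblyDefinition.ReadAssembly)"
    REFLECTION_ONLY = "Reflection, reflection-only load"
    REFLECTION = "Reflection, normal load"


def choose_load_mode(use_mono_cecil_loads: bool = True,
                     use_reflection_only_load: bool = True) -> LoadMode:
    """Pick a loading strategy from the two boolean options (assumed precedence)."""
    if use_mono_cecil_loads:
        return LoadMode.MONO_CECIL
    if use_reflection_only_load:
        return LoadMode.REFLECTION_ONLY
    return LoadMode.REFLECTION
```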

Performance of Mono Cecil is generally very good, apparently
partly because of deferred loading (which moves some CPU time from assembly
load time to later browsing of the type data). On average in my experience, it takes about 1/3 the time that Reflection
takes to load assemblies and types. However,
memory usage is actually much higher than Reflection – about twice as much on
average. The table below shows some
example times of loading assemblies and types for some solutions along with
memory usage.

Solution           Projects  Files   Load (secs)          Memory (MB)
                                     Refl.  Cecil  Diff   Refl.  Cecil  Diff
Nova               8         687     1.1    0.5    45%    27     49     181%
SubText 2.5.2      7         849     1.1    0.2    18%    24     76     317%
MS EntLib Tests    70        2,445   2.2    0.7    32%    76     155    204%
Large Proprietary  43        4,677   4.6    1.2    26%    129    223    173%

(Diff is the Mono Cecil figure as a percentage of the Reflection figure.)
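The Diff columns can be recomputed from the raw figures as a quick check. The short Python snippet below does this and also averages the per-solution percentages. The averages come out to about 30% of the load time and about 219% of the memory. This is consistent with the "about 1/3 the time" and "about twice as much" figures stated above.

```python
rows = {
    # solution: (load secs Refl., load secs Cecil, memory MB Refl., memory MB Cecil)
    "Nova": (1.1, 0.5, 27, 49),
    "SubText 2.5.2": (1.1, 0.2, 24, 76),
    "MS EntLib Tests": (2.2, 0.7, 76, 155),
    "Large Proprietary": (4.6, 1.2, 129, 223),
}


def diff_percent(reflection: float, cecil: float) -> int:
    """Cecil's figure as a whole-number percentage of Reflection's."""
    return round(100 * cecil / reflection)


for name, (load_r, load_c, mem_r, mem_c) in rows.items():
    print(f"{name:<18} load {diff_percent(load_r, load_c):>3}%  "
          f"memory {diff_percent(mem_r, mem_c):>3}%")

load_avg = sum(diff_percent(r[0], r[1]) for r in rows.values()) / len(rows)
mem_avg = sum(diff_percent(r[2], r[3]) for r in rows.values()) / len(rows)
print(f"average: load {load_avg:.0f}%, memory {mem_avg:.0f}%")
```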

Issues with Mono Cecil include:

1. Despite some comments on the web to the contrary (perhaps about older
   versions), it uses a lot more memory than Reflection: roughly twice as much
   on average (it varies from 50% more to twice as much, or in some cases 3-4
   times as much).

2. It isn’t thread safe, not even if you’re only reading type data. This is a
   rather shocking omission for a library that tends to load and process a
   lot of data. As a workaround, the ILSpy project on GitHub has a forked
   version that has been made thread safe for reading only.

3. It has a questionable object model, with objects that represent
   definitions actually deriving from objects that represent references
   (TypeDefinition derives from TypeReference, MethodDefinition from
   MethodReference, etc.). This seems to be a trick to allow the definitions
   to also be treated as references, or perhaps just a way to inherit similar
   functionality as a consequence of the metadata format. In any case, it’s
   not a logical “is-a” relationship, so it’s somewhat confusing, and it
   precludes the use of normal inheritance, such as a common base class for
   all member definitions (an interface is implemented instead). In some
   cases, it gets downright ugly. For example, generic type instances,
   represented by GenericInstanceType, have a GenericArguments collection of
   TypeReferences (and a HasGenericArguments property). Courtesy of the
   TypeReference base class, they also have a GenericParameters collection of
   GenericParameters (and a HasGenericParameters property), which is not used.
   It just doesn’t seem very clean to me, and it also seems that it could have
   been more similar to Reflection in order to reduce the learning curve.

4. It hard-codes the use of ‘mscorlib’ for built-in types, so it won’t work
   correctly with assemblies that supply their own built-in types instead of
   using mscorlib (this is relatively rare). It’s possible to work around this
   limitation by modifying the source.
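Why would a library that only reads data be thread unsafe? With deferred loading, the first read of a member fills an internal cache, so a "read" is really a write. Two threads that touch the same type at the same time can race on that write. The Python sketch below is a hypothetical illustration of this point and of one simple fix, a lock around first access. It is not Cecil's code and not the ILSpy fork's change.

```python
import threading
from typing import Callable, List, Optional


class LazyTypeDefinition:
    """Hypothetical type whose members are read from metadata on first access."""

    def __init__(self, name: str, read_members: Callable[[str], List[str]]):
        self.name = name
        self._read_members = read_members
        self._members: Optional[List[str]] = None
        self._lock = threading.Lock()

    @property
    def members(self) -> List[str]:
        # Deferred loading: the first read populates the cache (a hidden write).
        if self._members is None:
            with self._lock:
                if self._members is None:  # re-check: another thread may have won
                    self._members = self._read_members(self.name)
        return self._members
```

Without the lock, several threads could each see the cache as empty and read the metadata at the same time. In a real metadata reader that shares a single file-position cursor, those concurrent reads can corrupt each other.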

In summary, Mono Cecil (version 0.9.5) uses a lot more memory than Reflection,
isn’t thread safe, and has a somewhat confusing object model. This is
unfortunate, because otherwise it would be truly great. But it certainly gets
the job done when Reflection sometimes can’t, it’s faster, and it’s open
source, so there’s still plenty to like about it. I should also mention that
it allows you to read and modify IL, so it’s a great option if you need to do
that.

I’ve left the Reflection capability in Nova CodeDOM as a
fallback primarily because Mono Cecil uses so much more memory, but I don’t
really expect it to get used much – Mono Cecil just works better overall. Are there any other alternatives? Yes – there is CCI (Common Compiler
Infrastructure) on CodePlex and perhaps others, but the word on the web seems
to be that Mono Cecil is generally easier to use. If anyone has direct experience otherwise,
please let me know. In the meantime, I think
Mono Cecil is good enough for this project.

Using the “Reference Assemblies” for the .NET BCL

Starting with .NET 3.0, “reference assemblies” have been
provided for the .NET framework for design and build time use, and are
preferred to the runtime assemblies in the GAC. These assemblies were added to avoid conflicts due to minor changes in
the runtime assemblies, and they contain metadata only (no IL code). They are located in “%ProgramFiles%\Reference
Assemblies\Microsoft\Framework\...”, with separate subdirectories for different
versions and profiles. If you look at a BCL assembly reference (such as “System”) in a VS project, you’ll see that it
points to these “reference” assemblies. The code included with this article will attempt to use these assemblies instead of the runtime assemblies if possible (they won’t
exist if VS is not installed on the machine, and they don’t exist for
frameworks prior to 3.0).
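The "use reference assemblies if possible" rule amounts to a search order. Metadata-only reference directories that exist come first, and runtime directories follow as a fallback. The Python sketch below illustrates that order. The function names are hypothetical. The caller supplies the actual directories, newest framework first along the chain, so no install paths are assumed here. Directories that don't exist, for example when VS isn't installed or the framework predates 3.0, are simply skipped.

```python
import os
from typing import List, Optional, Sequence


def framework_search_dirs(reference_dirs: Sequence[str],
                          runtime_dirs: Sequence[str]) -> List[str]:
    """Prefer existing reference-assembly dirs, then fall back to runtime dirs."""
    existing_refs = [d for d in reference_dirs if os.path.isdir(d)]
    return existing_refs + [d for d in runtime_dirs if os.path.isdir(d)]


def find_framework_assembly(file_name: str,
                            reference_dirs: Sequence[str],
                            runtime_dirs: Sequence[str]) -> Optional[str]:
    for d in framework_search_dirs(reference_dirs, runtime_dirs):
        path = os.path.join(d, file_name)
        if os.path.isfile(path):
            return path
    return None
```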

Using the Attached Source Code

A new Projects/Assemblies folder has been added
with new classes used to load assemblies and their type metadata, and some
existing classes have been updated to support the new functionality (such as Solution
and Project
for things related to loading). Loading
solutions/projects will now show output messages regarding the loading of
referenced assemblies and their types (set Log.LogLevel to Detailed
in the config file to have all loaded assemblies listed as they are
loaded). Other than the loading of
assemblies and type data, there is little change in functionality from the
previous article – this has been necessary preparation for resolving symbolic
references, which will be implemented in the next article. As usual, a separate ZIP file containing
binaries is provided so that you can run them without having to build them
first.

Summary

My codeDOM is now able to load type metadata from the
various assemblies referenced by projects, and it also has knowledge of
references between projects in a solution. I now have everything needed to tackle the next big part of this
project: resolving. In my next article,
I’ll undertake the big task of resolving all symbolic references within a
codeDOM.


About the Author

I've been writing software since the late 70's, currently focusing mainly on C#.NET. I also like to travel around the world, and I own a Chocolate Factory (sadly, none of my employees are oompa loompas).

Comments and Discussions

Very nice compilation of the issues. You should also mention that Mono cannot load managed C++ targets. Personally I had no problems with Mono and multi threading. For my ApiChange tool I did read each assembly in its own thread which worked very good. The memory consumption issue is really an issue and should be solved.

Interesting about managed C++... I thought that was supposed to work fine, in fact I've seen stuff on the web about how Reflection has problems with some other languages and managed DLLs, but Cecil handles them well. I'll have to look into it. Multi-threading with Cecil 0.9 or later is problematic due to deferred loading of data, which can be triggered at any time while browsing types. Perhaps problems can be avoided if you do things just right using separate threads for each DLL and load ALL types before continuing, but you could also be fooled into thinking things are OK and then get exceptions in rare cases when browsing a type that references another which isn't loaded yet. JB Evain (the author) is clear that it's NOT thread safe even for just reading, and I've experienced the exceptions that occur myself. Maybe you used Cecil 0.6 instead of 0.9 (or later)?

Yes I am still using 0.6 which does work fine where each dll is loaded by one thread. But I still had plans to upgrade to 0.9 to get better perf. But as far as I can see I really should wait until stuff (memory and threading) is fixed or do the changes by myself. If I only had more time ...

Yes, I wish JB would fix those issues, but I wouldn't hold my breath. If you don't need write capability, I would recommend the ILSpy version of 0.9 modded to be thread safe on reads. You'll probably find it significantly faster, although it might use even more memory - but memory is cheap and plentiful these days