First of all, sorry for the bad pun, but I couldn't resist. Once Whidbey
ships, one of the areas where .NET will be light years ahead of Java is the
ability to share memory between different instances of the runtime. Microsoft
did lots of work in Whidbey to enable sharing of memory pages (e.g. see
Rico's post).
Sun did a little work in J2SE 5.0 to allow rt.jar to be shared across VM
instances, but that's really not much compared with the sharing that NGEN
enables on Whidbey.

Frozen Strings

One aspect that hasn't been written about much is the ability to pre-create
string instances in NGENed images. What this means is that string literals are
laid out in the .data section of the NGEN image exactly like they would be
laid out when they are dynamically created by the CLR. So whenever you use a
frozen string literal in your managed code you're simply passing around a pointer to
static data in the NGEN image and not to an object in the GC heap. Since these
strings live in the .data section of the image, the standard copy-on-write page
sharing that the operating system uses for initialized data sections in images applies, so unless you
modify the object somehow (more about this in a bit) all applications using that
image will be sharing the same physical memory pages.

To get NGEN to create frozen strings for your string literals, you have to
mark your assembly with the StringFreezingAttribute.
Note that the downside of doing this is that your assembly will not be
unloadable: because the frozen string instances that live in your image aren't
tracked by the GC, the CLR needs to keep the image loaded for the lifetime of
the process.

Copy-on-Write

Strings are immutable, so why did I mention modifying the object earlier? One
obvious way to modify a string is to use unsafe (or native) code to poke inside
the string (a really bad idea!), but there are other ways of "modifying"
immutable objects. The first is to use an object as a monitor (using
Monitor.Enter or the C# lock() construct) and the
second is to get the object's identity hashcode by calling
System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode() or doing a non-virtual call to Object.GetHashCode() on the object.
Using an object as a monitor will cause the object header to be used as a
lightweight lock or as an index into the syncblock table that contains the
heavyweight lock, so this can mutate the object (header). Locking on string
literals was always a bad idea, because they're probably interned so they may be
shared by other pieces of code that you don't know about and they can also be
passed across AppDomain boundaries, but in Whidbey there is the additional
(potential) cost of having to take a page fault and having to make a private
copy of the page containing the string's object header, if the string is frozen.
The second issue (identity hashcode) turns out not to be an issue for frozen
strings, because NGEN pre-computes an identity hashcode for frozen strings, so
RuntimeHelpers.GetHashCode() will simply return the value that was pre-computed
and stored in the object header.
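
The same sharing hazard exists on the Java side, where string literals are interned as well. The following is a minimal sketch in plain Java, not .NET code; the class names ModuleA, ModuleB and SharedLiteralLock are made up for illustration. It shows that two unrelated classes that each lock on "their own" literal end up on the same monitor, and it uses System.identityHashCode(), the Java counterpart of RuntimeHelpers.GetHashCode().

```java
// Hypothetical example: two unrelated classes that both "privately" lock on a literal.
class ModuleA {
    static final Object LOCK = "lock";   // interned string literal
}

class ModuleB {
    static final Object LOCK = "lock";   // same characters, so the same interned object
}

class SharedLiteralLock {
    // True when both modules would contend on one and the same monitor.
    static boolean sharesMonitor() {
        return ModuleA.LOCK == ModuleB.LOCK;
    }

    // True when locking ModuleA's object also means holding ModuleB's monitor.
    static boolean lockingAAlsoLocksB() {
        synchronized (ModuleA.LOCK) {
            return Thread.holdsLock(ModuleB.LOCK);
        }
    }

    public static void main(String[] args) {
        System.out.println("same object: " + sharesMonitor());
        System.out.println("A's lock is B's lock: " + lockingAAlsoLocksB());
        // Identity hash: ignores the content-based String.hashCode() override.
        System.out.println("identity hash: " + System.identityHashCode(ModuleA.LOCK));
    }
}
```

Locking on a private `new Object()` instead avoids both the accidental sharing and, in the frozen string case, the write to a shared page.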

Yesterday's release candidate has a much smaller IKVM.GNU.Classpath.dll than the previous release candidate. As I noted this is due to some optimizations in the metadata. Let's look at the file size of IKVM.GNU.Classpath.dll over time:

The big jump in size between 0.8 and 0.10 is mostly due to three reasons: 1) Long period between releases, 2) Huge growth in GNU Classpath, 3) 0.10 for the first time includes source file names and line number tables (to be able to show source files and line numbers in stack traces).

The size reduction in 0.16 was due to a more efficient format for the line number tables. After making this optimization in 0.16, I wanted to investigate exactly what makes up the size of IKVM.GNU.Classpath.dll, but I couldn't find any tools to analyse a managed PE file based on this criterion. So I opened the ECMA specification and hacked together some code. Here's what I found:

First, let's start with the size of the Java classes and resources that IKVM.GNU.Classpath.dll consists of (to have some reference):

                bytes
Classes     9,694,349
Resources   2,016,851
Total      11,711,200
Zipped      5,843,852

So, compared with the uncompressed size of the classes and resources, the size of IKVM.GNU.Classpath.dll isn't too bad at all.

Here's a breakdown of the parts of the PE file structure:

                         bytes
PE Headers/overhead      4,096
.text section        6,770,688
.rsrc section            4,096
.reloc section           4,096

Not very interesting, except maybe that there is a .reloc section that I don't understand the need for, since there's only managed code in this module.

A little more interesting is a breakdown of the .text section:

                        bytes
Unknown                     8
CLI Header                 72
Code + resources    3,651,236
Managed metadata    3,116,284
Filler (alignment)      3,088

Here's a breakdown of the managed metadata:

                     bytes
Metadata Header         32
#~ header               12
#Strings header         20
#US header              12
#GUID header            16
#Blob header            16
#~ stream        1,704,068
String heap        356,048
Userstring heap    327,652
GUID heap               16
Blob heap          728,392

And finally, a breakdown of the #~ stream:

                          bytes
Header                      112
Module table                 12
TypeRef table             2,210
TypeDef table            79,146
Field table             148,620
Method table            669,870
Param table             220,448
InterfaceImpl table       9,472
MemberRef table           5,424
Constant table           47,370
CustomAttribute table   488,676
StandAloneSig table      24,216
PropertyMap table             8
Property table               90
MethodSemantics table        66
MethodImpl table            600
ModuleRef table               8
TypeSpec table              400
ImplMap table               108
Assembly table               28
AssemblyRef table           112
ManifestResource table    3,654
NestedClass table         3,416
Filler (alignment)            2

For those unfamiliar with the CLI metadata specification, these tables contain fixed length records and the fields in the records typically contain flags, indexes into other tables or pointers into a string, userstring or blob heap, or an offset into the Code + resources part of the .text section.
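
Because the records are fixed length, the table sizes above can be cross-checked against the heap sizes. The following is a rough sketch, assuming the record layouts of ECMA-335 Partition II; it is not the analysis tool mentioned earlier, and the class and method names are made up. The 4-byte coded indexes in the CustomAttribute record are an assumption that follows from the Method table being large.

```java
// Sketch only: derives record sizes from the heap sizes reported above.
class MetadataRowSizes {
    // Heap sizes in bytes, taken from the managed metadata breakdown.
    static final long STRING_HEAP = 356_048;
    static final long GUID_HEAP = 16;
    static final long BLOB_HEAP = 728_392;

    // A heap index takes 4 bytes once offsets no longer fit in 16 bits, else 2.
    static int heapIndexSize(long heapBytes) {
        return heapBytes > 0xFFFF ? 4 : 2;
    }

    // Module: Generation (2), Name (string index), Mvid, EncId, EncBaseId (GUID indexes).
    static int moduleRowSize() {
        return 2 + heapIndexSize(STRING_HEAP) + 3 * heapIndexSize(GUID_HEAP);
    }

    // Assembly: HashAlgId (4), four version parts (4 x 2), Flags (4),
    // PublicKey (blob index), Name and Culture (string indexes).
    static int assemblyRowSize() {
        return 4 + 4 * 2 + 4 + heapIndexSize(BLOB_HEAP) + 2 * heapIndexSize(STRING_HEAP);
    }

    // CustomAttribute: Parent and Type (coded indexes, ASSUMED 4 bytes each here),
    // Value (blob index).
    static int customAttributeRowSize() {
        return 4 + 4 + heapIndexSize(BLOB_HEAP);
    }

    // Number of records in a table of the given size; rejects sizes that don't divide evenly.
    static long rowCount(long tableBytes, int rowSize) {
        if (tableBytes % rowSize != 0) {
            throw new IllegalArgumentException("table size is not a multiple of the row size");
        }
        return tableBytes / rowSize;
    }

    public static void main(String[] args) {
        System.out.println("Module rows: " + rowCount(12, moduleRowSize()));
        System.out.println("Assembly rows: " + rowCount(28, assemblyRowSize()));
        System.out.println("CustomAttribute rows: " + rowCount(488_676, customAttributeRowSize()));
    }
}
```

Under these assumptions the Module and Assembly tables each hold a single record, and the CustomAttribute table holds 40,723 records of 12 bytes each.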

From the above table it should be obvious that the custom attributes contribute a significant part of the file size (and remember, this is the 0.18 version of IKVM.GNU.Classpath.dll that has already been optimized quite a bit).

In my analysis tool I built specific code to look at the sizes of the custom attributes and here's the report it generated:

One notable item is that the line number tables have grown in 0.18. This is because 0.18 has been compiled with the Eclipse Java Compiler, whereas 0.16 was compiled with Jikes. For some reason, the Eclipse compiler generates larger line number tables. I haven't investigated this yet.

The new -strictfinalfieldsemantics ikvmc option was the direct result of studying the impact of metadata on the file size. Without this option, public and protected final fields are converted into readonly properties and the field requires an additional attribute. With the option, final fields are converted into initonly fields, which have the same semantics under a strict interpretation of the 1.5 VM specification. This option alone saves 155,648 bytes.

Looking at the custom attribute sizes, there appears to be more room for improvement. In particular, the ThrowsAttribute, InnerClassAttribute and ImplementsAttribute can benefit from using tokens instead of encoding the class names in the constructor blob, but the required APIs to resolve tokens are new in Whidbey, so for the time being that isn't an option.

Another long term improvement would be to include the line number tables in the method IL (or after the IL), to save on the records in the custom attribute table (which contribute a very significant 358,368 bytes). This would probably be possible in Whidbey by using MethodBody.GetILAsByteArray(), but it would be nicer if the ECMA spec would be extended to support this directly (it would also remove the need for the ridiculously large PDB files simply to get line numbers in stack traces for other .NET applications).
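
To give a feel for how much the encoding of a line number table affects its size, here is a small sketch of one compact scheme. It is a hypothetical format, not the one IKVM uses: each entry stores only the difference from the previous entry, and small values take a single byte. The class name and the sample data are made up.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical encoding, NOT IKVM's actual format.
class LineNumberTableSketch {
    // Writes an unsigned value 7 bits at a time; the high bit marks "more bytes follow".
    static void writeVarUInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Maps signed deltas to unsigned values so that small negatives stay small.
    static int zigZag(int value) {
        return (value << 1) ^ (value >> 31);
    }

    // ilOffsets must be ascending; lineNumbers may go up or down.
    static byte[] encode(int[] ilOffsets, int[] lineNumbers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarUInt(out, ilOffsets.length);          // entry count
        int prevOffset = 0;
        int prevLine = 0;
        for (int i = 0; i < ilOffsets.length; i++) {
            writeVarUInt(out, ilOffsets[i] - prevOffset);
            writeVarUInt(out, zigZag(lineNumbers[i] - prevLine));
            prevOffset = ilOffsets[i];
            prevLine = lineNumbers[i];
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] offsets = {0, 7, 19, 30};
        int[] lines = {120, 121, 123, 122};
        byte[] packed = encode(offsets, lines);
        // A Java class file spends 4 bytes per entry (two 16-bit values).
        System.out.println(packed.length + " bytes packed, "
                + offsets.length * 4 + " bytes as fixed 16-bit pairs");
    }
}
```

For this sample the packed form takes 10 bytes, against 16 bytes for fixed 16-bit pairs. Note that this only shrinks the blob; it does nothing about the per-method record in the custom attribute table, which is the cost discussed above.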

The previous release candidate 0.16 rc1 never actually made it to release because of a GNU Classpath showstopper bug. Thanks to Mark Wielaard for working hard to quickly do a follow-up GNU Classpath release that fixes the problem and includes a number of other improvements. I haven't made many changes to IKVM in the meantime. The only major one is that I focussed some effort on reducing the size of the metadata of IKVM.GNU.Classpath (and, as a side effect, of most other ikvmc generated assemblies as well). I also switched to the Eclipse Java Compiler because Jikes generates code that is incompatible with the new -strictfinalfieldsemantics ikvmc option.

Switched to the Eclipse Java Compiler for compiling GNU Classpath (not just the generics branch).

Added optimization to only store the source file name if the name differs from the class name + ".java".

Simplified and optimized inner class attribute metadata.

Added -strictfinalfieldsemantics option to ikvmc to generate more efficient (and 1.5 spec compliant) code. Note that this is not enabled by default for maximum compatibility with the Sun JVM (which isn't compliant with the 1.5 spec).

Rely less on HideFromJavaAttribute and more on naming conventions to use less metadata.
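
The source file name optimization in the list above boils down to storing a value only when it cannot be derived. A minimal sketch of the idea, assuming a hypothetical helper class (this is not ikvmc's code; stripping the package name and any nested class suffix before comparing is an assumption):

```java
// Hypothetical helper, not ikvmc's actual code.
class SourceFileNames {
    // Derives the conventional file name: simple outer class name + ".java".
    // ASSUMPTION: the package prefix and any "$Inner" suffix are ignored.
    static String defaultSourceFile(String className) {
        String simple = className.substring(className.lastIndexOf('.') + 1);
        int dollar = simple.indexOf('$');
        if (dollar > 0) {
            simple = simple.substring(0, dollar);
        }
        return simple + ".java";
    }

    // Returns the name to store in metadata, or null when it can be derived instead.
    static String nameToStore(String className, String sourceFile) {
        return sourceFile.equals(defaultSourceFile(className)) ? null : sourceFile;
    }

    // Reverses nameToStore(): falls back to the derived name when nothing was stored.
    static String resolve(String className, String stored) {
        return stored != null ? stored : defaultSourceFile(className);
    }

    public static void main(String[] args) {
        System.out.println(nameToStore("java.lang.String", "String.java"));
        System.out.println(nameToStore("gnu.example.Bar", "Baz.java"));
        System.out.println(resolve("java.lang.String", null));
    }
}
```

Since most classes live in a file named after the class, the common case stores nothing at all.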

GNU Classpath 0.16 was released yesterday, so I've made a new IKVM release based on it. An interesting new feature in this release is that you can now debug dynamically loaded Java classes in the June CTP of Visual Studio 2005. Note that this only works if you start your application in the Visual Studio debugger (at startup the IKVM runtime checks System.Diagnostics.Debugger.IsAttached to determine if it should emit debugging information or not).

Changed ExceptionHelper.readObject to not convert ClassNotFoundException into IOException.

Named parameters for Cast, CastArray, IsInstance and IsInstanceArray methods in ghost structures.

Removed support for "deprecated" attribute in map.xml. Marking methods deprecated in map.xml can now be done by applying the System.ObsoleteAttribute attribute.

Removed support for "hidefromjava" attribute in map.xml. Marking methods HideFromJava in map.xml can now be done by applying the IKVM.Attributes.HideFromJavaAttribute attribute.

Instancehelper methods on java.lang.String are no longer EditorBrowsable(Never).

Only emit method parameter names for public/protected methods in public types.

Interface methods and methods that don't have debug info now get synthesized parameter names.

A couple of "random" awt fixes.

Added support for P/Invoke (DllImportAttribute).

Moved ikvmc specific compiler support to AotTypeWrapper class.

Fixed member access checks (for real this time).

Moved native methods of FileChannelImpl and MappedByteBufferImpl from IKVM.Runtime to IKVM.GNU.Classpath (using P/Invoke from Java).

Added -fileversion option to ikvmc to set the unmanaged file version.

Added call to AssemblyBuilder.DefineVersionInfoResource() to ikvmc, so that a version info resource is now automatically created (the contents are based on the various assembly attributes and the -version and -fileversion options).

Fixed possible (but unlikely) NullReferenceException in ClassLoaderWrapper.FinishAll() when it encounters a type that cannot be finished for some reason.

Made the line number table encoding a little more efficient.

Changed build process to support building the GNU Classpath generics branch (in addition to the main branch).

Added a hack to ikvmstub to export generic type instantiations. This enables a usable mscorlib.jar to be generated from the 2.0 version of mscorlib.dll.

Set DebuggableAttribute to assembly before creating the module, to make sure Visual Studio (Whidbey) picks up the attribute when debugging dynamically generated code.

Fixed possible System.ArgumentException in Class.forName() (when trying to load a class with a name that is invalid in .NET).

Fixed JNI_GetCreatedJavaVMs to accept null pointer for nVMs.

Fixed class name in error message for VerifyError that occurs when overriding a final method.

For the past three years I've been using the Jikes compiler to build GNU Classpath, but unfortunately Jikes development isn't very active anymore. I wanted to start playing with the GNU Classpath generics branch (the branch that contains the classes that require 1.5 specific language features), so I needed a compiler that could compile the generics branch and was freely available. The Eclipse Java Compiler was the obvious choice, and while I ran into two bugs when I tried to compile the generics branch, the bugs were fixed within hours after I filed them. That kind of responsiveness is confidence inspiring.

The next IKVM release (hopefully due next week) will still be built with Jikes, but future releases will most likely be built with the Eclipse Java Compiler. To make this easier I've used ikvmc to compile it to ecj.exe and created a Windows installer that installs ecj.exe and adds IKVM.GNU.Classpath.dll and IKVM.Runtime.dll to the Global Assembly Cache. If you're not on Windows, you can download ecj.exe in this zipfile. Note that ecj.exe requires Mono 1.1.8.1 or later.

I did some research into supporting partial trust and it looks like it might be feasible. This snapshot already contains some changes to better support running in partial trust (particularly for IKVM.GNU.Classpath; IKVM.Runtime contains unsafe code, so it currently needs to be trusted). On .NET 1.1 none of the built-in partial trust permission sets are suitable, because I require ReflectionPermission(ReflectionPermissionFlag.TypeInformation). In Whidbey this permission flag is deprecated, so the story looks more promising there.

One of the consequences of adding partial trust support is that IKVM.Runtime.dll will need to be split into several parts. At the very least the JNI implementation will need to be in a separate assembly, so that the common non-JNI scenarios won't require SkipVerification permission.

Exception Handling

I made some major changes to exception handling in this version. However, for Java code nothing should change (except that it hopefully runs a little bit faster), but for .NET/Java interop there are some important changes:

Exceptions generated by the CLR or .NET code (e.g. System.NullReferenceException) will no longer be changed into their Java equivalents for non-Java code. This means that when you catch an exception in IKVM Java code, you'll still see the corresponding Java exception (e.g. java.lang.NullPointerException), but when you rethrow the exception, the original exception gets thrown.

When Java code explicitly throws a .NET exception (e.g. System.NullReferenceException) it is no longer remapped to the Java equivalent.

Catching exceptions now faithfully corresponds to the IKVM type system. This means that you can now use catch(cli.System.Exception) to catch the unremapped .NET exceptions.

This is a major step towards my ultimate vision for exception handling, but I'm not nearly there yet. Other changes I want to make include adding more exception state to java.lang.Throwable instead of the WeakHashMap construct that is currently used (the WeakHashMap will still be required to associate the .NET exceptions with their remapped Java exceptions). I also want to use exception filters to check for remapped exceptions, to make the debugging experience better. Finally, the bytecode compiler needs to be improved to recognize try {} finally {} constructs so that they can be compiled as .NET try {} finally {} blocks, instead of the currently used and vastly less efficient try {} catch() { throw; }.
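
For readers who haven't seen the WeakHashMap pattern mentioned above, here is a minimal sketch of a side table that associates one exception object with another without keeping it alive. It is illustrative only: the class and method names are made up, this is not IKVM's code, and a plain Java exception stands in for the .NET one.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Illustrative only: hypothetical names, not IKVM's implementation.
class ExceptionSideTable {
    // Weak keys: an entry can disappear once its key exception is garbage collected.
    private static final Map<Throwable, Throwable> remapped =
            Collections.synchronizedMap(new WeakHashMap<Throwable, Throwable>());

    // Remembers which Java-side exception stands in for an original exception.
    static void associate(Throwable original, Throwable javaView) {
        remapped.put(original, javaView);
    }

    // Returns the associated Java-side exception, or the original if none was recorded.
    static Throwable javaViewOf(Throwable original) {
        Throwable view = remapped.get(original);
        return view != null ? view : original;
    }

    public static void main(String[] args) {
        Throwable original = new IllegalStateException("stand-in for a .NET exception");
        Throwable javaView = new NullPointerException();
        associate(original, javaView);
        System.out.println(javaViewOf(original) == javaView);
    }
}
```

The trade-off is a map lookup on every access, which is why state that belongs to every Java exception is better stored directly in the Throwable object. One caveat with this pattern: WeakHashMap holds its values strongly, so a value that refers back to its key prevents the entry from ever being collected.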

Enabled generation of debug info when a debugger is attached (at the time the runtime is initializing), to allow debugging of dynamically generated code (a Whidbey feature, although in beta 2 it doesn't work yet).

Added check to ikvmc to make sure that referenced ikvmc-generated assemblies were compiled with the same version of the ikvm runtime.