VM Interface

Just a quick post to mention some changes I’ve been working on for IcedTea (the JDK 7 tree) and finally committed last night:

The build is now based on OpenJDK b33 (just as b34 is posted…). This update was delayed, first by CORBA build issues with b32 and then by problems with javah and the new Java-based NIO generator in b33.

You can now build against something other than the JDK tree by using --enable-hg and --with-project. The currently supported values for --with-project are caciocavallo, closures, cvmi and bsd.

While working out where the class library makes calls to the VM can be tricky at times, such points are usually well-delimited (e.g. all in jvm.h in OpenJDK, as we saw last time) and there are a variety of clues to help find them. Firstly, if the VM is not written in Java, then the methods that may call into it are already limited to those declared native. Secondly, failures in this area usually produce clear runtime errors such as linking errors (e.g. libjava.so can’t resolve the symbol JVM_IHashCode on loading). Thus, the main questions to deal with are usually ‘what should this method do?’ and ‘what state(s) can I expect this method to be executed from?’.
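Because only native methods can reach a non-Java VM, reflection gives a quick first inventory of the candidate entry points in a class. The sketch below (the class name NativeMethodLister is made up for illustration) lists the native methods a class declares; the exact result depends on the class library version it runs against.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the native methods a class declares directly. */
class NativeMethodLister {

    /** Returns the sorted names of the native methods declared by cls (overloads may repeat). */
    static List<String> nativeMethods(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        names.sort(null); // natural (alphabetical) order
        return names;
    }

    public static void main(String[] args) {
        // On an OpenJDK class library this includes methods such as hashCode and notify,
        // which are the kind of methods that end up in JVM_* functions.
        System.out.println(nativeMethods(Object.class));
    }
}
```

Anything not in this list is plain Java and, for a native VM, cannot be where a missing VM symbol bites.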

Library calls made from the VM are much more subtle. First of all, they of course vary from VM to VM. Some VMs may choose to call up to the library much more often than others. This is certainly the case with a Java-based VM, where the boundaries between the two are less clear, being delimited by packages rather than language/linking boundaries (though this applies a little in the opposite direction too). Over the past few weeks, I’ve been attempting to get JamVM to run with the OpenJDK class library, simply by swapping glibj.zip from GNU Classpath for the rt.jar from OpenJDK*. This has revealed a number of cases where both JamVM and HotSpot call methods in their corresponding class libraries and thus depend on their presence.

As mentioned briefly last time, one of the first things I noticed was the assumption that the VM boot process would load libjava.so. As CACAO has also done, I had to add this to JamVM’s boot process, in initialiseNatives.
Another area where an unspoken relationship between the VM and the class library exists is the 1.4 JNI NIO support. The VM needs to be able to create a java.nio.DirectByteBuffer which maps onto a native buffer underneath. Both OpenJDK/HotSpot and GNU Classpath/JamVM do this in a similar way, but there are two notable differences between them:

The pointer to the buffer is passed from HotSpot to the OpenJDK class library as a jlong (a Java long, which is always 64-bit, not to be confused with a C long, whose size varies by platform). GNU Classpath instead wraps pointers in a class called either Pointer32 or Pointer64, both of which are subclasses of gnu.classpath.Pointer. This has the advantage of making it clearer that the stored number is a pointer, at the cost of having to create an object instance.

For now, I’ve altered JamVM to use the OpenJDK/HotSpot way of doing things, but it may be worth providing a Pointer class in OpenJDK, as it does have safety advantages. The only other minor difference is that the GNU Classpath constructor seems to take a few more arguments, but the same values were being provided by both JamVM and the superconstructor call in the OpenJDK class library.
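To make the trade-off concrete, here is a minimal sketch of a pointer wrapper in the spirit of GNU Classpath’s gnu.classpath.Pointer hierarchy. The names PointerSketch, Pointer32Sketch and Pointer64Sketch, the address() method and the of() factory are assumptions for illustration, not Classpath’s actual API. The type records “this value is an address”, whereas the OpenJDK approach passes the same number as a bare long.

```java
/** Hypothetical wrapper marking a value as a native address (loosely modelled on gnu.classpath.Pointer). */
abstract class PointerSketch {

    /** The address widened to 64 bits, i.e. the value OpenJDK would pass as a jlong. */
    abstract long address();

    /** Picks the wrapper width for a platform with the given pointer size in bits. */
    static PointerSketch of(long address, int pointerBits) {
        if (pointerBits == 32) {
            return new Pointer32Sketch((int) address);
        }
        if (pointerBits == 64) {
            return new Pointer64Sketch(address);
        }
        throw new IllegalArgumentException("unsupported pointer size: " + pointerBits);
    }
}

/** 32-bit addresses: stored in an int and read back as an unsigned value. */
final class Pointer32Sketch extends PointerSketch {
    final int data;

    Pointer32Sketch(int data) { this.data = data; }

    @Override
    long address() { return data & 0xFFFFFFFFL; }
}

/** 64-bit addresses: stored unchanged. */
final class Pointer64Sketch extends PointerSketch {
    final long data;

    Pointer64Sketch(long data) { this.data = data; }

    @Override
    long address() { return data; }
}
```

Every call to PointerSketch.of allocates an object, which is the cost mentioned above; a bare long avoids the allocation but carries nothing in its type to say it is an address.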

Both issues highlighted so far have fairly trivial solutions and involve minimal interaction between the VM and class library. The biggest interaction point between the two, however, is at boot time. It is the VM that is started by the user (or, in the case of OpenJDK, the launcher, which then starts the VM via JNI), yet the user’s code actually executes against the class library, not the VM. This again differs between native and non-native VMs: native VMs must also switch from native code to Java code before the user code can be executed.

The HotSpot runtime overview, written by the HotSpot team, describes the boot process well from HotSpot’s perspective, although it doesn’t go into full detail on the interaction with the class library. Here, we concern ourselves mainly with the launcher’s call to JNI_CreateJavaVM, which takes us into src/share/vm/prims/jni.cpp in the HotSpot VM, and the class and thread initialisation that follows.

Most of the work (steps 2-12 in that overview) actually happens in the create_vm method in src/share/vm/runtime/thread.cpp. The part of most interest for VM-to-library interaction is the process by which the native OS thread is linked to a java.lang.Thread object. HotSpot makes three calls to the class library:

(create_initial_thread_group) It creates an instance of java.lang.ThreadGroup using its no-arg constructor. This constructor is private and so can only be accessed by HotSpot for this purpose. This default constructor creates the root thread group, “system”.

(create_initial_thread_group) It creates another java.lang.ThreadGroup instance using the public constructor which takes a parent group and a name. It uses this to create a child of “system” called “main”.

(create_initial_thread) Calls the public constructor of java.lang.Thread to create a thread in the “main” group with the name “main”.
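The hierarchy built by these three calls is visible from ordinary Java code. The sketch below (class and method names are made up) walks from a thread’s group up to the root, and repeats calls 2 and 3 with the same public constructors. Call 1 cannot be repeated from user code because the no-argument ThreadGroup constructor is private.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for inspecting and mimicking HotSpot's boot-time thread setup. */
class BootThreadGroups {

    /** Names of the thread's group and each ancestor group, innermost first. */
    static List<String> groupChain(Thread thread) {
        List<String> chain = new ArrayList<>();
        for (ThreadGroup g = thread.getThreadGroup(); g != null; g = g.getParent()) {
            chain.add(g.getName());
        }
        return chain;
    }

    /** Mirrors calls 2 and 3: a "main" group under parent, then an unstarted "main" thread in it. */
    static Thread createMainThread(ThreadGroup parent) {
        ThreadGroup main = new ThreadGroup(parent, "main");
        return new Thread(main, "main");
    }

    public static void main(String[] args) {
        // Run from the main thread on HotSpot, this is expected to print [main, system].
        System.out.println(groupChain(Thread.currentThread()));
    }
}
```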

Apart from this, it initialises several core classes including java.lang.String, java.lang.reflect.Method, java.lang.ref.Finalizer and java.lang.Class, along with a number of exception and error classes such as java.lang.OutOfMemoryError. Additional classes are initialised if some options are enabled, such as java.util.HashMap if aggressive optimisations are turned on.

Although parts of this are specific to HotSpot, there is an implicit assumption in the class library that, for example, certain classes will have been initialised before the VM is fully operational. Thus any other VM implementing this interface has to do the same, calling the same internal methods, such as the private thread group constructor described above. GNU Classpath has similar requirements, and in both cases it is important that we make such requirements explicit.
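For a VM written in Java, the equivalent step could be as simple as forcing initialisation of a fixed list of classes, in order, before any user code runs. A minimal sketch, assuming a hypothetical class name and an illustrative class list (a subset of the classes named above, not HotSpot’s complete list):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical boot step: initialise core classes, in order, before running user code. */
class CoreClassInit {

    /** Illustrative subset of the classes HotSpot initialises at boot. */
    static final String[] CORE_CLASSES = {
        "java.lang.String",
        "java.lang.Class",
        "java.lang.reflect.Method",
        "java.lang.OutOfMemoryError",
    };

    /** Loads and initialises each class via the bootstrap loader, failing fast if one is missing. */
    static List<Class<?>> initialise(String... names) {
        List<Class<?>> done = new ArrayList<>();
        for (String name : names) {
            try {
                // initialize = true runs static initialisers; a null loader means the bootstrap loader.
                done.add(Class.forName(name, true, null));
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException("class library lacks " + name, e);
            }
        }
        return done;
    }
}
```

Failing fast turns an unstated class-library assumption into an immediate, explicit error at boot rather than a confusing failure later.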

Note that GNU Classpath’s handling of threads differs considerably and I think there is some room for improvement here. JamVM does not pass or create any thread groups. An internal constructor expects a java.lang.VMThread which is then stored in java.lang.Thread and used for later calls such as start. Notably, the VM has to remember to set the group after the constructor concludes, and the hierarchy is different; GNU Classpath provides a root group called “main” which is created on java.lang.Thread class initialisation and JamVM places the main thread in that. There is no “system” group.

Both solutions leave the management of the thread itself to the VM, and it’s hard to see the advantage of GNU Classpath storing the VMThread; calling general VMThread methods with the Thread instance may be a better solution. Where possible, general work should be moved out of the VM and into the class library, so it would be better if the constructor of Thread handled the group addition rather than relying on the VM to remember to do it.
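A minimal sketch of that suggestion, using hypothetical stand-in classes rather than the real java.lang.Thread and java.lang.ThreadGroup: the Thread constructor adds the thread to its group itself, and no VMThread reference is stored, so there is no post-construction step for a VM to forget.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for java.lang.ThreadGroup. */
class GroupSketch {
    final String name;
    final GroupSketch parent;
    private final List<ThreadSketch> members = new ArrayList<>();

    GroupSketch(GroupSketch parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    void add(ThreadSketch t) { members.add(t); }

    boolean contains(ThreadSketch t) { return members.contains(t); }
}

/** Hypothetical stand-in for java.lang.Thread; VM-specific state stays inside the VM. */
class ThreadSketch {
    final String name;
    final GroupSketch group;

    ThreadSketch(GroupSketch group, String name) {
        this.group = group;
        this.name = name;
        // Group membership is established here, in library code, not by the VM afterwards.
        group.add(this);
    }
}
```

A VM booting against this would only construct the “main” thread; VM operations such as start would be called with the ThreadSketch instance itself.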

I’d welcome comments on how different VMs handle this bootup process and what is the best solution for ensuring that the method contract between VM and class library is well documented and adhered to.

* Note that this means we still use JamVM as the launcher for now. CACAO’s OpenJDK implementation takes a different approach: CACAO merely provides a replacement libjvm.so in a JDK tree and thus uses the same launcher as HotSpot.

The biggest difference between the GNU Classpath and Sun OpenJDK VM interfaces is the point at which control shifts from the library to the virtual machine. Both solutions do provide separation between the library and the VM. Contrary to what may be initially assumed, this is true of OpenJDK even though both HotSpot and the JDK are maintained in the same location. This is what allows different versions of HotSpot to be swapped in, as mentioned in the OpenJDK trademark license. That said, the JDK within OpenJDK is likely to be more closely tied to HotSpot than GNU Classpath is to any of its VMs, simply because there are many Classpath VMs and they differ considerably from one another.

OpenJDK’s VM interface is entirely C-based. The class library calls into the VM using a number of functions with the prefix ‘JVM_’ that are listed in src/share/javavm/export/jvm.h. Implementations of these functions can be found in both HotSpot’s src/share/vm/prims/jvm.cpp and CACAO’s src/native/vm/openjdk/jvm.c. They are dynamically linked at runtime from a variety of dynamic libraries held in jre/lib/${arch}, where ${arch} is the architecture in use, such as amd64. The output below shows their use in the recent b31 drop:

Clearly most of these are found in libjava.so, which is the library that contains the native code for the core classes like those in java.lang. What’s interesting about libjava is that, unlike for example libnet, it isn’t loaded via a call to LoadLibrary in the class library code. Rather, it is expected that the VM will know about and load this. In CACAO, a special OpenJDK case is provided in src/native/vm/nativevm.c to handle this. This needs to happen early in the VM initialisation process before any of the native calls in the core classes are used.
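The VM therefore has to know where libjava lives. Assuming the jre/lib/${arch} layout described above, a hypothetical helper can compute the path using System.mapLibraryName, which turns a library name into the platform’s file name (libjava.so on Linux). A native VM would then load that file with its platform loader early in boot, as CACAO does in nativevm.c.

```java
import java.io.File;

/** Hypothetical helper: where a VM would look for the core native library in a JRE image. */
class CoreLibraryPath {

    /** Builds ${jreHome}/lib/${arch}/<platform file name for "java">. */
    static String libjavaPath(String jreHome, String arch) {
        File dir = new File(new File(jreHome, "lib"), arch);
        return new File(dir, System.mapLibraryName("java")).getPath();
    }

    public static void main(String[] args) {
        // The JRE location here is a made-up example; on Linux this ends in lib/amd64/libjava.so.
        System.out.println(libjavaPath("/usr/lib/jvm/java-7-openjdk/jre", "amd64"));
    }
}
```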

In contrast, GNU Classpath interacts with the VM while still at the Java level. Each call which may need to go to the VM is first handed off to a package-private VM class; for example, vm/reference/java/lang/VMObject.java provides methods like wait() and notify(). The reference versions of these classes tend to do what OpenJDK does directly in classes like java.lang.Object: declare the methods as native. However, a VM can replace these VM* classes as needed. Some replacements are mandatory, as it isn’t possible to provide a generic reference implementation in the class library. In both cases, missing implementations show up as linking errors at runtime.
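A minimal sketch of that delegation pattern, with hypothetical class names standing in for java.lang.Object and java.lang.VMObject. In a reference VMObject the method would be declared native; here it forwards to the real Object.wait so the sketch runs on any JVM. The argument check follows the documented contract of Object.wait(long, int).

```java
/** Hypothetical stand-in for a package-private VM class such as VMObject. */
final class VMObjectSketch {
    private VMObjectSketch() {}

    /** A reference implementation would be native; a VM may ship its own version of this class. */
    static void waitOn(Object o, long ms, int ns) throws InterruptedException {
        // Throws IllegalMonitorStateException if the caller does not hold o's monitor.
        o.wait(ms, ns);
    }
}

/** Hypothetical stand-in for java.lang.Object: public API and checks in Java, VM work delegated. */
class ObjectSketch {
    public final void waitFor(long ms, int ns) throws InterruptedException {
        if (ms < 0 || ns < 0 || ns > 999_999) {
            throw new IllegalArgumentException("argument out of range");
        }
        VMObjectSketch.waitOn(this, ms, ns);
    }
}
```

Keeping the checks in the public class means every VM gets them for free; only the VM-specific part sits behind the replaceable class.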

To further clarify the difference, let’s trace the path of one such function, java.lang.Object#wait(). In GNU Classpath, java.lang.Object defines it as:

In some cases, there is also a reference native implementation under native/jni/${package-name}/${package_name}_${class_name}.c (native/jni/java-lang/java_lang_VMObject.c in this case), but this isn’t the case here. Instead, we look to CACAO and find this in src/native/vm/gnuclasspath/java_lang_VMObject.cpp:

Unlike with GNU Classpath, the java.lang.Object code goes straight to the native implementation found in src/share/native/java/lang/Object.c. However, we don’t find an implementation there. Instead, we need to look at another function in Object.c called registerNatives:

The array, methods, tells us that wait maps to JVM_MonitorWait. Looking at the list above, we see that JVM_MonitorWait is one of the VM symbols in the java library. And sure enough, we find it in src/share/javavm/export/jvm.h:

Clearly, there isn’t that much difference between the two. The main issue is finding the right functions and where and how they are called. One other major difference is that Classpath VMs provide their own launchers (you run ‘cacao’, ‘jamvm’, ‘gij’, ‘kaffe’ etc. binaries), while the ‘java’ in OpenJDK is a standard launcher (also used for other tools) which invokes the VM via JNI and libjvm.so. There’s also a mechanism for selecting the VM in this manner through src/(solaris|windows)/bin/${arch}/jvm.cfg, which we’ll look at later.
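The OpenJDK half of the Object.wait trace above is essentially a table lookup keyed on a method’s name and JNI descriptor. The sketch below models it in Java: the descriptor builder follows the JNI type-signature rules (J for long, V for void, L…; for classes), while the table is an abridged, hypothetical model holding only two mappings: wait to JVM_MonitorWait, and hashCode to JVM_IHashCode, the symbol mentioned earlier.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a registerNatives-style table: name + descriptor -> VM symbol. */
class NativeBindings {

    private static final Map<String, String> OBJECT_TABLE = new HashMap<>();

    static {
        // Abridged: only the two entries discussed in the text.
        OBJECT_TABLE.put("hashCode()I", "JVM_IHashCode");
        OBJECT_TABLE.put("wait(J)V", "JVM_MonitorWait");
    }

    /** Builds a JNI method descriptor, e.g. "(J)V" for a void method taking a long. */
    static String descriptor(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            sb.append(typeCode(p));
        }
        return sb.append(')').append(typeCode(returnType)).toString();
    }

    /** JNI type signature for a single type. */
    static String typeCode(Class<?> c) {
        if (c == void.class) return "V";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        if (c.isArray()) return c.getName().replace('.', '/'); // e.g. "[Ljava/lang/String;"
        return "L" + c.getName().replace('.', '/') + ";";
    }

    /** Returns the VM symbol bound to the method, or null if the table has no entry. */
    static String vmSymbol(String name, String descriptor) {
        return OBJECT_TABLE.get(name + descriptor);
    }
}
```

A null result corresponds to the runtime linking error case: the class library expects a VM function that nothing provides.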

With the CVMI project, we aim to document these issues and also experiment with providing an OpenJDK interface more at the Java level. Comments and suggestions welcome.

I read Mario and Roman’s posts this morning and they inspired me to post too, after realising that there was a lot of stuff that’s been going on that I also hadn’t blogged about. Firstly, Google Summer of Code closed its doors to student applications on Monday (well Tuesday really, here in Europe) and we were rather disappointed to find only two applications for GNU Classpath, both for java.util.Scanner. This is a real pity, as I think we had some really good ideas on the list (and even more interesting ones were left to one side after AICAS didn’t get in as a mentoring organisation).

I think students picked Scanner because it’s used on many undergraduate courses these days (something of which I was blissfully unaware, as it’s years since I did any introductory Java course or read an introductory book). Unfortunately, the future of such a project is really dubious, as I explained to our first applicant online, because there is already an implementation (no idea how good or complete) by a student of Christian Thalinger (twisti) which we hope to get into the codebase once the legalities are sorted out. Equally, we are looking at using BrandWeg to get the OpenJDK version; this has only really ground to a halt because I found we’d need to either update our regex implementation or also bring in the OpenJDK one, and I simply haven’t had time. Again, BrandWeg was on the ideas list, but there were no takers.

Personally, I’ve mainly been looking at IcedTea recently, in preparation for the OpenJDK challenge work. I haven’t really blogged on the details of this yet, but my first port of call will be to take an OpenJDK/IcedTea build and attempt to get its class library to work with one of our Free GNU Classpath VMs. Rather than trying to build the Sun interface into that VM, I want to use that process to figure out the best way to create a better-documented VM interface, including support for VMs that don’t want to go the native route for the VM interface. I intend this to be an interactive process, so I’ll be posting results and hoping for feedback as we go along. Once the OpenJDK challenge infrastructure is sorted, I expect this will take place under the auspices of an OpenJDK project. I’ll be tagging appropriate posts with the ‘VM interface’ category, so feel free to track them.

One question that does spring to mind is whether the challenge projects are intended to work with OpenJDK or OpenJDK6. IcedTea work has pretty much shifted to OpenJDK6 (in the form of IcedTea6) with good reason; distros want to ship a stable Free equivalent to JDK6, not an early alpha of JDK7 (which doesn’t yet even have a JSR attached to it). Ideally, I think we should maintain both, as the IcedTea porting process seemed to suggest the differences weren’t too major. But feedback here would be welcomed. I’d also like to make the new VM work easy for others to test, so hopefully some integration with IcedTea will be possible there too. I’m very impressed with IcedTea so far, especially Gary’s zero port (as my previous blogs hopefully illustrate). There are some niggles, but lots of eyes and people testing it in different environments will fix these. It’s great to have OpenJDK on PPC64 for one thing!

To close, I notice that people are moving around in the Free Java world (as my reference to Mario’s title in mine hopefully indicates). First, Tom Marble left Sun earlier this year, and now we find that Dalibor has successfully stepped into his (hopefully clean) shoes. To me, there doesn’t seem to be a more appropriate replacement, though it does make his governance board position interesting… Mario has also now moved to AICAS, so it seems like most people are getting new jobs and different roles in the new Free Java world brought about by OpenJDK. Things will change for me as well soon, as I’m due to finish my PhD here in October (well the funding runs out at least, which means I need some alternate source of cash at any rate). So it will be interesting to see where I am in a year from now too…