Sunday, March 22, 2009

My colleague Murali Pottlapelli ran into an interesting problem the other day. He added Rhino to his BPEL Service Engine, and saw this error happen:

java.lang.LinkageError: loader constraint violation: loader (instance of <bootloader>) previously initiated loading for a different type with name "org/w3c/dom/UserDataHandler"

The weird thing was that this exception was thrown from a call to Class.getMethods() on a class shipped with the JVM!

Googling this problem revealed that a lot of people run into this issue, often when using OSGi. Most search results were mailing list postings from people who had hit the problem, but none of the pages properly explained what the problem was. Intuitively I felt we could solve our problem by moving a jar to a different classloader, but would that merely hide the problem? As with my post on "How to fix the dreaded "java.lang.OutOfMemoryError: PermGen space" exception (classloader leaks)", I was convinced that understanding the problem is key. So Murali and I set out to dig deeper into this problem until we completely grasped it.

As it turns out, there are some aspects of this problem that make it very confusing:

In a dynamic component system, a change in one component may cause the other components to fail in areas that used to work properly before. At the same time, the changed component appears to be working properly.

The order in which components are activated determines where and how the problem shows up.

The effects of the problem may show up in innocuous and seemingly unrelated calls such as Class.getMethods().

In the following sections I'll first illustrate the problem, and then explain in detail what's causing the problem.

The problem

Let's look at a model example. Instead of looking at an OSGi example or a JBI example, let's look at EE because it will be more familiar. Imagine we have an EAR file with an EJB and two WAR files. The WAR files are identical, and have a servlet that uses an EJB to log in. As such we have three classes: LoginEJB, User, and Servlet.

Let's say that one WAR is configured with a self-first classloader, and the other one uses the default parent-first class loading model.

Now consider these three scenarios:

We log in using the parent-first servlet, and then inspect the EJB with Class.getMethods(). Everything works as expected, but when we then try to log in on the second servlet, we see the linkage error.

We log in using the self-first servlet. Then when we call Class.getMethods() on the EJB, this fails. Also, we can no longer log in on the parent-first servlet!

We first call Class.getMethods() on the EJB. We can no longer log in using the self-first servlet, but the parent-first servlet still works.

What is going on? To explain, let's first revisit some classloader basics. If parent-first and self-first are in your daily vocabulary, feel free to skip the next section.

Self-first versus parent-first delegation

What is meant by self-first delegating classloaders? Here's the skinny on classloaders. In Java you can create your own classloader, typically for two reasons. It allows multiple versions of the same class to co-exist in memory, as is often found in OSGi. It also allows classes to be unloaded, as is found in application servers. A classloader typically represents a set of jars that make up the module, the component, or the application. Each classloader must have a parent classloader. Hence, classloaders form a tree with the bootstrap classloader as the root. See the picture above.

When a classloader is asked to load a class, it can first ask its parent to load the class. If the parent fails to load the class, that classloader will then try to load the class. In this scheme, called parent-first class loading, common classes are always loaded by the parent classloader. This allows one application or module to talk to another application or module in the same VM.

Instead of asking the parent classloader first, a classloader can also try to find a class itself first, and only if it cannot find the class would it ask the parent classloader to find the class. A self-first classloader allows for an application to have a different version of a class than found in the parent classloader.

Classloader lab

To show what's going on, I've developed a small demo that emulates the scenario with the two WARs and the EJB. Key in this demo is a custom classloader. The constructor takes a list of classes that should be defined by that classloader, i.e. the classloader behaves as self-first for those classes, and delegates to the parent classloader for the other classes. The custom classloader is listed in the code at the bottom of this post.
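
For readers who want to experiment right away, here is a minimal sketch of a classloader of this kind. It is not the listing from this post: the names SelfFirstDemo, SelectiveClassLoader and loadUser are made up for illustration, and it assumes the class files can be read as resources through the parent classloader (true for classes on the normal classpath).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SelfFirstDemo {

    /** Hypothetical stand-in for a class that two modules both package. */
    public static class User {}

    /** Self-first for the listed class names, parent-first for everything else. */
    public static class SelectiveClassLoader extends ClassLoader {
        private final Set<String> selfFirst;

        public SelectiveClassLoader(ClassLoader parent, String... selfFirstNames) {
            super(parent);
            this.selfFirst = new HashSet<>(Arrays.asList(selfFirstNames));
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!selfFirst.contains(name)) {
                return super.loadClass(name, resolve); // standard parent-first path
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name); // already defined by this loader?
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Assumption: the class file is visible as a resource through the parent chain. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads User through a fresh loader; the flag decides which loader defines it. */
    public static Class<?> loadUser(boolean selfFirst) {
        ClassLoader parent = SelfFirstDemo.class.getClassLoader();
        String name = User.class.getName();
        ClassLoader loader = selfFirst
                ? new SelectiveClassLoader(parent, name)
                : new SelectiveClassLoader(parent);
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("parent-first is same class object: " + (loadUser(false) == User.class));
        System.out.println("self-first is same class object:   " + (loadUser(true) == User.class));
    }
}
```

The only design decision in this sketch is the name check at the top of loadClass(): listed names are defined locally from the same bytes the parent would use, so the resulting class has the same name but a different defining classloader.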

This is how the system is set up: a classloader for the EJB that loads the LoginEJB and the User class, a classloader for the parent-first WAR that loads the Servlet only, and a self-first classloader that loads the Servlet and the User class.

When the JVM loads the LoginEJB class, it goes over references in the class and loads those classes too: the java.lang.Object class because it's the super class of the EJB, and the java.lang.System and java.io.PrintStream class because they are used in the static block. That these "JVM" classes are loaded is in itself remarkable and shows an interesting aspect of how classloading works. "JVM" classes are not treated specially, and it is not relevant that they are already loaded in the bootstrap classloader and are used all over the place.

When the EJB classloader receives the request to load these "JVM" classes, that classloader of course delegates those requests to the parent classloader. In fact, for classes in java.* it has no choice: the JVM refuses to let a custom classloader define classes in those packages, so those requests must be delegated. Classes in javax.* are normally delegated to the parent as well.

Also remarkable is what is not loaded: the User class. Apparently, the fact that this class is used in a method is not enough to cause this class to be loaded when the EJB class is loaded. It is difficult to predict what classes are loaded as the result of loading a particular class. I think the spec leaves a lot of room to implementers to decide when to do so.

It's important to realize that when the EJB class is loaded, the User class is not loaded yet.

The User class is loaded and defined in the self-first WAR classloader, and when the login() method is called, an instance of that class is passed to the EJB. Why does the linkage error happen?

When the parent-first servlet invoked the EJB, it passed in a User object. At that moment, the JVM links the reference to com.stc.Demo$User to a class object. A class object is identified by its fully qualified name together with the classloader that defined it. Upon invocation of login(User), the JVM will check that the class object of the User passed in matches the class object that was linked to com.stc.Demo$User. If they don't match, the JVM will throw a LinkageError.

This linking happened on the first invocation. We can also force the linking to happen by calling LoginEJB.class.getMethods(). I can illustrate that by changing the test program to first inspect the EJB class, and then have the self-first servlet log in.

The login fails because the LoginEJB.class.getMethods() invocation causes the com.stc.Demo$User reference to be linked with the User class loaded in the EJB classloader. When the self-first servlet invokes the method, the two class objects don't match, causing the Error to be thrown.
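
The inspect-then-login sequence can be sketched as a small self-contained program. This is a reconstruction under assumptions, not the demo from this post: the class names LoginEJB, User and Servlet come from the text, but LinkageDemo, SelectiveLoader, inspectThenLogin and the method bodies are made up for illustration, and the class files are assumed to be readable as resources through the parent classloader. On a HotSpot JVM the login is expected to end in a LinkageError; the exact message differs between JVM versions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class LinkageDemo {
    public static class User {}

    public static class LoginEJB {
        public void login(User user) { /* no-op: only the signature matters here */ }
    }

    public static class Servlet implements Runnable {
        @Override
        public void run() {
            new LoginEJB().login(new User());
        }
    }

    /** Defines the listed classes itself and delegates all others to its parent. */
    static class SelectiveLoader extends ClassLoader {
        private final List<String> own = new ArrayList<>();

        SelectiveLoader(ClassLoader parent, Class<?>... ownClasses) {
            super(parent);
            for (Class<?> c : ownClasses) {
                own.add(c.getName());
            }
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!own.contains(name)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c != null) {
                    return c;
                }
                // Assumption: the class bytes are visible through the parent chain.
                String path = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(path)) {
                    if (in == null) {
                        throw new ClassNotFoundException(name);
                    }
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    for (int n; (n = in.read(buf)) != -1; ) {
                        out.write(buf, 0, n);
                    }
                    return defineClass(name, out.toByteArray(), 0, out.size());
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
        }
    }

    /** Returns the error thrown by the self-first login, or null if the login succeeded. */
    public static LinkageError inspectThenLogin() {
        try {
            ClassLoader app = LinkageDemo.class.getClassLoader();
            ClassLoader ejb = new SelectiveLoader(app, LoginEJB.class, User.class);
            ClassLoader sfWeb = new SelectiveLoader(ejb, Servlet.class, User.class);

            // Building the Method objects resolves the parameter types,
            // so the EJB loader now defines its own User class.
            ejb.loadClass(LoginEJB.class.getName()).getMethods();

            Runnable servlet = (Runnable) sfWeb.loadClass(Servlet.class.getName())
                    .getDeclaredConstructor().newInstance();
            try {
                servlet.run(); // the servlet's User comes from the self-first loader
                return null;
            } catch (LinkageError e) {
                return e;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(inspectThenLogin());
    }
}
```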

Small mistake results in "poisoning" a shared class

By now it should be obvious that the User class should not have been packaged in the self-first WAR. What may not be obvious yet, is that this small mistake has big consequences. If a login happens on the self-first servlet before anything else, the linking happens with the erroneous User class object from the self-first classloader. This will cause the login of the parent-first WAR to fail. It will also cause the LoginEJB.class.getMethods() invocation to fail.

The first login with the self-first WAR has effectively poisoned the EJB, making it unusable. In everyday life, the problem is likely not so clear cut. For instance, a developer adds a jar to a component, or changes the classloading delegation model of a component, and all tests with that component may succeed. The problem may only show up in the next day's build when integration tests are run. And as the example above shows, the stacktrace does not tell much about where the real cause of the error is.

Formalization and references

A formal description of loading constraints can be found in section 5.3.4 of The Java Virtual Machine Specification (available online at http://java.sun.com/docs/books/jvms/second_edition/html/ConstantPool.doc.html). In simple words, a linkage error can occur if two different classes interact with each other, and in this interaction the classes refer to types with the same symbolic name but with different class objects. In the example, the self-first servlet referred to EJB:LoginEJB.login(sfWeb:User), but the EJB's representation was EJB:LoginEJB.login(EJB:User).

Other places where linkage errors may occur are in class hierarchies. If class Derived overrides a method f(A a, B b) in class Super, A and B as seen from Super must be the same A and B as seen from Derived. References to static variables are another example.

Thursday, March 19, 2009

You would think that subsequent calls to System.nanoTime() would return ever increasing values. After all, time cannot run backwards. However, when you run a JVM on the Hyper-V virtualization platform, it turns out that time may actually run backwards.

Hence, if you use this timer to measure time differences, you need to account for negative differences. This was a problem that I had not accounted for in the Hulp Measuring package.
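
One simple way to account for this is to clamp negative differences to zero. The sketch below only illustrates that policy; the names ElapsedTime and elapsedNanos are made up, and this is not necessarily how the Hulp Measuring package handles it.

```java
public class ElapsedTime {

    /** Elapsed nanoseconds between two System.nanoTime() readings, never negative. */
    public static long elapsedNanos(long startNanos, long endNanos) {
        long delta = endNanos - startNanos;
        // A negative delta means the timer appeared to run backwards; report zero instead.
        return Math.max(0L, delta);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // ... work being measured ...
        long elapsed = elapsedNanos(start, System.nanoTime());
        System.out.println("elapsed ns: " + elapsed);
    }
}
```

Clamping hides the anomaly rather than correcting it, so a measurement that needs to know how often time ran backwards would have to count those cases separately.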

Tuesday, March 17, 2009

Last week Masoud Kalali asked me for an interview about OpenESB. Today the interview was published. Nothing new for those who already know OpenESB and Java CAPS, but for people who look at it for the first time it may provide some useful background information.