Sunday, October 15, 2006

Did you ever encounter a java.lang.OutOfMemoryError: PermGen space error when you redeployed your application to an application server? Did you curse the application server, while restarting the application server, to continue with your work thinking that this is clearly a bug in the application server. Those application server developers should get their act together, shouldn't they? Well, perhaps. But perhaps it's really your fault!

Try to redeploy this little sample a number of times. I bet this will eventually fail with the dreaded java.lang.OutOfMemoryError: PermGen space error. If you like to understand what's happening, read on.

The problem in a nutshell

Application servers such as Glassfish allow you to write an application (.ear, .war, etc) and deploy this application with other applications on this application server. Should you feel the need to make a change to your application, you can simply make the change in your source code, compile the source, and redeploy the application without affecting the other still running applications in the application server: you don't need to restart the application server. This mechanism works fine on Glassfish and other application servers (e.g. Java CAPS Integration Server).

The way that this works is that each application is loaded using its own classloader. Simply put, a classloader is a special class that loads .class files from jar files. When you undeploy the application, the classloader is discarded and it and all the classes that it loaded, should be garbage collected sooner or later.

Somehow, something may hold on to the classloader however, and prevent it from being garbage collected. And that's what's causing the java.lang.OutOfMemoryError: PermGen space exception.

PermGen space

What is PermGen space anyways? The memory in the Virtual Machine is divided into a number of regions. One of these regions is PermGen. It's an area of memory that is used to (among other things) load class files. The size of this memory region is fixed, i.e. it does not change when the VM is running. You can specify the size of this region with a commandline switch: -XX:MaxPermSize . The default is 64 Mb on the Sun VMs.

If there's a problem with garbage collecting classes and if you keep loading new classes, the VM will run out of space in that memory region, even if there's plenty of memory available on the heap. Setting the -Xmx parameter will not help: this parameter only specifies the size of the total heap and does not affect the size of the PermGen region.

Garbage collecting and classloaders

When you write something silly like

private void x1() {
for (;;) {
List c = new ArrayList();
}
}

you're continuously allocating objects; yet the program doesn't run out of memory: the objects that you create are garbage collected thereby freeing up space so that you can allocate another object. An object can only be garbage collected if the object is "unreachable". What this means is that there is no way to access the object from anywhere in the program. If nobody can access the object, there's no point in keeping the object, so it gets garbage collected. Let's take a look at the memory picture of the servlet example. First, let's even further simplify this example:

After loading the above servlet, the following objects are in memory (ofcourse limited to the relevant ones):
In this picture you see the objects loaded by the application classloader in yellow, and the rest in green. You see a simplified container object that holds references to the application classloader that was created just for this application, and to the servlet instance so that the container can invoke the doGet() method on it when a web request comes in. Note that the STATICNAME object is owned by the class object. Other important things to notice:

Like each object, the Servlet1 instance holds a reference to its class (Servlet1.class).

Each class object (e.g. Servlet1.class) holds a reference to the classloader that loaded it.

Each classloader holds references to all the classes that it loaded.

The important consequence of this is that whenever an object outside of AppClassloader holds a reference to an object loaded by AppClassloader, none of the classes can be garbage collected.

To illustrate this, let's see what happens when the application gets undeployed: the Container object nullifies its references to the Servlet1 instance and to the AppClassloader object.
As you can see, none of the objects are reachable, so they all can be garbage collected. Now let's see what happens when we use the original example where we use the Level class:

Here known is a static ArrayList in the Level class. Now what happens if the application is undeployed?

Only the LeakServlet object can be garbage collected. Because of the reference to the CUSTOMLEVEL object from outside of AppClassloader, the CUSTOMLEVEL anyonymous class objects (LeakServlet$1.class) cannot be garbage collected, and through that neither can the AppClassloader, and hence none of the classes that the AppClassloader loaded can be garbage collected.

Conclusion:any reference from outside the application to an object in the application of which the class is loaded by the application's classloader will cause a classloader leak.

More sneaky problems

I don't blame you if you didn't see the problem with the Level class: it's sneaky. Last year we had some undeployment problems in our application server. My team, in particular Edward Chou, spent some time to track them all down. Next to the problem with Level, here are some other problems Edward and I encountered. For instance, if you happen to use some of the Apache Commons BeanHelper's code: there's a static cache in that code that refers to Method objects. The Method object holds a reference to the class the Method points to. Not a problem if the Apache Commons code is loaded in your application's classloader. However, you do have a problem if this code is also present in the classpath of the application server because those classes take precedence. As a result now you have references to classes in your application from the application server's classloader... a classloader leak!

I did not mentiond yet the simplest recipe for disaster: a thread started by the application while the thread does not exit after the application is undeployed.

Detection and solution

Classloader leaks are difficult. Detecting if there's such a leak without having to deploy/undeploy a large number of times is difficult. Finding the source of a classloader leak is even trickier. This is because all the profilers that we tried at the time, did not follow links through classloaders. Therefore we resorted to writing some custom code to find the leaks from memory dump files. Since that exercise, new tools came to market in JDK 6. The next blog will outline what the easiest approach today is for tracking down a glassloader leak.