The leak hunter faces his toughest challenge yet

Nikita Salnikov-Tarnovski recounts a nightmare twelve-hour search for the source of an application’s memory leaks.

A week ago I was asked to fix a problematic webapp suffering from
memory leaks. How hard can it be, I thought, considering that I
have both seen and fixed hundreds of leaks over the past year or
so.

But this one proved to be a challenge. Twelve hours later I had
discovered no fewer than five leaks in the application and had
managed to fix four of them. I figured it would be an experience
worth sharing.

The application at hand was a simple Java web application with a
few datasources connecting to relational databases, Spring in the
middle to glue stuff together and simple JSP pages rendered to the
end user. No magic whatsoever. Or so I thought. Boy, was I wrong.

First stop: MySQL drivers. Apparently the most common MySQL driver
launches a thread in the background that cleans up your unused and
unclosed connections. So far so good. But the catch is that the
context classloader of this newly created thread is your web
application’s classloader. This means that if the thread is still
running when you undeploy your webapp, the webapp’s classloader is
left dangling behind, with all the classes loaded in it.
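
To see why the cleanup thread ends up tied to the webapp, here is a minimal sketch of the rule involved: a thread created without an explicit context classloader picks up the one of the thread that created it. The class name, the thread name and the empty URLClassLoader standing in for a webapp classloader are all made up for illustration; nothing here is taken from the MySQL driver itself.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class, not part of any driver or of the application in the article.
class ContextLoaderInheritanceDemo {

    /**
     * Starts a worker thread while the given loader is the caller's context
     * classloader and returns the context classloader the worker ended up with.
     */
    static ClassLoader loaderSeenByNewThread(ClassLoader webappLoader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        final ClassLoader[] seen = new ClassLoader[1];
        current.setContextClassLoader(webappLoader);
        try {
            // The worker copies its creator's context classloader when it is constructed.
            Thread worker = new Thread(
                    () -> seen[0] = Thread.currentThread().getContextClassLoader(),
                    "cleanup-thread");
            worker.start();
            worker.join();
        } catch (InterruptedException e) {
            current.interrupt();
        } finally {
            // Put the caller's own context classloader back.
            current.setContextClassLoader(previous);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // Stand-in for a webapp classloader (assumption for the demo).
        ClassLoader fakeWebappLoader = new URLClassLoader(new URL[0], null);
        System.out.println(loaderSeenByNewThread(fakeWebappLoader) == fakeWebappLoader); // prints true
    }
}
```

As long as such a worker thread stays alive, the loader it inherited stays reachable, which is exactly the situation described above.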

Apparently it took from July 2012 to February 2013 to fix this
after the bug was discovered. You can follow the discussion in the
MySQL issue tracker. The solution finally implemented was to add a
shutdown() method to the API, which you as a developer should know
to invoke before redeploys. Well, I didn’t. And I bet 99% of you
out there didn’t, either.
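
As an illustration only, this is roughly what such a call could look like from cleanup code. The sketch assumes a Connector/J 5.1.x release that includes the fix, where the method lives on com.mysql.jdbc.AbandonedConnectionCleanupThread; the text above does not name the class, and other driver versions may place or name it differently. It goes through reflection so that it compiles and runs without the driver on the classpath. MySqlCleanupThreadStopper and stopCleanupThread are hypothetical names.

```java
import java.lang.reflect.Method;

// Hypothetical helper; the class name below is an assumption about Connector/J 5.1.x.
class MySqlCleanupThreadStopper {

    static final String CLEANUP_CLASS = "com.mysql.jdbc.AbandonedConnectionCleanupThread";

    /**
     * Tries to invoke the static shutdown() method of the driver's cleanup thread class.
     * Returns true if the method was found and invoked, false otherwise
     * (for example when the driver is not visible to the given loader).
     */
    static boolean stopCleanupThread(ClassLoader loader) {
        try {
            Class<?> cleanup = Class.forName(CLEANUP_CLASS, false, loader);
            Method shutdown = cleanup.getMethod("shutdown");
            shutdown.invoke(null); // static method, so no receiver object
            return true;
        } catch (ReflectiveOperationException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = MySqlCleanupThreadStopper.class.getClassLoader();
        System.out.println("cleanup thread stopped: " + stopCleanupThread(loader));
    }
}
```

With the driver as a compile-time dependency, a direct static call would be simpler than reflection; the reflective form is used here only to keep the sketch self-contained.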

There is a good place for such shutdown hooks in your typical Java
web application, namely the contextDestroyed() method of a
ServletContextListener. This method gets called each and every
time the servlet context is destroyed, which most often happens
during redeploys. Chances are that quite a few developers are
aware this place exists, but how many actually realise the need to
clean up in this particular hook?

Back to the application, which was still far from being fixed. My
second discovery was also related to context classloaders and
datasources. When you use com.mysql.jdbc.Driver, it registers
itself as a driver with the java.sql.DriverManager class. Again,
this is done with good intentions. After all, this is what your
application uses to choose the right driver when connecting to a
database URL. But as you might guess, there is a catch:
DriverManager is loaded by the bootstrap classloader, rather than
your web application’s classloader, so it cannot be unloaded when
you redeploy your application, and the driver it still holds on to
keeps your web application’s classloader alive.

What makes things really peculiar is that there is no general way
to unregister the driver by yourself. The reference to the class
you are trying to unregister seems to be deliberately hidden from
you. In this particular case I was lucky: the connection pool used
in the application was able to unregister the driver, provided I
remembered to ask. Looking back at similar cases in my past, this
was the first time I saw such a feature implemented in a
connection pool. Before that, I once had to enumerate all the JDBC
drivers registered with DriverManager to figure out which ones I
should unregister. Not an experience I can recommend to anyone.
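
For the curious, the enumeration approach can be sketched with nothing but the JDK. This is a minimal sketch under assumptions, not the code I used back then: it treats "loaded by the webapp classloader" as the criterion for removal, and JdbcDriverCleanup, StubDriver and registerStubAndCleanUp are hypothetical names that exist only so the example has something to clean up. Note that DriverManager only shows a caller the drivers that the caller's own classloader can see.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical helper class for illustration.
class JdbcDriverCleanup {

    /** Deregisters every visible driver whose class was loaded by the given classloader. */
    static int deregisterDriversLoadedBy(ClassLoader webappLoader) {
        int removed = 0;
        // Work on a snapshot list of the currently registered drivers.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() != webappLoader) {
                continue; // belongs to the container or another application
            }
            try {
                DriverManager.deregisterDriver(driver);
                removed++;
            } catch (SQLException e) {
                System.err.println("Could not deregister " + driver + ": " + e);
            }
        }
        return removed;
    }

    /** Do-nothing driver, present only so that the demo has something to remove. */
    static final class StubDriver implements Driver {
        public Connection connect(String url, Properties info) { return null; }
        public boolean acceptsURL(String url) { return false; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    /** Registers the stub driver, then removes everything loaded by this class's loader. */
    static int registerStubAndCleanUp() {
        try {
            DriverManager.registerDriver(new StubDriver());
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return deregisterDriversLoadedBy(JdbcDriverCleanup.class.getClassLoader());
    }

    public static void main(String[] args) {
        System.out.println("deregistered: " + registerStubAndCleanUp());
    }
}
```

In a real webapp a call like deregisterDriversLoadedBy(Thread.currentThread().getContextClassLoader()) would typically sit in contextDestroyed(), after the connection pool has been closed.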

This should be it, I thought. Two leaks in the same application is
already more than one can tolerate. Wrong. The third issue staring
right at me from the leak report was sun.awt.AppContext with its
static field mainAppContext. What? I have no idea what this class
is supposed to do, but I was pretty sure that the application at
hand didn’t use AWT in any way. So I started a debugger to find
out who loads this class (and why). Another surprise: it was
com.sun.jmx.trace.Trace.out(). Can you think of a good reason why
a com.sun.jmx class would call a sun.awt class? I certainly can’t.
Nevertheless, that call stack originated from my connection pool,
BoneCP. And there’s absolutely no way to skip the line of code
that leads to this particular memory leak. Solution? The following
magic incantation in my ServletContextListener.contextInitialized():

Thread.currentThread().setContextClassLoader(null);
// Force the AppContext singleton to be created and initialized without holding a reference to WebAppClassLoader
sun.awt.AppContext.getAppContext();

But I still wasn’t done: something was still leaking. In this case
I found out that our application was binding its datasource to the
InitialContext JNDI tree, a good, standardized way to bind your
objects for future discovery. But again, when using this nice
thing you have to clean up after yourself by unbinding the
datasource from the JNDI tree in the very same contextDestroyed()
method.
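
A hedged sketch of that unbinding step, using only the standard javax.naming API. The JNDI name jdbc/exampleDS and the helper names are invented for the example; the real name is whatever your application used when binding the datasource.

```java
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical helper class for illustration.
class JndiCleanup {

    /**
     * Tries to unbind the given name from the initial JNDI context.
     * Returns true if the unbind call completed, false if JNDI reported an error
     * (for example when no naming provider is configured at all).
     */
    static boolean unbindQuietly(String jndiName) {
        Context ctx = null;
        try {
            ctx = new InitialContext();
            ctx.unbind(jndiName);
            return true;
        } catch (NamingException e) {
            return false;
        } finally {
            if (ctx != null) {
                try {
                    ctx.close();
                } catch (NamingException ignored) {
                    // nothing useful to do during shutdown
                }
            }
        }
    }

    public static void main(String[] args) {
        // "jdbc/exampleDS" is a made-up name.
        System.out.println("unbound: " + unbindQuietly("jdbc/exampleDS"));
    }
}
```

Run outside a container, where no naming provider is configured, the helper simply returns false; inside a container it would be called from contextDestroyed() with the same name that was used for the bind.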

Well, so far we had pretty logical, albeit rare and somewhat
obscure problems, which with some reasoning and google-fu were
quickly fixed. My fifth and last problem was nothing like that. I
still had the application crashing with OutOfMemoryError: PermGen.
Both Plumbr and Eclipse MAT reported to me that the culprit, the
one who had taken my classloader hostage, was a thread named
com.google.common.base.internal.Finalizer.
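
If you have no leak detection tool at hand, a crude first check is to list the live threads whose context classloader is the one you expect to be garbage collected. The sketch below does only that, with hypothetical class and method names. A thread's context classloader is just one of several ways a thread can keep a classloader reachable, so an empty result proves nothing, and I am not claiming this is the path the Finalizer thread used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper for illustration.
class ContextLoaderThreadScan {

    /** Returns the names of live threads whose context classloader is exactly the given loader. */
    static List<String> threadsUsing(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() is used here only as a way to enumerate live threads.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread.isAlive() && thread.getContextClassLoader() == loader) {
                names.add(thread.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ClassLoader current = Thread.currentThread().getContextClassLoader();
        System.out.println(threadsUsing(current));
    }
}
```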

“Who the hell is this guy?” – was my last
thought before the darkness engulfed me.

A couple of hours and four coffees later I found myself staring
at three lines:

emf.close();
emf = null;
ds = null;

It is hard to recollect exactly what happened during the
intervening hours. I have remote memories of WeakReferences,
ReferenceQueues, Finalizers, Reflection and my first time seeing a
PhantomReference in the wild. Even today I still cannot fully
explain why and for what purpose my connection pool used
finalizers tied to Google’s implementation of a reference queue
running in a separate thread.

Nor can I explain why closing the
javax.persistence.EntityManagerFactory (named emf in the code
above and held in a static reference in one of the application’s
own classes) was not enough, so that I had to manually null this
reference, along with a similar static reference to the data
source used by that factory. I was sure that Java’s GC could cope
with circular references all day long, but it seems that this
magical ring of classes, static references, objects, finalizers
and reference queues was too hard even for it. And so, again for
the first time in my long career, I had to null out a Java
reference.
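
The general mechanism behind those three lines can be shown in a much simpler setting: an object held by a static field stays reachable for as long as its class is loaded, and only becomes collectable once the field is cleared. The sketch below is deliberately simplified and does not reproduce the ring of finalizers and reference queues from my case; StaticReferenceDemo and its plain Object field are hypothetical stand-ins for the real static EntityManagerFactory holder.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class for illustration.
class StaticReferenceDemo {

    // Stand-in for the static EntityManagerFactory field described above.
    static Object emf;

    /** Stores a fresh object in the static field and returns a weak reference that tracks it. */
    static WeakReference<Object> createAndTrack() {
        emf = new Object();
        return new WeakReference<>(emf);
    }

    /** Drops the only strong reference, which is what emf = null does in the snippet above. */
    static void release() {
        emf = null;
    }

    public static void main(String[] args) throws InterruptedException {
        WeakReference<Object> tracker = createAndTrack();
        System.gc();
        System.out.println("reachable while static field is set: " + (tracker.get() != null));

        release();
        // System.gc() is only a hint, so poll a few times instead of assuming one call is enough.
        for (int i = 0; i < 20 && tracker.get() != null; i++) {
            System.gc();
            Thread.sleep(50);
        }
        System.out.println("collected after nulling: " + (tracker.get() == null));
    }
}
```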

I am a humble guy and thus cannot claim that I was the most
efficient in finding the cure for all of the above in a mere 12
hours. But I have to admit I have been dealing with memory leaks
almost exclusively for the past three years. And I even had my own
creation, Plumbr, helping me (in fact, four out of five of those
leaks were discovered by Plumbr in 30 minutes or so). But actually
solving those leaks took me more than a full working day on top of
that.

Overall, something is apparently broken in the Java EE and/or
classloader world. It cannot be normal that a developer must
remember all those hooks and configuration tricks, because it
simply isn’t possible. After all, we like to use our heads for
something productive. And, as seen from the workarounds bundled
with two popular servlet containers (Tomcat and Jetty), the
problem is severe. Solving it, however, will require more than
simply alleviating some of the symptoms: it will require curing
the underlying design errors.

Nikita Salnikov-Tarnovski is co-founder of Plumbr, the memory leak detection product, where he now contributes his time as a core developer. Besides his daily technical tasks he is an active blogger and conference speaker (Devoxx, TopConf, JavaDay,