11.01.2012

One of the great things about the annual JavaOne conference is the set of technical
and troubleshooting labs presented by subject matter experts. One of these labs
especially captured my attention this year: “HOL6500 - Finding And Solving Java Deadlocks”, presented by Java Champion Heinz Kabutz. This is one of the
best presentations I have seen on this subject. I recommend that you download, run
and study the labs yourself.

This article will revisit this classic thread problem and summarize the key
troubleshooting and resolution techniques presented. I will also expand on the
subject based on my own multi-threading troubleshooting experience.

Java deadlock: what is it?

A true Java deadlock can essentially be described as a situation where two or more
threads are blocked forever, waiting for each other. This situation is very
different from other, more common “day-to-day” thread problem patterns such as
lock contention and thread races, threads waiting on blocking IO calls etc. Such a lock-ordering
deadlock situation can be visualized as per below:

In the above visual example, the attempt by Thread A & Thread B to acquire 2 locks
in different orders is fatal. Once the threads reach the deadlocked state, they
can never recover, forcing you to restart the affected JVM process.
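The scenario described above can be reproduced in a few lines. The following is only a minimal sketch and is not taken from the lab material: the class and method names are hypothetical, and the threads are marked as daemon threads purely so that the demo JVM can still exit.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, not part of the HOL6500 lab material.
class LockOrderingDeadlockDemo {

    // Builds a thread that takes 'first', waits until the other thread also
    // holds its own first lock, and then tries to take 'second'.
    private static Thread lockInOrder(String name, Object first, Object second,
                                      CountDownLatch bothHoldFirstLock) {
        Thread thread = new Thread(() -> {
            synchronized (first) {
                bothHoldFirstLock.countDown();
                try {
                    bothHoldFirstLock.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) { // never acquired: the other thread owns it
                    System.out.println(name + " acquired both locks");
                }
            }
        }, name);
        thread.setDaemon(true); // demo only: lets the JVM exit despite the deadlock
        return thread;
    }

    // Returns true when both threads are still BLOCKED after the wait period.
    static boolean demonstrate() {
        Object lock1 = new Object();
        Object lock2 = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Thread A locks 1 then 2, Thread B locks 2 then 1: opposite orders.
        Thread a = lockInOrder("Thread A", lock1, lock2, bothHoldFirstLock);
        Thread b = lockInOrder("Thread B", lock2, lock1, bothHoldFirstLock);
        a.start();
        b.start();
        try {
            a.join(1000);
            b.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return a.getState() == Thread.State.BLOCKED
                && b.getState() == Thread.State.BLOCKED;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked: " + demonstrate());
    }
}
```

The latch is there only to make the deadlock deterministic. In real code the same interleaving happens by chance, which is why these problems tend to show up under load.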

Heinz also describes another type of deadlock: the resource
deadlock. This is by far the most common thread problem pattern I have seen
in my experience with Java EE enterprise system troubleshooting. A resource
deadlock is essentially a scenario where one or multiple threads are waiting to
acquire a resource which will never become available, such as a connection from a depleted JDBC pool.
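To make the pool depletion idea concrete, here is a minimal sketch in which a Semaphore stands in for a connection pool of size 2. This is an assumption made for illustration only (no real JDBC pool is involved, and the class name is hypothetical). Each worker takes one "connection" and then asks for a second one that can never be returned to the pool.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

// Hypothetical demo class; the Semaphore only simulates a resource pool.
class ResourceDeadlockDemo {

    // Returns the two worker states plus whether the JVM reports a Java-level deadlock.
    static String demonstrate() {
        Semaphore pool = new Semaphore(2); // "pool" with two connections
        CountDownLatch poolDrained = new CountDownLatch(2);
        Thread[] workers = new Thread[2];

        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    pool.acquire();          // first connection: succeeds
                    poolDrained.countDown();
                    poolDrained.await();     // wait until the pool is empty
                    pool.acquire();          // second connection: never available
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            workers[i].setDaemon(true);      // demo only: lets the JVM exit
            workers[i].start();
        }

        try {
            for (Thread worker : workers) {
                worker.join(500);            // give the workers time to get stuck
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // The JVM deadlock detector looks at monitors and ownable synchronizers
        // only, so it returns null here even though nothing can make progress.
        long[] javaLevel = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return workers[0].getState() + "/" + workers[1].getState()
                + " javaLevelDeadlock=" + (javaLevel != null);
    }

    public static void main(String[] args) {
        System.out.println(demonstrate());
    }
}
```

In this sketch the workers are parked in WAITING state rather than BLOCKED, and the JVM does not flag anything, which is exactly what makes this pattern harder to spot than a lock-ordering deadlock.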

Lock-ordering deadlocks

You should know by now that I am a big fan of JVM thread dump analysis, a crucial skill to
acquire for anyone involved in Java/Java EE development or
production support. The good news is that Java-level deadlocks can be easily identified
out-of-the-box from most JVM thread dump formats (HotSpot, IBM VM…), since these JVMs include
a native deadlock detection mechanism which will show you the threads
involved in a true Java-level deadlock scenario along with their execution stack
traces. A JVM thread dump can be captured via the tool of your choice such as
JVisualVM or jstack, or natively via kill -3 <PID> on Unix-based OS. Find below the JVM Java-level deadlock detection section after running lab 1:
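Deadlock detection is also available programmatically through the standard java.lang.management API, which can be handy for monitoring code. The following is a minimal sketch; the class and method names are hypothetical, and the report format is my own rather than the thread dump format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class built on the standard ThreadMXBean API.
class DeadlockDetector {

    // Returns one line per thread involved in a Java-level deadlock,
    // or an empty list when the JVM detects none.
    static List<String> describeDeadlocks() {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();

        // Covers both synchronized monitors and ownable synchronizers such as
        // ReentrantLock; returns null when no deadlock is found.
        long[] deadlockedIds = threadBean.findDeadlockedThreads();
        if (deadlockedIds == null) {
            return report;
        }

        for (ThreadInfo info : threadBean.getThreadInfo(deadlockedIds, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread terminated in the meantime
            }
            StringBuilder line = new StringBuilder()
                    .append(info.getThreadName())
                    .append(" is waiting for ").append(info.getLockName())
                    .append(" held by ").append(info.getLockOwnerName());
            for (StackTraceElement frame : info.getStackTrace()) {
                line.append(System.lineSeparator()).append("    at ").append(frame);
            }
            report.add(line.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> deadlocks = describeDeadlocks();
        if (deadlocks.isEmpty()) {
            System.out.println("No Java-level deadlock detected");
        } else {
            deadlocks.forEach(System.out::println);
        }
    }
}
```

A call such as describeDeadlocks() could be scheduled periodically inside an application, assuming the small overhead of the check is acceptable in your environment.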

Now this is the easy part… The core of the root cause analysis effort is to understand
why such threads are involved in a deadlock situation in the first place. Lock-ordering
deadlocks could be triggered from your application code but, unless you are
involved in high concurrency programming, chances are that the culprit code is
a third-party API or framework that you are using, or the actual Java EE container
itself, when applicable.

The first resolution strategy essentially involves the definition of a
global ordering for the locks that would always prevent deadlock (please
see lab1 solution)
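As an illustration of global ordering, here is a minimal sketch based on the classic "transfer between two accounts" example. It is not the lab1 solution: the Account class and its unique id field are assumptions, with the id serving as the global order in which locks are always taken.

```java
// Hypothetical demo class; the lab1 solution may use a different approach.
class OrderedLockingDemo {

    static final class Account {
        final long id;        // assumed unique: defines the global lock order
        private long balance;

        Account(long id, long balance) {
            this.id = id;
            this.balance = balance;
        }

        synchronized long balance() {
            return balance;
        }
    }

    // Always locks the account with the lower id first, regardless of the
    // direction of the transfer, so two opposite transfers cannot deadlock.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Runs transfers in both directions concurrently and returns the final balances.
    static long[] runOpposingTransfers(int iterations) {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] { a.balance(), b.balance() };
    }

    public static void main(String[] args) {
        long[] balances = runOpposingTransfers(100_000);
        System.out.println("a=" + balances[0] + ", b=" + balances[1]);
    }
}
```

Without the ordering step (locking "from" then "to" directly), the two threads above would be exposed to the lock-ordering deadlock described earlier.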

# Deadlock resolution by TryLock
(see lab2 solution)

Lock the first lock

Then try to lock the second lock

If you can lock it, you are good to go

If you cannot, wait and try again

The above strategy can be implemented using Java Lock & ReentrantLock, which also give
you the flexibility to set up a wait timeout in order to prevent thread
starvation in the event a lock is held for too long.
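The steps above can be sketched as follows. This is not the lab2 solution, only a minimal illustration: the class name, the 10 ms timeout, the random pause and the maxAttempts bound are all assumptions of mine.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; the lab2 solution may differ.
class TryLockDemo {

    // Locks 'first', then tries 'second' with a timeout. On failure, releases
    // 'first' and retries. Returns true if 'action' ran under both locks.
    static boolean withBothLocks(Lock first, Lock second, Runnable action, int maxAttempts) {
        try {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                first.lock();
                try {
                    if (second.tryLock(10, TimeUnit.MILLISECONDS)) {
                        try {
                            action.run();
                            return true;
                        } finally {
                            second.unlock();
                        }
                    }
                } finally {
                    first.unlock(); // always released, including before a retry
                }
                // Random pause so that two threads do not keep retrying in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextInt(1, 10));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    // Two workers take the same locks in opposite orders; returns the number
    // of actions that completed under both locks.
    static int runOpposingWorkers(int iterationsPerWorker) {
        Lock lockA = new ReentrantLock();
        Lock lockB = new ReentrantLock();
        AtomicInteger completed = new AtomicInteger();
        Runnable work = () -> completed.incrementAndGet();

        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterationsPerWorker; i++) withBothLocks(lockA, lockB, work, 10_000);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterationsPerWorker; i++) withBothLocks(lockB, lockA, work, 10_000);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Completed: " + runOpposingWorkers(50));
    }
}
```

The nested finally blocks are the important part: the first lock is always released before the thread waits and retries, so the other thread gets a chance to take it.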

If you look at the JBoss AS7 implementation, you will notice that Lock & ReentrantLock
are widely used in core implementation layers such as:

Deployment service

EJB3 implementation (widely used)

Clustering and session management

Internal cache & data structures (LRU, ConcurrentReferenceHashMap…)

Now, as per Heinz’s point, the deadlock resolution strategy #2 can be quite efficient, but proper
care is also required, such as releasing all held locks via a finally{} block; otherwise
you can transform your deadlock scenario into a livelock.

Resource deadlocks

Now let’s
move to resource deadlock scenarios. I’m glad that Heinz's lab #3 covered
this since from my experience this is by far the most common “deadlock”
scenario that you will see, especially if you are developing and supporting
large distributed Java EE production systems.

Now let’s get
the facts right.

Resource deadlocks are not true Java-level
deadlocks

The JVM Thread Dump will not magically show
you these types of deadlocks. This means more work for you to analyze and
understand this problem as a starting point.

Thread dump analysis can be especially confusing
when you are just starting to learn how to read Thread Dumps, since threads
will often show up in RUNNABLE state vs. BLOCKED state for Java-level
deadlocks. For now, it is important to keep in mind that thread state is
not that important for this type of problem e.g. RUNNABLE state != healthy
state.

The analysis approach is very different from that of
Java-level deadlocks. You must take multiple thread dump snapshots and
identify thread problem/wait patterns between each snapshot. You will be
able to see threads not moving e.g. threads waiting to acquire a resource
from a pool and other threads that already acquired such a resource and are hanging…

Thread Dump analysis is not the only important data point/fact
here. You will need to collect other facts such as statistics on
the resource(s) the threads are waiting for, overall middleware or
environment health etc. The combination of all these facts will allow you
to conclude on the root cause along with a resolution strategy which may
or may not involve a code change.
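The "multiple snapshots" approach can also be sketched in code. The example below is a minimal illustration with hypothetical class and method names: it captures two in-process snapshots through the standard ThreadMXBean API and lists the threads whose state and stack trace did not change in between. Idle JVM or pool threads will show up as well, so the output is only a list of candidates to review, not a diagnosis.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper class for comparing two thread dump snapshots.
class StuckThreadFinder {

    // Captures "name [state] stack" for every live thread, keyed by thread id.
    static Map<Long, String> snapshot() {
        Map<Long, String> byThreadId = new HashMap<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            byThreadId.put(info.getThreadId(),
                    info.getThreadName() + " [" + info.getThreadState() + "] "
                            + Arrays.toString(info.getStackTrace()));
        }
        return byThreadId;
    }

    // Threads with the exact same state and stack in both snapshots.
    static List<String> unchanged(Map<Long, String> earlier, Map<Long, String> later) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<Long, String> entry : earlier.entrySet()) {
            if (entry.getValue().equals(later.get(entry.getKey()))) {
                candidates.add(entry.getValue());
            }
        }
        return candidates;
    }

    // Demo: one thread waits on a latch that is never released.
    static List<String> demo() {
        CountDownLatch neverReleased = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                neverReleased.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "stuck-worker");
        worker.setDaemon(true); // demo only: lets the JVM exit
        worker.start();
        while (worker.getState() != Thread.State.WAITING) {
            Thread.yield();     // wait until the worker is parked
        }

        Map<Long, String> first = snapshot();
        try {
            Thread.sleep(200);  // interval between the two snapshots
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return unchanged(first, snapshot());
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In production you would normally compare thread dumps taken with jstack or kill -3 a few seconds apart, but the comparison logic is the same: look for threads sitting on the same frames snapshot after snapshot.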

I hope you
had the chance to review, run and enjoy the labs from Heinz's presentation as
much as I did. Concurrency programming and troubleshooting can be quite
challenging but I still recommend that you spend some time trying to understand
some of these principles since I’m confident you will face a situation in the
near future that will force you to perform this deep dive and acquire those
skills.