Monday, January 25, 2010

Performance analysis is an important aspect of the application development process. It is typically done by a specialist whose main goal is to improve the code performance on a given platform. This problem becomes even more difficult when dealing with concurrent/multi-threaded applications running on multicore platforms. In such cases, one not only has to worry about code performance but also about the scalability of the code. Different types of code profiling tools are used to help with the overall performance analysis process.

With the introduction of the java.util.concurrent (JUC) package in Java 5, a new type of lock was introduced into the Java lanaguage. There are no tools available in either IBM or externally to profile JUC locks and provide detailed contention information like those provided by JLM for regular Java locks. Also, usage of the JUC package is becoming more and more popular as more application are either developed or fine tuned to run better on multicore systems. This absence of a JUC lock profiling tool is the motivation behind the development of our lock tool.

In order to capture the juc lock runtime data, several juc classes are instrumented offline, and replace original classes in JRE. Before jucprofiler is used in the first time, user has to run a command to generate PreInstrument.jar. This step can only be done once, if JRE is not changed. If users change to another JRE, users have to remove PreInstrument.jar, and re-run this command to generate PreInstrument.jar again.

We record the allocation of java.util.concurrent.locks.AbstractQueuedSynchronizer and java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject, and assign unique id to them. For the time usage on lock, we capture the time usage of invoking park(blocker) and parkNanos(blocker, nanos) in class java.util.concurrent.locks.LockSupport, when these two methods are invoked in different places,

Where, {tracefile} is the full path of trace file or directory contains trace files, such as BCIAgent.***.bci.trace. {resultOutputFile} is an optional option to set file to store the analysis results, if omitting this option, the analysis results will be printed in console.

As figure shown below, the plain text output includes kinds of information, such as lock name, lock contention count and time, lock hold time and count, lock allocation stack trace, duration and stack trace of each lock contention, etc. The result can help user find out the Juc lock contention bottleneck in program.

Before “LEGEND” section, the profiling result report first summarizes all juc lock contentions in program, descending sorted by lock contention count, then contention time. Each summary row item belongs to one of two different types, “AQS” for individual juc lock and “CHM” for ConcurrentHashMap. Since a ConcurrentHashMap is divided by several Segment(s) for elements storage and each Segment is protected by a different juc lock, a ConcurrentHashMap can be viewed as a composition of juc locks from lock perspective. E.g. “CHM@8” below has contention count 276 and contention time 39457000, means that total contention count of all segments locks in “CHM@8” is 276, total contention time of all segments locks in “CHM@8” is 39457000 nanoseconds. This locks grouping helps programmers to identity in which ConcurrentHashMap object the most serious juc lock contentions occur. On the other side, look at the individual juc lock “AQS@1790”, it doesn’t belong to any ConcurrentHashMap object and this lock is used explicitly in program. Note that because lock hold events are not enabled in the example trace, 0 is put in columns HOLD-COUNT and HOLD-TIME.

After “LEGEND” section, the profiling result reports details of each juc lock contention. As result snippet below, for ConcurrentHashMap “CHM@8”, lock contention happens in two segment locks “Lock [AQS@135]” and “Lock [AQS@146]”. For “Lock [AQS@135]”, it contends in one place, follows by contention count, contention time, and full stack back trace of the contention. So does “Lock [AQS@146]”. These details help programmers to locate the lock contention in program and clearly understand which segments of ConcurrentHashMap contend most.

There are some views developed in Eclipse to show jucprofiler trace file with tables and figures, which is called Visual Analzyer. Currently, there are two views for jucprofiler, one is “J.U.C statistics” view, and the other is “J.U.C synchornization view”.

“J.U.C statistics” view is shown as below. Two columns in right-most are “Contention Times” and “Contention Counts”. “Allocation Stack” column is about the allcation call site of JUC locks.

“J.U.C synchronization” view is shown as below. The first lane is Time, indicating when this lock contention occurs. The second lane is Thread, indicating which thread occurs lock contention. The third lane is Monitor, indicating which JUC lock is contented. The last lane is Method, indicating where does the lock contend.

During the runing, juc profiler will create a ControlServer listening on port 2009. User can use ControlClient to connect that port and control the behavior of juc profiler, e.g. the trace can be turned on and off on-the-fly,

A simple shell is provided, and user can type command juc.on and juc.off to turn on and off juc profiler. For example, java -cp BCIRuntime.jar com.ibm.msdk.bciruntime.control.ControlClient, ControlClient will connect to localhost, and open a shell to control juc profiler.

With the popularity of multicores, more and more concurrent/multi-threaded Java applications will be developed. We need better tools for profiling such concurrent applications. The jucprofiler described in this article fulfills one of the key gaps in Java profiling tools.