Last June, I saw an interesting conference talk at J-Spring given by Martijn Verburg (from jClarity) about the Performance Diagnostic Methodology (PDM), a structured approach to finding the root cause of Java performance problems. In this post, I will highlight the key concepts, but I do recommend watching a recording of the talk from Devoxx UK. In the next part of this post, we will apply the theory to some problem applications.

What Is the PDM?

As mentioned in the introduction, the Performance Diagnostic Methodology (PDM) is a structured approach to finding the root cause of Java performance problems. When performance issues occur, people often start panicking and tune the JVM without knowing whether they are addressing the actual cause of the issue. A structured approach lets you exclude possible causes one by one and points you in the right direction, so that you can solve the issue appropriately. The approach is visualized in the following scheme (recreated from the original with the permission of Martijn Verburg and Kirk Pepperdine from jClarity).

In the next sections, we will walk through the scheme and highlight which tools can help you analyze the performance issue.

Some Prerequisites

Before we actually dive into the scheme, there are three things that you need to know about your infrastructure and application. If you look at your own application and you don’t have these things in place, it is time to take action.

You must know what your actual resources are. This means that you need to know the specifications of the hardware your application is running on and where it is running. This might be a trivial thing when you host the hardware yourself, but it can be a challenge to know this when your application is running in a cloud environment.

Ensure that you have logical and physical architecture diagrams of your application. You also must know the data flow of your application. If a user reports problems with a certain functionality, then this will make it easier to pinpoint where the problem occurs in your application.

Have a measurement at each entry and exit point of your architecture. This will also help you pinpoint the location of the problem. When you have measurements, you are able to verify whether, for example, the time needed to process a request is increasing over time. You will be able to measure where a possible bottleneck occurs in your application.
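The third prerequisite can be illustrated with a small timing wrapper. This is only a minimal sketch: the class name BoundaryTimer and its methods are hypothetical, and in a real system you would more likely rely on an existing metrics library. It shows the idea of recording the call count and the average duration at one entry or exit point.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical helper (not from the talk): measures calls crossing one entry or exit point.
final class BoundaryTimer {
    private final String name;
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    BoundaryTimer(String name) {
        this.name = name;
    }

    // Runs the given call and records its duration, even when the call throws.
    <T> T time(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    long calls() {
        return calls.get();
    }

    // Average duration per call in milliseconds; 0.0 when nothing has been measured yet.
    double averageMillis() {
        long n = calls.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1_000_000.0 / n;
    }

    String summary() {
        return String.format("%s: calls=%d, avgMs=%.3f", name, calls(), averageMillis());
    }
}
```

Wrapping the handler of an incoming request in time(...) and logging summary() at a fixed interval would make it visible when the average duration starts to climb.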

Obviously, all of the above has to be in place before you run into problems. When a problem hits, you cannot afford to spend time digging up information about your hardware or the architecture of your application; you will need that time to solve the problem. Your users, customers, and a bunch of managers will probably be putting a lot of pressure on you to fix the problem, and they won’t be happy when you first need to document the architecture of your application at that moment.
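Part of the resource information from the first prerequisite can be recorded from inside the JVM ahead of time, for example at application startup. The sketch below is an illustration only (the class name ResourceReport is hypothetical). It reports what the JVM itself sees, which in a cloud or container environment may differ from the specifications of the underlying host.

```java
// Hypothetical helper: summarizes the resources as seen by the running JVM.
final class ResourceReport {

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return String.format(
                "os=%s/%s, java=%s, cpus=%d, maxHeapMiB=%d",
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                System.getProperty("java.version"),
                rt.availableProcessors(),          // processors available to this JVM
                rt.maxMemory() / (1024 * 1024));   // upper bound the JVM will try to use for the heap
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line at startup gives you a baseline to compare against when a problem is reported later.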

Kernel Dominant, User Dominant, or No Dominator

Now back to the scheme, in which we can distinguish three sections: kernel dominant, user dominant, and no dominator. The first step is to find out what your CPU is doing. You can use the Linux tool vmstat to determine in which section you need to start searching. Run vmstat with, for example, the parameter 5, which prints a new line of output every five seconds (the vmstat man page explains the other columns). Below is an example of the output on a system running a simple Java Spring Boot application:

The interesting part in our case is the last section containing the details of the CPU usage and more specifically the following two columns:

us: percentage of user CPU time

sy: percentage of system CPU time

Thus, when the system CPU time exceeds 10 percent, then our problem seems to be kernel dominant. When the user CPU time reaches nearly 100 percent, then our problem seems to be user dominant. When both the system and user CPU are very low and your users are complaining about performance, then it probably will be either a deadlock or your application is waiting for a response from an external interface.
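The decision rule above can be written down as a small function. This is a sketch under stated assumptions: only the 10 percent threshold for system time comes from the text. The values chosen for "nearly 100 percent" (90) and "very low" (10) are assumptions for illustration, and the class and enum names are hypothetical.

```java
// Hypothetical helper: maps the vmstat "us" and "sy" columns to a section of the scheme.
final class CpuDominance {

    enum Dominator { KERNEL, USER, NONE, UNCLEAR }

    static final int KERNEL_SY_THRESHOLD = 10; // from the text: system time above 10 percent
    static final int USER_US_THRESHOLD = 90;   // assumption: stands in for "nearly 100 percent"
    static final int LOW_THRESHOLD = 10;       // assumption: stands in for "very low"

    static Dominator classify(int userPercent, int systemPercent) {
        if (systemPercent > KERNEL_SY_THRESHOLD) {
            return Dominator.KERNEL;
        }
        if (userPercent >= USER_US_THRESHOLD) {
            return Dominator.USER;
        }
        if (userPercent <= LOW_THRESHOLD && systemPercent <= LOW_THRESHOLD) {
            return Dominator.NONE;
        }
        // Anything in between does not clearly match one of the three sections.
        return Dominator.UNCLEAR;
    }
}
```

A single vmstat line is rarely conclusive, so it makes sense to apply a rule like this to several consecutive samples before drawing conclusions.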

CPU > 10 Percent Is Kernel

When vmstat indicates that the problem might be kernel dominant, the cause can be one of the following:

Context switching: two or more applications constantly switching and ‘fighting’ for CPU time

Either way, it is advisable to look at these things together with a sysadmin, who is probably already familiar with the relevant tooling.

CPU User Is Approaching 100 Percent

When vmstat indicates that the problem might be user dominant, it is time to take a look at the JVM and, more specifically, at the garbage collector (GC). The first thing to do is to turn on GC logging. There are free tools available that give you a graphical view of the GC logs. One of these tools is GCEasy. Its website explains how to turn on GC logging. For Java 1.4 up to 1.8, you have to pass the following arguments to the JVM:

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file-path>

For Java 9 and higher, you pass the following arguments:

-Xlog:gc*:file=<file-path>

<file-path> is the path of the file to which the GC log is written. Once you have collected some GC logging, you can upload it to the GCEasy website, which will produce a report with all kinds of graphs that help you interpret the details. Below are some guidelines for interpreting the results:

A full GC will block your application

When you notice that the heap keeps growing until it reaches its maximum, then you probably need to increase the heap or increase the amount of memory available on your machine

When you notice a lot of full GCs without freeing heap, then there is probably a memory leak

A normal running application will show a sawtooth in the heap consumption graph.
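The guideline about full GCs that do not free heap can be turned into a rough check. The sketch below is a simplification and not something a tool like GCEasy is claimed to do: it assumes you have already extracted the heap occupancy right after each full GC from the log, and the class name, method name, and the minimum of three samples are assumptions made for illustration.

```java
// Hypothetical helper: flags a heap trend that never drops after full GCs.
final class HeapTrend {

    // usedAfterFullGc: heap occupancy (in bytes) measured right after each full GC, oldest first.
    // Returns true when the occupancy never decreases and grows by at least minGrowthBytes overall.
    static boolean suggestsLeak(long[] usedAfterFullGc, long minGrowthBytes) {
        if (usedAfterFullGc.length < 3) {
            return false; // assumption: fewer than three samples say nothing about a trend
        }
        for (int i = 1; i < usedAfterFullGc.length; i++) {
            if (usedAfterFullGc[i] < usedAfterFullGc[i - 1]) {
                return false; // a full GC freed heap, as in the healthy sawtooth pattern
            }
        }
        long growth = usedAfterFullGc[usedAfterFullGc.length - 1] - usedAfterFullGc[0];
        return growth >= minGrowthBytes;
    }
}
```

A healthy sawtooth has samples that drop back after a collection and therefore returns false, while a steadily rising series returns true and is a reason to continue with a memory profiler.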

How to Detect a Memory Leak?

When we have excluded that the problem is situated in the JVM or GC, then we probably have a memory leak in our application. In that case, we need to use a memory profiler. A free profiler to use is VisualVM. VisualVM used to be part of the JDK, but now (Java 11) it has moved to GraalVM. It is, however, separately downloadable from GitHub. In the screenshot below, you can see an example of a memory leak. MyMemoryLeak objects are created but cannot be garbage collected. This can also be seen from the number in the Generations column. These objects have survived 56 garbage collections.

It is also possible that the objects whose number keeps growing are standard Java objects. In that situation, you have to drill down until you find an object that belongs to your application. This way, you can determine where the memory leak occurs.
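To make the pattern concrete, here is a minimal sketch of code that would produce a picture like the one described above. Only the class name MyMemoryLeak is taken from the text. Its contents, the LeakDemo class, and the handleRequest method are assumptions for illustration and not the application behind the screenshot.

```java
import java.util.ArrayList;
import java.util.List;

// Name taken from the text; the payload is an assumption so that each instance occupies some heap.
final class MyMemoryLeak {
    private final byte[] payload = new byte[1024];
}

// Hypothetical demo class showing one common way objects become impossible to collect.
final class LeakDemo {

    // A static collection stays reachable for the lifetime of the class,
    // so every object added here survives each garbage collection.
    private static final List<MyMemoryLeak> RETAINED = new ArrayList<>();

    // Simulates handling one request: an object is added and never removed.
    static int handleRequest() {
        RETAINED.add(new MyMemoryLeak());
        return RETAINED.size();
    }
}
```

In a profiler, the instance count of MyMemoryLeak would keep rising with every call, and the number of collections the objects have survived would rise with it.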

No Dominator

When the CPU usage for kernel and user is low and your users are complaining about performance issues, then probably threads are waiting a long time (e.g. for an external system) or are locked. This kind of behavior can also be analyzed with VisualVM. In the screenshot below, we can clearly see that a deadlock occurs between the two threads MyTaskExecutor-1 and MyTaskExecutor-2.

The important thing to notice is that no profiler is 100 percent accurate.
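A deadlock like the one described above can also be detected from code through the standard java.lang.management API, which is useful when attaching a profiler is not an option. This is a minimal sketch: the class and method names are hypothetical, while ThreadMXBean.findDeadlockedThreads() and getThreadInfo(long[]) are part of the JDK.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the threads that the JVM reports as deadlocked.
final class DeadlockCheck {

    // Returns the names of deadlocked threads, or an empty list when none are found.
    static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is detected
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // a thread may have terminated in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }
}
```

This only covers threads that block each other. Threads that are merely waiting a long time for an external system will not show up here and still require a thread dump or a profiler.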

Optimize Your Code

If you think that you need to optimize your code, think twice. The JIT compiler will already optimize your code, probably even better than you would. Besides that, optimizing your code by hand often makes it less readable and less maintainable. Tools that can help you in optimizing your code are JITWatch and the Java Microbenchmark Harness (JMH).

Summary

In this post, we have described how the Performance Diagnostic Methodology works and how it can be used. Of course, this is a theoretical description. The next step is to wait for a real-life performance problem and to apply the theory in practice. Instead of waiting for a real-life problem, we will create some problem applications in the next part of this post and verify whether we can apply the PDM in order to find the root cause of the problems. Stay tuned!