
In the previous blog post we measured the effect of the simplest JVM JIT optimisation technique: method inlining. The code example was a bit unnatural, as it was super simple Scala code written just to demonstrate method inlining. In this post I would like to share a general approach I use when I want to check how JIT treats my code, or whether there is some possibility to improve the code's performance with regard to JIT. Even method inlining requires the code to meet certain criteria, such as the bytecode length of the inlined methods. For this purpose I regularly use a great OpenJDK project called JITWatch, which comes with a bunch of handy tools related to JIT. I am pretty sure there are more tools out there, and I will be more than happy if you share your approaches to dealing with JIT in the comment section below the article.

Java HotSpot is able to produce a very detailed log of what the JIT compiler is doing and why. Unfortunately the resulting log is very complex and difficult to read; understanding it requires knowledge of the techniques and theory that underlie JIT compilation. A free tool like JITWatch processes those logs and abstracts this complexity away from the user.
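The log is enabled with a couple of JVM flags. A minimal sketch as sbt settings (matching the build.sbt setup used by the example project; these are the standard flags JITWatch expects):

```scala
// build.sbt – run the app in a forked JVM with JIT logging enabled
fork := true
javaOptions ++= Seq(
  "-XX:+UnlockDiagnosticVMOptions", // unlock diagnostic flags first
  "-XX:+TraceClassLoading",         // lets JITWatch map classes back to source
  "-XX:+LogCompilation"             // produces the hotspot_pidXXXXX.log file
)
```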

Those settings will produce a log file named hotspot_pidXXXXX.log. For the purposes of this article I re-used the code from the previous blog post, located on my GitHub account, with the JVM flags enabled in build.sbt.

In order to look into the generated machine code in JITWatch we need to install the HotSpot Disassembler (HSDIS) into $JAVA_HOME/jre/lib/server/. For Mac OS X a prebuilt binary can be downloaded here; rename it to hsdis-amd64.dylib. In order to include machine code in the generated JIT log we need to add the JVM flag -XX:+PrintAssembly.
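Again as an sbt sketch (PrintAssembly is a diagnostic flag, so it has to be unlocked first):

```scala
// build.sbt – dump disassembled native code into the JIT log
javaOptions ++= Seq(
  "-XX:+UnlockDiagnosticVMOptions",
  "-XX:+PrintAssembly" // requires the hsdis library installed above
)
```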

The most interesting view from our perspective is the TriView, where we can see source code, JVM bytecode and native code side by side. For this particular example we disabled method inlining via the JVM flag -XX:CompileCommand=dontinline,com/jaksky/jvm/tests/jit/IncWhile.inc.

Just to compare with the case when the method body of IncWhile.inc is inlined: the native code size is greater, 216 bytes compared to 168 bytes, with the same bytecode size.

The Compile Chain also provides a great view of what is happening with the code.

The inlining report likewise gives a great overview of what is happening with the code.

As can be seen, the effect of tiered compilation, as described in the JIT optimisation post, starts with client C1 JIT compilation and then switches to server C2 compilation. The same or an even better view can be found in the Compiler Thread activity, which provides a timeline view. To refresh your memory, check the overview of JVM threads. Note: standard Java library code is subject to JIT optimisation too, which is why there is so much compilation activity here.

JITWatch is a really awesome tool and provides many other views which it doesn't make sense to screenshot here, e.g. code cache allocation, nmethods etc. For detailed information I really suggest reading the JITWatch wiki pages. Now the question is: how do we write JIT-friendly code? Here the pure jewel of JITWatch comes in: the Suggestion Tool. That is why I like JITWatch so much. For demonstration I selected a somewhat more complex problem: the N-Queens problem.
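For flavour, here is a compact N-Queens solver of the kind one might feed to the Suggestion Tool (an illustrative sketch, not the exact code used for the screenshots):

```scala
// NQueens.scala – count board configurations where no two queens attack each other
object NQueens {
  def solve(n: Int): List[List[Int]] = {
    // a queen at column `col` is safe against previously placed queens
    // if it shares neither a column nor a diagonal with any of them
    def isSafe(col: Int, queens: List[Int]): Boolean =
      queens.zipWithIndex.forall { case (c, rowDistance) =>
        c != col && math.abs(c - col) != rowDistance + 1
      }

    // place queens row by row, keeping only safe partial solutions
    def place(k: Int): List[List[Int]] =
      if (k == 0) List(Nil)
      else for {
        queens <- place(k - 1)
        col    <- 1 to n
        if isSafe(col, queens)
      } yield col :: queens

    place(n)
  }

  def main(args: Array[String]): Unit =
    println(s"Solutions for 8 queens: ${solve(8).size}")
}
```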

The Suggestion Tool clearly describes why certain compilations failed and what the exact reason was. It is a coincidence that in this example we hit inlining again; there is definitely more going on in JIT, but this window provides a clear view of how we can possibly help the JIT.

Another great tool which is also part of JITWatch is the JarScan tool. This utility scans a list of jars and counts the bytecode size of every method and constructor. Its purpose is to highlight the methods that are bigger than the HotSpot threshold for inlining hot methods (35 bytes by default), so it provides hints on where to focus benchmarking to see whether decomposing code into smaller methods brings some performance gain. The hotness of a method is determined by a set of heuristics including call frequency etc., but what can exclude a method from inlining is its size. Of course, the mere fact that a method is too big and breaches the inlining limit doesn't automatically mean that the method is a performance bottleneck. JarScan is a static analysis tool with no knowledge of runtime statistics, and hence no knowledge of real method hotness.

To wrap up, JITWatch is a great tool which provides insight into the HotSpot JIT compilations happening during program execution, and it can help you understand how decisions made at the source code level affect the performance of the program.


The previous article, Structure of the JVM – Java memory model, briefly mentions bytecode execution modes, and the article on JVM internal threads provides additional insight into the internal architecture of JVM execution. In this article we focus on Just-In-Time (JIT) compilation and some of its basic optimisation techniques. We also discuss the performance impact of one optimisation technique, namely method inlining. In the remainder of this article we focus solely on the HotSpot JVM; however, the principles are valid in general.

HotSpot JVM is a mixed-mode VM, which means that it starts off interpreting the bytecode but can compile code into very highly optimised native machine code for faster execution. This optimised code runs extremely fast, and its performance can be compared with C/C++ code. JIT compilation happens on a per-method basis at runtime, after a method has been run a number of times and is considered hot. The compilation into machine code happens on a separate JVM thread and does not interrupt the execution of the program. While the compiler thread is compiling a hot method, the JVM keeps using the interpreted version of the method until the compiled version is ready. Thanks to the code's runtime characteristics, the HotSpot JVM can make sophisticated decisions about how to optimise the code.

Java HotSpot VM is capable of running in two separate modes (C1 and C2), and each mode is preferred in different situations:

C1 (-client) – used for applications where quick startup and solid optimisation are needed; typically GUI applications are good candidates.

C2 (-server) – used for long-running server applications.

Those two compiler modes use different techniques for JIT compilation, so it is possible to get very different machine code for the same method. Modern Java applications can take advantage of both compilation modes; starting from Java SE 7, a feature called tiered compilation is available. The application starts with C1 compilation, which enables fast startup, and once the application is warmed up the C2 compiler takes over. Since Java SE 8, tiered compilation is the default. Server optimisations are more aggressive and based on assumptions which may not always hold, so these optimisations are always protected with a guard condition that checks whether the assumption is still correct. If an assumption is not valid, the JVM reverts the optimisation and drops back to interpreted mode. In server mode the HotSpot VM runs a method in interpreted mode 10,000 times before compiling it (this can be adjusted, e.g. via -XX:CompileThreshold=5000). Changing this threshold should be considered thoroughly, as the HotSpot VM works best when it can accumulate enough statistics to make intelligent decisions about what to compile. If you want to inspect what is being compiled, use -XX:+PrintCompilation.
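For example, as sbt settings (a sketch; both flags are standard HotSpot options):

```scala
// build.sbt – watch what gets JIT-compiled and lower the compile threshold
javaOptions ++= Seq(
  "-XX:+PrintCompilation",    // log each method as it is compiled
  "-XX:CompileThreshold=5000" // compile after 5 000 interpreted invocations
)
```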

Among the most common JIT compilation techniques used by the HotSpot VM is method inlining, which is the practice of substituting the body of a method into the places where the method is called. This technique saves the cost of the method call. In HotSpot there is a limit on the size of the method that can be substituted. The next commonly used technique is monomorphic dispatch, which relies on the observation that a given call site usually sees receivers of just one reference type. Thanks to this observation the exact method implementation is known without checking, the overhead of the virtual method lookup can be eliminated, and the JIT compiler can emit faster, optimised machine code. There are many other optimisation techniques, such as loop optimisation, dead code elimination, intrinsics and others.
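To make monomorphic dispatch concrete, here is an illustrative sketch (hypothetical types, not from the benchmark code):

```scala
trait Shape { def area: Double }
final class Circle(r: Double) extends Shape { def area: Double = math.Pi * r * r }

// If profiling shows that only Circle instances ever reach this call site,
// the JIT can devirtualise shapes(i).area into a direct (and inlinable) call,
// guarded by a cheap type check in case another Shape shows up later.
def totalArea(shapes: Array[Shape]): Double = {
  var sum = 0.0
  var i = 0
  while (i < shapes.length) {
    sum += shapes(i).area
    i += 1
  }
  sum
}
```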

The performance gain from the inlining optimisation can be demonstrated with simple Scala code:
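The snippet below is a minimal sketch consistent with the description; the exact code is available in the GitHub repository mentioned above.

```scala
// IncWhile.scala – sketch of the benchmarked code
class IncWhile {
  def count(): Int = {
    var i = 0
    var sum = 0
    while (i < 100000000) { // hot loop – inc becomes a candidate for inlining
      sum = inc(sum)
      i += 1
    }
    sum
  }

  def inc(value: Int): Int = value + 1 // tiny body, well under the 35-byte limit
}
```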

The method inc is eligible for inlining because its body is smaller than 35 bytes of JVM bytecode (the actual size of the inc method is 9 bytes). The inlining optimisation can be verified by looking at the JIT-optimised machine code.

The difference is obvious when compared to the machine code generated when inlining is disabled via -XX:CompileCommand=dontinline,com/jaksky/jvm/tests/jit/IncWhile.inc.

The difference in runtime characteristics is also significant, as the benchmark results show. With inlining disabled:

With inlining enabled, the JVM JIT is also able to apply further optimisations such as loop optimisations, which might cause our whole loop to be eliminated because its result is easily predictable. We would then get a time of around 3 ns, which is unrealistic: a 1 GHz processor cannot perform a billion operations in that time. To disable most loop optimisations, use the -XX:LoopOptsCount=0 JVM option.

So the performance gain from inlining a method body can be quite significant: 2 seconds vs. 300 milliseconds.

In this post we discussed the mechanics of Java JIT compilation and some of the optimisation techniques used. We particularly focused on one of the simplest optimisation techniques, called method inlining, and demonstrated the performance gain brought by eliminating a method call represented by the invokevirtual bytecode instruction. Scala also offers a special annotation, @inline, which should help us with the performance aspects of the code under development. All the code for running the experiments is available online on my GitHub account.
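For completeness, a sketch of how the annotation is applied (whether scalac actually inlines depends on the optimiser settings of the Scala version in use):

```scala
class IncWhile {
  // hint to the Scala compiler's optimiser to inline this method at compile time,
  // independently of what the JVM JIT decides at runtime
  @inline final def inc(value: Int): Int = value + 1
}
```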


In the Structure of the Java Virtual Machine post we scratched the surface of the class file structure and how it is connected to the Java memory model via the class loading process. We also briefly discussed the bytecode structure and its execution, including a short introduction to Just-In-Time runtime optimisation. In this post we will look more at the internals of the execution engine; however, there is no ambition to substitute the detailed VM implementation documentation for the HotSpot JVM, just to provide enough detail to gain the bigger picture.

The basic threading model in the HotSpot JVM is a one-to-one mapping between Java threads (instances of java.lang.Thread) and native operating system threads. The native thread is created when the Java thread is started and reclaimed once it terminates. The operating system is responsible for scheduling all threads and dispatching them to any available CPU. The relationship between Java thread priorities and operating system thread priorities varies across operating systems.

HotSpot provides monitors, by which threads running application code may participate in a mutual exclusion (mutex) protocol. A monitor is either locked or unlocked, and only one thread may own the monitor at any time. Only after acquiring ownership of the monitor may a thread enter a critical section protected by it. Critical sections are referred to as synchronised blocks, delineated by the synchronized keyword.
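A minimal illustration in Scala (every object carries the monitor that synchronized locks):

```scala
class Counter {
  private var count = 0

  // the critical section below is protected by this instance's monitor:
  // only one thread may execute it at a time
  def increment(): Unit = this.synchronized {
    count += 1
  }

  def current: Int = this.synchronized(count)
}
```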

Apart from application threads, the JVM also contains internal threads, which can be categorised into the following groups: the VM thread, the periodic task thread, GC threads, JIT compiler threads and the signal dispatcher thread.

The VM thread spends its time waiting for requested operations to appear in the operation queue (VMOperationQueue). Operations are typically passed to the VM thread because they require the VM to reach a safepoint before they can be executed. When the VM is at a safepoint, all threads inside the VM are blocked, and any threads executing native code are prevented from returning to the VM while the safepoint is in progress. This means that a VM operation can be executed knowing that no thread is in the middle of modifying the heap; all threads are in a state such that their Java stacks are unchanging and can be examined.

The most familiar VM operations are related to garbage collection, particularly the stop-the-world phases that are common to many garbage collection algorithms. Other VM operations are: thread stack dumps, thread suspension or stopping, inspection or modification via JVMTI, etc. VM operations can be synchronous or asynchronous.

Safepoints are initiated using a cooperative, polling-based mechanism: each thread asks, "Should I block for a safepoint?" A common moment for asking this question is during a thread state transition. Threads executing interpreted code don't usually ask the question; instead, when a safepoint is requested, the interpreter switches to a different dispatch table which includes the question, and when the safepoint is over the dispatch table is switched back. Once a safepoint has been requested, the VM thread must wait until all threads are known to be in a safepoint-safe state before proceeding with the operation. During the safepoint the Threads_lock is used to block any threads that were running, and the VM thread releases it once the operation has completed.


Java-based applications run in the Java Runtime Environment (JRE), which consists of a set of Java APIs and the Java Virtual Machine (JVM). The JVM loads an application via class loaders and runs it using the execution engine.

The JVM runs on all kinds of hardware and executes Java bytecode there without any change to the code. The VM implements the Write Once, Run Anywhere principle, the so-called platform independence principle. Just to sum up the JVM key design principles:

Platform independence

Clearly defined primitive data types – in languages like C or C++ the size of primitive data types depends on the platform; Java is unified in that matter.

Java uses bytecode as an intermediate representation between source code and the machine code which runs on hardware. Bytecode instructions are represented as one-byte opcodes, e.g. getfield 0xb4, invokevirtual 0xb6, hence there is a maximum of 256 instructions. If an instruction doesn't need operands, the next instruction immediately follows; otherwise the operands follow the instruction according to the instruction set specification. Those instructions are contained in class files produced by Java compilation. The exact structure of the class file is defined in the Java Virtual Machine Specification, section 4: the class file format. After some version information there are sections like the constant pool, access flags, fields info, this and super class info, methods info etc. See the spec for the details.
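These instructions can be inspected for any compiled class with the JDK's javap tool; for example (the disassembly shown in the comments is approximate):

```scala
// Inc.scala – compile with scalac, then disassemble with: javap -c Inc
class Inc {
  def inc(value: Int): Int = value + 1
  // javap -c prints roughly:
  //   iload_1   // push the first argument onto the operand stack
  //   iconst_1  // push the constant 1
  //   iadd      // add the two values on top of the stack
  //   ireturn   // return the int on top of the stack
}
```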

A class loader loads compiled Java bytecode into the runtime data areas, and the execution engine executes the bytecode. A class is loaded when it is used for the first time in the JVM. Class loading works in a dynamic fashion on the parent-child (hierarchical) delegation principle. Class unloading is not allowed. Some time ago I wrote an article about class loading on an application server. The detailed mechanics of class loading are out of scope for this article.
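Loading on first use can be observed with the standard -verbose:class JVM flag; a small sketch:

```scala
// compile with scalac, then run with: scala -J-verbose:class LazyLoad
object Helper {
  def greet(): Unit = println("Helper loaded and executed")
}

object LazyLoad {
  def main(args: Array[String]): Unit = {
    println("before first use") // Helper's class is not loaded yet (HotSpot loads lazily)
    Helper.greet()              // first use triggers loading of the Helper class
  }
}
```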

Runtime data areas are used during the execution of the program. Some of these areas are created when the JVM starts and destroyed when the JVM exits; other data areas are per-thread, created on thread creation and destroyed on thread exit. The following picture is based mainly on JVM 8 internals (it doesn't include the segmented code cache and the dynamic linking of languages introduced in JVM 9).

Program counter – one exists per thread; it holds the address of the currently executed instruction, unless the method is native, in which case the PC is undefined. The PC in fact points to a memory address in the Method Area.

Stack – a JVM stack exists per thread and holds one frame for each method executing on that thread. It is a LIFO data structure. Each stack frame has references to the local variable array, the operand stack and the runtime constant pool of the class whose code is being executed.

Native stack – not supported by all JVMs. If the JVM is implemented using the C-linkage model for JNI, then this stack is a C stack (the order of arguments and return values is identical to the native stack of a typical C program). Native methods can call back into the JVM and invoke Java methods.

Stack frame – stores references that point to the objects or arrays on the heap.

Local variable array – holds all the variables used during the execution of the method: all method parameters and locally defined variables.

Operand stack – used during the execution of bytecode instructions. Most bytecode instructions manipulate the operand stack, moving values between it and the local variable array.

Heap – an area shared by all threads, used to allocate class instances and arrays at runtime. The heap is the subject of garbage collection, Java's mechanism for automatic memory management. This space is the one most often mentioned in JVM performance tuning.

Non-Heap memory areas

Method area – shared by all threads. It stores runtime constant pool information, field and method information, static variables and method bytecode for each class loaded by the JVM. The details of this area depend on the JVM implementation.

Runtime constant pool – this area corresponds to the constant pool table in the class file format. It contains all references to methods and fields; when a method or field is referred to, the JVM uses the constant pool to look up its actual address in memory.

Method and constructor code

Code cache – used for the compilation and storage of methods compiled to native code by the JIT compiler.

…
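As a quick illustration of how these areas come into play (a sketch; exact allocation decisions vary by JVM and JIT):

```scala
class Example {
  val shared = new Array[Int](1024)  // the array object lives on the heap

  def compute(x: Int): Int = {       // a call pushes a new frame on this thread's stack
    val y = x * 2                    // x and y sit in the frame's local variable array
    y + shared.length                // intermediate results pass through the operand stack
  }
}
```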

Bytecode loaded into the runtime data areas via the class loader is executed by the execution engine. The engine reads the bytecode one instruction at a time and must translate it into a form that the machine can execute. This can happen in one of two ways:

Interpreter – reads, interprets and executes each bytecode instruction one by one. Since it interprets instructions individually, it starts quickly but executes slowly.

JIT (Just-In-Time) compiler – compensates for the disadvantage of interpretation. Execution starts in interpreted mode, and the JIT compiler compiles frequently executed bytecode to native code. Execution is then switched from interpretation to running the native code, which is much faster. The native code is stored in the code cache. Compilation to native code takes time, so the JVM uses various metrics to decide which bytecode to JIT-compile.

How the JVM execution engine runs is not defined by the JVM specification, so vendors are free to improve their JVM engines with various techniques.


From time to time it might happen that you need to know which version the class files were compiled for, or, to be more specific, which target was specified when running the javac compiler, since the target specifies the VM version the classes were generated for. This can be specified in Maven as follows:
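A typical configuration via the maven-compiler-plugin looks like this (a sketch; the version values are illustrative):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.8</source>  <!-- language level accepted by javac -->
    <target>1.8</target>  <!-- VM version the class files are generated for -->
  </configuration>
</plugin>
```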