2.10.2015

I would like to share with my fellow readers that DZone has published a great 2015 guide on Continuous Delivery, which is a core principle and goal of the DevOps methodology.

If you are part of an organization about to implement DevOps principles and emerging tools, or if you simply wish to improve your knowledge and awareness of Continuous Delivery, I highly recommend that you download your own copy today: it is FREE!

8.13.2014

This post is to inform you that I recently published a Refcard on Java Performance Optimization, which is now available from DZone. Refcardz are basically “cheat sheets” that can be very useful for developers and application support individuals.

“It is possible to define “optimal performance” in different ways, but the basic elements are: the ability of a Java program to perform its computing tasks within the business response time requirements, and the ability of an application to fulfill its business functions under high volume, in a timely manner, with high reliability and low latency. Sometimes the numbers themselves become patternized: for some major websites, a page response time of 500ms maximum per user function is considered optimal. This Refcard will include target numbers when appropriate, but in most cases you will need to decide these on your own, based on business requirements and existing performance benchmarks…”

3.18.2014

This post is to inform you that Java 8 has been officially released by Oracle. You can download the official release here.

In a nutshell:

Lambda expressions: a new language feature that enables you to treat functionality as a method argument, or code as data (see the short sketch after this list).

Other significant enhancements and changes to the Java language and standard libraries including default methods, the new java.util.stream package, and the Date-Time API.

Compact Profiles contain predefined subsets of the Java SE platform and enable applications that do not require the entire Platform to be deployed and run on small devices.

Security enhancements include updates to the Java Cryptography Architecture; limited doPrivileged, a mechanism that enables code to assert a subset of its privileges; SSL/TLS Server Name Indication (SNI) Extension; and keystore enhancements.

JavaFX documentation has been updated for this release.

A new JavaScript engine, Nashorn, is included in JDK 8.

Java Mission Control 5.3 is included in JDK 8.

The JVM Permanent Generation space has been decommissioned and replaced by the Metaspace.
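For a quick taste of the first two items, below is a minimal sketch combining a lambda expression with the new java.util.stream package (the data and names are made up for the example):

import java.util.Arrays;
import java.util.List;

public class Java8Preview {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Charlie");

        // Lambda expression passed as a method argument: behavior as data
        names.forEach(name -> System.out.println("Hello " + name));

        // java.util.stream: declarative filter/count pipeline
        long count = names.stream()
                          .filter(name -> name.length() > 3)
                          .count();
        System.out.println("Names longer than 3 characters: " + count);
    }
}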

I will publish more articles in the upcoming weeks, including tutorials on Java Mission Control 5.3, which is now part of JDK 8.

In the meantime, I suggest that you revisit my original article on the replacement of the HotSpot PermGen space by the Metaspace. I will publish part 2 in the upcoming weeks.

3.13.2014

Most Java developers are familiar with the infamous and very common java.lang.ClassNotFoundException. While the sources of this problem are generally well understood (missing classes/libraries in the classpath, class loader delegation problems, etc.), its impact on the overall JVM and its performance is often unknown, and it can significantly degrade your application response time and scalability.

Large Java EE enterprise systems with multiple deployed applications are the most exposed to this type of problem due to the large number of different application class loaders active at runtime. This increases the risk of facing “undetected” ClassNotFoundException occurrences unless a clear business impact is identified and close log monitoring is implemented; the result is an ongoing performance impact, along with possible JVM class loading IO and thread lock contention.

The following article and sample programs will demonstrate that any occurrence of ClassNotFoundException found in your client production systems should be taken very seriously and addressed promptly.

Java class loading: the missing link for optimal performance

A proper understanding of this performance problem starts with proper knowledge of the Java class loading model. ClassNotFoundException essentially means that the JVM failed to locate and/or load a particular Java class requested via methods such as:

The Class.forName() method

The ClassLoader.findSystemClass() method

The ClassLoader.loadClass() method
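As a minimal illustration, the snippet below triggers the exception through Class.forName() (the class name is deliberately bogus):

public class ClassNotFoundDemo {
    public static void main(String[] args) {
        try {
            // Attempt to load a class that does not exist in the classpath
            Class.forName("com.example.DoesNotExist");
        } catch (ClassNotFoundException e) {
            // The class loader chain failed to locate the requested class
            System.out.println("Failed to load class: " + e.getMessage());
        }
    }
}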

While class loading of your application Java classes should occur only once in the JVM life-cycle (or through dynamic re-deploy functions), some applications also rely on dynamic class loading operations.

Regardless, repetitive valid and “failing” class loading operations can be quite intrusive, especially when the loading process is attempted by the default JDK java.lang.ClassLoader itself. Indeed, for backward compatibility, the default JDK 1.7+ behaviour allows only one class to be loaded at a time unless the class loader is tagged as “parallel capable”. Please keep in mind that even if the synchronization is done at the class level only, a repetitive class loading failure against the same class name is still going to trigger thread lock contention, depending on the Java thread concurrency level you are dealing with. The situation was much worse back in JDK 1.6, with synchronization done systematically at the class loader instance level.

For this reason, Java EE containers such as JBoss WildFly 8 use their own internal concurrent class loaders to load your application classes. These class loaders implement locking at a finer level of granularity, allowing different classes to be loaded concurrently from the same class loader instance. This is also aligned with the latest JDK 1.7+ improvement introducing support for multithreaded custom class loaders, which also helps prevent certain class loader deadlock scenarios.
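For reference, here is what opting in to this finer-grained locking looks like for a custom class loader; this is a minimal sketch of the JDK 1.7+ mechanism only, not the JBoss implementation:

public class MyParallelClassLoader extends ClassLoader {

    static {
        // JDK 1.7+: switch from per-class-loader-instance locking to
        // per-class-name locking for this class loader type
        registerAsParallelCapable();
    }

    public MyParallelClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // A real implementation would locate and define the class bytes here;
        // this sketch simply reports the failure
        throw new ClassNotFoundException(name);
    }
}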

That being said, class loading of system-level classes such as java.* and Java EE container modules still falls back on the default JDK ClassLoader. This means that a repetitive class loading failure for the same class name, e.g. a ClassNotFoundException, can still trigger severe thread lock contention. This is precisely what we will replicate and demonstrate for the rest of the article.

Thread lock contention - problem replication

In order to recreate and simulate this problem, we created a simple application as per the specifications below:

A JAX-RS (REST) Web Service executing Class.forName() against a dummy class name “located” at the system package level:
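The original snippet is not reproduced here; the following is a simplified sketch of what such a service could look like (class, path and dummy class names are illustrative):

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;

@Path("/classloading")
public class ClassLoadingFailureService {

    @GET
    @Produces("text/plain")
    public String triggerFailure() {
        try {
            // Dummy class name under java.* so the container delegates the
            // loading to the default JDK system class loader
            Class.forName("java.lang.ClassNotFoundSimulator");
        } catch (ClassNotFoundException e) {
            // Swallowed on purpose: each invocation generates a fresh CNFE
        }
        return "done";
    }
}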

The simulation essentially invokes the JAX-RS Web service concurrently with 20 threads. Each invocation generates a ClassNotFoundException. Logging was fully disabled in order to reduce the impact on IO and focus on the class loading contention only.
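The exact test harness is not shown here; a simple driver along the lines below can generate the 20-thread concurrency (the endpoint URL is a placeholder):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LoadDriver {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 20; i++) {
            pool.submit(() -> {
                // Each thread hammers the REST endpoint in a loop; every
                // call triggers a ClassNotFoundException on the server side
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        URL url = new URL("http://localhost:8080/app/rest/classloading");
                        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                        conn.getResponseCode();
                        conn.disconnect();
                    } catch (Exception e) {
                        // Ignore and keep driving load
                    }
                }
            });
        }
    }
}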

Now let’s look at the results from JVisualVM for our 30-60 second run. We can clearly see a lot of BLOCKED threads waiting to acquire a lock on an Object monitor.

Analysis of the JVM thread dump clearly exposes the problem: thread lock contention. We can see from the execution stack trace that JBoss delegates the loading of the class to the JDK ClassLoader. Why? Because our wrong Java class name is detected to be part of the system class path, e.g. java.*. In this situation, JBoss delegates the loading to the system class loader, triggering systematic synchronization for that particular class name, with the other threads waiting to acquire a lock in order to load the same class name.

Now contrast this with the execution stack trace below, captured after changing the dummy class name so that it no longer belongs to the java.* system package:

at org.jboss.as.server.deployment.module.VFSResourceLoader$2.run(VFSResourceLoader.java:127)
at org.jboss.as.server.deployment.module.VFSResourceLoader$2.run(VFSResourceLoader.java:124)
at java.security.AccessController.doPrivileged(Native Method)
at org.jboss.as.server.deployment.module.VFSResourceLoader.getClassSpec(VFSResourceLoader.java:124)
at org.jboss.modules.ModuleClassLoader.loadClassLocal(ModuleClassLoader.java:252)
at org.jboss.modules.ModuleClassLoader$1.loadClassLocal(ModuleClassLoader.java:76)
at org.jboss.modules.Module.loadModuleClass(Module.java:526)
at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:189) // JBoss now fully responsible to load the class
at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:444) // Unchecked since using JDK 1.7, e.g. tagged as “safe” JDK
at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:432)
at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:374)
at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:119)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at org.jboss.tools.examples.rest.MemberResourceRESTService.AppCLFailure(MemberResourceRESTService.java:196)
at org.jboss.tools.examples.rest.MemberResourceRESTService$Proxy$_$$_WeldClientProxy.AppCLFailure(Unknown Source)
at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
…

The above execution stack trace is quite revealing:

Since the Java class name was not detected to be part of the Java system package, no ClassLoader delegation was performed, and thus no synchronization occurred.

Since JBoss considers JDK 1.7+ a “safe” JDK, the ConcurrentClassLoader.performLoadClassUnchecked() method was used, not triggering any Object monitor lock.

No synchronization means no thread lock contention triggered as a result of the non-stop ClassNotFoundException errors.

It is still important to note that while JBoss does a great job of preventing thread lock contention in this scenario, the repetitive class loading attempts will still degrade performance to a certain degree due to the IO overhead associated with excessive JAR file search operations, again reinforcing the need to take immediate corrective action.

Final words

I hope that you enjoyed this article and now have a better understanding of the potential performance impact of excessive class loading operations. While JDK 1.7 and modern Java EE containers brought great improvements on class loader related issues such as deadlocks and thread lock contention, potential problematic scenarios still remain. For this reason, I highly recommend that you closely monitor your application behaviour and logs, and ensure that class loader related errors such as java.lang.ClassNotFoundException and java.lang.NoClassDefFoundError are aggressively corrected.

1.27.2014

As IT architects and consultants, it is crucial that we ensure a highly performing, stable and scalable production environment for our client applications. Proper monitoring along with a comprehensive capacity planning model is essential to achieving such goals.

While most performance data can be extracted at runtime through various OS commands and open source tools, large and complex IT production environments often require the deployment of commercial grade APM solutions for efficient management of operations.

Before providing recommendations to your client and purchasing the first APM product you find on the Web, it is important to properly review the different APM solutions and compare their product capabilities against your client's operational needs and budget.

A trusted source of information I have been following for a few years now is the Gartner Magic Quadrant report for APM. Gartner’s yearly APM report is essentially an extensive piece of research covering every vendor that meets their qualifications. The process also includes customer interviews. The final report places each vendor on two axes, Completeness of Vision and Ability to Execute, and positions them accordingly on a quadrant. If you have not done so already, I recommend that you obtain your own copy of the 2013 report.

I will publish more articles this year and share my recent experience with top APM solutions such as New Relic, AppDynamics, Compuware DynaTrace and Dell Foglight.

Please do not hesitate to share your experience along with your constructive comments on APM solutions.

11.14.2013

As you may have seen from our previous performance oriented articles, a healthy JVM is one of the most important goals to achieve for optimal application performance and stability. Such health assessments very often focus only on the frequency (avoidance) of major collections or on detecting the presence of memory leaks. What about the sizing and footprint of the Young Generation space and its short lived objects?

This article is based on a true story and a recent troubleshooting experience we had with one of our IT clients. It will demonstrate, through a sample application, that an excessive memory footprint and frequency of minor collections can trigger performance problems as severe as those of its bigger brother, the Old Generation or tenured space.

JVM health diagnostic

If you are new to the JVM tuning world, you will soon realize that there are no universal solutions applicable to all applications. Regardless of the high quality material you may find on the Web from various sources, you still need to do your due diligence and properly understand the type of application you are dealing with, including its sensitivity to JVM GC pauses (some applications require a very low JVM pause time, < 1%).

Java profiling (including memory leak detection) along with performance & load testing are good examples of the extra work you will need to perform in order to gather all the proper data and facts about your application memory footprint and JVM runtime health.

That being said, what do we mean by a “healthy” JVM? Please answer the following questions to the best of your knowledge.

Did you observe over the last 3 months occurrences of JVM crashes (sudden JVM failure with core dump & crash report)?

Do you believe that your JVM is currently unstable and/or requires too much human intervention (regular restart etc.)?

If you answered YES or I DON’T KNOW to any of those questions, you or your production performance tuning team have some work to do here, including a review of the current JVM GC policy.

If you answered NO to all of them with a high level of confidence, it means that you have likely achieved solid application and JVM stability, congratulations. I still recommend that you re-assess the situation between major releases and incremental load forecasts.

Young Generation: Stop the world, really?

As we saw from the quick JVM health assessment exercise, one of the points refers to the JVM's overall pause time. This essentially means how much time the JVM spends during “stop the world” events. During such periods, application threads are suspended and not performing any work, increasing the response time of your application. This metric is crucial since large JVM pauses will trigger unstable and unpredictable response times.

One common misperception that I have seen over the last few years is that YoungGen or minor collections are fully transparent and do not affect the application response time. This statement could almost be true if your Java heap size is small (YG space < 1 GB) and you are dealing with a moderate short lived objects footprint or allocation rate. In this scenario, if the minor collections are executed very fast (< 20 ms) and not too frequently (every 30 seconds or more), the overall JVM pause time contributed by the YoungGen space will remain small (<< 1%); for example, a 20 ms pause every 30 seconds amounts to roughly 0.07% of elapsed time. However, the situation can change very quickly if the YG memory allocation rate increases (increased footprint per request, traffic surge etc.).

I recommend the following articles for more details about the YoungGen space and the concurrent collectors available for the HotSpot JVM.

Regardless of the HotSpot GC policy that you are using, including the mostly concurrent collectors such as CMS or G1, the YoungGen space collection remains a “stop the world” event. To our knowledge, Azul Zing C4 is the only JVM collector advertised as a truly continuously concurrent compacting collector. We have not had a chance to experiment with this collector at this point. I welcome anyone with C4 tuning experience to share their observations, especially real-world figures versus mostly concurrent collectors such as G1.

Now that we have covered some of the theory, let’s deep dive into our sample application and review the performance testing results against various YoungGen footprints and allocation rates.

Sample application specifications

In order to compare the responsiveness and JVM pause time % between various YG allocation rates, we created a sample application as per the specifications below:

A JAX-RS (REST) Web Service was created and exposed via the jvm URI, as per the attributes below.

In addition, an initial memory footprint of 1 GB of long-lived objects (not eligible for GC) is allocated at class loading time in order to create some noise for the CMS collector.

The YG short lived objects memory allocation and OldGen static retention were simply achieved through the creation of a static array of primitive byte values, as per the code snippet below. The true memory footprint can be observed via JVM heap dump analysis using MAT.
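The original snippet is not reproduced here; the allocation pattern described above could look like the simplified sketch below (sizes and names are illustrative):

public class MemoryAllocator {

    // OldGen static retention: ~1 GB of long-lived objects, allocated at
    // class loading time and never eligible for GC
    private static final byte[] LONG_LIVED = new byte[1024 * 1024 * 1024];

    // YG allocation: short lived objects created per request, eligible
    // for collection as soon as the method returns
    public static byte[] allocateShortLived(int sizeMb) {
        return new byte[sizeMb * 1024 * 1024];
    }
}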

The following performance testing simulated a real life application that was dealing with high JVM pause time and severe degradation under peak load. Three runs were executed: one for the baseline, and two runs after simulating improvements (reductions) of the application memory footprint per request.

Baseline

10 concurrent threads

100 MB of short lived objects created per execution per JVM process

The short lived objects memory footprint may look extreme, but this is indeed what we were dealing with initially.

Results

Average response time: 140 ms

Throughput: 68 req / sec

JVM overall pause time: 25.8%

YG collection frequency: 7 collections per second

Rate of GC: 308,909 MB per minute

As per JVisualVM, it looks like the JVM is healthy (no memory leak, stable & low OldGen etc.). However, when you dig deeper into the verbose:gc logs, you realize that the overall JVM pause time is 25.8%, all of it due to the excessive frequency of YG collections. This strongly underscores the need to properly analyze the verbose:gc logs rather than focusing only on the JVM tenured space trends.

Testing & tuning #1

10 concurrent threads

50 MB of short lived objects created per execution per JVM process

This run simulates an initial improvement of the application footprint and memory allocation rate, from 100 MB down to 50 MB per allocation. We can clearly see an improvement in all figures, especially the throughput, simply by reducing the application memory footprint per request.

Results

Average response time: 119 ms (-21 ms)

Throughput: 79 req / sec (+11)

JVM overall pause time: 15.59% (-10.21%)

YG collection frequency: 3-4 collections per second (-3)

Rate of GC: 164,950 MB per minute (-143,959)

Testing & tuning #2

10 concurrent threads

5 MB of short lived objects created per execution per JVM process

This run simulates a much reduced application footprint and memory allocation rate, from 100 MB to only 5 MB per allocation.

As you can see, the final improvement to the application footprint and memory allocation did significantly decrease the JVM pause time, down to a much more acceptable 1.9%. It is important to note that throughout these 3 tests, the OldGen footprint and CMS activity did not have any substantial impact on the JVM pause time; the performance problem was due to the excessive activity and high volume of stop the world events associated with the YG collections.

Solutions and recommendations

Our problem case demonstrated that we can reduce the JVM pause time associated with excessive YG collection activity by tuning and reducing the memory footprint per application request, thus reducing the allocation rate and YG GC frequency.

However, when such a tuning strategy is not possible in the short term, it is worth exploring other solutions. Similar results can potentially be achieved through the following capacity improvement strategies:

Horizontal and vertical scaling: split the traffic across an increased number of JVM processes, at the expense of the available hardware, thus reducing the allocation rate and frequency of YG collections. This essentially means throwing hardware at the problem. My recommendation is always to fine tune your application memory footprint first, and then explore other scaling options in parallel.

Java heap size & YG ratio tuning: increasing the size of the YG space will definitely help reduce the frequency of stop the world YG collections (see the illustrative flags below). Now please be careful that you don’t “starve” the OldGen space; otherwise you will simply move the problem, with even more severe consequences such as JVM thrashing and OOM events.
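As an illustration only, HotSpot lets you size the heap and the YoungGen explicitly; the values below are placeholders, and real sizes must be derived from your own benchmarks and verbose:gc analysis:

# Illustrative HotSpot settings: fixed 4 GB heap, 1 GB YoungGen,
# CMS for the OldGen, GC logging enabled
java -Xms4g -Xmx4g \
  -XX:NewSize=1g -XX:MaxNewSize=1g \
  -XX:+UseConcMarkSweepGC \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -jar MyApp.jar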

Final words

I hope that you enjoyed this article and now have a better understanding of the potential performance impact of excessive JVM YG collections.

I recommend that you do the following exercise after reading this article:

Pick one of your busiest applications.

Review the verbose:gc log and determine the JVM pause time via GCMV.

Determine the frequency and impact of the YG collections and identify tuning opportunities.