The problem was first communicated by our production Ops team following recent performance degradation complaints from end users under peak load. An initial root cause analysis exercise revealed the following facts and observations:

Response time spikes were observed on a regular basis, especially under peak load.

Processing time of the application web requests (after the body/payload was received) was found to be optimal, at under 1 second.

An initial review of the WebLogic threads and JVM thread dump did not expose any bottleneck or contention within the application code.

Network packet analysis did not expose any network latency but isolated the response time delay within the WebLogic server tier.

JVM Thread Dump analysis – second pass

Another analysis iteration was performed on the captured JVM thread dump data, which revealed the following findings:

As we can see from the above image, “Java Muxer” threads were being used for the overall WebLogic network I/O. In general, enabling the Java Muxers is not recommended since they offer poor scalability and suboptimal performance compared with the native Muxers or the more recent NIO Muxers. Java Muxers block on “reads” until there is data to be read from a socket and do not scale well when dealing with a large influx of inbound web requests.
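To illustrate why (a generic java.nio sketch, not WebLogic's actual Muxer implementation): a blocking muxer thread is pinned to a single socket's read() until data arrives, whereas a Selector lets one thread watch many sockets and service only those that are ready:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Generic illustration (not WebLogic source): one Selector thread can watch
// hundreds of sockets, whereas a blocking muxer thread is stuck inside a
// single read() call until data arrives on that one socket.
public class NioMuxerSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select(); // blocks until at least one channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    buffer.clear();
                    SocketChannel client = (SocketChannel) key.channel();
                    if (client.read(buffer) < 0) {
                        client.close(); // peer closed the connection
                    }
                    // ...hand off the request bytes to a worker thread here
                }
            }
        }
    }
}
```

Under the blocking model, the thread count has to grow with the number of concurrent connections; under the selector model it does not, which is essentially why the native and NIO Muxers scale better.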

The following thread stack trace can be found in the thread dump when using NIO (Oracle WebLogic 12.2.x).

Following the above finding, a review of the WebLogic 11g configuration was performed but did not reveal any problem (native I/O was enabled). The next phase of the RCA was to determine why the Java Muxers were enabled by WebLogic on start-up.

Root Cause and Solution

The root cause was finally identified following a review of the WebLogic start-up logs.

As per the above, it was found that native I/O was disabled on start-up due to a problem loading the “Performance Pack” (which includes the native Muxers); WebLogic fell back on Java I/O but was still able to start properly.

Furthermore, it was identified that the JVM 1.7 start-up parameters did not include the “-d64” flag, which prevented WebLogic from loading the proper 64-bit Performance Pack library, thus disabling native I/O and falling back on the Java Muxers.
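On platforms that ship both 32-bit and 64-bit JVMs, the “-d64” flag selects the 64-bit data model (for WebLogic, typically added to JAVA_OPTIONS in the domain start-up script). As a quick sanity check, the hypothetical helper below reads the JVM's own view of its data model; note that “sun.arch.data.model” is a HotSpot-specific property:

```java
// Hypothetical sanity check (not part of the original RCA): on a HotSpot JVM,
// "sun.arch.data.model" reports the data model ("32" or "64"), which tells you
// whether a 64-bit native library such as the Performance Pack can load.
public class JvmBitnessCheck {
    public static void main(String[] args) {
        String dataModel = System.getProperty("sun.arch.data.model", "unknown");
        String osArch = System.getProperty("os.arch");
        System.out.println("JVM data model : " + dataModel + "-bit");
        System.out.println("os.arch        : " + osArch);
        if (!"64".equals(dataModel)) {
            System.out.println("WARNING: 32-bit (or unknown) JVM detected; "
                    + "a 64-bit native library will fail to load.");
        }
    }
}
```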

Following the implementation of the solution (restoration of the native Muxers) in the production environment, we observed a significant improvement in application performance and scalability.

8.08.2016

I am happy to inform you that I recently published an update to the existing refcard on Java Performance Optimization, which is now available from DZone. The updated material now better reflects the Java 8 features and provides a dedicated section and guidelines about the JVM Metaspace.

“By default, the Metaspace memory space is unbounded and will use the available process and/or OS native memory available for dynamic expansions. The memory space is divided into chunks and allocated by the JVM via mmap. We recommend keeping the default, dynamic resize mode as a starting point for simpler sizing combined with close monitoring of your application metadata footprint over time for optimal capacity planning…”
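To put that monitoring advice into practice, here is a minimal sketch, assuming a HotSpot JVM where the JMX memory pool is named “Metaspace”, that samples the Metaspace footprint at runtime:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Minimal sketch: sample the Metaspace pool exposed via JMX on a HotSpot JVM.
// The pool name "Metaspace" is HotSpot-specific; other JVMs may differ.
public class MetaspaceMonitor {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                long maxBytes = usage.getMax(); // -1 means unbounded (the default)
                System.out.printf("Metaspace used: %d KB, committed: %d KB, max: %s%n",
                        usage.getUsed() / 1024,
                        usage.getCommitted() / 1024,
                        maxBytes < 0 ? "unbounded" : (maxBytes / 1024) + " KB");
            }
        }
    }
}
```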

Overhyped or not, the cloud has deeply changed how we build and run software—and not just because IaaSes make VMs trivial to spin up and PaaSes make environments easy to set up. As a user you know what’s changed, and you understand the concept “as a service” (well, ever since you started running *nix); and, thank goodness, you don’t really have to worry about the physical details that make those services run.

9.08.2015

Brendan Gregg and Martin Spier from Netflix recently shared a very interesting article titled Java in Flames, describing their latest experimentation with a new JDK option (-XX:+PreserveFramePointer) that allowed them to create a complete view of CPU consumers as a "flame" graph. This article is an advanced read but extremely interesting for Java performance enthusiasts. This option is now included in the recently released JDK 8u60.

We will create our own experiment shortly and post a video exploring this CPU profiling capability in real time vs. existing CPU profiling tools & techniques. As mentioned in the article, a clear added value would be to automate and visualize the CPU utilization delta (deviation from an established baseline) between releases or code changes. This approach would allow fast detection of CPU bottlenecks or improvements following software changes, improving the overall performance and scalability of the production environment over the long run, as well as keeping the cloud or on-premise hardware cost under control.

Here is a small snippet from the original article: "Java mixed-mode flame graphs provide a complete visualization of CPU usage and have just been made possible by a new JDK option: -XX:+PreserveFramePointer. We've been developing these at Netflix for everyday Java performance analysis as they can identify all CPU consumers and issues, including those that are hidden from other profilers..."

7.21.2015

This article will share with you a few JVM "buzzwords" that are important for Java developers to understand and remember before performing any JVM performance and garbage collection tuning. A few tips are also provided, including some high-level performance tuning best practices at the end of the article. Further recommendations regarding the Oracle HotSpot concurrent GC collectors such as CMS and G1 will be explored in future articles.

Before reading any further, I recommend that you first get familiar with the JVM verbose GC logs (e.g. -verbose:gc and -XX:+PrintGCDetails on HotSpot JDK 7/8). Acquiring this JVM data analysis skill is essential, especially when combined with more advanced APM technologies.

Use tools such as GCMV (Garbage Collection and Memory Visualizer) in order to assess your JVM pause times and memory allocation rate, rather than sizing the generations by hand.

Allocation & Promotion Rates

It is important to keep track of your application's allocation and promotion rates for optimal GC performance (see the sampling sketch after the next tip).

Keep adaptive sizing (-XX:+UseAdaptiveSizePolicy) active, as part of the JVM ergonomics. Tune by hand only if required.
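Here is the rough sampling sketch mentioned above: a first-order allocation rate estimate taken from within the JVM. A verbose GC log or a profiler gives far more accurate numbers; this is only an illustration:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Rough sketch: approximate the allocation rate by sampling heap usage.
// Between two samples with no GC in between, the growth in used heap is
// roughly the number of bytes allocated during the window.
public class AllocationRateSampler {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long before = memory.getHeapMemoryUsage().getUsed();
        Thread.sleep(1000); // sample window: 1 second
        long after = memory.getHeapMemoryUsage().getUsed();
        long delta = after - before; // negative if a GC ran during the window
        System.out.printf("Approx. allocation rate: %d KB/s%n", delta / 1024);
    }
}
```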

LIVE Data Calculation

Your live application data corresponds to the OldGen occupancy after a Full GC.

It is essential that your OldGen capacity is big enough to hold your live data comfortably, in order to limit the frequency of major collections and their impact on your application throughput under load.

Recommendation: as a starting point, tune your Java heap size in order to achieve an OldGen footprint or occupancy after a Full GC of about 50%, allowing a sufficient buffer for certain higher-load scenarios (fail-over, spikes, busy business periods...).
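For example (hypothetical numbers): if your live data settles around 1 GB after a Full GC, an OldGen capacity of roughly 2 GB achieves the ~50% occupancy target. The sketch below reads that post-GC occupancy at runtime; the HotSpot pool names it matches on are an assumption:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sketch: read the OldGen occupancy measured after the last collection, which
// approximates the live data footprint. Pool names are HotSpot- and
// collector-specific (e.g. "PS Old Gen", "G1 Old Gen"), hence the suffix match.
public class LiveDataEstimator {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().endsWith("Old Gen")) {
                MemoryUsage afterGc = pool.getCollectionUsage(); // post-GC usage
                long max = (afterGc == null) ? -1 : afterGc.getMax();
                if (max > 0) {
                    long liveMb = afterGc.getUsed() / (1024 * 1024);
                    System.out.printf("%s live data: %d MB of %d MB (%.0f%%)%n",
                            pool.getName(), liveMb, max / (1024 * 1024),
                            100.0 * afterGc.getUsed() / max);
                }
            }
        }
    }
}
```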

*Hot Spot*: watch for OldGen memory leaks!

What is a memory leak in Java? Constant increase of the LIVE data over time...
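As a deliberately simplified illustration of that definition, here is the classic leak pattern: a static collection that only ever grows, so the live data retained after every Full GC keeps climbing:

```java
import java.util.ArrayList;
import java.util.List;

// Classic illustration of a Java memory leak pattern: a static collection
// that only ever grows. Each "request" adds live data that is never released,
// so the OldGen occupancy after every Full GC keeps increasing.
public class LeakyCache {
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void handleRequest() {
        CACHE.add(new byte[1024]); // retained forever: nothing ever removes it
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            handleRequest();
        }
        System.out.println("Entries retained: " + CACHE.size());
    }
}
```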

7.09.2015

This post is to inform you that I will be releasing an article shortly on the industry adoption of SHA-2 SSL certificates and the potential impact to your Java EE production environments. It will be especially useful if your secured application is still using an older version of Oracle WebLogic, packaged with the deprecated Certicom-based SSL implementation which does not support SHA-2 (SHA-256 signature algorithm).

In the meantime, I recommend that you consult the high-level SHA-2 migration guide from Entrust. It is a very good starting point and will help increase your awareness of this upcoming SHA-1 to SHA-2 upgrade.
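If you want to check where you stand today, here is a minimal sketch using the standard java.security.cert API (the certificate file path is a placeholder); a signature algorithm such as "SHA1withRSA" means the certificate is affected by this migration:

```java
import java.io.FileInputStream;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

// Sketch: inspect the signature algorithm of an X.509 certificate on disk.
// "server-cert.pem" is a placeholder path; a SHA-1 based value (e.g.
// "SHA1withRSA") indicates the certificate needs the SHA-2 migration.
public class CertSigAlgCheck {
    public static void main(String[] args) throws Exception {
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        try (FileInputStream in = new FileInputStream("server-cert.pem")) {
            X509Certificate cert = (X509Certificate) factory.generateCertificate(in);
            System.out.println("Subject       : " + cert.getSubjectDN());
            System.out.println("Signature alg : " + cert.getSigAlgName());
        }
    }
}
```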

5.21.2015

Eric Smith from AppDynamics recently released a great article on application scalability. Essentially, the main point is that the ability or effectiveness of scaling your application vertically/horizontally depends on various factors that are more complex than just the OS CPU and memory utilization. Proper usage of the right tools and the capture of application-specific metrics are crucial in order to identify tuning opportunities. This approach will also help you determine the right initial and incremental infrastructure/middleware sizing for your on-premise or in-the-cloud production environment, reducing your client's long-term hardware/hosting cost and improving the ROI.

For example, look at your Java application's LIVE data (OldGen footprint after a major collection). Some applications have LIVE data which depends mainly on the concurrent load and/or active users, e.g. session footprint and other long-lived cached objects. These applications will benefit well from vertical or horizontal scaling, as the load is split across more JVM processes and/or physical VMs, reducing the pressure on JVM fundamentals such as the garbage collection process.

On the contrary, Java applications dealing with a large LIVE data footprint due to excessive caching, memory leaks, etc. will scale poorly, since this memory footprint is "cloned" entirely or partially over the new JVM processes or physical VMs. These applications will benefit significantly from an application and JVM optimization project, which can improve both performance and scalability, thus reducing the need to "over-scale" your environment in the long term.

5.09.2015

I would like to inform my fellow readers that I am currently preparing a cluster of fresh articles on Java Performance following intense troubleshooting and performance tuning work over the past 12 months.

In the meantime, I recommend the following list of fresh DevOps-related articles from Electric Cloud, which offer different perspectives on this practice. Please stay tuned for more updates... Thanks.

P-H

2.10.2015

I would like to share with my fellow readers that DZone has published a great 2015 guide about Continuous Delivery, which is a core principle and goal of the DevOps methodology.

If you are part of an organization about to implement DevOps principles or emerging tools, or simply wish to improve your knowledge and awareness of Continuous Delivery, I highly recommend that you download your own copy today; it is FREE!