12.24.2012

This article provides a tutorial on determining how much Java heap space is retained by your active application Java threads, and where. A true case study from an Oracle Weblogic 10.0 production environment is presented so you can better understand the analysis process.

We will also attempt to demonstrate that excessive garbage collection or Java heap space memory footprint problems are often not caused by true memory leaks but are instead due to thread execution patterns and a high volume of short-lived objects.

Background

As you may have seen from my past JVM overview article, Java threads are part of the JVM fundamentals. Your Java heap space memory footprint is driven not only by static and long-lived objects but also by short-lived objects.

OutOfMemoryError problems are often wrongly assumed to be due to memory leaks. We often overlook faulty thread execution patterns and the short-lived objects they “retain” on the Java heap until their execution completes. In this problematic scenario:

- Eventually such short-lived objects get promoted to the long-lived object space, e.g. the OldGen/tenured space, by the garbage collector
- As a side effect, this causes the OldGen space to fill up rapidly, increasing the frequency of Full GC (major collections)
- Depending on the severity of the situation, this can lead to excessive garbage collection, increased JVM pause time and ultimately OutOfMemoryError: Java heap space
- Your application is now down and you are puzzled about what is going on
- Finally, you are thinking of either increasing the Java heap or looking for memory leaks…are you really on the right track?

In the above scenario, you need to look at the thread execution patterns and determine how much memory each of them retains at a given time.
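If you want a quick programmatic approximation of which threads are generating the most heap churn, the HotSpot-specific com.sun.management.ThreadMXBean extension exposes per-thread allocation counters. A minimal sketch, assuming a HotSpot JVM recent enough to support it; keep in mind that allocation is not the same as retention, which still requires the heap dump analysis shown later:

import java.lang.management.ManagementFactory;

public class ThreadAllocationSnapshot {

    public static void main(String[] args) {
        // The cast is HotSpot-specific; other JVMs may not expose this extension
        com.sun.management.ThreadMXBean threadBean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

        for (long id : threadBean.getAllThreadIds()) {
            java.lang.management.ThreadInfo info = threadBean.getThreadInfo(id);
            if (info == null) continue; // thread may have died since the id snapshot
            System.out.printf("%-60s allocated ~%d KB%n",
                    info.getThreadName(), threadBean.getThreadAllocatedBytes(id) / 1024);
        }
    }
}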

OK I get the picture but what about the thread stack size?

It is very important to avoid any confusion between the thread stack size and Java memory retention. The thread stack is a special memory space used by the JVM to store each method call. When a thread calls method A, it “pushes” the call onto the stack. If method A calls method B, that call is also pushed onto the stack. Once a method's execution completes, its call is “popped” off the stack.

The Java objects created as a result of such thread method calls are allocated on the Java heap space. Increasing the thread stack size will definitely not have any effect on heap retention. Tuning of the thread stack size is normally required when dealing with java.lang.StackOverflowError or OutOfMemoryError: unable to create new native thread problems.
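To make the distinction concrete, here is a contrived sketch: the recursion depth is bounded by the thread stack size (tuned via -Xss), while the byte[] allocations land on the Java heap. Note, incidentally, how each live frame's local variable keeps its array reachable; this is exactly the thread “retention” effect discussed above:

public class StackVsHeap {

    private static int depth = 0;

    static void recurse() {
        depth++;
        byte[] onHeap = new byte[1024]; // allocated on the heap, retained while the frame is live
        recurse();                      // each call consumes thread stack space
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Run with e.g. -Xss256k vs -Xss2m: the depth changes, the heap sizing does not
            System.out.println("StackOverflowError at depth " + depth);
        }
    }
}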

Case study and problem context

The following analysis is based on a true production problem we investigated recently.

- Severe performance degradation was observed from a Weblogic 10.0 production environment following some changes to the user web interface (using Google Web Toolkit and JSON as the data payload)
- Analysis of the verbose:gc logs confirmed full depletion of the 32-bit HotSpot JVM OldGen space (1 GB capacity)
- Thread dump snapshots were also generated before and during the problem
- The only problem mitigation available at that time was to restart the affected Weblogic server when the problem was observed
- A rollback of the changes was eventually performed, which did resolve the situation

The team first suspected a memory leak caused by the newly introduced code.

Thread dump analysis: looking for suspects…

The first analysis step was to analyze the generated thread dump data. A thread dump will often show you the culprit threads allocating memory on the Java heap. It will also reveal any hogging or stuck threads attempting to send or receive a data payload from a remote system.
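As a side note, thread dumps are typically captured via kill -3 <pid> or the JDK jstack tool, but equivalent data can also be captured from within the JVM. A minimal sketch using the standard ThreadMXBean (note that ThreadInfo#toString truncates deep stack traces, so jstack output is generally preferable for real analysis):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ProgrammaticThreadDump {

    public static void main(String[] args) {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        // true, true: include locked monitors and synchronizers
        // (requires a JVM with lock monitoring support, e.g. HotSpot)
        for (ThreadInfo info : threadBean.dumpAllThreads(true, true)) {
            System.out.print(info); // prints thread name, state and stack trace
        }
    }
}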

The first pattern we noticed was a good correlation between OOM events and STUCK threads observed from the Weblogic managed servers (JVM processes). Below is the primary thread pattern we found:

<10-Dec-2012 1:27:59 o'clock PM EST> <Error> <BEA-000337>
<[STUCK] ExecuteThread: '22' for queue: 'weblogic.kernel.Default (self-tuning)'
has been busy for "672" seconds working on the request
which is more than the configured time of "600" seconds.

As you can see, the above thread appears to be STUCK, i.e. taking a very long time to read and receive the JSON response from the remote server. Once we found that pattern, the next step was to correlate this finding with the JVM heap dump analysis and determine how much memory these stuck threads were retaining on the Java heap.

Heap dump analysis: retained objects exposed!

The Java heap dump analysis was performed using the Eclipse Memory Analyzer (MAT). We will now walk through the analysis steps that allowed us to pinpoint the size and source of the retained memory.
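For reference, the heap dump itself can be captured in several ways: via jmap -dump:live,format=b,file=heap.hprof <pid>, automatically on failure with -XX:+HeapDumpOnOutOfMemoryError, or programmatically. A minimal sketch of the latter, assuming a HotSpot JVM:

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HotSpotDiagnosticMXBean diagBean = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // true = dump only live objects (forces a full GC first)
        diagBean.dumpHeap("heap.hprof", true);
    }
}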

1. Load the HotSpot JVM heap dump

2. Select the HISTOGRAM view and filter by “ExecuteThread”

* ExecuteThread is the Java class used by the Weblogic kernel for thread creation & execution *

As you can see, this view was quite revealing. We can see a total of 210 Weblogic threads created, with a total retained memory footprint of 806 MB. This is quite significant for a 32-bit JVM process with a 1 GB OldGen space. This view alone tells us that the core of the problem and memory retention originates from the threads themselves.

3. Deep dive into the thread memory footprint analysis

The next step was to deep dive into the thread memory retention. To do this, simply right click the ExecuteThread class and select: List objects > with outgoing references.

As you can see, we were able to correlate the STUCK threads from the thread dump analysis with the high memory retention found in the heap dump analysis. The finding was quite surprising.

4. Thread Java local variables identification

The final analysis step required us to expand a few thread samples and understand the primary source of memory retention.

As you can see, this last analysis step revealed huge JSON response data payloads as the root cause. That pattern was also exposed earlier via the thread dump analysis, where we found a few threads taking a very long time to read and receive the JSON response; a clear symptom of a huge data payload footprint.

It is crucial to note that short-lived objects created via local method variables will show up in the heap dump analysis. However, some of those will only be visible from their parent threads since they are not referenced by any other objects, as in this case. You will also need to analyze the thread stack trace in order to identify the true caller, followed by a code review to confirm the root cause.
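To illustrate, here is a contrived sketch of that pattern; the class, method and endpoint names are hypothetical, not the actual application code. While the read is in progress, the payload local variable is reachable only from its parent thread's stack, so a heap dump taken mid-read shows the partially received JSON response retained by that thread alone:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class SlowJsonReader {

    static String fetchJson(String endpoint) throws IOException {
        StringBuilder payload = new StringBuilder(); // grows on the Java heap
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new URL(endpoint).openStream()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                payload.append(line); // retained by this thread until the method returns
            }
        } finally {
            reader.close();
        }
        return payload.toString();
    }
}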

Following this finding, our delivery team was able to determine that the recent faulty JSON code changes were generating, under some scenarios, huge JSON data payloads of up to 45 MB+. Given that this environment uses a 32-bit JVM with only 1 GB of OldGen space, you can understand that only a few threads were enough to trigger severe performance degradation: a mere 20 concurrent threads each retaining a 45 MB payload would already account for roughly 900 MB.

I hope this article has helped you understand how to pinpoint the Java heap memory footprint retained by your active threads by combining thread dump and heap dump analysis. Now, this article will remain just words if you don't experiment, so I highly recommend that you take some time to learn this analysis process yourself with your own application(s).

12.17.2012

This is the first post of a new series that will bring you frequent, short “question of the day” articles. This new format will complement my existing writing and is designed to provide you with fast answers to common problems and questions that I often get from work colleagues, IT clients and readers.

Each of these QOTD posts will be archived and you will have access to the entire list via a separate page. I also encourage you to post your feedback and recommendations, along with your own complementary answer to each question.

Weblogic thread state can be a particularly confusing topic for Weblogic administrators and individuals starting to learn about and monitor Weblogic threads. The thread monitoring section can be accessed for each managed server under the Monitoring > Threads tab.

As you can see, the thread monitoring tab provides a complete view of each Weblogic thread along with its state. Now let's review each section and state so you can properly understand how to assess thread health.

# Summary section

- Execute Thread Total Count: the total number of threads “created” by the Weblogic self-tuning pool and visible in the JVM thread dump. This value corresponds to the sum of Active + Standby threads
- Active Execute Threads: the number of threads “eligible” to process a request. When thread demand goes up, Weblogic will start promoting threads from Standby to Active state, which enables them to process future client requests
- Standby Thread Count: the number of threads waiting to be marked “eligible” to process client requests. These threads are created and visible in the JVM thread dump but are not yet available to process a client request
- Hogging Thread Count: the number of threads taking much more time than the current average execution time calculated by the Weblogic kernel
- Execute Thread Idle Count: the number of Active threads currently “available” to process a client request

In the above snapshots, we have:

- A total of 43 threads: 29 in Standby state and 14 in Active state
- Out of the 14 Active threads, we have 1 Hogging thread and 7 Idle threads, e.g. 7 threads “available” for request processing
- Another way to see the situation: we have a total of 7 threads currently “processing” client requests, with 1 out of 7 in Hogging state (e.g. taking more time than the current calculated average)
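These counters can also be read programmatically. Below is a hedged JMX sketch against the WebLogic ServerRuntime MBean server; the host, port, credentials and server name (MS1) are placeholders to adapt to your environment, and the t3 connection requires the WebLogic client library (e.g. wlfullclient.jar) on the classpath:

import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.naming.Context;

public class ThreadPoolMonitor {

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("t3", "yourhost", 7001,
                "/jndi/weblogic.management.mbeanservers.runtime");
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.SECURITY_PRINCIPAL, "weblogic");       // placeholder user
        env.put(Context.SECURITY_CREDENTIALS, "password");     // placeholder password
        env.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES,
                "weblogic.management.remote");

        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // ObjectName pattern assumed for the self-tuning pool runtime MBean
            ObjectName threadPool = new ObjectName(
                    "com.bea:ServerRuntime=MS1,Name=ThreadPoolRuntime,Type=ThreadPoolRuntime");
            for (String attr : new String[] { "ExecuteThreadTotalCount",
                    "ExecuteThreadIdleCount", "StandbyThreadCount", "HoggingThreadCount" }) {
                System.out.println(attr + " = " + conn.getAttribute(threadPool, attr));
            }
        } finally {
            connector.close();
        }
    }
}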

# Thread matrix

This matrix gives you a view of each thread along with its current state. There is one more state that you must also understand:

- STUCK: a thread is flagged as STUCK by Weblogic when it has been working on a single request for more than the configured stuck thread time (default is 600 seconds). When facing slowdown conditions, you will normally see Weblogic threads transitioning from the Hogging state to STUCK, depending on how long these threads remain stuck executing their current request