Wednesday, February 8, 2012

Analyzing Thread Dumps in Middleware - Part 2

This section details with when, how to capture and analyze thread dumps with special focus on WebLogic Application Server related thread dumps. Subsequent sections will deal with more real-world samples and tools for automatic analysis of Thread Dumps.

The Diagnosis

Everyone must have gone through periodic health checkups. As a starting point, Doctors always order for a blood test. Why this emphasis on blood test? Can't they just go by the patient's feedback or concerns? What is special about blood test?

Blood Test help via:

Capturing a snapshot of the overall health of the body (cells/antibodies/...)

Identifying and start focusing on the problem areas with prevention and treatment.

The Thread Dump is the equivalent of a blood test for a JVM. It is a state dump of the running JVM and all its parts that are executing at that moment in time.

It can help identify current execution patterns and help focus attention on relevant parts

There might be 100s of components and layers but difficult to identify what is getting invoked the most and where to focus attention. A thread dump taken during load will highlight the most executed code paths.

Dependencies and bottlenecks in code and application design

Show pointers for enhancements and optimizations

When to capture Thread Dumps

Under what conditions should we capture thread dumps? Anytime or specific times? Capturing thread dumps are ideal for following conditions:

To understand what is actively executed in the system when under load

When system is sluggish or slow or hangs

Java Virtual Process running but Server or App itself not responding

Pages take forever to load

Response time increases

Application and or Server behavior erratic

Just to observe hot spots (frequently executed code) under load

For Performance tuning

For Dependency analysis

For bottleneck identification and resolution

To capture snapshot of running state in server

The cost associated with capturing thread dumps is close to near zero; Java Profilers and other tools can consume anywhere from 5 to 20% overhead while a thread dump is more of a snapshot of threads which requires no real heavy weight processing on part of the JVM. There can be some cost only if there are too many interrupts to the JVM within real short intervals (like dumps forever every second or so).How to capture Thread Dumps

There are various options to capture thread dumps. JVM vendor tools allow capture and gather of thread dumps. Operating System interrupt signals to the process can also be used in generating thread dumps.

Sun Hotspot's jstack tool (under JAVA_HOME or JRE Home/bin) can generate thread dumps given the jvm process id. Similarly, jrcmd (from JRockit Mission Control) can generate thread dumps given a jvm process id and using the print_threads command. Both require to be run on the same operating system as the jvm process and both dump the output to the user shell.

In Unix, kill -3 <PID> or kill -SIGQUIT<PID> will generate thread dumps on the JVM process. But the output of the thread dump will go to the STDERR. Unless the STDERR was redirected to a log file, the thread dump cannot be saved when issued via kill signals. Ensure the process is always started and STDERR is redirected to a log (best practice to also have STDOUT redirected to same log file as STDERR).

In Windows, Ctrl-Break to the JVM process running in foreground can generate thread dumps. The primary limitation is the process needs to be running in the shell. Process Explorer in windows can also help in generating thread dumps but its much more problematic to get all the thread stacks in one shot. One has to wade through each thread and gets its stack. Another thing to keep in mind is, JVM might ignore interrupts if they were started with -Xrs flag.

But generating thread dumps via Weblogic specific commands is not recommended as the JVM itself might be sluggish or hung and never respond to higher app level commands (including weblogic). Issuing thread dumps via the WLS Admin Console requires the Admin server to be first up and healthy and also in communication with the target server.

Sometimes, even the OS level or JVM vendor specific tools might not be able to generate thread dumps. The causes can include memory thrashing (Out Of Memory - OOM) and thread deaths in the JVM, the system itself under tough constraints (memory/cpu) or the process is unhealthy and cannot really respond to anything in predictable way.

What to Observe

Now that we can capture thread dumps, what to observe? Its always recommended to take multiple thread dumps at close intervals (5 or 6 dumps at 10-15 seconds intervals). Why? A thread dump is a snapshot of threads in execution or various states. Taking multiple thread dumps allows us to peek into the threads as they continue execution.

Compare threads based on thread id or name across successive thread dumps to check for change in execution. Observe change in thread execution across thread dumps. Change in the thread state and or stack content for a given thread across successive thread dumps implies there is progress while no change or absence of progress indicates either a warning condition or something benign.

So, what if a given thread is not showing progress between thread dumps? Its possible the thread is doing nothing and so has no change between dumps. Or its possible its waiting for an event to happen and continues in same state and stack appearance. These might be benign. Or it might be blocked for a lock and continues to languish in the same state as the owner of the lock is still holding on the same lock or someone else got the lock ahead of our target thread. Its good to figure out which of the conditions applies and rule out benign cases.

Identifying Idle or Benign threads

One can use the following as rule of thumb while analyzing Application Server side thread dumps. Some additional pointers for WebLogic Specific threads.

If the thread stack depth is lesser than 5 or 6, treat it as an idle or benign thread. Why? By default, application server thread will itself take away 2-3 thread stack depths (like someRunnable or Executor.run) and if there is not much stack depth remaining, then its more likely sitting there for some job or condition.

If thread appears with Muxer in its name and is either doing processSockets() or blocked for lock in processSockets() ignore these as weblogic internal housekeeping threads.

Ensure the thread that is blocked is waiting for a lock held by another Muxer thread and not a non-Muxer Thread.

"ExecuteThread: '4' for queue: 'weblogic.socket.Muxer'" id=31 idx=0xc8 tid=8777 prio=5 alive, blocked, native_blocked, daemon
-- Blocked trying to get lock: java/lang/String@0x17674d0c8[fat lock]
at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized]
at jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]
at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
at weblogic/socket/EPollSocketMuxer.processSockets(EPollSocketMuxer.java:153)
at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29)
at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42)
at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145)
at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace
"ExecuteThread: '5' for queue: 'weblogic.socket.Muxer'" id=32 idx=0xcc tid=8778 prio=5 alive, native_blocked, daemon
at jrockit/ext/epoll/EPoll.epollWait0(ILjava/nio/ByteBuffer;II)I(Native Method)
at jrockit/ext/epoll/EPoll.epollWait(EPoll.java:115)[optimized]
at weblogic/socket/EPollSocketMuxer.processSockets(EPollSocketMuxer.java:156)
^-- Holding lock: java/lang/String@0x17674d0c8[fat lock]
at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29)
at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42)
at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145)
at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace

One can see that both ExecuteThread:'5' and '4' are belonging to the weblogic.socket.Muxer queue with ExecuteThread:'5' holding a lock on java/lang/String@0x17674d0c8 and ExecuteThread: '4' blocking for the same lock. The Muxer thread holding the lock is doing a native poll while other muxer threads are blocked as there can be only one thread doing poll on a set of fds.

WebLogic Specific Thread Tagging

WebLogic tags threads internally as Active and Standby based on its internal workmanager and thread pooling implementation. There is another level of tagging based on a thread getting returned to the thread pool or not within a defined time period. If the thread does not get returned within a specified duration its tagged as Hogging and if it exceeds even higher limits (default of 10 minutes), its termed as STUCK. It means the thread is involved in a very long activity which requires some manual inspection.

Its possible for pollers to get kicked off but never return as they just keep polling in an infinite loop. Examples are the AQ Adapter Poller threads which might wrongly get tagged as STUCK.These are benign in nature.

Thread is doing a database read or remote invocation due to some user request execution and is tagged STUCK. This implies the thread has never really completed the request for more than 10 minutes and is definitely hung - its possibly doing a socket read on an endpoint that is bad or non-responsive or doing real big remote or local execution which should not be synchronous to begin with. Or it might be blocked for a resource which is part of a deadlocked chain and that might be the reason for the non-progress.

Analyzing a Thread Stack

The java thread stack content in a thread dump are always in text/ascii format enabling a quick read without any tools. Almost invariably, the thread stack is in English which lets the user read the package, classname and method call. These three apsects help us understand the general overview of the call the thread is executing, who invoked it and what it is invoking next thereby establishing the caller-callee chain as well as what is getting invoked. The method name gives us a general idea of what the invocation is about. All of these data points can show what is getting executed by the thread even if we haven't written the code or truly understand the implementation.

In the above stack trace, we can see the thread belong to the "weblogic.kernel.Default (self-tuning)" group with Thread Name as "ExecuteThread: '4'" and is waiting for a notification on a lock. Some WLS Specific observations based on the stack:

WLS Admin Console Applicationattempts to get ApplicationStatus of the Deployments on the remote server.

MBeanServer.getAttribute() implies it was call to get some attributes (in this case, the application status of deployments) from the MBean Server of a Remote WLS Server instance.

weblogic.rjvm.ResponseImpl.waitForData()signifies the thread is waiting for a weblogic RMI response from a remote WLS server for the remote MBean Server query call.

In the above thread stack, it shows WebLogic LDAP is just starting up and is attempting to generate a Random Seed Number using the entropy of the file system. This thread can stay in the same state if there is not much activity in the file system and the server might appear hung at startup.• Solution would be to add -Djava.security.egd=file:/dev/./urandom to java command line arguments to choose a urandom entropy scheme for random number generation for Linux. Note: this is not advisable to be used for Production systems as the random number is not that truly random.

"[ACTIVE] ExecuteThread: '32' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x0000002c5ede5000 nid=0x2ca6 waiting for monitor entry [0x000000006a6cc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at weblogic.messaging.kernel.internal.QueueImpl.addReader(QueueImpl.java:1082)
- waiting to lock <0x0000002ac75d1b80> (a weblogic.messaging.kernel.internal.QueueImpl)
at weblogic.messaging.kernel.internal.ReceiveRequestImpl.start(ReceiveRequestImpl.java:178)
at weblogic.messaging.kernel.internal.ReceiveRequestImpl.<init>(ReceiveRequestImpl.java:86)
at weblogic.messaging.kernel.internal.QueueImpl.receive(QueueImpl.java:841)
at weblogic.jms.backend.BEConsumerImpl.blockingReceiveStart(BEConsumerImpl.java:1308)
at weblogic.jms.backend.BEConsumerImpl.receive(BEConsumerImpl.java:1514)
at weblogic.jms.backend.BEConsumerImpl.invoke(BEConsumerImpl.java:1224)
at weblogic.messaging.dispatcher.Request.wrappedFiniteStateMachine(Request.java:961)
at weblogic.messaging.dispatcher.DispatcherServerRef.invoke(DispatcherServerRef.java:276)
at weblogic.messaging.dispatcher.DispatcherServerRef.handleRequest(DispatcherServerRef.java:141)
at weblogic.messaging.dispatcher.DispatcherServerRef.access$000(DispatcherServerRef.java:34)
at weblogic.messaging.dispatcher.DispatcherServerRef$2.run(DispatcherServerRef.java:111)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)

This thread stack shows WLS adding a JMS Receiver to a Queue.

Following thread stack shows Oracle Service Bus (OSB) Http Proxy service blocked for response from an outbound web service callout and got tagged as STUCK as there was no progress for more than 10 minutes.

What constitutes a Hot Spot? Hot spot is a set of method calls that gets repeatedly invoked by busy running threads (not idle or dormant threads). It can be as simple healthy pattern like multiple threads executing requests to the same set of servlet or application layer or it can be a symptom of bottleneck if multiple threads are blocked waiting for the same lock.

Attempt to capture thread dumps and identify hot spots only after the server has been subjected to some decent loads by which time most of the code have been initialized and executed a few times.

For normal hot spots, try to identify possible optimizations in the code execution.
It might be that the thread is attempting to do a repeat look of JNDI resources or recreation of a resource like JMS Producer/consumer or InitialContext to a remote server.

Cache the JNDI resources, Context and producer/consumer objects. Pool the producers/consumers if possible, reuse the jms connections. Avoid repeated lookups. It can be as simple as resolving an address by doing InetAddress.getAllByName(), cache the results instead of repeat resolutions. Same way, cache XML Parsers or handlers to avoid repeat classloading of xml parsers/factories/handlers/implementations. While caching, ensure to avoid memory leaks if the instances can grow continuously.

Identify Bottlenecks

As Goldfinger says to James Bond in "Goldfinger",
Once is happenstance, Twice is coincidence and the third time is enemy
action when it comes to threads blocked for same lock. Start navigating
the chain of the locks, and try to identify the dependencies between the
threads and the nature of locking and try to resolve them as suggested
in Reducing Locks Section 1 of this series.

Identify blocked threads and dependency between the locks and the threads

Start analyzing dependencies and try to reduce locking

Resolve the bottlenecks.

This entire exercise can be a bit like pealing onions as one layer of bottleneck might be masking or temporarily resolving a different bottleneck. It requires repetition of analyzing, fixing, testing and analyzing.

Following the chain

Sometimes a thread blocking others from obtaining a lock might itself be on blocked list for yet another lock. In those cases, one should navigate the chain to identify who and what is blocking it. If there are deadlocks (circular dependency of locks), the JVM might report the deadlock automatically. But its best practice to navigate and understand the chains as that will help in resolving and optimizing the behavior for the better.

Let take the following thread stack. The ExecuteThread: '58' is waiting to obtain lock on the

java/util/TreeSet@0x4bba82c0

"[STUCK] ExecuteThread: '58' for queue: 'weblogic.kernel.Default (self-tuning)'" id=131807 idx=0x90 tid=927 prio=1 alive, blocked, native_blocked, daemon
-- Blocked trying to get lock: java/util/TreeSet@0x4bba82c0[thin lock]
at jrockit/vm/Locks.monitorEnter(Locks.java:2170)[optimized]
at weblogic/common/resourcepool/ResourcePoolImpl$Group.getCheckRecord(ResourcePoolImpl.java:2369)
at weblogic/common/resourcepool/ResourcePoolImpl$Group.checkHang(ResourcePoolImpl.java:2463)
at weblogic/common/resourcepool/ResourcePoolImpl$Group.access$100(ResourcePoolImpl.java:2210)
...........
at weblogic/common/resourcepool/ResourcePoolImpl.reserveResourceInternal(ResourcePoolImpl.java:450)
at weblogic/common/resourcepool/ResourcePoolImpl.reserveResource(ResourcePoolImpl.java:329)
at weblogic/jdbc/common/internal/ConnectionPool.reserve(ConnectionPool.java:417)
at weblogic/jdbc/common/internal/ConnectionPool.reserve(ConnectionPool.java:324)
at weblogic/jdbc/common/internal/MultiPool.searchLoadBalance(MultiPool.java:312)
at weblogic/jdbc/common/internal/MultiPool.findPool(MultiPool.java:180)
at weblogic/jdbc/common/internal/ConnectionPoolManager.reserve(ConnectionPoolManager.java:89)

Search for the matching lock id among the rest of the threads. The lock is held by ExecuteThread '26' which itself is blocked for oracle/jdbc/driver/T4CConnection@0xf71f6938
...

The oracle/jdbc/driver/T4CConnection@0xf71f6938 lock is held by yet another thread MDSPollingThread-owsm which is attempting to test the health of its jdbc connection (that it reserved from the datasource connection pool) by invoking some basic test on the database. The socket read from the DB appears to the one is either slow or every other thread kept getting the lock except for ExecuteThread 58 & 26 as both appear STUCK in the same position for 10 minutes or longer. Having multiple thread dumps will help confirm if the same threads are stuck in the same position or there was change in ownership of locks and these threads were just so unlucky in getting ownership of the locks.

"MDSPollingThread-[owsm, jdbc/mds/owsm]" id=94 idx=0x150 tid=2993 prio=5 alive, in native, daemon
at jrockit/net/SocketNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BIII)I(Native Method)
at jrockit/net/SocketNativeIO.socketRead(SocketNativeIO.java:32)
at java/net/SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I(SocketInputStream.java)
at java/net/SocketInputStream.read(SocketInputStream.java:129)
at oracle/net/nt/MetricsEnabledInputStream.read(TcpNTAdapter.java:718)
at oracle/net/ns/Packet.receive(Packet.java:295)
at oracle/net/ns/DataPacket.receive(DataPacket.java:106)
at oracle/net/ns/NetInputStream.getNextPacket(NetInputStream.java:317)
at oracle/net/ns/NetInputStream.read(NetInputStream.java:104)
at oracle/jdbc/driver/T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:126)
at oracle/jdbc/driver/T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:82)
at oracle/jdbc/driver/T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1177)
at oracle/jdbc/driver/T4CTTIfun.receive(T4CTTIfun.java:312)
at oracle/jdbc/driver/T4CTTIfun.doRPC(T4CTTIfun.java:204)
at oracle/jdbc/driver/T4C8Oall.doOALL(T4C8Oall.java:540)
at oracle/jdbc/driver/T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1079)
at oracle/jdbc/driver/OracleStatement.doExecuteWithTimeout(OracleStatement.java:1419)
at oracle/jdbc/driver/OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3752)
at oracle/jdbc/driver/OraclePreparedStatement.execute(OraclePreparedStatement.java:3937)
^-- Holding lock: oracle/jdbc/driver/T4CConnection@0xf71f6938[thin lock]
at oracle/jdbc/driver/OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1535)
at weblogic/jdbc/common/internal/ConnectionEnv.testInternal(ConnectionEnv.java:873)
at weblogic/jdbc/common/internal/ConnectionEnv.test(ConnectionEnv.java:541)
at weblogic/common/resourcepool/ResourcePoolImpl.testResource(ResourcePoolImpl.java:2198)

The underlying cause for the whole hang was due to firewall dropping the socket connections between the server and database and leading to hung socket reads.

If the application uses java Concurrent Locks, it might be harder to identify who is holding a lock as there can be multiple threads waiting for notification on the lock but no obvious holder of lock. Sometimes the JVM itself report the lock chain. If not, enable -XX:+PrintConcurrentLocks in the java command line to get details on Concurrent Locks.

What to observe

The overall health of JVM is based on the interaction of threads, resources, partners/systems/actors along with the OS, GC policy and hardware.

Ensure the system has been subjected to some good loads and everything has initialized/stabilized before starting the thread capture cycle.

Ensure no deadlock exists in the JVM. If there are deadlocks, restart is the only option to kill the deadlock. Analyze the locking and try to establish a order of locking or to avoid the locks entirely.

Verify GC or Memory is not a bottleneck

Check that Finalizer thread is not blocked. Its possible for the Finalizer thread to be blocked for a lock held by a user thread leading to accumulation in finale objects and more frequent GCs.

Threads requesting for new memory too frequently. These can be detected in JRockit VM by the threads executing jrockit.vm.Allocator.getNewTLA()

Use Blocked Lock Chain information if available

Identified by JVM as dependent chains or blocked threads. Use this to navigate the chain and reduce the locking.

Sometimes the health of one server might be really linked to other servers its interacting with, especially in a clustered environment. In those cases, it might require gathering and analyzing thread dumps from multiple servers. For instance in WLS: deployment troubleshooting will require thread dump analysis of Admin Server along with Managed server where the application is getting deployed or Oracle Service Bus (OSB) along with SOA Server instance.

Parallel capture from all linked pieces simultaneously and analyze.

Look for excess threads.These can be for GC (a 12 core hyperthreaded box might lead to 24 Parallel GC threads) or WLS can be running with large number of Muxer threads. Try to trim number of threads as these can only lead to excess thread switching and degraded performance.

Parallel GC threads can be controlled by using the jvm vendor specific flags.

For WLS, best to have the number of Native Muxer threads to 4. Use -Dweblogic.SocketReaders=4 to tweak the number of Muxer threads.

Ensure the WLS Server is using the Native Muxer and not the JavaSocketMuxer. Possible reasons might include not setting the LD_LIBRARY_PATH or SHLIB_PATH to include the directory containing the native libraries. Native Muxer delivers best performance compared to Java Socket Muxer as it uses native poll to read sockets that have readily available data instead of doing a blocking read from the socket.

This section of the series discussed how to analyze thread dumps and navigate lock chains as well as some tips and pointers on what to look for and optimize. The next section will go into tools that can help automate thread dump analysis, some real world examples and limitations of thread dumps.

Followers

About Me

Sabha Parameswaran is an Advisory Solutions Architect at Pivotal working primarily on Enterprise Deployments, Reference Architecture recommendations for Pivotal's Cloud Foundry and Container Service product offerings. Was the team lead in Platform engineering, working on integration with Partners, building tools (tile-gen prototype, apigee bosh integration etc). Previously was a Solutions Architect in the Architect Team (A-Team) of Fusion Middleware Development in Oracle and BEA Systems Pre-Sales System Engineering. Areas of focus include Cloud enablement, Product Integrations, Architecture Design and Review, Proof Of Concepts, Performance Tuning and Best Practices across the Middleware stack.

Disclaimer

The views expressed in this blog are my own and do not necessarily reflect the views of Oracle Corporation. All content is provided on an 'as is' basis, without warranties or conditions of any kind, either express or implied, including, without limitation, any warranties or conditions of title, non-infringement, merchantability, or fitness for a particular purpose. You are solely responsible for determining the appropriateness of using or redistributing and assume any risks.