Investigating "Mysterious" HotSpot Behavior

13 Aug 2014
tech

Yo! I am preparing a new talk, to be first presented at the upcoming Joker Conference
in Saint Petersburg, Russia. With it, I intend to make everyone capable of digging deep into
HotSpot to get to the bottom of their issues.

This blog entry is a teaser which examines a sample problem encountered by Nikita of
Plumbr.

Last Ditch Collection

One of the most awesome things about HotSpot is that it is almost completely built from OpenJDK.
So, we have virtually no restrictions in our investigation. Let’s
fetch the latest version
and do a grep:

There we are! Reading the enclosing method
gives us an understanding of what’s happening:

// Don't clear the soft refs yet.
heap->collect_as_vm_thread(GCCause::_metadata_GC_threshold);
// After a GC try to allocate without expanding. Could fail
// and expansion will be tried below.

// If still failing, allow the Metaspace to expand.
// This should work unless there really is no more space
// or a MaxMetaspaceSize has been specified on the command line.
_result = _loader_data->metaspace_non_null()->expand_and_allocate(_size, _mdtype);

// If expansion failed, do a last-ditch collection and try allocating
// again. A last-ditch collection will clear softrefs.
heap->collect_as_vm_thread(GCCause::_last_ditch_collection);

When HotSpot finds itself unable to allocate memory for something that should be stored in metaspace (e.g., a class), it first runs a GC (GCCause::_metadata_GC_threshold). If that doesn’t help, an attempt to expand metaspace is made. If that still doesn’t help, a full GC is triggered with reason GCCause::_last_ditch_collection.
Soft references are cleaned up at this point. If there is still insufficient room, then an OOM is thrown.

Sounds good, but let us write some actual code that gets this behavior consistently.
Leaking a lot of stuff to metaspace
should, by this logic, result in at least one last ditch collection:

Note that usually, more than one last ditch collection is observed. It is expected, since…
Well, otherwise there wouldn’t be much point doing it in the first place.

Also, we were using a Java 9 bundle, but the logic of last ditch collection has not changed
for quite a long while, so we’re safe in that regard.

Another fun thing to note is that we did not really need to leak to metaspace all by ourselves.
JVM does a good job of loading a lot of classes on its own, so if we just run
java -XX:+PrintGCDetails -XX:MaxMetaspaceSize=2m X, we will get some last ditch collections
and an OOM before JVM even notices that class X does not exist.

No GC

Now, “No GC” is a bit more tricky. Obviously, the HotSpot developers would not in their
sane minds intentionally make this a valid GC cause. Actually, the only two reasonable places
this could appear is if we run
jstat before any GC has occurred:

So, if Nikita saw this in a different situation, this is very likely a bug.
Seeing as we did not observe this issue while investigating the last ditch collection,
it is possible that the bug has been fixed already. Let’s be optimistic and
look for all the relevant changesets:

Let the question marks and what gave G1 away remain an exercise for the reader ;)

Summary

We have just seen that, given just a short tweet to go on and using just the basic command line
tools, we can identify a HotSpot bug hidden in the depths of time, put and upper bound on the
JVM version that the issue has arisen at, and even learn what GC algorithm was in use
at the time.

Call For Action

This is all nice and cool, but I need more real life examples of puzzling HotSpot behavior.
You are welcome to send them over to me by email,
twitter or in the comments below. It is quite probable that I will
investigate your issue, free of charge!

P.S. Thanks to Nikita for giving this sample and proof-reading the post! (Any mistakes are my own, though)