Tuning

Chapter: Memory Management

Sizing the young generation is the most important tuning step in a generational GC setup. First, it must be large enough to hold the short-lived objects of all concurrently executing transactions, so that they die in the young generation instead of being tenured to the old one. Second, we want long-lived objects tenured as quickly as possible, since every additional young-generation GC they survive means another copy between the survivor spaces. Remember: the more objects remain live, the more expensive the GC cycle!

Young Generation Sizing

We're trying to strike an efficient balance between these two extremes. We'll start by defining a young generation large enough to prevent the old generation from growing too quickly under load (at least for the 90% of the time that matters most).

For the Oracle Hotspot JVM, try using a throughput collector with adaptive sizing:

-XX:+UseAdaptiveSizePolicy

Pause goal: -XX:MaxGCPauseMillis=<n>

Throughput goal: -XX:GCTimeRatio=<n>
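Putting these flags together, a full command line for the throughput collector might look like the following sketch. The heap size, goal values, and myapp.jar are illustrative placeholders, not recommendations:

```shell
# Throughput (parallel) collector with adaptive sizing.
# GCTimeRatio=19 targets at most 1/(1+19) = 5% of total time spent in GC.
java -Xms2g -Xmx2g \
  -XX:+UseParallelGC \
  -XX:+UseAdaptiveSizePolicy \
  -XX:MaxGCPauseMillis=200 \
  -XX:GCTimeRatio=19 \
  -jar myapp.jar
```

When both goals are set, HotSpot tries to meet the pause goal first and the throughput goal second, adjusting the generation sizes as it goes.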

Optimize toward the desired survivor occupancy by keeping an eye on the survivor spaces. After a young-generation GC, the currently used survivor space should not be filled more than 75%. At the same time, the old generation should not grow. If adaptive sizing isn't achieving this, adjust the young generation's size manually.
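As a sketch of manual HotSpot sizing (values are placeholders to be derived from your own monitoring), the young generation and survivor spaces can be pinned explicitly while GC logging shows the survivor fill state after each collection:

```shell
# Fixed young generation of 512 MB; SurvivorRatio=6 gives
# Eden = 384 MB and two 64 MB survivor spaces (young gen / 8 each).
# PrintTenuringDistribution logs survivor occupancy per object age.
java -Xmx2g \
  -Xmn512m \
  -XX:SurvivorRatio=6 \
  -XX:+PrintGCDetails \
  -XX:+PrintTenuringDistribution \
  -jar myapp.jar
```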

The IBM WebSphere JVM and Oracle JRockit must be sized manually. The IBM JVM has two areas in the young generation instead of three, but the same target still applies: the survivor area should not be filled more than 75% after a young-generation GC, and the old generation should not grow.
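The vendor-specific options differ. As a hedged sketch for the JVM generations discussed here (option names are worth verifying against the respective vendor documentation; sizes are placeholders):

```shell
# IBM JVM with the generational-concurrent policy: fix the nursery size.
java -Xgcpolicy:gencon -Xmn512m -jar myapp.jar

# Oracle JRockit: fix the nursery size with -Xns.
java -Xns512m -jar myapp.jar
```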

Next, we need to ensure that there are not so many live objects in the young generation that a minor GC becomes expensive. For a generational GC, middle-lived objects can be problematic: if they stick around too long in the young generation, they lead to long GC cycles; if we tenure them, the old generation grows, eventually bringing about its own expensive GCs.

By monitoring garbage collections under full load, we can analyze the situation and find out what's going on. If young-generation GCs become too expensive while the old generation does not grow, we have too many live objects in the young generation. In the HotSpot JVM we can tune the tenuring threshold, which defines how many GC cycles an object must survive before it gets tenured. This enables us to tenure middle-lived objects sooner while making sure that truly temporary objects still die in the young generation. The effect should be a faster young-generation GC and a slightly growing old generation. In the IBM WebSphere and JRockit JVMs, all we can do is shrink the young generation by resizing it.
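In HotSpot this is done with -XX:MaxTenuringThreshold; lowering it tenures objects after fewer survivor-space copies. The value below is a placeholder to be validated against the tenuring distribution log:

```shell
# Tenure objects after surviving 4 young-generation GCs instead of the
# default maximum of 15; verify the effect in the tenuring distribution.
java -XX:MaxTenuringThreshold=4 \
  -XX:+PrintTenuringDistribution \
  -jar myapp.jar
```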

If we have to accept a spillover of middle-lived objects into the old generation, we should use the CMS old-generation GC to deal with it. However, if that spillover becomes too great, we will experience very expensive major GCs. In such a case GC tuning alone no longer suffices, and we need to optimize the transactional memory usage of the application with an allocation analysis.

Old-Generation Tuning

Once you've achieved near perfection in the young-generation size, you know the old generation isn't going to grow under load. This makes it relatively easy to discover the optimal size of the old generation. Check your application's utilization of the old generation after the initial warm-up phase and note the value. Reconfigure your JVM so that the old generation's size is 25% bigger than this observed value to serve as a buffer. For the Oracle HotSpot JVM, you must size the buffer to be at least as big as Eden plus one survivor to accommodate HotSpot's young-generation guarantee (check the HotSpot documentation for further details).
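A hypothetical worked example: suppose the old generation levels off at 1.2 GB after warm-up. A 25% buffer suggests 1.5 GB, but with a 512 MB young generation split into 384 MB of Eden and two 64 MB survivors, the young-generation guarantee requires a buffer of at least Eden plus one survivor (448 MB), so the old generation should be sized to roughly 1.65 GB instead:

```shell
# Old-generation size in HotSpot = total heap (-Xmx) minus young gen (-Xmn):
# 2208 MB - 512 MB = 1696 MB of old generation, comfortably above the
# required 1.2 GB + 448 MB = 1648 MB (observed use plus Eden + one survivor).
java -Xms2208m -Xmx2208m -Xmn512m -XX:SurvivorRatio=6 -jar myapp.jar
```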

If you were unable to achieve optimal young-generation sizing—sometimes it simply is not possible—things become more complicated.

If your application is response-time-oriented, you will want to use a concurrent GC in the old generation. In this case you need to tune both the concurrent GC thresholds and the size of the old generation so that the average fill state is never higher than 75%. Again, the 25% extra represents headroom needed by the concurrent GC. If the old generation fills up too much, the CMS will not be able to free memory fast enough. In addition, fragmentation can easily lead to allocation errors. Either case will trigger a real major GC, stopping the entire JVM.
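In HotSpot, the CMS initiation threshold can be pinned explicitly so the concurrent cycle starts while the 25% headroom is still free (the 70% value is a placeholder):

```shell
# Start the concurrent cycle once the old generation is 70% full, and
# disable the adaptive heuristic so the threshold is actually honored.
java -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -jar myapp.jar
```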

If your application is throughput-oriented, things are not as bad. Use a parallel GC with compaction and make the old generation big enough to accommodate all concurrently running transactions. Otherwise, the GC will not free sufficient memory to support this level of application concurrency. If necessary, it's okay to expand the headroom a bit, keeping in mind that more memory only delays a GC; it cannot prevent it.
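For the throughput-oriented case on HotSpot, a minimal sketch (the heap size is a placeholder to be sized to your concurrent transaction volume):

```shell
# Parallel collector for both generations; UseParallelOldGC adds
# multithreaded compaction of the old generation.
java -Xms3g -Xmx3g \
  -XX:+UseParallelGC \
  -XX:+UseParallelOldGC \
  -jar myapp.jar
```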