Prior to the introduction of Java 2 Platform, Standard Edition (J2SE) 1.4.1, writing effective applications that needed high throughput or minimal pause time during execution proved difficult for Java developers. Now, version 1.4.1's three new garbage collection algorithms specifically target these types of applications. In this article, I introduce these algorithms in the broader context of Java application development. I discuss the advantages and disadvantages of the algorithms, and recommend when they should and should not be used.

A useful metaphor

Imagine you live in a really small town (let's call it JavaVille). It has one main street, and, for fun, residents cruise the street, throwing candy wrappers and old computer parts out their car windows. This creates quite a mess, and the town has hired a full-time garbage collector to deal with the problem.

Bear with me for a minute. This is a really useful metaphor.

Now, imagine you are the town garbage collector, or sanitation engineer. With your garbage truck, you pick up all the trash left on the street. However, your truck is so big that no other cars can travel on the main street while you're collecting garbage. This gives people something else to complain about—they can't do anything else while the garbage truck travels through town. You try to stay out of the way, but sometimes, certain vehicles must get through, like ambulances and fire trucks.

To alleviate the situation, the town council votes to add some new streets so people can go about their business even when the garbage truck is running. They also decide to buy more garbage trucks so garbage collection takes less time.

This is a really useful metaphor for thinking about Java garbage collection, where multiple application threads (litterbugs) create garbage, and one thread (you) cleans it up. The application threads and the garbage collector share a single CPU (the main street), and all activity in the applications must stop while garbage collection is in progress (stop the world). You can make garbage collection less noticeable by adding more CPUs (streets) or more garbage collection (GC) threads (garbage trucks). Note that adding more GC threads (trucks) won't do you any good unless you have enough CPUs (streets) for them to work effectively in parallel.

Garbage collection is a complex subject. Before I discuss the new garbage collection algorithms, we should look at the history of Java garbage collection and the situation as it existed before J2SE 1.4.1. This should make it easier to understand current garbage collection and the reasons behind the new algorithms.

Historical background

When Java was originally developed, the JDK shipped with a mark-and-sweep garbage collector. A mark-and-sweep garbage collector proceeds in two phases:

Mark: identifies garbage objects

Sweep: reclaims the memory for the garbage objects

Garbage objects are identified by traversing references from the current application stack frames; unreachable objects are assumed to be garbage.
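To make the two phases concrete, here is a minimal sketch of mark and sweep over a tiny hand-built object graph. It is an illustration only, not how any JVM implements its collector: the class MarkSweepSketch and the names Node, mark, sweep, and collect are invented for this example, and the "heap" is just a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy mark-and-sweep over a hand-built object graph.
// All names here are hypothetical; none of this is a JVM API.
class MarkSweepSketch {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<Node>();
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    // Mark phase: flag every node reachable from the roots.
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<Node>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (!n.marked) {
                n.marked = true;
                pending.addAll(n.refs);
            }
        }
    }

    // Sweep phase: remove unmarked nodes from the heap and clear the
    // mark on survivors. Returns the names of the reclaimed nodes.
    static List<String> sweep(List<Node> heap) {
        List<String> reclaimed = new ArrayList<String>();
        for (Iterator<Node> it = heap.iterator(); it.hasNext();) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                reclaimed.add(n.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    static List<String> collect(List<Node> roots, List<Node> heap) {
        mark(roots);
        return sweep(heap);
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b); // a -> b, and a is a root
        c.refs.add(d); // c and d reference each other,
        d.refs.add(c); // but nothing reachable from a root points at them

        List<Node> heap = new ArrayList<Node>(Arrays.asList(a, b, c, d));
        System.out.println(collect(Arrays.asList(a), heap)); // prints [c, d]
    }
}
```

Note that c and d are reclaimed even though they reference each other: what matters is reachability from a root, not whether anything at all points to an object.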

Mark and sweep is a "stop-the-world" garbage collection technique; that is, all application threads stop until garbage collection completes, or until a higher-priority thread interrupts the garbage collector. If the garbage collector is interrupted, it must restart, which can lead to application thrashing with little apparent result. The other problem with mark and sweep is that many types of applications can't tolerate its stop-the-world nature. That is especially true of applications that require near real-time behavior or those that service large numbers of transaction-oriented clients.

Because of these problems, Sun Microsystems' Java HotSpot VM split the heap into three sections and added three garbage collection techniques. Splitting the heap allows different algorithms to be used for newly created objects and for objects that have been around for a while. This technique is based on the observation that most Java objects are small and short-lived. The heap's three sections are:

Permanent space: used for JVM class and method objects

Old object space: used for objects that have been around a while

New (young) object space: used for newly created objects

The new object space is further subdivided into three parts: Eden, where all newly created objects go, and survivor spaces 1 and 2, where objects go before they become old. The survivor spaces make it easier to use copy-compaction with young objects; more details later.

The J2SE 1.3 garbage collection techniques are:

Copy-compaction: used for new object space.

Mark-compact: used in old object space. Like mark and sweep, mark-compact first marks all reachable objects; in the second phase, the surviving objects are compacted toward one end of the space, and the memory held by unreachable objects is reclaimed. This technique avoids fragmentation problems and works well when the garbage collector runs infrequently.

Incremental garbage collection (optional): Incremental GC creates a new middle section in the heap, which is divided into multiple trains. Garbage is reclaimed one train at a time. This provides shorter, more frequent pauses for garbage collection, but it can decrease overall application performance. Incremental garbage collection can be enabled with the -Xincgc command-line option.

All of these techniques are stop-the-world techniques. Though incremental garbage collection makes this effect less obvious, the application threads must still stop. That proves problematic for applications that can't afford to pause for garbage collection.

Garbage collection is based on live objects; that is, those reachable from the current stack space. Live objects are copied from Eden to survivor space (1 or 2), and then from survivor space to old object space. The amount of time objects spend in survivor space can be controlled with command-line parameters (see Tables 2 and 3 below).

The garbage collector typically runs in a low-priority thread, attempting to reclaim memory when the application is idle. This is fine for applications that regularly have idle time, such as graphical user interface (GUI)-driven applications. Unfortunately, if there is little or no idle time, the garbage collector may not get a chance to run.

Garbage collection can also be triggered if the heap's subregions are nearly full. In this case, the garbage collection thread's priority increases, thus increasing the chance that the garbage collection will run to completion. If the new generation is full, a minor collection is triggered; if the old generation is full, a major collection is triggered. The steps in a minor collection are:

Copy objects from Eden to survivor space (1 or 2).

Copy from survivor space 1 to survivor space 2, or vice versa. After a certain number of copies (controllable from the command line), an object becomes tenured, that is, a candidate for old object space.

Tenured objects move from survivor space 1 or 2 to old object space.

A major collection uses the old generation garbage collector (mark-compact for J2SE 1.3) to reclaim old objects.
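The minor-collection steps above can be sketched as a toy model. This is a simplification under stated assumptions: liveness is set by hand with a flag (a real collector finds it by tracing references), the tenuring threshold is a plain constructor argument, and every name (MinorCollectionSketch, Obj, evacuate, and so on) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a minor collection: Eden, two survivor spaces, old space.
// All names are hypothetical; this is not a JVM API.
class MinorCollectionSketch {

    static final class Obj {
        final String name;
        boolean live = true; // set by hand in this sketch
        int age;             // minor collections survived so far

        Obj(String name) {
            this.name = name;
        }
    }

    final List<Obj> eden = new ArrayList<Obj>();
    List<Obj> from = new ArrayList<Obj>(); // survivor space currently in use
    List<Obj> to = new ArrayList<Obj>();   // empty survivor space
    final List<Obj> old = new ArrayList<Obj>();
    final int tenuringThreshold;

    MinorCollectionSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o); // all new objects start in Eden
        return o;
    }

    void minorCollect() {
        evacuate(eden);
        evacuate(from);
        eden.clear(); // whatever was not copied is garbage
        from.clear();
        List<Obj> tmp = from; // the two survivor spaces swap roles
        from = to;
        to = tmp;
    }

    // Copy live objects out of a space: tenured ones go to old space,
    // the rest go to the empty survivor space.
    private void evacuate(List<Obj> space) {
        for (Obj o : space) {
            if (!o.live) {
                continue;
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);
            } else {
                to.add(o);
            }
        }
    }

    static List<String> names(List<Obj> space) {
        List<String> result = new ArrayList<String>();
        for (Obj o : space) {
            result.add(o.name);
        }
        return result;
    }

    public static void main(String[] args) {
        MinorCollectionSketch heap = new MinorCollectionSketch(2);
        heap.allocate("x");
        Obj y = heap.allocate("y");
        y.live = false;
        heap.minorCollect(); // x moves to a survivor space, y is dropped
        heap.allocate("z");
        heap.minorCollect(); // x reaches the threshold and is tenured
        System.out.println("survivor=" + names(heap.from) + " old=" + names(heap.old));
        // prints survivor=[z] old=[x]
    }
}
```

Because only live objects are ever copied, the cost of a minor collection in this model depends on how many objects survive, not on how much garbage was created. That is why the scheme suits a population of mostly short-lived objects.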

Garbage collection in J2SE 1.4.1

Some applications can't tolerate even the short pauses from incremental garbage collection. Examples include near real-time applications and applications that must service large transaction volumes. The new 1.4.1 algorithms were created to fill this gap.

The new algorithms are based on the observation that many machines used for low-pause or high-throughput applications have large amounts of memory and multiple processors. The algorithms are optimized to take advantage of the extra resources. Table 1 shows the new algorithms.

Table 1. The complete set of GC algorithms (* = new in 1.4.1)

Algorithm              Young   Old   Stop the world   Multithreaded   Concurrent
Copying                X             X
*Parallel copying      X             X                X
*Parallel scavenging   X             X                X
Incremental            (see note 1)  X
Mark-compact                   X     X
*Concurrent                    X     (see note 2)                     X

Note 1: Subdivides the new generation to create an additional middle generation

Note 2: Uses stop-the-world approach for two of its six phases

The parallel (multithreaded) algorithms are optimized for machines with multiple CPUs. In J2SE 1.4.1, they are only used in the young generation. These collectors still stop the application threads (see Table 1), but they divide the collection work among several GC threads that run in parallel with one another, which shortens each pause. By default, the parallel collectors allocate one thread per processor. Note: If you have a single-processor machine, these algorithms will probably not help with application performance and could potentially diminish it.

The parallel scavenging collector is optimized for very large (gigabyte) heaps. It should provide high throughput with short pauses. It requires using mark-compact for the old generation.

The concurrent collector works with the old generation. It divides garbage collection into six phases:

Initial mark

Mark

Precleaning

Remark

Sweep

Reset

The first (initial-mark) and fourth (remark) phases are stop-the-world phases; the others run concurrently with the application threads. Splitting the work this way keeps the stop-the-world phases as short as possible, which means that application pauses for garbage collection should be minimized.
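As a compact summary of the six phases, the following sketch tags each one with whether it pauses the application. The class ConcurrentPhases, the enum Phase, and the field stopsTheWorld are invented for this illustration; they only restate the list above and do not correspond to any JVM API.

```java
// The six phases of the concurrent collector, in order, tagged with
// whether application threads are paused. Illustrative names only.
class ConcurrentPhases {

    enum Phase {
        INITIAL_MARK(true),
        MARK(false),
        PRECLEANING(false),
        REMARK(true),
        SWEEP(false),
        RESET(false);

        final boolean stopsTheWorld;

        Phase(boolean stopsTheWorld) {
            this.stopsTheWorld = stopsTheWorld;
        }
    }

    // Number of phases during which the application is paused.
    static int countStopTheWorld() {
        int count = 0;
        for (Phase p : Phase.values()) {
            if (p.stopsTheWorld) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println(p + ": "
                    + (p.stopsTheWorld ? "application paused" : "runs alongside application"));
        }
        System.out.println(countStopTheWorld() + " of " + Phase.values().length
                + " phases stop the world");
    }
}
```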

Controlling 1.4.1 garbage collection

To select a particular garbage collection algorithm you will need to use a command-line switch. The switches are listed in Table 2:

Table 2. Switches that affect GC algorithms

Command-line switch        Algorithm
[none]                     Copying [default]
-XX:+UseParNewGC           Parallel copying
-XX:+UseParallelGC         Parallel scavenging
-Xincgc                    Incremental
[none]                     Mark-compact [default]
-XX:+UseConcMarkSweepGC    Concurrent mark and sweep

Other command-line switches can be used to affect the relative size of the heap's regions:

The following command-line parameter affects the degree of parallelism:

-XX:ParallelGCThreads=[n]: Number of GC threads to create. Default: same as number of processors.

The command-line parameters should only be used if profiling shows a specific problem they can affect. These parameters are interrelated and may have unexpected side effects. Also note that these are nonstandard switches (the -X or -XX prefix is the clue), and they may change or disappear with future versions of Java.
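Because these switches vary between releases, it can help to ask a running JVM what it is actually using. The sketch below is one way to do that, with an important assumption: it relies on the java.lang.management API, which was added in Java 5 and is therefore not available on J2SE 1.4.1 itself. The class name GcInfo and its helper methods are made up for this example. Recent JVMs no longer accept some of the switches in Table 2 (the concurrent mark-sweep collector, for instance, was removed in JDK 14), so the collector names printed depend on the JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reports the processor count, the active collectors, and any -X/-XX
// switches the JVM was started with. Requires Java 5 or later.
class GcInfo {

    // Names of the collectors this JVM is running with.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Nonstandard switches (-X... and -XX...) passed on the command line.
    static List<String> nonStandardSwitches() {
        List<String> switches = new ArrayList<String>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-X")) {
                switches.add(arg);
            }
        }
        return switches;
    }

    public static void main(String[] args) {
        // The article states that the default GC thread count equals this number.
        System.out.println("Processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Switches:   " + nonStandardSwitches());
    }
}
```

Running it once with no switches and once with, say, -XX:+UseParallelGC shows whether the switch was accepted and which collectors it selected on your JVM.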

How to select a GC algorithm

For most Java applications, the default algorithms work fine. If your application performance is reasonable, you don't need to use another GC algorithm. As the saying goes, "If it ain't broke, don't fix it."

If you have a single-processor client machine and are having problems with pause times in your application, try the incremental garbage collector. This might help with perceived performance, even though it may decrease real application performance somewhat. Remember that perceived performance is usually more important than real performance for client applications. If the incremental garbage collector doesn't help (or doesn't help enough), and you have plenty of RAM, try the concurrent garbage collector.

If you have a single-processor server machine with lots of memory and experience trouble with application pause times, try the concurrent garbage collector.

If you have a multiprocessor machine, especially with four or more processors, try one of the parallel garbage collection algorithms. These should significantly decrease pause times. If you have lots of memory (gigabytes), use the scavenging collector; otherwise, use the copying collector.
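The rules of thumb in this section can be written down as a small decision function. This is only a restatement of the advice above, not a tuning tool: the class GcChooser, the method suggest, and its parameters are hypothetical, "lots of memory" is left as a yes/no judgment for the caller, and the returned switch is a starting point to test rather than an answer. The concurrent collector's switch is spelled -XX:+UseConcMarkSweepGC, as HotSpot spells it.

```java
// Encodes the article's rules of thumb for picking a first GC switch to try.
// Hypothetical helper; an empty string means "stay with the defaults".
class GcChooser {

    static final String DEFAULTS = "";

    static String suggest(int processors,
                          boolean pauseProblems,
                          boolean clientMachine,
                          boolean lotsOfMemory,
                          boolean incrementalAlreadyTried) {
        if (!pauseProblems) {
            return DEFAULTS; // "If it ain't broke, don't fix it."
        }
        if (processors > 1) {
            // Multiprocessor: one of the parallel young-generation collectors.
            return lotsOfMemory ? "-XX:+UseParallelGC" : "-XX:+UseParNewGC";
        }
        if (clientMachine) {
            // Single-processor client: incremental first, then concurrent
            // if that was not enough and there is plenty of RAM.
            if (incrementalAlreadyTried && lotsOfMemory) {
                return "-XX:+UseConcMarkSweepGC";
            }
            return "-Xincgc";
        }
        // Single-processor server: concurrent, but only with lots of memory.
        // The article gives no advice for the low-memory case.
        return lotsOfMemory ? "-XX:+UseConcMarkSweepGC" : DEFAULTS;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("First switch to try: '"
                + suggest(cpus, true, false, true, false) + "'");
    }
}
```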

As a general rule, you shouldn't have to change the heap size (relative or absolute) or parallelism settings. However, if application performance isn't good enough with any of the existing GC algorithms, you might need to make some adjustments. Don't just do this willy-nilly. Make sure you profile your application before and after making the changes, and run tests with significant, realistic loads. Resources has some good suggestions on how to do this. One final note: Don't even consider changing GC parameters until you've profiled and optimized your application.
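For a rough before-and-after comparison, something like the following sketch can report how many collections ran and how much time they took while a workload executed. It assumes the java.lang.management API (Java 5 and later, so not J2SE 1.4.1 itself); the class GcCost, the method measure, and the allocation-heavy workload are stand-ins for your own code and realistic load. It does not replace a real profiler.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Measures collections and accumulated GC time across a workload.
// Hypothetical helper; requires Java 5 or later.
class GcCost {

    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionCount()); // -1 means undefined
        }
        return total;
    }

    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionTime()); // -1 means undefined
        }
        return total;
    }

    // Returns { collections, GC milliseconds, wall-clock milliseconds }.
    static long[] measure(Runnable workload) {
        long collectionsBefore = totalCollections();
        long gcMillisBefore = totalGcMillis();
        long start = System.nanoTime();
        workload.run();
        long wallMillis = (System.nanoTime() - start) / 1000000L;
        return new long[] {
            totalCollections() - collectionsBefore,
            totalGcMillis() - gcMillisBefore,
            wallMillis
        };
    }

    public static void main(String[] args) {
        // Stand-in workload: creates many small, short-lived objects.
        Runnable churn = new Runnable() {
            public void run() {
                long checksum = 0;
                for (int i = 0; i < 2000000; i++) {
                    byte[] junk = new byte[256];
                    junk[0] = (byte) i;
                    checksum += junk[0];
                }
                System.out.println("checksum " + checksum);
            }
        };
        long[] result = measure(churn);
        System.out.println("collections=" + result[0]
                + " gcMillis=" + result[1]
                + " wallMillis=" + result[2]);
    }
}
```

Run the same workload under each candidate switch and compare the three numbers. The standard -verbose:gc option gives a per-collection log if you need more detail than totals.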

Parting thoughts

The new Java garbage collection algorithms can significantly improve application performance, especially for applications that can't tolerate pauses or that require maximum throughput.

The concurrent collector is designed to keep pause times short, provided you have plenty of memory. The parallel garbage collection algorithms are optimized for multiprocessor machines. And the scavenging collector is optimized for multiprocessor machines with very large heaps.

Choosing a GC algorithm can be somewhat tricky. Make sure you profile your application before and after making a change, and use realistic test cases. Otherwise, you might make things worse rather than better.

Greg Holling is the president and senior consultant for Visionary Computer Consulting, with more than 20 years of software development experience. He has worked in a wide variety of roles, including programmer, system administrator, team leader, project manager, and architect. He has developed software for scientific, GIS, client/server, graphics, medical, insurance, and database applications.