How to Handle Java Finalization's Memory-Retention Issues

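Finalization is a feature of the Java programming language that allows you to perform postmortem cleanup on objects that the garbage collector has found to be unreachable. It is typically used to reclaim native resources associated with an object. As a simple example, consider a class Image1 whose instances hold an integer field, nativeImg, that points to native image data, along with ordinary fields such as pos and dim, and whose finalize() method disposes of the native image.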
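A minimal sketch of such a class follows. The native calls are simulated by a hypothetical NativeLib class that only counts live native images, and plain int fields stand in for pos and dim, whose types the text does not give. Because Object.finalize() has been deprecated since Java 9 (and marked for removal since Java 18), the override suppresses the compiler's deprecation warnings.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for native calls; it only counts live native images.
final class NativeLib {
    static final AtomicInteger live = new AtomicInteger();
    static int alloc() { live.incrementAndGet(); return 1; } // dummy nonzero handle
    static void free(int handle) { live.decrementAndGet(); }
}

class Image1 {
    // "Pointer" to the native image data; 0 once it has been disposed of.
    private int nativeImg = NativeLib.alloc();
    // Ordinary Java state; plain ints stand in for the pos and dim fields.
    private int posX, posY, width, height;

    // Disposes of the native image; successive calls are ignored. It is
    // synchronized because finalize() runs on the finalizer thread.
    public synchronized void dispose() {
        if (nativeImg != 0) {
            NativeLib.free(nativeImg);
            nativeImg = 0;
        }
    }

    // Postmortem cleanup: the JVM's finalizer thread calls this at some
    // unspecified time after the object has become unreachable.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        dispose();
    }
}
```

Calling dispose() explicitly and letting the finalizer run later is safe, because the second call finds nativeImg already zeroed and does nothing.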

Sometime after an Image1 instance has become unreachable, the Java Virtual Machine (JVM) * will call its finalize() method to ensure that the native resource that holds the image data -- pointed to by the integer nativeImg in the example -- has been reclaimed.

Notice, however, that the finalize() method, despite its special treatment by the JVM, is an ordinary method that can contain arbitrary code. In particular, it can access any field in the object, such as pos and dim in the example. Surprisingly, it can also make the object reachable again, for example by assigning this to a static field, as in randomImg = this. This practice, known as resurrection, is not recommended, but unfortunately the Java programming language allows it.
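For illustration only, here is what a resurrecting finalizer might look like. ResurrectingImage is a hypothetical class; randomImg is the static field named in the text. Note that the JVM invokes finalize() at most once per object, so if a resurrected object later becomes unreachable again, it is reclaimed without another finalizer call.

```java
// Anti-pattern, shown only to illustrate resurrection; do not do this.
class ResurrectingImage {
    static ResurrectingImage randomImg;

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        // Stores "this" in a static field, making the object strongly
        // reachable again after the GC had found it unreachable.
        randomImg = this;
    }
}
```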

The following steps and Figure 1 describe the lifetime of a finalizable object obj -- that is, an object whose class has a nontrivial finalizer.

Figure 1. Lifetime of Finalizable Object obj.

1. When obj is allocated, the JVM internally records that obj is finalizable. This typically slows down the otherwise fast allocation path that modern JVMs have.

2. When the garbage collector determines that obj is unreachable, it notices that obj is finalizable, as recorded at allocation, and adds it to the JVM's finalization queue. It also ensures that all objects reachable from obj are retained, even if they are otherwise unreachable, because the finalizer might access them. Figure 2 illustrates this for an instance of class Image1.

Figure 2. Garbage Collector Determines That obj Is Unreachable.

3. At some point later, the JVM's finalizer thread dequeues obj, calls its finalize() method, and records that obj's finalizer has been called. At this point, obj is considered to be finalized.

4. When the garbage collector rediscovers that obj is unreachable, it reclaims its space along with everything reachable from it, provided that the latter is otherwise unreachable.

Notice that the garbage collector needs a minimum of two cycles to reclaim obj and needs to retain all other objects reachable from obj during this process. If a programmer is not careful, this can create temporary, subtle, and unpredictable resource-retention issues. Additionally, the JVM does not guarantee that it will call the finalizers of all the finalizable objects that have been allocated. It might exit before the garbage collector discovers some of them to be unreachable.

Avoid Memory-Retention Problems When Subclassing

Finalization can delay the reclamation of resources, even if you do not use it explicitly. Consider the following example:

public class RGBImage1 extends Image1 {
    // No finalizer here, but finalize() is inherited from Image1.
    private byte[] rgbData;
}

The RGBImage1 class extends Image1 and introduces the field rgbData -- and maybe some methods that the example does not show. Even though you did not explicitly define a finalizer on RGBImage1, the class will naturally inherit the finalize() method from Image1, and all RGBImage1 instances will also be considered to be finalizable. When an RGBImage1 instance becomes unreachable, the reclamation of the potentially very large rgbData array will be delayed until the instance is finalized, as shown in Figure 3. This memory retention problem can be difficult to find because the finalizer might be "hidden" in a deep class hierarchy.
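Because an inherited finalizer is easy to miss when reading a subclass, a small reflection-based check can help locate it. The helper FinalizerFinder.finalizerOwner below is hypothetical, and Image1 is reduced to a stub with the native cleanup elided. The helper walks up the superclass chain with Class.getDeclaredMethod and reports the nearest class that declares finalize().

```java
// Minimal stand-ins for the classes in the text; native cleanup is elided.
class Image1 {
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        // would dispose of the native image here
    }
}

class RGBImage1 extends Image1 {
    private byte[] rgbData = new byte[1 << 20];
}

final class FinalizerFinder {
    // Returns the nearest class in c's hierarchy that declares finalize(),
    // or null if c only inherits the trivial Object.finalize().
    static Class<?> finalizerOwner(Class<?> c) {
        for (Class<?> k = c; k != null && k != Object.class; k = k.getSuperclass()) {
            try {
                k.getDeclaredMethod("finalize");
                return k;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up.
            }
        }
        return null;
    }
}
```

For example, FinalizerFinder.finalizerOwner(RGBImage1.class) returns Image1.class, revealing that every RGBImage1 instance is finalizable.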

Figure 3. Reclamation of rgbData Array Will Be Delayed Until the Instance Is Finalized.

One way to avoid this problem is to rearrange the code so that it uses composition instead of inheritance: a class RGBImage2 that holds an Image1 instance in a field instead of extending Image1.
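A sketch of RGBImage2 follows, assuming an Image1 like the one described earlier; NativeLib is again a hypothetical stand-in for native code, and the RGBImage2(int size) constructor is added for illustration. RGBImage2 holds its Image1 in a field, inherits no finalizer, and forwards dispose() to the wrapped instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for native calls; it only counts live native images.
final class NativeLib {
    static final AtomicInteger live = new AtomicInteger();
    static int alloc() { live.incrementAndGet(); return 1; } // dummy nonzero handle
    static void free(int handle) { live.decrementAndGet(); }
}

// Finalizable class from the text, with the native calls simulated.
class Image1 {
    private int nativeImg = NativeLib.alloc();

    public synchronized void dispose() {
        if (nativeImg != 0) { NativeLib.free(nativeImg); nativeImg = 0; }
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() { dispose(); }
}

// Composition: RGBImage2 has an Image1 instead of being one, so it inherits
// no finalizer, and only the Image1 instance is ever queued for finalization.
class RGBImage2 {
    private final Image1 img = new Image1();
    private final byte[] rgbData;

    RGBImage2(int size) { rgbData = new byte[size]; }

    // Delegator method: forwards to the wrapped Image1.
    public void dispose() { img.dispose(); }
}
```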

Compared with RGBImage1, RGBImage2 contains an instance of Image1 instead of extending Image1. When an instance of RGBImage2 becomes unreachable, the garbage collector will promptly reclaim it, along with the rgbData array -- assuming the latter is not reachable from elsewhere -- and will queue up only the Image1 instance for finalization, as shown in Figure 4. Because class RGBImage2 does not subclass Image1, it does not inherit any methods from it. Therefore, you might have to add delegator methods to RGBImage2 to access the required methods of Image1. The dispose() method, which simply forwards to the contained Image1, is one such example.

Figure 4. GC Will Queue Up Only the Image1 Instance for Finalization.

You cannot always rearrange your code in the manner just described, however. Sometimes, as a user of the class, you will have to do more work to ensure that its instances do not hold on to more space than necessary while they await finalization. Class RGBImage3 illustrates how to do so.
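A sketch of RGBImage3 under the same assumptions as before (a simulated Image1 and a hypothetical NativeLib) follows. Its dispose() override releases the native image and drops the array reference; the hasRgbData() accessor and the constructor are added only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for native calls; it only counts live native images.
final class NativeLib {
    static final AtomicInteger live = new AtomicInteger();
    static int alloc() { live.incrementAndGet(); return 1; } // dummy nonzero handle
    static void free(int handle) { live.decrementAndGet(); }
}

class Image1 {
    private int nativeImg = NativeLib.alloc();

    public synchronized void dispose() {
        if (nativeImg != 0) { NativeLib.free(nativeImg); nativeImg = 0; }
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() { dispose(); }
}

class RGBImage3 extends Image1 {
    private byte[] rgbData;

    RGBImage3(int size) { rgbData = new byte[size]; }

    // Releases the native image and drops the array reference, so the GC
    // can reclaim rgbData even while this object waits to be finalized.
    @Override
    public synchronized void dispose() {
        super.dispose();
        rgbData = null;
    }

    // Illustrative accessor, used only to observe the effect of dispose().
    boolean hasRgbData() { return rgbData != null; }
}
```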

RGBImage3 is identical to RGBImage1 but with the addition of the dispose() method, which nulls the rgbData field. You are required to explicitly call dispose() after using an RGBImage3 instance to ensure that the rgbData array is promptly reclaimed, as shown in Figure 5. Explicit nulling of fields is rarely good practice, but this is one of the rare occasions when it is justified.

Figure 5. Call dispose() After Using an RGBImage3 Instance.

Shield Users From Memory-Retention Problems

This article has described how to avoid memory-retention problems when working with third-party classes that use finalizers. Now let's look at how to write classes that require postmortem cleanup so that their users do not encounter the problems previously outlined. The best way to do so is to split such classes into two -- one to hold the data that need postmortem cleanup, the other to hold everything else -- and define a finalizer only on the former. The classes Image2 and NativeImage2 illustrate this technique.

Class Image2 is similar to Image1 but moves the nativeImg field into a separate class, NativeImage2. All accesses to nativeImg from the image class must therefore go through one level of indirection. However, when an Image2 instance becomes unreachable, only the NativeImage2 instance will be queued up for finalization. Anything else reachable from the Image2 instance will be promptly reclaimed, as Figure 6 illustrates. Class NativeImage2 is declared final so that users cannot subclass it and reintroduce the memory-retention problems described earlier.
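A minimal sketch of the two classes follows, with NativeLib again a hypothetical stand-in for native code and plain ints standing in for pos and dim. Only NativeImage2 has a finalizer, and it holds nothing but the native handle.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for native calls; it only counts live native images.
final class NativeLib {
    static final AtomicInteger live = new AtomicInteger();
    static int alloc() { live.incrementAndGet(); return 1; } // dummy nonzero handle
    static void free(int handle) { live.decrementAndGet(); }
}

class Image2 {
    // All native state lives in NativeImage2; Image2 itself has no finalizer.
    private final NativeImage2 nativeImg = new NativeImage2();
    private int posX, posY, width, height; // stand-ins for pos and dim

    public void dispose() { nativeImg.dispose(); }
}

// final, so no subclass can add fields that the finalizer would retain.
// Top-level (not an inner class of Image2), so it holds no hidden
// reference to the Image2 instance that created it.
final class NativeImage2 {
    private int nativeImg = NativeLib.alloc();

    // Package-private: meant to be called only by Image2.
    synchronized void dispose() {
        if (nativeImg != 0) { NativeLib.free(nativeImg); nativeImg = 0; }
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() { dispose(); }
}
```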

Figure 6. When the Image2 Instance Becomes Unreachable, Only the NativeImage2 Instance Will Be Queued Up.

A subtle point is that NativeImage2 should not be an inner class of Image2. Instances of inner classes have an implicit reference to the instance of the outer class that created them. Therefore, if NativeImage2 were an inner class of Image2, a NativeImage2 instance queued up for finalization would also retain the corresponding Image2 instance, which is precisely what you are trying to avoid. NativeImage2 is, however, intended to be accessed only from Image2, which is why it has no public methods: both its dispose() method and the class itself are package-private.
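If you would rather keep the native part inside Image2's source file, a static nested class avoids the problem, because only inner (non-static) classes carry an implicit reference to an enclosing instance. The following variant of Image2 is a sketch under that assumption; the nested class name NativeImage and the NativeLib helper are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for native calls; it only counts live native images.
final class NativeLib {
    static final AtomicInteger live = new AtomicInteger();
    static int alloc() { live.incrementAndGet(); return 1; } // dummy nonzero handle
    static void free(int handle) { live.decrementAndGet(); }
}

class Image2 {
    private final NativeImage nativeImg = new NativeImage();

    public void dispose() { nativeImg.dispose(); }

    // "static" is what matters: a static nested class has no implicit
    // reference to an enclosing Image2, unlike an inner (non-static) class.
    private static final class NativeImage {
        private int nativeImg = NativeLib.alloc();

        synchronized void dispose() {
            if (nativeImg != 0) { NativeLib.free(nativeImg); nativeImg = 0; }
        }

        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() { dispose(); }
    }
}
```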

An Alternative to Finalization

The preceding example still has one source of nondeterminism: The JVM does not guarantee the order in which it will call the finalizers of the objects in the finalization queue. And finalizers from all classes -- application, libraries, and so on -- are treated equally. So an object that is holding on to a lot of memory or a scarce native resource can get stuck in the finalization queue behind objects whose finalizers are making slow progress -- not necessarily maliciously but maybe due to sloppy programming.

To avoid this type of nondeterminism, you can use weak references, instead of finalization, as the postmortem notification mechanism. This way, you have total control over how to prioritize the reclamation of native resources instead of relying on the JVM to do so. Classes Image3 and NativeImage3 illustrate this technique.
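A minimal sketch of these two classes follows. NativeLib is again a hypothetical stand-in for native code, and a concurrent set plays the role of the static list (any data structure that keeps the reference objects reachable will do). The referenceQueue() accessor gives cleanup code access to the queue.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for native calls; it only counts live native images.
final class NativeLib {
    static final AtomicInteger live = new AtomicInteger();
    static int alloc() { live.incrementAndGet(); return 1; } // dummy nonzero handle
    static void free(int handle) { live.decrementAndGet(); }
}

class Image3 {
    // Package-private so that cleanup code can reach it.
    final NativeImage3 nativeImg = new NativeImage3(this);

    public void dispose() { nativeImg.dispose(); }
}

// The reference object itself holds the native state. Its referent is the
// Image3, and it must never hold a strong reference back to it.
final class NativeImage3 extends WeakReference<Image3> {
    private static final ReferenceQueue<Image3> refQueue = new ReferenceQueue<>();
    // Keeps every undisposed NativeImage3 reachable, so that it is enqueued
    // when its Image3 becomes unreachable (the "static list" in the text).
    private static final Set<NativeImage3> liveRefs = ConcurrentHashMap.newKeySet();

    private int nativeImg = NativeLib.alloc();

    NativeImage3(Image3 img) {
        super(img, refQueue);
        liveRefs.add(this);
    }

    static ReferenceQueue<Image3> referenceQueue() { return refQueue; }

    // Removing this object from liveRefs means that, after an explicit
    // dispose(), it is never enqueued. Successive calls are ignored.
    synchronized void dispose() {
        liveRefs.remove(this);
        if (nativeImg != 0) { NativeLib.free(nativeImg); nativeImg = 0; }
    }
}
```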

Image3 is identical to Image2. NativeImage3 is similar to NativeImage2, but its postmortem cleanup relies on weak references instead of finalization. NativeImage3 extends WeakReference, whose referent is the associated Image3 instance. Remember that when the referent of a reference object -- in this case a WeakReference -- becomes unreachable, the reference object is added to the reference queue associated with it. Embedding nativeImg into the reference object itself ensures that the JVM will enqueue exactly what is needed and nothing more. See Figure 7. Again, NativeImage3 should not be an inner class of Image3, for the reasons previously outlined.

Figure 7. Embedding nativeImg into the Reference Object Itself.

You can determine whether the garbage collector has reclaimed the referent of a reference object in two ways: explicitly, by calling the get() method on the reference object, or implicitly, by noticing that the reference object has been enqueued on the associated reference queue. This example uses only the latter.

Notice that reference objects are discovered by the garbage collector and added to their associated reference queues only if they are reachable themselves. Otherwise, they are simply reclaimed like any other unreachable object. This is why you add all NativeImage3 instances to the static list -- actually, any data structure will suffice -- to ensure that they remain reachable and are processed when their referents become unreachable. Naturally, you also have to remove them from the list when you dispose of them, which the dispose() method does.

When the dispose() method is explicitly called on an Image3 instance, no postmortem cleanup will subsequently take place on that instance because none is necessary. The dispose() method removes the NativeImage3 instance from the static list so that it is not reachable when its corresponding Image3 instance becomes unreachable. And, as previously stated, unreachable reference objects are not added to their corresponding reference queues.

In contrast, in all the previous examples that use finalization, the finalizable objects will always be considered for finalization when they become unreachable, whether you have explicitly disposed of their associated native resources or not.

The JVM will ensure that, when the garbage collector finds an Image3 instance to be unreachable, it will add the corresponding NativeImage3 instance to its associated reference queue. You must then dequeue it and dispose of its native resource. You can do this with a method such as drainRefQueueLoop(), executed, say, on a dedicated "cleanup" thread.
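A sketch of drainRefQueueLoop() follows, repeating compact versions of Image3 and NativeImage3 so that it compiles on its own. The holder class Cleanup, the startCleanupThread() helper, and the thread name are hypothetical. The loop exits when its thread is interrupted, which the text's infinite loop does not need to handle.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for native calls; it only counts live native images.
final class NativeLib {
    static final AtomicInteger live = new AtomicInteger();
    static int alloc() { live.incrementAndGet(); return 1; } // dummy nonzero handle
    static void free(int handle) { live.decrementAndGet(); }
}

class Image3 {
    final NativeImage3 nativeImg = new NativeImage3(this);
    public void dispose() { nativeImg.dispose(); }
}

final class NativeImage3 extends WeakReference<Image3> {
    private static final ReferenceQueue<Image3> refQueue = new ReferenceQueue<>();
    private static final Set<NativeImage3> liveRefs = ConcurrentHashMap.newKeySet();
    private int nativeImg = NativeLib.alloc();

    NativeImage3(Image3 img) { super(img, refQueue); liveRefs.add(this); }

    static ReferenceQueue<Image3> referenceQueue() { return refQueue; }

    synchronized void dispose() {
        liveRefs.remove(this);
        if (nativeImg != 0) { NativeLib.free(nativeImg); nativeImg = 0; }
    }
}

final class Cleanup {
    // Blocks in remove() until a NativeImage3 is enqueued, then releases
    // its native image; runs until the thread is interrupted.
    static void drainRefQueueLoop() {
        ReferenceQueue<Image3> refQueue = NativeImage3.referenceQueue();
        try {
            while (true) {
                NativeImage3 nativeImg = (NativeImage3) refQueue.remove();
                nativeImg.dispose();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve status and exit
        }
    }

    // Daemon thread, so the cleanup loop does not keep the JVM alive.
    static Thread startCleanupThread() {
        Thread t = new Thread(Cleanup::drainRefQueueLoop, "native-image-cleanup");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```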

There are cases, however, in which it might not be easy or desirable to introduce a new thread in an application. In such cases, an alternative is to drain the reference queue before every NativeImage3 allocation. You can do this by calling a method such as drainRefQueueBounded() from the NativeImage3 constructor, so that you dispose of some native images that have become available for cleanup just before you allocate new ones.
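A sketch of this variant follows. The value 8 for MAX_ITERATIONS is an assumption, since the text does not give one, and NativeLib is again a hypothetical stand-in for native code. The constructor drains before it allocates, so no cleanup thread is needed.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for native calls; it only counts live native images.
final class NativeLib {
    static final AtomicInteger live = new AtomicInteger();
    static int alloc() { live.incrementAndGet(); return 1; } // dummy nonzero handle
    static void free(int handle) { live.decrementAndGet(); }
}

class Image3 {
    final NativeImage3 nativeImg = new NativeImage3(this);
    public void dispose() { nativeImg.dispose(); }
}

final class NativeImage3 extends WeakReference<Image3> {
    static final int MAX_ITERATIONS = 8; // assumed value, not from the text
    private static final ReferenceQueue<Image3> refQueue = new ReferenceQueue<>();
    private static final Set<NativeImage3> liveRefs = ConcurrentHashMap.newKeySet();

    private int nativeImg;

    NativeImage3(Image3 img) {
        super(img, refQueue);
        drainRefQueueBounded(); // free some dead images before allocating
        nativeImg = NativeLib.alloc();
        liveRefs.add(this);
    }

    // Bounded work: poll() returns null at once if the queue is empty,
    // and at most MAX_ITERATIONS entries are processed per call.
    static void drainRefQueueBounded() {
        for (int iterations = 0; iterations < MAX_ITERATIONS; iterations++) {
            NativeImage3 ref = (NativeImage3) refQueue.poll();
            if (ref == null) {
                break;
            }
            ref.dispose();
        }
    }

    synchronized void dispose() {
        liveRefs.remove(this);
        if (nativeImg != 0) { NativeLib.free(nativeImg); nativeImg = 0; }
    }
}
```

Here the cost of cleanup is paid by whichever thread allocates the next image, which is the trade-off for not having a dedicated thread.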

The main difference between drainRefQueueLoop() and drainRefQueueBounded() is that the former is an infinite operation -- the remove() method blocks until a new entry is made available on the queue -- whereas the latter does a bounded amount of work. The poll() method returns null if there are no entries in the queue, and drainRefQueueBounded() loops at most MAX_ITERATIONS times, so it does not take an arbitrarily long time even if the reference queue is very long.

The previous examples are quite simplistic. Sophisticated developers can also ensure that different reference objects are associated with different reference queues, according to how they need to prioritize their disposal. And the drainRefQueueLoop() or the drainRefQueueBounded() methods can poll all the available reference queues and dequeue objects according to their required priorities.
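A hypothetical sketch of that idea with two priority levels follows. The names PrioritizedCleanup, CleanupRef, HIGH, LOW, and drainBounded are illustrative only. Each reference object carries a Runnable that releases its resource. That Runnable must not capture the referent, or the referent would stay strongly reachable and never be enqueued.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class PrioritizedCleanup {
    // Keeps undisposed reference objects reachable so they get enqueued.
    static final Set<CleanupRef> LIVE = ConcurrentHashMap.newKeySet();
    static final ReferenceQueue<Object> HIGH = new ReferenceQueue<>(); // e.g. scarce handles
    static final ReferenceQueue<Object> LOW = new ReferenceQueue<>();
    private static final List<ReferenceQueue<Object>> BY_PRIORITY = List.of(HIGH, LOW);

    // Reference object that knows how to release its own resource.
    static final class CleanupRef extends WeakReference<Object> {
        private final Runnable releaser; // must not capture the referent

        CleanupRef(Object referent, ReferenceQueue<Object> q, Runnable releaser) {
            super(referent, q);
            this.releaser = releaser;
            LIVE.add(this);
        }

        // Runs the releaser at most once.
        void dispose() {
            if (LIVE.remove(this)) {
                releaser.run();
            }
        }
    }

    // Disposes of at most maxIterations enqueued references, always emptying
    // higher-priority queues before looking at lower-priority ones.
    static int drainBounded(int maxIterations) {
        int done = 0;
        for (ReferenceQueue<Object> q : BY_PRIORITY) {
            Reference<?> r;
            while (done < maxIterations && (r = q.poll()) != null) {
                ((CleanupRef) r).dispose();
                done++;
            }
        }
        return done;
    }
}
```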

Although cleaning up resources in this way is clearly a more involved process than using finalization, it is also more powerful and more flexible, and it minimizes much of the nondeterminism associated with the use of finalization. It is also very similar to the way finalization is actually implemented within the JVM. This approach is recommended for projects that explicitly use a lot of native resources and require more control during cleanup. Using finalization with care will suffice for most other projects.
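Since Java 9, the platform library provides java.lang.ref.Cleaner, which packages this kind of mechanism (phantom references, a reference queue, and a cleanup thread) behind a small API. The class Image4 below is a hypothetical sketch, not part of the original design, with the native handle simulated by a counter. Cleaner.create(), Cleaner.register(), and Cleanable.clean() are the real API calls.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

class Image4 {
    // One shared Cleaner (and so one cleanup thread) for all instances.
    private static final Cleaner CLEANER = Cleaner.create();
    // Simulated native state: number of native images not yet released.
    static final AtomicInteger liveNativeImages = new AtomicInteger();

    // Cleanup action. It is a static nested class holding only the native
    // handle, never the Image4 itself; otherwise the Image4 would stay
    // reachable and the action would never run.
    private static final class NativeState implements Runnable {
        private final int nativeImg;

        NativeState(int nativeImg) { this.nativeImg = nativeImg; }

        @Override
        public void run() {
            // A real implementation would free nativeImg in native code here.
            liveNativeImages.decrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    Image4() {
        liveNativeImages.incrementAndGet();
        cleanable = CLEANER.register(this, new NativeState(1)); // 1: dummy handle
    }

    // Explicit disposal; clean() runs the action at most once, and the
    // Cleaner runs it automatically if the Image4 becomes unreachable first.
    public void dispose() { cleanable.clean(); }
}
```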

Use Finalization Only When You Must

This article briefly described how finalization is implemented in a JVM. It then gave examples of how finalizable objects can unnecessarily retain memory and outlined solutions to such problems. Finally, it described a method that uses weak references instead of finalization, which allows you to perform postmortem cleanup in a more flexible and predictable manner.

However, total reliance on the garbage collector to identify unreachable objects so that their associated native -- and potentially scarce -- resources can be reclaimed has a serious flaw: Memory is typically plentiful, and guarding a potentially scarce resource with a plentiful one is not a good strategy. So, when you use an object that you know has native resources associated with it -- for example, a GUI component, file, or socket -- by all means call its dispose() or equivalent method when you are finished using it. This will ensure the immediate reclamation of the native resources and decrease the probability of resource depletion. Use the approaches discussed in this article for postmortem cleanup only as a last resort, not as the main cleanup mechanism.
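One way to make that explicit call hard to forget, assuming Java 7 or later, is to expose the cleanup as AutoCloseable.close() and use try-with-resources. NativeBackedImage and Render below are hypothetical, and the native resource is simulated by a counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical image class whose native resource is released by close(),
// the AutoCloseable equivalent of dispose().
class NativeBackedImage implements AutoCloseable {
    static final AtomicInteger liveNativeImages = new AtomicInteger();
    private boolean closed;

    NativeBackedImage() { liveNativeImages.incrementAndGet(); }

    void draw() {
        if (closed) throw new IllegalStateException("image already disposed");
        // ... use the native image ...
    }

    // Releases the native resource; successive calls are ignored.
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            liveNativeImages.decrementAndGet();
        }
    }
}

final class Render {
    static void renderOnce() {
        // try-with-resources calls close() on exit, even if draw() throws,
        // so the native resource never waits for the garbage collector.
        try (NativeBackedImage img = new NativeBackedImage()) {
            img.draw();
        }
    }
}
```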

You should also use finalization only when it is absolutely necessary. Finalization is a nondeterministic -- and sometimes unpredictable -- process. The less you rely on it, the smaller the impact it will have on the JVM and your application. See also Joshua Bloch's book, Effective Java Programming Language Guide, chapter 2, item 6: Avoid finalizers.

Note: This article covered only two types of issues that arise when using finalization: memory- and resource-retention issues. The use of finalization and the Reference classes can also cause very subtle synchronization problems. See Hans-J. Boehm's 2005 JavaOne Conference slides, Finalization, Threads, and the Java Technology-Based Memory Model, for a good overview of these issues.

* As used on this web site, the terms "Java Virtual Machine" or "JVM" mean a virtual machine for the Java platform.

Tony Printezis is a member of the development team of the Java HotSpot Virtual Machine at Sun Microsystems. He spends most of his time working on dynamic memory management, concentrating on the scalability, responsiveness, parallelism, and visualization of garbage collectors.