
The Fatal Flaw of Finalizers and Phantoms

A Java finalize() method allows an object to take special cleanup actions before it is ultimately discarded by the garbage collector. It is fairly well known that this language feature has issues, and that safe usage is limited to a very narrow set of use-cases, the primary example being the “safety-net” pattern, where a finalizer is used in case the owner of the object forgets to call the explicit termination method. Unfortunately, it is lesser known that even these use-cases are brittle and, without special precautions, can also fail. And contrary to popular belief, PhantomReference, which is often cited as a good alternative to finalizers, suffers from the same fundamental problem.

Before getting into the details of this issue, it’s useful to review the general negatives of Java finalizers.


1. Finalization may be delayed indefinitely, or may never occur at all.

Garbage collectors can choose to hold off cleaning up unreachable objects until capacity becomes limited, or until certain execution characteristics, such as reduced load, indicate that an auspicious collection period has arrived. The drawback of this behavior is that a finalizer won’t run until its object is collected. Finalizer methods are also typically executed in a small thread pool, causing additional delays. The problem is compounded when finalizers are poorly written: a blocking action in one finalizer can significantly delay the others, since they all tend to share the same pool. And if the program exits prematurely, or the garbage collector has abundant resources, a finalizer may never execute at all. Therefore, a class must never be designed in such a way as to require that an action be taken by a finalizer.

2. Garbage collection of an object that contains a finalizer is significantly more expensive than without.

Objects with finalize() methods require more work for the garbage collector to track, and the execution requirements of the finalize() method force the collector to keep all memory associated with the object until execution has successfully completed. This means the collector typically must revisit the object, likely in a separate pass. Consequently, finalizers on classes with large instance counts and short-lived instances are likely to introduce major performance problems.

3. Concurrent execution of finalizers on objects in the same object graph can produce undesirable results.

This can lead to unintuitive behavior in data structures, where nodes often reference each other. Finalizers on these nodes may be called at the same time and in any order, which can lead to corrupt state if they access the state of their respective peers. Care must be taken either to enforce a specific order or to handle the resulting volatility.

4. Uncaught exceptions in finalizers are ignored and never reported.

Finalizers require proper exception handling and logging in the case of failure. Otherwise critical debugging data will be lost and the object will potentially remain in an unexpected state.
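A sketch of such defensive exception handling in a finalizer (the class and its close() logic are illustrative, not from the article):

```java
public class LoggingFinalizer {
    private boolean closed;

    public void close() {
        if (closed) throw new IllegalStateException("already closed");
        closed = true;
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            close();
        } catch (Throwable t) {
            // Exceptions thrown from finalize() are silently discarded by
            // the JVM, so catch and log them explicitly.
            System.err.println("finalizer cleanup failed: " + t);
        } finally {
            super.finalize();
        }
    }
}
```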

5. Finalizers can unintentionally resurrect objects in corrupt states.

If the “this” reference leaks out of a finalize() method, the object can still be visible but in a corrupt half-cleaned state, likely leading to bugs in other portions of the application.
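A hypothetical sketch of accidental resurrection (class and field names are illustrative): once finalize() stores "this" anywhere reachable, the half-cleaned object is live again.

```java
public class Zombie {
    // Any reachable location works; a static field is the classic leak.
    static Zombie resurrected;

    private byte[] buffer = new byte[16];

    @Override
    protected void finalize() {
        buffer = null;        // partial cleanup: the object is now half-cleaned
        resurrected = this;   // leaking "this" resurrects the corrupt object
    }
}
```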

In summary, one or more of these factors precludes most of the use-cases served by object-cleanup facilities in deterministic languages, such as destructors in C++, which are tied to well-defined scopes or explicit free operations. Java is instead designed around caller-oriented cleanup.

Proper Resource Cleanup

Proper resource cleanup in Java should be caller-oriented. This approach requires that resources provide a “close” method, and that callers use Java 7’s try-with-resources (or optionally try/finally). This provides immediate and deterministic cleanup. All code that ever uses a resource (something with a close() or other termination method) should be developed using this mechanism, even if the resource provides its own finalize() method. Doing so greatly reduces the potential for bugs and improves performance.

Example 1: a proper try-with-resources

try (FileInputStream file = new FileInputStream("file")) {
    // try-with-resources calls file.close() when this block exits
}

However, API designers often wish, or are required by contract, to add additional safeguards to a heavyweight resource in case the caller forgets to use the try-with-resources (or try/finally) mechanism. As an example of the latter, a JNDI Context is often associated with resources such as a network connection, and its Javadoc explicitly states that calling close() is optional, and that cleanup will still occur if it is not called.

The Flaw

To protect against such omissions, the only available options are to use a “safety-net finalizer” pattern or use PhantomReference. Unfortunately these options can and do fail if precautions are not employed.
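The flawed safety-net example itself is missing from this extract. A minimal reconstruction, consistent with the analysis later in the article (the stream field, the PART1/PART2 constants, and the work() method are all referenced there), might look like the following; treat it as a sketch rather than the author's exact code:

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class Example {
    private static final byte[] PART1 = "part1".getBytes();
    private static final byte[] PART2 = "part2".getBytes();

    private final FileOutputStream stream;

    public Example(String path) throws IOException {
        stream = new FileOutputStream(path);
    }

    public void work() throws IOException {
        // Once "this" is no longer referenced, the safety-net finalizer may
        // run concurrently with these calls and close the stream under us.
        stream.write(PART1);
        stream.write(PART2);
    }

    public void close() throws IOException {
        stream.close();
    }

    // "Safety-net": close the stream if the caller forgot to.
    @Override
    protected void finalize() throws Throwable {
        stream.close();
    }

    public static void main(String[] args) throws Exception {
        Example e = new Example("example.out");
        e.work();
        e.close();
    }
}
```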

At first glance there appears to be nothing wrong with our flawed safety-net example, and in many cases it will execute correctly. However, under the right conditions it will fail with an unexpected exception. (The astute reader will notice that FileOutputStream already has a built-in finalizer, making this example redundant; nonetheless, not all resources have one, and the example is intended as a concise illustration.)

Exception 1: Exception demonstrating premature finalization

Exception in thread "main" java.io.IOException: Stream Closed

at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:325)
at Example.work(Example.java:36)
at Example.main(Example.java:47)

This failure clearly shows that somehow the finalizer ran during the execution of the work method, but the question arises, how and why does this occur?

A detailed look into the mechanics of OpenJDK’s HotSpot will provide insight into how this happens.

How it Happens - Diving into JVM Internals

When looking into HotSpot’s behavior, it’s useful to understand a few key concepts. Under HotSpot, objects are considered live if they are reachable from another object on the heap, from a JNI handle, or from an in-use local reference in a method executing on a thread’s stack.

Determining whether a local reference is in use is complicated for HotSpot, thanks to its advanced just-in-time compiler. This compiler translates Java bytecode into optimized native instructions based on the target CPU architecture, and live environmental factors including active load patterns. Since the resulting code can vary greatly, an efficient and reliable mechanism that coordinates with the garbage collector is required.

Under HotSpot, this mechanism is known as a safe-point. When a thread hits a safe-point, the garbage collector has an opportunity to safely manipulate a thread’s state, and determine live local objects since execution of application code is briefly suspended. Only certain points in program execution are candidates for becoming a safe-point, the most notable of which is a method call.

During native code generation, the JIT compiler stores GC maps at every potential safe-point. A GC map contains the list of objects the JIT compiler considers live at that point in the code. The garbage collector can then use these maps to accurately determine which objects are locally reachable, without having to understand the native code behind them.

By inserting an arbitrary method call such as yield() at the start of the example work() method in the example above, the GC map at that point in time can be compared against the map from later method invocations to determine exactly when HotSpot decides the object reference is eligible for collection. Let’s do some more analysis to see what caused the above exception.

GC maps can be inspected in the printed assembly output from OpenJDK, by first installing a disassembler plugin and then using the appropriate VM options. Output will only occur when the method is selected for compilation, so additional parameters are required to force this to occur eagerly. The most aggressive optimizations are performed by the server compiler (C2), which makes it the ideal choice for this analysis. Note that this mode normally requires ten thousand invocations before methods are natively compiled. Setting the compiler threshold to one allows this to occur immediately.
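For reference, a typical invocation for this kind of inspection might look like the following; the exact flags can vary by JDK version, and PrintAssembly produces output only when the hsdis disassembler plugin is installed. Disabling tiered compilation restricts compilation to C2, and a compile threshold of one forces near-immediate compilation, as described above.

```
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \
     -XX:-TieredCompilation -XX:CompileThreshold=1 Example
```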

HotSpot follows the standard x86-64 calling conventions. When the work method is invoked, it places a copy of the “this” object reference in the register “rsi” before any code in the method executes. The first instruction simply copies the “this” reference that was placed into “rsi” into the working register “rbp”.

mov rbp,rsi

On the second instruction “call”, the dummy method Thread.yield is called, which, as mentioned before, is a potential safe-point candidate, and C2 has therefore included a GC map (labeled OopMap in the output). At this point in time, the contents of “rbp”, which is the current “this” reference, is marked as live, so the object can not be collected, and is therefore not finalizable at this point.

The third instruction, “mov”, copies the contents of the “stream” field into “rbp”, overwriting the previously stored reference to “this”. In HotSpot, Java objects are stored in a contiguous mapping of memory, so a field read is simply an offset added to the address of the object that contains the field. In this case, “stream” is located 16 bytes from the start of “this”.

The next set of instructions sets up and performs the write() invocation on the contents of the “stream” field which is now stored in the “rbp” register. The “test” and “je” instructions perform a null pointer check, throwing a NullPointerException if necessary. The first “mov” instruction copies the constant “PART1” (a byte array reference) into “rdx”, setting up the argument to the method call. The “rbp” register which currently contains the “stream” field is copied into “rsi”, which follows the calling convention for the subsequent write() call.

Finally the write() call is made, and since it is a potential safe-point, another GC map is included. This map indicates that only “rbp” is reachable. Since “rbp” was overwritten with “stream”, it no longer contains “this”, and so “this” is no longer considered reachable at this point of execution. The previous diagram depicts the state of “rbp” throughout the work() method code.

Since the work method’s “this” reference was the sole remaining reference to the object, the finalizer can now execute concurrently with this write() invocation, leading to the failing stack trace mentioned earlier. Likewise, any phantom references associated with this object can be passed to the respective cleanup threads, leading to the same premature close.

In summary, this analysis shows us that an object can be collected before live method calls on that object have completed.

Why it Happens

Unfortunately, this behavior is explicitly allowed by the Java Language Specification (12.6.1):

“Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.”

And more ominously:

“Transformations of this sort may result in invocations of the finalize method occurring earlier than might be otherwise expected.”

While unintuitive, the general notion of eager cleanup is beneficial to performance. As an example, it’s wasteful to hold an object around which will no longer be used by one of its methods engaging in some form of long-running activity. However, the interaction of this behavior with the finalizer and phantom reference features is counter-productive and error-prone.

Mitigation Strategies

Fortunately there are various techniques that can be used to prevent this errant behavior. However, keep in mind that these techniques are delicate, so exercise care when using and maintaining the code.

The Synchronize-It-All Strategy

This strategy is based on a special rule in the JLS (12.6.1):

“If an object's finalizer can result in synchronization on that object, then that object must be alive and considered reachable whenever a lock is held on it.”

In other words, if the finalizer is synchronized, it is guaranteed to not be invoked until after all other pending synchronized method calls have completed. It is the simplest approach, as all it requires is adding a synchronized keyword to the finalizer and all methods that can potentially conflict with it (typically all of them).
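A minimal sketch of the strategy, using an illustrative resource class (names are assumptions, not the article's original example):

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncResource {
    private final FileOutputStream stream;

    public SyncResource(String path) throws IOException {
        stream = new FileOutputStream(path);
    }

    // Every method that touches the stream is synchronized ...
    public synchronized void work(byte[] data) throws IOException {
        stream.write(data);
    }

    public synchronized void close() throws IOException {
        stream.close();
    }

    // ... including the finalizer. Per JLS 12.6.1, a finalizer that
    // synchronizes on "this" cannot proceed while another thread holds the
    // lock, so "this" stays reachable for the duration of work().
    @Override
    protected synchronized void finalize() throws Throwable {
        stream.close();
    }
}
```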

The most obvious drawback of this approach is that it serializes all access to the object, which precludes any class that must support concurrent method access. Another drawback is that the overhead of synchronization carries a significant performance penalty. In scenarios where the instance is pinned to a single thread for an extended timeframe, the JVM can effectively “check out” the lock to that thread, an optimization known as biased locking, which eliminates most of the cost. However, even when this optimization occurs, the Java Memory Model requirements mean a memory fence will likely still be required to synchronize state between CPU cores, which typically introduces unnecessary latency.

The Synchronize-With-RWLock Strategy

For objects that require concurrent access, the Synchronize-It-All strategy can be modified to support parallel work() method execution, using a ReadWriteLock and a separate cleanup thread. The work() method acquires a read lock under a brief synchronized block to ensure the finalizer does not run first, which in turn guarantees that the read lock is always acquired before the write lock. The separate cleanup thread is necessary because the cleanup task, once created, blocks on the write lock, and stalling the JVM’s finalizer thread(s) should be avoided for the reasons listed earlier.
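One possible shape for this strategy, sketched from the description above (the names and structure are this author's assumption, not the article's original example):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwResource {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final FileOutputStream stream;

    public RwResource(String path) throws IOException {
        stream = new FileOutputStream(path);
    }

    public void work(byte[] data) throws IOException {
        // Acquire the read lock under a brief synchronized block. Because the
        // finalizer also synchronizes on "this", JLS 12.6.1 keeps the object
        // reachable here, so the read lock is always taken before the cleanup
        // thread can take the write lock.
        synchronized (this) {
            lock.readLock().lock();
        }
        try {
            stream.write(data);   // safe: cleanup blocks on the write lock
        } finally {
            lock.readLock().unlock();
        }
    }

    public void close() throws IOException {
        stream.close();
    }

    @Override
    protected synchronized void finalize() {
        // Hand cleanup to a separate thread so the JVM's finalizer thread
        // is never blocked waiting on the write lock.
        new Thread(() -> {
            lock.writeLock().lock();
            try {
                stream.close();
            } catch (IOException ignored) {
            } finally {
                lock.writeLock().unlock();
            }
        }).start();
    }
}
```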

The Volatile Strategy

To avoid locking entirely, a first attempt might simply read a field of “this” at the end of the work() method to keep the object reachable. This isn’t guaranteed to work, because the optimizer can easily discard an unused read. A “clever” attempt to correct this would be to capitalize on another JLS rule, which states that all field writes must be visible to a finalizer.

Unfortunately, this attempt can also fail, because the optimizer is free to reorder instructions as it wishes, or even to eliminate the write instruction entirely, since its result is never used. The code in Example 10 is equivalent, since the result of the method is still the same.

Fortunately, it is possible to prevent an optimizer from reordering an instruction, by taking advantage of the program order rule in the Java Memory Model. The JMM requires that all memory effects before a write to a volatile field be visible to all other executing threads when the volatile is read, or any other event which establishes a “happens-before” relationship.

By changing the counter to be volatile, HotSpot won’t reorder the instruction above the write calls. However, there is still a theoretical possibility that some future optimizer could determine that the memory effects of the volatile write are unnecessary; since the field is never read, the write could still potentially be eliminated. This can be safeguarded against by publishing the value to a public static field, since the contents of a public static field must remain visible to not-yet-loaded code. Combining these approaches leads to a working volatile-based strategy, which allows concurrent access without any form of locking.
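Putting these pieces together, a sketch of the volatile-based strategy (the guard and sink field names are hypothetical):

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class VolatileResource {
    // Published so no future optimizer can prove the volatile write dead:
    // the contents of a public static field must remain visible.
    public static int sink;

    private volatile int guard;
    private final FileOutputStream stream;

    public VolatileResource(String path) throws IOException {
        stream = new FileOutputStream(path);
    }

    public void work(byte[] data) throws IOException {
        stream.write(data);
        // Volatile write to a field of "this". The JMM program-order rule
        // forbids moving it above the write() call, so "this" remains
        // reachable (and the finalizer cannot run) until the work is done.
        guard = 1;
        sink = guard;
    }

    public void close() throws IOException {
        stream.close();
    }

    @Override
    protected void finalize() throws Throwable {
        stream.close();
    }
}
```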

While the write itself is unavoidable, a memory fence still occurs on every write, and ideally that would be avoided.

The Volatile + Lazy Write Strategy

A small modification to the volatile strategy can reduce the cost of the write, yet still ensure the desired ordering effects are in place. Using an AtomicIntegerFieldUpdater allows a class to perform a lazy write. A lazy write uses a cheaper fence called a store-store, which only ensures write ordering. x86 and SPARC are naturally ordered, so a lazy write is effectively free on these platforms. On some platforms, such as ARM, there is a slight cost, but it is still much less than a normal volatile write.
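A sketch of the lazy-write variant using AtomicIntegerFieldUpdater.lazySet (field names again hypothetical):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class LazyResource {
    private static final AtomicIntegerFieldUpdater<LazyResource> GUARD =
            AtomicIntegerFieldUpdater.newUpdater(LazyResource.class, "guard");

    public static int sink;            // defeats dead-store elimination
    private volatile int guard;        // the updater requires a volatile int field
    private final FileOutputStream stream;

    public LazyResource(String path) throws IOException {
        stream = new FileOutputStream(path);
    }

    public void work(byte[] data) throws IOException {
        stream.write(data);
        // lazySet issues only a store-store fence: write ordering is
        // preserved, but the full volatile-write fence is avoided.
        GUARD.lazySet(this, 1);
        sink = guard;
    }

    public void close() throws IOException {
        stream.close();
    }

    @Override
    protected void finalize() throws Throwable {
        stream.close();
    }
}
```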

Native Method Immunity

JNI invocations keep a reference to the host object alive, so no special strategies are required. However, it is common for native methods to be mixed with Java methods, and so a class using native code might still need to take the appropriate precautions for all Java methods.

The Need for Improvement

While effective, these strategies are all cumbersome, brittle, and far more expensive than a clean language construct. Such a construct already exists in the .NET platform. C# applications have the ability to use a simple GC.KeepAlive() call, which tells the JIT compiler to keep the passed object around until that point in time.

If the JDK were to implement a similar construct, the work() method would look like the following:
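The example is missing from this extract; a reconstruction follows. Note that System.keepAlive() is hypothetical and does not exist in the JDK (Java 9 later added java.lang.ref.Reference.reachabilityFence for exactly this purpose):

```java
public void work() throws IOException {
    stream.write(PART1);
    stream.write(PART2);
    System.keepAlive(this); // hypothetical: keeps "this" reachable until here
}
```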

There is no unnecessary overhead in this approach. The code is very clean, and its purpose is clear to future maintainers. Any doubt simply requires examining the Javadoc of the keepAlive() method.

Conclusion

The finalizer and phantom reference features of Java are error-prone and should generally be avoided. However, there are legitimate use-cases for these features, and when used, one of the strategies in this article should be employed to prevent the problem of premature collection.

All usage of resources, whether finalizable or not, should always utilize either try-with-resources, or try/finally. Even objects with broken finalizers will behave correctly if all callers take this important step.

Hopefully future versions of the JVM will implement a keepAlive() construct, which will greatly improve the developer experience and reduce the potential for bugs when such use-cases are necessary.

Example Code

The code for each of these strategies, as well as the flawed example, are available on GitHub.

About the Author

Jason Greene is a Platform Architect at Red Hat, and currently leads the open-source WildFly application server project. He is also a member of the JCP, and represents Red Hat on the Java EE specification. During his tenure at JBoss he has worked in many areas including the application server, clustering, web services, AOP, and security. His interests include concurrency, distributed computing, network protocols, and programming language design. Follow Jason at twitter.com/jtgreene.

Community comments

Comment


Thanks for the nice article. I never thought that an object could be collected during instance method execution. But it seems to me there is an issue in the code fragment in 'The Need for Improvement'.

System.keepAlive(stream) should save the stream from the garbage collector, but it can't save the stream from being closed by the finalize() method of the class containing work(). So System.keepAlive(stream) should be replaced with System.keepAlive(this). In that case finalize() won't be called and the stream will be kept open.

And another possible path to improvement: the JDK could introduce an annotation like

@Target(ElementType.METHOD)
public @interface KeepThis {}

The annotation could be applied to a class instance method, instructing the JVM to keep "this" alive during the whole annotated method's execution. With @KeepThis, the work() method could be rewritten accordingly.

System.keepAlive() is more general than a @KeepThis annotation: it can be placed anywhere in the code, and any reference can be passed as the argument. The annotation is narrower, but to me it looks more expressive.

Re: Comment


Thanks for catching the mistake in the example. I definitely intended for it to apply to "this". It's been corrected now.

I agree that a method annotation is another good solution. It does have a minor drawback though; its scope is possibly broader than the minimal lifespan of "this" required, which would keep the object around a bit longer than necessary. Granted, a developer could always just structure the method body to best align the scope (splitting into multiple methods) if it was important.

Re: I cannot reproduce the problem :(


Hello Andrei,

There are two factors that are needed for the failure to occur:

1) The code must be optimized, and hotspot delays optimizing calls for quite some time (10k iterations by default with the server compiler)

2) GC has to decide to run in the middle of work() method's execution

If you look at the GitHub examples link in the article, there is a BrokenFinalizer example which is easier to reproduce, but it does require JVM flags that encourage the optimizer to run immediately (see the javadoc for BrokenFinalizer). The article also shows one of the approaches, setting the compile threshold to 1, but the code only compiles during the first invocation, so you need two calls for that approach. The other approach is to use -Xcomp with tiered compilation.

If you want to see the exact exception in the code example in the article, you can encourage it by sticking a System.gc() and a Thread.sleep(1000); in between the two write calls. Then run with: