Sunday, January 31, 2010

A little while ago, I got asked a question about when an object is allowed to be collected. It turns out that objects can be collected sooner than you think. In this entry, I'll talk a little about that.

When we were formulating the memory model, this question came up with finalizers. Finalizers run in separate threads (usually a dedicated finalizer thread). As a result, we had to worry about memory model effects. The basic question we had to answer was: which writes are the finalizers guaranteed to see? (If that doesn't sound like an interesting question, you should either go read my blog entry on volatiles or admit to yourself that this is not a blog in which you have much interest.)

Let's start with a mini-puzzler. A brief digression: I'm calling it a mini-puzzler because in general, for puzzlers, if you actually run them, you will get weird behavior. In this case, you probably won't see the weird behavior. But the weird behavior is perfectly legal Java behavior. That's the problem with multithreading and the memory model — you really never know what the results of a program will be from doing something as distasteful as actually running it.
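The program itself did not survive in this copy of the entry. What follows is a hypothetical reconstruction, not the original code, chosen only to match the outcomes discussed below: the constructor writes i, a method setJ() writes j, and main then writes a third value k without going through fo again. Making k a static field, and the class name FinalizerPuzzle, are assumptions made for this sketch.

```java
// Hypothetical reconstruction of the mini-puzzler. The class name and the
// choice of a static field for k are assumptions, not the original code.
class FinalizerPuzzle {
    int i;         // written in the constructor
    int j;         // written by setJ()
    static int k;  // written by main after the last use of fo

    FinalizerPuzzle() {
        i = 1;
    }

    void setJ() {
        j = 2;
    }

    // No synchronization anywhere: the finalizer thread races with main.
    @Override
    protected void finalize() {
        System.out.println(i + " " + j + " " + k);
    }

    public static void main(String[] args) {
        FinalizerPuzzle fo = new FinalizerPuzzle();
        fo.setJ();
        k = 3;  // fo is still in scope here, but it is never used again
    }
}
```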

Let's say that by some miracle the finalizer actually runs (Rule 1 of why you don't use finalizers: they are not guaranteed to run in a timely fashion, or, in fact, at all). What do you think the program is guaranteed to print?

Those of you who are used to reading these entries will realize immediately that unless you already know the answer, you have no idea. Let's try to reason it out, then.

First, we notice that the object reference fo is live on the stack when all three variables are set. So, the object shouldn't get garbage collected, right? The finalizer should print out 1 2 3, yes?

The VM can do a few things to effect this (yes, this is the correct spelling of effect). First, it can notice that the object is never used after the call to setJ, and null out the reference to fo immediately after that. It's reasonably clear that if the finalizer ran immediately after that, you would see 1 2 0.

That's not the end of it, though. The VM can notice that:

This thread isn't using the value written by that write to j, and

There is no evidence that synchronization will make this write visible to another thread.

The VM can then decide that the write to j is redundant and eliminate that write altogether. Whoosh! You get 1 0 0.

At this point, you are probably expecting me to say that you can also get 0 0 0, because the programmer isn't actually using the write to i, either. As a matter of fact, I'm not going to say that. It turns out that the end of an object's constructor happens-before the execution of its finalize method. In practice, what this means is that any writes that occur in the constructor must be finished and visible to any reads of the same variable in the finalizer, just as if those variables were volatile. (This paragraph originally read incorrectly.)

The immediate question is, how does the programmer avoid this insanity? The answer is: don't use finalization!

Okay, that's not enough of an answer. Sometimes you need to use finalization. There's a hint several paragraphs up: the finalizer runs in a separate thread. It turns out that what you need to do is exactly what you would do to make the code thread-safe. Let's do that, and look at the code again.
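Since the original code is not reproduced here, this is a hypothetical sketch of that fix (class and field names are assumptions): the constructor writes i, a synchronized setJ() writes j, main writes a static k inside a synchronized (fo) block, and finalize() is synchronized as well. The JLS requires that if an object's finalizer can synchronize on that object, the object be treated as reachable whenever a lock is held on it, so each release of fo's lock in main happens-before the finalizer's acquisition of the same lock. If the finalizer ever runs, it should therefore see 1 2 3.

```java
// Hypothetical thread-safe variant (names are assumptions, not the original).
// The writers in main and the reader in finalize() use the same lock: 'fo'.
class SafeFinalizer {
    int i;
    int j;
    static int k;

    SafeFinalizer() {
        // The end of the constructor happens-before finalize(), so this
        // write needs no lock.
        i = 1;
    }

    synchronized void setJ() {
        j = 2;
    }

    // Synchronized on 'this', pairing with the lock releases in main.
    @Override
    protected synchronized void finalize() {
        System.out.println(i + " " + j + " " + k);
    }

    public static void main(String[] args) {
        SafeFinalizer fo = new SafeFinalizer();
        fo.setJ();
        synchronized (fo) {
            // Holding fo's lock keeps fo reachable, and releasing it
            // happens-before the finalizer acquires it.
            k = 3;
        }
    }
}
```

The design choice is the ordinary one for shared state: every write that the finalizer must see is published through the lock the finalizer itself acquires.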

18 comments:

@fmeulenaars: You could certainly have implemented it that way. I wanted to avoid a discussion of lock acquisition inside constructors.

Also, people still use finalizers when they open native resources; the finalizer will do emergency cleanup of, say, file descriptors. Hopefully, most people know well enough to avoid them. This post is just leading up to a discussion of how this stuff works with SoftReferences.
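A hypothetical sketch of that "emergency cleanup" pattern, with the thread-safety point from the post applied. The class NativeResource, its long handle, and the releaseNative() placeholder are all made up for illustration; a real class would call whatever native API actually frees the resource. close() is the normal path, and finalize() only calls it as a safety net. Because finalize() runs on the finalizer thread, the closed flag is guarded by the object's lock.

```java
// Hypothetical safety-net finalizer for a native resource. The handle and
// releaseNative() are placeholders, not a real native API.
class NativeResource implements AutoCloseable {
    private final long handle;  // stand-in for a native handle or descriptor
    private boolean closed;     // guarded by 'this'

    NativeResource(long handle) {
        this.handle = handle;
    }

    // Normal, deterministic cleanup path; safe to call more than once.
    @Override
    public synchronized void close() {
        if (!closed) {
            releaseNative(handle);
            closed = true;
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // Emergency cleanup if the caller forgot close(). It goes through the
    // synchronized close(), because it runs on a different thread.
    @Override
    protected void finalize() {
        close();
    }

    private static void releaseNative(long handle) {
        // Placeholder: a real implementation would free the resource here.
    }
}
```

Callers would normally write try (NativeResource r = new NativeResource(fd)) { ... } so that close() runs deterministically; the finalizer only matters when someone forgets, and even then it is not guaranteed to run.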

"At this point, you are probably expecting me to say that you can also get 0 0 0, because the programmer isn't actually using the write to i, either. As a matter of fact, I'm not going to say that."

Actually, that is exactly what I expected you to say. And not because of the unused write to "i", but because finalizers are run in a separate thread, and since field "i" is not volatile, it is possible for other threads to see its old value (which is 0). Am I wrong, and is it impossible to get 0 0 0 as output?

@Dmitry - everything that happens in the constructor must happen-before the finalizer. That means that every write that occurs in the constructor (including the write to i) must be ordered before and visible to any reads of i that occur in the finalizer.

But you did not say anything (yet) that makes me *not* want to use finalization (w/o the locking). As you point out, the example did not use the object after populating it, so the developer got what they deserved, right? They could have used volatile if they were designing their app for the finalizer.

@ej - The point is less that you shouldn't use finalizers at all, and more that you have to take thread-safety into account when writing them. Many people think of multithreaded programming as hard; such people should think twice before writing finalizers.

The best reason not to write finalizers (in my opinion) is that they are not guaranteed to be run.

@franci - the VM can eliminate the write to j, but only if it can determine that the value of that write is not used by this thread or made visible to another thread via synchronization. I've tried to clean up the wording to make that clearer.

"In practice, what this means is that any writes that occur in the constructor must be finished and visible to any reads of the same variable in the finalizer, just as if those variables were volatile." To achieve this effect, the JVM might inject code that writes to a volatile variable as the last statement in the constructor. The finalizer process would then have to make sure to read this injected variable before the finalizer is run on this object's instance. And this injection should only be needed if the object has a finalize() method, right?

I tried to use weak references for event handling. The advantage is that there is no need to remove event listeners if they are weakly-referenced. However, I hit this very issue: if the garbage collector runs just before the listener object makes it into the main memory, the weak reference to that object is cleared. Now I am screwed. It seemed like such a good idea.

@Jerome - the presence of a synchronized block is not enough (by itself) to guarantee a write's visibility. Consider the following synchronized block:

synchronized (new Object()) { x = 1; }

The system can determine that the lock on the new Object() can never be acquired by another thread, and remove the lock acquisition and release entirely.

The point is that you need both ends of the happens-before relationship to guarantee visibility - the reader needs to use synchronization, and the writer needs to use synchronization. I've written a number of other blog entries on this subject.
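A minimal hypothetical illustration of "both ends" (the class SharedValue and its methods are made up): the writer and the reader synchronize on the same lock object, unlike the block on a fresh new Object(), whose lock no other thread can ever acquire. The writer's unlock happens-before any later lock by the reader, so the spinning reader is guaranteed to eventually see the write.

```java
// Hypothetical example: writer and reader synchronize on the SAME lock,
// so the writer's unlock happens-before the reader's later lock.
class SharedValue {
    private final Object lock = new Object();  // shared by both threads
    private int x;                             // guarded by lock

    void write(int value) {
        synchronized (lock) {  // writer end of the happens-before edge
            x = value;
        }
    }

    int read() {
        synchronized (lock) {  // reader end of the happens-before edge
            return x;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedValue v = new SharedValue();
        Thread reader = new Thread(() -> {
            // If read() did not lock (or locked a different object),
            // nothing would guarantee this loop ever observes 1.
            while (v.read() != 1) {
                Thread.yield();
            }
            System.out.println("saw 1");
        });
        reader.start();
        v.write(1);
        reader.join();
    }
}
```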

About Me

I'm a programming languages and software engineering guy who works at Google. Nothing I say represents the views of my employer, of course.
I was one of the authors on JSR-133, the revision of the JLS that dealt with threads and synchronization. I often use this blog to address frequently asked questions about threading.