It seems to me that the importance of garbage collection in Java (and other garbage collected languages) is out of proportion to how little it is explained. While a search for JavaFX examples turns up millions of articles, a search for the “Parallel Compacting Collector” mentioned in Sun’s memory management whitepaper turns up only a couple.

Since I wanted to understand better how garbage collection is currently implemented in the Java VM and to see what’s ahead, I’ve scoured the internet a bit and found many interesting articles and slides. As I searched, the old saying proved true: there is a lot of stuff you don’t know, but there’s even more stuff you don’t even know you don’t know. This post is my summary of what I’ve found, but this is just the way I understand the material explained in publications and whitepapers; if I’m wrong somewhere, please do correct me.

The basics of garbage collection

The garbage collector first performs a task called marking. The garbage collector traverses the application graph, starting with the root objects; those are the objects referenced by all active stack frames and by all the static variables loaded into the system. Each object the garbage collector meets is marked as being used, and will not be deleted in the sweeping stage.
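The marking step described above can be sketched as a plain graph traversal. This is only a toy model and not how the Java VM implements it: `Node`, `MarkSketch` and `mark` are hypothetical names, and the "heap" is just a handful of Java objects pointing at each other.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a heap object: a name plus outgoing references.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class MarkSketch {
    // Returns every node reachable from the roots; this is the "marked" set.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {        // first visit: mark it, then follow its references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");               // nothing points at c
        a.refs.add(b);
        b.refs.add(a);                        // a cycle; the marked set stops us looping forever
        Set<Node> live = mark(Arrays.asList(a));
        System.out.println(live.contains(b)); // prints true
        System.out.println(live.contains(c)); // prints false
    }
}
```

Anything left out of the marked set, like `c` here, is what the sweeping stage is free to reclaim.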

The sweeping stage is where the deletion of objects takes place. There are many ways to delete an object: the traditional C way was to mark the space as free, and let the allocator methods use complex data structures to search the memory for the required free space. This was later improved by providing a defragmenting system which compacted memory by moving objects closer to each other, removing any fragments of free space and therefore allowing allocation to be much faster.

For the last trick to be possible a new idea was introduced in garbage collected languages: even though objects are represented by references, much like in C, they don’t really reference their real memory location. Instead, they refer to a location in a dictionary which keeps track of where the object is at any moment.

Fortunately for us – but unfortunately for these garbage collection algorithms – our servers and personal computers got faster (and multiple) processors and bigger memory capacities. Compacting memory areas this large often was very taxing on the application, especially considering that when doing that, the whole application had to freeze due to the changes in the virtual memory map. Fortunately for us though, some smart people improved those algorithms in three ways: concurrency, parallelization and generational collection.

Generational garbage collection

In any application, objects can be categorized according to their lifetime. Some objects are short-lived, such as most local variables, and some are long-lived, such as the backbone of the application. Generational garbage collection grew out of the observation that over an application’s lifetime most instantiated objects are short-lived, and that there are few references from long-lived objects to short-lived objects.

In order to take advantage of this information, the memory space is divided into two sections: young generation and old generation. In Java, the long-lived objects are further divided into permanent objects and old generation objects. Permanent objects are usually objects the Java VM itself created for caching, such as code, reflection information etc. Old generation objects are objects that survived a few collections in the young generation area.
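If you want to see how your own JVM divides its memory, the standard `java.lang.management` API can list the memory pools. The sketch below makes no assumption about what it will print: pool names depend on the JVM version and the selected collector, and on Java 8 and later the permanent generation has been replaced by Metaspace. The class name `MemoryPools` is just a hypothetical wrapper.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    // Collects "name (type)" for every memory pool the running JVM reports.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```

On a generational collector you would typically expect to find pools corresponding to Eden, the survivor space and the old generation among the heap entries.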

Since we know that objects in the young generation memory space become garbage early, we collect that area frequently while leaving the old generation’s memory space to be collected at longer intervals. The young generation memory space is much smaller, so its collections take less time.

Knowing that objects die quickly in this area gives us an additional advantage: we can skip the compacting step and do something else called copying. This means that instead of seeking free areas (by seeking the areas marked as unused after the marking step), we copy the live objects from one young generation area to another young generation area. The originating area is called the From area, and the target area is called the To area, and after the copying is completed the roles switch: the From becomes the To, and the To becomes the From.
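Here is a minimal sketch of the From/To copying idea, under heavy simplification: the two areas are plain lists, the "objects" are strings, and the set of live objects is handed in instead of being found by marking. `SurvivorSpaces` and `collect` are hypothetical names, not anything from the Java VM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SurvivorSpaces {
    List<String> from = new ArrayList<>();
    List<String> to = new ArrayList<>();

    // Copies the live objects from From to To, then swaps the roles of the two areas.
    void collect(Set<String> live) {
        for (String object : from) {
            if (live.contains(object)) {
                to.add(object);           // survivors end up packed together in To
            }
        }
        from.clear();                     // whatever was left behind is garbage, dropped in one go
        List<String> previousFrom = from; // role switch: To becomes From, From becomes To
        from = to;
        to = previousFrom;
    }

    public static void main(String[] args) {
        SurvivorSpaces spaces = new SurvivorSpaces();
        spaces.from.addAll(Arrays.asList("a", "b", "c"));
        spaces.collect(new HashSet<>(Arrays.asList("a", "c")));
        System.out.println(spaces.from);  // prints [a, c]
        System.out.println(spaces.to);    // prints []
    }
}
```

Note that the dead object `b` is never looked at individually; the cost of the collection depends on the number of survivors, not on the amount of garbage.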

In addition, the Java VM splits the young generation into three areas, by adding an area called Eden, which is where all new objects are allocated. To my understanding this is done to make allocation faster by always having the allocator point to the beginning of Eden after a collection.
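To illustrate why allocating from an area that is emptied on every collection can be so cheap, here is a sketch of what is often called bump-the-pointer allocation. It is an assumption-laden toy: `EdenSketch`, `allocate` and `resetAfterCollection` are hypothetical names, a byte array stands in for Eden, and "addresses" are simply offsets into that array.

```java
class EdenSketch {
    private final byte[] space;
    private int top = 0;                  // offset of the next free byte

    EdenSketch(int capacity) {
        space = new byte[capacity];
    }

    // Returns the offset of the newly allocated block, or -1 if Eden is full.
    int allocate(int size) {
        if (size < 0 || size > space.length - top) {
            return -1;                    // in a real VM this would trigger a young collection
        }
        int address = top;
        top += size;                      // allocation is just moving the pointer forward
        return address;
    }

    // After survivors have been copied out, the whole area is free again.
    void resetAfterCollection() {
        top = 0;
    }

    public static void main(String[] args) {
        EdenSketch eden = new EdenSketch(32);
        System.out.println(eden.allocate(16));  // prints 0
        System.out.println(eden.allocate(8));   // prints 16
        System.out.println(eden.allocate(100)); // prints -1
        eden.resetAfterCollection();
        System.out.println(eden.allocate(4));   // prints 0
    }
}
```

There is no free list to search: as long as the area has room, an allocation is one comparison and one addition.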

By using the copying method, garbage collection achieves defragmentation without searching for dead memory blocks. However, this method is most efficient in areas where most objects are garbage, so it is not a good approach to take on the old generation memory area. Indeed, that area is still collected using the compacting algorithm, but now, thanks to the separation of young and old generations, it is done at much longer intervals.

Next up

I didn’t expect the amount of information I’ve found. I especially didn’t expect the amount of information I’ve found regarding how the garbage collector makes use of the multiprocessor platforms available today in almost all new computers. I’m not a big believer in extremely long, 3,000-word posts, so all the information regarding parallel and concurrent garbage collectors can be found in the next post, allowing me to upload this one now and have a couple of days to edit the next one before sending it online.

Rick: When the garbage collector does the collection, it moves objects around. From the young generation’s From to the To area, from the young generation area to the old area, and within the old generation area when compacting.

I got this information from the memory management whitepaper Sun published (now appears as a link at the top of the post) and from other resources such as JavaOne slides (publicly available for non-SDN members as well, even though SDN membership is free).

Let me correct my drawing a bit, though: in the generation collection, I made it look as if the marking marked the objects as unused. It’s obviously not true if you read the text, as it marks the used objects, and discards the rest.

While knowledge of the garbage collector may not help you write better code, it does help you understand how things work underneath.

Recently, my work ran into some issues with our system. We had not been very careful with our memory management over the many years that the system has been developed (and maintained & added to…) and so we were actually running out of memory even with garbage collection.

Understanding how the garbage collection works will help me to keep in mind that even though Java will clean up after me, it is VERY good practice to make sure to clean up after yourself as well. If you are done with an object set it to null, don’t just let it sit there, because eventually, you may very well run into issues where you eat up all of your memory and the garbage collector just can’t handle it.

One more point to bring up about the garbage collector, which you mentioned briefly, is that after an object has been moved back and forth in the young generation a few times, it gets moved to the old generation and it will stick around there much longer. So, this is why it is important for you to clean up after yourself. You don’t want an object sticking around long enough to get bounced back and forth and then end up in the old generation area, since it will be even longer before it gets cleaned up. Just mark it as null when you are done with it and it will get cleaned out quickly.

LadyCoder: I agree with you regarding being careful with your object management even with a garbage collector. The garbage collector collects garbage, and if an object is still in use (even if not intentionally) it will create memory leaks even in a garbage collected language like Java.

However, I don’t tend to agree on the “null” setting. There are very specific cases where you actually need to do that, and in most cases the scope system takes care of the problem for you. More than that, the Java compiler creates “mini-scopes” where it can to optimize your usage of objects even further, and sometimes setting an object to null confuses those optimizations and makes them produce less optimal bytecode, prolonging the life of an object instead of making it shorter.

I might make a post about known anti-patterns that a lot of people use with garbage collected languages. Setting to null is not extremely bad, but it might result in less than optimal performance too.

Daniel: Remember that for short-lived objects, the garbage collector does not look for the dead objects, it looks for the live objects and just abandons the ones unmarked. Therefore, knowing which objects are dead is not going to help it.

In the case of the old generation the story is different though, but even here, since the entire graph of objects needs to be traversed in order to mark all the live objects, having another counter would not save any time, as far as I understand the process.

Daniel: you’re right to be curious. In fact, this is one of the biggest pitfalls of applications working with a non-concurrent garbage collector.

Since the garbage collectors “stop the world” when they perform their act, a too-long finalize method can really make everything go awry in terms of performance.

In the case of the concurrent collector, to the best of my understanding, if the garbage collector’s collection time is too long, the application could receive an OutOfMemoryError since the collector just can’t keep up.

WeakReferences are easy though, and don’t affect the young generation much – they are just not followed when the collector traverses the object graph. In fact, you could say that by using weak references, you’re helping the garbage collector work faster!
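For anyone who has not used them, here is a small sketch of `java.lang.ref.WeakReference`. The class and method names (`WeakRefSketch`, `stillReachable`) are hypothetical. Note that `System.gc()` is only a hint to the JVM, so the last line may or may not print `null` on a given run.

```java
import java.lang.ref.WeakReference;

class WeakRefSketch {
    // True while the referent has not been cleared by the collector.
    static boolean stillReachable(WeakReference<?> reference) {
        return reference.get() != null;
    }

    public static void main(String[] args) {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);

        // While 'payload' is strongly referenced, the weak reference still resolves.
        System.out.println(weak.get() == payload); // prints true

        payload = null;   // drop the only strong reference
        System.gc();      // a request, not a guarantee

        // The referent is now only weakly reachable, so the collector is allowed to clear it.
        System.out.println(weak.get());
    }
}
```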

@lavnish: Of course! Or at least, to the best of my ability. Historically, the application graph contained all objects directly referenced from your application’s threads’ run() method (or something similar), your static instances of classes, and all objects reachable from these “root” objects.

As far as I know, this has been optimized; unfortunately, I am unfamiliar with the specifics on this optimization.

Jose: I assume you mean instances of anonymous classes. Since anonymous classes are the same as non-static inner classes, there is not a lot of difference at all in how the garbage collection is done.

The one “catch” is that instances of non-static inner classes contain a pointer to the instance of their parent class – this is why you can access its members. Because of that, the parent object will not be garbage collected even if all visible pointers to it are gone, as long as you still have a reference to the inner class instance.
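A short sketch of that catch, with hypothetical names (`Outer`, `Inner`, `Nested`, `InnerClassSketch`): the only thing returned from `leak()` is the inner instance, yet the outer object remains reachable through it.

```java
class Outer {
    private final String name;

    Outer(String name) {
        this.name = name;
    }

    // Non-static inner class: every instance is tied to an enclosing Outer instance.
    class Inner {
        String outerName() {
            return name;                  // shorthand for Outer.this.name
        }
    }

    // Static nested class: carries no implicit reference to an Outer instance.
    static class Nested {
    }

    Inner newInner() {
        return new Inner();
    }
}

class InnerClassSketch {
    static Outer.Inner leak() {
        Outer outer = new Outer("big outer object");
        return outer.newInner();          // 'outer' goes out of scope, but Inner still reaches it
    }

    public static void main(String[] args) {
        Outer.Inner inner = leak();
        System.out.println(inner.outerName()); // prints big outer object
    }
}
```

If the nested class does not need the outer object's members, declaring it `static` avoids keeping the outer object alive this way.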

HI Guys.. It is a very good article to know about GC. Thanks for it, Yael.

I am working on a Java application that runs out of memory every time; the UI freezes, for example. Can you guys gimme a few hints on how I can make my application more stable? This is an applet running on IE 5.0.

Great info! I was asked in one of my interviews about GC. I could hardly tell them much because books or most of the pages on the internet give restricted info. Now I know how GC really works. Thanks a lot!

@Kelvin: Of course not. You could have memory leaks if you retain objects in a “living” state for longer than they are required, or if you use more memory than can be allocated by the JVM. It does, however, prevent a lot of “memory leak” situations found in non-GC languages, such as objects that are never deleted because passed references to them were mishandled. If you’re trying to avoid memory leaks and GC-related problems in Java, please read the GC Tips and Memory Leaks post.
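A minimal sketch of the first kind of leak, assuming a hypothetical `LeakyRegistry` class: the buffers are never used again, but a static list keeps them reachable, so the marking step will always find them alive.

```java
import java.util.ArrayList;
import java.util.List;

class LeakyRegistry {
    // A static collection is reachable from a root for as long as the class is loaded.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[1024];
        HISTORY.add(buffer);              // kept "just in case" and never removed: the leak
    }

    static int retained() {
        return HISTORY.size();
    }

    // Removing the references is what makes the buffers collectable again.
    static void clear() {
        HISTORY.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            handleRequest();
        }
        System.out.println(retained());   // prints 3
        clear();
        System.out.println(retained());   // prints 0
    }
}
```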

One correction: In your picture of scavenging the Eden into the survivor spaces, you make it look as though the “Just allocated” object exists outside of the Eden, and “will go to Eden”. In fact, new objects are allocated *in* the Eden, so we don’t have to copy them to the Eden.

@Peter: You’re correct. I just had to draw it somewhere, and thought that outside of the memory scope will show that this is a new object better. But obviously, the object is allocated inside Eden and not copied into it.

@Muhammad: Whenever it feels like it. More specifically, it depends on which generation you would be talking about. The objects in eden and the to and from areas are constantly GC’d, as only the living objects are moved around. The older generation is GC’d according to some tuning specifications, but generally it would occur when the memory hits the upper limit of the currently allocated memory space, just before allocating more memory (if it didn’t reach the maximum yet) or throwing an OOME. I’m not sure about a periodic run of the GC though.
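If you would rather observe than guess when collections happen, the standard `java.lang.management` API exposes per-collector counters. This is only a sketch with a hypothetical class name (`GcActivity`); the collector names and the numbers depend entirely on the JVM, the selected collector and what the application has done so far.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcActivity {
    // One line per collector: its name, how often it has run, and the accumulated time.
    static List<String> summary() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : summary()) {
            System.out.println(line);
        }
    }
}
```

Printing this summary periodically usually shows the young generation collector's count climbing much faster than the old generation's.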