Friday, 10 August 2018

Garbage Collection in Java

Garbage collection is one of the more advanced topics in Java. Understanding how the garbage collector works helps us fine-tune our application's runtime performance.


◈ In Java, programmers don’t need to take care of destroying objects that are no longer in use. The Garbage Collector takes care of it.
◈ The Garbage Collector works in the background (it is often described as a daemon thread). It frees up heap memory by reclaiming the space occupied by unreachable objects.
◈ Unreachable objects are the ones that can no longer be reached through any chain of references from the program's live roots, such as local variables on thread stacks and static fields.
◈ We can choose the garbage collector for our Java program through JVM options; we will look at these in a later section of this tutorial.
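To check which collector a JVM is actually running with, a small program can query the standard java.lang.management API. This is a minimal sketch: the class name ShowCollectors is only an illustrative choice, and the names it prints come from the JVM itself, so they vary with the collector and the JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors the running JVM is using.
// Try launching it with different options, for example:
//   java -XX:+UseSerialGC ShowCollectors
//   java -XX:+UseG1GC ShowCollectors
class ShowCollectors {

    // Returns the name of every collector bean registered by this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

On HotSpot, running it with -XX:+UseSerialGC typically prints "Copy" and "MarkSweepCompact", while G1 reports names such as "G1 Young Generation" and "G1 Old Generation".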

How Does Automatic Garbage Collection Work?

Automatic garbage collection is the process of looking at heap memory, identifying the unreachable objects (also known as “marking”), deleting them, and compacting the objects that remain.
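The mark, delete and compact steps can be illustrated on a toy heap. This is a simplified sketch, not how HotSpot is implemented: the "heap" is just a list of hypothetical Obj instances, marking is a depth-first walk from the roots, and compaction keeps the survivors in their original order at the front of a new list. A real collector works on raw memory and must also update every pointer to an object it moves.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy mark-sweep-compact over a list that stands in for the heap.
class ToyMarkSweepCompact {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: flag every object reachable from the roots.
    static void mark(List<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;
            o.marked = true;
            for (Obj r : o.refs) {
                if (!r.marked) stack.push(r);
            }
        }
    }

    // Sweep + compact: keep only marked objects, slid to the front in their original
    // order, and clear the marks for the next cycle. Unmarked objects are dropped.
    static List<Obj> sweepAndCompact(List<Obj> heap) {
        List<Obj> compacted = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) {
                o.marked = false;
                compacted.add(o);
            }
        }
        return compacted;
    }

    static List<Obj> collect(List<Obj> heap, List<Obj> roots) {
        mark(roots);
        return sweepAndCompact(heap);
    }

    public static void main(String[] args) {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");
        Obj d = new Obj("d");
        a.refs.add(c); // a is a root, so c is reachable
        b.refs.add(d); // nothing reaches b, so b and d are garbage
        List<Obj> live = collect(Arrays.asList(a, b, c, d), Arrays.asList(a));
        for (Obj o : live) System.out.println(o.name); // prints "a" then "c"
    }
}
```

Note that the sweep visits every object on the heap, which is exactly the cost that grows with the number of objects.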

A drawback of this approach is that garbage collection time grows with the number of objects, because the collector has to go through every object on the heap looking for unreachable ones.

However, empirical analysis of applications shows that most objects are short-lived.

This behavior is exploited to improve JVM performance, and the approach is commonly called Generational Garbage Collection. In this approach, the heap is divided into generations: the Young Generation, the Old (or Tenured) Generation, and the Permanent Generation.

The Young Generation is where all new objects are created. When it fills up, a minor garbage collection (also known as a Minor GC) takes place, which means the space occupied by dead objects in this generation is reclaimed. This process is quick because, as the graph shows, most of these objects are already dead. Objects that survive in the young generation are aged and eventually move to the old generation.
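The claim that a minor GC is cheap when most young objects die quickly can be observed with a small experiment. The sketch below (class and method names are illustrative) allocates many short-lived arrays while keeping at most one reachable, and compares each collector's collection count from the standard GarbageCollectorMXBean API before and after.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Allocates many short-lived arrays and reports how many collections each collector ran.
class ShortLivedChurn {

    // Only the most recent array stays reachable; each earlier one becomes
    // garbage as soon as this field is overwritten.
    static byte[] sink;

    static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount()); // -1 if undefined
        }
        return counts;
    }

    // Allocates `count` arrays of `size` bytes, keeping at most one reachable at a time.
    static void churn(int count, int size) {
        for (int i = 0; i < count; i++) {
            sink = new byte[size];
        }
    }

    public static void main(String[] args) {
        Map<String, Long> before = collectionCounts();
        churn(500_000, 1024); // about 500 MB allocated in total, about 1 KB live at any moment
        Map<String, Long> after = collectionCounts();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            System.out.println(e.getKey() + ": " + delta + " collections");
        }
    }
}
```

Running it with -verbose:gc (or -Xlog:gc on JDK 9 and later) shows the individual pauses. The exact counts depend on the heap size, collector and JDK version, but the young-generation collector is usually the one whose count grows.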

The Old Generation is used to store long-surviving objects. Typically, an age threshold is set for young generation objects, and when an object reaches that age it is moved to the old generation. Eventually, the old generation needs to be collected as well. This event is called a Major GC (major garbage collection). It is often much slower because it involves all live objects.
Finally, there is the Full GC, which cleans the entire heap: both the young and old generation spaces.
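Aging and promotion can be modeled in a few lines. This is a deliberately simplified sketch: the two survivor spaces are folded into a single "young" list, liveness comes from a caller-supplied predicate instead of a real reachability trace, and an object is promoted once it has survived tenuringThreshold minor GCs. The threshold plays the role of HotSpot's -XX:MaxTenuringThreshold option, but HotSpot's exact counting and its dynamic threshold adjustment differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of object aging and promotion during minor GCs.
class ToyGenerations {

    static final class Obj {
        final String name;
        int age; // number of minor GCs survived so far
        Obj(String name) { this.name = name; }
    }

    final int tenuringThreshold;
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    ToyGenerations(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    void allocate(Obj o) { young.add(o); }

    // Minor GC: drop dead young objects, age the survivors, promote those old enough.
    void minorGc(Predicate<Obj> isLive) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!isLive.test(o)) continue;        // dead: its space is reclaimed
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);                       // promoted (tenured)
            } else {
                survivors.add(o);                 // stays young, one GC older
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        ToyGenerations heap = new ToyGenerations(2);
        Obj cache = new Obj("cache");        // long-lived
        heap.allocate(cache);
        heap.allocate(new Obj("tmp1"));      // short-lived
        heap.minorGc(o -> o == cache);       // tmp1 reclaimed, cache now has age 1
        heap.allocate(new Obj("tmp2"));
        heap.minorGc(o -> o == cache);       // tmp2 reclaimed, cache reaches age 2 and is promoted
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size()); // young=0 old=1
    }
}
```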

Lastly, up to Java 7, there was a Permanent Generation (or Perm Gen), which contained metadata required by the JVM to describe the classes and methods used in the application. It was removed in Java 8.

Java Garbage Collectors

As of Java 8, the HotSpot JVM provides four different garbage collectors, all of them generational. Each has its own advantages and disadvantages. The choice of which garbage collector to use is ours, and it can make a dramatic difference to throughput and application pause times.

All of them split the managed heap into different segments, based on the long-standing assumption that most objects in the heap are short-lived and should be recycled quickly.

So, the four types of garbage collectors are:

Serial GC

This is the simplest garbage collector, designed for single-threaded systems and small heaps. It freezes all application threads while it runs. It can be turned on using the -XX:+UseSerialGC JVM option.

Parallel/Throughput GC

This is the JVM’s default collector in JDK 8. As the name suggests, it uses multiple threads to scan through the heap and perform compaction. A drawback of this collector is that it pauses the application threads while performing a minor or full GC. It can be turned on using the -XX:+UseParallelGC JVM option.
It is best suited to applications that can tolerate such pauses and want to minimize the CPU overhead caused by the collector.

The CMS collector

The CMS (“concurrent mark sweep”) collector runs concurrently with the application threads (“concurrent”), scanning through the heap (“mark”) for unused objects that can be recycled (“sweep”).

This collector enters Stop-The-World (STW) mode in two cases:

- While marking the initial set of roots, i.e. objects in the old generation that are reachable from thread entry points or static variables
- When the application has changed the state of the heap while the algorithm was running concurrently, forcing it to go back and do some final touches to make sure it has the right objects marked
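The second pause (the remark) exists because the application keeps changing references while marking runs. The toy model below sketches one way to handle this: marking proceeds step by step, a write barrier remembers every object the application writes a reference into (similar in spirit to CMS's dirty cards), and the stop-the-world remark rescans those objects. All class and method names are hypothetical, and roots are assumed not to change after the initial mark.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of concurrent marking with a write barrier and a final remark.
class ToyConcurrentMark {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    final Set<Obj> marked = new HashSet<>();
    final Deque<Obj> work = new ArrayDeque<>();
    final Set<Obj> dirty = new HashSet<>(); // objects written to while marking
    boolean marking;

    // Initial mark (STW in CMS): record the roots.
    void initialMark(List<Obj> roots) {
        marking = true;
        for (Obj r : roots) {
            if (marked.add(r)) work.push(r);
        }
    }

    // One unit of concurrent marking; returns false once nothing is left to scan.
    boolean concurrentMarkStep() {
        Obj o = work.poll();
        if (o == null) return false;
        for (Obj r : o.refs) {
            if (marked.add(r)) work.push(r);
        }
        return true;
    }

    // The application's reference store, with a write barrier.
    void store(Obj from, Obj to) {
        from.refs.add(to);
        if (marking) dirty.add(from);
    }

    // Remark (STW in CMS): rescan marked objects that were modified, then finish tracing.
    void remark() {
        for (Obj d : dirty) {
            if (!marked.contains(d)) continue; // unmarked objects are scanned if traced later
            for (Obj r : d.refs) {
                if (marked.add(r)) work.push(r);
            }
        }
        dirty.clear();
        while (concurrentMarkStep()) { }
        marking = false;
    }

    public static void main(String[] args) {
        ToyConcurrentMark gc = new ToyConcurrentMark();
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);
        b.refs.add(c);                       // heap: a -> b -> c, root = a
        gc.initialMark(Arrays.asList(a));
        gc.concurrentMarkStep();             // scans a, marks b
        gc.store(a, c);                      // application adds a -> c; barrier records a
        b.refs.remove(c);                    // application drops b -> c
        while (gc.concurrentMarkStep()) { }  // scans b, which no longer points at c
        System.out.println("before remark: " + gc.marked.contains(c)); // false
        gc.remark();
        System.out.println("after remark:  " + gc.marked.contains(c)); // true
    }
}
```

Without the remark, c would be treated as garbage even though a still references it.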

This collector may suffer promotion failures. If objects from the young generation need to be moved to the old generation, and the collector has not had enough time to make space in the old generation, a promotion failure occurs.

To prevent this, we can give more of the heap to the old generation or give the collector more background threads.

G1 collector

Last but not least is the Garbage-First (G1) collector, designed for heap sizes greater than 4 GB. It divides the heap into regions ranging from 1 MB to 32 MB, depending on the heap size.
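How the region size depends on the heap size can be sketched as a small calculation. This assumes the JDK 8-era HotSpot heuristic used when -XX:G1HeapRegionSize is not set: aim for roughly 2048 regions, round down to a power of two, and clamp to the 1 MB to 32 MB range. HotSpot of that era based the calculation on the average of the initial and maximum heap size; the sketch simply takes one heap size, and newer JDKs may differ.

```java
// Sketch of G1's default region-size choice (assumed JDK 8-era heuristic).
class G1RegionSize {

    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGION_COUNT = 2048;

    // Picks a power-of-two size giving about TARGET_REGION_COUNT regions, within [1 MB, 32 MB].
    static long regionSize(long heapBytes) {
        long size = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION);
        // highestOneBit rounds down to a power of two; size >= 1 MB, so the result stays >= 1 MB
        return Math.min(Long.highestOneBit(size), MAX_REGION);
    }

    public static void main(String[] args) {
        for (long gb : new long[] {1, 4, 16, 64}) {
            long heap = gb * 1024 * MB;
            // prints 1, 2, 8 and 32 MB regions respectively
            System.out.println(gb + " GB heap -> " + regionSize(heap) / MB + " MB regions");
        }
    }
}
```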

There is a concurrent global marking phase to determine the liveness of objects throughout the heap. After marking is complete, G1 knows which regions are mostly empty. It collects these regions, which contain the most garbage, first, since they usually yield a large amount of free space; hence the name Garbage-First. G1 also uses a pause prediction model to meet a user-defined pause time target (set with the -XX:MaxGCPauseMillis option), and it selects the number of regions to collect based on that target.
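Garbage-first selection under a pause target can be sketched as a greedy choice: sort candidate regions by reclaimable garbage and keep adding them while the predicted pause still fits. The cost model here (a fixed cost per region plus a cost per megabyte of live data that must be copied) and its constants are hypothetical; HotSpot's real predictor is built from measured pause statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of choosing regions "garbage first" within a pause-time budget.
class GarbageFirstSelection {

    static final long MB = 1024L * 1024L;

    static final class Region {
        final int id;
        final long sizeBytes;
        final long liveBytes;

        Region(int id, long sizeBytes, long liveBytes) {
            this.id = id;
            this.sizeBytes = sizeBytes;
            this.liveBytes = liveBytes;
        }

        long garbageBytes() { return sizeBytes - liveBytes; }
    }

    // Hypothetical cost model: fixed cost per region plus cost of copying its live data.
    static final double FIXED_MS_PER_REGION = 0.5;
    static final double MS_PER_LIVE_MB = 2.0;

    static double predictedPauseMs(Region r) {
        return FIXED_MS_PER_REGION + MS_PER_LIVE_MB * r.liveBytes / (double) MB;
    }

    // Takes regions with the most garbage first, stopping before the predicted pause exceeds the target.
    static List<Region> chooseRegions(List<Region> regions, double pauseTargetMs) {
        List<Region> byGarbage = new ArrayList<>(regions);
        byGarbage.sort((x, y) -> Long.compare(y.garbageBytes(), x.garbageBytes()));
        List<Region> chosen = new ArrayList<>();
        double predictedMs = 0;
        for (Region r : byGarbage) {
            double cost = predictedPauseMs(r);
            if (predictedMs + cost > pauseTargetMs) break;
            chosen.add(r);
            predictedMs += cost;
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region(0, 4 * MB, 3 * MB),   // 1 MB garbage, expensive to evacuate
                new Region(1, 4 * MB, 1 * MB),   // 3 MB garbage
                new Region(2, 4 * MB, 0),        // all garbage, cheapest to collect
                new Region(3, 4 * MB, 2 * MB));  // 2 MB garbage
        for (Region r : chooseRegions(regions, 5.0)) {
            System.out.println("collect region " + r.id); // regions 2 and 1 fit in 5 ms
        }
    }
}
```

Regions with the most garbage also tend to have the least live data, so they are both the most profitable and the cheapest to evacuate.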

The G1 garbage collection cycle includes the phases shown in the figure:

1. Young-only phase: In this phase, G1 performs only young collections, which over time promote surviving objects to the old generation. The transition from the young-only phase to the space-reclamation phase starts when old generation occupancy reaches a certain threshold, i.e. the Initiating Heap Occupancy threshold. At this point, G1 schedules an Initial Mark young-only collection instead of a regular young-only collection.

◈ Initial Mark: This type of collection starts the marking process in addition to performing a regular young-only collection. Concurrent marking determines all currently live objects in the old generation regions to be kept for the following space-reclamation phase. While marking has not yet finished, regular young-only collections may occur. Marking finishes with two special stop-the-world pauses: Remark and Cleanup.

◈ Remark: This pause finalizes the marking itself and performs global reference processing and class unloading. Between Remark and Cleanup, G1 concurrently calculates a summary of the liveness information, which is finalized and used in the Cleanup pause to update internal data structures.

◈ Cleanup: This pause also reclaims completely empty regions and determines whether a space-reclamation phase will actually follow. If a space-reclamation phase follows, the young-only phase completes with a single young-only collection.

2. Space-reclamation phase: This phase consists of multiple mixed collections that, in addition to young generation regions, also evacuate live objects from old generation regions. The space-reclamation phase ends when G1 determines that evacuating more old generation regions would not yield enough free space to be worth the effort.
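The transitions between these phases can be summarized as a small state machine. This is a simplified sketch: occupancy and reclaimable-space percentages are passed in by the caller rather than measured, and the two thresholds merely play the roles of HotSpot's -XX:InitiatingHeapOccupancyPercent and -XX:G1HeapWastePercent options. The class and method names are hypothetical.

```java
// Toy state machine for the G1 cycle: young-only, concurrent marking, space reclamation.
class G1CycleSketch {

    enum Phase { YOUNG_ONLY, CONCURRENT_MARKING, SPACE_RECLAMATION }

    final double ihopPercent;   // role of -XX:InitiatingHeapOccupancyPercent
    final double wastePercent;  // role of -XX:G1HeapWastePercent
    Phase phase = Phase.YOUNG_ONLY;
    private boolean lastYoungBeforeMixed; // set by Cleanup when mixed collections will follow

    G1CycleSketch(double ihopPercent, double wastePercent) {
        this.ihopPercent = ihopPercent;
        this.wastePercent = wastePercent;
    }

    // Decides what kind of collection the next pause is, given old-generation occupancy.
    String pause(double oldOccupancyPercent) {
        switch (phase) {
            case YOUNG_ONLY:
                if (lastYoungBeforeMixed) {          // single young-only collection after Cleanup
                    lastYoungBeforeMixed = false;
                    phase = Phase.SPACE_RECLAMATION;
                    return "young";
                }
                if (oldOccupancyPercent >= ihopPercent) {
                    phase = Phase.CONCURRENT_MARKING;
                    return "initial-mark";
                }
                return "young";
            case CONCURRENT_MARKING:
                return "young";                      // regular young collections during marking
            default:
                return "mixed";                      // young regions plus some old regions
        }
    }

    // Remark and Cleanup: decide whether a space-reclamation phase follows.
    void markingFinished(double reclaimablePercent) {
        phase = Phase.YOUNG_ONLY;
        lastYoungBeforeMixed = reclaimablePercent > wastePercent;
    }

    // After each mixed collection: stop once the remaining garbage is not worth the effort.
    void mixedFinished(double reclaimablePercent) {
        if (reclaimablePercent <= wastePercent) phase = Phase.YOUNG_ONLY;
    }

    public static void main(String[] args) {
        G1CycleSketch g1 = new G1CycleSketch(45, 5);
        System.out.println(g1.pause(30)); // young
        System.out.println(g1.pause(50)); // initial-mark
        System.out.println(g1.pause(52)); // young (marking still running)
        g1.markingFinished(20);
        System.out.println(g1.pause(52)); // young (last one before mixed collections)
        System.out.println(g1.pause(48)); // mixed
        g1.mixedFinished(4);
        System.out.println(g1.phase);     // YOUNG_ONLY
    }
}
```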

G1 can be enabled using the -XX:+UseG1GC flag.

This strategy reduces the chance of the heap being exhausted before the background threads have finished scanning for unreachable objects. It also compacts the heap on the go, which the CMS collector can only do in STW mode.

Java 8 added a useful optimization to the G1 collector called string deduplication. The character arrays that back our strings occupy a large share of the heap. This optimization lets G1 identify strings whose contents are duplicated across the heap and make them point to the same internal char[] array, so that multiple copies of the same string do not occupy the heap unnecessarily. We can use the -XX:+UseStringDeduplication JVM argument to enable it.
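The idea behind deduplication can be shown with a toy table that maps content to one canonical array. ToyString is a hypothetical stand-in for a String and its private backing array: real G1 deduplication happens inside the JVM, is invisible to Java code, and only considers strings that have survived a few collections. As in the real feature, the string objects stay distinct; only their backing arrays become shared.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of string deduplication via a content -> canonical array table.
class ToyStringDedup {

    // Hypothetical stand-in for a String and its private backing array.
    static final class ToyString {
        char[] value;
        ToyString(String s) { this.value = s.toCharArray(); } // each instance gets its own copy
    }

    // Points every ToyString at one shared array per distinct content; returns the chars freed.
    static long deduplicate(List<ToyString> strings) {
        Map<String, char[]> table = new HashMap<>(); // content -> canonical array
        long freedChars = 0;
        for (ToyString s : strings) {
            char[] canonical = table.putIfAbsent(new String(s.value), s.value);
            if (canonical != null && canonical != s.value) {
                freedChars += s.value.length;
                s.value = canonical; // the duplicate array is now unreachable
            }
        }
        return freedChars;
    }

    public static void main(String[] args) {
        List<ToyString> strings = Arrays.asList(
                new ToyString("hello"), new ToyString("hello"), new ToyString("world"));
        System.out.println("chars freed: " + deduplicate(strings));                     // 5
        System.out.println("shared: " + (strings.get(0).value == strings.get(1).value)); // true
    }
}
```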

G1 is the default garbage collector in JDK 9.

Java 8 PermGen and Metaspace

As mentioned earlier, the Permanent Generation space was removed in Java 8. Instead, the JDK 8 HotSpot JVM stores the representation of class metadata in native memory, in an area called Metaspace.

Most allocations for class metadata are now made out of native memory. There is also a new flag, MaxMetaspaceSize (set as -XX:MaxMetaspaceSize), to limit the amount of memory used for class metadata. If we do not specify a value for it, Metaspace resizes at runtime according to the demands of the running application.
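Metaspace usage can be inspected from inside the application through the standard MemoryPoolMXBean API. This minimal sketch assumes a HotSpot JVM (Java 8 or later), which exposes a memory pool named "Metaspace"; the class name MetaspaceInfo is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Prints Metaspace usage for the running JVM.
// Try: java -XX:MaxMetaspaceSize=64m MetaspaceInfo
class MetaspaceInfo {

    // Returns the pool named "Metaspace", or null if this JVM does not expose one.
    static MemoryPoolMXBean metaspacePool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals("Metaspace")) return pool;
        }
        return null;
    }

    public static void main(String[] args) {
        MemoryPoolMXBean pool = metaspacePool();
        if (pool == null) {
            System.out.println("No memory pool named Metaspace");
            return;
        }
        MemoryUsage usage = pool.getUsage();
        System.out.println("used      = " + usage.getUsed() / 1024 + " KB");
        System.out.println("committed = " + usage.getCommitted() / 1024 + " KB");
        // getMax() returns -1 when no maximum is defined
        long max = usage.getMax();
        System.out.println("max       = " + (max < 0 ? "unlimited" : max / 1024 + " KB"));
    }
}
```

Without -XX:MaxMetaspaceSize, HotSpot typically reports no maximum, which matches the runtime resizing described above; with -XX:MaxMetaspaceSize=64m the max line should show 65536 KB.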