Monday, April 11, 2011

I have read many articles on garbage collection in Java. Some of them are too complex to understand, and some of them don't contain enough information to understand garbage collection in Java. So I decided to write up my own experience as an article. You can call it a tutorial about garbage collection in simple words, one that is easy to follow and has sufficient information to understand how garbage collection works in Java. Garbage collection works by employing several GC algorithms, e.g. Mark and Sweep. There are different kinds of garbage collectors available in Java to collect different areas of heap memory, e.g. the serial, parallel and concurrent garbage collectors. A new collector called G1 (Garbage First) was also introduced in JDK 1.7. The first step in learning about GC is to understand when an object becomes eligible for garbage collection. Since the JVM provides memory management, Java developers only care about creating objects; cleaning up is done by the garbage collector, but it can only collect objects which have no live strong reference or which are not reachable from any thread. If an object which is supposed to be collected still lives in memory because of an unintentional strong reference, that is known as a memory leak in Java. ThreadLocal variables in a Java web application can easily cause a memory leak.

Important points about Garbage Collection in Java

This article is a continuation of my previous articles How Classpath works in Java and How to write Equals method in Java. Before moving ahead, let's recall a few important points about garbage collection in Java.

1) Objects are created on the heap in Java irrespective of their scope, e.g. local or member variable. It's worth noting that class variables or static members are created in the method area of the Java memory space, and both the heap and the method area are shared between threads.

2) Garbage collection is a mechanism provided by the Java Virtual Machine to reclaim heap space from objects which are eligible for garbage collection.

3) Garbage collection relieves the Java programmer from memory management, which is an essential part of C++ programming, and gives more time to focus on business logic.

4) Garbage collection in Java is carried out by a daemon thread called the Garbage Collector.

5) Before removing an object from memory, the garbage collection thread invokes the finalize() method of that object, which gives it an opportunity to perform any cleanup required.

6) You as a Java programmer cannot force garbage collection in Java; it will only trigger if the JVM thinks it needs a garbage collection based on the Java heap size.

7) Methods like System.gc() and Runtime.gc() are used to send a garbage collection request to the JVM, but it's not guaranteed that garbage collection will happen.

8) J2SE 5 (Java 2 Standard Edition) adds a new feature called Ergonomics. The goal of ergonomics is to provide good performance from the JVM with a minimum of command line tuning.
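To make the point about System.gc() being only a request more concrete, here is a minimal sketch. The class name GcRequestDemo and its methods are made up for illustration; the only real APIs used are Runtime and System.gc(). The numbers it prints depend entirely on the JVM and its settings, so treat the output as an observation, not a guarantee.

```java
// Hypothetical demo class, not from the article: shows that System.gc() is a request.
final class GcRequestDemo {

    /** Returns the heap bytes currently in use, as reported by the JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Creates short-lived garbage, asks for a collection, and returns {usedBefore, usedAfter}. */
    static long[] requestGc() {
        for (int i = 0; i < 100_000; i++) {
            byte[] garbage = new byte[128]; // unreachable again after each iteration
        }
        long before = usedHeapBytes();
        System.gc(); // only a request; the JVM may ignore it, e.g. with -XX:+DisableExplicitGC
        long after = usedHeapBytes();
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        long[] usage = requestGc();
        System.out.println("used before request: " + usage[0] + " bytes");
        System.out.println("used after request:  " + usage[1] + " bytes");
    }
}
```

On many JVMs the second number is smaller, but nothing in the Java specification promises that, which is exactly the point of items 6 and 7 above.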

When an Object becomes Eligible for Garbage Collection

An object becomes eligible for garbage collection or GC if it's not reachable from any live thread or by any static reference. In other words, you can say that an object becomes eligible for garbage collection if all its references are null. Cyclic dependencies are not counted as references, so if object A has a reference to object B and object B has a reference to object A, and they don't have any other live reference, then both objects A and B will be eligible for garbage collection.
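The cyclic case is easier to see with a toy model of reachability. The sketch below is not how the JVM is implemented; Node and ReachabilityDemo are hypothetical names, and the "roots" list simply stands in for live thread stacks and static references. It marks everything reachable from the roots, and whatever is left unmarked would be eligible for collection.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of reachability (hypothetical names, not JVM internals).
final class ReachabilityDemo {

    /** Stand-in for a heap object that holds references to other objects. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Marks every node reachable from the given roots by walking the reference graph. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        Node c = new Node("C");
        a.refs.add(b); // A -> B
        b.refs.add(a); // B -> A, a cycle with no outside reference
        Set<Node> live = mark(Collections.singletonList(c)); // only C is a root
        System.out.println("A live? " + live.contains(a)); // prints "A live? false"
        System.out.println("B live? " + live.contains(b)); // prints "B live? false"
        System.out.println("C live? " + live.contains(c)); // prints "C live? true"
    }
}
```

A and B point at each other, but since the walk never reaches them from a root, the cycle does not keep them alive.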

Generally an object becomes eligible for garbage collection in Java in the following cases:

1) All references to that object are explicitly set to null, e.g. object = null

2) The object is created inside a block and its reference goes out of scope once control exits that block.

3) The parent object is set to null: if an object holds a reference to another object and you set the container object's reference to null, the child or contained object automatically becomes eligible for garbage collection.

4) If an object has only live weak references, e.g. via WeakHashMap, it will be eligible for garbage collection.
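Case 4 can be sketched with java.util.WeakHashMap. The class and method names below are hypothetical. The first method is deterministic because the key is still strongly referenced; the second only reports what this particular JVM happened to do after a System.gc() request, so no fixed result is claimed for it.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo of weakly referenced keys; names are for illustration only.
final class WeakHashMapDemo {

    /** While the key is strongly held by a local variable, the entry must stay in the map. */
    static boolean entryKeptWhileKeyIsHeld() {
        Map<Object, String> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, "value");
        System.gc();                   // a request only
        return cache.containsKey(key); // key is still strongly reachable here
    }

    /** Once no strong reference to the key remains, the entry MAY disappear after a GC. */
    static int sizeAfterKeyDropped() {
        Map<Object, String> cache = new WeakHashMap<>();
        cache.put(new Object(), "value"); // the map holds the only (weak) reference to the key
        System.gc();                      // a request only
        return cache.size();              // often 0, but not guaranteed
    }

    public static void main(String[] args) {
        System.out.println("kept while held: " + entryKeptWhileKeyIsHeld());
        System.out.println("size after drop: " + sizeAfterKeyDropped());
    }
}
```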

Heap Generations for Garbage Collection in Java

Java objects are created in the heap, and the heap is divided into three parts or generations for the sake of garbage collection in Java. These are called the Young generation, the Tenured or Old generation, and the Perm area of the heap. The New (Young) generation is further divided into three parts known as Eden space, Survivor 1 and Survivor 2 space. When an object is first created in the heap, it is created in the new generation inside Eden space. After subsequent minor garbage collections, if the object survives, it is moved to survivor 1 and then survivor 2, before a major garbage collection moves that object to the old or tenured generation.

The permanent generation or Perm area of the heap is somewhat special: it is used to store metadata related to classes and methods in the JVM, and it also hosts the String pool provided by the JVM, as discussed in my string tutorial why String is immutable in Java. There are many opinions about whether garbage collection in Java happens in the perm area of the Java heap or not. As far as I know this is JVM dependent, and it does happen at least in Sun's implementation of the JVM. You can also try this yourself by creating millions of Strings and watching for garbage collection or OutOfMemoryError.
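If you want to see which of these areas your own JVM actually has, the standard java.lang.management API can list its memory pools. This is only a sketch with a hypothetical class name (MemoryPoolLister); the pool names it prints are JVM and collector specific, so you may see names like Eden Space, Survivor Space and Perm Gen on older HotSpot versions, or Metaspace on Java 8 and later where PermGen was removed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the memory pools of the running JVM.
final class MemoryPoolLister {

    /** Returns one line per memory pool: its type (heap or non-heap) and its name. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getType() + " : " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line); // names differ between JVM versions and collectors
        }
    }
}
```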

Types of Garbage Collector in Java

Java Runtime (J2SE 5) provides various types of garbage collectors which you can choose from based upon your application's performance requirements. Java 5 adds three garbage collectors in addition to the serial garbage collector. Each is a generational garbage collector which has been implemented to increase the throughput of the application or to reduce garbage collection pause times.

1) Throughput Garbage Collector: This garbage collector in Java uses a parallel version of the young generation collector. It is used if the -XX:+UseParallelGC option is passed to the runtime via JVM command line options. The tenured generation collector is the same as the serial collector.

2) Concurrent low pause Collector: This collector is used if -Xincgc or -XX:+UseConcMarkSweepGC is passed on the command line. It is also referred to as the Concurrent Mark Sweep garbage collector. The concurrent collector is used to collect the tenured generation and does most of the collection concurrently with the execution of the application. The application is paused for short periods during the collection. A parallel version of the young generation copying collector is used with the concurrent collector. The Concurrent Mark Sweep garbage collector is the most widely used garbage collector in Java, and when garbage collection triggers, its algorithm first marks the objects which need to be collected.

3) The Incremental (sometimes called train) low pause collector: This collector is used only if -XX:+UseTrainGC is passed on the command line. This garbage collector has not changed since Java 1.4.2 and is currently not under active development. It will not be supported in future releases, so avoid using it, and please see the 1.4.2 GC Tuning document for information on this collector.

An important point to note is that -XX:+UseParallelGC should not be used with -XX:+UseConcMarkSweepGC. The argument parsing in the J2SE platform starting with version 1.4.2 should only allow legal combinations of command line options for garbage collectors, but earlier releases may not detect all illegal combinations, and the results for illegal combinations are unpredictable. It's not recommended to use the incremental (train) garbage collector in Java.
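Whichever option you pass, it can help to confirm which collectors the running JVM actually ended up with. A small sketch using the standard GarbageCollectorMXBean API is shown here; the class name ActiveCollectors is hypothetical, and the collector and pool names in the output depend on your JVM version and flags, so none are listed as expected output.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that reports the garbage collectors active in this JVM.
final class ActiveCollectors {

    /** Returns one line per collector: its name and the memory pools it manages. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " -> " + Arrays.toString(gc.getMemoryPoolNames()));
        }
        return names;
    }

    public static void main(String[] args) {
        // Try running with different collector flags and compare the output.
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```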

JVM Parameters for Garbage Collection in Java

Garbage collection tuning is a long exercise and requires a lot of application profiling and patience to get right. While working on high volume, low latency electronic trading systems, I have been on projects where we needed to improve the performance of a Java application by profiling it and finding what was causing full GC. I found that garbage collection tuning largely depends on the application profile: what kind of objects the application has, what their average lifetime is, and so on. For example, if an application has too many short-lived objects, then making the Eden space larger will reduce the number of minor collections. You can also control the size of both the young and tenured generations using JVM parameters; for example, setting -XX:NewRatio=3 means that the ratio between the young and tenured generation is 1:3. You have to be careful when sizing these generations, because making the young generation larger reduces the size of the tenured generation, which forces major collections to occur more frequently. These pause the application threads for their duration, which results in reduced throughput. The parameters NewSize and MaxNewSize are used to bound the young generation size from below and above. Setting them equal to one another fixes the size of the young generation. In my opinion, a detailed understanding of garbage collection in Java is a must before doing garbage collection tuning, and I would recommend reading the garbage collection document provided by Sun Microsystems for detailed knowledge of garbage collection in Java. Also, to get a full list of JVM parameters for a particular Java Virtual Machine, please refer to the official documents on garbage collection in Java. I found this link quite helpful though: http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
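The NewRatio arithmetic is worth working through once. With a young:tenured ratio of 1:N, the young generation gets 1/(N+1) of the space shared by the two generations. The helper below is a hypothetical sketch of that calculation only; a real JVM rounds and aligns these sizes, so actual values can differ slightly.

```java
// Hypothetical helper for the NewRatio arithmetic; not a JVM API.
final class NewRatioMath {

    /** Young generation size when young:tenured = 1:newRatio, i.e. heap / (newRatio + 1). */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** Tenured generation size is whatever remains after the young generation. */
    static long tenuredBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 1024 * mb; // assume 1 GB shared by young and tenured
        System.out.println("young   = " + youngBytes(heap, 3) / mb + " MB");   // prints "young   = 256 MB"
        System.out.println("tenured = " + tenuredBytes(heap, 3) / mb + " MB"); // prints "tenured = 768 MB"
    }
}
```

So with -XX:NewRatio=3 and 1 GB to divide, roughly a quarter goes to the young generation and three quarters to the tenured generation.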

Full GC and Concurrent Garbage Collection in Java

The concurrent garbage collector in Java uses a single garbage collector thread that runs concurrently with the application threads, with the goal of completing the collection of the tenured generation before it becomes full. In normal operation, the concurrent garbage collector is able to do most of its work with the application threads still running, so only brief pauses are seen by the application threads. As a fallback, if the concurrent garbage collector is unable to finish before the tenured generation fills up, the application is paused and the collection is completed with all the application threads stopped. Such collections with the application stopped are referred to as full garbage collections or full GC, and they are a sign that some adjustments need to be made to the concurrent collection parameters. Always try to avoid or minimize full GC because it affects the performance of a Java application. When you work in the finance domain on electronic trading platforms and high volume, low latency systems, the performance of the Java application becomes extremely critical, and you definitely want to avoid full GC during the trading period.
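One rough way to notice how much collection a piece of code triggers is to compare the JVM's collection counters before and after it runs. This is a sketch with a hypothetical class name (GcActivityProbe) built on the standard GarbageCollectorMXBean API. It counts all collections, not only full GCs, and the result varies from run to run, so treat it as a quick probe and not a replacement for GC logs or a profiler.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe that counts collections around a piece of work.
final class GcActivityProbe {

    /** Sums the collection counts of all collectors; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Runs the work and returns how many collections were recorded while it ran. */
    static long collectionsDuring(Runnable work) {
        long before = totalCollections();
        work.run();
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        long n = collectionsDuring(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                byte[] garbage = new byte[256]; // short-lived allocation pressure
            }
        });
        System.out.println("collections during work: " + n); // value depends on heap size and JVM
    }
}
```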

Summary on Garbage collection in Java

1) The Java heap is divided into three generations for the sake of garbage collection. These are the young generation, the tenured or old generation, and the Perm area.

2) New objects are created in the young generation and subsequently moved to the old generation.

3) The String pool is created in the PermGen area of the heap. Garbage collection can occur in perm space, but this differs from JVM to JVM. By the way, from the JDK 1.7 update, the String pool is moved to the heap area where objects are created.

4) Minor garbage collection is used to move objects from Eden space to survivor 1 and survivor 2 space, and major collection is used to move objects from the young to the tenured generation.

5) Whenever a major garbage collection occurs, application threads stop for that period, which reduces the application's performance and throughput.

6) A few performance improvements have been made to garbage collection in Java 6, and we usually use JRE 1.6.20 for running our application.

7) The JVM command line options -Xms and -Xmx are used to set the starting and maximum size of the Java heap. The ideal ratio of these parameters is either 1:1 or 1:1.5 in my experience; for example, you can have both -Xms and -Xmx as 1 GB, or -Xms as 1.2 GB and -Xmx as 1.8 GB.

8) There is no manual way of doing garbage collection in Java.

That's all about garbage collection in Java. In this tutorial we learned how the heap is divided into different regions, e.g. Eden, survivor spaces and perm gen space. An object becomes eligible for garbage collection when there is no strong reference pointing to it or it is not reachable from any thread. When the garbage collector sees the need for garbage collection, it triggers a minor collection and sometimes a stop-the-world major collection. It's all automatic, as you cannot force garbage collection in Java.

76 comments:

Garbage collection is quite important if you are working in the finance domain, since for any kind of trading, e.g. Electronic, DMA, Forex, Fixed Income or Derivatives, performance is most important given the high volume and low latency nature of such applications. No matter which area of the finance domain you work in, you always need to tune garbage collection parameters to get maximum performance and provide ultra low latency to your Direct to Market Access (DMA) clients. Even equity exchanges are nowadays upgrading their systems, like the Tokyo Stock Exchange, which moved to Arrowhead.

>> "An Object becomes eligible for Garbage collection or GC if its not reachable from any live threads"

or additionally if it isn't reachable via static references; that's important because they are outside the scope of live threads. The roots of GC are all initialized static variables and all variables in live thread stacks. Cyclic references aren't a problem because GC in Java doesn't count references: it walks the live object graph starting from the roots and marks it, and all other objects that aren't accessible via the roots are treated as garbage and are eligible for collection.

@Anonymous, indeed garbage collection severely affects the performance of a high volume, low latency electronic trading platform if it is not tuned properly, and if major collections occur multiple times during normal trading hours then latency increases many times over, which is not at all acceptable. For such a sophisticated system you need to properly tune your garbage collection parameters and the -Xms and -Xmx parameters.

Hi, we have a stock trading system on the exchange connectivity side which connects to the exchange and sends orders for equity trading. I am working on optimizing that application; can you suggest some tool or method to optimize garbage collection for that Java application? Since it's an equity stock trading system, latency is a big concern for us.

I recently applied to an e-trading company, but I don't have much experience in garbage collection and multithreading because I was previously involved in application development related to creating web pages. Now how do I prepare for trading software?

Hi Ramya, excellent knowledge of Java and the FIX Protocol is required for getting into an electronic trading firm. In Java they focus mostly on multithreading, because all electronic trading applications are multi-threaded and driven by high volume, which can be a challenge if you are not familiar with concurrent programming. Try to prepare these topics well by reading tutorials, articles, blog posts etc. and you will surely get a decent job in the big IBs.

Thanks a lot for your reply Javin... But some of the garbage collection topics are too theoretical. They focus even on the memory model, right? I didn't know where to start from as I just know the basics of all these... Your help will be appreciated for sure :)

Hi Ramya, if you are going for an interview next week then, based on my experience, at least prepare the following topics:
1) different generations of heap space
2) how concurrent mark sweep garbage collection works
3) what major and minor collections are, and why you need to avoid major collections

Oh, thanks a lot Javin, this will surely help. I need not have trading experience to clear the interview, right? I will update myself on multithreading, synchronization, GC and the memory model. Let's see how it goes.

I was looking for answer of "What is garbage collection in Java" in simple plain english and thanks to you now I know exactly what is garbage collection in java and how garbage collection works in Java.

Thanks Anonymous. Just to answer your question "What is garbage collection in Java", in case you still have doubts: garbage collection is the process inside the Java Virtual Machine which reclaims memory from the Java heap allocated to objects which are eligible for garbage collection.

Hi Javin, Roger again. I dropped you a message on one of your posts, but it looks like it doesn't appear, so here it is again: "I am working on an electronic trading system which we are going to design for foreign exchange and currency trading. I have some questions related to the FIX Protocol and how we can use it for currency trading. Can you please help me, since I don't have any prior experience of writing an electronic trading system?" Any suggestion will be appreciated.

@Mukesh, to minimise garbage collection in Java you need to do some garbage collection tuning and optimization. By properly setting the size of the Eden space and survivor spaces you can minimize garbage collection in Java.

@Anonymous, yes, the choice of garbage collector in Java affects performance. Mostly people use the ConcurrentMarkSweep garbage collector for better performance because it both performs collection concurrently with the application threads and minimizes major collections.

I found answers to my basic questions, like what garbage collection is and why garbage collection is needed. Thanks a lot, it's indeed very easy to understand and grasp. In my opinion, one of the best garbage collection tutorials in Java.

Nice article... But I have a question regarding it. Is there any way to find out the number of live objects of a class (or can we figure out how many objects were garbage collected within a specific time interval)?

@Lets Share, there is no precise way of knowing which objects get garbage collected at which time, at least to my knowledge, though you can estimate it by overriding the finalize() method of the Object class. Since the garbage collector in Java calls finalize() before garbage collecting an object, you can log the time in the finalize method. Note that GC only calls finalize() once per object, and there might be a delay between the call to finalize() and the actual garbage collection.

@Anuj, I think the number of objects in the heap for a class can be obtained either by profiling the application or by using JConsole. Most profilers show this data. You can also take a heap dump and analyze it for objects based on counts and type.

Hi Javin, I have a doubt.You have mentioned "Objects are created on heap in Java irrespective of there scope e.g. local or member variable. while its worth noting that class variables or static members are created in method area of Java memory space"

My doubt is suppose we have a class like below. I think arrayList, i3 and i4 instances will be created on heap.

Could you please tell me where the arraylist reference variable is kept? Where are m and k created? Where is the set instance created? Where is the set reference variable kept? Where are the i3 and i4 reference variables created, the method stack? Where are the strings test1 and test2 created? Where are s and newString created?

One more doubt, Javin. You said the Perm area of the heap stores metadata of classes. Does that mean the java.lang.Class object corresponding to each type created is kept in Perm space? Then what is stored in the method area (which is not part of the heap)?

There is nothing like String.new(); it should be new String(), but I got your point. In this case the string pointed to by s is eligible for garbage collection, and the memory used by this instance is reclaimed when the garbage collector runs.

Hi Javin, we generally keep -Xms512m -Xmx1024m whenever we get a heap space problem. It means we are following a ratio of 1:2. But you said the ideal ratio is 1:1 or 1:1.5. Will my ratio of 1:2 impact anything badly on the heap space?

Hi, I read in some books that garbage collection internally follows the “Mark and Sweep” algorithm. When an object is eligible for GC, that particular object is marked before exiting that method or class level, and the marked objects are swept out.

The PermGen area also gets GCed. When PermGen reaches a certain size, it will also be GCed and space will be reclaimed from the String pool too. You can check this with the example below, which will lead to GC in the PermGen area too: for (int i = 0; i < 99999; i++) { String s = String.valueOf(i).intern(); }

Both the Serial garbage collector and the Parallel garbage collector (Throughput GC) are stop-the-world collectors, which means application threads are stopped while garbage collection happens. The only difference between the Serial and Parallel (Throughput) collectors is that in Serial both the young generation (Eden space + survivor) or minor collection and the full GC (major collection) happen serially, which introduces larger pause times, while in Parallel GC the minor and major collections happen in parallel, which results in reduced pause times. Another difference worth noting between parallel and serial GC is that the former is better suited to servers written in Java while the latter is best suited to client applications with smaller heap sizes.

On the other hand, Java now has two concurrent garbage collectors as well: 1) CMS (Concurrent Mark Sweep) and 2) the Garbage First (G1) garbage collector, which was introduced in Java 7. Both CMS and G1 are concurrent collectors as opposed to the stop-the-world Parallel GC, and most of their operations are performed in parallel with the application threads to further reduce application pause time, but that requires more overhead in terms of larger heap space because the actual freeing of space takes more time. By the way, the G1 garbage collector was introduced as a replacement for CMS, and it employs a different algorithm for garbage collection: it divides the whole space into different regions and targets the regions which have the most garbage in them. That's why it is known as Garbage First. It's important to choose the right kind of garbage collector based upon your application's needs. Concurrent Mark Sweep (CMS) or G1 are better suited to Java applications that need fast response times, like web servers.

There are always a couple of garbage collection questions in Java interviews. Following are some interview questions on garbage collection which I collected recently from my colleagues; I am looking for answers now:

1) Difference between Serial Garbage collector and Parallel Garbage Collector?
2) What is ConcurrentMarkSweep Garbage Collector? Can you explain how Concurrent Mark Sweep GC works?
3) What is Garbage collection tuning?
4) What is the difference between major collection and minor collection in GC?
5) Can we run the Garbage collector explicitly? How do you recognize a full garbage collection caused by Runtime.gc() or System.gc()?
6) What is the difference between CMS and G1 garbage collector?
7) What is Eden space in Heap?
8) Have you ever done Garbage collection tuning? What are the heap sizes you have used?
9) Have you used -XX:+UseCompressedOops in a 64 bit JVM? Why should you use it?
10) Can you extend the Garbage Collection mechanism to provide your own Garbage collector?

If anybody has answers for these garbage collection interview questions then please let me know.

Hi @Javin: This is an excellent document. Thanks a lot. I have a few questions. 1. Is there any specific condition under which only a major collection will take place? I need a little more elaboration on the difference between major and minor collection.

2. When does the JVM not go for garbage collection when it actually should have, and ultimately end up causing memory leaks? [Leaving out the cases of not closing file or DB handles]

One doubt. You have mentioned the method area, where static variables are stored. I assume the method area is different from the perm space of the heap. You have mentioned that class metadata is stored in the perm space of the heap.

In another site, http://www.artima.com/insidejvm/ed2/jvm2.html, it is said that class metadata is stored in the method area.

Hi. In my project we have recently been getting a problem with connection hits to the DB. We are closing connections properly, but somewhere it is still giving a problem; when we asked the DBA, he told us that a lot of connections are open. Please let us know what to look at. This is happening with the WAPT tool while testing the app for performance. Could you please tell me how to test my code for connection failure scenarios and possible GC issues?

@Javin... "When an object first created in heap its gets created in new generation inside Eden space and after subsequent Minor Garbage collection if object survives its gets moved to survivor 1 and then Survivor 2 before Major Garbage collection moved that object to Old or tenured generation" needs a little bit of correction. The young generation is divided into three parts: Eden, From Space (S1) and To Space (S2). Initially objects are allocated to Eden and From Space. When a minor collection occurs, live objects are moved from Eden and From Space (S1) to To Space (S2). Then the roles of S1 and S2 are swapped. Objects surviving one or more minor collections are then moved to the tenured generation.

I want to know how objects would be allocated on the heap and what the structure of the heap would be if there were no garbage collector. Further, tell me whether G1 is fully implemented or still experimental, and what the default arrangement of generations in G1 is. Thanks

1. Does GC give a guarantee that an object will be removed? 2. What happens when GC calls the finalize() method, and why does it call it? 3. And the last one: the different regions of heap space, I can't understand those regions.

gc() runs the finalization methods of any objects pending finalization. Calling this method suggests that the Java virtual machine expend effort toward running the finalize methods of objects that have been found to be discarded but whose finalize methods have not yet been run. When control returns from the method call, the virtual machine has made a best effort to complete all outstanding finalizations.

The Concurrent-Mark-Sweep collector is the most popular garbage collector in Java. The CMS collector is popular for its better throughput and shorter pause times, because for many applications end-to-end throughput is not as important as fast response time. For example, gaming applications need fast response times to make the gaming experience better; if a game hangs even for a second, it loses its charm. As you know, young generation collections do not typically cause long pauses, because of the generation's small size and the small number of live objects that survive. However, old generation collections are less predictable and can impose long pauses, especially when large heaps are involved. To address this issue, the Java HotSpot JVM includes a collector called the concurrent-mark-sweep (CMS) collector, also known as the low-latency collector.

To read more click here http://www.somanyword.com/2014/01/concurrent-mark-sweep-cms-garbage-collector-in-java/

Hi, I have one doubt. I am a beginner. I don't understand: if object A has a reference to object B and object B has a reference to object A, then how will both objects A and B be eligible for garbage collection? For example, if we take some class like

S a=new S();S b=new S();S t=b;b=a;a=t;t=null; Still a is pointing to b and b is pointing to a. Then how are a and b both eligible for garbage collection? I don't understand, please help me.