Monday, February 16, 2009

Memory leaks are easy to find

An object A dominates an object B if all paths to object B pass through object A.

Remember that the Garbage Collector removes all objects that are not referenced anymore. If A dominates B and A could be removed from memory, that means that there's no path anymore that leads to B. B is therefore unreachable and would be reclaimed by the Garbage Collector.
One could also say that A is the single object that is responsible for B still being there!
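To make the definition concrete, here is a minimal sketch of the "dominates" check. It is not how the Memory Analyzer computes dominators; it simply removes A from a small, made-up reference graph and checks whether B can still be reached. The class name, the string object names and the `refs` map are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DominatesSketch {

    /** Objects reachable from root once 'removed' is taken out of the graph (null = remove nothing). */
    static Set<String> reachable(Map<String, List<String>> refs, String root, String removed) {
        Set<String> seen = new HashSet<>();
        if (root.equals(removed)) {
            return seen;
        }
        Deque<String> work = new ArrayDeque<>();
        seen.add(root);
        work.push(root);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String target : refs.getOrDefault(current, List.of())) {
                // Never walk into the removed object, so every path through it is cut.
                if (!target.equals(removed) && seen.add(target)) {
                    work.push(target);
                }
            }
        }
        return seen;
    }

    /** A dominates B if B is reachable now, but not once A is gone. */
    static boolean dominates(Map<String, List<String>> refs, String root, String a, String b) {
        return reachable(refs, root, null).contains(b)
                && !reachable(refs, root, a).contains(b);
    }

    public static void main(String[] args) {
        // Hypothetical heap: root -> A, root -> C, A -> B, A -> C
        Map<String, List<String>> refs = Map.of(
                "root", List.of("A", "C"),
                "A", List.of("B", "C"));
        System.out.println(dominates(refs, "root", "A", "B")); // B is only reachable via A
        System.out.println(dominates(refs, "root", "A", "C")); // C is also held by root
    }
}
```

In this toy graph A dominates B but not C, because root still holds C after A is removed.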

The Dominator Tree

Using the "dominates" relationship we can create a dominator tree out of the graph of objects in memory. At each node of this tree we store the amount of memory that would be freed if that object were removed (= retained size).
At the top of the tree we have a "virtual" root object, which we also use to represent objects that don't have a "real" single dominator.
Here's an example of an object graph (on the left) and the corresponding dominator tree (on the right):

Because "dominates" is transitive, the retained size of a parent object within the dominator tree is always greater than the sum of the retained sizes of its child objects.

To get the biggest objects we just need to sort the second level of the dominator tree (the first level is the "virtual" root object) by retained size.
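The retained sizes and the sorting step described above can be sketched on a toy heap. This is a naive illustration under stated assumptions, not the algorithm the Memory Analyzer uses: the object names, the shallow sizes and the single "root" node (standing in for the "virtual" root) are all made up, and the dominators are found by brute force, which would be far too slow for a real heap dump.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DominatorTreeSketch {

    /** Objects reachable from root once 'removed' is taken out of the graph (null = remove nothing). */
    static Set<String> reachable(Map<String, List<String>> refs, String root, String removed) {
        Set<String> seen = new HashSet<>();
        if (root.equals(removed)) return seen;
        Deque<String> work = new ArrayDeque<>();
        seen.add(root);
        work.push(root);
        while (!work.isEmpty()) {
            for (String target : refs.getOrDefault(work.pop(), List.of())) {
                if (!target.equals(removed) && seen.add(target)) work.push(target);
            }
        }
        return seen;
    }

    /** Parent of every non-root object in the dominator tree (brute force). */
    static Map<String, String> immediateDominators(Map<String, List<String>> refs, String root) {
        Set<String> all = reachable(refs, root, null);
        // dominators.get(n) = every object whose removal makes n unreachable (n itself included)
        Map<String, Set<String>> dominators = new HashMap<>();
        for (String n : all) dominators.put(n, new HashSet<>());
        for (String d : all) {
            Set<String> stillReachable = reachable(refs, root, d);
            for (String n : all) {
                if (!stillReachable.contains(n)) dominators.get(n).add(d);
            }
        }
        Map<String, String> idom = new HashMap<>();
        for (String n : all) {
            if (n.equals(root)) continue;
            String best = null;
            // The dominators of n form a chain; the closest one has the most dominators itself.
            for (String d : dominators.get(n)) {
                if (d.equals(n)) continue;
                if (best == null || dominators.get(d).size() > dominators.get(best).size()) best = d;
            }
            idom.put(n, best);
        }
        return idom;
    }

    /** Retained size = own shallow size plus the shallow size of everything the object dominates. */
    static Map<String, Long> retainedSizes(Map<String, String> idom, Map<String, Long> shallow) {
        Map<String, Long> retained = new HashMap<>();
        for (String n : idom.keySet()) {
            long size = shallow.getOrDefault(n, 0L);
            // Credit n's shallow size to n and to every ancestor in the dominator tree.
            for (String x = n; x != null; x = idom.get(x)) retained.merge(x, size, Long::sum);
        }
        return retained;
    }

    /** Second level of the dominator tree, biggest retained size first. */
    static List<String> biggestObjects(Map<String, String> idom, Map<String, Long> retained, String root) {
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, String> e : idom.entrySet()) {
            if (e.getValue().equals(root)) top.add(e.getKey());
        }
        top.sort((x, y) -> Long.compare(retained.get(y), retained.get(x)));
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical heap: a cache holding two entries that share one big blob, and a session.
        Map<String, List<String>> refs = Map.of(
                "root", List.of("cache", "session"),
                "cache", List.of("e1", "e2"),
                "e1", List.of("blob"),
                "e2", List.of("blob"),
                "session", List.of("user"));
        Map<String, Long> shallow = Map.of(
                "cache", 100L, "e1", 50L, "e2", 50L, "blob", 1000L, "session", 200L, "user", 300L);
        Map<String, String> idom = immediateDominators(refs, "root");
        Map<String, Long> retained = retainedSizes(idom, shallow);
        // cache retains 100 + 50 + 50 + 1000 = 1200 bytes, session retains 200 + 300 = 500 bytes
        for (String name : biggestObjects(idom, retained, "root")) {
            System.out.println(name + " retains " + retained.get(name) + " bytes");
        }
    }
}
```

Note that the blob is dominated by the cache and not by either entry: removing only one entry still leaves a path through the other one.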

Now if you are looking to find a memory leak, and you have no a priori knowledge that could help you, the typical approach is to run a test that reproduces the leak, and then try to somehow figure out what is leaking.

Do we really need object allocation tracing?
In my experience, people often seem to believe that finding leaks requires recording object creations, sometimes also called "object allocation tracing", because you want to know where in the code objects are always allocated but never released.
Tess Ferrandez, ASP.NET Escalation Engineer at Microsoft, has an example of how this method for finding leaks can be applied to .NET applications. Dear Microsoft, I encourage you to look at the Eclipse Memory Analyzer ;)

All you need is the dominator tree

With the dominator tree, finding this kind of leak is much easier, and you don't need the high overhead of allocation tracing, which is typically not acceptable in a production environment. Allocation tracing has its uses, but in general IMHO it is overrated.

In my experience, people always assume that OutOfMemory errors in production are the result of a leak, whereas a significant number are due to memory capacity issues with high workload concurrency and deep/prolonged call chains. For this you need to have already obtained object allocation counts and sizes per activity. Naturally, this data, just like complete heap dumps, should not be obtained in production but during testing, for the purpose of capacity planning.

Agreed, allocation traces are generally not that useful in the first steps of locating leaks. I have found that a much more effective technique is to use generational counts. From there, execution traces are much more useful in narrowing down the problem. This technique is so predictable that I offer a fixed price for all of my memory leak engagements.

Hi Nick, yes, finding bugs that only show up under realistic load on the system is always difficult. Memory usage/leaks often fall into this category. Also, IMHO much more could be done to avoid "simple" mistakes, e.g. people should be doing more (JUnit) performance testing.

Still, what you can do is run a load test for a while and monitor whether there's a systematic increase in memory usage until the load test is finished.

If that is the case, you might want to trigger (potentially automatically) a heap dump that can be analyzed by the Eclipse Memory Analyzer, which can produce a report of potential problems.
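As a rough sketch of what "monitor and trigger automatically" could look like on a HotSpot JVM: sample the used heap while the load test runs and write an .hprof file if usage keeps climbing. The class name, the `looksLikeLeak` heuristic, the 50 MB threshold, the sampling interval and the file name are assumptions chosen for illustration; `HotSpotDiagnosticMXBean` is HotSpot-specific and may not be available on other JVMs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Arrays;

final class LeakWatchSketch {

    /** Used heap in bytes after asking for a GC (System.gc() is only a hint to the JVM). */
    static long usedHeapAfterGc() {
        System.gc();
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Hypothetical heuristic: the lowest late sample exceeds the highest early sample by minGrowthBytes. */
    static boolean looksLikeLeak(long[] samples, long minGrowthBytes) {
        if (samples.length < 4) {
            return false; // too few samples to say anything
        }
        int half = samples.length / 2;
        long earlyMax = Arrays.stream(samples, 0, half).max().getAsLong();
        long lateMin = Arrays.stream(samples, half, samples.length).min().getAsLong();
        return lateMin - earlyMax >= minGrowthBytes;
    }

    /** Writes a heap dump of live objects; the target file must not exist yet. */
    static void dumpHeap(String file) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file, true);
    }

    public static void main(String[] args) throws Exception {
        long[] samples = new long[10];
        for (int i = 0; i < samples.length; i++) {
            // The load test itself would be running in other threads while we sample.
            samples[i] = usedHeapAfterGc();
            Thread.sleep(1000);
        }
        if (looksLikeLeak(samples, 50L * 1024 * 1024)) {
            dumpHeap("leak-suspect-" + System.currentTimeMillis() + ".hprof");
        }
    }
}
```

Comparing the two halves of the run rather than neighbouring samples is a deliberate simplification: it ignores the normal sawtooth of allocation and collection and only reacts to a baseline that keeps rising.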

This is the short answer and a longer answer would probably need to be another blog post :)

I should state that I was referring to the relatively cheap mechanism of simply measuring object allocation counts and sizes during the execution interval of a method, which possibly includes nested non-instrumented method calls.

I was not referring to the backtrac(k)ing of object allocation call sites as this is horrendously expensive even on a local developer workstation and just creates so much noise (strings, maps) within a profile model.

You typically only need to know the memory cost of particular entry points in an enterprise application, which can of course include a lot of transient allocation that might even be garbage collected before the execution finishes.
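One way such a cheap per-entry-point measurement might look on a HotSpot JVM is sketched below. It uses the per-thread allocation counter exposed by `com.sun.management.ThreadMXBean`, which reports bytes only, not object counts, so it covers just part of what is described above. The class and method names are made up for illustration, and the counter can return -1 if the JVM does not support or has disabled this measurement.

```java
import com.sun.management.ThreadMXBean;
import java.lang.management.ManagementFactory;

final class AllocationMeterSketch {

    // HotSpot-specific extension of the standard ThreadMXBean.
    private static final ThreadMXBean THREADS =
            (ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Bytes allocated by the current thread while 'work' runs, nested calls included. */
    static long allocatedBytesDuring(Runnable work) {
        long threadId = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(threadId);
        work.run();
        return THREADS.getThreadAllocatedBytes(threadId) - before;
    }

    public static void main(String[] args) {
        byte[][] buffers = new byte[100][];
        long bytes = allocatedBytesDuring(() -> {
            // Stand-in for an entry point of the application.
            for (int i = 0; i < buffers.length; i++) {
                buffers[i] = new byte[10_000];
            }
        });
        System.out.println("allocated roughly " + bytes + " bytes");
    }
}
```

Because the counter only ever grows, transient objects that are collected before the method returns are still included, which is what you want when estimating the memory cost of an entry point. Work done on other threads is not counted.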

That is pretty difficult at the moment, because you won't see some Bitmaps, for example. If your problem has to do with bitmaps, then you might try to run your application on Android 3.0 or 3.1. IIRC bitmaps are allocated on the heap since 3.0.
Greetings, Markus

I have been using MAT for the last few years now, and I can assure you it has helped me either resolve Java heap memory leaks or better understand the memory footprint of the production environment.

In my experience tuning Java EE production environments, I have seen about a 50/50 split regarding the source of OOM errors: half are related to a true application or Java EE container/API memory leak, and the other half are capacity or tuning related. A small portion is also related to PermGen or other native memory problems.

I have written an article recently on Java Heap Dumps along with recommendations so please feel free to review and provide your comments.

Looking forward to new posts from you in 2012, including some potential new features of MAT.

A memory leak is an object that the system unintentionally hangs on to, thereby making it impossible for the garbage collector to remove this object. The way that profilers find memory leaks is to trace references to a leaked object.