This figure shows a typical memory histogram. In general this table does not help very much to figure out which objects really cause the high memory usage. Typically String and char[] will show up in the list of the classes with the highest shallow size, just because they are frequently used.
There is probably a correlation between the high number of char[] instances and the high number of String instances(indicated by the yellow arrows), but we cannot really know from the information in the table. We also don't know whether the Order object could potentially be the root cause for all those Objects there.

So how can we find out, who is really responsible for the memory usage?

Let's take a step back and ask ourselves how Garbage Collection works in a language like Java (and also in other "modern" languages ;)).
Garbage Collection

The general idea is that the Garbage Collector will reclaim all objects that cannot be reached from a so called "GC root object".

GC root objects are

Local variables on the stack

Parameters of functions on the stack

JNI native references

All classes loaded by the bootstrap Loader

Now, let's come back to our original problem, we can ask us the interesting question:"How much memory would be freed, if we would remove all those "Order" objects and afterwards a full GC would run?"
This would give us a measure for how much memory would be freed if those objects would not be there.
We call this measure the "retained size". The set of objects that would be freed is called the "retained set".
With the "retained set" we can figure out how many Strings would be freed if those "Order" objects would not be there. We can therefore solve the problem to find the objects "responsible" for the memory usage.
Definition "Retained size/set"

Retained set of X is the set of objects which would be collected by the GCRetained size of X is the sum of shallow sizes of all objects in the retained set of X, i.e. memory kept alive by X.
Generally speaking, shallow size of an object is its "flat" size in the heap whereas the retained size of the same object is the amount of heap memory that will be freed when the object is garbage collected.
The retained set for a leading set of objects, such as all objects of a particular class or all objects of all classes loaded by a particular class loader or simply a bunch of arbitrary objects, is the set of objects that is released if all objects of that leading set become inaccessible. The retained set includes these objects as well as all other objects only accessible through these objects. The retained size is the total heap size of all objects contained in the retained set.

How to find the object with the biggest retained size?

The Eclipse Memory Analyzer was built from the beginning with support for computing the retained set/size of objects.
The retained size was pioneered, as far as I know, by the commercialYourkit profiler(great product!). When we started to develop the Eclipse Memory Analyzer (SAP Memory Analyzer at that point in time), Yourkit already had this very nice feature to find the single object with the biggest retained size. We still don't know how Yourkit does it, but for sure at that point in time they used a different approach than we do now. The reason is that it took Yourkit several minutes to get the first 10 (I think) biggest (in terms of retained size) objects in the heap. We knew that this was already still much faster than the naive method, that would have to simulate a Garbage Collector run each of the object in the heap. With millions of objects on the heap this would have taken years!

So I went on to research, what kind of Graph algorithms could help us to speed things up. After a lot of Google searches I finally found the dominator tree.

The dominator tree

The dominator tree is a datastructure that allows you to answer the question about the biggest objects in almost no time. After further research I found that it could be build in (almost) O(n) time, by a complicated algorithm invented by Lengauer and Tarjan. Since the plan by the architect of the project, was always to store precomputed information on disk, we could compute the dominator tree once and then store it on disk. Therefore opening a heap dump for the first time is a little bit slow (but still often faster than with any other tool), but opening it again is blazing fast (just a few seconds).
Still implementing the algorithm was a high risk, because at that point in time relatively new paper "Finding Dominators in Practice" was talking about graphs with at most a few thousand nodes, whereas we wanted to run it on a few millions of nodes on a 32bit machine.

Still the team would get it done (I was always only in "consultant" role), and it turned out that the dominator tree would not only allow us to speed up the process of finding the biggest objects, but would also allow us to really significantly simplify the process of finding the root causes for high memory usage.

You could use the dzone voting button that I have in my own site. One thing I noticed is that people usually don't go back to dzone after reading the article, so if you add a dzone button it is more likely that people will vote.

Really a great article, I have started using Eclipse Memory Analyzer and it really looks cool, fast and your post provides really good information and understanding about profiling and what should one consider during application profiling.

Thanks.The reason that MAT is as fast as it is, is that it is heavily optimized.It uses for example special primitve collections similiar to http://pcj.sourceforge.net/. Certain operations (retained set for example) are also multithreaded and take advantage of multiple cores. Indexing lowers memory usage and speeds up certain operations by an order of magnitude.

No weak references are not treated specially. One could argue whether this makes sense or not, but this decision was taken when we decided that we want to see softreferences, because they caused us a lot of trouble in our applications.

We probably should offer an option to exclude at least weak references but when parsin a dump.

The issue is that we can't easily support both "modes" without slowing down parsing significantly.

Hi Markus,Thanks for the valuable inputs!I want to analyze the class where leak is happening, but my application is on HP Unix, so when it is running, I can not use some of the Windows tools as I need to take reading. Can you please suggest something which will help me.