GC Utilization chart (best chart resolution to use)

Hello,

I hang my head in shame; I am like a child who has wandered into the middle of a movie asking what I missed. I am not very well versed in Garbage Collection and am trying to get a grasp on what good vs. bad GC Utilization looks like. For instance, I have an app whose GC Utilization chart, viewed at a resolution of 10s, spikes to 100% every 30 seconds and then drops back down to 0%. In my mind, this seems too frequent. Is the fact that it even goes to 100% and drops back to 0% a concern?

When viewing a GC Utilization chart, what is the best resolution to use? If I select 30 seconds, I get a nicer-looking chart where the spikes only go to 30%.

Are there any other metrics that I should add to a Garbage Collection dashboard that would benefit the GC Utilization chart?

My apologies for my lack of knowledge on this; any help would be very much appreciated and rewarded with points. Best answer gets all my points. : )

I tend to look first at the impact GC has on my application and its transactions.

Depending on the GC strategy used, frequent collections may occur; they are meant to avoid infrequent but very long-running collections that grind the application to a halt.

All PurePaths contain Suspension Time, the time that the runtime was paused during the execution of the transaction. Knowing how long the runtime was suspended during the transaction and how long the transaction took in total, we can calculate how much time the transaction would have taken if no suspension had occurred and thus the impact. This is depicted in the measures PurePath Duration (which includes suspension) and PurePath Duration w/o Suspension.
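To make that arithmetic concrete, here is a minimal sketch. The function name and the sample numbers are made up for illustration; they are not taken from any product API. It only assumes that you have a transaction's total duration and its suspension time in the same unit.

```python
def suspension_impact(duration_ms: float, suspension_ms: float) -> tuple[float, float]:
    """Return (duration without suspension, share of the duration spent suspended).

    Both inputs are hypothetical values read off a single transaction:
    the total duration (which includes suspension) and the suspension time.
    """
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    if not 0 <= suspension_ms <= duration_ms:
        raise ValueError("suspension_ms must be between 0 and duration_ms")

    # Time the transaction would have taken had the runtime never been paused.
    duration_without_suspension = duration_ms - suspension_ms
    # Fraction of the observed duration that was lost to suspension.
    suspension_share = suspension_ms / duration_ms
    return duration_without_suspension, suspension_share


# Illustrative numbers only: a 1200 ms transaction that was paused for 300 ms.
without_ms, share = suspension_impact(1200.0, 300.0)
print(f"without suspension: {without_ms:.0f} ms, suspended: {share:.0%}")
```

A transaction that spends a quarter of its time suspended is a very different story from one that spends 1%, even if the GC Utilization chart looks equally spiky in both cases.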

In the Start Center, there is a dashboard called "Analyze Garbage Collection Impact" (under Diagnose Applications) which can give you an idea of how much GC is impacting your application response times.

Similarly, the dashboard Graeme showed above depicts the GC Caused Suspension Time which has an incident associated with it once GC contribution hits 15%.

Once you have identified that you are indeed suffering from bad GC, it is time to find the cause. I tend to look at trends rather than very short time frames. The reason: if your app is in a 10s interval where not much CPU time is spent processing requests and a GC happens, GC utilization for that interval will always show a high value, because in many cases GC will use as much CPU as it can. That is also why you see GC spike to 100% at such short resolutions and then drop back to 0. So my suggestion: trend it over longer periods of time, e.g. the last 1h, and correlate it with heap size.
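The effect of resolution can be sketched with a few lines of arithmetic. This assumes the chart averages the finer-grained samples when you pick a coarser resolution, which I have not verified for this particular chart; the sample values are invented to resemble the pattern in the question (one 10s sample at 100% every 30 seconds).

```python
def downsample_mean(samples: list[float], factor: int) -> list[float]:
    """Average consecutive groups of `factor` samples into one coarser sample."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    coarse = []
    for start in range(0, len(samples), factor):
        group = samples[start:start + factor]
        coarse.append(sum(group) / len(group))
    return coarse


# Hypothetical 10s-resolution GC utilization (%) over two minutes:
# one spike to 100% every 30 seconds, 0% otherwise.
ten_second = [100.0, 0.0, 0.0] * 4

# Same data viewed at 30s resolution (3 samples per bucket).
thirty_second = downsample_mean(ten_second, 3)
print([round(v, 1) for v in thirty_second])
```

Each 30s bucket comes out at roughly a third of the 10s peak, which is in the same range as the 30% you see. The data is identical; only the averaging window changed, so neither chart is "more correct" on its own.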

There is a lot more to be said on this topic of course, mainly as to how to interpret this data and what to do next, but together with the posts that Pierrick listed, it should give you someplace to start :-).