Basic thread profiling methodology

The underlying thread profiling technique we'll be looking at here
involves periodically "polling" the JVM to ask it for the current point of
execution of all currently alive threads. This is a kind of "passive"
profiling technique in which we don't need to modify the code that we are
profiling. (A technique that we won't be considering in this article
is to "actively" inject code
at key points for the purpose of profiling [1].) The basic
thread profiling procedure is thus as follows:

make a call asking the JVM to give us the current stack traces of all threads;

if we desire, ask the JVM which of these threads have actually consumed CPU
(and how much);

from these stack traces, increment our counts/profiling data;

wait for some (generally short) period of time, then repeat the procedure.
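The loop above can be sketched in a few lines of Java. This is only a minimal illustration, not a full profiler: the class name SamplingProfilerSketch and its sample() method are hypothetical names chosen for this example, the optional CPU query is left out, and we count only the method at the top of each stack. The one JVM call it relies on is the standard Thread.getAllStackTraces().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name: a sketch of the polling loop, not a production profiler.
class SamplingProfilerSketch {

    /**
     * Takes the given number of samples, sleeping intervalMillis between them,
     * and returns how many times each "class.method" was seen at the top of a stack.
     */
    static Map<String, Integer> sample(int samples, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            // Ask the JVM for the current stack traces of all live threads.
            Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
            for (Map.Entry<Thread, StackTraceElement[]> e : traces.entrySet()) {
                StackTraceElement[] stack = e.getValue();
                // Skip the profiling thread itself and threads that report no frames.
                if (e.getKey() == Thread.currentThread() || stack.length == 0) {
                    continue;
                }
                // Increment the count for the method currently being executed.
                String location = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(location, 1, Integer::sum);
            }
            // Wait for a short period, then repeat.
            Thread.sleep(intervalMillis);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // 20 samples at 50 ms intervals, i.e. roughly one second of profiling.
        sample(20, 50).forEach((where, n) -> System.out.println(n + "\t" + where));
    }
}
```

In a real application we would normally run this loop in its own background thread rather than in main().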

It's clear from this description that profiling isn't "free". Asking the JVM
to take the stack trace snapshots, in addition to any processing we then do, has
an impact on the performance of the application being profiled.

It's also clear that this is essentially a sampling approach.
We won't be able to tell with complete accuracy how long a thread spends in a particular
place. For example, say we repeat the above process 20 times a second. When we are
told that a thread is in a particular place and that place was different from the last
time we sampled, that means that it has been there for some period between 0 and 1/20 of a second.
(And even if it's the same place, we don't know that the thread actually stayed there for
the whole time between the two instances when we sampled.)
So we have to make some kind of best guess. Generally, the lowest-effort option (and one
that is usually good enough) is to simply assume that the method was running for the
full 1/20 of a second (or whatever the sample period is). The more frequently we sample, the
more accurate our profile will be, but the more impact our profiling routine will have
on the server or application that we're trying to profile.
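As a rough illustration of this "best guess", the sketch below converts per-method sample counts into estimated times by crediting each sample with one full sample period. The class name SampleTimeEstimate and the method names parseRequest and writeResponse are hypothetical, invented purely for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns sample counts into estimated running times.
class SampleTimeEstimate {

    /**
     * Assumes that every sample in which a method was seen accounts for one
     * full sample period, so estimated time = sample count * sample period.
     */
    static Map<String, Long> estimateMillis(Map<String, Integer> sampleCounts,
                                            long samplePeriodMillis) {
        Map<String, Long> result = new LinkedHashMap<>();
        sampleCounts.forEach((method, count) ->
                result.put(method, count * samplePeriodMillis));
        return result;
    }

    public static void main(String[] args) {
        // Sampling 20 times a second gives a sample period of 50 ms.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("parseRequest", 30);   // made-up counts for illustration
        counts.put("writeResponse", 10);
        estimateMillis(counts, 50).forEach((method, millis) ->
                System.out.println(method + ": ~" + millis + " ms"));
    }
}
```

With these made-up counts, a method seen in 30 samples at a 50 ms period is credited with about 1500 ms. The true figure could be somewhat higher or lower, for the reasons given above.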

Many systems can also provide us with information on how much CPU time (in
user and kernel mode) a given thread has consumed. But even if we request this information,
we are usually left with the same problem: if a thread has used X nanoseconds of CPU
time since the last sample, we still don't know whether all of that time was spent on the
line or method now being executed.
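In Java, one way to obtain this per-thread CPU figure is the standard ThreadMXBean interface. The following is a minimal sketch, assuming the JVM supports and has enabled thread CPU time measurement; the class name CpuDeltaSketch and its method are hypothetical. It reports how many nanoseconds of CPU each live thread has consumed since the previous call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name: tracks CPU time consumed per thread between samples.
class CpuDeltaSketch {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    // Last CPU total seen per thread ID. Entries for dead threads are never
    // removed in this sketch; a real profiler would prune them.
    private final Map<Long, Long> lastCpuNanos = new HashMap<>();

    /**
     * Returns, per thread ID, the CPU nanoseconds consumed since the previous
     * call (or since the thread started, the first time we see it).
     */
    Map<Long, Long> cpuSinceLastSample() {
        Map<Long, Long> deltas = new HashMap<>();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return deltas; // CPU timing is not available on this JVM
        }
        for (long id : threads.getAllThreadIds()) {
            // Total CPU time (user plus kernel mode); -1 if the thread has died.
            long total = threads.getThreadCpuTime(id);
            if (total < 0) {
                continue;
            }
            Long previous = lastCpuNanos.put(id, total);
            deltas.put(id, previous == null ? total : total - previous);
        }
        return deltas;
    }

    public static void main(String[] args) throws InterruptedException {
        CpuDeltaSketch sampler = new CpuDeltaSketch();
        sampler.cpuSinceLastSample(); // establish a baseline
        Thread.sleep(200);
        sampler.cpuSinceLastSample().forEach((id, nanos) ->
                System.out.println("thread " + id + ": " + nanos + " ns"));
    }
}
```

Note that the delta tells us only how much CPU a thread used between two samples, not which methods used it. ThreadMXBean also offers getThreadUserTime() if we want the user-mode portion separately.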

What this generally boils down to is:

we need to make sure that, over the time period of the function or
application we're profiling, we take a "reasonable number" of samples;

the shorter the task we're profiling, the more frequently we need to
sample (or the less accurate our results will be).

For profiling a long-running server that executes similar functions over and over,
this is actually quite good news. We can generally sample fairly infrequently
(say, between 1 and 5 times a second) and simply leave things running for a few
hours to get enough samples. This will give us a good general picture of where
the server is spending its time.

1. For example, at class loading we could inject
code on entry to and exit from methods we were interested in, in order to measure
how often threads were inside that method. This technique is possible in Java,
although more fiddly. Where the information sought is not very granular,
this technique can minimise the impact of profiling, but it is generally less
flexible than the technique we'll discuss here.