Hi...
On Tue, Mar 2, 2010 at 3:08 PM, Warren Harris <warrensomebody@gmail.com> wrote:
>
> Peter - gprof with ocaml works quite well:
> http://caml.inria.fr/pub/docs/manual-ocaml/manual031.html
I'm fully aware of gprof and ocaml's support of profiling.
OCaml's profiling support works by adding calls to the _mcount library
function at the entry point to every compiled function, which takes
approximately 10 instructions on x86 (pushes and pops to save
registers, and a call instruction). The _mcount function records
function call counts, and is also responsible for producing the call
graph. Separately, the profile library samples the program counter at
some frequency, which lets us work out in which functions the program
is spending its time.
Using OCaml's profiling support has three problems:
1) programs compiled with profiling are slower, and
2) the profiling instrumentation itself distorts the resulting profile, and
3) the call graph accounting is inaccurate.
Let's discuss each of these in turn:
Problem (1) is simply that your program has extra overhead from all of
those _mcount calls, which occur on every function invocation. You
can't turn them off, and you can't make them happen less frequently.
It's an all-or-nothing proposition. It would be unusual to include
profiling instrumentation in a production system.
Problem (2) is a little more subtle. Recall that the profiling
instrumentation adds ~10 instructions to the start of each function,
regardless of its size. For a large function, this may be a negligible
overhead. For a small function, say one that was only 5 or 10
instructions in size to begin with, that is a substantial overhead.
Since we determine how much time is spent in each function by sampling
the program counter, small and frequently called functions will appear
to take relatively longer than larger functions in the resulting
profile. Small functions are common in OCaml code so we should see an
appreciable amount of distortion.
Problem (3) is a criticism of the _mcount mechanism in general. For
each function f(), the profiler knows (a) how long we spent executing
f() in total, and (b) how many times each of f()'s callers invoked
f(). We do not know how much time f() spent executing on behalf of any
given caller. If we assume that all of f()'s invocations took
approximately the same amount of time, then we can use the caller
counts to approximate the time spent executing f() on behalf of each
caller. However, the assumption that f() always takes approximately
the same amount of time is not necessarily a good one. I think it's an
especially bad assumption in a functional program.
These problems are avoided by using a sampling profiler like oprofile
or shark, which samples an _uninstrumented_ binary at a particular
frequency. Because the binary is unmodified, we can turn profiling on
and off on a running system, avoiding point (1); furthermore we can
adjust the sampling rate so profiling overhead is low enough to be
tolerable. Since there is no instrumentation added to the program, the
resulting profile does not suffer from the distortion of point (2).
Some profilers (e.g. shark on Mac OS X) can deal with point (3) as
well --- all we need to do is record a complete stack trace at
sampling time.
My point was that oprofile or one of its cousins (e.g. shark) is
probably adequate for your needs. You can set the sampling rate low
enough that your service can run more or less as normal. To determine
GC overhead, you simply need to look at the total amount of time spent
in the various GC functions of the runtime.
Peter