Using OProfile

Sysprof is great, and the tree view UI makes it quick and easy to narrow down performance issues. The call stack information is invaluable, and being able to quickly sort the times is awesome.

OProfile, like sysprof, is a kernel-level profiler, and I’ve been using it while working on cairo. While it lacks sysprof’s UI, the command line interface can be handy, and the resulting data is very raw and believable.

You basically use it like this:

# Clear the data from the last run first
opcontrol --reset
opcontrol --start
# Run your benchmark here
opcontrol --stop

The opcontrol commands must be run as root, but the benchmark can run anywhere, as an ordinary user, because OProfile gathers data for the entire system.
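The reset/start/stop sequence is easy to wrap in a small shell function so that sampling only covers the benchmark itself. This is a minimal sketch, not something OProfile ships: the name profile_run and the OPCONTROL override variable are made up for illustration, and it assumes it is called as root with the opcontrol tool installed.

```shell
#!/bin/sh
# Hypothetical helper: profile a single command with opcontrol.
# OPCONTROL is an assumed override so the wrapper can be dry-run
# without OProfile installed (for example OPCONTROL=echo).
OPCONTROL="${OPCONTROL:-opcontrol}"

profile_run() {
    # Clear the data from the last run, then start sampling.
    "$OPCONTROL" --reset || return 1
    "$OPCONTROL" --start || return 1

    # Run the benchmark and remember its exit status.
    "$@"
    status=$?

    # Always stop sampling, even if the benchmark failed.
    "$OPCONTROL" --stop
    return "$status"
}
```

With this loaded, something like "profile_run ./text-benchmark" (a placeholder name) would reset, start, run, and stop in one go. Note that the benchmark then runs as root too, which running the commands separately avoids.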

There are four commands I found useful for gathering information from OProfile. First, it can break down the samples by binary:

All of the data here is from a simple text-measuring benchmark under pango 1.10.1. I had to build pango without inlining to show which functions were really doing the work. This slows the benchmark down, which makes it more difficult to evaluate changes, so use the non-inlined results only as a guide.

Steps to enlightenment:

1. Run your benchmark a bunch of times, record timing information
2. Compile without inlining (I used “-Wall -g -Os -fno-inline”)
3. Run your benchmark and gather profiler output
4. Make some performance-enhancing modifications and test them
5. Run your benchmark and gather profiler output, estimate speedup
6. Recompile with normal CFLAGS
7. Run your benchmark a bunch of times, record timing information, compare with step 1
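The "run it a bunch of times and record timing information" steps can be scripted. The following is only a sketch under assumptions: the helper names time_runs, mean, and speedup are invented for illustration, and it relies on GNU date (for the %N nanosecond format) and awk being available.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print each wall-clock
# time in seconds, one per line. Assumes GNU date (%N) and awk.
time_runs() {
    runs=$1; shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
        i=$((i + 1))
    done
}

# Average a column of timings read from stdin.
mean() {
    awk '{ sum += $1 } END { if (NR) printf "%.3f\n", sum / NR }'
}

# Ratio of "before" to "after": values above 1 mean the change is faster.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}
```

For example, "time_runs 10 ./text-benchmark | mean" (again a placeholder benchmark name) gives an average to record in the first and last steps, and passing the two averages to speedup gives the comparison. Wall-clock averages are noisy, so treat small differences with suspicion.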