How to profile

perf offers two major modes: record then report, or real-time "top" mode. Both are useful in different situations.

TIP: On many systems where -g is shown below, you may have to write --call-graph dwarf instead to get working stack traces.

Record then report

Recording data

To profile the system for a specific length of time, for example 60 seconds:

perf record -a -g -s sleep 60

To profile the system for the duration of a command:

perf record -a -g -s -- pg_ctl -D pgdata start

To profile the system until cancelled, just omit any command:

perf record -a -g

By default perf record will generate a perf.data file in the current working directory.

See "what to profile" for ways to narrow what you record.

Reporting

By default perf report will look for perf.data in the current working directory.

perf report -n

You can include call-graph output, if captured, with:

perf report -g

and a variety of options are available to sort, filter and aggregate the output; see the documentation for perf.

In real time

To profile system-wide in real time:

perf top

To profile in real time without accumulating stats, i.e. instantaneous profiling:

perf top -z

You can use the usual target-selection options with perf top, but it's most commonly used for system wide profiling with perf top -a.

Troubleshooting

Many systems will produce incomplete call-graphs with only the top stack entry when -g is used, if the PostgreSQL binaries are built with -fomit-frame-pointer. That's the default on x64. To work around this, you may be able to use --call-graph dwarf on a new enough perf with libunwind support. Otherwise you'll need to recompile PostgreSQL with -fno-omit-frame-pointer.

What to profile

perf lets you choose what to profile in a variety of ways, the most useful of which are:

Including user-space stacks

Newer perf releases support capturing the whole user-space call stack, not just the top function on the stack. This lets it provide extremely useful call graphs. Use the --call-graph dwarf option to perf record to enable this, and the -g option to perf report to display the results.

On x64 hosts, you must either rebuild PostgreSQL with ./configure CFLAGS="-fno-omit-frame-pointer -ggdb" ... or use a version of perf built with libunwind support.

If you don't see these in perf list, see static probe points (gdb) and perf-buildid-cache. The exact steps required to enable probe points vary between perf versions. If you see mention of perf sdt-cache you're reading something outdated.

Also: Internally PostgreSQL generates these using systemtap on Linux, so you must have had systemtap installed at build time and passed --enable-dtrace to configure to get these tracepoints. They're enabled by default in the PGDG binary builds.

You must initialize sdt events with perf probe before using them. perf probe may report multiple event names, as in the example below.

Adding new trace points

It's trivial to add new tracepoints in your own builds.

All you have to do is edit src/backend/utils/probes.d to add the tracepoint name, then add macros to call it where you want it to fire. The tracepoint name in probes.d must be lower case, have all underscores doubled, and lacks a leading TRACE_POSTGRESQL_. The tracepoint invocations are upper case, with single underscores, and lead with TRACE_POSTGRESQL_. For example, probes.d has

probe transaction__start(LocalTransactionId);

and it's called in StartTransaction in src/backend/access/transam/xact.c as

TRACE_POSTGRESQL_TRANSACTION_START(vxid.localTransactionId);

Note that you cannot use typedefs provided by the PostgreSQL headers; the probes.d file lists its own typedefs.

If you think it's generally useful, submit a patch to the pgsql-hackers mailing list. A tracepoint is probably useful if you cannot trivially achieve the same thing with a dynamic userspace probe and it touches on something that's likely to be relevant to a broader user base. Even probes that are easily done with dynamic tracepoints are useful if they can serve as documentation for how to find and trace some event many people will be interested in.

Dynamic tracepoints

Often there isn't a SDT event for just what you want. But that's OK. If you have debug symbols and a vaguely recent perf (tested: Linux 4.4) you can inject dynamic tracepoints where you want them, and even capture function arguments!

e.g. if you want to see the rate at which queries are executed you might inject a tracepoint into standard_ExecutorStart:

This will then be available as the perf event probe_postgres:standard_ExecutorStart for use in perf record etc.

Probing extensions

Extensions are valid probe targets too. Just specify them as the executable, e.g. to see the lsn of every 100th change processed by pglogical's logical decoding output plugin as it examines it, you could:

Less common reports

Benchmarking and statistics

If you're interested in particular tracepoints or statistics, the perf stat command is also useful. For example, the following will count fsync and write calls, and block device activity, during a command. It will run the command five times and summarize the results.

Availability of perf

It may be built and installed from the linux source code in the tools/perf subdirectory. You must use a perf that matches your running kernel version.

Advanced: Viewing and capturing function arguments

Lets say you want to collect information on the kinds of queries that are run. You can capture the CmdType operation field of the QueryDesc passed to standard_ExecutorStart with perf!. No patching postgres, no gdb.

The overhead of perf and size of traces is greatly increased by capturing arguments in probes, so only do it if you need to.

perf report doesn't seem to do anything with captured tracepoint arguments right now, but perf script can show them, and you can filter the results through its perl/python interpreters (if enabled) or via a pipe to another tool/script.

e.g. if you want to see every planned query run in realtime, you might set up the tracepoint above, then

Of course this is a bit of a silly example since you'd just enable query logging, and it ignores unplanned queries (utility statements) but it illustrates the methods required for capturing data with probes.