pg_test_timing

Name

pg_test_timing -- measure timing overhead

Synopsis

pg_test_timing [option...]

Description

pg_test_timing is a tool to
measure the timing overhead on your system and confirm that the
system time never moves backwards. Systems that are slow to
collect timing data can give less accurate EXPLAIN ANALYZE results.

Options

pg_test_timing accepts the
following command-line options:

-d duration, --duration=duration

Specifies the test duration, in seconds. Longer durations
give slightly better accuracy, and are more likely to
discover problems with the system clock moving backwards.
The default test duration is 3 seconds. An example
invocation appears after this list.

-V, --version

Print the pg_test_timing version and exit.

-?, --help

Show help about pg_test_timing command line arguments,
and exit.
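For example, a longer 30-second run, which improves the odds
of catching a clock that occasionally moves backwards, might
be invoked as:

pg_test_timing --duration=30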

Usage

Interpreting results

Good results will show that most (>90%) individual timing
calls take less than one microsecond. Average per-loop overhead
will be even lower, below 100 nanoseconds. This example from an
Intel i7-860 system using a TSC clock source shows excellent
performance; the output below is representative of such a run,
and the exact numbers will vary from system to system:
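pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 35.96 ns
Histogram of timing durations:
  < us   % of total      count
     1     96.40465   80435604
     2      3.59518    2999652
     4      0.00015        126
     8      0.00002         13
    16      0.00000          2
    32      0.00000          1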

Note that different units are used for the per-loop time
than for the histogram. The loop average can have resolution
within a few nanoseconds (nsec), while the individual timing
calls can only resolve down to one microsecond (usec).

Measuring executor timing overhead

When the query executor runs a statement under
EXPLAIN ANALYZE, individual operations are timed
in addition to the summary being shown. The overhead of your
system can be checked by counting rows with the psql program,
as in the session sketched below (the table name t is an
arbitrary choice for this demonstration):
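CREATE TABLE t AS SELECT * FROM generate_series(1, 100000);
\timing
SELECT COUNT(*) FROM t;
EXPLAIN ANALYZE SELECT COUNT(*) FROM t;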

On the i7-860 system measured, the plain count query runs in
9.8 ms while the EXPLAIN ANALYZE version
takes 16.6 ms, each processing just over 100,000 rows. That 6.8
ms difference works out to 68 ns of timing overhead per row,
about twice what pg_test_timing estimated it would be. Even that
relatively small amount of overhead makes the fully timed count
statement take almost 70% longer. On more substantial queries,
the timing overhead would be less problematic.

Changing time sources

On some newer Linux systems, it's possible to change the
clock source used to collect timing data at any time. A second
example shows the slowdown possible from switching to the
slower acpi_pm time source, on the same system used for the
fast results above. The commands below sketch such a switch
(the sysfs paths are standard on Linux; the available sources
and the per-loop time shown are illustrative):
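# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
# pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 722.92 ns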

In this configuration, the sample EXPLAIN ANALYZE above takes
115.9 ms. That works out to (115.9 ms - 9.8 ms) / 100,000 rows,
or about 1061 nsec of timing overhead per row, again a small
multiple of what's measured directly by this utility. That much
timing overhead means the actual query itself is only taking a
tiny fraction of the accounted-for time; most of it is being
consumed by overhead instead. In this configuration, any
EXPLAIN ANALYZE totals involving many timed operations would be
inflated significantly by timing overhead.

FreeBSD also allows changing the time source on the fly, and
it logs information about the timer selected during boot:

dmesg | grep "Timecounter"
sysctl kern.timecounter.hardware=TSC
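On a typical FreeBSD system, the dmesg command above prints one
line per available timer, and the sysctl confirms the change.
The lines below are illustrative; hardware names, frequencies,
and quality ratings vary by machine:

Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
Timecounter "TSC" frequency 2531787134 Hz quality 800
kern.timecounter.hardware: ACPI-fast -> TSC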

Other systems may only allow setting the time source at
boot. On older Linux systems the "clock" kernel setting is the
only way to make this sort of change. And even on some more
recent ones, the only option you'll see for a clock source is
"jiffies". Jiffies are the older Linux software clock
implementation, which can have good resolution when it's backed
by fast enough timing hardware, as in this illustrative example
from an older kernel (the log messages and numbers vary by
system):
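$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
jiffies
$ dmesg | grep time.c
time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
time.c: Detected 2400.153 MHz processor.
$ pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 97.75 ns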

Clock hardware and timing accuracy

Collecting accurate timing information is normally done on
computers using hardware clocks with various levels of
accuracy. With some hardware, the operating system can pass the
system clock time almost directly to programs. A system clock
can also be derived from a chip that simply provides timing
interrupts: periodic ticks at some known time interval. In
either case, operating system kernels provide a clock source
that hides these details. But the accuracy of that clock source
and how quickly it can return results vary based on the
underlying hardware.

Inaccurate timekeeping can result in system instability.
Test any change to the clock source very carefully. Operating
system defaults are sometimes chosen to favor reliability over
best accuracy. And if you are using a virtual machine, look
into the recommended time sources compatible with it. Virtual
hardware faces additional difficulties when emulating timers,
and vendors often suggest per-operating-system settings.

The Time Stamp Counter (TSC) clock source is the most
accurate one available on current generation CPUs. It's the
preferred way to track the system time when it's supported by
the operating system and the TSC clock is reliable. There are
several ways that TSC can fail to provide an accurate timing
source, making it unreliable. Older systems can have a TSC
clock that varies based on the CPU temperature, making it
unusable for timing. Trying to use TSC on some older multicore
CPUs can give a reported time that's inconsistent among
multiple cores. This can result in the time going backwards, a
problem this program checks for. And even the newest systems
can fail to provide accurate TSC timing with very aggressive
power saving configurations.

Newer operating systems may check for the known TSC problems
and switch to a slower, more stable clock source when they are
seen. If your system supports TSC time but doesn't default to
that, it may be disabled for a good reason. And some operating
systems may not detect all the possible problems correctly, or
will allow using TSC even in situations where it's known to be
inaccurate.
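On Linux, a quick way to check which clock source the kernel
actually chose, and whether it demoted TSC, is to inspect the
current clock source and search the boot log. The paths and
commands below are Linux-specific, and the exact log wording
varies by kernel version:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
dmesg | grep -i tsc

A kernel message along the lines of "Marking TSC unstable"
suggests the kernel detected one of the problems described
above and fell back to another source.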

The High Precision Event Timer (HPET) is the preferred timer
on systems where it's available and TSC is not accurate. The
timer chip itself is programmable to allow up to 100 nanosecond
resolution, but you may not see that much accuracy in your
system clock.

Advanced Configuration and Power Interface (ACPI) provides a
Power Management (PM) Timer, which Linux refers to as the
acpi_pm. The clock derived from acpi_pm will at best provide
300 nanosecond resolution.