21.2.3.1 Performance Schema Event Timing

Events are collected by means of instrumentation added to the
server source code. Instruments time events, which is how the
Performance Schema provides an idea of how long events take.
It is also possible to configure instruments not to collect
timing information. This section discusses the available
timers and their characteristics, and how timing values are
represented in events.

The TIMER_NAME column shows the names of
the available timers. CYCLE refers to the
timer that is based on the CPU (processor) cycle counter. If
the values associated with a given timer name are
NULL, that timer is not supported on your
platform. The rows that do not have NULL
indicate which timers you can use in
setup_timers.

TIMER_FREQUENCY indicates the number of
timer units per second. For a cycle timer, the frequency is
generally related to the CPU speed. The value shown was
obtained on a system with a 2.4GHz processor. The other timers
are based on fixed fractions of seconds. For
TICK, the frequency may vary by platform
(for example, some use 100 ticks/second, others 1000
ticks/second).

TIMER_RESOLUTION indicates the number of
timer units by which timer values increase at a time. If a
timer has a resolution of 10, its value increases by 10 each
time.

TIMER_OVERHEAD is the minimal number of
cycles of overhead to obtain one timing with the given timer.
The overhead per event is twice the value displayed because
the timer is invoked at the beginning and end of the event.

To see which timer is in effect or to change the timer, access
the setup_timers table:

By default, the Performance Schema uses the best timer
available for each instrument type, but you can select a
different one.

To time waits, the most important criterion is to reduce
overhead, at the possible expense of the timer accuracy, so
using the CYCLE timer is the best.

The time a statement (or stage) takes to execute is in general
orders of magnitude larger than the time it takes to execute a
single wait. To time statements, the most important criterion
is to have an accurate measure, which is not affected by
changes in processor frequency, so using a timer which is not
based on cycles is the best. The default timer for statements
is NANOSECOND. The extra
“overhead” compared to the cycle timer is not
significant, because the overhead caused by calling a timer
twice (once when the statement starts, once when it ends) is
orders of magnitude less compared to the CPU time used to
execute the statement itself. Using the
CYCLE timer has no benefit here, only
drawbacks.

The precision offered by the cycle counter depends on
processor speed. If the processor runs at 1 GHz (one billion
cycles/second) or higher, the cycle counter delivers
sub-nanosecond precision. Using the cycle counter is much
cheaper than getting the actual time of day. For example, the
standard gettimeofday() function can take
hundreds of cycles, which is an unacceptable overhead for data
gathering that may occur thousands or millions of times per
second.

Cycle counters also have disadvantages:

End users expect to see timings in wall-clock units, such
as fractions of a second. Converting from cycles to
fractions of seconds can be expensive. For this reason,
the conversion is a quick and fairly rough multiplication
operation.

Processor cycle rate might change, such as when a laptop
goes into power-saving mode or when a CPU slows down to
reduce heat generation. If a processor's cycle rate
fluctuates, conversion from cycles to real-time units is
subject to error.

Cycle counters might be unreliable or unavailable
depending on the processor or the operating system. For
example, on Pentiums, the instruction is
RDTSC (an assembly-language rather than
a C instruction) and it is theoretically possible for the
operating system to prevent user-mode programs from using
it.

Some processor details related to out-of-order execution
or multiprocessor synchronization might cause the counter
to seem fast or slow by up to 1000 cycles.

Currently, MySQL works with cycle counters on x386 (Windows,
OS X, Linux, Solaris, and other Unix flavors), PowerPC, and
IA-64.

The setup_instruments table has
an ENABLED column to indicate the
instruments for which to collect events. The table also has a
TIMED column to indicate which instruments
are timed. If an instrument is not enabled, it produces no
events. If an enabled instrument is not timed, events produced
by the instrument have NULL for the
TIMER_START, TIMER_END,
and TIMER_WAIT timer values. This in turn
causes those values to be ignored when calculating the sum,
minimum, maximum, and average time values in summary tables.

Within events, times are stored in units given by the timer in
effect when event timing begins. For display, times are shown
in picoseconds (trillionths of a second) to normalize them to
a standard unit, regardless of which timer is selected.

Modifications to the setup_timers
table affect monitoring immediately. Events already in
progress may use the original timer for the begin time and the
new timer for the end time, which may lead to unpredictable
results. If you make timer changes, you may want to use
TRUNCATE TABLE to reset
Performance Schema statistics.

The timer baseline (“time zero”) occurs at
Performance Schema initialization during server startup.
TIMER_START and
TIMER_END values in events represent
picoseconds since the baseline. TIMER_WAIT
values are durations in picoseconds.

Picosecond values in events are approximate. Their accuracy is
subject to the usual forms of error associated with conversion
from one unit to another. If the CYCLE
timer is used and the processor rate varies, there might be
drift. For these reasons, it is not reasonable to look at the
TIMER_START value for an event as an
accurate measure of time elapsed since server startup. On the
other hand, it is reasonable to use
TIMER_START or
TIMER_WAIT values in ORDER
BY clauses to order events by start time or
duration.

The choice of picoseconds in events rather than a value such
as microseconds has a performance basis. One implementation
goal was to show results in a uniform time unit, regardless of
the timer. In an ideal world this time unit would look like a
wall-clock unit and be reasonably precise; in other words,
microseconds. But to convert cycles or nanoseconds to
microseconds, it would be necessary to perform a division for
every instrumentation. Division is expensive on many
platforms. Multiplication is not expensive, so that is what is
used. Therefore, the time unit is an integer multiple of the
highest possible TIMER_FREQUENCY value,
using a multiplier large enough to ensure that there is no
major precision loss. The result is that the time unit is
“picoseconds.” This precision is spurious, but
the decision enables overhead to be minimized.