As there have been a lot of complaints recently about various cases of PERFORMANCE SCHEMA (PFS) overhead, I've decided to start a series of articles to help you use PFS in the best way to analyze your workloads. This is the first part, starting with an overview of instrumentation problems in general..

First of all, instrumentation cannot be free! When you want to trace or collect stats about your application internals, you're adding new code, and this code lengthens the overall code path, so your final binary simply cannot run at the same speed, by definition!.. Of course, if your instrumentation code is poorly written, the overall overhead will be bigger ;-) and if it's well designed, the overhead will be lower, but still not zero!..

Dynamic instrumentation brings additional cost. As soon as you're willing to add some more advanced stuff to your instrumentation, it makes sense not to keep it active all the time.. but then checking all these active/inactive flags and states brings additional cost on its own, as it requires yet more code to manage it.. (probably the best implementation of dynamic instrumentation today belongs to DTrace, but it is still not available on most platforms, and remains a story of its own).. The tiny sketch below shows the point.
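To illustrate, here is a tiny sketch (my own illustration, not code from any real profiler): even when the instrumentation is switched off, the enabled-check itself is still executed on every call:

    #include <stdbool.h>
    #include <stdio.h>

    /* runtime switch - may be flipped at any moment */
    static volatile bool instr_enabled = false;
    static unsigned long probe_hits = 0;

    static void hot_path(void)
    {
        if (instr_enabled)   /* this check runs on EVERY call, even  */
            probe_hits++;    /* while the instrumentation stays off  */
        /* ... the real work would go here ... */
    }

    int main(void)
    {
        long i;
        for (i = 0; i < 100000000L; i++)
            hot_path();
        printf("probe hits: %lu\n", probe_hits);
        return 0;
    }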

Instrumentation overhead depends directly on event frequency. Sounds trivial, right? The more frequently your instrumentation code is executed, the bigger the overhead you may expect. So, before you activate any instrumentation on some execution level of your code, you have to think about the frequency of events on that level.. (just to give an order of magnitude: a probe costing ~100ns that fires a million times per second already burns ~10% of one CPU core). BTW, you'll meet exactly the same problem with DTrace as well (for ex. see my old article with an example of a x10 slowdown under DTrace). However, when we're adding instrumentation into our own code, we can keep in mind what kind of event frequency to expect within each part of the code, and make the overhead of such instrumentation predictable(!)..

All this sounds simple.. until we start to feel the overhead directly related to the involved instrumentation ;-) To give you a feeling of what it could look like, I wrote a simple "dumb" program which is not doing anything useful, just looping to keep the CPU warm, and checking different conditions within its loops ;-) (you may find the full source code here if you want to play with it too).

The program recursively calls Loop_func(), which does "N" iterations of the "work loop" on a given level, and then calls Loop_func() "M" times to involve execution of the next level (in fact the program simulates work like: main() -> do some work -> call function1() -> do some work -> call func2() in a loop -> do some work -> call func3() in a loop -> and so on...)

So, the deeper we go in levels, the higher the probability that the code on a given level is executed more frequently (more loop iterations). And then, we want to instrument it to measure exactly the time we're spending there. To follow the PFS idea, I'll add the following around the call of Loop_func():

first, the code can be compiled with or without the "PFS instructions" (depending on whether PFS_DEF is defined or not)

and then, according to the enabled instrumentation of each level, the instrumentation code on that level is executed or not (for the TIMED instrumentation I'm using gettimeofday(), which is probably not the most optimal, but good enough for an example).. - a minimal sketch follows the parameter list below.

The program takes the following arguments:

Loop1_N - number of "work loop" iterations to execute on level1, simulating some work

Loop1_M - number of calls of the Loop_func() function for the next level ("level2")

Instr1 - 1/0 : instrumentation enabled/disabled on level1

Timed1 - 1/0 : TIMED instrumentation enabled/disabled on level1

... - and so on for each next level

the total number of levels is then calculated automatically
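Here is a minimal sketch of the idea (illustrative only - the real pfs_loop source differs, and for simplicity the per-level settings are hard-coded here instead of being parsed from the command line):

    #include <stdio.h>
    #include <sys/time.h>

    #define LEVELS 4

    static long   N[LEVELS] = { 1000, 1000, 1000, 10 }; /* "work loop" size  */
    static long   M[LEVELS] = { 500, 500, 500, 0 };     /* calls to next lvl */
    static int    Instr[LEVELS] = { 1, 1, 1, 1 };       /* per-level switch  */
    static int    Timed[LEVELS] = { 1, 1, 1, 1 };       /* per-level TIMED   */
    static unsigned long Count[LEVELS];                 /* calls counted     */
    static double Elapsed[LEVELS];                      /* seconds accounted */

    static void Loop_func(int level)
    {
        volatile long x = 0;
        long i, j;

        for (i = 0; i < N[level]; i++)   /* simulate some work */
            x++;

        if (level + 1 >= LEVELS)
            return;

        for (j = 0; j < M[level]; j++) {
    #ifdef PFS_DEF                       /* instrumentation compiled in? */
            struct timeval t1 = { 0, 0 }, t2;
            if (Instr[level + 1]) {
                Count[level + 1]++;                  /* COUNTED part */
                if (Timed[level + 1])
                    gettimeofday(&t1, NULL);         /* TIMED part   */
            }
    #endif
            Loop_func(level + 1);
    #ifdef PFS_DEF
            if (Instr[level + 1] && Timed[level + 1]) {
                gettimeofday(&t2, NULL);
                Elapsed[level + 1] += (double)(t2.tv_sec - t1.tv_sec)
                                    + (double)(t2.tv_usec - t1.tv_usec) / 1e6;
            }
    #endif
        }
    }

    int main(void)
    {
        int l;
        Loop_func(0);
        for (l = 0; l < LEVELS; l++)
            printf("level%d: %lu calls, %.3f sec\n", l + 1, Count[l], Elapsed[l]);
        return 0;
    }

Compiling with -DPFS_DEF gives the "instrumented" binary, compiling without it gives the "clean" one, and the Instr/Timed flags then decide at runtime whether the compiled-in code actually does anything.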

So, in the following example I'm executing the "pfs_loop" program with 4 levels in total: on the first 3 levels the program runs the "work loop" 1000 times (N), and then calls execution of the next level 500 times (M). Only on the last (4th) level is the "work loop" shorter - 10 iterations only (similar to reading a few rows from the FS cache, etc.)..

As you can see, the lower the level on which we involve instrumentation, the bigger the overhead we get (trivial, right? - on a lower level the same instrumentation code is executed way more often, so as a result the impact is bigger too)..

Exactly the same thing happens with PERFORMANCE SCHEMA in MySQL! The lower-level the instrumentation you're enabling, the bigger the overhead you may hit.. However, keep in mind:

ALL PFS instrumentation is completely dynamic

at any time you may disable everything!

you may go very progressively, level by level, into detailed instrumentation - just be aware of what exactly you're doing ;-) (for example, see below)
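For example, all the PFS switches live in ordinary tables and may be changed at any moment at runtime - to disable everything (standard performance_schema setup tables, shown here as an illustration):

    -- stop recording anything:
    UPDATE performance_schema.setup_instruments
       SET ENABLED = 'NO', TIMED = 'NO';

    -- and stop aggregating anything:
    UPDATE performance_schema.setup_consumers
       SET ENABLED = 'NO';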

Now, let's take a popular example: the Sysbench OLTP_RO Point-Select 8-tables workload. On 16cores-HT this workload usually reaches its max performance level at 64 or 256 concurrent user sessions. Let's analyze the PFS impact on this workload in the following configurations:

noPFS -- MySQL binary was compiled without PFS

withPFS, PFS=off -- MySQL binary was compiled with PFS, but PFS is
disabled

withPFS, PFS=def -- same as before, but PFS is enabled with default
instrumentation (as in MySQL 5.6 by default)

withPFS, PFS=none -- PFS is enabled, but all instrumentation is disabled

However, the "default" PFS instrumentation is composed from "table I/O"
and "statements/digest" instrumentations, let's see the impact of the each
one:Observations
:

PFS=none : 64usr-> 245K QPS, 256usr-> 270K QPS

PFS=tables_only : 64usr-> 240K QPS, 256usr-> 265K QPS

PFS=statements_only : 64usr-> 240K QPS, 256usr-> 260K QPS

So, as you can see, by disabling the statements instrumentation you can reduce the max overhead from 8% to 5%

and all of this is dynamic, without a MySQL Server restart! - for example:
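A sketch of how the statements/digest part alone may be switched off at runtime, while the table I/O instrumentation keeps running:

    -- switch off the statement instruments:
    UPDATE performance_schema.setup_instruments
       SET ENABLED = 'NO', TIMED = 'NO'
     WHERE NAME LIKE 'statement/%';

    -- and the related consumers (including the digest):
    UPDATE performance_schema.setup_consumers
       SET ENABLED = 'NO'
     WHERE NAME LIKE '%statements%';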

Well, I'll go into more detail in the next articles, but to finish this one, let's see how well the "alternative" solutions work on the same workload...

Let me start with the MySQL 5.6 code patched by Facebook, which provides TABLE_STATISTICS (and other stats). As Domas mentioned, there is no need for any enable/disable settings - just start the MySQL server and you get it. So, let's compare its overhead on the same workload with noPFS and PFS=def:

Observations:

as you can see, there is nearly no difference in overhead between PFS=def and the Facebook stats instrumentation

there is the same max 8% overhead vs. noPFS.. - so, there is no instrumentation without cost, right?..

while PFS=def not only has table stats, but also the query digest..

and the PFS instrumentation may be dynamically reduced at any moment, while the Facebook code is simply always there..

Another "alternative" stats solution is implemented in Percona Server, but
this instrumentation is dynamic, in fact binary: you may just enable or
disable it completely by setting the "userstat" global variable to 1 or 0.
Percona Server 5.6 is only RC for the moment, and I was unable to compile
it without PFS due errors in Percona code.. So, let's compare it on the
same workload with noPFS and PFS=off (the difference in PFS=off config
between MySQL 5.6 and Percona 5.6 may be a sign of yet additional overhead
coming from the "userstat" related code) :

Observations:

in fact, yes, the max QPS of MySQL 5.6 with PFS=off is higher than on Percona 5.6..

and then, once userstat=1, the regression is horrible.. over 30%...

So, it's very easy to complain about PFS overhead.. - but what if we try together to improve it rather than blame it?.. ;-)) PFS is a universal solution for MySQL profiling, it has HUGE power; we all just need to learn how to use it right, and then life will become much easier.. ;-)

INSTEAD OF A SUMMARY:

Instrumentation is not free ;-)

Instrumentation of higher-frequency events brings a bigger overhead!

PFS is a really great tool, it just needs some love ;-)

Well, this was only a story about PFS instrumentation overhead, and nothing yet about practical usage.. So, the next one in this series will be all about practice ;-)
