A Big Bag of Epic Awesomeness

I tried to come up with a number of topics for this post, but none on its own seemed to convey what I really feel. And really, this post is about all of them:

A look at PERFORMANCE_SCHEMA.STAGES and PERFORMANCE_SCHEMA.STATEMENTS

Graphing MySQL Statement Execution

A MySQL visual EXPLAIN plan with PERFORMANCE_SCHEMA

Tracing Sessions with PERFORMANCE_SCHEMA

If that doesn’t whet your appetite (and trust me, I need to whet it; this post is long, but I feel it is worth reading all the way to the end), then let me start out by asking a question:

Wouldn’t you like to be able to trace what a SQL statement did, either in the same or another session, on a production instance, after the fact? Wouldn’t you like to know where all of its time was spent, and get some statistics on what it was doing specifically?

I know, I know, you can do this with the slow query log to some extent, you can do it with external tools like MySQL Enterprise Monitor to some extent… But these all operate at a very high level. How about something like:

InnoDB PK Lookup

Yes? Not sure what you’re seeing here anyway? Think this is an April Fools? Well then read on…

As Marc Alff announced recently, performance_schema development within mysql-trunk (5.6.x) has come on in leaps and bounds, with 4 new levels of instrumentation added. The first two that Marc lists, table IO and table lock instrumentation, are in the same class as previous instrumentation – wait/. This means they are tracked and aggregated in the same way as the existing 5.5 instrumentation, in the wait events and events waits summary tables (although they have new summary tables of their own as well, I won’t talk about those here).

The next two however, stage and statement instrumentation, are each new classes of instrumentation in their own right. This means each gets a new set of tables that track its events in both raw and aggregate form. Here’s a recent table listing from a build from launchpad (for comparison, performance_schema in 5.5 has 17 tables):
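You can pull the equivalent listing from whatever build you are running yourself, via INFORMATION_SCHEMA:

```sql
-- List all performance_schema tables in the build you are running
SELECT TABLE_NAME
  FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = 'performance_schema'
 ORDER BY TABLE_NAME;
```

Counting the rows returned against the 17 tables in 5.5 gives a quick feel for how much has been added.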

So what are “stages”? As Marc says, they are “Major execution phases of a statement”; to classic MySQL users, they are what you know as the “State” column within SHOW PROCESSLIST for a running session, or the “Status” within a SHOW PROFILE snapshot when run within your own session.

However, in performance_schema they are not only named differently (yea, who knows why), they are tracked individually and over time for every instrumented thread (as Marc also noted, we can now filter instrumentation on a per-thread basis too, though I won’t talk about that in this post).

So, for example, if I wanted to see the stages that “thread 15” went through:
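A query along these lines does the job (the column names here are from the trunk build I am testing against, and may shift before 5.6 goes GA):

```sql
-- Stage events recorded for thread 15, oldest first
SELECT EVENT_ID, EVENT_NAME, TIMER_WAIT,
       NESTING_EVENT_ID, NESTING_EVENT_TYPE
  FROM performance_schema.events_stages_history_long
 WHERE THREAD_ID = 15
 ORDER BY EVENT_ID;
```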

If you scroll to the right, you can see something that hasn’t been talked about a lot so far with performance_schema – its concept of “event nesting”. Each of the tables has a NESTING_EVENT_ID column, which links to its parent, and a NESTING_EVENT_TYPE column, which defines the parent event class – as either STATEMENT, STAGE or WAIT.

5.5 has the NESTING_EVENT_ID column in the wait tables, but it is always NULL. Now however this is used throughout, and the stages themselves point to a parent event – 7 in the above case, of a STATEMENT class. So let’s take a look at what that statement was:
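Something like the following pulls that parent statement back out (again assuming the current trunk column names):

```sql
-- The parent STATEMENT event (EVENT_ID 7) that the stages point to
SELECT EVENT_ID, EVENT_NAME, SQL_TEXT, TIMER_WAIT,
       ROWS_EXAMINED, ROWS_SENT
  FROM performance_schema.events_statements_history_long
 WHERE THREAD_ID = 15
   AND EVENT_ID = 7;
```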

We use the adjacency list model to link the events together, so each event just points to its parent. From a pure SQL perspective this has some downsides, but given that we don’t update this data with SQL, the only real downside is not being able to represent the tree view nicely in pure SQL output – it’s hard to find the depth of the tree, to be able to indent your output in the columns.
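To give a flavour of what working with the adjacency list looks like, here is a sketch of a single-level join that attaches each wait event to its parent stage. This is exactly the limitation mentioned above: you need one join per level of the tree, so without recursive SQL you cannot walk arbitrary depth this way:

```sql
-- One level of the event hierarchy: waits hanging off their parent stages
SELECT s.EVENT_NAME AS stage, w.EVENT_NAME AS wait, w.TIMER_WAIT
  FROM performance_schema.events_waits_history_long  AS w
  JOIN performance_schema.events_stages_history_long AS s
    ON w.THREAD_ID = s.THREAD_ID
   AND w.NESTING_EVENT_ID = s.EVENT_ID
   AND w.NESTING_EVENT_TYPE = 'STAGE'
 WHERE w.THREAD_ID = 15
 ORDER BY s.EVENT_ID, w.EVENT_ID;
```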

So I had to find another way to represent this nicely, otherwise it’s just not that obvious how freaking awesome an advancement in MySQL statement tracing this is. Yet I love making SQL do things for me without having to wrap a program around it, so I thought I’d do another take on Shlomi Noach’s graphs with SQL.

My approach is slightly different though, I wanted to be able to generate a true graph, like the picture above. The DOT language is perfect for representing that hierarchy, and is supported by a number of graphing libraries and tools – including Graphviz, which I used to generate my own images here.
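For anyone unfamiliar with DOT, the output is nothing more exotic than a list of nodes and parent-to-child edges; a trivial hand-written example (not generated output) of the kind of hierarchy involved:

```dot
digraph thread_stack {
  rankdir = TB;
  "statement: SELECT ..." -> "stage: Opening tables";
  "statement: SELECT ..." -> "stage: Sending data";
  "stage: Sending data"   -> "wait/io/table/sql/handler";
}
```

Anything that speaks DOT – Graphviz’s dot command line tool being the obvious choice – can then lay this out as an image.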

And so, without further ado, here’s how to generate an entire hierarchical stack trace of what a thread has done, with what is essentially a single SQL statement, in DOT format:

The first parameter to the procedure is the thread id that you want to dump the stack for. The second is a debug flag, which includes the source file and line number that wait events came from for each node.
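So, with the procedure created (the name below is just what I have called mine; adjust to taste), dumping the full stack for thread 15 with debug output enabled looks like:

```sql
-- Dump the DOT-format stack for thread 15, with file/line info on each wait node
CALL dump_thread_stack(15, TRUE);
```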

To find the thread that you are interested in, check the performance_schema.threads table, which includes everything you need to link this back to a running connection:
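For example, to find the instrumented thread behind your own connection (column names again from the trunk build I am testing; earlier versions name these differently):

```sql
-- Map a connection (as seen in SHOW PROCESSLIST) to its instrumented thread
SELECT THREAD_ID, PROCESSLIST_ID, PROCESSLIST_USER, PROCESSLIST_HOST
  FROM performance_schema.threads
 WHERE PROCESSLIST_ID = CONNECTION_ID();
```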

Load the DOT file into your favorite graphing solution, and you will get the full stack of event information that is available within performance_schema for that thread. If you’re lucky enough to capture a full trace of what the session did since it started, you might get a thread stack looking something like this:

Full Stack

Statements at the top of the graph are the most recent, with the earliest statement executed at the bottom of the graph. The legend for the graph would be:

Grey – Stages

Blue – Mutexes

Orchid – RW Locks

Tan – Table Locks

Light Coral – Table IO

Red – File IO

If the statement box is a solid red colour, then the statement ran for longer than the long_query_time setting for the instance the trace was generated on.

You will not always be this lucky however; wait information within performance_schema degrades very quickly depending on how you have it configured, especially on busy systems. Each table holds this many events of each class by default:
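You can check the configured sizes on your own instance directly; the history tables are the relevant ones here:

```sql
-- How many raw events each history table retains
SHOW GLOBAL VARIABLES LIKE 'performance_schema%history%size';
```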

Each statement can have 10 to 15 or more stages, and tens to thousands of individual wait events, so you will need to use some good judgement on how to size these tables, depending on your use cases.

However, even when you do have a degraded output, this still gives you quite a bit of information about the statements:

Degraded Stack

I’ve found it to be quite an eye opener in some respects, to see how many times certain things are locked in the background.

For the eagle-eyed and in the know – these graphs actually show a bug in the underlying performance_schema instrumentation in the release I am testing against, in two areas. First, some wait events are getting linked up to the wrong stages. We also seem to double-account for IO in InnoDB in some cases. We’re tracking those things down at the moment.

But even with those wrinkles, I think it’s obvious what kind of value this is going to provide in 5.6. And there’s more: I’ll have to talk about filtering, and the other summary views, at some point as well!