
This release is hugely pleasing to me, in that I actually didn’t have to do too much work on it myself! There were a significant number of contributions from Jesper Wisborg Krogh and Arnaud Adant, both MySQL Support Engineers (at the time at least, Arnaud has moved on to pastures new now), as well as again from Joe Grasse.

Thank you all for your contributions!

Here’s a summary of the changes:

Improvements

Added host summary views, which have the same structure as the user summary views, but aggregated by host instead (Contributed by Arnaud Adant)

host_summary

host_summary_by_file_io_type

host_summary_by_file_io

host_summary_by_statement_type

host_summary_by_statement_latency

host_summary_by_stages

waits_by_host_by_latency

Added functions which return whether instruments are either enabled, or timed, by default (#15) (Contributed by Jesper Wisborg Krogh)

ps_is_instrument_default_enabled

ps_is_instrument_default_timed

Added a ps_thread_id function, which returns the thread_id value exposed within performance_schema for the current connection (Contributed by Jesper Wisborg Krogh)
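A quick hypothetical session showing these helpers in use (signatures as in ps_helper at the time; they may differ in later sys schema versions):

```sql
-- Is the statement instrument enabled / timed out of the box?
SELECT ps_is_instrument_default_enabled('statement/sql/select') AS default_enabled,
       ps_is_instrument_default_timed('statement/sql/select')   AS default_timed;

-- The performance_schema thread_id for the current connection
SELECT ps_thread_id();
```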

Improved each of the user specific views to return aggregate values for background threads, instead of ignoring them (Contributed by Joe Grasse)

Optimized the schema_table_statistics and schema_table_statistics_with_buffer views to use a new view that gets materialized (x$ps_schema_table_statistics_io). Along with changes to the RETURN types for extract_schema_from_file_name and extract_table_from_file_name, this results in a significant performance improvement – in one test reducing the run time from 14 minutes to 20 seconds. (Conceived by Roy Lyseng, Mark Leith and Jesper Wisborg Krogh, implemented and contributed by Jesper Wisborg Krogh)


I’ve just released the 1.0.1 version of the MySQL sys schema. It contains a few bug fixes (including a contribution from Joe Grasse, thanks Joe!), and a number of new helper procedures around viewing and altering configuration for consumers and instruments, contributed by the MySQL QA team, as they’ve started to incorporate more Performance Schema trace data in to their automated testing.

Next up is the 1.1.0 release, which also includes a bunch of new host summary views that were contributed by Arnaud Adant (of the MySQL Support team). I have a number of new things in development to add before then as well, though.

Let me know if there are things you’d like to see as well, maybe I can find time to work on those too.

After a slight delay (travel and catching up with “real work”), I’ve now uploaded the talks that I gave at MySQL Connect and Oracle Open World.

They are available on my Presentations Page, and inline below for convenience. The “Introduction to MySQL Enterprise Monitor” talk was actually a full demo, but there are some screenshots of MEM 3.0 in there if you’re interested in seeing a high level picture of what it looks like now.

Thanks to all that attended my talks, I got a lot of good questions and feedback!

Use something like mk-heartbeat, which operates in the same way as Seconds_Behind_Master (trying to show you the actual time difference between the master and slave), and is a little more robust in complex replication chains, and other situations where Seconds_Behind_Master falls down (such as when the IO thread is lagging).

You can also attempt to monitor replication lag according to the differences in where the two threads are reading / writing to, based on the binary log and relay log positions compared against the master. This allows you to dive a little more in to where the delays may be happening in the chain, rather than relying on a single time behind metric.

Here’s a (somewhat simplified and not entirely accurate – the IO thread reads via the master) representation of how an event moves through replication, and the points that can be monitored with current instrumentation (including time delayed replication within 5.6):

Now, I say “Delay” above, but that’s not entirely accurate, as most of those metrics actually record the file, and position within that file, that each stage is at – they allow you to measure the number of bytes difference at each stage, not the latency of each of the stages.
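As a sketch of what that position based monitoring looks like (simplified; the byte deltas are only directly comparable when both sides are pointing at the same log file):

```sql
-- On the master:
SHOW MASTER STATUS;
-- -> File, Position: where the master is currently writing its binary log

-- On the slave:
SHOW SLAVE STATUS\G
-- Master Position     - Read_Master_Log_Pos  -> bytes the IO thread is behind
-- Read_Master_Log_Pos - Exec_Master_Log_Pos  -> bytes the SQL thread is behind
```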

Actually generating this data consistently is problematic, however. Because it requires two statements to be executed on two entirely different systems, you will need to ensure that you execute them as close together in time as possible – so ntp will need to be installed on each system, and the statements will need to be kicked off on a precise clock based schedule on each server. Even then, they can really only be used as an indicator, not a source of truth, simply because of this.

The expressions above are slightly simplified as well. Logs roll over, so you also need to know the size that logs can get to using either max_binlog_size or max_relay_log_size (if non-zero), as appropriate, and calculate the difference depending on the number of logs between each position – not entirely simple.

But even with these metrics in hand, they don’t really tell us much – the size of an event within the replication stream can differ depending on whether statement based or row based binary logging is enabled. A “small” event could be a statement that takes a very long time to run, or a single row based event that executes in no time at all. YMMV.

So we can use these to profile the system, and graphing over time probably wouldn’t hurt, but relying on them as an indicator that you should start looking at a slave because of performance issues gets tough, unless you rely on something that tells you that you are already behind, in a reactive rather than proactive manner.

What we really need is an indicator that tells us that we should be looking in to performance problems before the slave even starts to lag at all.

And so we come back to the title of this blog. The idea is simple, we need a “Replication Load Average”, that can tell us how busy the slave really is – if it’s not executing 100% of the time, then the chances are it’s not really falling behind. If it’s starting to be busy 90-95% of the time, then the chances are, it could start to lag pretty soon. It’s not a new idea either, I’ve seen a couple of prominent people wishing for such a thing:

Peter Zaitsev: Idea: MySQL Should add “Replication Load Average” metric which would tell what time replication thread was busy processing events compared to being idle. This would help a lot to see when you’re close to the limit.

Jeremy Cole: I agree however that getting saturation information is important and is something we will potentially implement in Twitter MySQL. It would be useful to get essentially a %busy/%idle stat for the SQL thread.

I’m happy to say gentlemen, this is already within 5.5, and has been since it was released.

When the SQL thread has executed all events from the relay log, it enters MYSQL_BIN_LOG::wait_for_update_relay_log(), which then waits for a condition called update_cond to be raised. This is exactly where the slave goes “idle”. And luckily within Performance Schema in 5.5, we can track conditions. Conditions get timed within Performance Schema for the time they were waiting for the condition to be raised by another thread – so in the above case, the total time that the SQL thread was idle.

The condition waited for within wait_for_update_relay_log() can be mapped to the wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond event within Performance Schema – here’s a snapshot from a 5.5 server that is in the middle of a replication tree (it’s both a slave and a master, so has both relay logs and binary logs enabled – the wait/synch/cond/sql/MYSQL_BIN_LOG::update_cond is a similar condition that is used by the IO thread logged in to a master, to signal that new events have been written to the binary log.):
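You can pull that raw data with something like this (assuming the relevant wait instrumentation and consumers are enabled; timer values are in picoseconds):

```sql
SELECT event_name, count_star, sum_timer_wait
  FROM performance_schema.events_waits_summary_global_by_event_name
 WHERE event_name IN ('wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond',
                      'wait/synch/cond/sql/MYSQL_BIN_LOG::update_cond');
```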

OK, so we can get to the raw data.. Great.. But how do we turn that in to a load average? The key is to read the values on a set interval, compute the deltas of the SUM_TIMER_WAIT statistic, and compare the delta – which is the idle time within the interval – to the overall interval.
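The per interval calculation can be sketched like this, restricted to the slave SQL thread – the busy percentage then falls out of the delta between two such reads:

```sql
-- Total idle time (picoseconds) for the slave SQL thread so far
SELECT w.sum_timer_wait AS idle_time
  FROM performance_schema.events_waits_summary_by_thread_by_event_name w
  JOIN performance_schema.threads t USING (thread_id)
 WHERE t.name = 'thread/sql/slave_sql'
   AND w.event_name = 'wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond';

-- busy% over an interval =
--   100 * (1 - (idle_time_now - idle_time_previous) / interval_in_picoseconds)
```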

As a quick proof of concept, I’ll use an event to do this, to call a procedure to gather the data, compute the deltas, and generate some extra statistics. Before boring you with the details of the implementation, let me first give a snapshot of the end result that we are striving for:

This gives what both Jeremy and Peter were asking for – a single “busy” counter, with some load average like statistics over 3 different intervals, with a snapshot every 5 seconds in this example. This allows us to see that there are peaks and troughs in the load of the slave on the shorter interval, but flattens its overall load with the larger averaged intervals, to give you a better baseline of how busy the slave really is over time.

At a high level, this gathers the wait data for the wait/synch/cond/sql/MYSQL_RELAY_LOG::update_cond event from the events_waits_summary_by_thread_by_event_name table for only the slave SQL thread, which is exposed as thread/sql/slave_sql, then grabs the last snapshot of data for comparison. It then looks at the current and previous wait info to determine how to calculate the actual busy percentage based on how the event data has changed over the period, computes the deltas and busy percent for that last interval appropriately, and inserts that in to a table.

Finally it computes the moving averages for the 3 sets of defined intervals and updates the just inserted row with that data (so the moving average also takes in to account the current snapshot), and finally purges anything older than two hours, to store a rolling two hour window of 5 second snapshots.

It also disables binary logging (because these are local statistics that should not be propagated to downstream systems), and does some checking whether performance schema is enabled, and logs appropriately within a logging table if not.

The time delta recorded within 5.5 is only possible at a 1 second granularity, which loses quite a bit of precision when compared to the picosecond precision wait information. Within 5.6 we can use microsecond precision with SYSDATE(6), and get even more accurate reporting. Note that SYSDATE() is used instead of NOW(), as within procedures NOW() returns the timestamp for when the procedure started, not for when the function is called.
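A throwaway procedure makes the NOW() versus SYSDATE() difference easy to see (this assumes 5.6 for the fractional second precision):

```sql
DELIMITER //
CREATE PROCEDURE now_vs_sysdate()
BEGIN
  SELECT NOW(6), SYSDATE(6);   -- identical at the start of the procedure
  DO SLEEP(2);
  SELECT NOW(6), SYSDATE(6);   -- NOW(6) is unchanged, SYSDATE(6) has moved on
END//
DELIMITER ;
```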

The first 15 minutes worth of aggregate data should be taken with a pinch of salt – wait for the data set to “warm up”.

This doesn’t necessarily negate the need for the other statistics described above – they can all be used to trace where latency might be, and in the case of Seconds_Behind_Master or a heartbeating implementation can still give information on how far behind you are getting once your slave really does start to lag, as can something like Jeremy’s velocity metric (linked via his name above), which can give an idea of the general direction of a slave once it’s started lagging.

With all that said though, I think this is the *key* metric that anybody should be tracking first and foremost. It is far better to fix problems or spread load before you start lagging.

The joys of digging through the different events in Performance Schema to see what they actually record! There’s gold in there still to be mined, but I think this one is a genuine 14 oz nugget.

Joining to INFORMATION_SCHEMA.PROCESSLIST is a necessary evil within MySQL 5.5, to be able to get the user and host information etc. – however it is certainly not optimal, because using INFORMATION_SCHEMA.PROCESSLIST (and its SHOW [FULL] PROCESSLIST cousin) causes a bunch of mutexes to be locked – the fill_schema_processlist() function attempts to lock LOCK_thread_count globally (which will block new connections whilst the entire processlist output is generated), as well as tmp->LOCK_thd_data per thread that it iterates over (which could in turn stall each thread if they try to access this), and mysys_var->mutex, which protects the mysys_var struct (used around killing connections).

If you are polling the processlist output frequently (I’ve seen some solutions out there poll it once a second, or sometimes even more often), then the chances are you are stalling real application connections as well. This is why we don’t use these commands within MySQL Enterprise Monitor for things like Query Analysis.

As with all other instrumentation within Performance Schema, this is entirely non-blocking. When you upgrade to MySQL 5.6, it’s time to start forgetting about SHOW PROCESSLIST, or INFORMATION_SCHEMA.PROCESSLIST. They’ve served us well for a long time, but it’s time to wave goodbye to their contentious ways.

There’s another bonus in the threads table within MySQL 5.6 as well, and that is the newly added INSTRUMENTED column. This allows you to turn instrumentation on and off, per thread, dynamically. You do this by simply running an UPDATE:
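For example, to stop instrumenting a single connection (the processlist id here is just illustrative):

```sql
UPDATE performance_schema.threads
   SET instrumented = 'NO'
 WHERE processlist_id = 12453;
```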

This then enables/disables all instrumentation within Performance Schema for each thread appropriately.

Now that we have all of this data directly within the threads table, we can also start joining it to some of the other tables within Performance Schema, to get even more data on what threads are currently doing within the database. In particular, we can join to the new events_statements_current table, to get extra information on statement execution (you will need to ensure that the events_statements_current consumer is enabled within the setup_consumers table).
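A minimal sketch of such a join (latencies come back raw, in picoseconds; the aliases are chosen to match the columns discussed below, but a full view would select rather more than this):

```sql
SELECT t.processlist_id      AS conn_id,
       t.processlist_user    AS user,
       t.processlist_host    AS host,
       t.processlist_command AS command,
       t.processlist_state   AS state,
       s.sql_text            AS last_statement,
       s.timer_wait          AS last_statement_latency
  FROM performance_schema.threads t
  LEFT JOIN performance_schema.events_statements_current s USING (thread_id)
 WHERE t.type = 'FOREGROUND';
```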

Listed here are two idle connections, and one active connection (which was the one gathering the data).

The events_statements_current table stores the currently executing, or last executed (if nothing is currently executing), statement per thread. This means that this new form of processlist can not only tell you what is currently executing, but also what last executed for sleeping threads.

All of the statistics listed, such as lock latency, row stats, temporary table stats are for either the current or last statement too. So with “conn_id: 12453” we can see that it last executed a COMMIT (in the last_statement column), which took 952.49 µs (in the last_statement_latency column).

I can’t count how many times I wanted to know the last statement that executed for sleeping connections (particularly those holding open a transaction), nor how many times customers and users asked that question whilst I was in the MySQL Support group, so this is a great step forward.

For idle connections, we now have a new “idle” wait type, which records how long sessions sit idle for as well. For sleeping connections this will always be the current wait (unless you disable the idle instrument). However for active connections, we show the last wait that happened within that statement’s execution.

You can see that the last session listed is copying to a temporary table, had lock latency of 1 ms, and last waited on the wait/synch/mutex/sql/THD::LOCK_thd_data event, which basically corresponds to the LOCK_thd_data mutex I was talking about above.

But hang on? I said above that SHOW PROCESSLIST etc. lock this in a blocking way as well, right? And this is also grabbing this mutex? What gives? Well, performance_schema is a true table storage engine, and the statement executed is still grabbing a read lock against the performance schema table. The statement itself is also creating a temporary table, doing its own thing within its own thread, and whilst doing that, threads tend to grab their own LOCK_thd_data mutex (there is one per thread) to protect access from other threads – for instance, we take LOCK_thd_data to set the current query string, which protects SHOW FULL PROCESSLIST from reading a half written query string for its INFO. That doesn’t change for Performance Schema access – but it does help to show just the kind of contention that SHOW FULL PROCESSLIST can cause!

The above view only gets a subset of the available data per statement available within the events_statements_current table as well – I’ve omitted all of the sort / scan information for instance. You can add these to the view as you see fit.

I’ll be adding some views like this to ps_helper shortly. If you have any feedback in the meantime feel free to comment!

Not long after posting my update on ps_helper, I had a couple of comments around the formatting of values within the output. Daniël van Eeden gave the suggestion that I could add a couple of Stored Functions, for formatting byte and time based values.

Of course, this was a great idea – not least for myself, because I no longer have to worry about how to format certain columns in the output.
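Something along these lines (function names as they ended up in ps_helper; format_time() takes the raw picosecond values that Performance Schema reports):

```sql
SELECT format_bytes(502434)      AS size,
       format_time(952490000000) AS latency;
```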

The changelog includes this nugget, which I’ll expand upon here:
The Performance Schema now maintains statement digest information. This normalizes and groups statements with the same “signature” and permits questions to be answered about the types of statements the server is executing and how often they occur.

It is the Query Analysis view from MySQL Enterprise Monitor. This was first released at the end of 2008, and was conceived (actually in 2006, presented and much discussed on a snowy day at a MEM team meeting in Amsterdam) from the need to easily monitor what statements are running within the database instance.

Yes, there were the Slow Query Log and General Query Log, but they had their downfalls when we were evaluating this – not fine grained enough on the one side, not enough info with too much overhead on the other, and both require access at the OS level to read the log files – not always a problem, but certainly is in some cases. The only tool that did any kind of analysis and statistics aggregation on queries that MySQL had been executing, at the time, was mysqldumpslow.

And so MySQL Proxy was born, and hooked up with MEM to collect statistics on queries that were funneled through it. It was revolutionary (for the MySQL world, at least). Hot on its heels we saw new GUI tools from the community trying to do the same thing. We’ve also seen extensions to the old ways of doing things, with the extended slow query logging, and command line scripts such as mk-query-digest being born (and re-born). We then extended Query Analysis in to the Connectors as well, to share the load of statistics gathering across application servers, and not have to funnel everything through MySQL Proxy (among other benefits).

These are all still great solutions today. Yet they all have one major downfall – they try and skirt the issue of the database instance … actually generating this data and having it available with a SQL interface. That’s what databases are supposed to be good at … right?

Strikingly similar, no? And it’s being returned by the database directly. It gets better.

One of the problems with trying to gather statistics on statement traffic from within the protocol stream (as done by Proxy, the Connectors, any tcpdump solution, etc.) is that there are certain statistics that are not exposed within the MySQL protocol – things like the number of rows examined, temporary table usage, select type etc. that can be seen in SHOW SESSION STATUS, for example, are practically impossible to correlate (we would have to inject a wrapper in to each connection around each “real” statement).

The Slow Query Log does not have this problem – and the extensions there in the community have proved that it’s useful to be able to aggregate more at the statement level too.

Statement Digests most certainly do not have this problem. Here’s the current structure of the new events_statements_summary_by_digest table:

Before diving in to some of the use cases, first it’s worth describing exactly what a digest is.

As with Query Analysis in MEM, we track and aggregate statistics by normalizing the statements. Performance Schema does this in slightly different ways to how the MEM components do this, however they both follow roughly the same process, by working on a tokenized representation of the statement.

Performance Schema does this by tying in to the lexer within the MySQL server, which spits out the tokenized statement to be parsed, recording its token output stream and doing some of its own normalization on top.

“long” is currently defined as 1024 bytes, set by PSI_MAX_DIGEST_STORAGE_SIZE in psi.h. Unfortunately this is not configurable yet. It should also be noted that this limit applies to the normalized statement – so large string values etc. are not a concern here.

MEM does not fold lists of values down, i.e. “INSERT INTO foobar VALUES (1,2,3,4,5)” will be normalized to “INSERT INTO foobar VALUES (?,?,?,?,?)” within MEM. Performance Schema, on the other hand, does fold these down to save memory, discarding the value tokens along the way and using the above forms for normalization.
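As a hypothetical illustration of the normalization (table and values invented), these two statements aggregate under the same digest:

```sql
SELECT * FROM orders WHERE customer_id = 32;
SELECT * FROM orders WHERE customer_id = 42;
-- Both normalize to the digest text:
--   SELECT * FROM orders WHERE customer_id = ?
```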

With that description out of the way, now let’s take a look at some of the value that you can get from the data!

First, let’s get the obvious out the way:

A high level overview of the statements like Query Analysis, sorted by those queries with the highest latency
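A query in that vein against the digest summary table looks something like this (a sketch – slice on whichever of the SUM_% columns interest you):

```sql
SELECT digest_text,
       count_star     AS exec_count,
       sum_timer_wait AS total_latency,
       avg_timer_wait AS avg_latency,
       (sum_no_index_used > 0 OR sum_no_good_index_used > 0) AS full_scan,
       sum_rows_examined,
       sum_rows_sent
  FROM performance_schema.events_statements_summary_by_digest
 ORDER BY sum_timer_wait DESC
 LIMIT 10;
```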

My test data is a little bland (this is a MEM schema monitoring a single MySQL instance), but you get the idea. You can instantly get a picture of your top statements, ordered in this case by latency, along with some extra statistics like whether it caused a full scan, their average latency, how many rows were both scanned and returned etc.

But you could slice and dice this data any way you want – it’s just SQL.

Here’s a few more use cases:

List all normalized statements that use temporary tables ordered by number of on disk temporary tables descending first, then by the number of memory tables.
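For instance (counts raw, as before):

```sql
SELECT digest_text,
       count_star                  AS exec_count,
       sum_created_tmp_disk_tables AS disk_tmp_tables,
       sum_created_tmp_tables      AS memory_tmp_tables
  FROM performance_schema.events_statements_summary_by_digest
 WHERE sum_created_tmp_tables > 0
 ORDER BY sum_created_tmp_disk_tables DESC, sum_created_tmp_tables DESC;
```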

All of these greatly show the flexibility of having this instrumentation directly within the server, accessible via the server itself. If you haven’t done it already, now is the time to take a little breath, and then shout Hurrah.

Now, it’s worth noting that all of this is stored in memory. By default, the maximum number of rows in this table in 5.6.5 is 200, which effectively lets you monitor 199 normalized statements – one row is reserved for a NULL digest, which acts as a catch-all: once this number of normalized statements has been generated, Performance Schema stores the aggregated statistics for all other statement traffic, whose digests do not match those already in the table, within this NULL row.

In the MEM team we’ve found in practice that people often have many more than this – often 500+ different kinds of statements being executed. You can increase the number of digests that are tracked with the performance_schema_digests_size system variable. You can track how much memory this uses with the SHOW ENGINE PERFORMANCE_SCHEMA STATUS command (here I have the number of digests to track set to 400, which takes roughly 490KB of memory):
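The variable is not dynamic, so it has to be set at server start; the memory check then looks like this:

```sql
-- my.cnf:
--   [mysqld]
--   performance_schema_digests_size = 400

SHOW ENGINE PERFORMANCE_SCHEMA STATUS;
-- Look at the (events_statements_summary_by_digest).row_size / .row_count /
-- .memory rows in the output for the allocated digest storage
```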

Any scripts or tools that are used to analyze this table will need to take this in to account.

We’re just scratching the surface of possibilities with this table at the moment. I have a number of things that I’d like to see added (like a summary of each of the major classes of lower level wait events, if enabled – “SUM_FILE_IO_WAIT”, “SUM_SOCKET_IO_WAIT”, “SUM_MUTEX_WAIT” etc.), and I’m sure many of you out there in the community will have your own ideas as well.

Welcome to the future of statement diagnostics within MySQL, we hope you enjoy it! Tell us what you’d like to see next!