Database

Detailed Profiling of SQL Activity in MySQL 5.6

By Mark Leith, June 03, 2013

MySQL's latest update to the Performance Schema brings the ability to profile a statement's activity, low-level wait events, and I/O impact. It is the easiest and most detailed way to identify what statements to tune and how.

Finding High-Overhead Statements

With a development instance set up to trace everything, you can run tests against your applications to acquire performance metrics and focus tuning efforts. Generally, this starts with trying to pinpoint the statements with the most overhead.

In the past, this task was done by turning on the Slow Query Log, then post-processing this log file with a tool such as mysqldumpslow to aggregate the statistics. As part of the new statement instrumentation within MySQL 5.6, a handy aggregate table has been implemented: performance_schema.events_statements_summary_by_digest. With the aggregate table, you don't have to access the file system to look at the log and you can analyze the statistics easily with simple SQL.

Within this table, Performance Schema tracks and aggregates statistics by normalizing the statements, and assigning that normalized statement a "Digest," which is an MD5 hash of the normalized form of the statement. These digests are available in the DIGEST and DIGEST_TEXT columns. The table is aggregated by both the SCHEMA_NAME and DIGEST columns, so that you can also drill into which schemas are generating the most overhead, even if you have an instance that has multiple schemas executing the same kinds of statements (such as a shared instance running multiple Wordpress installations, for example).

A normalized statement may have any of the following actions performed upon it:

Long normalized statements are truncated, with "..." added at the end. "Long" is currently defined as 1024 bytes; however, note that this is on the normalized statement, so large string values are not a concern

Finding the statements that have the highest latency in the server is simple: just specify a criteria against one of the *_TIMER_WAIT columns. For instance, to find the top five statements that have the highest latency overall (and are probably prime candidates for tuning):

The statement is scanning many more rows than are actually sent back to the client (SUM_ROWS_SENT is much less than SUM_ROWS_EXAMINED, on average the statement is scanning some 9,500 rows to return around 215 rows per execution). This is likely due to the fact that no index is ever being used to satisfy the query (SUM_NO_INDEX_USED == COUNT_STAR).

The statement is creating a lot of temporary tables (an average of seven per execution), and more than 40% of those are on disk averaging three per execution (SUM_CREATED_TMP_TABLES vs. SUM_CREATED_TMP_DISK_TABLES)

Interestingly, we seem to be doing joining against the temporary tables that are created on disk, and that's not using an index (SUM_SELECT_FULL_JOIN == SUM_CREATED_TMP_DISK_TABLES)

This statement is certainly worth tuning. First, try and find the right combination of indexes within your schema to satisfy the query, and also to see if something can be done about the excessive temporary table use  first by seeing if it can be created in memory only (assuming it has no long text columns and so forth). If not, try to make it perform better, perhaps by creating the temporary table up front with an index defined.

Locating Where Statements Have High Latency

Once you have identified a statement to tune, you can infer where problems might lie; however, a more scientific approach would be to find out exactly where the time is being spent within the statement's execution window. This is what the new Stages instrumentation is for. Stages record the time that a connection spends in the various thread states that you see within the output of something like SHOW PROCESSLIST.

In the past, you might have used SHOW PROFILE to get this kind of data. If so, this approach should be fairly familiar to you. The new Stages instrumentation within Performance Schema in MySQL 5.6 is intended to eventually replace SHOW PROFILE (which has been deprecated), and it has several advantages:

You can now get stage/state information for all connections, not just your current connection

Stages are linked to statements, and linked further to underlying waits (more on this shortly)

Stages are exposed in regular tables, giving you a SQL interface to look at the data

Stage instruments are aggregated in multiple ways: globally, by thread, by user, by host, by account (user@host)

When profiling a specific statement, you are not interested in the aggregated information. Rather, you need to drill down into the raw data that is linked to a specific statement. On a busy system, this data can be particularly volatile, however, so you need to take this into account and set up your environment appropriately.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!