This is for beginners – If you have no experience with MySQL, or if you are a developer that wants to learn how to administer MySQL, or an administrator that wants to learn how to query MySQL, this course is what you want. If you are not a beginner, you will likely still learn some nuances, and it will be easy and fast to do. If you have absolutely zero experience with MySQL, this is perfect for you. The first few chapters walk you through getting and installing MySQL, so all you need is a computer and the book.

The format of a virtual self-study group is as follows:
Each participant acquires the same textbook (Learning MySQL, the “butterfly O’Reilly book”, published 2007). You can acquire the textbook however you want (e.g. from the libary or from a friend, hard copy or online). Yes, the book is old, but SQL dates back to at least the 1970’s and the basics haven’t changed! There are notes and errata for each chapter so you will have updated information. The book looks like this:

O’Reilly Butterfly book picture

Each participant commits to reading each chapter (we suggest one chapter per week as a good deadline), complete the exercises and post a link to the completed work.

Each participant obtains assistance by posting questions to the comments on a particular chapter.

– Get Learning MySQL
Acquire a book (the only item that may cost money). Simply acquire Learning MySQL – see if your local library has it, if someone is selling their copy, or buy it new.

– Start!
When your book arrives, start your virtual learning by reading one chapter per week. Complete the exercises; if you have any questions, comments or want to learn more in-depth, that’s what the comments for!

FAQs:
Q: Does this cover the Percona patch set or MariaDB forks?

A: This covers the basics of MySQL, which are applicable to Percona’s patched MySQL or MariaDB builds, as well as newer versions of MySQL.

Q: What do I need in order to complete the course?

A: All you need is the book and access to a computer, preferably one that you have control over. Windows, Mac OS X or Unix/Linux will work. A Chromebook or tablet is not recommended for this course.

A: Yes! The basics are still accurate, and we will let you know what in the book is outdated. I have contacted O’Reilly, offering to produce a new edition, and they are not interested in updating the book. We will also have optional supplemental material (blog posts, videos, slides) for those who want to learn more right away. We are confident that this self-study course will make you ready to dive into other, more advanced material.

We all know that if we add a MySQL index to speed up a read, we end up making writes slower. How often do we do the analysis to look at how much more work is done?

Recently, a developer came to me and wanted to add an index to a very large table (hundreds of gigabytes) to speed up a query. We did some testing on a moderately used server:

Set long_query_time to 0 and turn slow query logging on
Turn slow query logging off after 30 minutes.

Add the index (was on a single field)

Repeat the slow query logging for 30 minutes at a similar time frame (in our case, we did middle of the day usage on a Tuesday and Wednesday, when the database is heavily used).

Then I looked at the write analysis – there were no DELETEs, no UPDATEs that updated the indexed field, and no UPDATEs that used the indexed field in the filtering. There were only INSERTs, and with the help of pt-query-digest, here’s what I found:

Again, extrapolating to average for 150 queries:
**Total, based on 150 queries: 150ms saved**

So we can see in this case, the index created a delay of 16.8 ms in a half-hour timeframe, but saved 150 ms in reads.

It is also impressive that the write index added very little time – 70 microseconds – but saved so much time – 1 millisecond – that there were 16 times the number of writes than reads, but we still had huge improvement, especially given the cost.

I cannot make a blanket statement, that this kind of index will always have this kind of profile – very tiny write cost for a very large read savings – but I am glad I did this analysis and would love to do it more in the future, to see what the real costs and savings are.

The 1st post in the series gave an overview of what a query review is and the value they can bring you. So now let’s talk about how one is done, specifically, how to do a query review using pt-query-digest.

The point of a query review is that it is a comprehensive review of queries. Imagine if you could get a list of all queries that run on your system, and then you systematically looked at each query to determine if it is optimized. That is the basic concept behind a query review.

So, how do you get a list of queries?

pt-query-digest can use a slow query log, binary log, general log or tcpdump. I usually use a slow query log with long_query_time set to 0, so I can capture all the successful queries and their timings. If this is too much overhead, consider using Percona Server’s log_slow_rate_limit and log_slow_rate_type parameters to only log every nth session/query. This means that if you have 5000 queries per second, you can set the slow logging rate to every 100th query, and reduce the write overhead for the slow query log to 50 queries per second (instead of all 5000 queries).

So you have your log, now what? Well, we need to process it. The –type option is where you set what your log type is (binlog, genlog, slowlog, tcpdump). Default is slowlog.

By default, pt-query-digest will give you a report of the top 95% worst queries. You can change that with the –limit parameter – note that –limit just limits the output; pt-query-digest still processes all the queries in the log file. If –limit is followed by an integer, it will limit the output to the top X queries; if it’s followed by a percentage (e.g. 10%) it will output the top percentage of queries.

As this is a query review of all queries, we will want to set the limit to 100%.

There are a lot of other options that pt-query-digest has, but many of them are there so we can distill and get queries that meet a certain criteria. The point of a query review is to look at ALL queries, so we do not need to use those options.

In fact, the only other options we need are related to the review itself. Because a review is systematic, we need a place to store information related to the review. How about a database for that? In fact, pt-query-digest has a –review option that takes parameters to store the information into a table.

Here is the command I recently used to start a query review. It was run from the shell commandline, and I used –no-report because I did not want anything other than the table and its rows created:[sheeri.cabral@localhost]$ pt-query-digest --no-report --type slowlog --limit 100% --review h=localhost,u=sheeri.cabral,D=test,t=query_review --create-review-table --ask-pass mysql_slow.log

You can see that –review has a number of arguments, comma-separated, to identify a table on a host to put the queries into. I used the –create-review-table flag to create the table, since it did not already exist, and –ask-pass because I do not type in passwords in a shell command.

pt-query-digest then spends some time analyzing the file then creating and populating the table. Here’s a sample row in the table:

The checksum and fingerprint are ways to make the query portable, no matter what values are used. The fingerprint takes out all the differences among iterations of the query, and puts ? in its place. So if you have a query that’s used over and over, like
SELECT first_name FROM customers WHERE id in (1,2,3)
the fingerprint would look like
SELECT first_name FROM customers WHERE id in (?+)

The sample provides a way for us to copy and paste into an EXPLAIN (or my favorite, EXPLAIN FORMAT=JSON) statement, so that we can assess the query.

So then we can go through the process of optimizing the query. In the end, this query has nothing to tweak to optimize, so I update the reviewed_on date, the reviewed_by person, and the comments:

One great feature is that you can add columns to the table. For example, maybe you want to add an “indexes” column to the table, and list the index or indexes used. Then after the query review is complete, you can look at all the indexes in use, and see if there is an index defined in a table that is NOT in use.

You can review all the queries and run a query review every 6 months or every year, to look at any new queries that have popped up, or queries that have been removed (note first_seen and last_seen in the table).

You can also see how the query performance changed over time using the –history flag to pt-query-digest, which can populate a table with statistics about each query. But that is a topic for another post!

Query reviews are excellent ways to look comprehensively at your queries, instead of just the “top 10” slow, locking, most frequent, etc. queries. The EXPLAINing is long and slow work but the results are worth it!

Posted onJune 19, 2017bysheeri|Comments Off on Why does the MySQL optimizer not do what I think it should?

In May, I presented two talks – one called “Are you getting the best out of your indexes?” and “Optimizing Queries Using EXPLAIN”. I now have slides and video for both of them.

The first talk about indexing should probably be titled “Why is MySQL doing this?!!?!!?” It gives insight into why the MySQL optimizer chooses indexes that you do not expect; especially when it does not use an index you expect it to.

The talk has something for everyone – for beginners it explains B-trees and how they work, and for the more seasoned DBA it explains concepts like average value group size, and how the optimizer uses those concepts applied to metadata to make decisions.

The EXPLAIN talk goes through everything in EXPLAIN – both the regular and JSON formats – and describes what the fields mean, and how you can use them to figure out how to best optimize your query. There are examples that show where you can find red flags, so that when you EXPLAIN your own queries, you can be better prepared for gotchas. The EXPLAIN talk references the indexing talk in a few places (both talks were given to the same audience, about a week apart), so I highly recommend you watch that one first.