pt-query-digest for MongoDB profiler logs

One of my favorite MySQL tools ever is pt-query-digest. It's the tool you use to generate a report of your slow query log (or some other supported sources), and is similar to for example the Query Analyzer in MySQL Enterprise Monitor. Especially in the kind of job I am, where I often just land out of nowhere on a server at the customer site, and need to quickly get an overview of what is going on, this tool is always the first one I run. I will show some examples below.

When I started engaging with MongoDB projects at Nokia, I needed some project of my own to tinker with. So I decided to add support for MongoDB "slow query log". Back then, my first task was to figure out if MongoDB even has such logging for slow queries. Lucky me, it has: it's known as the MongoDB Profiler.

Nokia allowed me to contribute changes back upstream to Percona. In the past weeks I finally had some time to upgrade it to understand the changed log format of MongoDB 2.4. (Earlier versions are no longer supported.)

Btw, the other reason I like this and other Percona Toolkit tools: They ship with all their dependencies embedded into the same file. This means, when I'm on a customer server without root access, I can just do

Example usage

My pt-query-digest supports reading slow queries from the mongodb log file. To get some queries there, you need to set the threshold to lower than 100 ms. In fact, I like to collect as much as possible, so I set it to -1 ms. (Updated 2013-10-18...)For some reason you cannot set the threshold to 0, but -1 works and has the same effect.

db.setProfilingLevel(1,-1)

Then run pt-query digest with the --type=mongolog option. Using --limit is optional and means the same is in normal pt-query-digest, see --help for other options.

pt-query-digest --type=mongolog --limit=20 < ../mongodb.log

At the top of the output you first get some summary information of all the queries in the log.

From the above we can already see that we are looking at a fairly well behaved system. For example Execution time for all queries is well below 100 ms (so wouldn't even show up by default in the log). Largest result sizes are just below 13k, there was really little time spent in write lock, and so forth.

Next we get a summary of the breakdown per query. Now we get to the parts where pt-query-digest really shines. The typical use of a slow query log would be to just look at whichever queries happen to be slowest, and then try to fix them. Of course, if you have a query that takes tens of seconds to run, there may or may not be something wrong. But experience shows, your energy is often better spent somewhere else completely, and pt-query-digest will tell you where to focus.

Pt-query-digest instead will de-parameterize all of the queries, so that it can recognize which queries are actually the same, just that the parameters were different. It then sums up and sorts all these queries (by default) by execution time. A typical application will only have between 10 to 100 different queries, so this is the perfect tool to give you an overview of what the app is doing to your database:

Now, if this was a MySQL based application, I would be very worried with the above results. There is one query that is taking up 98% of the total execution time of everything that happens in the database! It turns out, the benchmark I used to generate some "noise" for this example was set to use WriteConcern j=1, so the top line is simply wait time resulting from getLastError calls:

If your application is doing this, it might not be what you want. As you can see, your clients end up waiting a lot compared to what it actually takes to execute the real queries. But for the MongoDB server this is actually harmless, it's not executing or blocking anything, it's just you that is waiting all the time.

Ok, let's fix that and run pt-query-digest again. We get a different distribution that's more interesting to look at:

So it seems we have 2 queries that together consume over 70% of the total execution time spent in the database and the top 3 queries use up 95% of your MongoDB capacity! This means, it makes sense to focus solely on optimizing these top 2-3 queries and ignore everything else.

As you can see, there is a lot of information presented for each query. And thanks to the wonder of open source and a few regular expressions, I can now benefit from this reporting tool also in my work with MongoDB.