Month: September 2016

For a long time, I’ve never been able to answer some basic questions that I thought fundamental to optimizing server performance. MySQL gives you some server-wide metrics about activity, but none of it is broken down per-table so that an application developer could look into where to reduce the number of writes, or generally where to focus their attention in order to improve the server performance.

A commenter named barat pointed me to this post which had the insightful idea of parsing the binary log for analysis.
Since my servers are generally hosted on AWS, I don’t have direct access to the binary log, so I had to retrieve those. The MySQL documentation for the mysqlbinlog command briefly mentions how to read the binary log from a remote server. It took some experimentation to get the right command and output options with all of the data I wanted. Specifically, the `–base64-output=DECODE-ROWS –verbose` options which translate some of the row-based logging into MySQL commands that can be parsed.

The first step is to create a user that has access to the binary logs. I used the main ‘admin’ user that RDS creates because it was convenient. If creating a new user, you probably need to grant the REPLICATION_SLAVE privilege.

You can see which binary logs are available on the server with the SHOW BINARY LOGS; command:

Note that if you get the error below, you need to make sure that your MySQL client and server tools are using the same version. I originally attempted to use MySQL 5.5 tools with a MySQL 5.6 server.

ERROR: Got error reading packet from server: Slave can not handle replication events with
the checksum that master is configured to log; the first event 'mysql-bin-changelog.232519'
at 4, the last event read from '/rdsdbdata/log/binlog/mysql-bin-changelog.232519' at 120,
the last byte read from '/rdsdbdata/log/binlog/mysql-bin-changelog.232519' at 120.

After that, it was just a matter of parsing the file for the relevant commands. I’ve put all of that logic now into a quick PHP script that I can reuse anywhere. Now, I can go through a bunch of binary logs on a server and see which tables are updated the most frequently with output like this: