How analysing your binlogs can be quite informative

If you have used MySQL for some time you know that mysqld can write binlogs. This is usually used for backup purposes and JITR or for replication purposes so a slave can collect the changes made on the master and apply them locally.

Most of the time apart from configuring how long you keep these binlogs they are pretty much ignored.

Recently I came across an issue. I have a slave server which is NOT configured read only and which has an additional database used to collect statistics from the replicated database and provided aggregation and business information. The typical sales per country, per product, per day, week, month, year, per whatever etc. This is the usual datawarehouse type functionality. It’s done on a slave and not the master so that the load is not passed on down stream to all boxes.

So this slave is quite busy and in fact is bigger than other normal slaves which are used for other things. The data mining processes are pretty heavy and create a lot of I/O on the server as would be expected. Some times I noticed that the daily batch process would take a lot longer than usual, sometimes significantly longer. I monitor the server with various tools including cacti and MySQL Enterprise Monitor (Merlin), but for this particular issue which would happen from time to time the tools I was using could not identify why the batch jobs took so much longer. This was very frustrating. Managers often get upset if their reporting tools do not provide up to date information so you can imagine…

After quite a lot of head-scratching eventually it seemed like a good idea to look at the binlogs. These files are not in a very readable format but using the mysqlbinlog binary you can get the output into something which is readable. This is true if you use statement based replication which I do (the reasons I do not yet use row-based replication needs a post another day). So a bit of perl later and I was able to analyze the binlogs and see on a day to day basis the number of statements that were stored in the binlogs and also the operations (INSERT, UPDATE, DELETE) which were applied on these tables. A bit more manipulation enables you to count the rows affected for each statement.

So I looked at the results for a few different days and hey presto. The cause of the slowdown on the analytical slaves co-incided exactly with days were the slave was processing more changes via replication. The number of statements had gone up by more than a factor of three, due to a new script which had been recently added. Neither the master or other slaves were really affected by this extra load but the analytical slaves generated a lot more extra I/O and basically this change was enough to trace the cause and via the replicated tables which were affected it was possible to find out which process was being heavier than expected and address the problem.

Since coming across this problem and because there is no real other way to see the “replication” related load on many slaves (at least using Merlin) I’ve now put in place some scripts which push this table activity statistics into a dba database so I can look at these number and see how they change over time. If I have a similar problem I can look at the statistics and hopefully identify the source of the problem more quickly and thus resolve the problem. This also keeps managers happy which is never a bad thing.

Processing binlogs takes some time, and mine are configured as 1GB files. Storing this summary information is much more compact and thus I can keep a much longer history than would otherwise be possible.

So if you have not considered doing this before and want to “measure” your replication load and perhaps see how it changes over time then consider analyzing your binlogs, storing the results in a database, and providing some pretty tables or graphs which you can use to show colleagues or managers the cause behind unexpected changes in batch processing times…

Update 24/10/2010 11:00 CEST

Since writing this article I also came across this web page which does something similar to what I am doing. It does not go into the detail I am trying to do but does collect some extra statistics which perhaps I should incorporate into my scripts. Certainly the events, bytes and transactions per second counters (max and average) are worthwhile adding, especially if you record data over time.

It was also pointed out to me to look at Maatkit’s mk-query-digest but despite reading the different options I do not see a combination of options which gives per table or per database statistics (number of statements, average and maximum size of each one, number of rows affected). This is a command with a lot of options but I don’t think it covers my needs.

Since perhaps it was not clear actually what I am collecting this table below gives you a quick brief summary taken from one of the master servers (table names have been adjusted).

columns:
——-
count – number of statements related to this table
values – for INSERT type statements the number of VALUES ( ..) , ( … ) … values, basically rows inserted
columns – for INSERT and UPDATE statements the number of columns affected ( multiplied by the number of rows in the case of an INSERT )
bytes – total sum of bytes in each statement for each table.
op – I = Insert Statements, U = Update Statements, ….

Ordering by bytes or count, or grouping by table name allows you to focus on exactly which where the chunk of processing takes place.

While I have only been focusing on SBR replication you can reasonably easily do similar processing and get similar statistics based on RBR.

Missing from my current stats are using the exec time values (MySQL should record the exact number of seconds taken to do an execution, not the whole number of seconds. I should really file a feature request for this if it’s not in 5.5.) and correctly assigning the table to the appropriate database. That’s something that will happen shortly in my code.

Published by

Simon J Mudd

Born in England, I now live in Spain, but spent a few years living in the Netherlands.
I previously worked in banking (financial markets) both in IT and as a broker, but IT has always had a stronger influence. Now working at booking.com as a Senior Database Administrator.
Other interests include photography, and travel.
Simon is married, with two children and lives in Madrid.
View all posts by Simon J Mudd

3 thoughts on “How analysing your binlogs can be quite informative”

Hi Simon,
have you tried out the new binlog API that’s now available from labs.mysql.com for MySQL 5.6? This would give you a programatic way to access the data from the binary and so may give you more flexibility and easier maintenance.

However, I’ve not found a reference to the actual API. We currently use a filter around the mysqlbinlog but avoiding this and accessing via an API would be more direct and efficient.

I assume that this is part of the libmysql library provided with MySQL 5.6. If so my interest would be to use the library from perl on my live MySQL 5.5 servers, thus requiring the library to be installable separately from any other MySQL library. I currently use RHEL and MySQL’s rpms don’t allow for the installation of packages from multiple versions of MySQL at the same time. So away from experimental, possibly manual usage, this makes trying out the API more difficult. A feature request for 5.6 from someone using rpm packaged binaries would be: please allow your packages to support the installation of multiple versions of mysql at once. (Yes, I’ve already made a feature request of this some time ago.) Do that or backport this API into the MySQL 5.5 library and mark its use as experimental. Since this is I guess not part of any other “used” code that should be possible, and should then give you a much wider audience of this new facility.

In the meantime if I get time I’ll try and see if I can find the docs on this API and a spare server to try this on.