Wednesday, December 14, 2016

In place transparent compression of MySQL binary logs

Any DBA who has administered a busy master knows how fast the disk space occupied by binary logs can grow. DBAs have no control over this: the growth depends on the workload, and the workload depends on many factors, e.g.:

- application changes (the applications start writing more due to code changes)

So either you have been thoughtful and have planned in advance for a large enough storage space (to handle the increase in the number of binary logs), or, sooner or later, you will face the usual dilemma: how many retention days do you dare give up to accommodate the binlog growth?

So, the obvious answer at first was to reduce the retention of binary logs via the expire_logs_days variable. Not the ideal option: in my situation, I had to reduce it to a mere 2 days, not even enough to survive a weekend. At the same time, I didn't like the idea of keeping more days and being woken up at 3am by a filesystem-full alert.

So the idea of compressing the binary logs came to my mind, but I wanted to do it in a transparent way, preserving the existing binlog retention mechanisms (the expire_logs_days variable and the PURGE BINARY LOGS command should continue to work as before) and without breaking the MySQL server and/or replication. Ideally, the timestamps of the files themselves should also be preserved.

I decided to try and compress the binary logs in place. In place means that when you look at your binlogs folder, you don't notice anything unusual. Here's (part of) the listing of one of our servers with compression active:

-rw-rw---- 1 mysql mysql 13631245 Dec 14 14:17 dbmd01.079826

-rw-rw---- 1 mysql mysql 13909321 Dec 14 14:23 dbmd01.079827

-rw-rw---- 1 mysql mysql 13656948 Dec 14 14:30 dbmd01.079828

-rw-rw---- 1 mysql mysql 13915222 Dec 14 14:37 dbmd01.079829

-rw-rw---- 1 mysql mysql 104883640 Dec 14 14:44 dbmd01.079830

-rw-rw---- 1 mysql mysql 104898032 Dec 14 14:50 dbmd01.079831

-rw-rw---- 1 mysql mysql 104861122 Dec 14 14:57 dbmd01.079832

You can spot which files are compressed by looking at the file size, but everything else is preserved. Inspecting them with file, here's what's in each of them:

dbmd01.079826: gzip compressed data

dbmd01.079827: gzip compressed data

dbmd01.079828: gzip compressed data

dbmd01.079829: gzip compressed data

dbmd01.079830: MySQL replication log

dbmd01.079831: MySQL replication log

dbmd01.079832: MySQL replication log

Basically, after compression, the compressed file retains the original file name. This is to keep MySQL happy: these files are tracked in the index file and read when mysqld starts.
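The mechanics can be sketched roughly as follows; this is my own simplified reconstruction, not the actual compress_binlog.sh, and the function name is illustrative. A compressed copy is written alongside the binlog, given the original file's timestamps, and then moved over the original name:

```shell
# Hypothetical sketch (not the real compress_binlog.sh): compress one
# binlog "in place", keeping its name and its original timestamps.
compress_in_place() {
    f="$1"
    gzip -c "$f" > "$f.tmp" &&    # write a compressed copy next to it
    touch -r "$f" "$f.tmp" &&     # copy the original file's timestamps
    mv "$f.tmp" "$f"              # replace it under the original name
}
```

Because the file keeps its name, size in the index, and timestamps, the retention machinery (expire_logs_days, PURGE BINARY LOGS) keeps working untouched.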

Speaking of which, here's what happens when you start mysqld with in-place compression:

2016-10-26 03:06:13 1407 [ERROR] Binlog has bad magic number; It's not a binary log file that can be used by this version of MySQL

2016-10-26 03:06:13 1407 [ERROR] Binlog has bad magic number; It's not a binary log file that can be used by this version of MySQL

2016-10-26 03:06:13 1407 [ERROR] Binlog has bad magic number; It's not a binary log file that can be used by this version of MySQL

2016-10-26 03:06:13 1407 [ERROR] Binlog has bad magic number; It's not a binary log file that can be used by this version of MySQL

Although these entries are flagged as ERROR, mysqld actually just ignores the compressed files and, more importantly, does not touch them.

This way, compressed binlog files can be managed exactly like standard, uncompressed ones. Of course, I need to remember that if I want to peruse their content, I have to uncompress them before feeding them to the mysqlbinlog utility.
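In practice that extra step can be a small helper like the one below; the helper name and paths are mine, purely for illustration:

```shell
# Hypothetical helper: gunzip a compressed binlog to a temporary file
# so that mysqlbinlog can read it as usual.
decompress_binlog() {
    out=$(mktemp)
    zcat "$1" > "$out" && echo "$out"
}
# usage sketch:
# mysqlbinlog "$(decompress_binlog /var/lib/mysql/dbmd01.079826)" | less
```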

Last but not least, in order to ensure that we do not inadvertently compress binlogs that haven't been downloaded by all slaves yet, the following extra safety rules are enforced:

- do not compress any binlog which is not at least 1 hour old

- only compress binlogs that have a sequence number which is less than the current sequence number of the most lagging slave

The script runs every 10 minutes, extracts the list of binary logs that aren't compressed yet, applies the above rules, then proceeds to compress them.
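The selection logic above can be sketched as a few small shell functions. To be clear, this is my own reconstruction under stated assumptions, not an excerpt from compress_binlog.sh: binlog names are assumed to look like dbmd01.NNNNNN, and the slaves' positions are assumed to arrive as "host filename" pairs (e.g. scraped from SHOW SLAVE STATUS on each slave):

```shell
# sequence number of a binlog file name, as a plain decimal integer
binlog_seq() {
    s=${1##*.}
    s=$(echo "$s" | sed 's/^0*//')   # strip leading zeros
    echo "${s:-0}"
}

# true if the file already starts with the gzip magic bytes 1f 8b
is_gzipped() { head -c 2 "$1" | od -An -tx1 | grep -q '1f 8b'; }

# smallest Master_Log_File sequence across all slaves; reads
# "host filename" pairs on stdin (hypothetical input format)
min_slave_seq() {
    min=99999999
    while read -r host file; do
        s=$(binlog_seq "$file")
        [ "$s" -lt "$min" ] && min=$s
    done
    echo "$min"
}

# a binlog is safe to compress if it is more than 1 hour old, not yet
# compressed, and strictly below the most lagging slave's current file
safe_to_compress() {   # $1 = binlog file, $2 = min slave sequence
    [ -n "$(find "$1" -mmin +60)" ] || return 1
    is_gzipped "$1" && return 1
    [ "$(binlog_seq "$1")" -lt "$2" ]
}
```

The gzip magic-number check doubles as the "already compressed" test, which is the same property that makes the files visible to the file utility in the listing above.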

I have set expire_logs_days back to 7 days and now have plenty of disk space for extra growth.

You can find the compress_binlog.sh script which I crafted for the above on my GitHub page.

One caveat: the script hasn't been tested with binlog sequence numbers greater than 999,999. I am actually curious to see what happens to the binlog file name when we reach one million; I need to either be very patient, or have a look at the source code... :-)

14 comments:

I wouldn't call this transparent. However I think this can be useful. Note that you might have to edit the binlog index file if you want to make the binlogs available again after uncompressing them. Another option is to use mysqlbinlog on the decompressed files.

Note that Tencent has a patch to get MySQL to compress the binlog files. https://bugs.mysql.com/bug.php?id=48396 https://github.com/TencentDBA/TMySQL

It'd be cool if you could store the binlogs on a compressed filesystem so they're still readable by mysql. Even cooler if you could somehow set up tiered storage so recent binlogs are on fast storage and older ones get rotated out to the compressed filesystem.

I am curious: with your current gzip design, what's the benefit of keeping gzipped binlogs on the master if MySQL can't read them? Archiving, perhaps? I'd love to hear your use case.

A compressed filesystem would be nice, but it could also be a burden if you have many slaves competing to read a binlog file and a busy master with a write-intensive workload. Remember that the current binlog is kept open and has one writer and many readers; I feel that the overhead of a compressed filesystem wouldn't play nice with the contention on this single binlog file. And these servers aren't connected to a Storage Area Network, unfortunately, so we don't have fancy options here; hence we built this simple compression setup, which actually works well for this particular situation.

Your question about what benefit we get actually makes sense: why don't we just delete them if we are short on space, you say.

Well, to us, the reason for keeping one week full of binlogs is twofold: first, to have a last resort backup if for some reason the "official" daily backup comes up corrupt and we are hit by a disaster; and second, to be able to inspect statements that have been issued against the master in the last week, in order to troubleshoot application issues (it happens to us quite often that we are asked questions like, "who put that value in that column?"). For this last purpose, we just have to uncompress them before feeding them to the mysqlbinlog utility.

I forgot to mention one more reason (maybe the most important one) for keeping one week of binlogs (although I mentioned it briefly in the post): if replication breaks, e.g. during the weekend, and you don't notice (alerts fail, for example), you may find yourself on Monday morning with an unrecoverable slave because the needed binlogs are no longer on the master.

I'm not sure that would happen in practice with, say, ZFS. If a file (actually, a block) is hot, it will be in the ARC cache, or at worst the L2ARC. Depending on the ZFS version, this could be compressed in RAM or not (older versions not). However, it has been shown that in most cases it's still quicker to come from RAM compressed than from disk, and from disk compressed than uncompressed. There are obviously edge cases, so it's always vital to benchmark it yourself to verify your situation. Also bear in mind you will never get the same compression ratios as, say, gzip on the file itself, even if you use the same settings. This is due to the filesystem compressing at the block level as opposed to the whole-file level. This means the compression window is much smaller, and hence the opportunities more limited.

One good thing about ZFS compression is that you can toggle it on and off seamlessly, so if you start to notice problems you can revert it.
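For reference, toggling ZFS compression is a single property change; the pool/dataset name below is illustrative, and note that the setting only affects newly written blocks, so existing data stays as it was written:

```shell
# enable lz4 compression on a hypothetical dataset holding the binlogs
zfs set compression=lz4 tank/mysql-binlogs
# inspect the achieved ratio once some binlogs have been written
zfs get compressratio tank/mysql-binlogs
# revert if it causes trouble (already-written blocks are unaffected)
zfs set compression=off tank/mysql-binlogs
```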

In theory the same things should be evident on Btrfs, but I'm not as confident there, as it's not as mature yet.

You're welcome. Actually, you can use whatever you want. I used bzip2 first, but it was taking too much time to compress each file (we have them set to grow to 100MB before they rotate), so I switched to pigz (parallel gzip).

My approach would use one of my favorite tools: do not keep the binlogs on the master, but on a Binlog Server. A smart Binlog Server should be able to manage transparently compressed/uncompressed binlogs. And an even smarter Binlog Server should be able to store recent binlogs on local disks, and old ones in S3 or HDFS.

Instead of in-place compression, I would keep the binlogs for just a few days but back up the last week or month of binlogs, for example. I made a small script that automates that with compression and rotation. Here it is: https://github.com/ardabeyazoglu/mysql-binlog-backup


About Me

I have been interested in MySQL since the early days of v3.23. After spending 20+ years in the IT field, mostly as a system administrator, in the last few years I decided to focus on MySQL and have been working as a full-time DBA since. You can check my resume on LinkedIn.