Monday, December 16, 2013

At Facebook, we have upgraded most of MySQL database tiers to 5.6, except very few tiers that have a special requirement -- very fast single threaded replication speed.

As Oli mentioned, single threaded performance is worse in 5.6. The regression is actually not visible in most cases. For remote clients, the performance regression is almost negligible because network latency is longer than 5.1->5.6 overhead. If clients are running locally but MySQL server is disk i/o bound, the overhead is negligible too because disk i/o latency is much longer than 5.1->5.6 overhead.

But the regression is obvious when clients run locally and queries are CPU bound. The most well known local client program for MySQL is SQL Thread (Replication Thread). Yes, 5.6 has a slower replication performance problem, if SQL thread is CPU bound.

If all of the following conditions are met, SQL thread in 5.6 is significantly slower than 5.1/5.5.

Almost all working sets fit in memory = Almost all INSERT/UPDATE/DELETE statements are done without reading disks

Very heavily inserted/updated and committed (7000~15000 commits/s)

5.6 Multi-threaded slave can not be used (due to application side restrictions etc)

You want to use crash safe slave feature and optionally GTID

Here is a micro-benchmark result. Please refer a bug report for details (how to repeat etc).

Fig 1. Single thread replication speed (commits per second)

On this graph, 5.1, 5.5, and 5.6 FILE are not crash safe. fb5.1 and 5.6 TABLE are crash safe. 5.6p has some experimental patches. Crash safe means even if slave mysqld/OS are crashed, you can recover the slave and continue replication without restoring MySQL databases. To make crash safe slave work, you have to use InnoDB storage engine only, and in 5.6 you need to set relay_log_info_repository=TABLE and relay_log_recovery=1. Durability (sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1) is NOT required.

There are a few observations from the benchmark.

17% drop from 5.1.69 to 5.5.33. I did not profile 5.5 much, but I suspect the regression was mostly caused by Metadata Locking feature introduced from 5.5.

From 5.5.33 to 5.6.15(FILE), 20% drop if log-slave-updates was enabled, 18% drop if log-slave-updates was disabled. In 5.6, FILE based replication has an inefficiency bug that it writes to OS filesystem cache too often (bug#70342). 5.6.15p(FILE) applied the fix for the bug, but was still 15% lower than 5.5.33 if log-slave-updates was enabled, 2.5% lower if log-slave-updates was disabled. This shows writing to binlog (from single threaded SQL thread) in 5.6 takes more time than 5.1/5.5.

Performance drop for 5.6.15 TABLE was serious. Compared to fb5.1 (crash safe), performance drop was 51.2% (14520/s->6103/s). slave_relay_log_info table is opened/fetched/updated/closed per every transaction commit. By keeping the table opened and skipping fetching row (updating only), performance improved to 7528/s (with log-slave-updates) and 9993/s (without log-slave-updates), but these were still much lower than fb5.1 (12505/s and 17056/s).

How can we fix and/or avoid the problem?
From MySQL development point of view, there are some room for performance optimizations.

Writing to slave_relay_log_info table should be optimized more. Currently the table is opened/fetched/updated/closed via Storage Engine API for each commit. It will be more efficient by updating the table via Embedded InnoDB API, which is currently used from InnoDB memcached plugin. slave_relay_log_info is a system table, and is updated by SQL thread (and MTS worker threads) only, so some tricky optimizations can be done here.

Writing to binlog should be faster in 5.6. 5.6 added many features such as group commit, so it is not surprising to get lower single threaded performance without any optimization.

5.1 to 5.5 performance drop was likely caused by Metadata Locking feature. I have not profiled in depth yet, but I think some functions (i.e. MDL_context::acquire_lock()) can be refactored to get better performance.

From DBA point of view, there are couple of workarounds. First of all, the regression won't happen in most cases. Most application data should be much larger than RAM, and SQL thread is doing many I/O, so CPU overhead is negligible. Even if working sets fit in memory, many applications don't need over 7000 commits per second on single instance. If you don't need high commit speed, you won't hit the problem either.
But some applications may really hit the problem -- a kind of queuing application is a typical example. At Facebook we are continuing to use 5.1 for such applications.
If you want to use 5.6 for such applications, you may disable crash safe slave, because usually database size is very small. If database size is small enough (i.e. < 200GB), you can easily restore from other running master/slaves without taking hours. If you don't need crash safe slave feature, you can use FILE based (default) replication, which is still slower than 5.1/5.5 (see "5.6.15 FILE" on the above graph) but is much better than TABLE based.
Also, Consider not using log-slave-updates. log-slave-udpates is needed when you use GTID. If you want master failover solution without GTID, MHA is a good enough solution and it does not require log-slave-updates.

About Me

I am a database engineer at Facebook.
Before joining Facebook, I was a principal database and infrastructure architect at DeNA. My primary responsibility at DeNA was to make our database infrastructure more reliable, faster and more scalable. Before joining DeNA, I worked at MySQL/Sun/Oracle as a lead MySQL consultant in APAC for four years.
You can contact me on Yoshinori.Matsunobu_at_gmail.com (replace _at_ with @).