Advanced MySQL Replication - Improving Performance

February 12, 2009

Introduction

MySQL
Replication is some sophisticated and flexible technology. As we discussed in
our recent article "Fixing
MySQL Replication", it can be made quite reliable and robust if the
right tools are used to keep it running smoothly.

But
what of sites who are experiencing enormous load on the primary server, which
is overloading the slave server. Are there ways to speed up performance, so
the slave can keep up? As it turns out there are quite a number of ways to do
this. We'll discuss a number of them.

Architecture

First,
a quick discussion of the architecture. As you recall, there are two threads
on the slave that are performing work. One is the IO thread, which copies
transactions from the master servers binary log to the slaves relay log. The
next is the SQL thread. This is the important one. It reads the relay logs
and applies the SQL queries against the slave database in a serial fashion.
Once again that means all writes are serialized on the slave database. This
can become a performance bottleneck in high transaction environments.

Use Multiple Slaves

One
method would be to break up data into multiple slaves. This could be done
database by database (schema by schema) creating one slave database. Each
slave would be pointing to the same master, and using replicate_do_db and replicate_ignore_db
statements to specify which data to capture for it's slave. However it may be
that your applications are not broken up cleanly that way, or that load favors
certain tables, rather than certain schemas. So for your site there may be
reason to do it on a table-by-table basis. You would then use replicate_do_table
and replicate_ignore_table. You'll also need to keep in mind that more slaves
mean more threads on the master database, which could add load there. If you
run into that problem, consider creating a distribution master, and then
slaving off of that. As a further consideration, realize that if you are
running backups off of slave databases, your backup scripts will obviously be
more complicated in this arrangement, so document well.

Priming the Cache

A
second method would be to do what's called "priming the cache".
Recall that each of the SQL statements, which executed on the master, is re-executed
on the slave database. All the work involved in reading blocks of data,
sorting, and caching query execution details has to happen again on the slave.
However, what if we could do this all BEFORE the slave SQL thread gets to that
query. That's what the maatkit mk-slave-prefetch tool was built to do. Some
sites have had great success with this method, so it is worth investigating.
As with anything, test, tweak, and evaluate before rolling out in production.

Move Writes Outside of Replication

A
third method would be to move writes outside of replication. What does that
mean? Suppose you have tables that load apache logs into your primary
database. You can then use replicate_ignore_tables option to skip these heavy
archive tables, and then just run a separate data load process on the slave to
get that data into your slave database. Since this can also be done in parallel,
it may provide surprising speedups.

Tune Disk IO Subsystem

A
fourth method could be to look at the underlying disk subsystem. Are you using
RAID? If not, consider it. Can you get faster disks, or add more spindles to
your RAID array? If so, these options can speed up overall IO throughput to
the volume where your datafiles are sitting.

Consider the MyISAM Storage Engine

A
fifth option would be to look at using the MyISAM storage engine on the slave
side for those high write-heavy tables. Recall that MyISAM is not
transactional. There is a huge amount of code involved in providing the
row-based locking, and ACID compliance you find in InnoDB and related
transactional storage engines. MyISAM is blazingly fast because it doesn't
have to check anything. It simply writes. Therefore, it is very very fast.

You
might ask, but what about my transactional integrity. Remember though that all
transactions are serialized on the slave, so there is no worry of other
sessions reading or writing the same data. You only need to worry about the
SQL thread.

Give Up Some Safety

The
sixth option we'll mention involves giving up some safety and recoverability on
the slave. Since your slave database is presumably a copy of data held
elsewhere, recommending less safe options can be seen in that context. Obviously,
be sure all your data has been sufficiently backed up in various places.

Firstly,
you can disable the binary log on the slave. This will reduce the amount of
data that needs to be written while the slave is executing queries from the SQL
thread.

Secondly
you can configure InnoDB to flush its changes less frequently using innodb_flush_log_at_trx_commit=2.
In addition, you can set innodb_locks_unsafe_for_binlog=1. For MyISAM there is
a setting delay_key_write=ALL which may help.

Since
these options make your database less recoverable, you want to be VERY SURE to
disable them if this slave becomes the master at some point.

If
the above options don't work for you, consider looking at MySQL 5.1. The new
version of MySQL includes a feature called row-based replication. In contrast
with the existing statement-based replication, row-based can often avoid re-executing
statements on the slave by passing along the change vector of actual data that
changed on the primary. Time will tell if the overall performance is
noticeably faster, but it's worth investigating.

Conclusion

When
looking at ways to speed up the slave, keep in mind that lag on the slave
server is normal. MySQL's out-of-the-box slave technology is not meant to be
synchronous. If your application has that requirement, we recommend looking at
the Google patches to provide semi-synchronous replication. We'll discuss that
in more detail in next month's article as we continue our investigation of
advanced MySQL replication techniques.