Follow me on Twitter

MySQL Replication Troubleshooting

Sometimes things don’t work out with replication. When I first started experimenting with it I thought this was a “setup and forget about it” kind of job.

Experience has shown though that you have to regularly triple check and see if things may have broken (despite a good and once working setup).

Let’s take a look at what you can do when your Slave isn’t replicating anymore. If you want to know more about how to setup replication, have a look at my previous article in which I explain how this works.

Master Troubleshooting

Replication problems on your Slave do not usually indicate a fault on the Master; as long as it’s creating a log, all is well. You can check this by issuing a couple of “show master status” commands in 30 second intervals and see the log position change.

Provided data is being created or changed, the log positions should be different. If no log status shows up, then you’re being told that “binary loggin is disabled” – in which case you need to update your my.cnf file (see my other article about this).

Slave Troubleshooting

More often replication fails because of something which isn’t working well on the Slave. Common scenarios include:

login problems: wrong user/password/host combination

data import/synchronization problems

problematic statements

corrupt tables or databases

Let’s go through these one by one. I intend to update this article if I find other problems (and solutions). Feel free to share your experiences in a comment at the bottom.

How to check what’s bothering your Slave

Your best place to see where something’s gone wrong is the MySQL log. On CentOS and many other Linux distributions that’s in /var/log/mysqld.log and you can check the last 10 lines with

tail /var/log/mysqld.log

Not the easiest thing to read, but you get the basic idea of what the problem is.

Slave can’t login to the Master

Any issues pertaining to log data not being read, or connections not being possible are most certainly due to login problems.

See if you yourself can login to your Master as the replication user if you’ve set one up. Log in like this:

mysql -h domain+or+ip -u replicationuser -p

If this isn’t possible then you need to look at your credentials on the Master. Perhaps the replication password is wrong, perhaps the user doesn’t have the “replication slave” privilege, or is not allowed to connect from your host.

You can always change your slave to use your root user momentarily to see if connections in general are possible. Note that this is not meant as a long term solution.

Master and Slave are out of sync

If you CAN login and your Slave still throws up errors then it’s usually because the Slave is trying to execute a command from the Master log and fails. For example, he may try to delete a database that doesn’t exist. Such things can happen if you’ve (accidentally) changed data on the Slave. That’s a big no-no.

It’s “read only”, all else is just a replica of the Master. Even though you can introduce new data onto the Slave, changing data that has been replicated from the Master will cause problems.

Sync problems can also occur when you didn’t restart MySQL on the Slave just after the big data import.

Lucky for us this is not a biggie though: simply stop the Slave, restart MySQL and give the your Slave the Master Log Coordinates manually (that’s why it’s handy to make a note of those).

Corrupt Data on the Slave

Sometimes tables or databases can get corrupt when they’re not properly closed. These things can happen on the Master – and as a result find their way onto the Slave, or something bad can happen on the Slave.

In either case, the Slave may not be able to process a statement and as a result will also stop and complain in his log.

To overcome this, you have two options:

skip the bad statement and carry on

repairing the corrupt tables and carry on

Let’s discuss how to tackle both.

Skip a Statement or two (not recommended)

This is the quick and dirty way of making the Slave ignore one (or several) Master log statements. This is not a problem if we’re dealing with disposable or temporary data – but it is a problem if you’re not (and frankly, who can really tell).

Be aware that skipping statements on the Slave will cause data inconsistencies – if these are noticeable or not depends entirely on your data and applications.

Here’s how you tell the Slave to skip the next 1 statement:

stop slave;
set global sql_slave_skip_counter = 1;
start slave;

You can skip as many events as you like by increasing the value. Once the Slave is running again, go back to the log and see if processing goes well. Use this approach with extreme caution!

Repairing your Tables and Databases

Ignoring a problem and hoping it doesn’t bite you later is never a good idea – repairing what’s broken on the other hand sounds like a good plan.

You can repair single tables or databases with the “repair table” and “repair database” commands respectively – but what happens when you have hundreds of databases containing several thousand tables? You’d sit there all night issuing commands.

Share this:

Related

Jay is the CEO and founder of WP Hosting, a boutique style managed WordPress hosting and support service. He has been working with Plesk since version 9 and is a qualified Parallels Automation Professional. In his spare time he likes to develop iOS apps and WordPress plugins, or draw on tablet devices.
He blogs about his coding journey at http://wpguru.co.uk and http://pinkstone.co.uk.

Removing a database will not delete those log files, it will merely add a log entry that you did it. Re-creating it with the same name in the future will also create a log entry, so as long as the slave does all the same steps it should not have a negative impact in the future. Good luck with the rebuild 😉