16.4.4 Troubleshooting Replication

If you have followed the instructions but your replication setup
is not working, the first thing to do is check the error
log for messages. Many users have lost time by not
doing this soon enough after encountering problems.

If you cannot tell from the error log what the problem was, try
the following techniques:

Verify that the master has binary logging enabled by issuing a
SHOW MASTER STATUS statement.
If logging is enabled, Position is nonzero.
If binary logging is not enabled, verify that you are running
the master with the --log-bin
option.

Verify that the master and slave both were started with the
--server-id option and that the
ID value is unique on each server.

Verify that the slave is running. Use
SHOW SLAVE STATUS to check
whether the Slave_IO_Running and
Slave_SQL_Running values are both
Yes. If not, verify the options that were
used when starting the slave server. For example,
--skip-slave-start prevents the
slave threads from starting until you issue a
START SLAVE statement.

Verify the privileges for the user being used for
replication on the master.

Check that the host name of the master is correct and that
you are using the correct port to connect to the master.
The port used for replication is the same as used for
client network communication (the default is
3306). For the host name, ensure that
the name resolves to the correct IP address.

Check that networking has not been disabled on the master
or slave. Look for the
skip-networking option in
the configuration file. If present, comment it out or
remove it.

If the master has a firewall or IP filtering
configuration, ensure that the network port being used for
MySQL is not being filtered.

Check that you can reach the master by using
ping or
traceroute/tracert
to reach the host.

If the slave was running previously but has stopped, the
reason usually is that some statement that succeeded on the
master failed on the slave. This should never happen if you
have taken a proper snapshot of the master, and never modified
the data on the slave outside of the slave thread. If the
slave stops unexpectedly, it is a bug or you have encountered
one of the known replication limitations described in
Section 16.4.1, “Replication Features and Issues”. If it is a bug, see
Section 16.4.5, “How to Report Replication Bugs or Problems”, for instructions on how to
report it.

If a statement that succeeded on the master refuses to run on
the slave, try the following procedure if it is not feasible
to do a full database resynchronization by deleting the
slave's databases and copying a new snapshot from the master:

Determine whether the affected table on the slave is
different from the master table. Try to understand how
this happened. Then make the slave's table identical to
the master's and run START
SLAVE.

If the preceding step does not work or does not apply, try
to understand whether it would be safe to make the update
manually (if needed) and then ignore the next statement
from the master.

If you decide that the slave can skip the next statement
from the master, issue the following statements:

mysql> SET GLOBAL sql_slave_skip_counter = N;
mysql> START SLAVE;

The value of N should be 1 if
the next statement from the master does not use
AUTO_INCREMENT or
LAST_INSERT_ID().
Otherwise, the value should be 2. The reason for using a
value of 2 for statements that use
AUTO_INCREMENT or
LAST_INSERT_ID() is that
they take two events in the binary log of the master.

If you are sure that the slave started out perfectly
synchronized with the master, and that no one has updated
the tables involved outside of the slave thread, then
presumably the discrepancy is the result of a bug. If you
are running the most recent version of MySQL, please
report the problem. If you are running an older version,
try upgrading to the latest production release to
determine whether the problem persists.