MySQL 5.7.6: It is easier to switch master now!

One of the primary objectives of MySQL replication is providing an easy failover process i.e. switching to a redundant system if the primary MySQL server fails. In MySQL this translates to switching to the most appropriate slave server in the eventuality of a failure of the master server.

The promotion of a candidate slave to become a new master is based on a lot of factors, undoubtedly the most important factor being that the chosen slave should be most up to date with the primary server (the old master) before we lost it! This blog explains how to use new features in MySQL 5.7.4 and later to make failover easier.

To find the most up to date slave server, a failover script looks at the set of transactions received by the slave and compares it with every other slave to find the one that has received the biggest set of transactions. There could be more sophisticated ways of doing this, for instance you could choose one preferred slave that you want to promote (perhaps it has a better hardware configuration, it has no filters, physical location etc) and make sure it receives every transaction that has been received by all the other slaves. For simplicity though, let’s narrow down our definition of the most appropriate slave to promote to be the one that is most up to date with the lost master.

How to failover using GTIDbasedreplication

To denote a set of transactions, MySQL uses GTID sets (a set of global transaction identifiers). To read more about GTIDs, you can refer to our official documentation or developer blogs. To find the set of transactions received by a MySQL slave server, you simply execute:

Execute this on every slave and compare the sets to find the slave with largest received transaction set- that’s your candidate slave for promotion. Let us call this slave the new master. Before you switch a slave to replicate from the new master, you earlier had to make sure all the transactions the slave has received are executed. In versions of MySQL prior to 5.7.4, you needed to do the following steps:

stop slave.

start slave to replicate until all received transactions are executed.

Note the removal of the wait function (wait_for_gtid_executed_set) to ensure that the received transactions are executed before you switch master. There is no need for that step anymore!

How to failover using non-GTIDbased replication

If you are not using GTIDs, though we highly recommend you should, you can still take advantage of the current improvements. This means you can change the following part of your failover script:

STOP SLAVE

SHOW SLAVE STATUS to get coordinates (Read_Master_Log_Pos, Master_Log_File)

START SLAVE UNTIL coordinates saved in step (2).

SELECT MASTER_POS_WAIT (coordinates,…).

CHANGE MASTER TO <new_master_def>.

START SLAVE.

to the following simpler steps:

STOP SLAVE IO_THREAD;

CHANGE MASTER TO <new_master_def>;

START SLAVE IO_THREAD.

Conclusion

Using the improvements in the newer versions of MySQL 5.7, failover becomes easier. Our effort to improve MySQL high availability continues and we remain committed to easing the processes. Please do try this, and as always let us know your feedback. You can also use mysqlfailover or MySQL Fabric that automate the failover process.