Various musings on mainly the development and technical side of MySQL.

Wednesday, April 13, 2011

Round-Robin Multi-Source in Pure SQL

With the addition of the new tables to implement crash-safe replication we also get access to replication information through the SQL interface. This might not seem like a big advantage, but it should not be taken lightly. To demonstrate the power of using this approach, I will show how to implement a multi-source round-robin replication described at other places (including our book). However, compared to the other implementations—where the implementation requires a client to parse the output of SHOW SLAVE STATUS—the twist is that the implementation is entirely done in the server, using pure SQL.

If you're familiar with replication, you know that a slave can just replication from a single master. The trick used to replicate from multiple master—this is usually called multi-source—is to switch between masters in a time-share fashion as illustrated in Figure 1. The schema used to pick the master to replicate can vary, but it is common to use a round robin schedule.

The steps necessary to switch master are:

Stop reading events from the master and empty the relay log. To stop reading events from the master, it is necessary to ensure that there are no outstanding events in the relay log before switching to another master. If this is not done, some will not be applied and will have to be re-fetched from the master.

Stop the I/O thread.

Wait for the events in the relay log to be applied.

Stop the SQL thread.

Save away the replication information.

Fetch the saved information about the next master to replicate from.

Change master using the new information.

Start the slave threads.

Simple, right? So, let's make an implementation! So, what pieces do we need?

To handle the periodic switching, we use an SQL event for executing the above procedure.

We need a table to store the state of each master. The table should contain all the necessary information for configuring the master, including the binlog position.

We need to be able to store what master we're currently replicating from.

We need two tables: a table my_masters to record information about the available masters and a table current_master that keeps information about the current master. The my_masters table will contain information on how to connect to the masters as well as the last seen position. We assume that the user and password information is stored in the table and won't save away that information when switching master. To store the current master being replicated from, We cannot use a user defined variable—because each invocation of an event spawns a new session—so we store this information in a table.

Switching masters

To be able to execute a CHANGE MASTER statement with the information we need, it would be perfect to use a prepared statement, but unfortunately, the CHANGE MASTER statement is one of those statements that cannot be used inside a prepared statement, so we have to build the statement dynamically. To make it easier, we create a change_master procedure that does the job of building, preparing, executing, and deallocating a prepared statement. We also allow the file name and position passed to be NULL, in which case we start replication without these parameters, essentially starting from the beginning of the masters binary log.

The last step is to create the event that switch master for us. As a specific feature, we implement the event handling so that we can add and remove rows from the my_masters table and the event will just pick the next one in order. To solve this, we use queries to pick the next one in order based on the index of the last used master and then an additional query to handle the case of a wrap-around with a missing table at index 1.

To allow the table to be changed while the events are executing, we place all the updates of our tables into a transaction. That way, any updates done to the table while the event is executing will not affect the logic for picking the next table.

There are some extra logic added to handle the case that there are "holes" in the index numbers: it is possible that there is no master with index 1 and it is possible that the next master does not have the next index in sequence. This also allow the server ID of the master to be used, but in the current implementation, we use a simple index instead.

SELECT idx INTO l_next_idx FROM my_masters
WHERE idx > (SELECT idx FROM current_master)
ORDER BY idx LIMIT 1;

Find the next master in turn. To handle that masters have been removed, we will pick the next one index-wise. Wrap-around is handled by using the default of 1 above.

SELECT idx INTO l_next_idx FROM my_masters
WHERE idx >= l_next_idx
ORDER BY idx LIMIT 1;

If we did a wrap-around, it might be the case that master with index 1 does not exist (the default for l_next_idx), so then we have to scan and find the first index that exists which is equal to or greater than l_next_idx.

1 comment:

How to use this with multiple databases on source DB and single DB on destination DB server. Multiple source DBs replicated to single DB. I am trying out this with replicate rewrite db option but its not working