MySQL replication in action - Part 3: all-masters P2P topology

In the previous article, we saw the basics of establishing replication from multiple origins to the same destination. By extending that concept, we can deploy more complex topologies, such as the point-to-point (P2P) all-masters topology, a robust and fast way of moving data.

Introduction to P2P all-masters topology

A P2P (point-to-point) topology is a kind of deployment where replication happens in a single step from the producer to the consumers. For example, in a master/slave topology, replication from the master (producer) reaches every slave (consumer) in one step. This is simple P2P replication. If we use a hierarchical deployment, where every slave connected to the master also replicates to one or more slaves, we have two-step replication (Figure 1). Similarly, in circular replication, we have as many steps as the number of nodes minus one (Figure 2).

Figure 1 - Hierarchical replication depth of processing

Figure 2 - Circular replication depth of processing
Why is this important? The number of steps affects performance, resilience, and, potentially, accuracy.

Performance depends on the number of steps. Before the final leaf of the topology graph gets the data, the data is replicated N times, once for each step. In Figure 1, host4 is updated twice as slowly as host2. In Figure 2, host4 is three times slower than host2, as it has to wait for two intermediate steps before the data reaches its tables.

Resilience, or the capacity to withstand failures, also depends on the number of intermediate steps. Intermediate masters are single points of failure (SPOF) that can break a branch of the topology graph, or the whole deployment. In this context, a master/slave deployment has one SPOF; the topology in Figure 1 has two, and the circular replication in Figure 2 has four.

Accuracy can differ depending on whether the data goes from master to slave directly or through one or more intermediaries. If data is applied and then extracted again, its chances of reaching the final destination unchanged depend on the intermediate masters having exactly the same configuration as their predecessors in the chain.

With multi-source replication, we can overcome the limitations of circular topologies and create a functionally equivalent deployment that has no SPOF and that is, by virtue of its direct connections, faster and potentially more accurate than its predecessors.

Figure 3 - All-masters P2P replication
An all-masters P2P topology is a lot like a fan-in topology, but with the number of nodes, masters, and slaves being the same. If all the nodes are fan-in slaves, and at the same time masters, every node can receive data from all the others while sending its own (Figure 4).

Figure 4 - All-masters P2P replication depth of processing
In an all-masters P2P topology, each node replicates to every other node. Compared to circular replication, this deployment requires more connections per node (it's a small price to pay) but the data flows faster and more cleanly, as the origin of each transaction is easier to track.

Deploying a P2P all-masters topology in MySQL 5.7

The procedure is the same as the one we saw for fan-in replication, but with a few differences:

Every node needs to be a master, and therefore it must have binary logs configured;

The procedure for connecting to the other nodes needs to be repeated for each node. In an N-node deployment, you will end up having, for each node, N-1 slave channels, as sketched below.
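For illustration only, here is what the channel setup on node #1 could look like in a four-node deployment. Host names, ports, and credentials are placeholders; the script mentioned below generates the real statements for us:

    -- On node #1: one channel per remote master (MySQL 5.7 syntax).
    CHANGE MASTER TO
        MASTER_HOST='node2', MASTER_PORT=3306,
        MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox',
        MASTER_AUTO_POSITION=1
    FOR CHANNEL 'node2';

    -- ... repeated with 'node3' and 'node4' ...

    START SLAVE FOR CHANNEL 'node2';
    START SLAVE FOR CHANNEL 'node3';
    START SLAVE FOR CHANNEL 'node4';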

We will repeat the installation that we used for FAN-IN, running the same script from mysql-replication-samples. The difference in invocation is that we ask for an ALL-MASTERS topology instead.

The procedure is similar, but since we are connecting all the nodes instead of just one, the list of operations is longer. As for FAN-IN, we have enabled GTID and crash-safe tables, and we have executed a grand total of 12 'CHANGE MASTER TO' statements. At the end of the installation, a test script tells us whether replication is working: it creates one table on each node, and then checks that every node has four tables.
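A minimal version of such a test, assuming a database named test and illustrative table names, could be:

    -- On node #1 (repeat on each node, changing the table name):
    CREATE TABLE test.table1 (id INT NOT NULL PRIMARY KEY);

    -- After replication catches up, every node should list four tables:
    SHOW TABLES FROM test;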

The output shows that each node has four tables: replication is working as expected. We can now have a look at the monitoring options, to see how useful and clear they are in this topology.
As we did for fan-in topologies, we load the Sakila database in one of the nodes, to get some differences, and then look at the GTID situation:
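The inspection itself is the same on every node; what changes is the amount of data to digest:

    -- Run on each node and compare the results:
    SELECT @@global.gtid_executed\G
    SHOW MASTER STATUS\G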

It's not a pretty sight. It's what we saw for fan-in, but multiplied by 4. Now we know that the price to pay for this efficient topology is an increase in monitoring complexity.
Let's have a look inside:
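For example, on node #1 we can examine one channel at a time, or get a compact overview of all channels from the performance_schema tables (channel names follow the convention used above):

    -- Status of a single channel:
    SHOW SLAVE STATUS FOR CHANNEL 'node2'\G

    -- Compact overview of all channels on this node:
    SELECT CHANNEL_NAME, SERVICE_STATE
        FROM performance_schema.replication_connection_status;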

These commands give only a partial view of replication in this deployment: they apply to node #1, where we see the status of its slave channels. We need to run them on every node to make sure that replication is healthy everywhere. As mentioned before, we have 12 channels to monitor: looking at one node only may give us a misleading picture.
And here we can see once more why it was a bad decision not to have a table for master status:

If we want to match GTID positions in master and slave, we need to get the value of Executed_Gtid_Set from SHOW MASTER STATUS, or the same information from @@global.gtid_executed, then find the GTID set containing the server_uuid belonging to this master within that long string, and finally extract the GTID sequence. The steps are listed below, followed by a sketch of a query that automates them.

Get the raw info: "18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3,1e629814-4119-11e5-85cf-aac6e218d3d8:1-119,226e3350-4119-11e5-8242-de985f123dfc:1-3,270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c:1-3"

Find the server UUID: "18fd3be0-4119-11e5-97cd-24acf2bbd1e4"

Find the relevant GTID: "18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3"

Extract the sequence number: "3"
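Assuming the set keeps the format shown above, the extraction can be scripted in SQL. This is a sketch, not the only possible approach:

    -- Remove the line breaks that may appear inside the GTID set:
    SET @gtids := REPLACE(@@global.gtid_executed, '\n', '');

    -- Isolate the element that starts with this server's own UUID:
    SET @own_gtid := SUBSTRING_INDEX(
        SUBSTRING(@gtids, LOCATE(@@server_uuid, @gtids)), ',', 1);

    -- The last number of the interval is the latest executed sequence:
    SELECT @own_gtid AS own_gtid,
           SUBSTRING_INDEX(SUBSTRING_INDEX(@own_gtid, ':', -1), '-', -1)
               AS latest_sequence;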

The information in mysql.slave_relay_log_info and performance_schema.replication_* tables will not help us to simplify the task of monitoring replication. All the shortcomings that we have noticed for fan-in are also present for all-masters topologies. The main difference is that the information in SHOW MASTER STATUS and SHOW SLAVE STATUS is more crowded.

Deploying a P2P all-masters topology in MariaDB 10

The installation is fairly similar to MySQL 5.7. The only differences are the syntax variations already noted for fan-in topologies.
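As a reminder of those differences, the channel name precedes the TO clause, and MariaDB uses its own GTID option instead of MASTER_AUTO_POSITION. Host names and credentials are again placeholders:

    -- MariaDB 10 multi-source syntax:
    CHANGE MASTER 'node2' TO
        MASTER_HOST='node2', MASTER_PORT=3306,
        MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox',
        MASTER_USE_GTID=current_pos;
    START SLAVE 'node2';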

After loading the Sakila database into node #2, we see a familiar pattern, already noted for fan-in: the GTID is shown as a comma-delimited list of all the data streams that have converged on each server.
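A quick check on any node shows that pattern; MariaDB GTIDs are domain-server-sequence triplets, one per data stream:

    -- MariaDB: current GTID position, one element per replication stream:
    SELECT @@global.gtid_current_pos;
    SELECT @@global.gtid_slave_pos;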

Summing up

Using the methods already learned for fan-in deployments, an all-masters P2P topology is easy to install, although the procedure is longer and more complex.
Monitoring this topology presents the same hurdles already seen for fan-in, multiplied by the number of connections. For an N-node deployment, we need to monitor N*(N-1) channels: 12 channels for our four nodes.
The lack of a table for master status is felt more acutely in this topology, as the current data is more difficult to parse.

What's next

This topology shows that we can deploy a very efficient multi-source replication system, at the expense of having many connections and enduring more complex monitoring data.
We can, however, strike a compromise between the need for many masters and the complexity of the deployment. We will see the star topology, where, by introducing a SPOF into the system, we can deploy a more agile all-masters topology. And we will also see some hybrid deployments, all made possible by the multi-source enhancements in MySQL 5.7 and MariaDB 10.