How Does Semisynchronous MySQL Replication Work?

With the recent release of Percona XtraDB Cluster, I am increasingly being asked about MySQL’s semi-synchronous replication. I find that there are often a number of misconceptions about how semi-synchronous replication really works. I think it is very important to understand what guarantees you actually get with semi-synchronous replication, and what you don’t get.

The first thing to understand is that despite the name, semi-synchronous replication is still asynchronous. Semi-synchronous is actually a pretty bad name, because there is no strong coupling between a commit on the master and a commit on the replicas. To understand why, let’s look at what truly synchronous replication means. In truly synchronous replication, when you commit a transaction, the commit does not complete until all replicas have also committed successfully. In MySQL’s semi-synchronous replication, the commit completes before the transaction is even sent to any of the replicas. Therefore, by definition the transaction cannot have committed on any of the replicas. If there’s any problem after the commit happens on the master, it’s possible that the replicas won’t get the transaction, and even after they do, there’s no guarantee they can apply and commit it successfully themselves (duplicate key error, anyone?). If any of these problems happens, it’s too late–the commit is already permanent on the master, and can’t be rolled back.

What should semi-synchronous replication be called instead? I believe that it should be called delayed-acknowledgment commits, because this is what actually happens. When a transaction commits on the master, the commit proceeds as normal, and the transaction is sent to the replicas as normal, but the client connection to the master is not told that the commit has completed until after at least one replica has acknowledged receiving the transaction.
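To make that ordering concrete, here is a minimal Python sketch of a delayed-acknowledgment commit. It is a toy model and not MySQL code: the `Master` and `Replica` classes, their methods, and the trace strings are all hypothetical names invented for illustration, and timeouts are not modeled.

```python
class Replica:
    """Toy replica: it only records what it has received."""

    def __init__(self):
        self.received = []

    def receive(self, txn):
        self.received.append(txn)
        return True  # acknowledges receipt only, not a commit


class Master:
    """Toy master that records the order of events during a commit."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.committed = []
        self.trace = []

    def commit(self, txn):
        # 1. The commit happens on the master first and is already permanent.
        self.committed.append(txn)
        self.trace.append("committed on master")
        # 2. The transaction is sent to the replicas as normal.
        acks = [replica.receive(txn) for replica in self.replicas]
        self.trace.append("sent to replicas")
        # 3. Only the reply to the client waits for at least one acknowledgment.
        if any(acks):
            self.trace.append("one replica acknowledged receipt")
        self.trace.append("client told commit is complete")
        return self.trace


master = Master([Replica(), Replica()])
for step in master.commit("txn-1"):
    print(step)
```

The point of the sketch is only the order of the trace: the master's commit is the first event, and the acknowledgment affects nothing except when the client hears back.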

Another way to look at the same thing is that semi-synchronous replication actually forces the client to be synchronized, not the replicas. The client is forced to wait until the transaction has been sent to one of the replicas, but the commit on the master is not forced to wait at all, nor are replicas forced to do anything. The commit has already happened on the master, so the cat’s out of the bag and there’s no way to force replicas to do anything. As a result, the effect is that the client’s activity is throttled so that it cannot outpace the replica’s ability to fetch updates from the master. Have you seen the bumper sticker that says “don’t drive faster than your Guardian Angel can fly?” That is the effect of this throttling.

Semi-synchronous replication also does not guarantee that your replicas will not become delayed. The client connection is forced to wait until at least one of the replicas has retrieved the transaction, but not until the transaction has actually been applied to the replica. As you probably know, it is perfectly possible to send a very long transaction to the replica in a matter of milliseconds. The replica will take a long time to apply this transaction to its own data, and during that time, it will be delayed relative to the master. However, other transactions can continue committing and sending their changes to the replica, because retrieving changes from the master and applying them run in separate threads on the replica.
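The gap between "received" and "applied" can be illustrated with a small Python sketch. This is a simplified, single-threaded model under stated assumptions: the `LaggingReplica` class, the `apply_cost` numbers, and the time budget are hypothetical and stand in for the separate retrieval and apply threads on a real replica.

```python
from collections import deque


class LaggingReplica:
    """Toy replica that acknowledges on receipt and applies later."""

    def __init__(self):
        self.relay_log = deque()  # received but not yet applied
        self.applied = []

    def receive(self, txn, apply_cost):
        # Retrieval side: store the event and acknowledge immediately.
        self.relay_log.append((txn, apply_cost))
        return "ack"

    def apply_for(self, time_budget):
        # Apply side: work through the queue until the budget runs out.
        while self.relay_log and self.relay_log[0][1] <= time_budget:
            txn, cost = self.relay_log.popleft()
            time_budget -= cost
            self.applied.append(txn)

    def backlog(self):
        return [txn for txn, _ in self.relay_log]


replica = LaggingReplica()
replica.receive("long-transaction", apply_cost=60)
replica.receive("short-1", apply_cost=1)
replica.receive("short-2", apply_cost=1)

replica.apply_for(time_budget=10)  # not enough time to finish the long one
print("acknowledged but not applied:", replica.backlog())
```

All three transactions were acknowledged the moment they arrived, so the clients on the master were released, yet the replica has applied none of them.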

Finally, semi-synchronous replication does not provide strong guarantees against data loss. What do I mean by a strong guarantee against data loss? I consider the safety of my data strongly guaranteed when at least one other server must have a copy of the data before it can be committed on the master. However, that is not what happens in semi-synchronous replication. And if there is an error in semi-synchronous replication, such as a crash at the wrong moment, or a timeout, then even the throttling is abandoned, and everything defaults back to the traditional mode of replication.

What does semi-synchronous replication guarantee me then? If there are no errors or timeouts, then the guarantee is essentially that only one transaction per client is likely to be lost if the master crashes.

I do not mean to sound negative, or to send the message that semi-synchronous replication is not useful. It is useful, but if you misunderstand it, you could be relying on a strong guarantee that is not actually provided.

If you want to learn more about this, then I encourage you to read the relevant section of the MySQL manual. But read carefully, for example, the following sentences:

“When a commit returns successfully, it is known that the data exists in at least two places (on the master and at least one slave). If the master commits but a crash occurs while the master is waiting for acknowledgment from a slave, it is possible that the transaction may not have reached any slave.”

Finally, I would be interested to hear how many people are actually running semi-synchronous replication in production. I have a feeling that very few people are, even though a lot of people seem to have heard about it. What are your experiences?

Comments (24)

Baron – This “feature” has been on my watch list for a while. In addition to the points you made, one should note that the master has a timer when waiting on the replica to write the transaction to its relay logs.

“If semisynchronous replication is enabled on the master side and there is at least one semisynchronous slave, a thread that performs a transaction commit on the master blocks after the commit is done and waits until at least one semisynchronous slave acknowledges that it has received all events for the transaction, *or until a timeout occurs*.”

The default timeout is 10 seconds, which seems REALLY long for a client to wait. I think a second is plenty unless your replica is on the other side of the planet.
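As a rough illustration of how the timeout interacts with the fallback to ordinary replication described in the post, here is a small Python sketch. It is a model of the behavior as described here, not of the server's code: the function `client_wait` and its parameters are hypothetical names, and the only figure taken from the discussion is the 10-second default.

```python
DEFAULT_TIMEOUT_MS = 10_000  # the 10-second default mentioned above


def client_wait(ack_delay_ms, timeout_ms=DEFAULT_TIMEOUT_MS):
    """Return (how long the client waits in ms, whether semi-sync stays on).

    ack_delay_ms is the time until the first replica acknowledgment,
    or None if no replica ever answers.
    """
    if ack_delay_ms is not None and ack_delay_ms <= timeout_ms:
        return ack_delay_ms, True
    # Timeout: the client is released and the throttling is abandoned.
    return timeout_ms, False


print(client_wait(3))                        # nearby replica answers quickly
print(client_wait(None))                     # no replica answers at all
print(client_wait(2_500, timeout_ms=1_000))  # slow replica, 1-second timeout
```

A shorter timeout bounds how long a client can stall, but it also makes the fallback to plain asynchronous replication more likely, which is the trade-off being debated here.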

Wow, you really dig into stuff when someone asks a question. I would just have said that Galera is so much easier to set up than semi-sync. Galera is easier than MySQL replication, which is quite an achievement, and semi-sync has at least 2x the manual steps of classic MySQL replication.

Dave: otoh, if you want to use semi-sync for HA, you really should configure it with a really high time-out so that it cannot lose your transactions. I don’t know…

With Enhanced Semi Sync Replication the data is committed locally only after slave acknowledges it. This makes more sense as committed data can’t be read from master by other connections when it is not yet propagated. However it can cause another problem of slave having transaction committed when master have never committed it, if there is a problem on commit.

I think it is not so bad to have an asynchronous replication. And the semi synchronous replication is not so bad. If you want to have synchronous replication you could use distributed transactions instead. And there we have similar problems …

MySQL Replication is great and this is what made MySQL so successful in Web world. It is relatively simple low overhead on master and works with great distances and large networks. The problems come when people start to look to build truly highly available solutions with replication which also require consistency and no data loss.

Semi-synchronous replication can in fact be enough for many applications, but it needs to be well understood that by design it does not guarantee no data loss.

Semi-sync has a few features that made it useful to one deployment (one large & busy deployment):
* it makes it much more likely that fewer transactions will be lost when a master disappears
* it throttles busy clients to run no faster than master-slave networking

These are not guarantees but they still made things better when the inevitable failures occurred, as fewer transactions would be lost. I will distinguish between protecting data (old transactions) and protecting recent transactions. We already have great protection for old transactions (backup, etc). Semi-sync provides better protection for recent transactions.

I like the idea of Enhanced Semi-Sync but suspect that the implementation will make commit stalls even worse as it probably increases the duration of the time for which prepare_commit_mutex is held. This needs to be integrated with the changes for group commit so that we can do:
1) get many transactions into state PREPARED on the master
2) write the PREPARED transactions into the binlog
3) wait for one slave to ACK all of them
4) commit on the master

The race in this case is that if there is a problem on the master the slave has the transactions. I would promote the slave to the master in that case. There is also a race if the master blows up and then the slave is unable to apply the transactions. I think that is unlikely and am willing to perform manual intervention to recover in that case.
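The four proposed steps and the first race described above can be sketched in Python by asking who holds the transactions if the master dies after a given number of completed steps. This is only an illustration of the proposal as worded here: `PROPOSED_STEPS` and `state_after_crash` are hypothetical names, and the model assumes the slave holds the transactions exactly when its ACK step has completed.

```python
PROPOSED_STEPS = [
    "prepare on master",
    "write prepared transactions to binlog",
    "wait for one slave to ACK",
    "commit on master",
]


def state_after_crash(steps_completed):
    """Who holds the transactions if the master dies after N completed steps."""
    done = PROPOSED_STEPS[:steps_completed]
    return {
        "in_master_binlog": "write prepared transactions to binlog" in done,
        "on_slave": "wait for one slave to ACK" in done,
        "committed_on_master": "commit on master" in done,
    }


for n in range(len(PROPOSED_STEPS) + 1):
    print(n, state_after_crash(n))
```

The interesting row is a crash after three steps: the slave has the transactions and the master has not committed them, which is the case where promoting the slave is suggested.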

Now I just need to figure out if Kristian and MariaDB think this is interesting.

> However it can cause another problem of slave having transaction committed
> when master have never committed it, if there is a problem on commit.

> The race in this case is that if there is a problem on the master the slave
> has the transactions.

This should not be possible. The transaction is prepared and written to binlog
before being sent to the slave. The prepare ensures that the commit will not
fail, and the prepare + write to binlog ensures that XA recovery will commit
the transaction later if master crashes before it can commit. So slave will
never get commits that don’t exist on the master.

The question should be about whether the enhanced semi-sync patch breaks it. The google patch did not. Slaves don’t get to ack until they read it from the master’s binlog and the binlog is not written until commit is done (PREPARE and COMMIT for InnoDB) on the master. I assume enhanced semi-sync follows the same sequence but I have not looked at that code.

I believe that with asynchronous replication (even with semi-synchronous) you will get the problem of nodes getting out of sync, where either the slave is ahead or the master has data that the slave does not. This goes back to the Two Generals' Problem, I believe: http://en.wikipedia.org/wiki/Two_Generals%27_Problem

The solution for this is to maintain some cluster “state” where a node has to commit suicide if it encounters a problem while applying changes the cluster has already agreed to.

This is theory, though, which rests on 100% guarantees of data consistency. While that is possible in theory, in practice software bugs, human mistakes, and complicated faults mean everything can fail and data loss can occur, so we should just be working with probabilities and chances. For some cases MySQL replication (even without semi-sync) is just fine.

I agree, it is all about probability. Semi-sync can lose data in theory and in practice. Sync replication only loses it in practice (from bugs and people making mistakes). It is more interesting to talk about this in terms of the chance of losing transactions but that data takes time to collect.

There is another way to lose transactions — performance degradation. Some implementations of sync replication will also lose transactions by not allowing them to commit because the extra latency cannot be tolerated by the application or because optimistic concurrency control leads to too many aborts. This is much more likely for workloads with hotspots. I hope that most of these problems can be resolved by redesigning applications.

I think you make a very good point: a benefit of Percona XtraDB Cluster, or for that matter generic Galera + generic InnoDB, is said to be that it is an existing technology that you know and love, with no need to learn a lot of new skills. But the fact that it uses optimistic concurrency control in a distributed fashion is a change to be aware of. We need real-world experience to see how this works out.

At the same time I always like to remind people, that actually using Galera in a multi-master mode is an option, or bonus if you will. If you are worried (or have actual issues) about the optimistic locking, you can always just fall back to writing to a single node. That will give you exactly the same behavior as a single node MySQL, including same performance, and more robust and simple HA than any of the other alternatives that currently exist. To use multi-master capabilities for scale-out is just an option on top of that.

We are currently analyzing different existing solutions to secure our web architecture, which uses 3 MySQL production servers, and I read your article, which I found very interesting.
Like you, I would really be interested to hear from people who have put semi-sync replication in production on large MySQL databases, so don’t hesitate to contact us to share your thoughts and experience 🙂
For the moment, after evaluating semi-sync replication (lots of problems combining it with master/master circular replication and HAProxy load balancing queries in front of the MySQL servers) and other open source solutions (Tungsten / Galera), I’m still wondering if there is any that is reliable and provides data integrity, high availability and easy failover with MySQL?

Thousands of people have many years of experience with MySQL replication. A few people have months of experience with Tungsten and Galera (a.k.a. Percona XtraDB Cluster), and I am sure we will learn a lot more after they mature more, but in principle they have fewer inherent obstacles to being reliable and robust.

This is not correct. With semi-sync the protocol is:
1) commit to master
2) wait for a slave to ack it
3) return to the user

“Basically, what semi-synchronous replication does is ensuring that a transaction/event has been written to at least one slave’s relay log and flushed to disk before doing the commit on the master node.”

“If semisynchronous replication is enabled on the master side and there is at least one semisynchronous slave, a thread that performs a transaction commit on the master blocks after the commit is done and waits until at least one semisynchronous slave acknowledges that it has received all events for the transaction, or until a timeout occurs.”
..
“The slave acknowledges receipt of a transaction’s events only after the events have been written to its relay log and flushed to disk.”

Hi Baron. I’ve been looking for an asynchronous replication solution for MySQL and this looks like what I need, but I would like your help clarifying a couple of doubts I have:

1) According to what I read, semisynchronous replication guarantees that at least 1 slave will have the transaction committed. Does this mean that semisynchronous replication becomes synchronous for at least 2 servers (the master and 1 slave)?