Wednesday, January 3, 2018

An update on Write Set (parallel replication) bug fix in MySQL 8.0

In my MySQL Parallel Replication session at Percona Live Santa Clara 2017, I talked about a bug in Write Set tracking for parallel replication (Bug#86078). At the time, I did not fully understand what was going wrong, but since then, we (engineers at Oracle and I) have understood what happened, and the bug should be fixed in MySQL 8.0.4. This journey taught me interesting MySQL behavior and bug reporting practices. In this post, I am sharing both, in addition to some insight on Write Set tracking for parallel replication.

Update: I wrote another post about Write Set: Write Set in MySQL 5.7: Group Replication.

Let's first start with a little note: even though this post is talking about a bug, my goal is not to bash the product but to use this bug to describe the inner workings of MySQL. Write Set is a new technology, and bugs/corner cases are often found in new stuff. My opinion is that Write Set is an awesome improvement for Parallel Replication! With this out of the way, let's dive into Write Set.

Write Set tracking for parallel replication

The Write Set is a new technology in MySQL 8.0 (also present in 5.7, but a little hidden: this will be the subject of another post) for improving parallel replication. It is a data structure in the MySQL server that allows widening the parallelism intervals (it also enables other things, but let's focus on Parallel Replication in this post). You can read more about parallelism intervals in my previous post: A Metric for Tuning Parallel Replication in MySQL 5.7.

MySQL 5.7 uses parallelism on the master to generate parallelism intervals. Without good parallelism on the master, intervals will be short and the slaves will not run much faster. The Write Set tracks actual transaction dependencies, which produces much wider intervals. Explaining in detail how Write Set works would need a complete post (more information can be found on the MySQL High Availability blog in the post Improving the Parallel Applier with Writeset-based Dependency Tracking by Vitor Oliveira), but the example below shows the important parts:
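The original example is not reproduced here, so below is a hypothetical Python simulation of Write Set dependency tracking for three single-row INSERTs executed one after the other in the same session. All table names, row keys, and sequence numbers are illustrative, and the write-set bookkeeping is a simplified model of what the server does:

```python
# Hypothetical sketch: three transactions, run sequentially on the master,
# each inserting a different primary key into table t.  With Write Set
# tracking, each transaction's last_committed points to the last transaction
# that wrote one of the same rows -- here none conflict, so all three keep
# the last_committed of a preceding unrelated transaction (say 100).
transactions = [
    ("INSERT INTO t VALUES (1)", {"t.PRIMARY:1"}),
    ("INSERT INTO t VALUES (2)", {"t.PRIMARY:2"}),
    ("INSERT INTO t VALUES (3)", {"t.PRIMARY:3"}),
]

history = {}          # row key -> sequence_number of its last writer
seq, base = 100, 100  # sequence_number 100 is the last earlier transaction

for stmt, rows in transactions:
    seq += 1
    # Depend on the newest transaction that wrote any of the same rows.
    last_committed = max(history.get(r, base) for r in rows)
    for r in rows:
        history[r] = seq
    print(f"{stmt!r}: sequence_number={seq} last_committed={last_committed}")
```

All three transactions end up with last_committed=100, so their parallelism intervals ([101,101], [101,102], [101,103]) overlap.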

As the intervals of the three INSERTs overlap, those three transactions can be run in parallel on slaves. Note that even though the transactions are executed in the same session on the master (so not in parallel), the Write Set is able to detect that they are non-conflicting. This is a great improvement for replication speed in MySQL 8.0.

Note: I am writing this section only with the information I had at the time; the full analysis is in the next two sections.

I discovered this bug in the case of a DELETE followed by an INSERT, where the INSERT creates a new row (new auto-increment value as the primary key) with the same UNIQUE KEY as the previously deleted row. This is a weird usage of MySQL, as normally this DELETE and INSERT would be done in a single transaction. Using two transactions means that a SELECT on the UNIQUE KEY could return zero rows (between the DELETE and the INSERT). This might be a bug in the usage of MySQL, but it is out of the scope of this post.

The problem with this DELETE followed by an INSERT in two different transactions is that the Write Set did not correctly detect the dependency between the transactions. The INSERT obviously depends on the DELETE, but I got the following in the binary logs (comments at the end added for clarity):

Transaction with sequence_number 113433 (the INSERT) should have last_committed set to 113432 (the DELETE), but its last_committed is something else (113398). This allows both transactions to be executed in parallel on slaves, where the INSERT might fail if it runs just before the DELETE. If this happens, replication on the slave breaks with a duplicate key error.
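To see why the wrong last_committed matters, recall the scheduling rule a multi-threaded slave applies with LOGICAL_CLOCK: a transaction may start while an earlier one is still running as long as the earlier transaction's sequence_number is greater than the later one's last_committed. A Python sketch of that rule using the numbers above (the DELETE's own last_committed of 113397 is an assumed value for illustration):

```python
# Simplified LOGICAL_CLOCK scheduling rule on a multi-threaded slave.
def can_run_in_parallel(txn_a, txn_b):
    """True if the later transaction may start while the earlier one runs."""
    earlier, later = sorted((txn_a, txn_b), key=lambda t: t["sequence_number"])
    return later["last_committed"] < earlier["sequence_number"]

delete_txn = {"sequence_number": 113432, "last_committed": 113397}  # assumed
insert_txn = {"sequence_number": 113433, "last_committed": 113398}  # buggy

# With the buggy last_committed (113398), the INSERT may start before the
# DELETE finishes, and can then hit a duplicate key error on the slave.
print(can_run_in_parallel(delete_txn, insert_txn))    # True

# With the correct dependency (last_committed = 113432), it could not.
insert_fixed = {"sequence_number": 113433, "last_committed": 113432}
print(can_run_in_parallel(delete_txn, insert_fixed))  # False
```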

I got this replication breakage many times during my tests. When it happened, restarting replication would solve the problem, as the DELETE would by then have completed (just before replication broke) and the INSERT would then succeed. This is still a bug in Write Set tracking for parallel replication, and it needed fixing. However, without more information, Oracle was not able to reproduce the bug, which made it hard to fix.

MySQL behavior

Above, I described a situation leading to a duplicate key error on the slave. I was (naively) assuming that a duplicate key error could only happen if the data was strictly the same. That assumption did not take case-insensitive collations into account.
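The original test is not shown here, but the behavior of a UNIQUE KEY on a column str under a case-insensitive (_ci) collation can be modeled with a toy Python class, where casefold() stands in for the collation-aware comparison:

```python
# Toy model of a UNIQUE KEY on column `str` under a case-insensitive
# collation: the index compares values after case folding, so 'd' and 'D'
# are equal even though the stored bytes differ.
class CaseInsensitiveUniqueIndex:
    def __init__(self):
        self._index = {}  # folded value -> stored value

    def insert(self, value):
        folded = value.casefold()  # stand-in for collation-aware comparison
        if folded in self._index:
            raise ValueError(f"Duplicate entry '{value}' for key 'str'")
        self._index[folded] = value

idx = CaseInsensitiveUniqueIndex()
idx.insert("d")
try:
    idx.insert("D")  # different bytes, same value under the collation
except ValueError as e:
    print(e)         # Duplicate entry 'D' for key 'str'
```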

So 'd' and 'D' cause a duplicate entry error, as the field str is in a UNIQUE KEY: even though 'd' and 'D' are not exactly the same value, their comparison returns equal because a case-insensitive collation is used. So a duplicate key error can happen even when the data is not strictly identical.

MySQL collation behavior combined with Write Set

Knowing the collation behavior of MySQL, let's look again at our Write Set bug:
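The binary log excerpt is not reproduced here, but the effect can be simulated in Python. Assume the pre-fix Write Set hashed the raw bytes of the key value (md5 below is just a stand-in for the server's actual hash function, and the sequence numbers are those from the earlier excerpt):

```python
import hashlib

def buggy_writeset_hash(key_value):
    # Pre-fix behaviour (sketch): hash the raw bytes, ignoring the collation.
    return hashlib.md5(key_value.encode()).hexdigest()

history = {}   # write-set hash -> sequence_number of its last writer
base = 113398  # assumed sequence_number of an unrelated earlier transaction

def commit(seq, stmt, key_value):
    h = buggy_writeset_hash(key_value)
    last_committed = history.get(h, base)
    history[h] = seq
    print(f"{stmt}: sequence_number={seq} last_committed={last_committed}")
    return last_committed

commit(113432, "DELETE ... WHERE str = 'd'", "d")
lc = commit(113433, "INSERT ... (str) VALUES ('D')", "D")
# The INSERT should depend on the DELETE (last_committed=113432), but the
# hashes of 'd' and 'D' differ, so it gets last_committed=113398 instead.
```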

So the second INSERT is not identified as depending on the DELETE. This is the bug we were looking for. To understand why this happens, we need to dive in the Write Set implementation.

The Write Set tracks the rows modified by each transaction. With this information, the parallelism intervals can be computed. Keeping the full rows in the Write Set would consume a lot of RAM and would make dependency tracking computationally expensive. To make dependency tracking compact and efficient, MySQL only stores hashes of the PRIMARY and UNIQUE KEYs of the rows modified by a transaction. By storing the hashes of previous transactions in the Write Set, it becomes easy to compute dependencies. Hash collisions can happen, but in that case a non-existent dependency is found, which is not a big problem.

However, hashing and case-insensitive collations do not play well together: if you hash 'e' and 'E', you end up with different values. This is the source of Bug#86078: the case-insensitivity (and accent-insensitivity) of the collation was lost when the hashing was done.
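A minimal illustration of the problem, and of the idea behind the fix: transform the value the way the collation compares it before hashing. Here casefold() approximates the collation transformation and md5 stands in for the server's real hash function; the actual server-side transformation is collation-specific:

```python
import hashlib

def hash_raw(value):
    # Pre-fix (sketch): hash the bytes of the key value as stored.
    return hashlib.md5(value.encode()).hexdigest()

def hash_collation_aware(value):
    # Post-fix idea (sketch): normalize the value as the collation would
    # compare it -- approximated here with case folding -- then hash.
    return hashlib.md5(value.casefold().encode()).hexdigest()

print(hash_raw("e") == hash_raw("E"))                          # False
print(hash_collation_aware("e") == hash_collation_aware("E"))  # True
```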

Bug reporting practices

While working on this bug, I did not notice that the field causing the UNIQUE KEY violation had a case difference. I was probably too focused on the duplicate key error, assuming the data was identical.

When I reported the bug, and because it happened on a production system with production data, I did not share the exact binary logs with Oracle. I shared a manually anonymized output of mysqlbinlog, in which I lost the case difference.

To give an example, let's take my last name: Gagné. If the bug above happened with "Gagné" and "gagné", sharing the exact information would have allowed Oracle to reproduce the problem. However, if we manually anonymize the data to "xyz" and "xyz", thinking it is the same because of the duplicate key violation, the case difference is lost. This is exactly what I did in my bug report, making it impossible for Oracle to reproduce the bug.

So my learnings on bug reporting practices are:

Data anonymization is hard,

and sometimes, it might be better to share the exact production data.

But sharing production data is not easy: the other party must be able to process that data with the necessary care, and agreements allowing the sharing of that data should be in place.