Saturday, August 9, 2014

Bug marketing - fixing MongoDB replication bugs in TokuMX

Asynchronous replication is lossy by nature. It is also useful in production especially for replication across a WAN where the latency for one or two network roundtrips between replicas is too much. Latency is reduced when replicas are nearby but that greatly increases cost from using more replicas. In MySQL land we have work-in-progress that uses enhanced semi-sync replication and binlog archivers to make the binlog 2-safe within a datacenter and greatly reduce the chance that a committed transaction is lost (some people call this lossless). Even better, it can be done without running another local replica so it has a minor impact on the cost of the tier and the network latency overhead from it is very low.

In a perfect world we could always afford the network latency and/or extra hardware cost from synchronous replication. But production would be much less fun in a perfect world. Even for systems that have implemented sync replication I am skeptical that the system behaves as advertised until it has aged in production for a few years

Replication can also be lossy in practice because of bugs. MySQL replication slaves were not crash-safe for too long because the replication state was maintained in a file that wasn't kept consistent with InnoDB on crash recovery. This was first fixed in the Google patch for MySQL via rpl_transaction_enabled. It was fixed many years later in upstream MySQL.

Reputations from bugs and missing features can linger long after the bugs have been fixed and the features have been implemented. MySQL suffered from this because of MyISAM but some of the damage was self inflicted. Compare the claims about integrity for atomic operations & MyISAM in an older version of the manual with the current version.

MongoDB had a MyISAM moment with the initial version of its storage engine that was not crash safe. Fortunately they rallied and added journalling. I think they are having another MyISAM moment with replication. The TokuMX team has an excellent series of posts and a great white paper describing performance and correctness problems that exist in MongoDB and have been fixed in TokuMX. The problems include slow elections and unexpected data loss on master failover.

The post by Aphyr is also worth reading. While there read the posts on other systems. A lot of innovative correctness testing & debugging was done to produce those results.

Will these problems be fixed before the reputation sticks? This is a good reason to consider TokuMX.