MongoDB Approach to database synchronization

I went to MongoSF today – quite an event, and I hope to have a chance to write more about it. This post is about one replication problem and how MongoDB solves it.

If you’re using MySQL Replication when your master goes down it is possible for some writes to be executed on the master, but not on the slave, which gets promoted to the master. When Master comes back up it has some updates done to it which cause it to be inconsistent with data on the new Master. In MySQL world we can chose to either ignore this problem (or may be even replay those changes on slaves and hope it works out), re-clone it from the slave or use mk-table-checksum to find inconsistencies and re-sync them with mk-table-sync. Both of these operations can be very expensive for large databases.

MongoDB approach used in Replication Sets is for failed master to scan its log files to find all object ids which were modified from the point slave synchronized successfully and retrieve those objects back from the new master (or delete them if they no more exist). Such approach allows quick synchronization without any complex support of rolling back changes. In MongoDB there is a catch with this approach – because there is no local durability this also works as long as network goes down but server stays up, however once Single Server Durability is implemented it will be pretty cool.

What is really interesting – it should be possible to apply the same concept to MySQL Replication, possibly with help of some tools like MMM. Row level Replication makes it possible to identify the objects which were changed on the Master after failover to Slave happened and they can be dumped to local file (in case one wants to synchronize them manually) and when fetched again from the master. This of course will require IDEMPOTENT slave mode but otherwise it should work unless you have DDL operations in between.

In general listening the great presentation on MongoDB Replication by Dwight Merriman as well as previously looking at how replication done in Redis I should say things can be done a lot more simple way when there is no schema and when you do not have to mess with complex features like triggers or multiple storage engines.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

When a MongoDB master goes it can lose data from any page in the database for which changes were in progress. Many deployments won’t be able to use it after recovery without restoring it from a backup or another master. There are many reasons for it to go down including the MongoDB software crashed, was killed (by a person, OOM killer or buggy script) or because of an unplanned hardware reboot.

Hopefully, single-server durability will fix this. The current approach only allows for quick synchronization when you don’t care about losing that data. That is an odd definition of ‘synchronization’.

I get that a MySQL master can fail before propagating some changes to slaves and that requires manual conflict resolution if the master is to be put back into service. But the following only happens when you configure it to happen: >>> When Master comes back up it has some updates done to it which cause it to be inconsistent with data on the new Master. >>>

Yes. I think single server durability is a serious problem for a lot of deployments. Though this is very close to MyISAM and there have been a lot of (and still are) applications which can work well in such cases. In many cases these people are just lucky – I’ve seen number of people which thought MyISAM is OK for them because they do not need to repair table but just switch to the slave until one day they lose power in their data center and has to spend long hours repairing tables.

Regarding loosing your data – I think this is design choice for MongoDB which I is good enough for many web applications. MyISAM or Innodb with innodb_flush_log_at_trx_commit=2 offers somewhat similar behavior – you can lose data you thought was stored in database for certain period of time.

In practice I think single server durability will be important step towards making MongoDB usable for number of application, no question here. I just found this synchronization idea brilliant.

I am at SFO waiting for a delayed flight. Thanks for keeping me active.

I have no doubt that MongoDB and MyISAM are useful. But I wonder whether enough users understand the crash behavior. And I look on with envy at the advanced replication features in MongoDB and other members of the N(ot)O(nly)SQL family.

MongoDB/MyISAM behavior on a crash is very different from innodb_flush_log_at_trx_commit=2. With that setting for InnoDB you can lose the transactions from the last second. The behavior for MongoDB/MyISAM allows you to lose pieces from any transaction ever committed (last week or last year). I think the window for MongoDB is worse because the database is accessed via mmap and there can be a crash in the middle of writing a page leading to a corrupt page. For MyISAM the crash occurs between page transfers so individual pages won’t be corrupted but the data structures (sets of pages) can be corrupted.

Yes… You’re right. Just this week I’ve been to the customer having several servers with 1.000.000 (!) of MyISAM tables and even no replication.

When I’m comparing MongoDB to innodb_flush_log_at_trx_commit=2 I’m comparing replicated setup – MongoDB guys do not recommend running single node in production. In this case if your master crashes you lose transactions for few last seconds (or less) same as with Innodb. You could argue of course power down will cause both database to be corrupted, on which MongoDB guys can argue RAID controller melt down will cause database corruption on single MySQL Innodb issue.

So the point is – you need replication to get durability similar to innodb_flush_log_at_trx_commit=2 and you will still get your database corrupted in case of power down