How To Lose Data

1 September, 2010 7:22 am

As I mentioned in my last post, I’ve been getting increasingly annoyed at a lot of the flak that has been directed toward MongoDB over data-protection issues. I’m certainly no big fan of systems that treat memory as primary storage (with or without periodic flushes to disk) instead of a cache or buffer for the real thing. I’ve written enough here to back that up, but I’ve also written plenty about something that bugs me even more: FUD. Merely raising an issue isn’t FUD, but the volume and tone and repetition of the criticism are all totally out of proportion when there are so many other data-protection issues we should also worry about. Here are just a few ways to lose data.

Don’t provide full redundancy at all levels of your system. It’s amazing how many “distributed” systems out there aren’t really distributed at all. They leave users entirely vulnerable to the loss or extended unreachability of a single node, yet draw not one peep of protest from the people who are so quick to point the finger at systems that can at least survive that most common failure mode.

Be careless about non-battery-backed disk caches. If data gets stranded in a disk cache when the power goes out, it’s no different from data stranded in memory, and yet many projects do absolutely nothing to detect, let alone correct for, obvious problems in this area.
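One small step toward detecting this on Linux is to ask the kernel whether it believes a device has a volatile write-back cache. The sketch below assumes a Linux kernel recent enough to expose /sys/block/<device>/queue/write_cache, which reads either "write back" or "write through"; the function name is hypothetical. It only reports what the kernel believes: it cannot tell whether a cache is battery-backed, and a drive that misreports its cache will fool it.

```python
from pathlib import Path


def volatile_write_cache_enabled(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the kernel reports a write-back (volatile) cache for device."""
    # Assumption: a Linux kernel exposing <sysfs_root>/<device>/queue/write_cache.
    mode = (Path(sysfs_root) / device / "queue" / "write_cache").read_text().strip()
    if mode not in ("write back", "write through"):
        raise ValueError(f"unexpected write_cache value: {mode!r}")
    return mode == "write back"
```

For example, volatile_write_cache_enabled("sda") on a typical SATA disk with its cache enabled. When it returns True, durability depends on the kernel’s cache-flush requests actually reaching the device, which is exactly what turning off barriers gives up.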

Be careless about data ordering in the kernel. My colleagues who work on local filesystems and pieces of the block-device subsystem in Linux (and others working on other OSes) have done a great deal of too-little-appreciated work to provide the very highest levels of data safety that they can without sacrificing any more performance than necessary. Then folks who preach the virtues of append-only files without knowing anything at all about how they work turn around and subvert all that effort by giving mount-command and fstab-line examples that explicitly put filesystems into async mode, turn off barriers, etc.
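To make the point concrete, here is a minimal sketch of a checker that scans fstab-style lines for a few options that weaken flush or ordering guarantees. The option set is an assumption covering ext3/ext4-style spellings (nobarrier, barrier=0, data=writeback); exact names and availability vary by filesystem and kernel version, and the function name is hypothetical.

```python
# Illustrative set of ext3/ext4-style options that weaken flush or ordering
# guarantees; names and availability vary by filesystem and kernel version.
RISKY_OPTIONS = {
    "nobarrier",       # don't issue cache-flush/FUA requests for journal commits
    "barrier=0",       # same effect as nobarrier
    "data=writeback",  # file data may reach disk after the metadata pointing to it
}


def risky_mount_options(fstab_text: str) -> list[tuple[str, str]]:
    """Return (mount point, option) pairs for risky options in fstab-style text."""
    findings = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # incomplete entry: need device, mount point, type, options
        mount_point, options = fields[1], fields[3].split(",")
        findings.extend((mount_point, opt) for opt in options if opt in RISKY_OPTIONS)
    return findings
```

The same function can be fed the contents of /proc/mounts, whose first four fields have the same layout.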

A special case of the previous point is when people actually do seem to know the options that assure data protection, but forgo those options for the sake of getting better benchmark numbers. That’s simply dishonest. You can’t claim great performance and great data protection if users can only really get one or the other depending on which options they choose. Pick one, and shut up about the other.

Be careless about your own data ordering. A single I/O operation can require several block-level updates. Many overlapping operations can create a huge bucket of such updates, conflicting in complex ways and requiring very careful attention to the order in which the updates actually occur. If you screw it up just once (and it takes a special brand of arrogance to believe that could never happen to you), you corrupt data. If you corrupt metadata, you might well lose the user data it points to. If you corrupt user data, that can be even worse than losing it, because there are security implications as well: it’s not nice when some of your confidential data becomes part of somebody else’s file/document/whatever. At least with mmap-based approaches, it’s fairly straightforward to use msync, fork, and hypervisor/filesystem/LVM snapshots to guarantee that the state on disk remains consistent even if it’s not absolutely current.
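A familiar small-scale instance of getting the order right is replacing a whole file on a POSIX system: write a temporary file, fsync it, rename it over the original, then fsync the directory. The sketch below assumes a filesystem where os.replace is an atomic rename and a directory can be opened and fsynced (true on Linux, not on Windows); the ".tmp" suffix and the function name are hypothetical.

```python
import os


def atomic_write(path: str, data: bytes) -> None:
    """Replace path with data so a crash leaves either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical naming convention for the temp file
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # Step 1: make the new data durable before it is visible under the real name.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Step 2: atomically point the real name at the new file.
    os.replace(tmp_path, path)
    # Step 3: make the rename itself durable by syncing the containing directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Swapping the steps is exactly the kind of mistake described above: if the rename could reach disk before the file’s data, a crash might leave the real name pointing at an empty or partial file.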

Don’t provide any reasonable way to take a backup, which would protect against the nightmare scenario where data is lost not because of a hardware failure but because of a bug or user error that makes your internal redundancy irrelevant.
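As one illustration of what a reasonable way can look like, SQLite (not discussed in this post; used here only because Python’s standard library exposes its online backup API as sqlite3.Connection.backup) can copy a database to a separate file as a consistent snapshot. The function name and paths are hypothetical; the property that matters is that the copy lives outside the store’s own redundancy, so a later bug or bad DELETE in the live database can’t reach it.

```python
import sqlite3


def backup_database(src_path: str, dst_path: str) -> None:
    """Copy a (possibly in-use) SQLite database to dst_path as a consistent snapshot."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # With the default pages=-1, all pages are copied in a single step.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```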

Of course, some of these issues won’t apply to Your Favorite Data Store, e.g., if it doesn’t have a hierarchical data model or a concept of multiple users. Then again, the list is also incomplete, because the real point I’m making is that there are plenty of data-protection pitfalls and plenty of people falling into them. Some of the loudest complainers already had to suspend their FUD campaign to deal with their own data-corruption fiasco. Others are vulnerable to having the same thing happen (I can tell by looking at their designs or code), but those particular chickens haven’t come home to roost yet.

Look, I laughed at the “redundant coffee mug” joke too. It was funny at the time, but that was a while ago. Since then it’s been looking more and more like junior-high-school cliquishness, poking fun at a common target as a way to fit in with the herd. It’s not helping users, it’s not advancing the state of the art, and it’s actively harming the community. As one of the worst offenders once had the gall to tell me, be part of the solution. Find and fix new data-protection issues in whichever projects have them, instead of going on and on about the one everybody already recognizes.

It’s nice to get a detailed response from a real storage computer scientist… rather than from so-called ‘storage experts’.

I would add that if you are really nutty about losing storage data, you might also worry about hardware errors that silently corrupt your data along the way: corrupted memory (which ECC memory protects against), network byte errors (the 16-bit TCP/IP checksum can let some errors escape), etc. A filesystem like ZFS that computes checksums on blocks ensures that the block in memory is the same as the one written to disk…

Good point, Kashif. Having filesystem checksums and/or T10 DIF is also a good way to ensure integrity. With increasing storage densities, *not* having them should therefore be on the list of ways to lose data.
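A minimal sketch of the idea behind block checksums: record a checksum for every block when it is written and verify it on every read, so silent corruption is reported instead of returned. CRC32 from zlib stands in for the stronger checksums real filesystems use, and the 4096-byte block size and all names are assumptions. ZFS additionally stores each block’s checksum in the parent block pointer rather than next to the data, which lets it catch misdirected and phantom writes that a checksum stored alongside the block would miss.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size


class ChecksumError(Exception):
    """Raised when a block's contents no longer match its stored checksum."""


def write_blocks(data: bytes) -> list[tuple[bytes, int]]:
    """Split data into blocks, pairing each with the CRC32 computed at write time."""
    blocks = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        blocks.append((block, zlib.crc32(block)))
    return blocks


def read_blocks(blocks: list[tuple[bytes, int]]) -> bytes:
    """Reassemble data, verifying every block before trusting it."""
    out = bytearray()
    for index, (block, stored) in enumerate(blocks):
        if zlib.crc32(block) != stored:
            raise ChecksumError(f"block {index}: checksum mismatch")
        out += block
    return bytes(out)
```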