Tester's Digest

Today’s theme is failure injection testing, prominently featuring Netflix, which gave the world Chaos Monkey. In the off-topic section you will find a few fun email bugs.

Topic: Failure Injection

Why is fault injection testing important? Because “sooner or later, all complex systems will fail. It’s not a matter of if, it’s a matter of when.” Breaking things on purpose, at a time and in a way that is convenient, is far preferable to being surprised when they break on their own.

From Netflix come the well-known “Chaos Monkey” and the rest of the Simian Army,
in use since 2011 to randomly break your production system and see whether it is
in fact fault tolerant. This is not for the faint of heart.
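
To make the idea concrete, here is a minimal sketch of random failure injection against a redundant group of servers. It is not Netflix's code: the `Instance` class, the `unleash_monkey` function and the "at least one instance alive" health rule are all assumptions made up for illustration.

```python
import random

class Instance:
    """Hypothetical stand-in for one server in a redundant group."""
    def __init__(self, name):
        self.name = name
        self.alive = True

def service_healthy(instances, min_alive=1):
    # Assumed health rule: the service is up while at least min_alive instances run.
    return sum(i.alive for i in instances) >= min_alive

def unleash_monkey(instances, rng):
    """Terminate one randomly chosen live instance and return it (or None)."""
    live = [i for i in instances if i.alive]
    if not live:
        return None
    victim = rng.choice(live)
    victim.alive = False
    return victim

group = [Instance(f"web-{n}") for n in range(3)]
rng = random.Random(2011)  # seeded so that a run can be replayed
victim = unleash_monkey(group, rng)
print(f"terminated {victim.name}; healthy: {service_healthy(group)}")
```

Seeding the random generator is a deliberate choice here: the failure is random, but a run that uncovers a problem can be repeated exactly.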

The academic underpinning for Netflix’s failure testing was this paper on “Lineage-driven fault injection”. It is a technique for reasoning backwards from a correct system outcome to determine whether some combination of failures could have prevented it (if so, those are bugs). The paper describes a prototype called MOLLY.
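
The backwards reasoning can be illustrated with a toy example. Suppose a correct outcome is known to be reachable through several redundant "supports", each a set of message deliveries that is sufficient on its own. Any set of failures that breaks every support is a candidate worth injecting. The sketch below is a brute-force stand-in under those assumptions, not a description of how MOLLY is implemented; the function name and the message labels are hypothetical.

```python
from itertools import combinations

def candidate_faults(supports, max_faults):
    """Return minimal sets of components whose failure breaks every support."""
    components = sorted(set().union(*supports))
    found = []
    for size in range(1, max_faults + 1):
        for combo in combinations(components, size):
            faults = set(combo)
            if any(prev <= faults for prev in found):
                continue  # contains a smaller candidate, so it is not minimal
            if all(faults & support for support in supports):
                found.append(faults)
    return found

# Node C learns a value either via A->B->C or directly via A->C.
supports = [{"A->B", "B->C"}, {"A->C"}]
candidates = candidate_faults(supports, max_faults=2)
print(candidates)
```

No single lost message can prevent the outcome here, so the only candidates are pairs that cut both paths. Those pairs are the experiments to run; a candidate that really does prevent the outcome is a bug.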

Using errfs, a file system layer that simulates block corruption, read/write errors and out-of-space conditions, researchers found that distributed storage systems (including Redis, ZooKeeper, Cassandra, Kafka, RethinkDB, MongoDB, LogCabin, and CockroachDB) will silently corrupt data, lose data, or return unexpected errors, even though the fault is injected in a single node while the system is configured for redundancy. While this is bad news, having a new testing tool in addition to Jepsen.io is great!
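
The same three fault types can be tried out in miniature without a real file system layer. The sketch below wraps an in-memory file object and injects one fault at a time. It is not errfs and does not use its interface; `FaultyFile`, the fault names and the one-byte checksum record format are assumptions invented for this example.

```python
import errno
import io

class FaultyFile:
    """Wraps a binary file object and injects one configured fault (hypothetical helper)."""
    def __init__(self, raw, fault=None):
        self.raw = raw
        self.fault = fault  # None, "read_error", "corrupt", or "no_space"

    def read(self, size=-1):
        if self.fault == "read_error":
            raise OSError(errno.EIO, "injected read error")
        data = self.raw.read(size)
        if self.fault == "corrupt" and data:
            # Flip every bit of the first byte to simulate block corruption.
            data = bytes([data[0] ^ 0xFF]) + data[1:]
        return data

    def write(self, data):
        if self.fault == "no_space":
            raise OSError(errno.ENOSPC, "injected out-of-space error")
        return self.raw.write(data)

def load_record(f):
    """Toy storage read: one checksum byte followed by the payload."""
    blob = f.read()
    checksum, payload = blob[0], blob[1:]
    if checksum != sum(payload) % 256:
        raise ValueError("checksum mismatch")
    return payload

payload = b"hello"
stored = bytes([sum(payload) % 256]) + payload

ok = load_record(FaultyFile(io.BytesIO(stored)))
try:
    load_record(FaultyFile(io.BytesIO(stored), fault="corrupt"))
    detected = False
except ValueError:
    detected = True
print(ok, detected)
```

A reader that skipped the checksum would have returned the damaged bytes without complaint, which is the kind of silent corruption the researchers report.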

If you received this email directly then you’re already signed up, thanks! If
this issue was forwarded to you and you’d like to get one weekly, you can
subscribe at http://testersdigest.mehras.net