Introducing Doozer

There is no shortage of tools that provide one of these two properties
(consistency or high availability) but not the other. For example,
mysql, postgres, and redis provide
consistency but not high availability; riak, couchdb, and redis
cluster (when it’s ready) provide high availability but not
consistency. The only freely-available tool that attempts to provide
both of these is zookeeper, but zookeeper is specialized, focused on
locking and server management. We need a general-purpose tool that
provides the same guarantees in a clean, well-designed package.

Background

Soon after I started working at Heroku, Blake Mizerany got me involved
in designing a “distributed init” system, something that could manage
processes across multiple machine instances and recover gracefully
from single instance failures and network partitions. One necessary
building block for this is a network service that can provide
synchronization for its clients – in other words, locking.

When we started building this distributed process monitor, we quickly
realized that the lock service should be packaged as a separate tool.
In true Unix style, it strives to do one thing well. In fact, doozer
doesn’t actually provide locks directly. Its “one thing” is
consistent, highly-available storage; locks are a separate concern that
clients can implement on top of doozer’s primitives. This
makes doozer both simpler and more powerful, as we’ll see in a moment.

Locks Are Not Primitive Enough

Other similar systems (for example Chubby, ZooKeeper)
provide locks as a primitive
operation, alongside data storage. But that requires the lock service
to decide when the holder of a lock has died, in order to release the
lock. In practice this means that every client that wants to obtain a
lock must establish a session and send periodic heartbeat messages to
the server. This works well enough for some clients, but it’s not
always the most appropriate way to determine liveness. It’s
particularly troublesome when dealing with off-the-shelf software that
wasn’t designed to send heartbeat messages.

Doozer takes a different approach. It provides only data storage and
a single, fundamental synchronization primitive, compare-and-set.
This operation is complete (you can build any other synchronization
operation using it), but it’s simpler than a
higher-level lock, and it doesn’t require the server to have any
notion of liveness of its clients.
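To make that concrete, here is a minimal sketch in Go of a lock built on nothing but compare-and-set. The in-memory store and the file name /locks/db are stand-ins of my own invention, not doozer's API; a real client would issue the same compare-and-set calls through a doozer driver over the network:

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy in-memory stand-in for doozer's data store. Each file
// carries a revision number; a write succeeds only if the caller's
// oldRev matches the file's current revision (compare-and-set).
type store struct {
	mu   sync.Mutex
	vals map[string]string
	revs map[string]int64
}

func newStore() *store {
	return &store{vals: map[string]string{}, revs: map[string]int64{}}
}

// Set writes body to file only if oldRev matches the current revision.
// It returns the resulting revision and whether the write won.
func (s *store) Set(file, body string, oldRev int64) (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.revs[file] != oldRev {
		return s.revs[file], false // someone else got there first
	}
	s.vals[file] = body
	s.revs[file]++
	return s.revs[file], true
}

func main() {
	s := newStore()

	// A lock is just a convention layered on compare-and-set: whoever
	// moves the file from revision 0 ("missing") to revision 1 holds it.
	rev, ok := s.Set("/locks/db", "client-a", 0)
	fmt.Println(ok, rev) // true 1: client-a acquired the lock

	_, ok = s.Set("/locks/db", "client-b", 0)
	fmt.Println(ok) // false: client-b's CAS failed; the lock is taken

	// Release: client-a writes again using the revision it holds.
	_, ok = s.Set("/locks/db", "", rev)
	fmt.Println(ok) // true: released
}
```

Note that nothing here requires the server to track client liveness; deciding when a stale holder has died is left to whatever convention clients layer on top.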

In the future, doozer might ship with companion tools that provide
higher-level synchronization for users that want it, but these tools
will operate as regular doozer clients, and they’ll be completely
optional. If you don’t need their services, just don’t run them; if
you need something slightly different from what they provide, you are
free to make your own.

What is it good for?

How does doozer compare to other data stores out there? Redis is
amazingly fast, with lots of interesting data structures to play with.
HBase is amazingly scalable. Doozer isn’t particularly fast or
scalable; its claim to fame is high availability.

Doozer is where you put the family jewels.

The doozer readme has a few concrete examples, but consider
this one: imagine you have three redis servers – one master and two
slaves. If the master goes down, wouldn’t it be nice to promote one of
the slaves to be the new master? Imagine trying to automate that
promotion. You’ll have to get all the clients to agree which one to use.
It doesn’t sound easy, and it’s even harder than it sounds. If there is
a network partition, some of the clients might disagree about which
slave to promote, and you’d wind up with the classic “split-brain”
problem. If you want this promotion to work reliably, you have to use
a tool like doozer, which guarantees consistency, to coordinate the
fail-over.
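The promotion itself reduces to a single compare-and-set race. The sketch below is illustrative only: an in-memory file stands in for a doozer path such as /redis/master (a hypothetical name), and each "client" is a goroutine. Exactly one CAS from the initial revision can succeed, so every client that reads afterward sees the same new master, partition or not:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// casFile is a toy stand-in for a single doozer file: a value plus a
// revision, with compare-and-set as the only write operation.
type casFile struct {
	mu  sync.Mutex
	val string
	rev int64
}

// Set succeeds only if oldRev matches the current revision.
func (f *casFile) Set(body string, oldRev int64) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.rev != oldRev {
		return false
	}
	f.val, f.rev = body, f.rev+1
	return true
}

func (f *casFile) Get() (string, int64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.val, f.rev
}

func main() {
	master := &casFile{} // stands in for a path like /redis/master

	// Several clients race to promote "their" slave. Only one CAS
	// from revision 0 can win; the rest fail and read the winner.
	var wins int64
	var wg sync.WaitGroup
	for _, slave := range []string{"slave1", "slave2", "slave3"} {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			if master.Set(s, 0) {
				atomic.AddInt64(&wins, 1)
			}
		}(slave)
	}
	wg.Wait()

	winner, _ := master.Get()
	fmt.Println(wins, winner) // wins is always 1; all clients agree
}
```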

Using Doozer

We have client drivers for Go (doozer) and Ruby
(fraggle), with an Erlang driver in the works. The doozer
protocol documentation gives the nitty-gritty of talking to
doozer, but most of you will be interested in the interface provided
by the driver you’re using; they each have documentation.

The words “consistency” and “high availability” have several
reasonable definitions, and people too often use them without
saying exactly what they mean.

When I speak of consistency, I mean absolute prevention of
inconsistent writes. The phrase “eventual consistency” is a
synonym for “inconsistency”.

When I speak of high availability, I mean primarily the
ability to face a network partition and continue providing
write service to all clients transitively connected to a
majority of servers. Secondarily, I mean the ability to
continue providing read-only service to all clients connected
to any server.

Update: This is not the same as being “available” in the sense of
the frequently-cited and almost-as-frequently-misunderstood CAP
theorem. Coda Hale has a good discussion of the CAP theorem’s
applicability to real-world systems. In those terms, we choose
consistency over availability.