While the content of this book is still valid, the code may not run with the
latest versions of the tools and libraries. For an updated version of the
code, check the Riak Core Tutorial.

You know computers cannot be trusted, so we may want to run our commands on
more than one vnode and wait for a subset (or all of them) to finish before
considering the operation successful. To do this, when a command is run we
will send it to a number of vnodes, let’s call it N, and wait for a
number of them to succeed, let’s call it W.

To do this we will need to do something similar to what we did with coverage
calls: we will need to set up a process that sends the command to a number
of vnodes and accumulates the responses, or times out if it takes too long, and
then sends the result back to the caller. We will also need a supervisor for it,
and we will need to register this supervisor in our main supervisor tree.
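
The send-and-accumulate idea described above can be sketched without riak_core at all. This is only a minimal illustration, not the tanodb implementation: the module name quorum_sketch and the function run/4 are hypothetical, and the "vnodes" are plain funs that stand in for real vnode commands.

```erlang
-module(quorum_sketch).
-export([run/4]).

%% Hypothetical sketch: Workers is a list of funs standing in for the N
%% vnodes. Command is sent to all of them, then we wait until W of them
%% answer ok or until TimeoutMs milliseconds have passed.
run(Workers, Command, W, TimeoutMs) ->
    Parent = self(),
    Ref = make_ref(),
    %% One short-lived process per "vnode"; each reports its result back.
    [spawn(fun() -> Parent ! {Ref, Worker(Command)} end) || Worker <- Workers],
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    wait(Ref, W, length(Workers), Deadline, []).

%% Enough successes arrived: the operation is considered successful.
wait(_Ref, 0, _Pending, _Deadline, Acc) ->
    {ok, lists:reverse(Acc)};
%% Fewer replies are pending than successes still needed: give up early.
wait(_Ref, Needed, Pending, _Deadline, Acc) when Pending < Needed ->
    {error, {quorum_failed, lists:reverse(Acc)}};
wait(Ref, Needed, Pending, Deadline, Acc) ->
    Remaining = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {Ref, ok} ->
            wait(Ref, Needed - 1, Pending - 1, Deadline, [ok | Acc]);
        {Ref, Other} ->
            wait(Ref, Needed, Pending - 1, Deadline, [Other | Acc])
    after Remaining ->
        {error, timeout}
    end.
```

Replies that arrive after the quorum is reached stay in the caller's mailbox in this sketch; a dedicated process per request, as the text proposes, avoids that because the process simply exits once it has answered the caller.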

To implement quorum-based writes and deletes we will introduce two new modules:
a gen_fsm implementation called tanodb_write_fsm and its supervisor,
tanodb_write_fsm_sup. The supervisor is a simple supervisor behavior, so
I won’t go into details here other than observing that we add it to the
supervisor hierarchy as we did with the coverage supervisor. The gen_fsm is the
interesting one.

With quorum-based writes we are halfway there: our values are written to more
than one vnode, but if a node dies and another takes over its work, or if we add
a new node and the vnodes must be rebalanced, we need to handle handoff.

The reasons to start a handoff are:

A ring update event for a ring that all other nodes have already seen.

A secondary vnode is idle for a period of time and the primary, original
owner of the partition is up again.

When this happens riak_core informs the vnode that handoff is starting by
calling handoff_starting. If it returns false the handoff is cancelled. If it
returns true, riak_core calls is_empty, which must return false to indicate
that the vnode has something to hand off (it’s not empty) or true to indicate
that the vnode is empty. In our case we ask for the first element of the ets
table, and if it’s the special value '$end_of_table' we know the table is empty.

If is_empty returns true the handoff is considered finished. If it returns
false, a call is made to handle_handoff_command, passing as first parameter an
opaque structure that contains two fields we are interested in, foldfun and
acc0, which can be unpacked with a macro.

For each call to Fun(Key, Entry, AccIn0) riak_core will send the entry to the
new vnode. The data must be encoded before it is sent, and riak_core does this
by calling encode_handoff_item(Key, Value).

When the new vnode receives the value it must decode it and do something with
it. This happens in the function handle_handoff_data, where we decode the
received data and do the appropriate thing with it.
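
The encode and decode steps can be sketched as a matching pair. This is one possible approach, not necessarily the one tanodb uses: it assumes the Erlang external term format (term_to_binary/binary_to_term) as the encoding, assumes that "the appropriate thing" is inserting the pair into an ets table, and simplifies the vnode state to just the table id. The module name is hypothetical.

```erlang
-module(handoff_codec_sketch).
-export([encode_handoff_item/2, handle_handoff_data/2]).

%% Sending side: pack the key and value into a single binary.
encode_handoff_item(Key, Value) ->
    term_to_binary({Key, Value}).

%% Receiving side: unpack the binary and store the entry. State is
%% simplified (assumption) to the ets table id of the receiving vnode.
handle_handoff_data(BinData, State) ->
    {Key, Value} = binary_to_term(BinData),
    true = ets:insert(State, {Key, Value}),
    {reply, ok, State}.
```

Whatever encoding is chosen, the only hard requirement is that the decoder accepts exactly what the encoder produces, since both run the same vnode module on different nodes.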