Basho Blog

Understanding riak_core: Building Handoff

At Erlang Factory 2015, I presented a talk entitled “How to build applications on top of riak_core.” I wanted to give this talk because there is a serious lack of “one-stop” documentation for riak_core. In particular, implementing handoffs has been underdocumented and poorly publicized. To help, I have a few blog posts to share.

In my first post, Understanding riak_core: Handoff, I explored some background, defined handoff, and answered the question “should I use riak_core?” In this post, we’ll walk through building an application that uses riak_core as its foundation.

Build your first riak_core application

Building an application on riak_core means leveraging the powerful toolset that makes writing this code easier. You will benefit from using rebar: a self-contained script designed to minimize the amount of build configuration work you have to do. One of the lesser-known talents of the rebar build tool is its ability to stamp out new Erlang/OTP applications from a set of template files. Conveniently, Basho has a set of template files available for riak_core applications, which you will find in the rebar_riak_core GitHub repository.

On line 2, the code takes the Erlang “now” counter value and passes it to riak_core’s built-in consistent hash function. On line 3, we take the hash value and use it to build a “preference list”: a list of one or more {Node, VnodeType} tuples. On line 4, we use the Node value to send a command to the (possibly remote) vnode to execute the ping method. Because this is a synchronous command, the call blocks until the vnode returns a result or the call times out. That should give you a decent idea of how to implement other vnode-based API calls. (One detail omitted here is writing replicas to different vnodes. If you’re interested in that topic, see the example code in the udon application, which shows how to implement it using a finite state machine, along with the code in the vnode callback module that actually writes bits to disk.)
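The three steps in that walk-through (hash a value, build a preference list, dispatch a command) can be illustrated without riak_core at all. The following is a simplified, standalone sketch and not riak_core's own implementation: the module name ping_sketch, the function names, and the {PartitionIndex, Node} ring representation are all assumptions made for illustration, and erlang:unique_integer/0 stands in for the “now” counter.

```erlang
-module(ping_sketch).
-export([ping/1, hash_key/1, preflist/3]).

%% Hash any term onto a 160-bit ring using SHA-1.
hash_key(Term) ->
    <<Index:160/integer>> = crypto:hash(sha, term_to_binary(Term)),
    Index.

%% Ring is a list of {PartitionIndex, Node} pairs sorted by PartitionIndex
%% (a hypothetical, simplified representation).
%% Walk clockwise from the hash and take the first N partitions,
%% wrapping around at the end of the ring.
preflist(Hash, N, Ring) ->
    {Before, After} =
        lists:partition(fun({Idx, _Node}) -> Idx =< Hash end, Ring),
    lists:sublist(After ++ Before, N).

%% Hash a changing value, pick one partition, and "send" it a ping.
ping(Ring) ->
    Hash = hash_key({<<"ping">>, erlang:unique_integer()}),
    [{Index, Node}] = preflist(Hash, 1, Ring),
    %% A real application would dispatch a command to the vnode process
    %% here; this sketch only reports which partition was chosen.
    {pong, Index, Node}.
```

Because the hashed value changes on every call, repeated pings are spread across the ring's partitions instead of always landing on the same one.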

Writing handoff code

Although a vnode handoff can occur in several different scenarios, every type of handoff is implemented by the same code. Fortunately, riak_core does a lot of the hard work for you: it manages all of the network connections, keeps track of which keys and values it has transmitted, and so forth. Implementation therefore comes down to a few questions:

Is a vnode empty? (If a vnode is empty, we don’t need to do any handoffs over the network.)

How do we collect the data for each key and value?

How do we serialize the vnode data?

How do we deserialize the vnode data at the receiver?

If you already know the OTP gen_server behavior, implementing handoffs will feel familiar. Handoffs are implemented by writing a series of function callbacks. Let’s look at the callbacks as defined in the demoapp we created above.

Those are the callbacks which need to be implemented. As you can see, in the demoapp they’re just stubs which return valid (but likely incorrect) values. If you’re using riak_core as a mechanism to distribute work among a set of workers (and don’t need to worry about vnode migration) then these stubs are all you need to have for your application.
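For reference, the handoff-related stubs in a freshly generated application look roughly like the sketch below. The module name demoapp_vnode_stubs is hypothetical, and the riak_core_vnode behaviour declaration and the non-handoff callbacks are left out so that the module compiles on its own. Treat the return values as the usual template defaults, not as a definitive listing.

```erlang
-module(demoapp_vnode_stubs).
-export([handle_handoff_command/3, handoff_starting/2, handoff_cancelled/1,
         handoff_finished/2, handle_handoff_data/2, encode_handoff_item/2,
         is_empty/1, delete/1]).

%% Ignore any command that arrives while a handoff is in progress.
handle_handoff_command(_Message, _Sender, State) ->
    {noreply, State}.

%% Agree to start every handoff that is requested.
handoff_starting(_TargetNode, State) ->
    {true, State}.

handoff_cancelled(State) ->
    {ok, State}.

handoff_finished(_TargetNode, State) ->
    {ok, State}.

%% Accept incoming data but store nothing.
handle_handoff_data(_Data, State) ->
    {reply, ok, State}.

%% Serialize every object to an empty binary.
encode_handoff_item(_ObjectName, _ObjectValue) ->
    <<>>.

%% Always claim to be empty, so no data is ever transferred.
is_empty(State) ->
    {true, State}.

delete(State) ->
    {ok, State}.
```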

The bare minimum

Most of you who have read this far are interested in writing handoff code to move data from one (physical) node to another. So let’s take a look at the handle_handoff_command/3 callback. Here’s the function head from my udon application.

Whoa. What’s that ?FOLD_REQ macro?
How do I implement magic/2?!
Let’s break this down step by step before we fill in the details.

As noted above, at the big-picture level, we need a way to find all of the objects (that is, each key and each value) that a particular vnode owns. That’s what the object_list() function is supposed to do. By the way, object_list() is not a callback supplied by riak_core; it’s a function you need to write yourself. (The name of the function in the fold parameters is not important. What matters is that it returns a list of the keys to fold over.)

Next, we need a way to take each object and serialize it. That’s the purpose of the encode_handoff_item/2 function callback.

After that, we need to send that data over the wire to the (probably remote) node. And that’s what the mysterious VisitFun() in the function head does. More on VisitFun() in a moment, but for now, it’s part of riak_core that handles all of the messy details around network connections, sockets, and pumping serialized data out on the wire.
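To make the shape of that fold concrete, here is a minimal standalone sketch. It is not the udon code. The local fold_req record is a stand-in for the record behind riak_core's ?FOLD_REQ macro, which a real vnode would get by including riak_core_vnode.hrl. The vnode state is assumed to be a plain map of keys to values, object_list/1 takes that state as an argument, and the visit function is assumed to take a key, a value, and an accumulator. The inner Visit fun plays the role of magic/2.

```erlang
-module(fold_sketch).
-export([handle_handoff_command/3, demo/0]).

%% Stand-in for the fold request that riak_core passes in (an assumption
%% made so this module compiles without riak_core).
-record(fold_req, {foldfun, acc0}).

%% State is a map of Key => Value (an assumption for this sketch).
handle_handoff_command(#fold_req{foldfun = VisitFun, acc0 = Acc0},
                       _Sender, State) ->
    %% For each key, look up its value and hand both to VisitFun,
    %% threading the accumulator through every call.
    Visit = fun(Key, AccIn) ->
                    Value = maps:get(Key, State),
                    VisitFun(Key, Value, AccIn)
            end,
    Final = lists:foldl(Visit, Acc0, object_list(State)),
    {reply, Final, State}.

%% Return every key this vnode owns, in a stable order.
object_list(State) ->
    lists:sort(maps:keys(State)).

%% Drive the fold with a visit function that just collects what it sees.
demo() ->
    Collect = fun(Key, Value, Acc) -> [{Key, Value} | Acc] end,
    Req = #fold_req{foldfun = Collect, acc0 = []},
    {reply, Visited, _State} =
        handle_handoff_command(Req, ignored, #{a => 1, b => 2}),
    lists:reverse(Visited).
```

In this sketch the vnode never needs to know what the visit function does with each object. It only has to call it once per object and return the final accumulator.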

And on the receiving end, we need a way to deserialize and store the incoming vnode data. That is the purpose of handle_handoff_data/2.
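The serialize and deserialize steps are easiest to see as a round trip. The sketch below assumes the vnode state is a plain map and uses term_to_binary/1 and binary_to_term/1 as the wire format. Both are illustrative choices for this example, and the module name handoff_codec_sketch is hypothetical. It also shows an is_empty/1 that reports whether there is anything to hand off.

```erlang
-module(handoff_codec_sketch).
-export([is_empty/1, encode_handoff_item/2, handle_handoff_data/2]).

%% State is a map of Key => Value (an assumption for this sketch).

%% An empty map means there is nothing to send over the network.
is_empty(State) ->
    {maps:size(State) =:= 0, State}.

%% Sender side: pack one key and its value into a binary.
encode_handoff_item(Key, Value) ->
    term_to_binary({Key, Value}).

%% Receiver side: unpack the binary and store the object locally.
handle_handoff_data(Data, State) ->
    {Key, Value} = binary_to_term(Data),
    {reply, ok, maps:put(Key, Value, State)}.
```

Whatever format you pick, the only hard requirement is that handle_handoff_data/2 can decode exactly what encode_handoff_item/2 produced.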

So at a bare minimum, you must write four callbacks to implement vnode handoffs. These are:

is_empty/1

encode_handoff_item/2

handle_handoff_data/2

handle_handoff_command/3

Next Steps

At this point, you should have a good understanding of the basics of handoff, as coordinated by riak_core, and of the steps involved in implementing it. In my final post in this series, we will delve into the mysterious world of VisitFun().