Riak KV Key/Value Modeling

While Riak enables you to take advantage of a wide variety of features
that can be useful in application development, such as Search, secondary indexes (2i), and Riak Data Types, Riak almost always performs best when you
build your application around basic CRUD operations (create, read,
update, and delete) on objects, i.e. when you use Riak as a “pure”
key/value store.

In this tutorial, we’ll suggest some strategies for naming and modeling
key/value objects in Riak. If you’d like to use some of Riak’s other
features, we recommend checking out the documentation for each of them
or consulting our guide to building applications with Riak for a better
sense of which features you might need.

Advantages of Key/Value Operations

Riak’s key/value architecture enables it to be more performant than
relational databases in many scenarios because Riak doesn’t need to
perform lock, join, union, or other operations when working with
objects. Instead, it interacts with objects on a one-by-one basis, using
primary key lookups.

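Primary key lookups store and fetch objects in Riak on the basis of
three basic locators: the key to which the object is assigned, the
bucket that houses the object and its key, and the bucket type that
determines the bucket’s properties.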

It may be useful to think of this system as analogous to a nested
key/value hash, as you would find in most programming languages. In
Ruby, for example, a hash simpsons could contain a key for each of the
available seasons, with each key mapping to a hash of that season’s
episodes:
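A minimal sketch of such a hash might look like the following. Only the season 4, episode 12 entry is needed for the lookup that follows; the other entries and the exact key names are included purely for illustration.

```ruby
# A nested hash: season names map to hashes of episode names and titles.
# Only a handful of episodes are listed here for illustration.
simpsons = {
  'season 1' => {
    'episode 1' => 'Simpsons Roasting on an Open Fire',
    'episode 2' => 'Bart the Genius'
  },
  'season 4' => {
    'episode 1' => 'Kamp Krusty',
    'episode 12' => 'Marge vs. the Monorail'
  }
}
```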

If we want to find out the title of an episode, we can retrieve it based
on hash keys:

simpsons['season 4']['episode 12']
# => "Marge vs. the Monorail"

Storing data in Riak is a lot like this. Let’s say that we want to store
JSON objects with a variety of information about every episode of the
Simpsons. We could store each season in its own bucket and each episode
in its own key within that bucket. Here’s what the URL structure would
look like (for the HTTP API):

GET/PUT/DELETE /buckets/<season>/keys/<episode number>
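As a rough sketch of how an application might build such paths, the hypothetical helper below maps a season and an episode number to a bucket/key path. It assumes the /buckets/<bucket>/keys/<key> path form of Riak's HTTP API and percent-encodes each segment; the helper name and the "season N" / "episode N" naming are assumptions made for illustration.

```ruby
require 'erb'

# Hypothetical helper: builds the HTTP path for one episode object.
# Each path segment is percent-encoded so that names containing spaces
# or slashes stay within a single segment.
def episode_path(season, episode)
  bucket = ERB::Util.url_encode("season #{season}")
  key = ERB::Util.url_encode("episode #{episode}")
  "/buckets/#{bucket}/keys/#{key}"
end

puts episode_path(4, 12)
```

The same path would be used for reading, writing, and deleting the object; only the HTTP verb changes.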

The most important benefit of storing Riak objects this way is that
these types of lookup operations are extremely fast. Riak doesn’t need
to search through columns or tables to find an object. If it knows the
bucket/key “address” of the object, so to speak, it can locate that
object just about as quickly with billions of objects in a cluster as
when the cluster holds only a handful of objects.

Overcoming the Limitations of Key/Value Operations

Using any key/value store can be tricky at first, especially if you’re
used to relational databases. The central difficulty is that your
application cannot run arbitrary selection queries like SELECT * FROM
table, and so it needs to know where to look for objects in advance.

One of the best ways to enable applications to discover objects in Riak
more easily is to provide structured bucket and key names for
objects. This approach often involves wrapping information about the
object in the object’s location data itself.

We could use markers such as timestamps or UUIDs by themselves or in
combination with other markers. For example, sensor data keys could be
prefixed with a sensor name such as sensor1_ or temp_sensor1_ followed
by a timestamp (e.g. sensor1_2013-11-05T08:15:30-05:00), or user data
keys could be prefixed with user_ followed by a UUID (e.g.
user_9b1899b5-eb8c-47e4-83c9-2c62f0300596).
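One way to keep such names consistent is to build them in a single place. The helpers below are a minimal sketch using only Ruby's standard library; the function names are hypothetical, and the choice of ISO 8601 timestamps simply follows the example key above.

```ruby
require 'securerandom'
require 'time'

# Hypothetical helpers for building structured key names.

# Sensor readings: "<sensor name>_<ISO 8601 timestamp>"
def sensor_key(sensor_name, time = Time.now)
  "#{sensor_name}_#{time.iso8601}"
end

# User records: "user_<UUID>"; a random UUID is generated if none is given
def user_key(uuid = SecureRandom.uuid)
  "user_#{uuid}"
end

reading_time = Time.new(2013, 11, 5, 8, 15, 30, '-05:00')
puts sensor_key('sensor1', reading_time)
puts user_key('9b1899b5-eb8c-47e4-83c9-2c62f0300596')
```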

Any of the above suggestions could apply to bucket names as well as key
names. If you were building Twitter using Riak, for example, you could
store tweets from each user in a different bucket and then construct key
names using a combination of the prefix tweet_ and then a timestamp.
In that case, all the tweets from the user BashoWhisperer123 could be
housed in a bucket named BashoWhisperer123, and keys for tweets would
look like tweet_<timestamp>.
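Because the key carries information about the object, an application can also read that information back out of the key. The sketch below builds a bucket/key pair for a tweet and recovers the timestamp from a key. The helper names are hypothetical, and the ISO 8601 UTC timestamp format is an assumption, since the scheme above does not prescribe one.

```ruby
require 'time'

# Hypothetical helper: the bucket is the username and the key is
# "tweet_" followed by an ISO 8601 UTC timestamp (an assumed format).
def tweet_location(username, time)
  { bucket: username, key: "tweet_#{time.getutc.iso8601}" }
end

# Recover the timestamp from a key, or return nil if the key
# doesn't follow the tweet_<timestamp> scheme.
def tweet_time(key)
  return nil unless key.start_with?('tweet_')

  Time.iso8601(key.delete_prefix('tweet_'))
rescue ArgumentError
  nil
end

posted_at = Time.utc(2013, 11, 5, 13, 15, 30)
location = tweet_location('BashoWhisperer123', posted_at)
puts location[:bucket]
puts location[:key]
puts tweet_time(location[:key]).year
```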

The possibilities are essentially endless and, as always, defined by the
use case at hand.

Object Discovery with Riak Sets

Let’s say that we’ve created a solid bucket/key naming scheme for a user
information store that enables our application to easily fetch user
records, which are all stored in the bucket users with each user’s
username acting as the key. The problem at this point is this: how can
the application know which user records actually exist?

One way to determine this is to list all keys in the
bucket users. This approach, however, is not recommended, because
listing all keys in a bucket is a very expensive operation that should
not be used in production. And so another strategy must be employed.

A better possibility is to use Riak sets to store lists of keys in a
bucket. Riak sets are a Riak Data Type that enables you to store
collections of unique binaries or strings in Riak. Unlike normal Riak
objects, Riak sets can be manipulated much like sets in most programming
languages, i.e. you can add and remove elements at will.

Going back to our user data example, instead of simply storing user
records in our users bucket, we could set up our application to store
each key in a set when a new record is created. We’ll store this set in
the bucket user_info_sets (we’ll keep it simple) and in the key
usernames. The following will also assume that we’ve set up a bucket type called
sets.

require 'riak'

# Assumes a Riak node reachable with the client's default settings
client = Riak::Client.new
set_bucket = client.bucket('user_info_sets')

# We'll make this set global because we'll use it
# inside of a function later on
$user_id_set = Riak::Crdt::Set.new(set_bucket, 'usernames', 'sets')

Now, let’s say that we want to be able to pull up all user records in
the bucket at once. We could do so by iterating through the usernames
stored in our set and then fetching the object corresponding to each
username:

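# Python client version of the same idea. Assumes `client` is a connected
# riak.RiakClient; user_id_set is the Python counterpart of the set above.
from riak.datatypes import Set

set_bucket = client.bucket_type('sets').bucket('user_info_sets')
user_id_set = Set(set_bucket, 'usernames')

# We'll create a generator that yields one Riak object per username
def fetch_all_user_records():
    users_bucket = client.bucket('users')
    user_id_list = list(user_id_set.reload().value)
    for user_id in user_id_list:
        yield users_bucket.get(user_id)

# We can retrieve the full list of Riak objects later on
list(fetch_all_user_records())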
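
To make the overall pattern concrete without a running cluster, here is a small in-memory sketch. A plain Ruby Hash stands in for the users bucket and Ruby's built-in Set stands in for the Riak set. None of this is Riak client code, and the class and method names are assumptions for illustration.

```ruby
require 'set'

# In-memory stand-in for the pattern: every write to the "users bucket"
# also records the key in a set, so that all records can be found later
# without listing the keys in the bucket.
class UserStore
  def initialize
    @users_bucket = {}    # stand-in for the users bucket
    @usernames = Set.new  # stand-in for the Riak set of keys
  end

  def create_user(username, record)
    @users_bucket[username] = record
    @usernames.add(username)
  end

  def delete_user(username)
    @users_bucket.delete(username)
    @usernames.delete(username)
  end

  # Fetch every record by iterating over the set of known keys
  def fetch_all_user_records
    @usernames.map { |username| @users_bucket[username] }
  end
end

store = UserStore.new
store.create_user('coderoshi', { 'email' => 'coderoshi@example.com' })
store.create_user('macintux', { 'email' => 'macintux@example.com' })
store.delete_user('macintux')
puts store.fetch_all_user_records.inspect
```

In this sketch both updates happen in one process. Against a real cluster, the write to the bucket and the update to the set would be separate operations, so an application would need to decide how to handle a failure between the two.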

Naming and Object Verification

Another advantage of structured naming is that you can prevent queries
for objects that don’t exist or that don’t conform to how your
application has named them. For example, you could store all user data
in the bucket users with keys beginning with the fragment user_
followed by a username, e.g. user_coderoshi or user_macintux. If an
object with an inappropriate key is stored in that bucket, it won’t even
be seen by your application because the application will only ever query
keys that begin with user_.
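
A minimal sketch of this kind of check follows. The pattern for valid usernames (letters and digits only) and all names in the code are assumptions, and a small Hash-backed class stands in for a real Riak bucket so that the example runs on its own.

```ruby
# Hypothetical guard: only keys that follow the "user_<username>" scheme
# are ever turned into a lookup.
USER_KEY_PATTERN = /\Auser_[A-Za-z0-9]+\z/

def valid_user_key?(key)
  USER_KEY_PATTERN.match?(key)
end

# `bucket` is anything that responds to `get(key)`; a Hash-backed
# stand-in is used below instead of a real Riak bucket.
def fetch_user(bucket, key)
  return nil unless valid_user_key?(key)

  bucket.get(key)
end

class FakeBucket
  def initialize(objects)
    @objects = objects
  end

  def get(key)
    @objects[key]
  end
end

users = FakeBucket.new({ 'user_coderoshi' => 'record 1', 'bogus' => 'stray object' })
puts fetch_user(users, 'user_coderoshi').inspect
puts fetch_user(users, 'bogus').inspect
```

The stray object under the key bogus is present in the stand-in bucket, but the application never asks for it because the key fails the check before any lookup is made.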

Bucket Types as Additional Namespaces

Riak bucket types have two essential functions:
they enable you to manage bucket configurations in an
efficient and streamlined way and, more importantly for our purposes
here, they act as a third namespace in Riak in addition to buckets and
keys. Thus, in Riak versions 2.0 and later you have access to a third
layer of information for locating objects if you wish.
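
To illustrate the idea of a third namespace, the sketch below models an object location as a type/bucket/key triple and uses an in-memory Hash as a stand-in for a cluster. The Location struct and the type names type_a and type_b are hypothetical; the path method assumes the /types/<type>/buckets/<bucket>/keys/<key> form of Riak's HTTP API.

```ruby
# Hypothetical value object for a full object location: type, bucket, key.
Location = Struct.new(:type, :bucket, :key) do
  # Path form assumed for Riak's HTTP API when a bucket type is given
  def path
    "/types/#{type}/buckets/#{bucket}/keys/#{key}"
  end
end

# An in-memory Hash keyed by Location stands in for a cluster.
store = {}
store[Location.new('type_a', 'users', 'coderoshi')] = 'object stored under type_a'
store[Location.new('type_b', 'users', 'coderoshi')] = 'object stored under type_b'

# Same bucket and key, different bucket types: two distinct objects
puts store.size
puts Location.new('type_a', 'users', 'coderoshi').path
```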

While bucket types are typically used to assign different bucket
properties to groups of buckets, you can also create named bucket types
that simply extend Riak’s defaults or multiple bucket types that have
the same configuration but have different names.

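A bucket type that only extends Riak’s defaults is created with an empty
set of properties, e.g. riak-admin bucket-type create <type>
'{"props":{}}', and must then be activated with riak-admin bucket-type
activate <type> before it can be used. Repeating these two commands with
different names yields several bucket types that share the default
configuration.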

Bucket Types Example

To extend our Simpsons example from above, imagine that we become
dissatisfied with our storage scheme because we want to separate the
seasons into good seasons and bad seasons (we’ll leave it up to you to
make that determination).

One way to improve our scheme might be to change our bucket naming
system and preface each bucket name with good or bad, but a more
elegant way would be to use bucket types instead. So instead of this URL
structure…