
Riak Protocol Buffers Client Introduction

This document assumes that you have already started your Riak cluster. For instructions on that prerequisite, refer to Installation and Setup in the Riak Wiki. The Riak Erlang Client EDocs are also available as an API reference.

Dependencies

To build the riak-erlang-client you will need Erlang OTP R15B01 or later, and Git.

Debian

On a Debian-based system (Debian, Ubuntu, ...) you will need to make sure that certain packages are installed:

Storing New Data

Each bit of data in Riak is stored in a "bucket" at a "key" that is unique to that bucket. The bucket is intended as an organizational aid, for example to help segregate data by type, but Riak doesn't care what values it stores, so choose whatever scheme suits you. Buckets, keys and values are all binaries.

If you want the server to generate a key for you (similar to the REST API), pass the atom undefined as the second parameter to riakc_obj:new/3.
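As a minimal sketch, both cases can be written in the Erlang shell with riakc_obj:new/3. The variable name NoKeyObject is only illustrative; the bucket, key and value are the grocery example used throughout this section.

```erlang
%% Object with an explicit key.
Object = riakc_obj:new(<<"groceries">>, <<"mine">>, <<"eggs & bacon">>).

%% Object without a key: the atom undefined asks the server to generate
%% one when the object is stored.
NoKeyObject = riakc_obj:new(<<"groceries">>, undefined, <<"eggs & bacon">>).
```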

The Object refers to a key <<"mine">> in a bucket named <<"groceries">> with the value <<"eggs & bacon">>. Using the client you opened earlier, store the object:

5> riakc_pb_socket:put(Pid, Object).
ok

If the return value of the last command was anything but the atom ok (or {ok, Key} when you instruct the server to generate the key), then the store failed. The return value may give you a clue as to why the store failed, but check the Troubleshooting section below if not.

The object is now stored in Riak. put/2 uses default parameters for storing the object. There is also a put/3 call that takes a proplist of options.

| Option | Description |
|--------|-------------|
| {w, W} | The minimum number of nodes that must respond with success for the write to be considered successful. The default is currently set on the server at 2. |
| {dw, DW} | The minimum number of nodes that must respond with success *after durably storing* the object for the write to be considered successful. The default is currently set on the server at 0. |
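For illustration, a put/3 call that overrides both values might look like the following. The numbers 3 and 1 are arbitrary examples, and Pid and Object are assumed to be the connection and object from the earlier steps.

```erlang
%% Wait for 3 nodes to acknowledge the write, at least 1 of them
%% only after it has durably stored the object.
riakc_pb_socket:put(Pid, Object, [{w, 3}, {dw, 1}]).
```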

For reads, the {r, R} option sets the minimum number of nodes that must respond with success for the read to be considered successful.

If the data was originally stored using the distributed Erlang client (riak_client), the server automatically applies term_to_binary/1 to the value before sending it, with the content type set to application/x-erlang-binary (replacing any user-set value). The application is responsible for calling binary_to_term to access the content and calling term_to_binary when modifying it.

Modifying Data

Say you had the "grocery list" from the examples above, reminding you to get <<"eggs & bacon">>, and you want to add <<"milk">> to it. The easiest way is:

That is, fetch the object from Riak, modify its value with riakc_obj:update_value/2, then store the modified object back in Riak. You can get your updated object to convince yourself that your list is updated:
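A sketch of the full cycle in the shell, assuming Pid is the connection opened earlier and that the object holds a single value (no siblings). The variable names Oa, Ob and Oc are illustrative.

```erlang
%% Fetch the current object.
{ok, Oa} = riakc_pb_socket:get(Pid, <<"groceries">>, <<"mine">>).

%% Prepend <<"milk, ">> to the existing value.
Ob = riakc_obj:update_value(Oa, <<"milk, ", (riakc_obj:get_value(Oa))/binary>>).

%% Store the modified object back in Riak.
ok = riakc_pb_socket:put(Pid, Ob).

%% Fetch it again and inspect the stored value.
{ok, Oc} = riakc_pb_socket:get(Pid, <<"groceries">>, <<"mine">>).
riakc_obj:get_value(Oc).
```

If nothing else has written to the key in the meantime, the last call should return the combined list.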

Deleting Data

Throwing away data is quick and simple: just use the delete/3 function.

10> riakc_pb_socket:delete(Pid, <<"groceries">>, <<"mine">>).
ok

As with get and put, delete can also take options:

| Option | Description |
|--------|-------------|
| {rw, RW} | The number of nodes to wait for responses from. |

Issuing a delete for an object that does not exist just returns ok.
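As a hedged example, the option list is passed as a fourth argument (delete/4). The value 2 is arbitrary and Pid is assumed to be an open connection.

```erlang
%% Wait for responses from 2 nodes before the delete is reported as done.
riakc_pb_socket:delete(Pid, <<"groceries">>, <<"mine">>, [{rw, 2}]).
```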

Encoding

The initial release of the Erlang protocol buffers client treats all values as binaries. The caller needs to make sure data is serialized and deserialized correctly. The content type stored along with the object may be used to record the encoding. For example, a value serialized with term_to_binary can be stored with a content type that marks it as an Erlang term.
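The idea can be sketched with a small helper module that pairs each serialized value with a content type and refuses to decode anything else. The module name term_codec and its functions are hypothetical and not part of the client; the content type string is the one mentioned earlier for Erlang terms. The content type is handled here as a string, which is an assumption about how the application passes it around.

```erlang
-module(term_codec).
-export([encode/1, decode/2]).

%% Content type used to mark values that hold a serialized Erlang term.
-define(CTYPE, "application/x-erlang-binary").

%% Serialize any Erlang term; returns the content type to store with it.
encode(Term) ->
    {?CTYPE, term_to_binary(Term)}.

%% Deserialize only when the stored content type marks an Erlang term.
decode(?CTYPE, Value) when is_binary(Value) ->
    try
        {ok, binary_to_term(Value)}
    catch
        error:badarg -> {error, bad_term}
    end;
decode(CType, _Value) ->
    {error, {unknown_ctype, CType}}.
```

The pair returned by encode/1 would be stored through the riakc_obj value and content type setters, and decode/2 applied to what a later fetch returns.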

If resolution simply requires one of the existing siblings to be selected, this can be done through the riakc_obj:select_sibling function. This function updates the record with the value and metadata of the selected Nth sibling.

It is also possible to get a list of tuples representing all the siblings through the riakc_obj:get_contents function. This returns a list of tuples in the form {metadata(), value()} which can be used when more complex sibling resolution is required.

Once the correct combination of metadata and value has been determined, the record can be updated with these using the riakc_obj:update_value and riakc_obj:update_metadata functions. If the resulting content type needs to be updated, the riakc_obj:update_content_type function can be used.
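As one possible resolution strategy, the sketch below merges sibling values that are comma-separated lists, as in the grocery example. It works on the {metadata(), value()} tuples described above and uses only the standard library. The module name sibling_merge and the choice to keep the first sibling's metadata are assumptions made for illustration.

```erlang
-module(sibling_merge).
-export([resolve/1]).

%% Contents is a non-empty list of {Metadata, Value} tuples, where each
%% Value is a binary of items separated by <<", ">>.
%% Returns {Metadata, MergedValue} with duplicates removed, items sorted.
resolve([{FirstMD, _} | _] = Contents) ->
    Items = lists:flatmap(
              fun({_MD, Value}) -> binary:split(Value, <<", ">>, [global]) end,
              Contents),
    {FirstMD, join(lists:usort(Items))}.

%% Join a non-empty list of binaries with <<", ">>.
join([First | Rest]) ->
    lists:foldl(fun(Item, Acc) -> <<Acc/binary, ", ", Item/binary>> end,
                First, Rest).
```

The resulting metadata and value would then be applied to the object with riakc_obj:update_metadata and riakc_obj:update_value before storing it.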

Listing Keys

Most uses of key-value stores are structured in such a way that requests know which keys they want in a bucket. Sometimes, though, it's necessary to find out what keys are available (when debugging, for example). For that, there is list_keys:

1> riakc_pb_socket:list_keys(Pid, <<"groceries">>).
{ok,[<<"mine">>]}

Note that keylist updates are asynchronous to the object storage primitives, and may not be updated immediately after a put or delete. This function is primarily intended as a debugging aid.

list_keys/2 is just a convenience function around the streaming version of the call, stream_list_keys(Pid, Bucket).
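A sketch of how the streamed results could be collected by the calling process. It assumes the client delivers messages of the form {ReqId, {keys, Keys}} followed by {ReqId, done}, which should be checked against the client version in use. The module name key_stream and the 5 second timeout are illustrative.

```erlang
-module(key_stream).
-export([collect/1]).

%% Gather all key batches for the given request id, in arrival order.
collect(ReqId) ->
    collect(ReqId, []).

collect(ReqId, Acc) ->
    receive
        {ReqId, {keys, Keys}} ->
            collect(ReqId, [Keys | Acc]);
        {ReqId, done} ->
            {ok, lists:append(lists:reverse(Acc))};
        {ReqId, {error, Reason}} ->
            {error, Reason}
    after 5000 ->
        {error, timeout}
    end.
```

Usage would be along the lines of `{ok, ReqId} = riakc_pb_socket:stream_list_keys(Pid, <<"groceries">>)` followed by `key_stream:collect(ReqId)`.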

Bucket Properties

Bucket properties can be retrieved and modified using get_bucket/2 and set_bucket/3. The bucket properties are represented as a proplist. Only a subset of the properties can be retrieved and set using the protocol buffers interface: currently only n_val and allow_mult.
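A short shell sketch of both calls, assuming Pid is an open connection. The bucket name and the choice of allow_mult are examples only.

```erlang
%% Read the current properties (a proplist) and pick out n_val.
{ok, Props} = riakc_pb_socket:get_bucket(Pid, <<"groceries">>).
NVal = proplists:get_value(n_val, Props).

%% Allow siblings on this bucket.
ok = riakc_pb_socket:set_bucket(Pid, <<"groceries">>, [{allow_mult, true}]).
```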

User Metadata

User metadata are stored in the object metadata dictionary, and can be manipulated by using the get_user_metadata_entry/2, get_user_metadata_entries/1, clear_user_metadata_entries/1, delete_user_metadata_entry/2 and set_user_metadata_entry/2 functions.

These functions act upon the dictionary returned by the get_metadata/1, get_metadatas/1 and get_update_metadata/1 functions.
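A sketch of setting and reading one entry, assuming entries are {Key, Value} pairs of binaries. The entry name <<"shopper">> and the variable names are made up for this example.

```erlang
UObj = riakc_obj:new(<<"groceries">>, <<"mine">>, <<"eggs & bacon">>).

%% Work on the metadata that will be written with the object.
UMD0 = riakc_obj:get_update_metadata(UObj).
UMD1 = riakc_obj:set_user_metadata_entry(UMD0, {<<"shopper">>, <<"alice">>}).

%% Look the entry up again by key.
riakc_obj:get_user_metadata_entry(UMD1, <<"shopper">>).

%% Attach the modified metadata to the object before storing it.
UObj2 = riakc_obj:update_metadata(UObj, UMD1).
```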

Secondary Indexes

Secondary indexes are set through the object metadata dictionary, and can be manipulated by using the get_secondary_index/2, get_secondary_indexes/1, clear_secondary_indexes/1, delete_secondary_index/2, set_secondary_index/2 and add_secondary_index/2 functions. These functions act upon the dictionary returned by the get_metadata/1, get_metadatas/1 and get_update_metadata/1 functions.

When using these functions, secondary indexes are identified by a tuple, {binary_index, string()} or {integer_index, string()}, where the string is the name of the index. {integer_index, "id"} therefore corresponds to the index "id_int". As secondary indexes may have more than one value, the index values are specified as lists of integers or binaries, depending on index type.

The following example illustrates getting and setting secondary indexes.
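A minimal sketch, assuming Pid is an open connection. The bucket <<"test">>, the key, the value and the index names "age" and "name" are illustrative choices.

```erlang
IdxObj = riakc_obj:new(<<"test">>, <<"2i_1">>, <<"John Robert Doe, 25">>).
IdxMD0 = riakc_obj:get_update_metadata(IdxObj).

%% One integer index with a single value, one binary index with two values.
IdxMD1 = riakc_obj:set_secondary_index(
           IdxMD0,
           [{{integer_index, "age"}, [25]},
            {{binary_index, "name"}, [<<"John">>, <<"Doe">>]}]).

%% Read back the values of a single index.
riakc_obj:get_secondary_index(IdxMD1, {integer_index, "age"}).

%% Attach the metadata and store the object.
IdxObj2 = riakc_obj:update_metadata(IdxObj, IdxMD1).
riakc_pb_socket:put(Pid, IdxObj2).
```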

In order to query based on secondary indexes, the riakc_pb_socket:get_index/4, riakc_pb_socket:get_index/5, riakc_pb_socket:get_index/6 and riakc_pb_socket:get_index/7 functions can be used. These functions also allow secondary indexes to be specified using the tuple described above.

The following example illustrates how to perform exact match as well as range queries based on the record and associated indexes created above.
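A sketch of both query types, assuming an object in bucket <<"test">> that carries a binary index "name" containing <<"John">> and an integer index "age" containing 25. These names and values are assumptions for the example, and the exact shape of the result depends on the client version.

```erlang
%% Exact match on the binary index "name" (get_index/4).
riakc_pb_socket:get_index(Pid, <<"test">>, {binary_index, "name"}, <<"John">>).

%% Range query on the integer index "age", from 20 to 50 (get_index/5).
riakc_pb_socket:get_index(Pid, <<"test">>, {integer_index, "age"}, 20, 50).
```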

Links

Links are also stored in the object metadata dictionary, and can be manipulated by using the get_links/2, get_all_links/1, clear_links/1, delete_links/2, set_link/2 and add_link/2 functions. When using these functions, a link is identified by a tag, and may therefore contain multiple record IDs.

These functions act upon the dictionary returned by the get_metadata/1, get_metadatas/1 and get_update_metadata/1 functions.
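A sketch of adding and reading a link, assuming links are given as a list of {Tag, [{Bucket, Key}]} tuples. The bucket <<"people">>, the keys and the tag <<"friend">> are hypothetical.

```erlang
LObj = riakc_obj:new(<<"people">>, <<"john">>, <<"John Doe">>).
LMD0 = riakc_obj:get_update_metadata(LObj).

%% Tag <<"friend">> pointing at one other record.
LMD1 = riakc_obj:set_link(LMD0, [{<<"friend">>, [{<<"people">>, <<"jane">>}]}]).

%% All record IDs stored under the tag.
riakc_obj:get_links(LMD1, <<"friend">>).

%% Attach the metadata to the object before storing it.
LObj2 = riakc_obj:update_metadata(LObj, LMD1).
```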

MapReduce

MapReduce jobs can be executed using the riakc_pb_socket:mapred function. This takes an input specification as well as a list of mapreduce phase specifications as arguments. It also allows a non-default timeout to be specified if required.

The function riakc_pb_socket:mapred uses riakc_pb_socket:mapred_stream under the hood, and if results need to be processed as they are streamed to the client, this function can be used instead. The implementation of riakc_pb_socket:mapred provides a good example of how to implement this.

It is possible to define a wide range of inputs for a mapreduce job. Some examples are given below:
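The terms below sketch a few common input forms as they are understood to be accepted by riakc_pb_socket:mapred. The bucket, key and index names are placeholders, and the index forms in particular should be checked against the client version in use.

```erlang
%% Explicit list of {Bucket, Key} pairs.
KeyListInput = [{<<"mr">>, <<"first">>}, {<<"mr">>, <<"second">>}].

%% Every key in a bucket.
BucketInput = <<"mr">>.

%% Exact match on a secondary index.
IndexEqInput = {index, <<"mr">>, {binary_index, "idx"}, <<"id_1">>}.

%% Range query on a secondary index.
IndexRangeInput = {index, <<"mr">>, {integer_index, "idx"}, 1, 100}.
```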

The query is given as a list of map, reduce and link phases. Map and reduce phases are each expressed as tuples in the following form:

{Type, FunTerm, Arg, Keep}

Type is an atom, either map or reduce. Arg is a static argument (any Erlang term) to pass to each execution of the phase. Keep is either true or false and determines whether results from the phase will be included in the final value of the query. Riak assumes the final phase will return results.

FunTerm is a reference to the function that the phase will execute and takes any of the following forms:

{modfun, Module, Function} where Module and Function are atoms that name an Erlang function in a specific module.

{qfun,Fun} where Fun is a callable fun term (closure or anonymous function).

{jsfun, Name} where Name is a binary that, when evaluated in JavaScript, points to a built-in JavaScript function.

{jsanon, Source} where Source is a binary that, when evaluated in JavaScript, is an anonymous function.

{jsanon, {Bucket, Key}} where the object at {Bucket, Key} contains the source for an anonymous JavaScript function.
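Putting the pieces together, a two-phase query might be sketched as follows. It assumes Pid is an open connection, that the two keys exist in the <<"mr">> bucket, and that the server provides the riak_kv_mapreduce functions map_object_value and reduce_sort.

```erlang
%% Map phase extracts each object's value (results not kept);
%% reduce phase sorts the values (results kept).
Query = [{map, {modfun, riak_kv_mapreduce, map_object_value}, undefined, false},
         {reduce, {modfun, riak_kv_mapreduce, reduce_sort}, undefined, true}].

riakc_pb_socket:mapred(Pid,
                       [{<<"mr">>, <<"first">>}, {<<"mr">>, <<"second">>}],
                       Query).
```

Setting Keep to false on the map phase means only the sorted output of the reduce phase is returned.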

Below are a few examples of different types of mapreduce queries. These assume that the following test data has been created:

Test Data

Create two test records in the <<"mr">> bucket with secondary indexes and a link as follows:
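A minimal sketch of such test data, assuming Pid is an open connection. The keys, values, index name "idx" and link tag <<"link1">> are illustrative assumptions.

```erlang
%% First record: integer index "idx" plus a link to the second record.
O1 = riakc_obj:new(<<"mr">>, <<"first">>, <<"value1">>).
M1 = riakc_obj:get_update_metadata(O1).
M2 = riakc_obj:set_secondary_index(M1, [{{integer_index, "idx"}, [1]}]).
M3 = riakc_obj:set_link(M2, [{<<"link1">>, [{<<"mr">>, <<"second">>}]}]).
ok = riakc_pb_socket:put(Pid, riakc_obj:update_metadata(O1, M3)).

%% Second record: same index with a different value, no link.
O2 = riakc_obj:new(<<"mr">>, <<"second">>, <<"value2">>).
N1 = riakc_obj:get_update_metadata(O2).
N2 = riakc_obj:set_secondary_index(N1, [{{integer_index, "idx"}, [2]}]).
ok = riakc_pb_socket:put(Pid, riakc_obj:update_metadata(O2, N2)).
```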