Connecting To Riak

There are two supported ways to connect to Riak: the HTTP interface & the
Protocol Buffers interface. Both provide the same API & full access to
Riak.

The HTTP interface is easier to set up & is well suited for development use. It
is the slower of the two interfaces, but if you are only making a handful of
requests, it is more than capable.

The Protocol Buffers interface (also called protobuf) is more difficult to set
up but is significantly faster (2-3x), which makes it more suitable for
production use and better suited to a higher number of requests.

To use the HTTP interface and connect to a local Riak node on the default port,
no arguments are needed:

import riak
client = riak.RiakClient()

The constructor also accepts configuration options such as host, port &
prefix. Please refer to the :doc:`client` documentation for full details.

As mentioned, Riak can also handle binary data, such as images, audio files,
etc. Storing binary data looks almost identical to storing JSON data:

import riak
client = riak.RiakClient()
user_photo_bucket = client.bucket('user_photo')
# For example purposes, we'll read a file off the filesystem, but you can get
# the data from anywhere. The ``with`` block makes sure the file is closed.
with open('/tmp/johndoe_headshot.jpg', 'rb') as photo_file:
    the_photo_data = photo_file.read()
# We're storing the photo in a different bucket but keyed off the same
# username.
new_user = user_photo_bucket.new_binary('johndoe', data=the_photo_data,
                                        content_type='image/jpeg')
new_user.store()

Getting Single Values Out

Storing data is all well and good, but you'll need to get that data out at a
later date.

Riak provides several ways to get data out, though fetching single key/value
pairs is the easiest. Just like storing the data, you can pull the data out
in either the JSON-decoded form or a binary blob. Getting the JSON-decoded
data out looks like:

Fetching Data Via Map/Reduce

When you need to work with larger sets of data, one of the tools at your
disposal is MapReduce. This technique runs a map function over every object in
the input, then combines the results of all those map calls in one or more
reduce phases.
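To make the two phases concrete without a running cluster, here is a purely local Python sketch. It is not the client's API: the `run_mapreduce` helper, the sample users and the `is_active` field are all hypothetical. Each map call returns a list of zero or more results, and the reduce step combines everything the map calls emitted. Riak may call a reduce function more than once, including on its own earlier output, so the example reduce (a sort) is chosen to give the same answer when re-run.

```python
def run_mapreduce(objects, map_fn, reduce_fn):
    """Run one map phase and one reduce phase over a dict of key -> value."""
    mapped = []
    for key, value in objects.items():
        # Map phase: each call returns a list with zero or more results.
        mapped.extend(map_fn(key, value))
    # Reduce phase: combine the outputs of all the map calls.
    return reduce_fn(mapped)


def map_active_users(key, user):
    # Emit a [key, data] pair for active users and nothing for the rest.
    return [[key, user]] if user.get("is_active") else []


def reduce_sorted_by_key(pairs):
    # Sorting is safe to re-run on its own output, which matters because
    # Riak may call a reduce function more than once.
    return sorted(pairs, key=lambda pair: pair[0])


# Hypothetical user records, keyed by username.
users = {
    "johndoe": {"name": "John Doe", "is_active": True},
    "janedoe": {"name": "Jane Doe", "is_active": False},
    "bobsmith": {"name": "Bob Smith", "is_active": True},
}
active = run_mapreduce(users, map_active_users, reduce_sorted_by_key)
print([key for key, _ in active])  # ['bobsmith', 'johndoe']
```

In a real Riak job, the map and reduce functions would usually be written in JavaScript or Erlang and run on the cluster. The flow of data between the phases is the same.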

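Related objects can also be reached by following links rather than by running
MapReduce. For example, to print every status message linked from the user
``johndoe``, you can do something like: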

import riak
client = riak.RiakClient()
user_bucket = client.bucket('user')
johndoe = user_bucket.get('johndoe')
for status_link in johndoe.get_links():
    # Since what we get back are lightweight ``RiakLink`` objects, we need to
    # get the associated ``RiakObject`` to access its data.
    status = status_link.get()
    print(status.get_data()['message'])

Using Search

Riak Search is a new feature available as of Riak 0.13. It allows you to create
queries that filter on data in the values without writing a MapReduce job. It
takes inspiration from Lucene, a popular Java-based search library, and
incorporates a Solr-like interface into Riak. Setting it up is outside the
scope of this tutorial, but usage of this feature looks like:

You can enable and disable search for specific buckets through convenience
methods that install or remove the pre-commit hook:

bucket = client.bucket('search')
if bucket.search_enabled():
    bucket.disable_search()
else:
    bucket.enable_search()

Search using the Solr Interface

The search as outlined above goes through Riak's MapReduce facilities to find
and fetch objects. Sometimes you may want to use the Solr-like interface that
Riak Search offers instead, e.g. to index and search documents without storing
them in Riak KV and relying on the pre-commit hook to index them.

Using the Solr interface also allows you to specify sort and limit parameters;
with the MapReduce-based search, you would have to implement those in reduce
functions.
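To show what doing sort and limit in a reduce step involves, here is a local Python sketch of a "sort, then keep the first n" reduce function. This is not Riak's or the client's API: `reduce_top_n`, the `name` field and the sample hits are hypothetical. Riak may run a reduce phase more than once over partial results, so the usage below first reduces two partial batches separately and then reduces their combination. It ends up with the same top results as reducing everything at once.

```python
def reduce_top_n(docs, field, n):
    """Sort documents by ``field`` and keep only the first ``n``."""
    return sorted(docs, key=lambda doc: doc[field])[:n]


# Hypothetical search hits, split into two batches as if they had come back
# from different nodes.
batch_a = [{"id": "3", "name": "carol"}, {"id": "1", "name": "alice"}]
batch_b = [{"id": "4", "name": "dave"}, {"id": "2", "name": "bob"}]

# Reduce each batch, then reduce the combined partial results again.
partial = reduce_top_n(batch_a, "name", 2) + reduce_top_n(batch_b, "name", 2)
top_two = reduce_top_n(partial, "name", 2)
print([doc["name"] for doc in top_two])  # ['alice', 'bob']
```

With the Solr interface, the same ordering and cut-off are expressed as query parameters, so no reduce function has to be written.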

You can index documents into search indexes as simple Python dicts, each of
which needs a key named "id":

Using Key Filters

Key filters are a new feature available as of Riak 0.14. They are
a way to pre-process MapReduce inputs from a full bucket query simply
by examining the key — without loading the object first. This is
especially useful if your keys are composed of domain-specific
information that can be analyzed at query-time.

To illustrate this, let’s contrive an example. Let’s say we’re storing
customer invoices with a key constructed from the customer name and
the date, in a bucket called “invoices”. Here are some sample keys:
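In Riak's JSON format, a key filter is a list of steps, each itself a list whose first element names the filter. For example, `[["tokenize", "-", 1], ["eq", "acme"]]` splits the key on "-", takes the first token, and keeps the key only if that token equals "acme". The local Python sketch below evaluates a handful of filter names against a key to show the idea. It is not Riak's implementation and handles only the five filters listed in its docstring. The invoice keys and customer names are hypothetical.

```python
def passes_key_filters(key, filters):
    """Locally evaluate a Riak-style key filter list against a single key.

    Transforms (tokenize, to_lower) replace the value being tested;
    predicates (eq, starts_with, ends_with) reject the key on failure.
    Only these five filter names are handled in this sketch.
    """
    value = key
    for spec in filters:
        name, args = spec[0], spec[1:]
        if name == "tokenize":
            separator, position = args
            tokens = value.split(separator)
            if not 1 <= position <= len(tokens):
                return False  # the requested token does not exist
            value = tokens[position - 1]  # tokenize positions are 1-based
        elif name == "to_lower":
            value = value.lower()
        elif name == "eq":
            if value != args[0]:
                return False
        elif name == "starts_with":
            if not value.startswith(args[0]):
                return False
        elif name == "ends_with":
            if not value.endswith(args[0]):
                return False
        else:
            raise ValueError("filter not supported in this sketch: %r" % name)
    return True


# Hypothetical invoice keys in the form "<customer>-<YYYYMMDD>".
keys = ["acme-20110104", "initech-20101215", "acme-20101231"]

by_customer = [["tokenize", "-", 1], ["eq", "acme"]]
december_2010 = [["tokenize", "-", 2], ["starts_with", "201012"]]

print([k for k in keys if passes_key_filters(k, by_customer)])
# ['acme-20110104', 'acme-20101231']
print([k for k in keys if passes_key_filters(k, december_2010)])
# ['initech-20101215', 'acme-20101231']
```

Neither filter loads an invoice. Each decision is made from the key string alone, which is what makes key filters cheaper than a full bucket MapReduce over stored values.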

Test Server

The client includes a Riak test server that can be used to start a Riak instance
on demand for testing purposes in your application. It uses in-memory storage
backends for both Riak KV and Riak Search and is therefore reasonably fast for a
testing setup. The in-memory setups also make it easier to wipe all data in the
instance without having to list and delete all keys manually. The original code
comes from Ripple, as do the file system implementations.

The server needs a local Riak installation, but it uses only that
installation's Erlang libraries and configuration files to generate and run a
temporary server in a separate directory. Make sure you run the most recent
stable version of Riak rather than a development snapshot, with which your
mileage may vary.

By default, the HTTP port is set to 9000 and the Protocol Buffers interface
listens on port 9001.

To use it, simply point it to your local Riak installation, and the rest is done
automagically:

The server is started as an external process, with communication going through
the Erlang console. That allows it to easily wipe the in-memory backends used by
Riak and Riak Search. You can use the recycle() method to clean up the server:

server.recycle()

To change the default configuration, you can specify additional arguments for
the Erlang VM. Let's raise the maximum number of processes to 1000000, just for
fun:

server = TestServer(vm_args={"+P": "1000000"})

You can also change the default configuration used to generate the app.config
file for the Riak instance. The format of the attributes follows the convention
of the app.config file itself, using a dict with keys for every section in the
configuration file, so "riak_core", "riak_kv", and so on. These in turn are also
dicts, following the same key-value format of the app.config file.
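To illustrate how such nested dicts line up with app.config's Erlang-term syntax, here is a hypothetical helper that renders one. It is not part of the client, and the TestServer's own rendering may differ. Each section dict becomes a proplist of `{key, value}` tuples, Python tuples become Erlang tuples, and plain strings become double-quoted Erlang strings. The `Atom` marker class is an assumption of this sketch: without it, a Python string could not express a bare Erlang atom. The backend name in the usage is only an illustrative value.

```python
class Atom(str):
    """Marks a string that should be written as a bare Erlang atom."""


def to_erlang(value):
    """Render a Python value as Erlang term syntax (hypothetical helper)."""
    if isinstance(value, bool):
        # Check bool before int: bool is a subclass of int in Python.
        return "true" if value else "false"
    if isinstance(value, Atom):
        return str(value)
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Ordinary strings become double-quoted Erlang strings.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped
    if isinstance(value, dict):
        # A dict becomes a proplist; its keys are assumed to be valid atoms.
        pairs = ["{%s, %s}" % (key, to_erlang(val)) for key, val in value.items()]
        return "[%s]" % ", ".join(pairs)
    if isinstance(value, tuple):
        return "{%s}" % ", ".join(to_erlang(item) for item in value)
    if isinstance(value, list):
        return "[%s]" % ", ".join(to_erlang(item) for item in value)
    raise TypeError("cannot render %r as an Erlang term" % (value,))


def to_app_config(sections):
    """Render {"riak_core": {...}, "riak_kv": {...}} as app.config text."""
    # app.config holds a single term: a list of sections ending with a period.
    return to_erlang(sections) + ".\n"


config = {
    "riak_core": {"web_port": 8080},
    "riak_kv": {"storage_backend": Atom("riak_kv_ets_backend")},
}
print(to_app_config(config))
# [{riak_core, [{web_port, 8080}]}, {riak_kv, [{storage_backend, riak_kv_ets_backend}]}].
```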

So to change the default HTTP port to 8080, you can do the following:

server = TestServer(riak_core={"web_port": 8080})

The server should shut down properly when you stop the Python process, but if
you only need it for a subset of your tests, just stop the server:

server.stop()

If you plan on repeatedly running the test server, either in multiple test
suites or in subsequent test runs, be sure to call cleanup() before starting or
after stopping it.

Luwak for Large File Storage

If your Riak installation has Luwak support enabled, you can use the client to
interact with it, storing, fetching and deleting files. Note that Luwak is HTTP
only and will always use the settings provided for the HTTP transport. If you
mix Luwak with normal Riak usage through the Protocol Buffers interface, it's
best to use a separate client object for each use case: