Latency Reduction Checklist

Although latency is unavoidable in distributed systems like Riak, there
are a number of actions that can be undertaken to reduce latency
to the lowest levels possible within a cluster. In this guide, we’ll
list potential sources of high latency and what you can do about them.

Large Objects

Riak always performs best with smaller objects. Large objects, whether
inserted mistakenly by your application or produced by sibling
explosion (see below), can often increase latency.

We recommend keeping all objects stored in Riak smaller than 1-2 MB,
preferably below 100 KB. Large objects lead to increased I/O activity
and can put strain on memory resources. In some cases, just a few large
objects can impact latency in a cluster, even for requests that are
unrelated to those objects.

If your use case requires large objects, we recommend checking out
Riak CS, which is intended as a storage system for large objects.

Mitigation

The best way to find out if large objects are impacting latency is to
monitor each node’s object size stats. If you run riak-admin status or make an HTTP GET request
to Riak’s /stats endpoint, you will see the results for the following
metrics related to object size, all of which are calculated only for
GET operations (i.e. reads):

node_get_fsm_objsize_mean: The mean object size encountered by this node in the last minute
node_get_fsm_objsize_median: The median object size encountered by this node in the last minute
node_get_fsm_objsize_95: The 95th-percentile object size encountered by this node in the last minute
node_get_fsm_objsize_99: The 99th-percentile object size encountered by this node in the last minute
node_get_fsm_objsize_100: The maximum (100th-percentile) object size encountered by this node in the last minute

The mean and median measurements may not be good indicators,
especially if you’re storing billions of keys. Instead, you should be on
the lookout for trends in the 95th-, 99th-, and 100th-percentile measures:

Is there an upward trend?

Do the metrics indicate that there are outliers?

Do these trends coincide with increased latency?
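
If you’d like to automate this check, the following is a minimal sketch
that polls the /stats endpoint and flags object-size percentiles above a
chosen threshold. It assumes Riak’s HTTP interface is listening on
localhost:8098 (the default); the threshold value is purely illustrative.

    import json
    import urllib.request

    STATS_URL = "http://localhost:8098/stats"

    # The percentile stats are the most useful signals for outliers.
    WATCHED = [
        "node_get_fsm_objsize_95",
        "node_get_fsm_objsize_99",
        "node_get_fsm_objsize_100",
    ]

    # Flag anything above 100 KB (an illustrative threshold, per the
    # sizing recommendations above).
    THRESHOLD_BYTES = 100 * 1024

    with urllib.request.urlopen(STATS_URL) as resp:
        stats = json.load(resp)

    for name in WATCHED:
        value = stats.get(name, 0)
        flag = "  <-- above threshold" if value > THRESHOLD_BYTES else ""
        print(f"{name}: {value}{flag}")

Since the stats are per-node, run a check like this on each node, for
example from cron or your monitoring agent.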

If you suspect that large object size is impacting latency, try making
the following changes to each node’s configuration:

If you are using the newer, riak.conf-based configuration system,
the commented-out value for erlang.distribution_buffer_size is 32MB.
Uncomment this setting and restart your node.

If you are using the older, app.config/vm.args-based configuration
system, try increasing the +zdbbl setting in vm.args to 32768 or
higher (measured in kilobytes). This increases the size of the
distributed Erlang buffer from its default of 1024 KB. Restart your
node when configuration changes have been made.
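
In configuration-file terms, those two variants look like the following
(the 32MB value mirrors the commented-out default mentioned above):

    ## riak.conf (newer system)
    erlang.distribution_buffer_size = 32MB

    ## vm.args (older system); the value is in kilobytes
    +zdbbl 32768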

Large objects can also impact latency even if they’re only present on
some nodes. If increased latency occurs on only N nodes, where N is
your replication factor (also known as n_val), a single large object
and its replicas may be slowing down all requests on those nodes.

If large objects are suspected, you should also audit the behavior of
siblings in your cluster, as explained in the next section.

Siblings

In Riak, object conflicts are handled by keeping multiple versions of
the object in the cluster, either until a client takes action to resolve
the conflict or until active anti-entropy resolves it without client
intervention. While sibling production is normal, sibling explosion, in
which an object accumulates a large number of siblings, is a problem.
Its negative effects are the same as those associated with large
objects.

Mitigation

The best way to monitor siblings is through the same riak-admin status
interface used to monitor object size (or via an HTTP GET request to
/stats). In the output of riak-admin status on each node, you’ll see
the following sibling-related statistics:

node_get_fsm_siblings_mean: The mean number of siblings encountered during all GET operations by this node within the last minute
node_get_fsm_siblings_median: The median number of siblings encountered during all GET operations by this node within the last minute
node_get_fsm_siblings_95: The 95th-percentile number of siblings encountered during all GET operations by this node within the last minute
node_get_fsm_siblings_99: The 99th-percentile number of siblings encountered during all GET operations by this node within the last minute
node_get_fsm_siblings_100: The maximum (100th-percentile) number of siblings encountered during all GET operations by this node within the last minute

Is there an upward trend in these statistics over time? Are there any
large outliers? Do these trends correspond to your observed latency
spikes?
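
As with object size, a quick way to eyeball these statistics on a node
is to filter the riak-admin status output, for example:

    # Show all sibling-related GET stats on this node
    riak-admin status | grep node_get_fsm_siblings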

If you believe that sibling creation problems could be responsible for
latency issues in your cluster, you can start by checking the following:

If allow_mult is set to true for some or all of your buckets, be
sure that your application is correctly resolving siblings. Our
documentation on conflict resolution gives a fuller picture of how this
can be done. Note that in Riak versions 2.0 and later, allow_mult is
set to true by default for all bucket types that you create and
activate. If you wish to set allow_mult to false on a bucket type, you
must do so explicitly.

Application errors are a common source of problems with siblings.
Updating the same key over and over without passing a causal context
to Riak can cause sibling explosion. If this seems to be the issue,
modify your application’s conflict resolution strategy. Another
possibility worth exploring is dotted version vectors (DVVs) in place
of traditional vector clocks. DVVs can be enabled per bucket type by
setting the dvv_enabled property to true for buckets that seem to be
experiencing sibling explosion, as sketched below.
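
As an illustration of resolving siblings while passing causal context,
here is a minimal sketch using the official Python client (pip install
riak). The bucket and key names are hypothetical, and the merge
strategy shown, taking the union of JSON-list values, is purely
illustrative; use whatever merge logic fits your data.

    import riak

    # Assumes Riak's protocol buffers interface on the default port.
    client = riak.RiakClient(protocol="pbc", pb_port=8087)
    bucket = client.bucket("mybucket")  # hypothetical bucket name

    obj = bucket.get("mykey")
    if len(obj.siblings) > 1:
        # Merge by taking the union of values, assuming each sibling's
        # data is a JSON list (illustrative only).
        merged = sorted({v for s in obj.siblings for v in s.data})
        resolved = obj.siblings[0]
        resolved.data = merged
        obj.siblings = [resolved]
        # Because obj was fetched first, it carries the causal context,
        # so this store resolves the conflict rather than creating
        # further siblings.
        obj.store()

To enable DVVs, set the dvv_enabled property on a bucket type and point
the affected buckets at that type; a sketch, with an illustrative type
name:

    riak-admin bucket-type create dvv_enabled_type '{"props":{"dvv_enabled":true}}'
    riak-admin bucket-type activate dvv_enabled_type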

Compaction and Merging

The Bitcask and LevelDB storage backends occasionally go through
heavily I/O-intensive compaction phases during which they remove deleted
data and reorganize data files on disk. During these phases, affected
nodes may be slower to respond to requests than other nodes. If your
cluster is using one or both of these backends, there are steps that can
be taken to monitor and address latency issues.

If your cluster uses one of these backends, our first recommendation is
to examine your replication properties and make sure that neither R nor
W is set to N, i.e. that you’re not requiring that reads or writes go
to all N replicas of an object. The problem with setting R=N or W=N is
that a request will only respond as quickly as the slowest of the N
nodes involved.
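
You can inspect a bucket’s effective replication properties over HTTP; a
sketch, assuming the default HTTP port and a hypothetical bucket name
(jq is optional pretty-printing):

    curl -s http://localhost:8098/buckets/mybucket/props | jq '.props | {n_val, r, w}'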

Beyond checking for R=N or W=N for requests, the recommended
mitigation strategy depends on the backend:

Bitcask

With Bitcask, it’s recommended that you:

Limit merging to off-peak hours to decrease the effect of merging
cycles on node traffic

Stagger merge windows between nodes so that no more than one node is
undergoing a merge phase at any given time

Instructions on how to accomplish both can be found in our guide to
tuning Bitcask.

It’s also important that you adjust your maximum file size and merge
threshold settings appropriately. The maximum file size setting is
labeled bitcask.max_file_size in the newer, riak.conf-based
configuration files and max_file_size in the older, app.config-based
system.

Setting the maximum file size lower will cause Bitcask to merge more
often, with less I/O churn per merge, while setting it higher will
induce less frequent merges, each with more I/O churn. To find settings
that are ideal for your use case, we recommend checking out our guide
to configuring Bitcask.
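
As a sketch, a riak.conf excerpt restricting merges to an off-peak
window and lowering the maximum file size might look like this (the
window and size values are illustrative; verify the exact schema against
the Bitcask documentation for your Riak version):

    ## Merge only between 1am and 5am local time
    bitcask.merge.policy = window
    bitcask.merge.window.start = 1
    bitcask.merge.window.end = 5

    ## Smaller data files mean more frequent, lighter merges
    bitcask.max_file_size = 1GB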

LevelDB

The more files you keep in memory, the faster LevelDB will perform in
general. To make sure that you are using your system resources
appropriately with LevelDB, check out our guide to LevelDB parameter planning.
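
For example, in the newer configuration system LevelDB’s memory
footprint is governed by a single percentage setting (the value below is
the default; treat it as a starting point, not a recommendation):

    ## Share of the node's RAM that LevelDB may use for caches and
    ## open file handles
    leveldb.maximum_memory.percent = 70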

OS Tuning

While a number of latency-related problems can manifest themselves in
development and testing environments, some performance limits only
become clear in production environments.

Mitigation

If you suspect that OS-level issues might be impacting latency, it might
be worthwhile to revisit your OS-specific configurations, such as open
files limits and kernel and network settings. The OS-specific
performance tuning guides in the Riak documentation may be of help.

I/O and Network Bottlenecks

Riak is a heavily I/O- and network-intensive system.
Bottlenecks on either front can lead to undue latency in your cluster.
We recommend an active monitoring strategy to detect problems
immediately when they arise.

Mitigation

To diagnose potential I/O bottlenecks, there are a number of Linux tools
at your disposal, including iostat (which reports I/O wait, among other
statistics) and netstat.
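
For instance, the following standard invocations surface the most
relevant numbers:

    # Extended per-device I/O statistics, refreshed every 5 seconds;
    # watch the utilization and await columns
    iostat -x 5

    # Per-protocol network statistics, including error and
    # retransmission counters
    netstat -s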

To help you diagnose potential overloads, Riak versions 1.3.2 and later come
equipped with an overload protection feature designed to prevent
cascading failures in overly busy nodes. This feature limits the number
of GET and PUT finite state machines (FSMs) that can exist
simultaneously on a single Riak node. Increased latency can result if a
node is frequently running up against these maximums.

Monitor node_get_fsm_active and node_get_fsm_active_60s to get an
idea of how many operations your nodes are coordinating. If you see
non-zero values in node_get_fsm_rejected or
node_get_fsm_rejected_60s, that means that some of your requests are
being discarded due to overload protection.
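
A quick way to check these counters on a node:

    # Coordinator activity and overload rejections (the pattern also
    # matches the _60s variants)
    riak-admin status | grep -E 'node_get_fsm_(active|rejected)'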

The FSM limits can be increased, but disabling overload protection
entirely is not recommended. More details on these settings are
available in the release notes for Riak version 1.3.

Object Settings

In versions 2.0 and later, Riak enables you to configure a variety of
settings regarding Riak objects, including allowable object sizes, how
many siblings to allow, and so on. If you suspect that undue latency in your cluster stems from object size or related factors, you may consider adjusting these settings.

A concise listing of object-related settings can be found in the Riak configuration documentation. The sections below explain these settings in detail.

Note on configuration files in 2.0

The object settings listed below are only available using the new system
for configuration files in Riak 2.0. If you are using the older, app.config-based system, you will not have access to
these settings.

Object Size

As stated above, we recommend always keeping objects below 1-2 MB
and preferably below 100 KB if possible. If you want to ensure that
objects above a certain size do not get stored in Riak, you can do so by
setting the object.size.maximum parameter lower than the default of
50MB, which is far above the ideal object size. If you set this
parameter to, say, 1MB and attempt to store a 2 MB object, the write
will fail and an error message will be returned to the client.

You can also set an object size threshold past which a write will
succeed but will register a warning in the logs by adjusting the
object.size.warning_threshold parameter. The default is 5MB.

Sibling Explosion Management

To prevent or cut down on sibling explosion, you can either stop Riak
from storing additional siblings when a specified sibling count is
reached or set a warning threshold past which Riak logs a warning (or
both). This can be done using the object.siblings.maximum and
object.siblings.warning_threshold settings. The default maximum is 100
and the default warning threshold is 25.

Object Storage Format

There are currently two possible binary representations for objects
stored in Riak:

Erlang’s native term_to_binary format, which tends to have a higher
space overhead

A newer, Riak-specific format developed for more compact storage of
smaller values

You can set the object storage format using the object.format
parameter: 0 selects Erlang’s term_to_binary format while 1 (the
default) selects the Riak-specific format.
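
Gathering the settings discussed in this section into a single riak.conf
excerpt (the values shown are the defaults described above):

    object.size.warning_threshold = 5MB
    object.size.maximum = 50MB
    object.siblings.warning_threshold = 25
    object.siblings.maximum = 100
    object.format = 1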