Benchmarking

Basho Bench is a benchmarking tool created to conduct accurate and
repeatable performance tests and stress tests, and to produce
performance graphs.

Basho Bench exposes a pluggable driver interface and has been extended
to serve as a benchmarking tool against a variety of projects. New
drivers can be written in Erlang and are generally less than 200 lines
of code.

Installation

You will need:

One or more load-generating machines on which to install
basho_bench. Especially when testing larger clusters, a
single machine cannot generate enough load to properly exercise
the cluster. Do not run the basho_bench instances on the
Riak nodes themselves, since the load generation will compete with
Riak for resources.

Download basho_bench

Building from Source

Prerequisites

Erlang must be installed. See Installing Erlang for instructions
and versioning requirements. Note: Unless you’re an experienced
Erlang developer, we recommend building basho_bench from source on
Ubuntu 14.04 LTS rather than CentOS. Later versions of CentOS
(6 and 7) have difficulty installing and enabling certain parts of
the erlang-crypto package, which basho_bench requires.

Install git (to check out the basho_bench code)

Compiling

git clone https://github.com/basho/basho_bench.git
cd basho_bench
make

Usage

Run the basho_bench script, passing in the config file and the
directory in which to write the results:

basho_bench --results-dir <results dir> <config file>

If you’ve installed basho_bench from a pre-built package, you
must specify full paths for the test results directory and config
file. Also, don’t use the common ~/ shell notation; specify the
user’s home directory explicitly.

If you built basho_bench from source, you can get away with
relative paths (and the results directory will be created in the
current directory):

```bash
./basho_bench myconfig.config
```

This will generate results in tests/current/. You will need to
create a configuration file. The recommended approach is to start from
a file in the examples directory and modify settings, using the
Configuration section below for reference.
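
As a starting point, a minimal configuration file might look like the sketch below. It uses only settings described in the Configuration section. The driver module name example_driver is hypothetical and stands in for the driver of the system under test; the numeric values are arbitrary.

```erlang
%% myconfig.config: a minimal sketch, not a tuned benchmark.
%% Each setting is an Erlang term terminated by a period.

%% Run each worker as quickly as possible.
{mode, max}.

%% Run the test for 10 minutes.
{duration, 10}.

%% Use 5 concurrent worker processes.
{concurrent, 5}.

%% Hypothetical driver module; replace with the driver you are testing.
{driver, example_driver}.

%% On average, four GETs for every PUT.
{operations, [{get, 4}, {put, 1}]}.

%% Binary keys drawn uniformly from 1..10,000; fixed 100-byte values.
{key_generator, {int_to_bin, {uniform_int, 10000}}}.
{value_generator, {fixed_bin, 100}}.
```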

Generating Benchmark Graphs

The output from running the basho_bench script can be used to
create graphs showing the following:

Throughput — Operations per second over the duration of the test.

Latency at 99th percentile, 99.9th percentile and max latency for
the selected operations.

Prerequisites

The R statistics language is needed to generate graphs. Note: If
necessary, R can be installed on a different machine than the one
running basho_bench, and the performance data can be copied (via
rsync, for example) from the load testing machine to the one that will
be generating and viewing the graphs (such as a desktop).

Troubleshooting Graph Generation

How does it work?

When Basho Bench starts (basho_bench.erl), it reads the
configuration (basho_bench_config.erl), creates a new results
directory, and then sets up the test (basho_bench_app.erl and
basho_bench_sup.erl).

During test setup, Basho Bench creates the following:

One stats process (basho_bench_stats.erl). This process
receives notifications when an operation completes, plus the
elapsed time of the operation, and stores it in a histogram. At
regular intervals, the histograms are dumped to summary.csv as
well as operation-specific latency CSVs (e.g. put_latencies.csv
for the PUT operation).

N workers, where N is specified by the concurrent configuration
setting (basho_bench_worker.erl). The worker process wraps a driver
module, specified by the driver configuration setting. The driver is
randomly invoked using the distribution of operations as specified by
the operations configuration setting. The rate at which the driver
invokes operations is governed by the mode setting.

Once these processes have been created and initialized, Basho Bench
sends a run command to all worker processes, causing them to begin the
test. Each worker is initialized with a common seed value for random
number generation to ensure that the generated workload is reproducible
at a later date.
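
To illustrate why a fixed seed makes a workload reproducible, here is a small sketch using the rand module from Erlang/OTP. It is not Basho Bench's own code; the module name seed_demo, the seed tuple, and the exsplus algorithm choice are assumptions made for the example.

```erlang
-module(seed_demo).
-export([draws/2, same_seed_same_sequence/0]).

%% Draw Count integers in 1..100000 from a generator seeded with Seed.
%% The explicit-state rand API is used so nothing depends on process state.
draws(Seed, Count) ->
    State0 = rand:seed_s(exsplus, Seed),
    {Values, _FinalState} =
        lists:mapfoldl(
          fun(_, State) -> rand:uniform_s(100000, State) end,
          State0,
          lists:seq(1, Count)),
    Values.

%% Two generators started from the same seed yield the same sequence.
same_seed_same_sequence() ->
    draws({1, 2, 3}, 5) =:= draws({1, 2, 3}, 5).
```

Calling seed_demo:same_seed_same_sequence() returns true, which is the property a common seed relies on: rerunning with the same seed replays the same sequence of random choices.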

During the test, the workers repeatedly call driver:run/4, passing in
the next operation to run, a keygen function, a valuegen function, and
the last state of the driver. The worker process times the operation,
and reports this to the stats process when the operation has completed.
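
The call described above can be sketched as a tiny driver module. This is an illustration under stated assumptions, not a driver shipped with Basho Bench: the module name example_driver is hypothetical, the "system under test" is just an in-memory map, and the new/1 callback and the {ok, State} return shape are assumed conventions that should be checked against an existing driver's source.

```erlang
-module(example_driver).
-export([new/1, run/4]).

%% Assumed setup callback: called once per worker with the worker's Id.
%% The driver state here is an empty map acting as a toy key/value store.
new(_Id) ->
    {ok, #{}}.

%% run(Operation, KeyGen, ValueGen, State) -> {ok, NewState}
%% KeyGen and ValueGen are zero-argument funs supplied by the worker.
run(get, KeyGen, _ValueGen, State) ->
    _ = maps:get(KeyGen(), State, undefined),
    {ok, State};
run(put, KeyGen, ValueGen, State) ->
    {ok, maps:put(KeyGen(), ValueGen(), State)};
run(delete, KeyGen, _ValueGen, State) ->
    {ok, maps:remove(KeyGen(), State)}.
```

The state returned from one call is what the worker passes back in as the last argument of the next call, which is how a driver can keep a connection or other context between operations.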

Finally, once the test has been run for the duration specified in the
config file, all workers and stats processes are terminated and the
benchmark ends. The measured latency and throughput of the test can be
found in ./tests/current/. Previous results are in timestamped
directories of the form ./tests/YYYYMMDD-HHMMSS/.

Configuration

Basho Bench ships with a number of sample configuration files, available
in the /examples directory.

Global Config Settings

mode

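The mode setting controls the rate at which workers invoke the
driver:run/4 function with a new operation. There are two possible
values: max, which runs operations as quickly as possible, and
{rate, N}, which runs N operations per second.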

Note that this setting is applied to each worker independently. For
example, if {rate, 5} is used with 3 concurrent workers, Basho Bench
will generate 15 (i.e. 5 * 3) operations per second.

% Run at max, i.e. as quickly as possible
{mode, max}.
% Run 15 operations per second per worker
{mode, {rate, 15}}.
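
Because the rate applies per worker, the overall target rate depends on both mode and concurrent. The fragment below is a sketch of the arithmetic from the note above; the values are taken from that example.

```erlang
%% 5 operations per second per worker...
{mode, {rate, 5}}.
%% ...times 3 workers gives roughly 5 * 3 = 15 operations per second overall.
{concurrent, 3}.
```

To hold the overall rate steady while changing the number of workers, adjust the per-worker rate in the opposite direction.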

concurrent

The number of concurrent worker processes. The default is 3 worker
processes. This determines the number of concurrent clients running
requests on the API under test.

% Run 10 concurrent processes
{concurrent, 10}.

duration

The duration of the test, in minutes. The default is 5 minutes.

% Run the test for one hour
{duration, 60}.

operations

The possible operations that the driver will run, plus their “weight,”
or likelihood of being run. The default is
[{get,4},{put,4},{delete,1}], which means that out of every 9
operations, GET will be called four times, PUT will be called four
times, and DELETE will be called once, on average.

{operations, [{get, 4}, {put, 1}]}.
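
One way to picture how weights turn into a distribution is to expand each operation into as many list entries as its weight and then pick an entry uniformly at random. The sketch below shows that idea; it is an illustration only, not necessarily how Basho Bench implements selection, and the module name op_weights is hypothetical.

```erlang
-module(op_weights).
-export([expand/1, pick/1]).

%% Turn [{get,4},{put,4},{delete,1}] into
%% [get,get,get,get,put,put,put,put,delete].
expand(Ops) ->
    lists:append([lists:duplicate(Weight, Op) || {Op, Weight} <- Ops]).

%% Pick one entry uniformly; an operation with weight W out of a total
%% weight T is therefore chosen with probability W/T.
pick(Expanded) ->
    lists:nth(rand:uniform(length(Expanded)), Expanded).
```

With the default weights, the expanded list has 9 entries, so delete is chosen about one time in nine.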

Operations are defined on a per-driver basis. Not all drivers will
implement the GET/PUT operations discussed above. Consult the driver
source to determine the valid operations. If you’re testing the HTTP
interface, for example, the corresponding operations are GET and
UPDATE, respectively.

If a driver does not support a specified operation (asdfput, for
example), you may see errors when the test runs.

driver

The module name of the driver that Basho Bench will use to generate
load. A driver may simply invoke code in-process (such as when
measuring the performance of DETS) or may open network connections and
generate load on a remote system (such as when testing a Riak
server/cluster).

key_generator

The generator function to use for creating keys. Available generators
include:

{pareto_int, MaxKey} — selects an integer from a Pareto
distribution, such that 20% of the available keys get selected 80%
of the time. Note that the current implementation of this
generator may yield values larger than MaxKey due to the
mathematical properties of the Pareto distribution.

{truncated_pareto_int, MaxKey} — the same as {pareto_int}, but
will not yield values above MaxKey.

{function, Module, Function, Args} — specifies an external
function that should return a key generator function. The worker
Id will be prepended to Args when the function is called.

{int_to_bin, Generator} — takes any of the above _int
generators and converts the number to a 32-bit binary. This is
needed for some drivers that require a binary key.

{int_to_str, Generator} — takes any of the above _int
generators and converts the number to a string. This is needed for
some drivers that require a string key.

The default key generator is {uniform_int, 100000}.

Examples:

% Use a randomly selected integer between 1 and 10,000
{key_generator, {uniform_int, 10000}}.
% Use a randomly selected integer between 1 and 10,000, as binary.
{key_generator, {int_to_bin, {uniform_int, 10000}}}.
% Use a pareto distributed integer between 1 and 10,000; values < 2000
% will be returned 80% of the time.
{key_generator, {pareto_int, 10000}}.
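
The {function, Module, Function, Args} form can be sketched as follows. The module name my_keygen, the function name prefixed_keys, and the key format are all hypothetical, and the sketch assumes the worker Id is an integer and that the returned generator is a zero-argument fun. With a config entry such as {key_generator, {function, my_keygen, prefixed_keys, [<<"user">>, 10000]}}, the worker Id is prepended to Args, so the function is called with three arguments.

```erlang
-module(my_keygen).
-export([prefixed_keys/3]).

%% Called as prefixed_keys(Id, Prefix, MaxKey): Id is prepended to the
%% Args list from the config. Returns the key generator fun.
prefixed_keys(Id, Prefix, MaxKey) ->
    fun() ->
        N = rand:uniform(MaxKey),
        %% Build keys such as <<"user_3_4821">> for worker 3.
        iolist_to_binary([Prefix, $_, integer_to_binary(Id),
                          $_, integer_to_binary(N)])
    end.
```

Including the worker Id in the key is one way to give each worker its own key space; drop it if workers should contend for the same keys.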

value_generator

The generator function to use for creating values. Generators are
defined in basho_bench_valgen.erl. Available generators include:

{fixed_bin, Size} — generates a random binary of Size
bytes. Every binary is the same size, but varies in content.

{exponential_bin, MinSize, Mean} — generates a random binary
which has an exponentially distributed size. Most values will be
approximately MinSize + Mean bytes in size, with a long tail
of larger values.

{uniform_bin, MinSize, MaxSize} — generates a random binary
which has an evenly distributed size between MinSize and
MaxSize.

{function, Module, Function, Args} — specifies an external
function that should return a value generator function. The worker
Id will be prepended to Args when the function is called.
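
Examples of the built-in value generators described above, as a config sketch. The sizes are arbitrary; only one value_generator entry would normally appear in a given config file.

```erlang
% Every value is exactly 100 bytes of random content
{value_generator, {fixed_bin, 100}}.
% Value sizes evenly distributed between 1,024 and 10,240 bytes
{value_generator, {uniform_bin, 1024, 10240}}.
% Exponentially distributed sizes with MinSize 1,000 and Mean 2,000
{value_generator, {exponential_bin, 1000, 2000}}.
```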