1. Introduction

Apache Accumulo is a highly scalable structured store based on Google’s BigTable.
Accumulo is written in Java and operates over the Hadoop Distributed File System
(HDFS), which is part of the popular Apache Hadoop project. Accumulo supports
efficient storage and retrieval of structured data, including queries for ranges, and
provides support for using Accumulo tables as input and output for MapReduce
jobs.

2. Accumulo Design

2.1. Data Model

Accumulo provides a richer data model than simple key-value stores, but is not a
fully relational database. Data is represented as key-value pairs, where the key and
value are composed of the following elements:

Key
    Row ID
    Column
        Family
        Qualifier
        Visibility
    Timestamp
Value

All elements of the Key and the Value are represented as byte arrays except for
Timestamp, which is a Long. Accumulo sorts keys element by element (Row ID, then
Family, Qualifier, Visibility, and Timestamp), comparing each element
lexicographically in ascending order. Timestamps are the exception: they are sorted
in descending order so that later versions of the same Key appear first in a
sequential scan. Tables consist of a set of sorted key-value pairs.
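The ordering rules above can be made concrete with a minimal sketch. SimpleKey and KeyOrderDemo are hypothetical names and are not Accumulo's Key class. The sketch assumes Java 9 or later for Arrays.compareUnsigned, and it builds every element except the timestamp as a byte array from a string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a key; not Accumulo's Key class.
final class SimpleKey implements Comparable<SimpleKey> {
    final byte[] row, family, qualifier, visibility;
    final long timestamp;

    SimpleKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.family = family.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        this.visibility = visibility.getBytes(StandardCharsets.UTF_8);
        this.timestamp = timestamp;
    }

    @Override
    public int compareTo(SimpleKey o) {
        // Element by element; byte arrays compare lexicographically as unsigned bytes.
        int c = Arrays.compareUnsigned(row, o.row);
        if (c != 0) return c;
        c = Arrays.compareUnsigned(family, o.family);
        if (c != 0) return c;
        c = Arrays.compareUnsigned(qualifier, o.qualifier);
        if (c != 0) return c;
        c = Arrays.compareUnsigned(visibility, o.visibility);
        if (c != 0) return c;
        // Timestamps sort in descending order, so newer versions come first.
        return Long.compare(o.timestamp, timestamp);
    }

    @Override
    public String toString() {
        return new String(row, StandardCharsets.UTF_8) + " "
            + new String(family, StandardCharsets.UTF_8) + ":"
            + new String(qualifier, StandardCharsets.UTF_8) + " ["
            + new String(visibility, StandardCharsets.UTF_8) + "] " + timestamp;
    }
}

class KeyOrderDemo {
    static List<SimpleKey> sorted(List<SimpleKey> keys) {
        List<SimpleKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<SimpleKey> keys = List.of(
            new SimpleKey("row2", "cf", "a", "", 5),
            new SimpleKey("row1", "cf", "a", "", 1),
            new SimpleKey("row1", "cf", "a", "", 9));
        // row1 (timestamp 9), then row1 (timestamp 1), then row2 (timestamp 5)
        sorted(keys).forEach(System.out::println);
    }
}
```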

2.2. Architecture

Accumulo is a distributed data storage and retrieval system and as such consists of
several architectural components, some of which run on many individual servers.
Much of the work Accumulo does involves maintaining certain properties of the
data, such as organization, availability, and integrity, across many commodity-class
machines.

2.3. Components

An instance of Accumulo includes many TabletServers, one Garbage Collector process,
one Master server and many Clients.

2.3.1. Tablet Server

The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a
write-ahead log, sorting new key-value pairs in memory, periodically
flushing sorted key-value pairs to new files in HDFS, and responding
to reads from clients, forming a merge-sorted view of all keys and
values from all the files it has created and the sorted in-memory
store.

TabletServers also perform recovery of a tablet
that was previously on a server that failed, reapplying any writes
found in the write-ahead log to the tablet.

2.3.2. Garbage Collector

Accumulo processes will share files stored in HDFS. Periodically, the Garbage
Collector will identify files that are no longer needed by any process, and
delete them. Multiple garbage collectors can be run to provide hot-standby support.
They will perform leader election among themselves to choose a single active instance.

2.3.3. Master

The Accumulo Master is responsible for detecting and responding to TabletServer
failure. It tries to balance the load across TabletServers by assigning tablets carefully
and instructing TabletServers to unload tablets when necessary. The Master ensures all
tablets are assigned to one TabletServer each, and handles table creation, alteration,
and deletion requests from clients. The Master also coordinates startup, graceful
shutdown and recovery of changes in write-ahead logs when Tablet servers fail.

Multiple masters may be run. The masters will choose among themselves a single master,
and the others will become backups if the master should fail.

2.3.4. Tracer

The Accumulo Tracer process supports the distributed timing API provided by Accumulo.
One or more of these processes can be run on a cluster; they write the timing
information to a given Accumulo table for future reference. See the section on
Tracing for more information on this support.

2.3.5. Monitor

The Accumulo Monitor is a web application that provides a wealth of information about
the state of an instance. The Monitor shows graphs and tables which contain information
about read/write rates, cache hit/miss rates, and Accumulo table information such as scan
rate and active/queued compactions. Additionally, the Monitor should always be the first
point of entry when attempting to debug an Accumulo problem as it will show high-level problems
in addition to aggregated errors from all nodes in the cluster. See the section on Monitoring
for more information.

Multiple Monitors can be run to provide hot-standby support in the face of failure. Due to the
forwarding of logs from remote hosts to the Monitor, only one Monitor process should be active
at one time. Leader election will be performed internally to choose the active Monitor.

2.3.6. Client

Accumulo includes a client library that is linked to every application. The client
library contains logic for finding servers managing a particular tablet, and
communicating with TabletServers to write and retrieve key-value pairs.

2.4. Data Management

Accumulo stores data in tables, which are partitioned into tablets. Tablets are
partitioned on row boundaries so that all of the columns and values for a particular
row are found together within the same tablet. The Master assigns Tablets to one
TabletServer at a time. This enables row-level transactions to take place without
using distributed locking or some other complicated synchronization mechanism. As
clients insert and query data, and as machines are added and removed from the
cluster, the Master migrates tablets to ensure they remain available and that the
ingest and query load is balanced across the cluster.

2.5. Tablet Service

When a write arrives at a TabletServer it is written to a Write-Ahead Log and
then inserted into a sorted data structure in memory called a MemTable. When the
MemTable reaches a certain size, the TabletServer writes out the sorted
key-value pairs to a file in HDFS called a Relative Key File (RFile), which is a
kind of Indexed Sequential Access Method (ISAM) file. This process is called a
minor compaction. A new MemTable is then created and the fact of the compaction
is recorded in the Write-Ahead Log.
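The write path above can be modeled in a few lines. This is a simplified sketch, not TabletServer code. TabletWritePath is hypothetical, the write-ahead log is a list of strings, an unmodifiable SortedMap stands in for an RFile, and the flush threshold counts entries rather than bytes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified, hypothetical model of a tablet's write path; not Accumulo code.
class TabletWritePath {
    final List<String> writeAheadLog = new ArrayList<>();
    final List<SortedMap<String, String>> files = new ArrayList<>(); // stand-ins for RFiles
    private TreeMap<String, String> memTable = new TreeMap<>();
    private final int flushThreshold; // entry count standing in for a memory size

    TabletWritePath(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        writeAheadLog.add("PUT " + key + "=" + value); // 1. log the write first
        memTable.put(key, value);                       // 2. then insert into the sorted in-memory map
        if (memTable.size() >= flushThreshold) minorCompaction();
    }

    void minorCompaction() {
        files.add(Collections.unmodifiableSortedMap(memTable)); // write the sorted pairs out as an immutable "file"
        memTable = new TreeMap<>();                             // start a new MemTable
        writeAheadLog.add("MINOR_COMPACTION file=" + (files.size() - 1)); // record the compaction in the log
    }

    int memTableSize() {
        return memTable.size();
    }

    public static void main(String[] args) {
        TabletWritePath tablet = new TabletWritePath(2);
        tablet.write("b", "2");
        tablet.write("a", "1");
        tablet.write("c", "3");
        System.out.println(tablet.files); // [{a=1, b=2}]
        System.out.println(tablet.writeAheadLog);
    }
}
```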

When a request to read data arrives at a TabletServer, the TabletServer does a
binary search across the MemTable as well as the in-memory indexes associated
with each RFile to find the relevant values. If clients are performing a scan,
several key-value pairs are returned to the client in order from the MemTable
and the set of RFiles by performing a merge-sort as they are read.
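The merge-sort across the MemTable and RFiles is a k-way merge of sources that are already sorted. The sketch below uses a PriorityQueue so that it always emits the smallest remaining key. MergedView is a hypothetical name, keys are plain strings, and version and delete handling are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a k-way merge over already-sorted sources (e.g. a MemTable and several RFiles).
class MergedView {
    // The current head of one source, together with the rest of that source.
    private static final class Head implements Comparable<Head> {
        final String value;
        final Iterator<String> rest;

        Head(String value, Iterator<String> rest) {
            this.value = value;
            this.rest = rest;
        }

        public int compareTo(Head o) {
            return value.compareTo(o.value);
        }
    }

    static List<String> merge(List<List<String>> sortedSources) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (List<String> source : sortedSources) {
            Iterator<String> it = source.iterator();
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll(); // smallest key across all sources
            out.add(smallest.value);
            if (smallest.rest.hasNext()) heap.add(new Head(smallest.rest.next(), smallest.rest));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> memTable = List.of("b", "e");
        List<String> rfile1 = List.of("a", "d");
        List<String> rfile2 = List.of("c", "f");
        System.out.println(merge(List.of(memTable, rfile1, rfile2))); // [a, b, c, d, e, f]
    }
}
```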

2.6. Compactions

In order to manage the number of files per tablet, periodically the TabletServer
performs Major Compactions of files within a tablet, in which some set of RFiles
are combined into one file. The previous files will eventually be removed by the
Garbage Collector. This also provides an opportunity to permanently remove
deleted key-value pairs by omitting key-value pairs suppressed by a delete entry
when the new file is created.

2.7. Splitting

When a table is created, it has one tablet. As the table grows, its initial
tablet eventually splits into two tablets. It's likely that one of these
tablets will migrate to another tablet server. As the table continues to grow,
its tablets will continue to split and be migrated. The decision to
automatically split a tablet is based on the size of a tablet's files. The
size threshold at which a tablet splits is configurable per table. In addition
to automatic splitting, a user can manually add split points to a table to
create new tablets. Manually splitting a new table can parallelize reads and
writes giving better initial performance without waiting for automatic
splitting.

As data is deleted from a table, tablets may shrink. Over time this can lead
to small or empty tablets. To deal with this, merging of tablets was
introduced in Accumulo 1.4. This is discussed in more detail later.

2.8. Fault-Tolerance

If a TabletServer fails, the Master detects it and automatically reassigns the tablets
from the failed server to other servers. Any key-value pairs that were in
memory at the time the TabletServer failed are automatically reapplied from the Write-Ahead
Log (WAL) to prevent any loss of data.

Tablet servers write their WALs directly to HDFS so the logs are available to all tablet
servers for recovery. To make the recovery process efficient, the updates within a log are
grouped by tablet. TabletServers can quickly apply the mutations from the sorted logs
that are destined for the tablets they have now been assigned.
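Grouping log entries by tablet amounts to sorting them by tablet and then by their original position in the log. In the sketch below, WalRecovery and LogEntry are hypothetical, and each entry is assumed to carry a sequence number that records its position in the log. This is not Accumulo's recovery code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group write-ahead log entries by tablet, keeping each tablet's
// entries in their original log order, so recovery can replay one tablet at a time.
class WalRecovery {
    static final class LogEntry {
        final long sequence;   // position of the entry in the log (assumed)
        final String tablet;   // tablet the update belongs to
        final String mutation; // stand-in for the logged update

        LogEntry(long sequence, String tablet, String mutation) {
            this.sequence = sequence;
            this.tablet = tablet;
            this.mutation = mutation;
        }
    }

    static Map<String, List<String>> groupByTablet(List<LogEntry> log) {
        List<LogEntry> sorted = new ArrayList<>(log);
        // Sort by tablet first, then by sequence number within each tablet.
        sorted.sort(Comparator.comparing((LogEntry e) -> e.tablet).thenComparingLong(e -> e.sequence));
        Map<String, List<String>> byTablet = new TreeMap<>();
        for (LogEntry e : sorted) {
            byTablet.computeIfAbsent(e.tablet, t -> new ArrayList<>()).add(e.mutation);
        }
        return byTablet;
    }

    public static void main(String[] args) {
        List<LogEntry> log = List.of(
            new LogEntry(1, "t2", "m1"),
            new LogEntry(2, "t1", "m2"),
            new LogEntry(3, "t2", "m3"));
        System.out.println(groupByTablet(log)); // {t1=[m2], t2=[m1, m3]}
    }
}
```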

4. Writing Accumulo Clients

4.1. Running Client Code

There are multiple ways to run Java code that uses Accumulo. Below is a list
of the different ways to execute client code.

using the java executable

using the accumulo script

using the tool script

In order to run client code written to run against Accumulo, you will need to
include the jars that Accumulo depends on in your classpath. Accumulo client
code depends on Hadoop and Zookeeper. For Hadoop add the hadoop client jar, all
of the jars in the Hadoop lib directory, and the conf directory to the
classpath. For recent Zookeeper versions, you only need to add the Zookeeper jar, and not
what is in the Zookeeper lib directory. You can run the following command on a
configured Accumulo system to see what it's using for its classpath.

$ACCUMULO_HOME/bin/accumulo classpath

Another option for running your code is to put a jar file in
$ACCUMULO_HOME/lib/ext. After doing this you can use the accumulo
script to execute your code. For example, if you create a jar containing the
class com.foo.Client and place that in lib/ext, then you could use the command
$ACCUMULO_HOME/bin/accumulo com.foo.Client to execute your code.

If you are writing MapReduce jobs that access Accumulo, then you can use the
bin/tool.sh script to run those jobs. See the MapReduce example.

4.2. Connecting

All clients must first identify the Accumulo instance to which they will be
communicating. Code to do this is as follows:

The PasswordToken is the most common implementation of an AuthenticationToken.
This general interface allows authentication as an Accumulo user to come from
a variety of sources or means. The CredentialProviderToken leverages the Hadoop
CredentialProviders (new in Hadoop 2.6).

For example, the CredentialProviderToken can be used in conjunction with a Java
KeyStore to avoid storing passwords in cleartext. When stored in HDFS, a single
KeyStore can be used across an entire instance. Be aware that KeyStores stored on
the local filesystem must be made available to all nodes in the Accumulo cluster.

The KerberosToken can be provided to use the authentication provided by Kerberos.
Using Kerberos requires external setup and additional configuration, but provides
a single point of authentication through HDFS, YARN and ZooKeeper, and allows
for password-less authentication with Accumulo.

4.3. Writing Data

Data are written to Accumulo by creating Mutation objects that represent all the
changes to the columns of a single row. The changes are made atomically in the
TabletServer. Clients then add Mutations to a BatchWriter which submits them to
the appropriate TabletServers.

4.3.1. BatchWriter

The BatchWriter is highly optimized to send Mutations to multiple TabletServers
and automatically batches Mutations destined for the same TabletServer to
amortize network overhead. Care must be taken to avoid changing the contents of
any Object passed to the BatchWriter since it keeps objects in memory while
batching.

An example of using the batch writer can be found at
accumulo/docs/examples/README.batch.
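The batching idea can be illustrated with a toy model. BatchingSketch is hypothetical and is not the BatchWriter API. A function stands in for locating the TabletServer that hosts a row, mutations are reduced to row strings, and the buffer limit counts mutations rather than bytes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of client-side batching; not the Accumulo BatchWriter API.
class BatchingSketch {
    private final Function<String, String> serverForRow; // stands in for tablet location lookup
    private final int maxBuffered;                        // stands in for a memory limit
    private final Map<String, List<String>> buffers = new LinkedHashMap<>();
    private int buffered = 0;
    final List<String> sentBatches = new ArrayList<>();   // one entry per simulated network send

    BatchingSketch(Function<String, String> serverForRow, int maxBuffered) {
        this.serverForRow = serverForRow;
        this.maxBuffered = maxBuffered;
    }

    void addMutation(String row) {
        // A real client buffers references to mutable Mutation objects until they are sent,
        // which is why callers must not change them; the immutable strings here avoid that issue.
        buffers.computeIfAbsent(serverForRow.apply(row), s -> new ArrayList<>()).add(row);
        if (++buffered >= maxBuffered) flush();
    }

    void flush() {
        // One send per server amortizes network overhead across many mutations.
        buffers.forEach((server, rows) -> sentBatches.add(server + " <- " + rows));
        buffers.clear();
        buffered = 0;
    }

    public static void main(String[] args) {
        BatchingSketch writer = new BatchingSketch(row -> row.compareTo("m") <= 0 ? "ts1" : "ts2", 4);
        for (String row : List.of("a", "x", "b", "y")) writer.addMutation(row);
        System.out.println(writer.sentBatches); // [ts1 <- [a, b], ts2 <- [x, y]]
    }
}
```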

4.3.2. ConditionalWriter

The ConditionalWriter enables efficient, atomic read-modify-write operations on
rows. The ConditionalWriter writes special Mutations which have a list of
per-column conditions that must all be met before the mutation is applied. The
conditions are checked in the tablet server while a row lock is
held (Mutations written by the BatchWriter will not obtain a row
lock). The conditions that can be checked for a column are equality and
absence. For example, a conditional mutation can require that column A is
absent in order to be applied. Iterators can be applied when checking
conditions. Using iterators, many other operations besides equality and
absence can be checked. For example, using an iterator that converts values
less than 5 to 0 and everything else to 1, it's possible to only apply a
mutation when a column is less than 5 (by checking that the converted value equals 0).
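The check-then-apply behavior can be sketched with a lock around a single row. This is a simplified model, not the ConditionalWriter API. ConditionalRow is hypothetical, each update carries one condition on the same column it writes, null stands for "absent", and the two status names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified, hypothetical model of conditional updates on one row; not Accumulo's API.
class ConditionalRow {
    enum Status { ACCEPTED, REJECTED }

    private final Map<String, String> columns = new HashMap<>();

    // Applies the update only if the column currently holds `expected`;
    // expected == null means "the column must be absent".
    synchronized Status putIf(String column, String expected, String newValue) {
        // `synchronized` plays the role of the row lock held while the condition is checked.
        String current = columns.get(column);
        if (!Objects.equals(current, expected)) return Status.REJECTED;
        columns.put(column, newValue);
        return Status.ACCEPTED;
    }

    synchronized String get(String column) {
        return columns.get(column);
    }

    public static void main(String[] args) {
        ConditionalRow row = new ConditionalRow();
        System.out.println(row.putIf("A", null, "1")); // ACCEPTED: A was absent
        System.out.println(row.putIf("A", null, "2")); // REJECTED: A is no longer absent
        System.out.println(row.putIf("A", "1", "3"));  // ACCEPTED: equality condition holds
    }
}
```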

If a tablet server dies after a client has sent a conditional
mutation, it is not known whether the mutation was applied. When this happens
the ConditionalWriter reports a status of UNKNOWN for the ConditionalMutation.
In many cases this situation can be dealt with by simply reading the row again
and possibly sending another conditional mutation. If this is not sufficient,
then a higher level of abstraction can be built by storing transactional
information within a row.

An example of using the conditional writer can be found at
accumulo/docs/examples/README.reservations.

4.3.3. Durability

By default, Accumulo writes out any updates to the Write-Ahead Log (WAL). Every change
goes into a file in HDFS and is sync’d to disk for maximum durability. In
the event of a failure, writes held in memory are replayed from the WAL. Like
all files in HDFS, this file is also replicated. Sending updates to the
replicas, and waiting for a permanent sync to disk can significantly slow write speeds.

Accumulo allows users to use less tolerant forms of durability when writing.
These levels are:

none: no durability guarantees are made, the WAL is not used

log: the WAL is used, but not flushed; loss of the server probably means recent writes are lost

flush: updates are written to the WAL, and flushed out to replicas; loss of a single server is unlikely to result in data loss.

sync: updates are written to the WAL, and synced to disk on all replicas before the write is acknowledged. Data will not be lost even if the entire cluster suddenly loses power.

The user can set the default durability of a table in the shell. When
writing, the user can configure the BatchWriter or ConditionalWriter to use
a different level of durability for the session. This will override the
default durability setting.

4.4. Reading Data

Accumulo is optimized to quickly retrieve the value associated with a given key, and
to efficiently return ranges of consecutive keys and their associated values.

4.4.1. Scanner

To retrieve data, Clients use a Scanner, which acts like an Iterator over
keys and values. Scanners can be configured to start and stop at particular keys, and
to return a subset of the columns available.

4.4.2. Isolated Scanner

Accumulo supports the ability to present an isolated view of rows when
scanning. There are three possible ways that a row could change in Accumulo:

a mutation applied to a table

iterators executed as part of a minor or major compaction

bulk import of new files

Isolation guarantees that either all or none of the changes made by these
operations on a row are seen. Use the IsolatedScanner to obtain an isolated
view of an Accumulo table. When using the regular scanner, it is possible to see
a non-isolated view of a row. For example, if a mutation modifies three
columns, it is possible that you will only see two of those modifications.
With the isolated scanner, either all three of the changes are seen or none.

The IsolatedScanner buffers rows on the client side so a large row will not
crash a tablet server. By default rows are buffered in memory, but the user
can easily supply their own buffer if they wish to buffer to disk when rows are
large.

4.4.3. BatchScanner

For some types of access, it is more efficient to retrieve several ranges
simultaneously. This arises, for example, when accessing a set of non-consecutive
rows whose IDs have been retrieved from a secondary index.

The BatchScanner is configured similarly to the Scanner; it can be configured to
retrieve a subset of the columns available, but rather than passing a single Range,
BatchScanners accept a set of Ranges. It is important to note that the keys returned
by a BatchScanner are not in sorted order since the keys streamed are from multiple
TabletServers in parallel.

An example of the BatchScanner can be found at
accumulo/docs/examples/README.batch.
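The unordered output of a BatchScanner follows from reading ranges concurrently. The sketch below scans several inclusive ranges of an in-memory TreeMap on a thread pool. ParallelRangeScan is hypothetical, each range is a two-element array holding the start and end rows, and the threads play the role of the TabletServers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: scan several ranges concurrently; results arrive in no particular order.
class ParallelRangeScan {
    static List<String> scan(NavigableMap<String, String> table, List<String[]> ranges, int threads)
            throws InterruptedException {
        ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String[] r : ranges) {
            // Each task reads one inclusive range, much as each TabletServer streams its part.
            pool.submit(() -> table.subMap(r[0], true, r[1], true)
                .forEach((k, v) -> results.add(k + "=" + v)));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new ArrayList<>(results); // order depends on thread scheduling
    }

    public static void main(String[] args) throws InterruptedException {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : List.of("a", "c", "m", "q", "z")) table.put(k, k.toUpperCase());
        List<String[]> ranges = List.of(new String[] {"a", "c"}, new String[] {"p", "z"});
        // The four matching entries, in an order that can vary between runs.
        System.out.println(scan(table, ranges, 2));
    }
}
```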

4.5. Proxy

The proxy API allows interaction with Accumulo from languages other than Java.
A proxy server is provided in the codebase, and a client can be generated for it.
The proxy API can also be used instead of the traditional ZooKeeperInstance class to
provide a single TCP port through which clients can be securely routed past a firewall,
without requiring access to all tablet servers in the cluster.

4.5.1. Prerequisites

The proxy server can live on any node on which the basic client API would work. That
means it must be able to communicate with the Master, ZooKeepers, NameNode, and the
DataNodes. A proxy client only needs the ability to communicate with the proxy server.

4.5.2. Configuration

The configuration options for the proxy server live inside of a properties file. At
the very least, you need to supply the following properties:

This sample configuration file further demonstrates an ability to back the proxy server
by MockAccumulo or the MiniAccumuloCluster.

4.5.3. Running the Proxy Server

After the properties file holding the configuration is created, the proxy server
can be started using the following command in the Accumulo distribution (assuming
your properties file is named config.properties):

$ACCUMULO_HOME/bin/accumulo proxy -p config.properties

4.5.4. Creating a Proxy Client

Aside from installing the Thrift compiler, you will also need the language-specific library
for Thrift installed to generate client code in that language. Typically, your operating
system’s package manager will be able to automatically install these for you in an expected
location such as /usr/lib/python/site-packages/thrift.

You can find the thrift file for generating the client:

$ACCUMULO_HOME/proxy/proxy.thrift.

After a client is generated, the port specified in the configuration properties above will be
used to connect to the server.

4.5.5. Using a Proxy Client

The following examples have been written in Java and the method signatures may be
slightly different depending on the language specified when generating the client with
the Thrift compiler. After initiating a connection to the Proxy (see Apache Thrift’s
documentation for examples of connecting to a Thrift service), the methods on the
proxy client will be available. The first thing to do is log in:

5. Development Clients

Normally, Accumulo consists of lots of moving parts. Even a stand-alone version of
Accumulo requires Hadoop, Zookeeper, the Accumulo master, a tablet server, etc. If
you want to write a unit test that uses Accumulo, you need a lot of infrastructure
in place before your test can run.

5.1. Mock Accumulo

Mock Accumulo supplies mock implementations for much of the client API. It presently
does not enforce users, logins, permissions, etc. It does support Iterators and Combiners.
Note that MockAccumulo holds all data in memory, and will not retain any data or
settings between runs.

5.2. Mini Accumulo Cluster

While the Mock Accumulo provides a lightweight implementation of the client API for unit
testing, it is often necessary to write more realistic end-to-end integration tests that
take advantage of the entire ecosystem. The Mini Accumulo Cluster makes this possible by
configuring and starting Zookeeper, initializing Accumulo, and starting the Master as well
as some Tablet Servers. It runs against the local filesystem instead of having to start
up HDFS.

To start it up, you will need to supply an empty directory and a root password as arguments:

Upon completion of our development code, we will want to shut down our MiniAccumuloCluster:

accumulo.stop();
// delete your temporary folder

6. Table Configuration

Accumulo tables have a few options that can be configured to alter the default
behavior of Accumulo as well as improve performance based on the data stored.
These include locality groups, constraints, bloom filters, iterators, and block
cache. For a complete list of available configuration options, see Configuration Management.

6.1. Locality Groups

Accumulo supports storing sets of column families separately on disk to allow
clients to efficiently scan over columns that are frequently used together and to avoid
scanning over column families that are not requested. After a locality group is set,
Scanner and BatchScanner operations will automatically take advantage of them
whenever the fetchColumnFamily() method is used.

By default, tables place all column families into the same “default” locality group.
Additional locality groups can be configured at any time via the shell or
programmatically as follows:

The assignment of Column Families to Locality Groups can be changed at any time. The
physical movement of column families into their new locality groups takes place via
the periodic Major Compaction process that takes place continuously in the
background. Major Compaction can also be scheduled to take place immediately
through the shell:

user@myinstance mytable> compact -t mytable
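Locality groups pay off because a scan reads only the groups that hold the requested column families. The following sketch shows that selection step under simple assumptions. LocalityGroupSelector is hypothetical, group assignments are a map from group name to column families, and any family not assigned to a group falls into the default group.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: decide which locality groups a scan must read.
class LocalityGroupSelector {
    static final String DEFAULT_GROUP = "default";

    // `groups` maps a locality group name to the column families assigned to it.
    static Set<String> groupsToRead(Map<String, Set<String>> groups, Set<String> fetchedFamilies) {
        if (fetchedFamilies.isEmpty()) { // no column restriction: every group must be read
            Set<String> all = new TreeSet<>(groups.keySet());
            all.add(DEFAULT_GROUP);
            return all;
        }
        Set<String> needed = new TreeSet<>();
        for (String family : fetchedFamilies) {
            String group = DEFAULT_GROUP; // unassigned families live in the default group
            for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
                if (e.getValue().contains(family)) {
                    group = e.getKey();
                    break;
                }
            }
            needed.add(group);
        }
        return needed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> groups = Map.of(
            "metadata", Set.of("author", "title"),
            "content", Set.of("body"));
        System.out.println(groupsToRead(groups, Set.of("title")));            // [metadata]
        System.out.println(groupsToRead(groups, Set.of("body", "comments"))); // [content, default]
    }
}
```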

6.2. Constraints

Accumulo supports constraints applied on mutations at insert time. This can be
used to disallow certain inserts according to a user defined policy. Any mutation
that fails to meet the requirements of the constraint is rejected and sent back to the
client.

Currently there are no general-purpose constraints provided with the Accumulo
distribution. New constraints can be created by writing a Java class that implements
the following interface:

org.apache.accumulo.core.constraints.Constraint

To deploy a new constraint, create a jar file containing the class implementing the
new constraint and place it in the lib directory of the Accumulo installation. New
constraint jars can be added to Accumulo and enabled without restarting but any
change to an existing constraint class requires Accumulo to be restarted.

An example of constraints can be found in
accumulo/docs/examples/README.constraints with corresponding code under
accumulo/examples/simple/src/main/java/accumulo/examples/simple/constraints .

6.3. Bloom Filters

As mutations are applied to an Accumulo table, several files are created per tablet. If
bloom filters are enabled, Accumulo will create and load a small data structure into
memory to determine whether a file contains a given key before opening the file.
This can speed up lookups considerably.

To enable bloom filters, enter the following command in the Shell:

user@myinstance> config -t mytable -s table.bloom.enabled=true

An extensive example of using Bloom Filters can be found at
accumulo/docs/examples/README.bloom .
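A Bloom filter uses a bit array and several hash functions to answer either "definitely not in this file" or "possibly in this file". The sketch below is a generic Bloom filter, not Accumulo's implementation. SimpleBloomFilter, the bit-array size, the number of hashes, and the double-hashing scheme built from String.hashCode and FNV-1a are all illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Generic Bloom filter sketch; not Accumulo's implementation.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a hash, used as the second base hash.
    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th bit position is h1 + i * h2 (mod numBits).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
    }

    // false: the key was definitely never added; true: it might have been.
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        filter.add("row1");
        filter.add("row2");
        System.out.println(filter.mightContain("row1")); // true: added keys are never missed
        System.out.println(filter.mightContain("row9")); // usually false; false positives are possible
    }
}
```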

6.4. Iterators

Iterators provide a modular mechanism for adding functionality to be executed by
TabletServers when scanning or compacting data. This allows users to efficiently
summarize, filter, and aggregate data. In fact, the built-in features of cell-level
security and column fetching are implemented using Iterators.
Some useful Iterators are provided with Accumulo and can be found in the
org.apache.accumulo.core.iterators.user package.
Any custom Iterators must be included in Accumulo’s classpath,
typically by including a jar in $ACCUMULO_HOME/lib or
$ACCUMULO_HOME/lib/ext, although the VFS classloader allows for
classpath manipulation using a variety of schemes including URLs and HDFS URIs.

6.4.1. Setting Iterators via the Shell

Iterators can be configured on a table at scan, minor compaction and/or major
compaction scopes. If the Iterator implements the OptionDescriber interface, the
setiter command can be used to interactively prompt the user for values of the
required options.

Typically, a table will have multiple iterators. Accumulo configures a set of
system level iterators for each table. These iterators provide core
functionality like visibility label filtering and may not be removed by
users. User level iterators are applied in the order of their priority.
Priority is a user configured integer; iterators with lower numbers go first,
passing the results of their iteration on to the other iterators up the
stack.
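Priority ordering can be illustrated by treating each iterator as a stage that transforms its input. IteratorStack and Configured are hypothetical and much simpler than Accumulo's iterator interfaces. Here each stage is a function over a list of integers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Sketch of an iterator stack ordered by priority; not Accumulo's iterator API.
class IteratorStack {
    static final class Configured {
        final int priority;
        final UnaryOperator<List<Integer>> stage;

        Configured(int priority, UnaryOperator<List<Integer>> stage) {
            this.priority = priority;
            this.stage = stage;
        }
    }

    static List<Integer> apply(List<Integer> data, List<Configured> iterators) {
        List<Configured> ordered = new ArrayList<>(iterators);
        ordered.sort(Comparator.comparingInt(c -> c.priority)); // lower numbers run first
        List<Integer> current = data;
        for (Configured c : ordered) current = c.stage.apply(current); // each output feeds the next stage
        return current;
    }

    public static void main(String[] args) {
        Configured keepEven = new Configured(20,
            l -> l.stream().filter(x -> x % 2 == 0).collect(Collectors.toList()));
        Configured addOne = new Configured(10,
            l -> l.stream().map(x -> x + 1).collect(Collectors.toList()));
        // addOne (priority 10) runs before keepEven (priority 20): [1, 2, 3, 4] -> [2, 3, 4, 5] -> [2, 4]
        System.out.println(apply(List.of(1, 2, 3, 4), List.of(keepEven, addOne))); // [2, 4]
    }
}
```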

Tables support separate Iterator settings to be applied at scan time, upon minor
compaction and upon major compaction. For most uses, tables will have identical
iterator settings for all three to avoid inconsistent results.

6.4.3. Versioning Iterators and Timestamps

Accumulo provides the capability to manage versioned data through the use of
timestamps within the Key. If a timestamp is not specified in the key created by the
client then the system will set the timestamp to the current time. Two keys with
identical rowIDs and columns but different timestamps are considered two versions
of the same key. If two inserts are made into Accumulo with the same rowID,
column, and timestamp, then the behavior is non-deterministic.

Timestamps are sorted in descending order, so the most recent data comes first.
Accumulo can be configured to return the top k versions, or versions later than a
given date. The default is to return the one most recent version.
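Because newer versions sort first, keeping the top k versions only requires counting consecutive entries for the same row and column. In this sketch, VersionLimiter and Entry are hypothetical, a single string stands in for the row and column, and the input is assumed to be sorted already as Accumulo would sort it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "keep the newest k versions" over entries that are already sorted:
// entries for the same row and column are adjacent, newest timestamp first.
class VersionLimiter {
    static final class Entry {
        final String cell; // stand-in for row + column
        final long timestamp;
        final String value;

        Entry(String cell, long timestamp, String value) {
            this.cell = cell;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static List<Entry> keepNewest(List<Entry> sorted, int maxVersions) {
        List<Entry> out = new ArrayList<>();
        String previousCell = null;
        int versionsSeen = 0;
        for (Entry e : sorted) {
            if (!e.cell.equals(previousCell)) { // a new row+column starts: reset the count
                previousCell = e.cell;
                versionsSeen = 0;
            }
            if (++versionsSeen <= maxVersions) out.add(e); // versions beyond the k newest are skipped
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> sorted = List.of(
            new Entry("r1 cf:q", 30, "c"),
            new Entry("r1 cf:q", 20, "b"),
            new Entry("r1 cf:q", 10, "a"),
            new Entry("r2 cf:q", 5, "x"));
        // Prints "r1 cf:q c" and then "r2 cf:q x"
        for (Entry e : keepNewest(sorted, 1)) System.out.println(e.cell + " " + e.value);
    }
}
```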

The version policy can be changed by changing the VersioningIterator options for a
table as follows:

When a table is created, by default it's configured to use the
VersioningIterator and keep one version. A table can be created without the
VersioningIterator with the -ndi option in the shell. Also the Java API
has the following method

Logical Time

Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps
set by Accumulo always move forward. This helps avoid problems caused by
TabletServers that have different time settings. With logical time, a per-tablet
counter gives unique, one-up timestamps on a per-mutation basis. When using time in
milliseconds, if two mutations arrive within the same millisecond, both receive the
same timestamp. Even when using time in milliseconds, Accumulo-set times will still
always move forward and never backwards.

A table can be configured to use logical timestamps at creation time as follows:

user@myinstance> createtable -tl logical
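The two time settings can be contrasted in a short sketch. TabletClock is hypothetical. The wall-clock time is passed in as a parameter so the behavior is easy to test, and taking the maximum of the last and current times is assumed here as one simple way to keep millisecond timestamps from moving backwards.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the two timestamp policies for a single tablet.
class TabletClock {
    private final AtomicLong logicalCounter = new AtomicLong();
    private long lastMillis = Long.MIN_VALUE;

    // Logical time: each mutation gets the next value of a per-tablet counter.
    long nextLogical() {
        return logicalCounter.incrementAndGet();
    }

    // Millisecond time: never goes backwards, but mutations in the same millisecond
    // (or after the server clock jumps back) can receive the same timestamp.
    synchronized long nextMillis(long wallClockMillis) {
        lastMillis = Math.max(lastMillis, wallClockMillis);
        return lastMillis;
    }

    public static void main(String[] args) {
        TabletClock clock = new TabletClock();
        System.out.println(clock.nextLogical() + " " + clock.nextLogical()); // 1 2
        System.out.println(clock.nextMillis(1000) + " " + clock.nextMillis(1000)
            + " " + clock.nextMillis(900)); // 1000 1000 1000
    }
}
```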

Deletes

Deletes are special keys in Accumulo that get sorted along with all the other data.
When a delete key is inserted, Accumulo will not show anything that has a
timestamp less than or equal to the delete key. During major compaction, any keys
older than a delete key are omitted from the new file created, and the omitted keys
are removed from disk as part of the regular garbage collection process.
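The suppression rule can be sketched for the versions of one row and column, listed newest first. DeleteSuppression and Cell are hypothetical. The sketch models only what a scan shows, not how major compaction rewrites files.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of delete-marker suppression for the versions of one row and column,
// given in newest-first order (timestamps descending).
class DeleteSuppression {
    static final class Cell {
        final long timestamp;
        final boolean isDelete;
        final String value;

        Cell(long timestamp, boolean isDelete, String value) {
            this.timestamp = timestamp;
            this.isDelete = isDelete;
            this.value = value;
        }
    }

    static List<Cell> visible(List<Cell> newestFirst) {
        List<Cell> out = new ArrayList<>();
        boolean deleteSeen = false;
        long deleteTimestamp = 0;
        for (Cell c : newestFirst) {
            if (c.isDelete) {
                if (!deleteSeen) { // the first delete seen is the newest one
                    deleteSeen = true;
                    deleteTimestamp = c.timestamp;
                }
                continue; // delete markers themselves are never returned
            }
            if (deleteSeen && c.timestamp <= deleteTimestamp) continue; // hidden by the delete
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell(30, false, "c"),
            new Cell(20, true, null),
            new Cell(20, false, "b"),
            new Cell(10, false, "a"));
        for (Cell c : visible(cells)) System.out.println(c.timestamp + " " + c.value); // 30 c
    }
}
```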

6.4.4. Filters

When scanning over a set of key-value pairs it is possible to apply an arbitrary
filtering policy through the use of a Filter. Filters are types of iterators that return
only key-value pairs that satisfy the filter logic. Accumulo has a few built-in filters
that can be configured on any table: AgeOff, ColumnAgeOff, Timestamp, NoVis, and RegEx. More can be added
by writing a Java class that extends the
org.apache.accumulo.core.iterators.Filter class.

The AgeOff filter can be configured to remove data older than a certain date or a fixed
amount of time from the present. The following example sets a table to delete
everything inserted over 30 seconds ago:
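However the filter is configured on a table, the age-off rule itself reduces to comparing an entry's age with a time-to-live. This is a sketch only. AgeOffRule is hypothetical, timestamps are in milliseconds, and the current time is passed in explicitly. Keeping an entry that is exactly at the limit is a choice made for this sketch.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of an age-off rule; not Accumulo's AgeOff filter.
class AgeOffRule {
    // Keep an entry only if it is no older than ttlMillis at time nowMillis.
    static boolean keep(long entryTimestamp, long nowMillis, long ttlMillis) {
        return nowMillis - entryTimestamp <= ttlMillis;
    }

    static List<Long> filter(List<Long> timestamps, long nowMillis, long ttlMillis) {
        return timestamps.stream()
            .filter(ts -> keep(ts, nowMillis, ttlMillis))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long now = 100_000L;
        long ttl = 30_000L; // 30 seconds
        // 95_000 is 5 s old (kept); 60_000 is 40 s old (removed)
        System.out.println(filter(List.of(95_000L, 60_000L), now, ttl)); // [95000]
    }
}
```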

6.4.5. Combiners

Accumulo allows Combiners to be configured on tables and column
families. When a Combiner is set it is applied across the values
associated with any keys that share rowID, column family, and column qualifier.
This is similar to the reduce step in MapReduce, which applies some function to all
the values associated with a particular key.

For example, if a summing combiner were configured on a table and the following
mutations were inserted:

6.5. Block Cache

In order to increase throughput of commonly accessed entries, Accumulo employs a block cache.
This block cache buffers data in memory so that it doesn’t have to be read off of disk.
The RFile format that Accumulo prefers is a mix of index blocks and data blocks, where the index blocks are used to find the appropriate data blocks.
Typical queries to Accumulo result in a binary search over several index blocks followed by a linear scan of one or more data blocks.

The block cache can be configured on a per-table basis, and all tablets hosted on a tablet server share a single resource pool.
To configure the size of the tablet server’s block cache, set the following properties:

tserver.cache.data.size: Specifies the size of the cache for file data blocks.
tserver.cache.index.size: Specifies the size of the cache for file indices.

To enable the block cache for your table, set the following properties:

The block cache can have a significant effect on alleviating hot spots, as well as reducing query latency.
It is enabled by default for the metadata tables.
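A size-bounded cache with least-recently-used eviction can be sketched with a LinkedHashMap in access order. This sketch assumes an LRU policy, and LruBlockCacheSketch is a hypothetical name. It counts capacity in blocks, whereas the tserver.cache.data.size and tserver.cache.index.size properties above specify sizes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded block cache, assuming least-recently-used eviction.
class LruBlockCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxBlocks;

    LruBlockCacheSketch(int maxBlocks) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
        this.maxBlocks = maxBlocks;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxBlocks; // evict the least recently used block once over capacity
    }

    public static void main(String[] args) {
        LruBlockCacheSketch<String, byte[]> cache = new LruBlockCacheSketch<>(2);
        cache.put("block-a", new byte[16]);
        cache.put("block-b", new byte[16]);
        cache.get("block-a");               // touch block-a, so block-b becomes least recently used
        cache.put("block-c", new byte[16]); // evicts block-b
        System.out.println(cache.keySet()); // [block-a, block-c]
    }
}
```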

6.6. Compaction

As data is written to Accumulo it is buffered in memory. The data buffered in
memory is eventually written to HDFS on a per tablet basis. Files can also be
added to tablets directly by bulk import. In the background tablet servers run
major compactions to merge multiple files into one. The tablet server has to
decide which tablets to compact and which files within a tablet to compact.
This decision is made using the compaction ratio, which is configurable on a
per table basis. To configure this ratio modify the following property:

table.compaction.major.ratio

Increasing this ratio will result in more files per tablet and less compaction
work. More files per tablet means higher query latency. So adjusting
this ratio is a trade off between ingest and query performance. The ratio
defaults to 3.

The way the ratio works is that a set of files is compacted into one file if the
sum of the sizes of the files in the set is larger than the ratio multiplied by
the size of the largest file in the set. If this is not true for the set of all
files in a tablet, the largest file is removed from consideration, and the
remaining files are considered for compaction. This is repeated until a
compaction is triggered or there are no files left to consider.
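The selection rule above translates directly into code. The sketch below applies it to a list of file sizes. CompactionRatio is a hypothetical name, sizes are in arbitrary units, and any additional limits a tablet server may apply are ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the compaction-ratio selection rule; sizes are in arbitrary units.
class CompactionRatio {
    // Returns the file sizes selected for compaction, or an empty list if none is triggered.
    static List<Long> select(List<Long> fileSizes, double ratio) {
        List<Long> candidates = new ArrayList<>(fileSizes);
        candidates.sort(Comparator.reverseOrder()); // largest file first
        while (!candidates.isEmpty()) {
            long largest = candidates.get(0);
            long total = 0;
            for (long size : candidates) total += size;
            if (total > ratio * largest) return candidates; // compact this whole set into one file
            candidates.remove(0);                           // drop the largest file and try again
        }
        return candidates; // empty: nothing to compact
    }

    public static void main(String[] args) {
        System.out.println(select(List.of(200L, 1L), 1.0)); // [200, 1]
        System.out.println(select(List.of(200L, 1L), 3.0)); // []
    }
}
```

With the numbers from the example later in this section, files of size 200 and 1 are selected at ratio 1 (201 > 1 × 200) but not at the default ratio of 3.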

The number of background threads tablet servers use to run major compactions is
configurable. To configure this modify the following property:

tserver.compaction.major.concurrent.max

Also, the number of threads tablet servers use for minor compactions is
configurable. To configure this modify the following property:

tserver.compaction.minor.concurrent.max

The numbers of minor and major compactions running and queued are visible on the
Accumulo monitor page. This allows you to see whether compactions are backing up
and whether the above settings need adjusting. When adjusting the number of
threads available for compactions, consider the number of cores and other tasks
running on the nodes such as maps and reduces.

If major compactions are not keeping up, then the number of files per tablet
will grow to a point such that query performance starts to suffer. One way to
handle this situation is to increase the compaction ratio. For example, if the
compaction ratio were set to 1, then every new file added to a tablet by minor
compaction would immediately queue the tablet for major compaction. So if a
tablet has a 200M file and minor compaction writes a 1M file, then the major
compaction will attempt to merge the 200M and 1M file. If the tablet server
has lots of tablets trying to do this sort of thing, then major compactions
will back up and the number of files per tablet will start to grow, assuming
data is being continuously written. Increasing the compaction ratio will
alleviate backups by lowering the amount of major compaction work that needs to
be done.

Another option to deal with the files per tablet growing too large is to adjust
the following property:

table.file.max

When a tablet reaches this number of files and needs to flush its in-memory
data to disk, it will choose to do a merging minor compaction. A merging minor
compaction merges the tablet’s smallest file with the data in memory at
minor compaction time, so the number of files will not grow beyond this
limit. However, it makes minor compactions take longer, which decreases
ingest performance; ingest can slow down until major compactions have enough
time to catch up. When adjusting this property, also consider adjusting the
compaction ratio. Ideally, merging minor compactions never need to occur and
major compactions keep up. It is possible to configure the file max and
compaction ratio such that only merging minor compactions occur and major
compactions never occur. This should be avoided because doing only merging
minor compactions causes O(N^2) work to be done. The amount of work done by
major compactions is O(N*log_R(N)), where R is the compaction ratio.
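The O(N^2) figure can be made concrete with a hypothetical cost model that counts the data rewritten when every flush beyond table.file.max is a merging minor compaction and no major compaction ever runs. It assumes each flush carries one unit of new data and that the merge always picks the smallest file; the class name is illustrative only.

```java
import java.util.PriorityQueue;

/**
 * Hypothetical cost model: units of data written when every flush beyond
 * fileMax is a merging minor compaction (smallest file + one unit of new data)
 * and no major compaction ever runs.
 */
final class MergingMinorCostSketch {
    static long unitsWritten(int flushes, int fileMax) {
        PriorityQueue<Long> files = new PriorityQueue<>(); // smallest file at the head
        long written = 0;
        for (int i = 0; i < flushes; i++) {
            long newFile = 1; // each flush carries one unit of in-memory data
            if (files.size() >= fileMax) {
                newFile += files.poll(); // merging minor: rewrite the smallest file too
            }
            written += newFile;
            files.add(newFile);
        }
        return written;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 200, 400}) {
            // with fileMax = 1, doubling N roughly quadruples the work
            System.out.println(n + " flushes -> " + unitsWritten(n, 1) + " units written");
        }
    }
}
```

With a file max of 1, each flush rewrites the whole tablet, so N flushes write 1 + 2 + ... + N = N(N+1)/2 units; a larger file max only divides that quadratic total by a constant.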

Compactions can be initiated manually for a table. To initiate a minor
compaction, use the flush command in the shell. To initiate a major compaction,
use the compact command in the shell. The compact command will compact all
tablets in a table to one file. Even tablets with one file are compacted. This
is useful when a major compaction filter is configured for a table. In 1.4,
the ability to compact a range of a table was added. To use this feature,
specify start and stop rows for the compact command. This will only compact
tablets that overlap the given row range.

6.7. Pre-splitting tables

Accumulo will balance and distribute tables across servers. Before a
table gets large, it will be maintained as a single tablet on a single
server. This limits the speed at which data can be added or queried
to the speed of a single node. To improve performance when a table
is new, or small, you can add split points and generate new tablets:

root@myinstance> createtable newTable
root@myinstance> addsplits -t newTable g n t

This will create a new table with 4 tablets. The table will be split
on the letters “g”, “n”, and “t”, which will work nicely if the
row IDs start with lower-case alphabetic characters. If your row
data includes binary information or numeric information, or if the
distribution of the row information is not flat, then you would pick
different split points. Now ingest and query can proceed on 4 nodes,
which can improve performance.
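To see why the choice of split points depends on the row data, the following illustrative model assigns rows to tablets. It assumes each tablet covers the rows after the previous split point up to and including its own split point, with the last tablet unbounded above; the class is hypothetical and is not part of the Accumulo client API.

```java
import java.util.List;
import java.util.TreeSet;

/**
 * Illustrative model (not Accumulo code) of how split points partition rows.
 * Assumes each tablet covers (previous split, split], with the last tablet
 * unbounded above, so the splits g, n, t yield tablets 0..3.
 */
final class SplitPointSketch {
    static int tabletIndex(TreeSet<String> splits, String row) {
        String endRow = splits.ceiling(row); // smallest split >= row ends this row's tablet
        if (endRow == null) {
            return splits.size(); // past the last split: the final tablet
        }
        return splits.headSet(endRow).size(); // number of splits strictly before it
    }

    public static void main(String[] args) {
        TreeSet<String> splits = new TreeSet<>(List.of("g", "n", "t"));
        for (String row : List.of("apple", "grape", "pear", "zebra", "Apple", "2011")) {
            System.out.println(row + " -> tablet " + tabletIndex(splits, row));
        }
    }
}
```

Rows such as "Apple" or "2011" sort before "g" and all land in tablet 0, which is why numeric, upper-case, or skewed row data calls for different split points.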

6.8. Merging tablets

Over time, a table can get very large, so large that it has hundreds
of thousands of split points. Once there are enough tablets to spread
a table across the entire cluster, additional splits may not improve
performance, and may create unnecessary bookkeeping. The distribution
of data may change over time. For example, if row data contains date
information, and data is continually added and removed to maintain a
window of current information, tablets for older rows may be empty.

Accumulo supports tablet merging, which can be used to reduce
the number of split points. The following command will merge all rows
from “A” to “Z” into a single tablet:

root@myinstance> merge -t myTable -s A -e Z

If the result of a merge produces a tablet that is larger than the
configured split size, the tablet may be split by the tablet server.
Be sure to increase your tablet size prior to any merges if the goal
is to have larger tablets:

root@myinstance> config -t myTable -s table.split.threshold=2G

In order to merge small tablets, you can ask Accumulo to merge
sections of a table smaller than a given size.

root@myinstance> merge -t myTable -s 100M

By default, small tablets will not be merged into tablets that are
already larger than the given size. This can leave isolated small
tablets. To force small tablets to be merged into larger tablets use
the --force option:

root@myinstance> merge -t myTable -s 100M --force

Merging away small tablets works on one section at a time. If your
table contains many sections of small split points, or you are
attempting to change the split size of the entire table, it will be
faster to set the split threshold (with the config command shown above)
and merge the entire table:

root@myinstance> merge -t myTable

6.9. Delete Range

Consider an indexing scheme that uses date information in each row.
For example “20110823-15:20:25.013” might be a row that specifies a
date and time. In some cases, we might like to delete rows based on
this date, say to remove all the data older than the current year.
Accumulo supports a delete range operation which efficiently
removes data between two rows. For example:

root@myinstance> deleterange -t myTable -s 2010 -e 2011

This will delete all rows starting with “2010” and it will stop at
any row starting “2011”. You can delete any data prior to 2011
with:

root@myinstance> deleterange -t myTable -e 2011 --force

The shell will not allow you to delete an unbounded range (no start)
unless you provide the --force option.

Range deletion is implemented using splits at the given start/end
positions, and will affect the number of splits in the table.

6.10. Cloning Tables

A new table can be created that points to an existing table’s data. This is a
very quick metadata operation; no data is actually copied. The cloned table
and the source table can change independently after the clone operation. One
use case for this feature is testing. For example, to test a new filtering
iterator, clone the table, add the filter to the clone, and force a major
compaction. To perform a test on less data, clone a table and then use delete
range to efficiently remove a lot of data from the clone. Another use case is
generating a snapshot to guard against human error. To create a snapshot,
clone a table and then disable write permissions on the clone.

The clone operation will point to the source table’s files. This is why the
flush option is present and is enabled by default in the shell. If the flush
option is not enabled, then any data the source table currently has in memory
will not exist in the clone.

A cloned table copies the configuration of the source table. However, the
permissions of the source table are not copied to the clone. After a clone is
created, only the user that created the clone can read and write to it.

Data inserted into the source table after the clone operation is not visible
in the clone.
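The clone behavior described above (shared file references, the flush option, and independence after cloning) can be captured in a toy copy-on-write model. The Table class and its methods are hypothetical stand-ins for illustration, not Accumulo's API; files are treated as immutable sorted maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Toy model of cloning (hypothetical classes, not Accumulo's API). A table is a
 * list of immutable "files" plus a private in-memory map. Cloning copies only
 * the list of file references, optionally flushing memory to a file first.
 */
final class CloneSketch {
    static final class Table {
        final List<TreeMap<String, String>> files = new ArrayList<>(); // shared, never modified
        final TreeMap<String, String> memory = new TreeMap<>();        // private to this table

        void put(String row, String value) {
            memory.put(row, value);
        }

        void flush() {
            if (!memory.isEmpty()) {
                files.add(new TreeMap<>(memory)); // write memory out as a new immutable file
                memory.clear();
            }
        }

        Table cloneTable(boolean flushFirst) {
            if (flushFirst) {
                flush(); // otherwise in-memory data never reaches the clone
            }
            Table clone = new Table();
            clone.files.addAll(files); // metadata only: same file objects, no data copied
            return clone;
        }

        String get(String row) {
            if (memory.containsKey(row)) {
                return memory.get(row);
            }
            for (int i = files.size() - 1; i >= 0; i--) { // newest file first
                String value = files.get(i).get(row);
                if (value != null) {
                    return value;
                }
            }
            return null;
        }
    }
}
```

Because both tables only reference the same immutable files, later writes to either table land in that table's own memory and new files, which is what keeps them independent.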

The du command in the shell shows how much space a table is using in HDFS.
This command can also show how much overlapping space two cloned tables have in
HDFS. For example, suppose du shows table ci using 428M. After ci is cloned
to cic, du shows that both tables share 428M. After three entries are
inserted into cic and it is flushed, du shows the two tables still share 428M but
cic has 226 bytes to itself. Finally, after table cic is compacted, du shows
that each table uses 428M.
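The shared-versus-exclusive accounting in the du example can be sketched by treating each table as a set of referenced files. The helper, file names, and the exact byte sizes below are hypothetical; only the 428M and 226-byte figures come from the example above.

```java
import java.util.Map;

/**
 * Sketch of du-style accounting for tables that share files (hypothetical
 * helper). Each table maps a file path to its size in bytes; a file
 * referenced by both tables counts as shared space.
 */
final class SharedSpaceSketch {
    /** Returns {bytes only in a, bytes only in b, bytes shared by both}. */
    static long[] usage(Map<String, Long> a, Map<String, Long> b) {
        long onlyA = 0;
        long onlyB = 0;
        long shared = 0;
        for (Map.Entry<String, Long> file : a.entrySet()) {
            if (b.containsKey(file.getKey())) {
                shared += file.getValue(); // same file referenced by both tables
            } else {
                onlyA += file.getValue();
            }
        }
        for (Map.Entry<String, Long> file : b.entrySet()) {
            if (!a.containsKey(file.getKey())) {
                onlyB += file.getValue();
            }
        }
        return new long[] {onlyA, onlyB, shared};
    }
}
```

After cic flushes its three entries it references one extra small file, so only those bytes are exclusive; compacting cic rewrites its data into a new file, and the shared total drops to zero.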

6.11. Exporting Tables

Accumulo supports exporting tables for the purpose of copying tables to another
cluster. Exporting and importing tables preserves the table’s configuration,
splits, and logical time. Tables are exported and then copied via the hadoop
distcp command. To export a table, it must be offline and stay offline while
distcp runs. It needs to stay offline to prevent its files from being
deleted. To avoid losing access to the table, it can be cloned and the clone
taken offline instead. See docs/examples/README.export for an example.

7. Iterator Design

Accumulo SortedKeyValueIterators, commonly referred to as Iterators for short, are server-side programming constructs
that allow users to implement custom retrieval or computational purposes within Accumulo TabletServers. The name rightly
brings forward similarities to the Java Iterator interface; however, Accumulo Iterators are more complex than Java
Iterators. Notably, in addition to the expected methods to retrieve the current element and advance to the next element
in the iteration, Accumulo Iterators must also support the ability to "move" (seek) to a specified point in the
iteration (the Accumulo table). Accumulo Iterators are designed to be concatenated together, similar to applying a
series of transformations to a list of elements. Accumulo Iterators can duplicate their underlying source to create
multiple "pointers" over the same underlying data (which is extremely powerful since each stream is sorted) or they can
merge multiple Iterators into a single view. In this sense, a collection of Iterators operating in tandem is closer to
a tree structure than a list, but there is always a sense of a flow of Key-Value pairs through some Iterators. Iterators
are not designed to act as triggers, nor are they designed to operate outside of the purview of a single table.

Understanding how TabletServers invoke the methods on a SortedKeyValueIterator can be difficult because the actual code
is buried within the implementation of the TabletServer; however, a strong understanding of it is generally unnecessary,
as the interface clearly defines the action each method should take. This chapter aims to provide a more detailed
description of how Iterators are invoked, some best practices, and some common pitfalls.

7.1. Instantiation

To invoke an Accumulo Iterator inside of the TabletServer, the Iterator class must be on the classpath of every
TabletServer. For production environments, it is common to place a JAR file which contains the Iterator in
$ACCUMULO_HOME/lib. In development environments, it is convenient to instead place the JAR file in
$ACCUMULO_HOME/lib/ext, as JAR files in this directory are dynamically reloaded by the TabletServers, alleviating the
need to restart Accumulo while testing an Iterator. Advanced classloader features also exist which enable other types of
filesystems and per-table classpath configurations (as opposed to process-wide classpaths). These features
are not covered here, but elsewhere in the user manual.

Accumulo references the Iterator class by name and uses Java reflection to instantiate the Iterator. This means that
Iterators must have a public no-args constructor.
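The reason for the no-args requirement can be seen in plain Java reflection: loading a class by name and calling getDeclaredConstructor() with no parameter types only works when such a constructor exists. The two nested classes below are hypothetical stand-ins for user Iterators, and this sketch does not reproduce Accumulo's own classloading code.

```java
/**
 * Sketch of reflective instantiation by class name. The nested classes are
 * hypothetical stand-ins for user Iterators.
 */
final class ReflectionSketch {
    public static final class NoArgIterator {
        public NoArgIterator() {
            // configuration would arrive later through init(), not the constructor
        }
    }

    public static final class PatternIterator {
        public PatternIterator(String pattern) {
            // only a one-argument constructor: a no-args lookup cannot find it
        }
    }

    static Object instantiate(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        return clazz.getDeclaredConstructor().newInstance(); // needs a no-args constructor
    }
}
```

Instantiating PatternIterator this way fails with NoSuchMethodException, which is why options are delivered through init rather than through constructor arguments.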

7.2. Interface

A typical implementation of the SortedKeyValueIterator defines functionality for the following methods:

7.2.1. init

The init method is called by the TabletServer after it constructs an instance of the Iterator. This method should
clear/reset any internal state in the Iterator and prepare it to process data. The first argument, the source, is the
Iterator "below" this Iterator (where the client is at the "top" and the Iterators for files in HDFS are at the "bottom").
The "source" Iterator provides the Key-Value pairs which this Iterator will operate upon.

The second argument, a Map of options, is made up of options provided by the user, options set in the table’s
configuration, and/or options set in the containing namespace’s configuration.
These options allow Iterators to configure themselves dynamically. If no options are used in the current context
(a Scan or Compaction), the Map will be empty. An example of a configuration item for an Iterator could be a pattern used
to filter Key-Value pairs in a regular expression Iterator.

The third argument, the IteratorEnvironment, is a special object which provides information to this Iterator about the
context in which it was invoked. Often, an Iterator does not need to inspect this information, but it can be useful.
For example, if an Iterator knows that it is running in the context of a full major compaction (reading all of the data)
as opposed to a user scan (which may strongly limit the number of columns), the Iterator might make different algorithmic
decisions in an attempt to optimize itself.

7.2.2. seek

The seek method is likely the most confusing method on the Iterator interface. The purpose of this method is to
advance the stream of Key-Value pairs to a certain point in the iteration (the Accumulo table). It is common that before
the implementation of this method returns, some additional processing is performed which may further advance the current
position past the startKey of the Range. This, however, is dependent on the functionality the iterator provides. For
example, a filtering iterator would consume a number of Key-Value pairs which do not meet its criteria before seek
returns. The important condition for seek to meet is that this Iterator should be ready to return the first Key-Value
pair, or none if no such pair is available, when the method returns. The Key-Value pair would be returned by getTopKey
and getTopValue, respectively, and hasTop should return a boolean denoting whether or not there is
a Key-Value pair to return.

The arguments passed to seek are as follows:

The TabletServer first provides a Range, an object which defines some collection of Accumulo Keys and therefore the
Key-Value pairs that this Iterator should return. Each Range has a startKey and an endKey, each with an inclusive flag.
While this Range is often similar to the Range(s) set by the client on a Scanner or BatchScanner, it is not
guaranteed to be a Range that the client set. Accumulo will split up larger ranges and group them together based on
Tablet boundaries per TabletServer. Iterators should not attempt to implement any custom logic based on the Range(s)
provided to seek, and Iterators should not return any Keys that fall outside of the provided Range.

The second argument, a Collection<ByteSequence>, is the set of column families which should be retained or
excluded by this Iterator. The third argument, a boolean, defines whether the collection of column families
should be treated as an inclusion collection (true) or an exclusion collection (false).

It is likely that all implementations of seek will first make a call to the seek method on the
"source" Iterator that was provided in the init method. The collection of column families and
the boolean include argument should be passed down as well as the Range. Somewhat commonly, the Iterator will
also implement some sort of additional logic to find or compute the first Key-Value pair in the provided
Range. For example, a regular expression Iterator would consume all records which do not match the given
pattern before returning from seek.

It is important to retain the original Range passed to this method to know when this Iterator should stop
reading more Key-Value pairs. Ignoring this typically does not affect scans from a Scanner, but it
will result in duplicate keys emitting from a BatchScan if the scanned table has more than one tablet.
Best practice is to never emit entries outside the seek range.
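The seek contract described above (pass the Range down to the source, then advance to the first acceptable pair before returning, and never leave the Range) can be illustrated with a simplified model. The SimpleIterator interface is a hypothetical stand-in that uses String keys and a [start, end) range and omits the column-family arguments; it is not Accumulo's SortedKeyValueIterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.regex.Pattern;

/**
 * Simplified model of the seek/next/hasTop contract. The interface is a
 * hypothetical stand-in: String keys, start inclusive, end exclusive.
 */
final class SeekSketch {
    interface SimpleIterator {
        void seek(String start, String end);
        boolean hasTop();
        String getTopKey();
        String getTopValue();
        void next();
    }

    /** Bottom of the stack: iterates a sorted map, standing in for a file. */
    static final class MapSource implements SimpleIterator {
        private final NavigableMap<String, String> data;
        private Iterator<Map.Entry<String, String>> it;
        private Map.Entry<String, String> top;

        MapSource(NavigableMap<String, String> data) { this.data = data; }

        public void seek(String start, String end) {
            it = data.subMap(start, true, end, false).entrySet().iterator(); // never leaves the range
            next();
        }
        public boolean hasTop() { return top != null; }
        public String getTopKey() { return top.getKey(); }
        public String getTopValue() { return top.getValue(); }
        public void next() { top = it.hasNext() ? it.next() : null; }
    }

    /** Regex filter: seeks its source with the same range, then skips non-matching keys. */
    static final class RegexFilter implements SimpleIterator {
        private final SimpleIterator source;
        private final Pattern pattern;

        RegexFilter(SimpleIterator source, String regex) {
            this.source = source;
            this.pattern = Pattern.compile(regex);
        }

        public void seek(String start, String end) {
            source.seek(start, end); // pass the range down unchanged
            skipNonMatching();       // leave the first matching pair on top
        }
        public boolean hasTop() { return source.hasTop(); }
        public String getTopKey() { return source.getTopKey(); }
        public String getTopValue() { return source.getTopValue(); }
        public void next() {
            source.next();
            skipNonMatching();
        }
        private void skipNonMatching() {
            while (source.hasTop() && !pattern.matcher(source.getTopKey()).matches()) {
                source.next();
            }
        }
    }

    /** Seeks, then reads every key the iterator exposes. */
    static List<String> drain(SimpleIterator iter, String start, String end) {
        List<String> keys = new ArrayList<>();
        iter.seek(start, end);
        while (iter.hasTop()) {
            keys.add(iter.getTopKey());
            iter.next();
        }
        return keys;
    }
}
```

Because the filter relies on its source to honor the range, it can never emit a key outside the seek range, and re-seeking the same instance simply starts a fresh pass.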

7.2.3. next

The next method is analogous to the next method on a Java Iterator: this method should advance
the Iterator to the next Key-Value pair. For implementations that perform some filtering or complex
logic, this may result in more than one Key-Value pair being inspected. This method alters
some internal state that is exposed via the hasTop, getTopKey, and getTopValue methods.

This method commonly caches a Key-Value pair which getTopKey and getTopValue can later return.
While there is another Key-Value pair to return, hasTop should return true. Once there are no
more Key-Value pairs to return from this Iterator since the last call to seek, hasTop should
return false.

7.2.4. hasTop

The hasTop method is similar to the hasNext method on a Java Iterator in that it informs
the caller if there is a Key-Value pair to be returned. If there is no pair to return, this method
should return false. Like a Java Iterator, multiple calls to hasTop (without calling next) should not
alter the internal state of the Iterator.

7.2.5. getTopKey and getTopValue

These methods simply return the current Key-Value pair for this iterator. If hasTop returns true,
both of these methods should return non-null objects. If hasTop returns false, it is undefined
what these methods should return. Like hasTop, multiple calls to these methods should not alter
the state of the Iterator.

Users should take caution in two cases.

First, when caching the Key/Value from getTopKey/getTopValue for use after calling next on the source iterator.
In this case, the cached Key/Value object is aliased to the reference returned by the source iterator.
Iterators may reuse the same Key/Value object in a next call for performance reasons, changing the data
that the cached Key/Value object references and resulting in a logic bug.

Second, when modifying the Key/Value from getTopKey/getTopValue. If the source iterator reuses data stored in the
Key/Value, then the source iterator may use the modified data that the Key/Value references. This may or may not
result in a logic bug.

In both cases, copying the Key/Value’s data into a new object ensures iterator correctness. If neither case applies,
it is safe not to copy the Key/Value. The general guideline is to be aware of who else may use Key/Value objects
returned from getTopKey/getTopValue.
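The aliasing pitfall can be reproduced with a hypothetical source that reuses a single mutable buffer for every key it returns, as the text says real iterators may do for performance. StringBuilder stands in for a reusable Key object; none of these classes are Accumulo types.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustration of the aliasing pitfall. ReusingSource is hypothetical: it
 * returns the same mutable buffer from every getTopKey call.
 */
final class AliasingSketch {
    static final class ReusingSource {
        private final String[] rows;
        private final StringBuilder buffer = new StringBuilder(); // reused on every next()
        private int pos = 0;

        ReusingSource(String... rows) {
            this.rows = rows;
            load();
        }

        boolean hasTop() { return pos < rows.length; }
        StringBuilder getTopKey() { return buffer; } // same object every time
        void next() { pos++; load(); }

        private void load() {
            buffer.setLength(0);
            if (pos < rows.length) {
                buffer.append(rows[pos]);
            }
        }
    }

    /** Buggy: caches references, so every element aliases the one buffer. */
    static List<String> cacheReferences(ReusingSource src) {
        List<StringBuilder> cached = new ArrayList<>();
        while (src.hasTop()) {
            cached.add(src.getTopKey());
            src.next();
        }
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : cached) {
            out.add(sb.toString()); // reads whatever the buffer holds now
        }
        return out;
    }

    /** Correct: copies the data before advancing the source. */
    static List<String> cacheCopies(ReusingSource src) {
        List<String> out = new ArrayList<>();
        while (src.hasTop()) {
            out.add(src.getTopKey().toString()); // independent copy
            src.next();
        }
        return out;
    }
}
```

For rows r1, r2, r3, cacheCopies yields [r1, r2, r3], while cacheReferences yields three empty strings, because every cached reference points at a buffer the source has since cleared.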

7.2.6. deepCopy

The deepCopy method is similar to the clone method from the Java Cloneable interface.
Implementations of this method should return a new object of the same type as the Accumulo Iterator
instance it was called on. Any internal state from the instance deepCopy was called
on should be carried over to the returned copy. The returned copy should be ready to have
seek called on it. The SortedKeyValueIterator interface guarantees that init will be called on
an iterator before deepCopy and that init will not be called on the iterator returned by
deepCopy.

Typically, implementations of deepCopy call a copy-constructor which will initialize
internal data structures. As with init, it is common for the IteratorEnvironment
argument to be ignored, as most Iterator implementations can be written without the explicit
information the environment provides.

In the analogy of a series of Iterators representing a tree, deepCopy resembles the recursive
copy in early programming assignments that implement their own tree data structures: deepCopy
calls deepCopy on its sources (the children), copies itself, attaches the copies of the children,
and then returns the copy.
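That recursive copy can be sketched with a hypothetical Node class standing in for an Iterator and its sources; it shows the order of operations (children first, then the node's own state through a copy-constructor), not Accumulo's actual interface.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of deepCopy over a tree of Iterators. Node is hypothetical: its
 * sources are the children, and position stands in for internal state.
 */
final class DeepCopySketch {
    static final class Node {
        final String name;
        final List<Node> sources;
        long position; // stands in for internal iteration state

        Node(String name, List<Node> sources) {
            this.name = name;
            this.sources = sources;
        }

        /** Copy-constructor style: the copy starts with this node's state. */
        private Node(Node other, List<Node> copiedSources) {
            this(other.name, copiedSources);
            this.position = other.position;
        }

        Node deepCopy() {
            List<Node> copiedSources = new ArrayList<>();
            for (Node source : sources) {
                copiedSources.add(source.deepCopy()); // copy the children first
            }
            return new Node(this, copiedSources); // then this node, with the copied children attached
        }
    }
}
```

The copy carries over the node's state but shares no objects with the original, so advancing one tree never moves the other.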

7.3. TabletServer invocation of Iterators

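In general, a TabletServer builds the "stack" of Iterators by instantiating each configured Iterator and calling init
with the Iterator below it as its source, seeks the top of the stack to the requested Range, and then reads Key-Value
pairs from it until a batch is full. If the stack is torn down before the scan finishes, the next batch resumes from a
new Range that starts just after the last Key returned.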
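A runnable approximation of that outline follows. It is a sketch under several assumptions: String keys, a range given as (start, startInclusive, end exclusive), user Iterators supplied as factory functions that wrap a source (standing in for reflection plus init), and a fresh stack for every batch. All names are hypothetical; this is not TabletServer code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.function.UnaryOperator;

/** Hypothetical outline of batch reading with re-seeks between batches. */
final class InvocationSketch {
    interface SimpleIterator {
        void seek(String start, boolean startInclusive, String end);
        boolean hasTop();
        String getTopKey();
        void next();
    }

    /** System iterator at the bottom of the stack. */
    static final class MapSource implements SimpleIterator {
        private final NavigableMap<String, String> data;
        private Iterator<String> keys;
        private String top;

        MapSource(NavigableMap<String, String> data) { this.data = data; }

        public void seek(String start, boolean startInclusive, String end) {
            keys = data.subMap(start, startInclusive, end, false).keySet().iterator();
            next();
        }
        public boolean hasTop() { return top != null; }
        public String getTopKey() { return top; }
        public void next() { top = keys.hasNext() ? keys.next() : null; }
    }

    /** Example user iterator factory (hypothetical): hides one key. */
    static UnaryOperator<SimpleIterator> hiding(String hiddenKey) {
        return source -> new SimpleIterator() {
            public void seek(String start, boolean inclusive, String end) {
                source.seek(start, inclusive, end);
                skip();
            }
            public boolean hasTop() { return source.hasTop(); }
            public String getTopKey() { return source.getTopKey(); }
            public void next() { source.next(); skip(); }
            private void skip() {
                while (source.hasTop() && source.getTopKey().equals(hiddenKey)) {
                    source.next();
                }
            }
        };
    }

    /** Reads [start, end) in batches, rebuilding and re-seeking the stack for each batch. */
    static List<List<String>> scan(NavigableMap<String, String> data,
                                   List<UnaryOperator<SimpleIterator>> userIterators,
                                   String start, String end, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        String resumeKey = start;
        boolean inclusive = true;
        while (true) {
            SimpleIterator top = new MapSource(data); // bottom of the stack
            for (UnaryOperator<SimpleIterator> wrap : userIterators) {
                top = wrap.apply(top); // each user iterator wraps the one below it
            }
            top.seek(resumeKey, inclusive, end);
            List<String> batch = new ArrayList<>();
            while (top.hasTop() && batch.size() < batchSize) {
                batch.add(top.getTopKey());
                top.next();
            }
            if (batch.isEmpty()) {
                return batches;
            }
            batches.add(batch);
            resumeKey = batch.get(batch.size() - 1); // tear down; resume after the last key
            inclusive = false;
        }
    }
}
```

With keys a through e, a batch size of 2, and an iterator hiding "c", this yields the batches [a, b] and [d, e]: the second stack is brand new, yet no key is repeated or lost because the resume Range starts just after "b".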

7.4. Isolation

Accumulo provides a feature which clients can enable to prevent the viewing of partially
applied mutations within the context of rows. If a client is submitting multiple column
updates to rows at a time, isolation would ensure that a client would either see all of the
updates made to that row or none of the updates (until they are all applied).

When using Isolation, there are additional concerns in iterator design. A scan time iterator in Accumulo
reads from a set of data sources. While an iterator is reading data, it has an isolated view. However, after it returns a
key/value, it is possible that Accumulo may switch data sources and re-seek the iterator. This is done so that resources
may be reclaimed. When the user does not request isolation, this can occur after any key is returned. When a user enables
Isolation, this will only occur after a new row is returned, in which case it will re-seek to the very beginning of the
next possible row.

7.5. Abstract Iterators

A number of abstract implementations of Iterators are provided to allow for faster creation
of common patterns. The most commonly used abstract implementations are the Filter and
Combiner classes. When possible, these classes should be used, as they have been
thoroughly tested inside Accumulo itself.

7.5.1. Filter

The Filter abstract Iterator provides a very simple implementation which allows implementations
to define whether or not a Key-Value pair should be returned via an accept(Key, Value) method.

Filters are extremely simple to implement; however, when the implementation is filtering a
large percentage of Key-Value pairs with respect to the total number of pairs examined,
it can be very inefficient. For example, if a Filter implementation can determine after examining
part of the row that no other pairs in this row will be accepted, there is no mechanism to
efficiently skip the remaining Key-Value pairs. Concretely, take a row consisting of
1000 Key-Value pairs. After examining the first 10 Key-Value pairs, it is determined
that no other Key-Value pairs in this row will be accepted. The Filter must still examine each
of the remaining 990 Key-Value pairs in this row. Another way to express this deficiency is that
Filters have no means to leverage the seek method to efficiently skip large portions
of Key-Value pairs.

As such, the Filter class functions well for filtering small amounts of data, but is
inefficient for filtering large amounts of data. The decision to use a Filter strongly
depends on the use case and distribution of data being filtered.
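The 1000-pair example can be counted directly. This hypothetical model stores "row/column" keys in a TreeMap and treats seeking as asking the map for the first key past the row; it compares a Filter-style pass with a seek-style pass but does not use Accumulo's Filter class.

```java
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Counts pairs examined in one 1000-pair row when, after 10 pairs, nothing
 * else in the row will be accepted. Hypothetical model: "row/column" keys.
 */
final class FilterCostSketch {
    static NavigableMap<String, String> row(String rowId, int columns) {
        NavigableMap<String, String> data = new TreeMap<>();
        for (int i = 0; i < columns; i++) {
            // zero-padded so columns sort in numeric order
            data.put(String.format(Locale.ROOT, "%s/%04d", rowId, i), "v");
        }
        return data;
    }

    /** Filter style: accept() runs on every pair; there is no way to skip ahead. */
    static int examinedByFilter(NavigableMap<String, String> data) {
        int examined = 0;
        for (String ignored : data.keySet()) {
            examined++; // each pair is read and passed to accept(), kept or not
        }
        return examined;
    }

    /** Seek style: after `accepted` pairs, jump past the rest of the row. */
    static int examinedWithSeek(NavigableMap<String, String> data, int accepted) {
        int examined = 0;
        int kept = 0;
        String key = data.isEmpty() ? null : data.firstKey();
        while (key != null) {
            examined++;
            if (kept < accepted) {
                kept++;
                key = data.higherKey(key);
            } else {
                String rowId = key.substring(0, key.indexOf('/'));
                key = data.ceilingKey(rowId + "0"); // '0' sorts right after '/', skipping the row
            }
        }
        return examined;
    }
}
```

For row("r", 1000), the Filter-style pass examines all 1000 pairs, while the seek-style pass examines 11: the 10 accepted pairs plus the one that triggers the skip.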

7.5.2. Combiner

The Combiner class is another common abstract Iterator. Similar to the Combiner interface
defined in Hadoop’s MapReduce framework, implementations of this abstract class reduce
multiple Values for different versions of a Key (Keys which only differ by timestamps) into one Key-Value pair.
Combiners provide a simple way to implement common operations like summation and
aggregation without the need to implement the entire Accumulo Iterator interface.

One important consideration when choosing to design a Combiner is that the "reduction" operation
is often best represented when it is associative and commutative. Operations which do not meet
these criteria can be implemented; however, the implementation can be difficult.

A second consideration is that a Combiner is not guaranteed to see every Key-Value pair
which differ only by timestamp every time it is invoked. For example, if there are 5 Key-Value
pairs in a table which only differ by the timestamps 1, 2, 3, 4, and 5, it is not guaranteed that
every invocation of the Combiner will see 5 timestamps. One invocation might see the Values for
Keys with timestamp 1 and 4, while another invocation might see the Values for Keys with the
timestamps 1, 2, 4 and 5.
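Because each invocation may see only some versions, the reduction must give the same answer whether it is applied all at once or to partial results. The hypothetical helpers below (not Accumulo's Combiner API) contrast a sum, which is associative and commutative, with a naive mean, which is not safe to apply to partial subsets.

```java
import java.util.List;

/**
 * Why associative, commutative reductions suit Combiners that may see only
 * some versions of a key per invocation. Hypothetical helpers only.
 */
final class CombinerSketch {
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    static double mean(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total / values.size();
    }

    public static void main(String[] args) {
        // Five versions carrying values 10..50, seen by two partial invocations.
        long partialSums = sum(List.of(sum(List.of(10L, 40L)), sum(List.of(20L, 30L, 50L))));
        double partialMeans = mean(List.of(mean(List.of(10.0, 40.0)), mean(List.of(20.0, 30.0, 50.0))));
        System.out.println(partialSums + " vs " + sum(List.of(10L, 20L, 30L, 40L, 50L)));       // equal
        System.out.println(partialMeans + " vs " + mean(List.of(10.0, 20.0, 30.0, 40.0, 50.0))); // differ
    }
}
```

A mean can still be implemented as a Combiner by storing a (sum, count) pair as the Value and dividing only when the result is read, which turns it back into an associative reduction.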

Finally, when configuring an Accumulo table to use a Combiner, be sure to disable the Versioning Iterator or set the
Combiner at a priority less than the Versioning Iterator (the Versioning Iterator is added at a priority of 20 by
default). The Versioning Iterator will filter out multiple Key-Value pairs that differ only by timestamp and return only
the Key-Value pair that has the largest timestamp.

7.6. Best practices

Because of the flexibility that the SortedKeyValueIterator interface provides, it doesn’t directly disallow
many implementations which are poor design decisions. The following are some common recommendations to
follow and pitfalls to avoid in Iterator implementations.

7.6.1. Avoid special logic encoded in Ranges

Commonly, granular Ranges that a client passes to an Iterator from a Scanner or BatchScanner are unmodified.
If a Range falls within the boundaries of a Tablet, an Iterator will often see that same Range in the
seek method. However, there is no guarantee that the Range will remain unaltered from client to server. As such, Iterators
should never make assumptions about the current state/context based on the Range.

The common failure condition is referred to as a "re-seek". In the context of a Scan, TabletServers construct the
"stack" of Iterators and batch up Key-Value pairs to send back to the client. When a sufficient number of Key-Value
pairs are collected, it is common for the Iterators to be "torn down" until the client asks for the next batch of
Key-Value pairs. This is done by the TabletServer to add fairness in ensuring one Scan does not monopolize the available
resources. When the client asks for the next batch, the implementation modifies the original Range so that servers know
the point to resume the iteration (to avoid returning duplicate Key-Value pairs). Specifically, the new Range is created
from the original but is shortened by setting the startKey of the original Range to the Key last returned by the Scan,
non-inclusive.
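The failure mode can be reproduced with a hypothetical "numbering" iterator that assumes the first key after seek is the first key of the client's scan. Each batch re-seeks with the shortened Range (start key exclusive), and the iterator's state restarts, so the numbering is wrong whenever the scan spans more than one batch. The class is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/**
 * Hypothetical buggy iterator: emits "key#n" where n counts keys since the
 * last seek, wrongly assuming that seek always starts the client's scan.
 */
final class ReseekPitfallSketch {
    static List<String> scanInBatches(NavigableMap<String, String> data, int batchSize) {
        List<String> out = new ArrayList<>();
        String lastReturned = null; // null: still at the start of the original Range
        while (true) {
            NavigableMap<String, String> range =
                lastReturned == null ? data : data.tailMap(lastReturned, false); // start key exclusive
            int counter = 0; // fresh iterator state after each re-seek: the bug
            String last = null;
            for (String key : range.keySet()) {
                if (counter == batchSize) {
                    break; // batch full: the stack is torn down
                }
                out.add(key + "#" + (++counter));
                last = key;
            }
            if (last == null) {
                return out;
            }
            lastReturned = last;
        }
    }
}
```

Scanning keys a through d in one batch yields a#1, b#2, c#3, d#4, but with a batch size of 2 it yields a#1, b#2, c#1, d#2.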

7.6.2. seek'ing backwards

The ability for an Iterator to "skip over" large blocks of Key-Value pairs is a major tenet behind Iterators.
By seek'ing when it is known that there is a collection of Key-Value pairs which can be ignored can
greatly increase the speed of a scan as many Key-Value pairs do not have to be deserialized and processed.

While the seek method provides the Range that should be used to seek the underlying source Iterator,
there is no guarantee that the implementing Iterator uses that Range to perform the seek on its
"source" Iterator. As such, it is possible to seek to any Range and the interface has no assertions
to prevent this from happening.

Since Iterators are allowed to seek to arbitrary Keys, they can also create infinite loops inside Scans that
repeatedly read the same data without end. When an Iterator seeks to an arbitrary Range, it should construct a
completely new Range rather than modifying the one it was given, since doing otherwise allows bugs to be introduced
which will break Accumulo.

Thus, seeks should always be thought of as making "forward progress" in the view of the total iteration. The
startKey of a Range should always be greater than the current Key seen by the Iterator, while the endKey of the
Range should always retain the original endKey (and endKey inclusivity) of the last Range seen by your
Iterator’s implementation of seek.
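These rules can be encoded in a small helper that an Iterator might call before re-seeking its source. The Range class here is a hypothetical String-keyed stand-in for Accumulo's Range, and the helper name is illustrative.

```java
/**
 * Sketch of a "forward progress" check before an Iterator re-seeks its
 * source. Range is a hypothetical String-keyed stand-in for Accumulo's Range.
 */
final class ForwardSeekSketch {
    static final class Range {
        final String start;
        final boolean startInclusive;
        final String end;
        final boolean endInclusive;

        Range(String start, boolean startInclusive, String end, boolean endInclusive) {
            this.start = start;
            this.startInclusive = startInclusive;
            this.end = end;
            this.endInclusive = endInclusive;
        }
    }

    /**
     * Builds a brand-new Range starting at target (which must be after the
     * current key) and keeping the end and end inclusivity of the last seek.
     */
    static Range skipTo(String current, String target, Range lastSeekRange) {
        if (target.compareTo(current) <= 0) {
            throw new IllegalArgumentException("seek target must be after the current key");
        }
        return new Range(target, true, lastSeekRange.end, lastSeekRange.endInclusive);
    }
}
```

Rejecting backward targets turns a potential infinite loop into an immediate error, and returning a new object leaves the Range passed to seek untouched.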

7.6.3. Take caution in constructing new data in an Iterator

Implementations of Iterator might be tempted to open BatchWriters inside of an Iterator as a means
to implement triggers for writing additional data outside of their client application. The lifecycle of an Iterator
is not managed in a way that guarantees this is safe or efficient. Specifically, there
is no way to guarantee that the internal ThreadPool inside of the BatchWriter is closed (and the thread(s)
are reaped) without calling the close() method. Closing and recreating a BatchWriter after every
Key-Value pair is also too costly in performance to be considered an option.

The only safe way to generate additional data in an Iterator is to alter the current Key-Value pair.
For example, the WholeRowIterator serializes all of the Key-Value pairs that fall within each
row. A safe way to generate more data in an Iterator would be to construct an Iterator that is
"higher" (at a larger priority) than the WholeRowIterator, that is, an Iterator that receives Key-Value pairs which are
each a serialization of many Key-Value pairs. The custom Iterator could deserialize the pairs, compute
some function, and add a new Key-Value pair to the original collection, re-serializing the collection
of Key-Value pairs back into a single Key-Value pair.
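The decode, add, and re-encode step can be sketched with a hypothetical length-prefixed text format in which each field is written as "length:text". This is not the WholeRowIterator's actual encoding; it only shows the shape of the technique, here adding a derived column that counts the row's columns.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Sketch of "decode the row, add a derived pair, re-encode" with a
 * hypothetical length-prefixed format (not the WholeRowIterator's encoding).
 */
final class RowEncodingSketch {
    static String encode(SortedMap<String, String> row) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : row.entrySet()) {
            sb.append(e.getKey().length()).append(':').append(e.getKey());
            sb.append(e.getValue().length()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    static SortedMap<String, String> decode(String encoded) {
        SortedMap<String, String> row = new TreeMap<>();
        int[] pos = {0}; // read cursor shared with readField
        while (pos[0] < encoded.length()) {
            String key = readField(encoded, pos);
            String value = readField(encoded, pos);
            row.put(key, value);
        }
        return row;
    }

    private static String readField(String s, int[] pos) {
        int colon = s.indexOf(':', pos[0]); // first ':' ends the length prefix
        int length = Integer.parseInt(s.substring(pos[0], colon));
        String field = s.substring(colon + 1, colon + 1 + length);
        pos[0] = colon + 1 + length;
        return field;
    }

    /** The "higher priority" step: add a derived column, then re-serialize the row. */
    static String addColumnCount(String encodedRow) {
        SortedMap<String, String> row = decode(encodedRow);
        row.put("meta:count", Integer.toString(row.size()));
        return encode(row);
    }
}
```

Because the derived pair lives inside the single serialized value, any caller that receives the row also receives the generated data, which is the guarantee the text asks for.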

Any other situation is likely not guaranteed to ensure that the caller (a Scan or a Compaction) will
always see all intended data that is generated.

7.7. Final things to remember

Some simple recommendations/points to keep in mind:

7.7.1. Method call order

On an instance of an Iterator, init is always called before seek, and seek is always called before hasTop;
getTopKey and getTopValue will not be called if hasTop returns false.

7.7.2. Teardown

As mentioned, instances of Iterators may be torn down inside of the server transparently. When a complex
collection of iterators is performing some advanced functionality, they will not be torn down until a Key-Value
pair is returned out of the "stack" of Iterators (and added into the batch of Key-Values to be returned
to the caller). Being torn down is equivalent to a new instance of the Iterator being created and deepCopy
being called on the new instance with the old instance provided as the argument to deepCopy. References
to the old instance are removed and the object is lazily garbage collected by the JVM.

7.8. Compaction-time Iterators

When Iterators are configured to run during compactions, at the minc or majc scope, these Iterators sometimes need
to make different assertions than those that only operate at scan time. Iterators won’t see the delete entries; however,
Iterators will not necessarily see all of the Key-Value pairs in every invocation. Because compactions often do not
rewrite all files (only a subset of them), the logic must take this into consideration.

For example, a Combiner that runs over data during compactions might not see all of the values for a given Key. The
Combiner must recognize this and not perform any function that would be incorrect due
to the missing values.

8. Table Design

8.1. Basic Table

Since Accumulo tables are sorted by row ID, each table can be thought of as being
indexed by the row ID. Lookups performed by row ID can be executed quickly, by doing
a binary search, first across the tablets, and then within a tablet. Clients should
choose a row ID carefully in order to support their desired application. A simple rule
is to select a unique identifier as the row ID for each entity to be stored and assign
all the other attributes to be tracked to be columns under this row ID. For example,
if we have the following data in a comma-separated file:

userid,age,address,account-balance

We might choose to store this data using the userid as the rowID, the column
name in the column family, and a blank column qualifier, so that each line of the
file becomes one row with one column per remaining attribute.
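That layout can be sketched as a conversion from one CSV line to (row, family, qualifier, value) cells. The Cell class and method names are hypothetical; real client code would build Accumulo Mutations instead, and the naive split(",") assumes no field contains a comma.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the userid-as-row layout: attribute name as column family,
 * blank qualifier. Cell is hypothetical; split(",") assumes no quoted commas.
 */
final class CsvLayoutSketch {
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final String value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " = " + value;
        }
    }

    static List<Cell> toCells(String header, String line) {
        String[] names = header.split(",");
        String[] fields = line.split(",", -1); // -1 keeps empty trailing fields
        List<Cell> cells = new ArrayList<>();
        for (int i = 1; i < names.length; i++) { // column 0 (userid) is the row ID
            cells.add(new Cell(fields[0], names[i], "", fields[i]));
        }
        return cells;
    }
}
```

For the header above, each line yields three cells sharing one row ID, with families age, address, and account-balance.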

8.2. RowID Design

Often it is necessary to transform the rowID in order to have rows ordered in a way
that is optimal for anticipated access patterns. A good example of this is reversing
the order of components of internet domain names in order to group rows of the
same parent domain together. For example, mail.google.com becomes com.google.mail,
which sorts next to com.google.code and com.google.labs.
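The transformation is a simple reversal of dot-separated components. The class below is a hypothetical helper for illustration; it sorts the resulting row IDs the way a table would store them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: reverse domain components to build row IDs. */
final class ReverseDomainSketch {
    /** "mail.google.com" -> "com.google.mail" */
    static String reverseDomain(String host) {
        List<String> parts = new ArrayList<>(Arrays.asList(host.split("\\.")));
        Collections.reverse(parts);
        return String.join(".", parts);
    }

    /** Row IDs in the order a sorted table would store them. */
    static List<String> sortedRows(List<String> hosts) {
        TreeSet<String> rows = new TreeSet<>();
        for (String host : hosts) {
            rows.add(reverseDomain(host));
        }
        return new ArrayList<>(rows);
    }
}
```

After reversal, all google.com hosts form one contiguous block of rows, so a single range scan over the "com.google." prefix retrieves them together.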

Some data may result in the creation of very large rows (rows with many columns).
In this case the table designer may wish to split up these rows for better load
balancing while keeping them sorted together for scanning purposes. This can be
done by appending a random substring to the end of the row, or by appending a
representation of some period of time, such as a date truncated to the week or month.

Appending dates provides the additional capability of restricting a scan to a given
date range.
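Both suffix styles can be sketched as string transformations on the row ID. The two-digit bucket format, the underscore separator, and the yyyyMMdd date format are assumptions chosen for illustration; the helper names are hypothetical. A date-restricted scan then becomes a range over row IDs.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.Random;

/** Hypothetical helpers for suffixed row IDs. */
final class RowSuffixSketch {
    /** yyyyMMdd sorts chronologically as a string. */
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    /** Random two-digit bucket suffix, e.g. "com.google.code_07" (assumes buckets <= 100). */
    static String bucketedRow(String row, int buckets, Random random) {
        return String.format(Locale.ROOT, "%s_%02d", row, random.nextInt(buckets));
    }

    /** Date suffix, e.g. "com.google.code_20110823". */
    static String datedRow(String row, LocalDate date) {
        return row + "_" + date.format(DAY);
    }

    /** A date-restricted "scan": translate the date bounds into row bounds. */
    static NavigableMap<String, String> scanDates(NavigableMap<String, String> table,
                                                  String row, LocalDate from, LocalDate to) {
        return table.subMap(datedRow(row, from), true, datedRow(row, to), true);
    }
}
```

The random bucket spreads one hot row across several rows (and potentially tablets), while the date suffix keeps each period's data contiguous so the scan touches only the requested dates.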

8.3. Lexicoders

Since Keys in Accumulo are sorted lexicographically by default, it’s often useful to encode
common data types into a byte format in which their sort order corresponds to the sort order
in their native form. An example of this is encoding dates and numerical data so that they can
be more easily seeked or searched in ranges.

The lexicoders are a standard and extensible way of encoding Java types. For example, a date
lexicoder encodes a Java Date object so that its bytes sort lexicographically in chronological order.
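The idea behind such an encoding can be shown with a simple fixed-width scheme: write the epoch milliseconds as 8 big-endian bytes with the sign bit flipped, so unsigned byte-by-byte comparison matches chronological order, including dates before 1970. This sketch is an assumption for illustration and is not the byte format Accumulo's lexicoders actually produce.

```java
import java.util.Date;

/**
 * Illustrative date encoding whose unsigned byte order matches chronological
 * order. Not Accumulo's lexicoder format; fixed width for simplicity.
 */
final class DateEncodingSketch {
    static byte[] encode(Date date) {
        long bits = date.getTime() ^ Long.MIN_VALUE; // flip sign bit so negatives sort first
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) bits; // low byte goes last: big-endian
            bits >>>= 8;
        }
        return out;
    }

    static Date decode(byte[] bytes) {
        long bits = 0;
        for (byte b : bytes) {
            bits = (bits << 8) | (b & 0xFF);
        }
        return new Date(bits ^ Long.MIN_VALUE); // undo the sign-bit flip
    }

    /** Unsigned lexicographic comparison, as used when sorting byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

Without the sign-bit flip, a negative timestamp would start with byte 0xFF and sort after every positive one.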

8.4. Indexing

In order to support lookups via more than one attribute of an entity, additional
indexes can be built. However, because Accumulo tables can support any number of
columns without specifying them beforehand, a single additional index will often
suffice for supporting lookups of records in the main table. Here, the index has, as
the rowID, the Value or Term from the main table, the column families are the same,
and the column qualifier of the index table contains the rowID from the main table.

RowID    Column Family    Column Qualifier    Value
Term     Field Name       MainRowID           (empty)

Note: We store rowIDs in the column qualifier rather than the Value so that we can
have more than one rowID associated with a particular term within the index. If we
stored this in the Value we would only see one of the rows in which the value
appears since Accumulo is configured by default to return the one most recent
value associated with a key.

Lookups can then be done by scanning the Index Table first for occurrences of the
desired values in the columns specified, which returns a list of row IDs from the main
table. These can then be used to retrieve each matching record, in their entirety, or a
subset of their columns, from the Main Table.
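The index layout above (term as row, field name as family, main-table row ID as qualifier) can be modeled with a sorted set of composite keys. This is a hypothetical in-memory sketch: the NUL separator and the helper names are assumptions, and a real implementation would write index entries to a separate Accumulo table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Sketch of a term index: term (row) + field name (family) + main row ID
 * (qualifier) joined with a NUL separator into sorted keys.
 */
final class IndexSketch {
    static final String SEP = "\0"; // sorts before printable characters

    static NavigableSet<String> buildIndex(Map<String, Map<String, String>> mainTable) {
        NavigableSet<String> index = new TreeSet<>();
        for (Map.Entry<String, Map<String, String>> record : mainTable.entrySet()) {
            for (Map.Entry<String, String> field : record.getValue().entrySet()) {
                // the row ID is in the key, so one term can point at many rows
                index.add(field.getValue() + SEP + field.getKey() + SEP + record.getKey());
            }
        }
        return index;
    }

    static List<String> lookup(NavigableSet<String> index, String term, String field) {
        String prefix = term + SEP + field + SEP;
        List<String> rowIds = new ArrayList<>();
        for (String key : index.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // left the term/field prefix: no more matches
            }
            rowIds.add(key.substring(prefix.length()));
        }
        return rowIds;
    }
}
```

Because the main row ID is part of each index key, two records with the same term produce two distinct entries rather than overwriting a single Value.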

To support efficient lookups of multiple rowIDs from the same table, the Accumulo
client library provides a BatchScanner. Users specify a set of Ranges to the
BatchScanner, which performs the lookups in multiple threads to multiple servers
and returns an Iterator over all the rows retrieved. Unlike with the basic Scanner
interface, the rows returned are NOT in sorted order.
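The reason results arrive unsorted can be sketched with plain java.util.concurrent rather than the Accumulo client API: lookups run in parallel and results are collected in completion order. The class and method names are hypothetical, and a map lookup stands in for one Range per row ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch of parallel lookups whose results arrive in completion order. */
final class ParallelLookupSketch {
    static List<String> lookupAll(NavigableMap<String, String> table, List<String> rowIds, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Queue<String> results = new ConcurrentLinkedQueue<>(); // thread-safe collector
        try {
            List<Callable<Void>> lookups = new ArrayList<>();
            for (String rowId : rowIds) {
                lookups.add(() -> {
                    String value = table.get(rowId); // read-only access from many threads
                    if (value != null) {
                        results.add(rowId + "=" + value);
                    }
                    return null;
                });
            }
            pool.invokeAll(lookups); // blocks until every lookup has finished
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(results); // completion order, not sorted order
    }
}
```

Callers that need sorted output must sort the results themselves, as with a BatchScanner.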

One advantage of the dynamic schema capabilities of Accumulo is that different
fields may be indexed into the same physical table. However, it may be necessary to
create different index tables if the terms must be formatted differently in order to
maintain proper sort order. For example, real numbers must be formatted
differently than their usual notation in order to be sorted correctly. In these cases,
usually one index per unique data type will suffice.
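For instance, decimal strings do not sort numerically as bytes ("10" sorts before "9"). One common remedy, shown here as an assumption rather than the encoding Accumulo itself uses, is to store each number as 8 fixed-width bytes whose unsigned lexicographic order matches numeric order. The sketch needs Java 9+ for Arrays.compareUnsigned and ignores NaN; the class name is hypothetical.

```java
import java.util.Arrays;

// Sketch: encode doubles as 8-byte big-endian keys whose unsigned lexicographic
// order matches numeric order. NaN is not handled.
class SortableDoubleSketch {
    static byte[] encode(double d) {
        long bits = Double.doubleToLongBits(d);
        // Negative values: flip every bit so that larger magnitudes sort earlier.
        // Non-negative values: flip only the sign bit so they sort after all negatives.
        bits = (bits < 0) ? ~bits : (bits ^ Long.MIN_VALUE);
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) bits;
            bits >>>= 8;
        }
        return out;
    }

    static double decode(byte[] b) {
        long bits = 0;
        for (byte x : b) {
            bits = (bits << 8) | (x & 0xFF);
        }
        // Undo the transformation applied in encode().
        bits = (bits < 0) ? (bits ^ Long.MIN_VALUE) : ~bits;
        return Double.longBitsToDouble(bits);
    }

    // Sort values the way a table would: by comparing encoded keys as unsigned bytes.
    static double[] sortByEncodedKey(double[] values) {
        byte[][] keys = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            keys[i] = encode(values[i]);
        }
        Arrays.sort(keys, Arrays::compareUnsigned);
        double[] sorted = new double[values.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = decode(keys[i]);
        }
        return sorted;
    }

    public static void main(String[] args) {
        double[] values = {3.5, -10.0, 0.0, 250.0, -0.25, 42.0};
        System.out.println(Arrays.toString(sortByEncodedKey(values)));
        // [-10.0, -0.25, 0.0, 3.5, 42.0, 250.0]
    }
}
```

Any numeric field indexed this way needs the same encoding at query time, which is one reason to keep one index per data type.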

8.5. Entity-Attribute and Graph Tables

Accumulo is ideal for storing entities and their attributes, especially if the
attributes are sparse. It is often useful to join several datasets together on common
entities within the same table. This can allow for the representation of graphs,
including nodes, their attributes, and connections to other nodes.

Rather than storing individual events, Entity-Attribute or Graph tables store
aggregate information about the entities involved in the events and the
relationships between entities. This is often preferable when single events aren’t
very useful and when a continuously updated summarization is desired.

The physical schema for an entity-attribute or graph table is as follows:

RowID       Column Family     Column Qualifier    Value
EntityID    Attribute Name    Attribute Value     Weight
EntityID    Edge Type         Related EntityID    Weight

For example, to keep track of employees, managers and products the following
entity-attribute table could be used. Note that the weights are not always necessary
and are set to 0 when not used.

RowID  Column Family  Column Qualifier  Value
E001   name           bob               0
E001   department     sales             0
E001   hire_date      20030102          0
E001   units_sold     P001              780
E002   name           george            0
E002   department     sales             0
E002   manager_of     E001              0
E002   manager_of     E003              0
E003   name           harry             0
E003   department     accounts_recv     0
E003   hire_date      20000405          0
E003   units_sold     P002              566
E003   units_sold     P001              232
P001   product_name   nike_airs         0
P001   product_type   shoe              0
P001   in_stock       germany           900
P001   in_stock       brazil            200
P002   product_name   basic_jacket      0
P002   product_type   clothing          0
P002   in_stock       usa               3454
P002   in_stock       germany           700

To allow efficient updating of edge weights, an aggregating iterator can be
configured to add the value of all mutations applied with the same key. These types
of tables can easily be created from raw events by simply extracting the entities,
attributes, and relationships from individual events and inserting the keys into
Accumulo each with a count of 1. The aggregating iterator will take care of
maintaining the edge weights.
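The weight-maintenance idea can be sketched in plain Java, with a sorted map standing in for the table and Map.merge standing in for a summing aggregating iterator. This is a simplification: an aggregating iterator combines values lazily during scans and compactions rather than on insert, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: a sorted map stands in for the entity-attribute table, and Map.merge with Long::sum
// stands in for a summing aggregating iterator.
class EdgeWeightSketch {
    private final TreeMap<String, Long> table = new TreeMap<>();

    private static String key(String rowId, String family, String qualifier) {
        return rowId + " " + family + " " + qualifier; // RowID, Column Family, Column Qualifier
    }

    // Insert a count of 1 under the given key; the "aggregator" adds it to any existing weight.
    void insertCount(String rowId, String family, String qualifier) {
        table.merge(key(rowId, family, qualifier), 1L, Long::sum);
    }

    // Hypothetical raw event: one unit of a product sold by an employee.
    void ingestSale(String employeeId, String productId) {
        insertCount(employeeId, "units_sold", productId);
    }

    long weight(String rowId, String family, String qualifier) {
        return table.getOrDefault(key(rowId, family, qualifier), 0L);
    }

    Map<String, Long> entries() {
        return table;
    }

    public static void main(String[] args) {
        EdgeWeightSketch g = new EdgeWeightSketch();
        String[][] sales = {{"E001", "P001"}, {"E003", "P002"}, {"E001", "P001"}, {"E003", "P001"}};
        for (String[] sale : sales) {
            g.ingestSale(sale[0], sale[1]);
        }
        System.out.println(g.entries());
        // {E001 units_sold P001=2, E003 units_sold P001=1, E003 units_sold P002=1}
    }
}
```

The ingest side never reads the current weight before writing, which is what lets many independent clients update the same edges concurrently.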

8.6. Document-Partitioned Indexing

Using a simple index as described above works well when looking for records that
match one of a set of given criteria. When looking for records that match more than
one criterion simultaneously, such as when looking for documents that contain all of
the words ‘the’ and ‘white’ and ‘house’, there are several issues.

The first is that the set of all records matching any one of the search terms must be sent
to the client, which incurs a lot of network traffic. The second is that the
client is responsible for performing set intersection on the sets of records returned
to eliminate all but the records matching all search terms. The memory of the client
may easily be overwhelmed during this operation.

For these reasons Accumulo includes support for a scheme known as sharded
indexing, in which these set operations can be performed at the TabletServers and
decisions about which records to include in the result set can be made without
incurring network traffic.

This is accomplished via partitioning records into bins that each reside on at most
one TabletServer, and then creating an index of terms per record within each bin as
follows:

RowID    Column Family    Column Qualifier    Value
BinID    Term             DocID               Weight

Documents or records are mapped into bins by a user-defined ingest application. By
storing the BinID as the RowID we ensure that all the information for a particular
bin is contained in a single tablet and hosted on a single TabletServer since
Accumulo never splits rows across tablets. Storing the Terms as column families
serves to enable fast lookups of all the documents within this bin that contain the
given term.
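As one possible user-defined mapping, assumed here for illustration, an ingest application might hash each document ID into a fixed number of bins and emit one (BinID, Term, DocID) entry per distinct term. The names below are hypothetical; Accumulo does not prescribe how bins are chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Sketch of a user-defined ingest step for a document-partitioned index. Hashing document IDs
// into a fixed number of bins (fewer than 1000 here) is an assumed strategy.
class ShardedIngestSketch {
    // floorMod keeps the bin non-negative even when hashCode() is negative.
    static String binFor(String docId, int numBins) {
        int bin = Math.floorMod(docId.hashCode(), numBins);
        // Zero-pad so that bin IDs also sort numerically as row IDs.
        return String.format(Locale.ROOT, "%03d", bin);
    }

    // One (BinID, Term, DocID) index entry per distinct term in the document.
    static List<String[]> indexEntries(String docId, String text, int numBins) {
        String bin = binFor(docId, numBins);
        TreeSet<String> terms =
                new TreeSet<>(Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+")));
        List<String[]> entries = new ArrayList<>();
        for (String term : terms) {
            entries.add(new String[] {bin, term, docId});
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String[] e : indexEntries("doc42", "the White House", 16)) {
            System.out.println(String.join("  ", e)); // BinID  Term  DocID
        }
    }
}
```

Every entry for a document shares one BinID, so the whole document's index lands in a single row and therefore a single tablet.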

Finally, we perform set intersection operations on the TabletServer via a special
iterator called the Intersecting Iterator. Since documents are partitioned into many
bins, a search of all documents must search every bin. We can use the BatchScanner
to scan all bins in parallel. The Intersecting Iterator should be enabled on a
BatchScanner within user query code, configured with the set of terms that every
returned document must contain.

Configured this way, the BatchScanner scans all tablets of a table, looking for
documents that match all the given terms. Because all tablets are scanned for
every query, each query is more expensive than other Accumulo scans, which
typically involve a small number of TabletServers. This reduces the number of
concurrent queries supported and is subject to what is known as the ‘straggler’
problem, in which every query runs as slowly as the slowest participating server.
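The per-bin work can be sketched in plain Java. Within each bin (one row, with terms as column families and DocIDs as qualifiers), the DocIDs common to every query term are computed, and all bins are processed in parallel so that only each bin's intersection is merged. This is an illustration, not the Intersecting Iterator's implementation: retainAll stands in for the sorted merge such an iterator would perform, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Sketch of what an intersecting iterator computes inside one bin: the DocIDs found under
// every query term.
class BinIntersectionSketch {
    // A bin is one row: term (column family) -> DocIDs (column qualifiers).
    static Set<String> intersectInBin(Map<String, TreeSet<String>> bin, List<String> terms) {
        TreeSet<String> result = null;
        for (String term : terms) {
            TreeSet<String> docs = bin.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
            if (result.isEmpty()) {
                break; // no document in this bin can match all terms
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    // Every bin is searched in parallel (as a BatchScanner covers every tablet);
    // only each bin's intersection is merged into the final answer.
    static Set<String> query(List<Map<String, TreeSet<String>>> bins, List<String> terms) {
        return bins.parallelStream()
                .flatMap(bin -> intersectInBin(bin, terms).stream())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Builds a bin from alternating term/DocID arguments (helper for this sketch).
    static Map<String, TreeSet<String>> bin(String... termDocPairs) {
        Map<String, TreeSet<String>> row = new HashMap<>();
        for (int i = 0; i + 1 < termDocPairs.length; i += 2) {
            row.computeIfAbsent(termDocPairs[i], t -> new TreeSet<>()).add(termDocPairs[i + 1]);
        }
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, TreeSet<String>>> bins = Arrays.asList(
                bin("the", "d1", "white", "d1", "house", "d1", "the", "d2", "house", "d2"),
                bin("the", "d3", "white", "d3", "house", "d3", "white", "d4"));
        System.out.println(query(bins, Arrays.asList("the", "white", "house"))); // [d1, d3]
    }
}
```

The client receives only d1 and d3, never the full posting lists for each term, which is the network saving described above.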

Of course, fast servers will return their results to the client, which can display them
to the user immediately while the rest of the results arrive. If the
results are unordered, this is quite effective, as the first results to arrive are as good
as any others to the user.

9. High-Speed Ingest

Accumulo is often used as part of a larger data processing and storage system. To
maximize the performance of a parallel system involving Accumulo, the ingestion
and query components should be designed to provide enough parallelism and
concurrency to avoid creating bottlenecks for users and other systems writing to
and reading from Accumulo. There are several ways to achieve high ingest
performance.

9.1. Pre-Splitting New Tables

New tables consist of a single tablet by default. As mutations are applied, the table
grows and splits into multiple tablets which are balanced by the Master across
TabletServers. This implies that the aggregate ingest rate will be limited to fewer
servers than are available within the cluster until the table has reached the point
where there are tablets on every TabletServer.

Pre-splitting a table ensures that the desired number of tablets exists before
ingest begins, so that ingest can take advantage of all the parallelism the cluster
hardware offers. Tables can be split at any time by using the shell:

user@myinstance mytable> addsplits -sf /local_splitfile -t mytable

For the purposes of providing parallelism to ingest, it is not necessary to create more
tablets than there are physical machines within the cluster, as the aggregate ingest
rate is a function of the number of physical machines. Note that the aggregate ingest
rate is still subject to the number of machines running ingest clients and the
distribution of rowIDs across the table. The aggregate ingest rate will be
suboptimal if there are many inserts into a small number of rowIDs.
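A split file for addsplits -sf lists one split point per line. As a sketch, assuming rowIDs begin with a two-hex-digit prefix (for example, a hash prefix) so that they are spread evenly, evenly spaced split points for a chosen number of tablets can be computed as below; the resulting list could then be written to the local split file, one entry per line. The row layout and class name are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: evenly spaced split points for rowIDs that start with a two-hex-digit prefix
// (an assumed row layout), giving numTablets tablets in total.
class SplitPointsSketch {
    static List<String> splitPoints(int numTablets) {
        if (numTablets < 1 || numTablets > 256) {
            throw new IllegalArgumentException("numTablets must be between 1 and 256");
        }
        List<String> splits = new ArrayList<>();
        // 256 possible prefixes "00".."ff"; n tablets need n - 1 split points.
        for (int i = 1; i < numTablets; i++) {
            splits.add(String.format("%02x", i * 256 / numTablets));
        }
        return splits;
    }

    public static void main(String[] args) {
        // e.g. one tablet per TabletServer on a hypothetical 4-node cluster
        System.out.println(splitPoints(4)); // [40, 80, c0]
    }
}
```

If rowIDs are not uniformly distributed, split points are better taken from a sample of real rowIDs than computed this way.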

9.2. Multiple Ingester Clients

Accumulo is capable of scaling to very high rates of ingest, which is dependent upon
not just the number of TabletServers in operation but also the number of ingest
clients. This is because a single client, while capable of batching mutations and
sending them to all TabletServers, is ultimately limited by the amount of data that
can be processed on a single machine. The aggregate ingest rate will scale linearly
with the number of clients up to the point at which either the aggregate I/O of
TabletServers or total network bandwidth capacity is reached.

In operational settings where high rates of ingest are paramount, clusters are often
configured to dedicate some number of machines solely to running Ingester Clients.
The exact ratio of clients to TabletServers necessary for optimum ingestion rates
will vary according to the distribution of resources per machine and by data type.

9.3. Bulk Ingest

Accumulo supports the ability to import files produced by an external process such
as MapReduce into an existing table. In some cases it may be faster to load data this
way rather than via ingesting through clients using BatchWriters. This allows a large
number of machines to format data the way Accumulo expects. The new files can
then simply be introduced to Accumulo via a shell command.

To configure MapReduce to format data in preparation for bulk loading, the job
should be set to use a range partitioner instead of the default hash partitioner. The
range partitioner uses the split points of the Accumulo table that will receive the
data. The split points can be obtained from the shell and used by the MapReduce
RangePartitioner. Note that this is only useful if the existing table is already split
into multiple tablets.

user@myinstance mytable> getsplits
aa
ab
ac
...
zx
zy
zz
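The core of range partitioning can be sketched as a binary search over the sorted split points printed by getsplits. Because a split point is the last row of its tablet, an exact match stays in that tablet's partition, and n split points yield n + 1 partitions, one per tablet (and per reducer). This mirrors the idea behind a RangePartitioner but is not its code; the class name is hypothetical, and Java String ordering matches Accumulo's byte ordering only for ASCII rows.

```java
import java.util.Arrays;

// Sketch of range partitioning over a table's sorted split points.
class RangePartitionSketch {
    // With n split points there are n + 1 partitions (tablets).
    static int partitionFor(String row, String[] sortedSplits) {
        int idx = Arrays.binarySearch(sortedSplits, row);
        // An exact match is the end row of a tablet, so it stays in that partition;
        // otherwise binarySearch returns (-(insertionPoint) - 1).
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        String[] splits = {"aa", "ab", "ac"}; // e.g. the first lines printed by getsplits
        for (String row : new String[] {"a", "aa", "aab", "abz", "zz"}) {
            System.out.println(row + " -> partition " + partitionFor(row, splits));
        }
    }
}
```

Running the job with one reducer per partition means each output file covers exactly one tablet's range.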

Run the MapReduce job, using the AccumuloFileOutputFormat to create the files to
be introduced to Accumulo. Once this is complete, the files can be added to
Accumulo via the shell:

user@myinstance mytable> importdirectory /files_dir /failures

Note that the paths referenced are directories within the same HDFS instance over
which Accumulo is running. Accumulo places any files that could not be added into the
second directory specified.

A complete example of using Bulk Ingest can be found at
accumulo/docs/examples/README.bulkIngest.

9.4. Logical Time for Bulk Ingest

Logical time is important for bulk-imported data, for which the client code may
be choosing a timestamp. At bulk import time, the user can choose to enable
logical time for the set of files being imported. When it is enabled, Accumulo
uses a specialized system iterator to lazily set times in a bulk-imported file.
This mechanism guarantees that times set by unsynchronized multi-node
applications (such as those running on MapReduce) will maintain some semblance
of causal ordering, which mitigates the problem of the clock being wrong on the
system that created the file for bulk import. These times are not set when the
file is imported, but whenever it is read by scans or compactions: a single time
is obtained at import, and the specialized system iterator always uses that
time.

The timestamp assigned by Accumulo will be the same for every key in the file.
This could cause problems if the file contains multiple keys that are identical
except for the timestamp. In this case, the sort order of the keys will be
undefined. This could occur if an insert and an update were in the same bulk
import file.

9.5. MapReduce Ingest

It is possible to efficiently write many mutations to Accumulo in parallel via a
MapReduce job. In this scenario, the MapReduce job is written to process data that lives
in HDFS and write mutations to Accumulo using the AccumuloOutputFormat. See
the MapReduce section under Analytics for details.

An example of using MapReduce can be found under
accumulo/docs/examples/README.mapred.

10. Analytics

Accumulo supports more advanced data processing than simply keeping keys
sorted and performing efficient lookups. Analytics can be developed by using
MapReduce and Iterators in conjunction with Accumulo tables.

10.1. MapReduce

Accumulo tables can be used as the source and destination of MapReduce jobs. To
use an Accumulo table with a MapReduce job (specifically with the new Hadoop API
as of version 0.20), configure the job parameters to use the AccumuloInputFormat
and AccumuloOutputFormat. Accumulo-specific parameters can be set via these
two format classes to do the following:

Authenticate and provide user credentials for the input

Restrict the scan to a range of rows

Restrict the input to a subset of available columns

10.1.1. Mapper and Reducer classes

To read from an Accumulo table, create a Mapper whose input key and value types are
Accumulo's Key and Value classes, and be sure to configure the AccumuloInputFormat.

To write to an Accumulo table, create a Reducer whose output key type is Text and
whose output value type is Mutation, and be sure to configure the AccumuloOutputFormat. The key
emitted from the Reducer identifies the table to which the mutation is sent. This
allows a single Reducer to write to more than one table if desired. A default table
can be configured using the AccumuloOutputFormat, in which case the output table
name does not have to be passed to the Context object within the Reducer.

The Text object passed as the output should contain the name of the table to which
this mutation should be applied. The Text can be null in which case the mutation
will be applied to the default table name specified in the AccumuloOutputFormat
options.
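The routing rule for the Text key can be sketched with plain strings standing in for Hadoop's Text and Accumulo's Mutation. The class is hypothetical, and rejecting a null table name when no default is configured is this sketch's own choice, not documented Accumulo behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of routing (table name, mutation) output pairs; strings stand in for Text and Mutation.
class OutputRoutingSketch {
    private final String defaultTable; // may be null when no default is configured
    private final Map<String, List<String>> written = new LinkedHashMap<>();

    OutputRoutingSketch(String defaultTable) {
        this.defaultTable = defaultTable;
    }

    // A null table name means "use the default table".
    void write(String tableName, String mutation) {
        String target = (tableName != null) ? tableName : defaultTable;
        if (target == null) {
            throw new IllegalArgumentException("no table name given and no default table configured");
        }
        written.computeIfAbsent(target, t -> new ArrayList<>()).add(mutation);
    }

    Map<String, List<String>> written() {
        return written;
    }

    public static void main(String[] args) {
        OutputRoutingSketch out = new OutputRoutingSketch("counts");
        out.write(null, "m1");    // goes to the default table
        out.write("audit", "m2"); // explicit table name
        out.write(null, "m3");
        System.out.println(out.written()); // {counts=[m1, m3], audit=[m2]}
    }
}
```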

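When reading from several tables in one job with AccumuloMultiTableInputFormat, each table is
configured through its own InputTableConfig. To restrict the columns read from each table:

// InputTableConfig is in org.apache.accumulo.core.client.mapreduce; Pair is in org.apache.accumulo.core.util
InputTableConfig tableOneConfig = new InputTableConfig();
InputTableConfig tableTwoConfig = new InputTableConfig();
ArrayList<Pair<Text,Text>> tableOneColumns = new ArrayList<Pair<Text,Text>>();
ArrayList<Pair<Text,Text>> tableTwoColumns = new ArrayList<Pair<Text,Text>>();
// populate the lists of (column family, column qualifier) pairs for each of the tables ...
tableOneConfig.fetchColumns(tableOneColumns);
tableTwoConfig.fetchColumns(tableTwoColumns);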

To set scan iterators:

List<IteratorSetting> tableOneIterators = new ArrayList<IteratorSetting>();
List<IteratorSetting> tableTwoIterators = new ArrayList<IteratorSetting>();
// populate the lists of iterator settings for each of the tables ...
tableOneConfig.setIterators(tableOneIterators);
tableTwoConfig.setIterators(tableTwoIterators);

An example of using MapReduce with Accumulo can be found at
accumulo/docs/examples/README.mapred.

10.2. Combiners

Many applications can benefit from the ability to aggregate values across common
keys. This can be done via Combiner iterators and is similar to the Reduce step in
MapReduce. This provides the ability to define online, incrementally updated
analytics without the overhead or latency associated with batch-oriented
MapReduce jobs.

All that is needed to aggregate values of a table is to identify the fields over which
values will be grouped, insert mutations with those fields as the key, and configure
the table with a combining iterator that supports the summarizing operation
desired.

The only restriction on a combining iterator is that the combiner developer
should not assume that all values for a given key have been seen, since new
mutations can be inserted at any time. This precludes using the total number of
values in the aggregation, as when calculating an average, for example.
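The restriction can be seen with a small arithmetic sketch. Because any subset of a key's values may be combined first and the partial results combined again later, summing partial sums gives the right total, whereas averaging partial averages does not. Storing a (sum, count) pair and dividing only at read time is one common workaround, assumed here rather than taken from Accumulo; the class and method names are hypothetical.

```java
// Sketch: a combiner may see any subset of a key's values, and its partial results may be
// combined again later, so the operation must give the same answer however values are grouped.
class CombinerSketch {
    static double sum(double... values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Unsafe as a combiner: it depends on how many values were seen in this pass.
    static double mean(double... values) {
        return sum(values) / values.length;
    }

    // Safer alternative (an assumption, not an Accumulo class): combine {sum, count} pairs by
    // addition and divide only when the result is read.
    static double[] sumAndCount(double[]... partials) {
        double[] out = {0, 0};
        for (double[] p : partials) {
            out[0] += p[0];
            out[1] += p[1];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] batch1 = {2, 4}; // combined in one pass
        double[] batch2 = {9};    // combined in a later pass
        double total = sum(sum(batch1), sum(batch2));        // 15.0, same as sum(2, 4, 9)
        double wrongMean = mean(mean(batch1), mean(batch2)); // (3.0 + 9.0) / 2 = 6.0
        double[] pair = sumAndCount(new double[] {sum(batch1), batch1.length},
                                    new double[] {sum(batch2), batch2.length});
        double rightMean = pair[0] / pair[1];                // 15.0 / 3 = 5.0
        System.out.println(total + " " + wrongMean + " " + rightMean); // 15.0 6.0 5.0
    }
}
```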

10.2.1. Feature Vectors

An interesting use of combining iterators within an Accumulo table is to store
feature vectors for use in machine learning algorithms. For example, many
algorithms such as k-means clustering, support vector machines, anomaly detection,
etc. use the concept of a feature vector and the calculation of distance metrics to
learn a particular model. The columns in an Accumulo table can be used to efficiently
store sparse features and their weights, which can be incrementally updated via a
combining iterator.
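A sketch of the idea follows, with a Java map standing in for one row whose column qualifiers are feature names. Weight increments are added to existing weights, as a summing combining iterator would do, and a Euclidean distance treats absent features as zero. The feature naming ("word:apple") and the class are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: one sparse feature vector kept like a single table row, one column per non-zero feature.
class FeatureVectorSketch {
    // Add weight increments to existing weights, as a summing combining iterator would.
    static void addUpdates(Map<String, Double> vector, Map<String, Double> updates) {
        updates.forEach((feature, delta) -> vector.merge(feature, delta, Double::sum));
    }

    // Euclidean distance between two sparse vectors; a feature absent from one side counts as 0.
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        double squared = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double d = e.getValue() - b.getOrDefault(e.getKey(), 0.0);
            squared += d * d;
        }
        for (Map.Entry<String, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                squared += e.getValue() * e.getValue();
            }
        }
        return Math.sqrt(squared);
    }

    public static void main(String[] args) {
        TreeMap<String, Double> doc = new TreeMap<>();
        doc.put("word:apple", 1.0);
        Map<String, Double> updates = new HashMap<>();
        updates.put("word:apple", 2.0);
        updates.put("word:pear", 4.0);
        addUpdates(doc, updates);
        System.out.println(doc);                            // {word:apple=3.0, word:pear=4.0}
        System.out.println(distance(doc, new HashMap<>())); // 5.0
    }
}
```

Only non-zero features are stored, so the cost of a distance computation grows with the number of stored columns rather than with the full feature space.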

10.3. Statistical Modeling

Statistical models that need to be updated by many machines in parallel could be
similarly stored within an Accumulo table. For example, a MapReduce job that is
iteratively updating a global statistical model could have each map or reduce worker
reference the parts of the model to be read and updated through an embedded
Accumulo client.

Using Accumulo this way enables efficient and fast lookups and updates of small
pieces of information in a random access pattern, which is complementary to
MapReduce’s sequential access model.

11. Security

Accumulo extends the BigTable data model to implement a security mechanism
known as cell-level security. Every key-value pair has its own security label, stored
under the column visibility element of the key, which is used to determine whether
a given user meets the security requirements to read the value. This enables data of
various security levels to be stored within the same row, and users of varying
degrees of access to query the same table, while preserving data confidentiality.

11.1. Security Label Expressions

When mutations are applied, users can specify a security label for each value. This is
done as the Mutation is created, by passing a ColumnVisibility object to the put()
method along with the column family, column qualifier, and value.

11.2. Security Label Expression Syntax

Security labels consist of a set of user-defined tokens that are required to read the
value the label is associated with. The set of tokens required can be specified using
syntax that supports logical AND (&) and OR (|) combinations of terms, as
well as nesting groups of terms together with parentheses ().

Each term consists of one or more alphanumeric characters, hyphens, underscores, or
periods. Optionally, each term may be wrapped in quotation marks,
which removes the restriction on valid characters. In quoted terms, quotation marks
and backslash characters can be used as characters in the term by escaping them
with a backslash.

For example, suppose within our organization we want to label our data values with
security labels defined in terms of user roles. We might have tokens such as:

admin
audit
system

These can be specified alone or combined using logical operators:

// Users must have admin privileges
admin
// Users must have admin and audit privileges
admin&audit
// Users with either admin or audit privileges
admin|audit
// Users must have audit and one or both of admin or system
(admin|system)&audit

When both | and & operators are used, parentheses must be used to specify
precedence of the operators.
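To make the syntax rules concrete, the following is a simplified evaluator written for illustration; it is not Accumulo's ColumnVisibility or VisibilityEvaluator code. It checks a label against a set of authorization tokens, supports quoted terms with \" and \\ escapes, and rejects labels that mix & and | without parentheses. Treating an empty label as readable by everyone is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified evaluator for labels such as "(admin|system)&audit", written for illustration;
// it is not Accumulo's ColumnVisibility or VisibilityEvaluator implementation.
class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    // True if the authorizations satisfy the label; malformed labels throw IllegalArgumentException.
    static boolean canRead(String label, Set<String> auths) {
        if (label.isEmpty()) return true; // sketch assumption: an empty label places no restriction
        VisibilitySketch p = new VisibilitySketch(label, auths);
        boolean result = p.expression();
        if (p.pos != label.length()) throw p.error("unexpected character");
        return result;
    }

    // expression := operand ( ('&' operand)* | ('|' operand)* )
    private boolean expression() {
        boolean value = operand();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char c = expr.charAt(pos++);
            if (op != 0 && c != op) throw error("mixing & and | requires parentheses");
            op = c;
            boolean rhs = operand(); // parse before combining so later syntax errors are not skipped
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // operand := '(' expression ')' | term | quoted term
    private boolean operand() {
        if (peek() == '(') {
            pos++;
            boolean v = expression();
            if (peek() != ')') throw error("missing )");
            pos++;
            return v;
        }
        if (peek() == '"') return auths.contains(quotedTerm());
        int start = pos;
        while (pos < expr.length() && isTermChar(expr.charAt(pos))) pos++;
        if (start == pos) throw error("missing term");
        return auths.contains(expr.substring(start, pos));
    }

    // Inside quotes any character is allowed; \" and \\ escape a quote or a backslash.
    private String quotedTerm() {
        StringBuilder sb = new StringBuilder();
        pos++; // skip the opening quote
        while (pos < expr.length() && expr.charAt(pos) != '"') {
            char c = expr.charAt(pos++);
            if (c == '\\') {
                if (peek() != '"' && peek() != '\\') throw error("invalid escape");
                c = expr.charAt(pos++);
            }
            sb.append(c);
        }
        if (peek() != '"') throw error("unterminated quote");
        pos++; // skip the closing quote
        return sb.toString();
    }

    // Next character, or 0 at the end of the label.
    private char peek() {
        return pos < expr.length() ? expr.charAt(pos) : 0;
    }

    private static boolean isTermChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '_' || c == '.';
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos + " in " + expr);
    }

    public static void main(String[] args) {
        Set<String> auths = new HashSet<>(Arrays.asList("audit", "system"));
        System.out.println(canRead("admin", auths));                // false
        System.out.println(canRead("admin|audit", auths));          // true
        System.out.println(canRead("(admin|system)&audit", auths)); // true
    }
}
```

Rejecting mixed operators at parse time, rather than picking a default precedence, means a label can never be read differently from how its author intended.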

11.3. Authorization

When clients attempt to read data from Accumulo, any security labels present are
examined against the set of authorizations passed by the client code when the
Scanner or BatchScanner is created. If the authorizations are insufficient to
satisfy the security label, the value is suppressed from the set of
results sent back to the client.

Authorizations are specified as a comma-separated list of tokens the user possesses.

11.4. User Authorizations

Each Accumulo user has a set of associated security labels. To manipulate
these in the shell while using the default authorizor, use the setauths and getauths commands.
These may also be modified for the default authorizor using the Java security operations API.

When a user creates a scanner, a set of Authorizations is passed. If the
authorizations passed to the scanner are not a subset of the user's
authorizations, an exception is thrown.
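Both checks can be sketched in a few lines of plain Java. To keep the example short, labels are limited to AND-only expressions such as admin&audit (an assumption; real labels also allow |, quoting, and parentheses), and the sketch throws java.lang.SecurityException where Accumulo would throw its own exception type. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the scanner-creation check and the per-cell visibility filter. Labels are limited to
// AND-only expressions such as "admin&audit" (an empty label means no restriction).
class ScanAuthSketch {
    // The requested scan authorizations must be a subset of the user's authorizations.
    static Set<String> openScanner(Set<String> userAuths, Set<String> requested) {
        if (!userAuths.containsAll(requested)) {
            throw new SecurityException("requested authorizations exceed the user's authorizations");
        }
        return requested;
    }

    // Return the names of cells whose labels are satisfied; the rest are silently suppressed.
    static List<String> scan(Map<String, String> labelsByCell, Set<String> scanAuths) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, String> cell : labelsByCell.entrySet()) {
            String label = cell.getValue();
            if (label.isEmpty() || scanAuths.containsAll(Arrays.asList(label.split("&")))) {
                visible.add(cell.getKey());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Set<String> userAuths = new HashSet<>(Arrays.asList("admin", "audit"));
        Set<String> scanAuths = openScanner(userAuths, new HashSet<>(Arrays.asList("admin")));
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("row1:name", "");
        cells.put("row1:salary", "admin");
        cells.put("row1:log", "admin&audit");
        System.out.println(scan(cells, scanAuths)); // [row1:name, row1:salary]
    }
}
```

Note that row1:log is hidden even though the user holds audit, because the scan was opened with only admin; visibility is decided by the authorizations passed to the scanner.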

To prevent users from writing data they cannot read, add the visibility
constraint to a table. Use the -evc option of the createtable shell command to
enable this constraint on a new table. For an existing table, enable it by using the
shell's config command to set a table.constraint.N property to
org.apache.accumulo.core.security.VisibilityConstraint, choosing a constraint number N
that does not conflict with any existing constraints.

Any user with the alter table permission can add or remove this constraint.
This constraint is not applied to bulk-imported data; if this is a concern,
disable the bulk import permission.

11.5. Pluggable Security

New in Accumulo 1.5 is a pluggable security mechanism. It can be broken into three actions: authentication, authorization, and permission handling. By default, all of these are handled in
ZooKeeper, which is how things were handled in Accumulo 1.4 and before. Note that this
is a new feature in 1.5 and may be adjusted in future releases without the standard
deprecation cycle.

Authentication handles the ability of a user to verify their identity. A combination of
principal and authentication token is used to verify that a user is who they say they are. An
authentication token can be constructed directly through its constructor, but it is
advised to use the init(Properties) method to populate it. Users are expected to
know which token is appropriate for their system. The default token is
PasswordToken.

Once a user is authenticated by the Authenticator, the user has access to the other actions within
Accumulo. All actions in Accumulo are ACLed, and this ACL check is handled by the Permission
Handler, which manages all of the permissions, divided into system-level and table-level
permissions. If a user is performing an action that requires authorizations, the Authorizor is
then queried to determine which authorizations the user has.

This setup allows different mechanisms to handle different aspects of
Accumulo’s security. For example, a system like Kerberos can be used for authentication, a system
like LDAP could be used to determine whether a user has a specific permission, and the
default ZookeeperAuthorizor could still determine which Authorizations a user is ultimately allowed to use.
Because this is a pluggable system, custom components can be created as needed.
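The composition described above can be sketched with three hypothetical interfaces. Their names and signatures are invented for illustration and are not Accumulo's actual security interfaces; lambdas stand in for back ends such as Kerberos, LDAP, or ZooKeeper, and the "READ:" action string is likewise an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical interfaces for the three pluggable roles; illustrative only, not Accumulo's SPI.
class PluggableSecuritySketch {
    interface SketchAuthenticator { boolean authenticate(String principal, String token); }
    interface SketchPermissionHandler { boolean hasPermission(String principal, String action); }
    interface SketchAuthorizor { Set<String> authorizations(String principal); }

    private final SketchAuthenticator authenticator;
    private final SketchPermissionHandler permissions;
    private final SketchAuthorizor authorizor;

    PluggableSecuritySketch(SketchAuthenticator authenticator, SketchPermissionHandler permissions,
                            SketchAuthorizor authorizor) {
        this.authenticator = authenticator;
        this.permissions = permissions;
        this.authorizor = authorizor;
    }

    // Order described above: authenticate, check the ACL, then look up authorizations.
    Set<String> authorizeScan(String principal, String token, String table) {
        if (!authenticator.authenticate(principal, token)) {
            throw new SecurityException("authentication failed for " + principal);
        }
        if (!permissions.hasPermission(principal, "READ:" + table)) {
            throw new SecurityException(principal + " may not read " + table);
        }
        return authorizor.authorizations(principal);
    }

    public static void main(String[] args) {
        // In-memory lambdas stand in for, e.g., Kerberos, LDAP and ZooKeeper back ends.
        PluggableSecuritySketch security = new PluggableSecuritySketch(
                (user, token) -> "alice".equals(user) && "secret".equals(token),
                (user, action) -> action.equals("READ:my_table"),
                user -> new HashSet<>(Arrays.asList("admin")));
        System.out.println(security.authorizeScan("alice", "secret", "my_table")); // [admin]
    }
}
```

Because each role sits behind its own interface, any one back end can be swapped without touching the other two.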

11.6. Secure Authorizations Handling

For applications serving many users, it is not expected that an Accumulo user
will be created for each application user. Instead, an Accumulo user with
all the authorizations needed by any of the application's users must be created. To
service queries, the application should create a scanner with the application
user’s authorizations. These authorizations could be obtained from a trusted third
party.

Often production systems will integrate with Public-Key Infrastructure (PKI) and
designate client code within the query layer to negotiate with PKI servers in order
to authenticate users and retrieve their authorization tokens (credentials). This
requires users to specify only the information necessary to authenticate themselves
to the system. Once user identity is established, their credentials can be accessed by
the client code and passed to Accumulo outside of the reach of the user.

11.7. Query Services Layer

Since the primary method of interaction with Accumulo is through the Java API,
production environments often call for the implementation of a query layer. This
can be done using web services in containers such as Apache Tomcat, but that is not a
requirement. The Query Services Layer provides a platform on which user-facing
applications can be built. This allows application designers to isolate potentially
complex query logic, and provides a convenient point at which to perform essential
security functions.

Several production environments choose to implement authentication at this layer,
where user identifiers are used to retrieve access credentials, which are then
cached within the query layer and presented to Accumulo through the
Authorizations mechanism.

12. Replication

12.1. Overview

Replication is a feature of Accumulo which provides a mechanism to automatically
copy data to other systems, typically for the purpose of disaster recovery,
high availability, or geographic locality. It is best to consider this feature
as a framework for automatic replication rather than only the ability to copy data
from one Accumulo instance to another, since copying to another Accumulo cluster is
only an implementation detail. The local Accumulo cluster is hereafter referred
to as the primary, while systems being replicated to are known as
peers.

With this replication framework, two Accumulo instances, where one instance
replicates to the other, are eventually consistent with one another, even though
each individual Accumulo instance remains strongly consistent. That
is to say, a scan of a table on a peer that has pending replication
from the primary will not wait for that data to be replicated before running.
This is desirable for a number of reasons, the most important being that the replication
framework is not limited by network outages or offline peers, but only by the HDFS
space available on the primary system.

Replication configurations can be considered as a directed graph, which may contain cycles.
Each Mutation records the systems from which it has already been replicated, which
allows each system to determine whether a peer already has the data that
the system wants to send.
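Loop prevention can be sketched as follows, assuming each mutation carries a set of system names it has been replicated from (the class and field names here are hypothetical). In a cycle, a mutation written on system A is sent to B, but B does not send it back to A.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of loop prevention in a replication graph that contains cycles. The class and field
// names are hypothetical; each mutation carries the systems it was replicated from.
class ReplicationCycleSketch {
    static final class Mutation {
        final String row;
        final Set<String> replicationSources = new HashSet<>();

        Mutation(String row) {
            this.row = row;
        }
    }

    // Send only to peers that are not already recorded as sources of this mutation.
    static List<String> peersToSend(Mutation m, List<String> peers) {
        List<String> targets = new ArrayList<>();
        for (String peer : peers) {
            if (!m.replicationSources.contains(peer)) {
                targets.add(peer);
            }
        }
        return targets;
    }

    // On receipt, the peer records where the mutation came from before applying it locally.
    static void receive(String fromSystem, Mutation m) {
        m.replicationSources.add(fromSystem);
    }

    public static void main(String[] args) {
        Mutation m = new Mutation("row1");                           // written on system A
        System.out.println(peersToSend(m, Arrays.asList("B")));      // [B]
        receive("A", m);                                             // B applies it, noting A
        System.out.println(peersToSend(m, Arrays.asList("A", "C"))); // [C]
    }
}
```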

Data is replicated by using the Write-Ahead Logs (WALs) that each TabletServer is
already maintaining. TabletServers record in the accumulo.metadata table which WALs
have data that needs to be replicated. The Master uses these records,
combined with the local Accumulo table that each WAL was used with, to create records
in the replication table which track which peers the given WAL should be
replicated to. The Master later uses these work entries to assign the actual
replication task to a local TabletServer using ZooKeeper. A TabletServer obtains
a lock in ZooKeeper for the replication of this file to a peer, and proceeds to
replicate to the peer, recording progress in the replication table as
data is successfully replicated on the peer. Later, after replication to all peers
is complete, the Master and Garbage Collector remove the records from the
accumulo.metadata and replication tables and the files from HDFS, respectively.

12.2. Configuration

Configuration of Accumulo to replicate data to another system can be categorized
into the following sections.

12.2.1. Site Configuration

Each system involved in replication (even the primary) needs a name that uniquely
identifies it across all peers in the replication graph. This should be considered
fixed for an instance, and set in accumulo-site.xml.

<property>
<name>replication.name</name>
<value>primary</value>
<description>Unique name for this system used by replication</description>
</property>

12.2.2. Instance Configuration

For each peer of this system, Accumulo needs to know the name of that peer,
the class used to replicate data to that system, and some configuration information
to connect to this remote peer. In the case of Accumulo, this additional data
is the Accumulo instance name and ZooKeeper quorum; however, this varies with the
replication implementation for the peer.

These can be set in the site configuration to ease deployments; however, as they may
change, it can be useful to set this information using the Accumulo shell.

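To configure a peer with the name peer1, which is an Accumulo system with an instance name of accumulo_peer
and a ZooKeeper quorum of 10.0.0.1,10.0.2.1,10.0.3.1, set the replication.peer.peer1 property
in the shell. Its value names the ReplicaSystem implementation to use (here, AccumuloReplicaSystem),
followed by the peer's instance name and ZooKeeper quorum.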

Since this is an Accumulo system, we also want to set a username and password
to use when authenticating with this peer. On our peer, we create a special user, "replication",
with a password of "password", which has permission to write to the tables we want to replicate
data into. We then need to record these credentials in the primary’s configuration, under the
replication.peer.user.peer1 and replication.peer.password.peer1 properties.

Alternatively, when configuring replication on Accumulo running Kerberos, a keytab
file per peer can be configured instead of a password. The provided keytabs must be readable
by the Unix user running Accumulo. The keytab for a peer can be different from the
keytab used by Accumulo or any keytabs for other peers.

12.2.3. Table Configuration

Now that we have a peer defined, we just need to configure which tables will
replicate to that peer. We also need to configure an identifier to determine where
this data will be replicated on the peer. Since we’re replicating to another Accumulo
cluster, this is a table ID. In this example, we want to enable replication on
my_table and configure our peer, peer1 (the accumulo_peer instance), as a target,
sending the data to the table with an ID of 2 in accumulo_peer.

To replicate a single table on the primary to multiple peers, configure a replication
target on the table once for each peer and remote identifier pair.

12.3. Monitoring

Basic information about replication status from a primary can be found on the Accumulo
Monitor server, using the Replication link in the sidebar.

On this page, information is broken down into the following sections:

Files pending replication by peer and target

Files queued for replication, with progress made

12.4. Work Assignment

Depending on the schema of a table, different implementations of the WorkAssigner can
be configured. The implementation is selected by setting the property replication.work.assigner
to the implementation's full class name. This can be configured via the shell or in
accumulo-site.xml.

<property>
<name>replication.work.assigner</name>
<value>org.apache.accumulo.master.replication.SequentialWorkAssigner</value>
<description>Implementation used to assign work for replication</description>
</property>

Two implementations are provided. By default, the SequentialWorkAssigner is configured for an
instance. The SequentialWorkAssigner ensures that, for each peer and remote identifier, WALs are
replicated in the order in which they were created. This is sufficient to ensure that updates to a table
will be replayed in the correct order on the peer. The downside of this implementation is that it replicates
only a single WAL at a time.

The second implementation, the UnorderedWorkAssigner, can be used to overcome the limitation
of only a single WAL being replicated to a target and peer at any time. Depending on the table schema,
multiple versions of the same Key with different values may be infrequent or nonexistent.
In this case, parallel replication to a peer and target is possible without any downsides. Where
this implementation is used on tables with frequent column updates, however, the primary and the
peer may become inconsistent.
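The difference between the two policies can be sketched for a single peer and remote identifier. This models the described behavior, not the Accumulo classes: the class, the method names, and the maxParallel parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch contrasting the two assignment policies for one (peer, remote identifier) pair.
// "pending" lists unassigned WALs in creation order; "inFlight" counts WALs being replicated now.
class WorkAssignerSketch {
    // Sequential: assign only the oldest pending WAL, and only when nothing is in flight.
    static List<String> assignSequential(List<String> pending, int inFlight) {
        List<String> assigned = new ArrayList<>();
        if (inFlight == 0 && !pending.isEmpty()) {
            assigned.add(pending.get(0));
        }
        return assigned;
    }

    // Unordered: assign any pending WALs, up to a parallelism limit.
    static List<String> assignUnordered(List<String> pending, int inFlight, int maxParallel) {
        List<String> assigned = new ArrayList<>();
        for (String wal : pending) {
            if (inFlight + assigned.size() >= maxParallel) {
                break;
            }
            assigned.add(wal);
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("wal-1", "wal-2", "wal-3"); // creation order
        System.out.println(assignSequential(pending, 0));   // [wal-1]
        System.out.println(assignSequential(pending, 1));   // []
        System.out.println(assignUnordered(pending, 1, 3)); // [wal-1, wal-2]
    }
}
```

The sequential policy trades throughput for ordering: a newer WAL can never be applied on the peer before an older one.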

12.5. ReplicaSystems

ReplicaSystem is the interface which allows abstraction of replication of data
to peers of various types. Presently, only an AccumuloReplicaSystem is provided
which will replicate data to another Accumulo instance. A ReplicaSystem implementation
is run inside of the TabletServer process, and can be configured as mentioned in the
Instance Configuration section of this document. Theoretically, an implementation
of this interface could send data to other filesystems, databases, etc.

12.5.1. AccumuloReplicaSystem

The AccumuloReplicaSystem uses Thrift to communicate with a peer Accumulo instance
and replicate the necessary data. The TabletServer running on the primary will communicate
with the Master on the peer to request the address of a TabletServer on the peer which
this TabletServer will use to replicate the data.

The TabletServer on the primary will then replicate data in batches of a configurable
size (replication.max.unit.size). The TabletServer on the peer will report how many
records were applied back to the primary, which will be used to record how many records
were successfully replicated. The TabletServer on the primary will continue to replicate
data in these batches until no more data can be read from the file.
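A sketch of the batching loop follows, with record sizes in bytes standing in for WAL entries and a size limit playing the role of replication.max.unit.size. The count of applied records is simulated locally rather than returned by a peer, and sending an oversized record in a batch of its own is this sketch's choice; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of replicating a file's records in size-limited batches while tracking progress.
class ReplicationBatchSketch {
    // Split record sizes into consecutive batches of at most maxBatchBytes each.
    // A single record larger than the limit is sent in a batch of its own rather than dropped.
    static List<List<Integer>> batches(List<Integer> recordSizes, long maxBatchBytes) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int size : recordSizes) {
            if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
                out.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            out.add(current);
        }
        return out;
    }

    // Replicate every batch, advancing progress by the number of records the peer applied.
    static long replicate(List<Integer> recordSizes, long maxBatchBytes) {
        long replicated = 0;
        for (List<Integer> batch : batches(recordSizes, maxBatchBytes)) {
            int applied = batch.size(); // stand-in for the count reported back by the peer
            replicated += applied;
        }
        return replicated;
    }

    public static void main(String[] args) {
        List<Integer> sizes = Arrays.asList(40, 30, 50, 10, 200);
        System.out.println(batches(sizes, 100));   // [[40, 30], [50, 10], [200]]
        System.out.println(replicate(sizes, 100)); // 5
    }
}
```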

12.6. Other Configuration

There are a number of configuration values that can be used to control how
the various components operate.

Property: Description (Default)

replication.max.work.queue: Maximum number of files queued for replication at one time (default: 1000)
replication.work.assignment.sleep: Time between invocations of the WorkAssigner (default: 30s)
replication.worker.threads: Size of threadpool used to replicate data to peers (default: 4)
replication.receipt.service.port: Thrift service port to listen for replication requests; can use 0 for a random port (default: 10002)
replication.work.attempts: Number of attempts to replicate to a peer before aborting the attempt (default: 10)
replication.receiver.min.threads: Minimum number of idle threads for handling incoming replication (default: 1)
replication.receiver.threadcheck.time: Time between attempting adjustments of thread pool for incoming replications (default: 30s)
replication.max.unit.size: Maximum amount of data to be replicated in one RPC (default: 64M)
replication.work.assigner: Work Assigner implementation (default: org.apache.accumulo.master.replication.SequentialWorkAssigner)
tserver.replication.batchwriter.replayer.memory: Size of BatchWriter cache to use in applying replication requests (default: 50M)

12.7. Example Practical Configuration

A real-life example is now provided to give a concrete application of replication configuration. This
example is a two-instance Accumulo system, with one primary system and one peer system, called
primary and peer, respectively. Each system also has a table of the same name, "my_table". The instance
name for each is also the same as the system name (primary and peer), and each has its ZooKeeper host on a
node with that hostname as well (primary:2181 and peer:2181).

We want to configure these systems so that "my_table" on "primary" replicates to "my_table" on "peer".

12.7.1. conf/accumulo-site.xml

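In each instance's accumulo-site.xml, assign the "unique" name that identifies that Accumulo instance among all
others that might participate in replication together, using the replication.name property. In this example, we
use the names provided in the description: replication.name is primary on the primary instance and peer on the
peer instance.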

12.7.2. conf/masters and conf/slaves

Be sure to use non-local IP addresses in these files. Other nodes need to connect to each listed node, and using
localhost will likely result in a node talking to itself instead of the intended remote node.

12.7.3. Start both instances

The rest of the configuration is dynamic and is better configured on the fly (in ZooKeeper) than in accumulo-site.xml.

12.7.4. Peer

The next series of commands is to be run on the peer system. Create a user account called "peer" for the
primary instance to use. The password for this account will need to be saved in the configuration on the primary.

Remember the table ID for my_table. You’ll need it to configure the primary instance.

12.7.5. Primary

Next, configure the primary instance.

Set up the table

root@primary> createtable my_table

Define the Peer as a replication peer to the Primary

We’re defining the instance with replication.name of peer as a peer. We provide the implementation of ReplicaSystem
that we want to use, and the configuration for the AccumuloReplicaSystem. In this case, the configuration is the Accumulo
Instance name for peer and the ZooKeeper quorum string. The configuration key is of the form
"replication.peer.$peer_name".

Set the authentication credentials

We want to use that special username and password that we created on the peer, so we have a means to write data to
the table that we want to replicate to. The configuration key is of the form "replication.peer.user.$peer_name".

Enable replication on the table

Now that we have defined the peer on the primary and provided the authentication credentials, we need to configure
our table with the implementation of ReplicaSystem we want to use to replicate to the peer. In this case, our peer
is an Accumulo instance, so we want to use the AccumuloReplicaSystem.

The configuration for the AccumuloReplicaSystem is the table ID for the table on the peer instance that we
want to replicate into. Be sure to use the correct value for $peer_table_id. The configuration key is of
the form "table.replication.target.$peer_name".
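The primary-side settings above can be collected in one place. The Java sketch below only builds strings and does not contact Accumulo. It renders each setting as an Accumulo shell config command, using config -s for system properties and config -t <table> -s for table properties. Three details are assumptions that do not appear in this section, so verify them against your Accumulo version: the fully-qualified AccumuloReplicaSystem class name, the replication.peer.password.$peer_name key, and the table.replication flag. $peer_password and $peer_table_id are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: gather the primary-side replication settings from this section and render them
// as Accumulo shell "config" commands. Assumptions not stated in this section: the
// fully-qualified AccumuloReplicaSystem class name, the replication.peer.password.$peer_name
// key, and the table.replication flag that turns replication on for the table.
public class PrimaryReplicationConfig {

    static final String REPLICA_SYSTEM =
        "org.apache.accumulo.tserver.replication.AccumuloReplicaSystem"; // assumed class name

    /** System-wide settings: the peer definition and the credentials used to write to it. */
    static Map<String, String> systemProperties(String peerName, String peerInstance,
            String peerZooKeepers, String peerUser, String peerPassword) {
        Map<String, String> props = new LinkedHashMap<>();
        // replication.peer.$peer_name = <ReplicaSystem class>,<peer instance name>,<peer ZooKeeper quorum>
        props.put("replication.peer." + peerName,
                REPLICA_SYSTEM + "," + peerInstance + "," + peerZooKeepers);
        props.put("replication.peer.user." + peerName, peerUser);
        props.put("replication.peer.password." + peerName, peerPassword); // assumed key
        return props;
    }

    /** Per-table settings: enable replication and name the target table ID on the peer. */
    static Map<String, String> tableProperties(String peerName, String peerTableId) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("table.replication", "true");                           // assumed key
        props.put("table.replication.target." + peerName, peerTableId);
        return props;
    }

    /** Renders system properties with "config -s" and table properties with "config -t <table> -s". */
    static List<String> shellCommands(String table, Map<String, String> system,
            Map<String, String> perTable) {
        List<String> commands = new ArrayList<>();
        system.forEach((k, v) -> commands.add("config -s " + k + "=" + v));
        perTable.forEach((k, v) -> commands.add("config -t " + table + " -s " + k + "=" + v));
        return commands;
    }

    public static void main(String[] args) {
        // $peer_password and $peer_table_id are placeholders for your own values.
        Map<String, String> system =
            systemProperties("peer", "peer", "peer:2181", "peer", "$peer_password");
        Map<String, String> perTable = tableProperties("peer", "$peer_table_id");
        for (String command : shellCommands("my_table", system, perTable)) {
            System.out.println("root@primary> " + command);
        }
    }
}
```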

12.8. Extra considerations for use

While this feature is intended for general-purpose use, its implementation does carry some baggage. Like any software,
replication is a feature that operates well within some set of use cases but is not meant to support all use cases.
For the benefit of the users, we can enumerate these cases.

12.8.1. Latency

As previously mentioned, the replication feature uses the Write-Ahead Log files for a number of reasons, one of which
is to avoid requiring data to be written to RFiles before it is available to be replicated. While this can help
reduce the latency for a batch of Mutations that have been written to Accumulo, replication latency is still at least
seconds to tens of seconds once ingest is active. For a table on which replication has just been enabled, it is likely
to take a few minutes before replication begins.

Once ingest is active and flowing into the system at a regular rate, replication should occur at a similar rate,
given sufficient computing resources. Replication aims for low latency, but it is not a replacement for custom
indexing code that can ensure near real-time referential integrity on secondary indexes.

12.8.2. Table-Configured Iterators

Accumulo Iterators tend to be a heavy hammer which can be used to solve a variety of problems. In general, it is highly
recommended that Iterators which are applied at major compaction time are both idempotent and associative, because
which set of files for a Tablet gets compacted together is non-deterministic. In practice, this means common patterns,
such as aggregation, are implemented in a manner resilient to duplication (such as using a Set instead of a List).

Due to the asynchronous nature of replication and the expectation that hardware failures and network partitions will occur,
it is generally recommended not to configure replication on a table which has non-idempotent Iterators set.
While the replication implementation can make some simple assertions to try to avoid re-replication of data, it is not
presently guaranteed that all data will be sent to a peer only once; data will be replicated at least once. Typically,
this is not a problem, as the VersioningIterator will automatically deduplicate this over-replication because the
duplicated entries have the same timestamp; however, certain Combiners may result in inaccurate aggregations.

As a concrete example, consider a table which has the SummingCombiner configured to sum all values for
multiple versions of the same Key. For some key, consider a set of numeric values that are written to a table on the
primary: [1, 2, 3]. On the primary, all of these are successfully written and thus the current value for the given key
would be 6 (1 + 2 + 3). Consider, however, that each of these updates was sent to the peer independently (because
other data was also included in the write-ahead log that needed to be replicated). The update with a value of 1 was
successfully replicated, and then we attempted to replicate the update with a value of 2, but the remote server never
responded. The primary does not know whether the update with a value of 2 was actually applied or not, so the
only recourse is to re-send the update. After we receive confirmation that the update with a value of 2 was replicated,
we will then replicate the update with a value of 3. If the peer never applied the first attempt of the update with 2,
the summation is accurate. If the update was applied but the acknowledgement was lost for some reason (system failure,
network partition), the update will be resent to the peer and applied twice, leaving the peer with 8 instead of 6.
Because addition is non-idempotent, we have created an inconsistency between the primary and peer. As such, the
SummingCombiner wouldn’t be recommended on a table being replicated.
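The arithmetic in this example can be reproduced with a small Java model. It is a simplification, not Accumulo code. One key's updates are modeled as (timestamp, value) pairs, a summing combiner as adding every delivered update, and the VersioningIterator as keeping only the newest version. The resent update with value 2 carries the same timestamp as the original, so it changes the sum but not the newest version.

```java
import java.util.List;

// Simplified model of the SummingCombiner example above (not Accumulo code). Each update to
// the key is a (timestamp, value) pair; "delivered" is everything a replica received,
// including any resends caused by a lost acknowledgement.
public class AtLeastOnceSketch {

    static final class Update {
        final long timestamp;
        final long value;
        Update(long timestamp, long value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Like a summing combiner: every delivered update counts, duplicates included. */
    static long sumAll(List<Update> delivered) {
        long sum = 0;
        for (Update u : delivered) {
            sum += u.value;
        }
        return sum;
    }

    /** Like keeping only the newest version: a resent duplicate has the same timestamp,
     *  so it cannot change which value is newest. Assumes at least one update. */
    static long newestValue(List<Update> delivered) {
        Update newest = delivered.get(0);
        for (Update u : delivered) {
            if (u.timestamp > newest.timestamp) {
                newest = u;
            }
        }
        return newest.value;
    }

    public static void main(String[] args) {
        List<Update> primary = List.of(new Update(1, 1), new Update(2, 2), new Update(3, 3));
        // The update with value 2 was applied on the peer, its ack was lost, and it was resent.
        List<Update> peer = List.of(new Update(1, 1), new Update(2, 2), new Update(2, 2), new Update(3, 3));
        System.out.println(sumAll(primary) + " vs " + sumAll(peer));           // 6 vs 8
        System.out.println(newestValue(primary) + " vs " + newestValue(peer)); // 3 vs 3
    }
}
```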

While there are changes that could be made to the replication implementation to attempt to mitigate this risk,
presently it is not recommended to configure non-idempotent Iterators or Combiners on a replicated table when
inaccurate aggregations are not acceptable.

12.8.3. Duplicate Keys

In Accumulo, when more than one key exists that is exactly the same (keys that are equal down to the timestamp),
the retained value is non-deterministic. Replication introduces another level of non-determinism in this case.
For a table that is being replicated and has multiple equal keys with different values inserted into it, the final
value in that table on the primary instance is not guaranteed to be the final value on all replicas.

For example, if the values value1 and value2 were inserted on the primary instance and the final value there
was value1, it is not guaranteed that all replicas will have value1 like the primary. The final value is
non-deterministic for each instance.

As is the recommendation without replication enabled, if multiple values for the same key (sans timestamp) are written to
Accumulo, it is strongly recommended that the timestamp properly reflects the version intended by
the client. That is to say, newer values inserted into the table should have larger timestamps. If the time between
writing updates to the same key is significant (on the order of minutes), this concern can likely be ignored.
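One way for a client to follow this recommendation is to assign timestamps itself instead of relying on the wall clock alone. The hypothetical Java helper below is not part of Accumulo. It returns strictly increasing values even if the clock repeats a millisecond or steps backwards. Each value would then be supplied as the explicit timestamp when writing the update, for example through a Mutation.put overload that accepts a timestamp. It only orders writes made through one helper instance; writers on different machines would still need some other form of coordination.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical client-side helper: hands out strictly increasing timestamps, so a newer write
// to the same key never reuses or falls behind an older write's timestamp, even when the wall
// clock returns the same millisecond twice or steps backwards.
public class MonotonicTimestamps {
    private final LongSupplier clock;
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    MonotonicTimestamps(LongSupplier clock) {
        this.clock = clock;
    }

    long next() {
        // updateAndGet retries on contention, so concurrent callers still receive distinct values.
        return last.updateAndGet(prev -> Math.max(clock.getAsLong(), prev + 1));
    }

    public static void main(String[] args) {
        MonotonicTimestamps ts = new MonotonicTimestamps(System::currentTimeMillis);
        long first = ts.next();
        long second = ts.next();
        System.out.println(second > first); // true
    }
}
```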

12.8.4. Bulk Imports

Currently, files that are bulk imported into a table configured for replication are not replicated. There is no
technical reason why this was not implemented; it was simply omitted from the initial implementation. This is considered a
fair limitation because bulk importing generated files into multiple locations is much simpler than bifurcating "live" ingest
data into two instances. Given some existing bulk import process which creates files and then imports them into an
Accumulo instance, it is trivial to copy those files to a new HDFS instance and import them into another Accumulo
instance using the same process. Hadoop’s distcp command provides an easy way to copy large amounts of data to another
HDFS instance, which makes the problem of duplicating bulk imports very easy to solve.

13. Implementation Details

13.1. Fault-Tolerant Executor (FATE)

Accumulo must implement a number of distributed, multi-step operations to support
the client API. Creating a new table is a simple example of an atomic client call
which requires multiple steps in the implementation: get a unique table ID, configure
default table permissions, populate information in ZooKeeper to record the table’s
existence, create directories in HDFS for the table’s data, etc. Implementing these
steps in a way that is tolerant to node failure and other concurrent operations is
very difficult to achieve. Accumulo includes a Fault-Tolerant Executor (FATE) which
is widely used server-side to implement the client API safely and correctly.

FATE is the implementation detail which ensures that tables in creation when the
Master dies will be successfully created when another Master process is started.
This alleviates the need for any external tools to correct some bad state — Accumulo can
undo the failure and self-heal without any external intervention.

13.2. Overview

FATE consists of three primary components: a repeatable, persisted operation (REPO), a storage
layer for REPOs and an execution system to run REPOs. Accumulo uses ZooKeeper as the storage
layer for FATE and the Accumulo Master acts as the execution system to run REPOs.

The important characteristic of REPOs is that they are implemented in a way that is idempotent:
every operation must be able to undo or replay a partial execution of itself. Requiring the
implementation of the operation to support this functionality greatly simplifies the execution
of these operations. This property is also what guarantees safety in light of failure conditions.
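The replay property can be illustrated with a toy Java model. Every name in it (Env, Step, run, rollback) is hypothetical, and it is not Accumulo's FATE API. A Map stands in for the ZooKeeper store, and progress is persisted after each step. A simulated Master failure happens after a step runs but before its progress is saved, so the restarted executor re-runs that step. Because each step is idempotent, the replayed transaction ends in the same state as an uninterrupted one. The undo functions can roll back an unfinished transaction in reverse order.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Toy model of FATE-style execution; every name here is hypothetical, not Accumulo's API.
// A transaction is a list of steps, progress is persisted after each step (a Map stands in
// for ZooKeeper), and each step is idempotent so a restarted executor can safely re-run
// the step that was in flight when the previous executor died.
public class FateSketch {

    /** Stand-in for the state a create-table operation touches (ZooKeeper nodes, HDFS dirs). */
    static final class Env {
        final Map<String, String> zk = new HashMap<>();
        final Set<String> dirs = new HashSet<>();
        int nextTableId = 1;
    }

    static final class Step {
        final String name;
        final BiConsumer<String, Env> call;  // must be safe to run more than once
        final BiConsumer<String, Env> undo;  // must cope with a partially completed call
        Step(String name, BiConsumer<String, Env> call, BiConsumer<String, Env> undo) {
            this.name = name;
            this.call = call;
            this.undo = undo;
        }
    }

    static final List<Step> CREATE_TABLE = List.of(
        new Step("allocate table ID",
            // computeIfAbsent keeps a replayed step from allocating a second ID
            (t, e) -> e.zk.computeIfAbsent("names/" + t, k -> String.valueOf(e.nextTableId++)),
            (t, e) -> e.zk.remove("names/" + t)),
        new Step("record table in ZooKeeper",
            (t, e) -> e.zk.put("tables/" + e.zk.get("names/" + t), t),
            (t, e) -> e.zk.remove("tables/" + e.zk.get("names/" + t))),
        new Step("create table directory",
            (t, e) -> e.dirs.add("/tables/" + e.zk.get("names/" + t)),
            (t, e) -> e.dirs.remove("/tables/" + e.zk.get("names/" + t))));

    /** Runs or resumes a transaction. crashAfterStep simulates the Master dying right after
     *  that step executes but before its progress is persisted; pass -1 for no failure. */
    static void run(String txId, String table, Env env, Map<String, Integer> store, int crashAfterStep) {
        for (int i = store.getOrDefault(txId, 0); i < CREATE_TABLE.size(); i++) {
            CREATE_TABLE.get(i).call.accept(table, env);
            if (i == crashAfterStep) {
                throw new IllegalStateException("simulated Master failure after: " + CREATE_TABLE.get(i).name);
            }
            store.put(txId, i + 1);              // persist progress
        }
        store.remove(txId);                      // finished; nothing left to resume
    }

    /** Undoes an unfinished transaction in reverse order, including the step that may
     *  have been in flight when progress was last persisted. */
    static void rollback(String txId, String table, Env env, Map<String, Integer> store) {
        int inFlight = Math.min(store.getOrDefault(txId, 0), CREATE_TABLE.size() - 1);
        for (int i = inFlight; i >= 0; i--) {
            CREATE_TABLE.get(i).undo.accept(table, env);
        }
        store.remove(txId);
    }

    public static void main(String[] args) {
        Env uninterrupted = new Env();
        run("tx1", "my_table", uninterrupted, new HashMap<>(), -1);

        Env interrupted = new Env();
        Map<String, Integer> store = new HashMap<>();
        try {
            run("tx1", "my_table", interrupted, store, 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + "; persisted progress = " + store.get("tx1"));
        }
        run("tx1", "my_table", interrupted, store, -1);   // a new Master resumes and replays step 1
        System.out.println(uninterrupted.zk.equals(interrupted.zk)
                && uninterrupted.dirs.equals(interrupted.dirs)); // true
    }
}
```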

13.3. Administration

Sometimes, it is useful to inspect the current FATE operations, both pending and executing.
For example, a command that is not completing could be blocked on the execution of another
operation. Accumulo provides an Accumulo shell command to interact with FATE.

The fate shell command accepts a number of arguments for different functionality:
list/print, fail, delete.

13.3.1. List/Print

Without any additional arguments, this command will print all operations that still exist in
the FATE store (ZooKeeper). This will include active, pending, and completed operations (completed
operations are lazily removed from the store). Each operation includes a unique "transaction ID", the
state of the operation (e.g. NEW, IN_PROGRESS, FAILED), any locks the
transaction actively holds and any locks it is waiting to acquire.

This option can also accept transaction IDs which will restrict the list of transactions shown.

13.3.2. Fail

This command can be used to manually fail a FATE transaction and requires a transaction ID
as an argument. Failing an operation is not a normal procedure and should only be performed
by an administrator who understands the implications of why they are failing the operation.

13.3.3. Delete

This command requires a transaction ID and will delete any locks that the transaction
holds. Like the fail command, this command should only be used in extreme circumstances
by an administrator that understands the implications of the command they are about to
invoke. It is not normal to invoke this command.

14. SSL

Accumulo, through Thrift’s TSSLTransport, provides the ability to encrypt
wire communication between Accumulo servers and clients using secure
sockets layer (SSL). SSL certificates signed by the same certificate authority
control the "circle of trust" in which a secure connection can be established.
Typically, each host running Accumulo processes would be given a certificate
which identifies itself.

Clients can optionally also be given a certificate, when client-auth is enabled,
which prevents unwanted clients from accessing the system. The SSL integration
presently provides no authentication support within Accumulo (an Accumulo username
and password are still required) and is only used to establish a means for
secure communication.

14.1. Server configuration

As previously mentioned, the circle of trust is established by the certificate
authority which created the certificates in use. Because of the tight coupling
of certificate generation with an organization’s policies, Accumulo does not
provide a method in which to automatically create the necessary SSL components.

Administrators without existing infrastructure built on SSL are encouraged to
use OpenSSL and the keytool command. An example of these commands is
included in a section below. Accumulo servers require a certificate and keystore,
in the form of Java KeyStores, to enable SSL. The following configuration assumes
these files already exist.

In $ACCUMULO_CONF_DIR/accumulo-site.xml, the following properties are required:

rpc.javax.net.ssl.keyStore=The path on the local filesystem to the keystore containing the server’s certificate

rpc.javax.net.ssl.keyStorePassword=The password for the keystore containing the server’s certificate

rpc.javax.net.ssl.trustStore=The path on the local filesystem to the keystore containing the certificate authority’s public key

rpc.javax.net.ssl.trustStorePassword=The password for the keystore containing the certificate authority’s public key

instance.rpc.ssl.enabled=true

Optionally, SSL client-authentication (two-way SSL) can also be enabled by setting
instance.rpc.ssl.clientAuth=true in $ACCUMULO_CONF_DIR/accumulo-site.xml.
This requires that each client has access to a valid certificate to set up a secure connection
to the servers. By default, Accumulo uses one-way SSL which does not require clients to have
their own certificate.

14.2. Client configuration

To establish a connection to Accumulo servers, each client must also have
special configuration. This is typically accomplished through the use of
the client configuration file whose default location is ~/.accumulo/config.

The following properties must be set to connect to an Accumulo instance using SSL:

rpc.javax.net.ssl.trustStore=The path on the local filesystem to the keystore containing the certificate authority’s public key

rpc.javax.net.ssl.trustStorePassword=The password for the keystore containing the certificate authority’s public key

instance.rpc.ssl.enabled=true

If two-way SSL is enabled (instance.rpc.ssl.clientAuth=true) for the instance, the client must also define
its own certificate and enable client authentication as well.

rpc.javax.net.ssl.keyStore=The path on the local filesystem to the keystore containing the client’s certificate

rpc.javax.net.ssl.keyStorePassword=The password for the keystore containing the client’s certificate

instance.rpc.ssl.clientAuth=true
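The client-side rules above can be checked mechanically before connecting. The Java sketch below is a hypothetical pre-flight check, not part of Accumulo. It uses only the property names listed in this section. It assumes the client configuration file consists of simple key=value lines that java.util.Properties can read, for example via Properties.load. Because a client cannot see the server's setting, whether the instance requires two-way SSL is passed in explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for an SSL client configuration. It reports which of the
// properties described above are missing; it does not open any connection.
public class SslClientConfigCheck {

    static List<String> missingProperties(Properties conf, boolean serverRequiresClientAuth) {
        List<String> missing = new ArrayList<>();
        requireTrue(conf, "instance.rpc.ssl.enabled", missing);
        requireSet(conf, "rpc.javax.net.ssl.trustStore", missing);
        requireSet(conf, "rpc.javax.net.ssl.trustStorePassword", missing);
        if (serverRequiresClientAuth) {
            // Two-way SSL: the client must present its own certificate as well.
            requireTrue(conf, "instance.rpc.ssl.clientAuth", missing);
            requireSet(conf, "rpc.javax.net.ssl.keyStore", missing);
            requireSet(conf, "rpc.javax.net.ssl.keyStorePassword", missing);
        }
        return missing;
    }

    private static void requireSet(Properties conf, String key, List<String> missing) {
        String value = conf.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            missing.add(key);
        }
    }

    private static void requireTrue(Properties conf, String key, List<String> missing) {
        if (!Boolean.parseBoolean(conf.getProperty(key))) {   // parseBoolean(null) is false
            missing.add(key + "=true");
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("instance.rpc.ssl.enabled", "true");
        conf.setProperty("rpc.javax.net.ssl.trustStore", "/path/to/truststore.jks"); // placeholder
        conf.setProperty("rpc.javax.net.ssl.trustStorePassword", "changeit");        // placeholder
        System.out.println(missingProperties(conf, false)); // []
        // [instance.rpc.ssl.clientAuth=true, rpc.javax.net.ssl.keyStore, rpc.javax.net.ssl.keyStorePassword]
        System.out.println(missingProperties(conf, true));
    }
}
```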

14.3. Generating SSL material using OpenSSL

The following is included as an example for generating your own SSL material (certificate authority and server/client
certificates) using OpenSSL and Java’s KeyTool command.

The server.jks file is the Java keystore containing the certificate for a given host. The above
methods are equivalent whether the certificate is generated for an Accumulo server or a client.

15. Kerberos

15.1. Overview

Kerberos is a network authentication protocol that provides a secure way for
peers to prove their identity over an unsecure network in a client-server model.
A centralized key-distribution center (KDC) is the service that coordinates
authentication between a client and a server. Clients and servers use "tickets",
obtained from the KDC via a password or a special file called a "keytab", to
communicate with the KDC and prove their identity. A KDC administrator must
create the principal (name for the client/server identity) and the password
or keytab, securely passing the necessary information to the actual user/service.
Properly securing the KDC and generated ticket material is central to the security
model and is mentioned only as a warning to administrators running their own KDC.

To interact with Kerberos programmatically, GSSAPI and SASL are two standards
which allow cross-language integration with Kerberos for authentication. GSSAPI,
the generic security service application program interface, is a standard which
Kerberos implements. In the Java programming language, the language itself also implements
GSSAPI which is leveraged by other applications, like Apache Hadoop and Apache Thrift.
SASL, simple authentication and security layer, is a framework for authentication and
security over the network. SASL provides a number of mechanisms for authentication,
one of which is GSSAPI. Thus, SASL provides the transport which authenticates
using GSSAPI that Kerberos implements.
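To see the "framework with many mechanisms" idea concretely, the Java sketch below asks the JVM's installed security providers which SASL client mechanisms they offer, using the standard javax.security.sasl API. It does not talk to a KDC or to Accumulo. The list it prints depends on the JDK and its configuration; a standard JDK typically reports GSSAPI alongside mechanisms such as DIGEST-MD5.

```java
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClientFactory;

// Lists the SASL client mechanisms offered by the security providers of the running JVM.
public class SaslMechanisms {

    static Set<String> clientMechanisms() {
        Set<String> names = new TreeSet<>();
        Enumeration<SaslClientFactory> factories = Sasl.getSaslClientFactories();
        while (factories.hasMoreElements()) {
            // A null property map places no policy restrictions, so each factory
            // reports every mechanism it supports.
            for (String mechanism : factories.nextElement().getMechanismNames(null)) {
                names.add(mechanism);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> mechanisms = clientMechanisms();
        System.out.println(mechanisms);
        System.out.println("GSSAPI available: " + mechanisms.contains("GSSAPI"));
    }
}
```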

Kerberos is a very complicated software application and is deserving of much
more description than can be provided here. An explain like
I’m 5 blog post is very good at distilling the basics, while MIT Kerberos’s project page
contains lots of documentation for users or administrators. Various Hadoop "vendors"
also provide free documentation that includes step-by-step instructions for
configuring Hadoop and ZooKeeper (which will be henceforth considered as prerequisites).

15.2. Within Hadoop

Out of the box, HDFS and YARN have no ability to enforce that a user is who
they claim they are. Thus, any basic Hadoop installation should be treated as
unsecure: any user with access to the cluster has the ability to access any data.
Using Kerberos to provide authentication, users can be strongly identified, delegating
to Kerberos to determine who a user is and enforce that a user is who they claim to be.
As such, Kerberos is widely used across the entire Hadoop ecosystem for strong
authentication. Since server processes accessing HDFS or YARN are required
to use Kerberos to authenticate with HDFS, it makes sense that they also require
Kerberos authentication from their clients, in addition to other features provided
by SASL.

A typical deployment involves the creation of Kerberos principals for all server
processes (Hadoop datanodes and namenode(s), ZooKeepers), the creation of a keytab
file for each principal and then proper configuration for the Hadoop site xml files.
Users also need Kerberos principals created for them; however, a user typically
uses a password to identify themselves instead of a keytab. Users can obtain a
ticket granting ticket (TGT) from the KDC using their password which allows them
to authenticate for the lifetime of the TGT (typically one day by default) and alleviates
the need for further password authentication.

For client-server applications, like web servers, a keytab can be created which
allows for fully-automated Kerberos identification, removing the need to enter any
password, at the cost of needing to protect the keytab file. These principals
will apply directly to authentication for clients accessing Accumulo and the
Accumulo processes accessing HDFS.

15.3. Delegation Tokens

MapReduce, a common way that clients interact with Accumulo, does not map well to the
client-server model that Kerberos was originally designed to support. Specifically, the parallelization
of tasks across many nodes introduces the problem of securely sharing the user credentials across
these tasks in as safe a manner as possible. To address this problem, Hadoop introduced the notion
of a delegation token to be used in distributed execution settings.

A delegation token is nothing more than a short-term, on-the-fly password generated after authenticating with the user’s
credentials. In Hadoop itself, the NameNode and ResourceManager, for HDFS and YARN respectively, act as the gateway for
delegation token requests. For example, before a YARN job is submitted, the implementation will request delegation
tokens from the NameNode and ResourceManager so the YARN tasks can communicate with HDFS and YARN. In the same manner,
support has been added in the Accumulo Master to generate delegation tokens, enabling interaction with Accumulo via
MapReduce when Kerberos authentication is enabled, in a manner similar to HDFS and YARN.

Generating an expiring password is, arguably, more secure than distributing the user’s
credentials across the cluster as only access to HDFS, YARN and Accumulo would be
compromised in the case of the token being compromised as opposed to the entire
Kerberos credential. Additional details for clients and servers will be covered
in subsequent sections.

15.4. Configuring Accumulo

To configure Accumulo for use with Kerberos, both client-facing and server-facing
changes must be made for a functional system on secured Hadoop. As previously mentioned,
numerous guidelines already exist on the subject of configuring Hadoop and ZooKeeper for
use with Kerberos and won’t be covered here. It is assumed that you have functional
Hadoop and ZooKeeper already installed.

Note that on an existing cluster the server side changes will require a full cluster shutdown and restart. You should
wait to restart the TraceServers until after you’ve completed the rest of the cluster set up and provisioned
a trace user with appropriate permissions.

15.4.1. Servers

The first step is to obtain a Kerberos identity for the Accumulo server processes.
When running Accumulo with Kerberos enabled, a valid Kerberos identity will be required
to initiate any RPC between Accumulo processes (e.g. Master and TabletServer) in addition
to any HDFS action (e.g. client to HDFS or TabletServer to HDFS).

Generate Principal and Keytab

In the kadmin.local shell or using the -q option on kadmin.local, create a
principal for Accumulo for all hosts that are running Accumulo processes. A Kerberos
principal is of the form "primary/instance@REALM". "accumulo" is commonly the "primary"
(although not required) and the "instance" is the fully-qualified domain name for
the host that will be running the Accumulo process — this is required.

kadmin.local -q "addprinc -randkey accumulo/host.domain.com"

Perform the above for each node running Accumulo processes in the instance, modifying
"host.domain.com" for your network. The randkey option generates a random password
because we will use a keytab for authentication, not a password, since the Accumulo
server processes don’t have an interactive console to enter a password into.

To simplify deployments, at the cost of security, all Accumulo principals could
be globbed into a single keytab:

kadmin.local -q "xst -k accumulo.service.keytab -glob accumulo*"

To ensure that the SASL handshake can occur from clients to servers and servers to servers,
all Accumulo servers must share the same primary and realm principal components, as the
"client" needs to know these to set up the connection with the "server".

Server Configuration

A number of properties need to be changed to properly configure servers
in accumulo-site.xml.

Each entry below shows the property with the value to use, followed by its description.

general.kerberos.keytab=/etc/security/keytabs/accumulo.service.keytab
The path to the keytab for Accumulo on the local filesystem. Change the value to the actual path on your system.

general.kerberos.principal=accumulo/_HOST@REALM
The Kerberos principal for Accumulo, which needs to match the keytab. "_HOST" can be used instead of the actual hostname in the principal and will be automatically expanded to the current FQDN, which reduces the configuration file burden.

instance.rpc.sasl.enabled=true
Enables SASL for the Thrift servers (supports GSSAPI).

rpc.sasl.qop=auth
One of "auth", "auth-int", or "auth-conf". These map to the SASL-defined properties for
quality of protection. "auth" is authentication only. "auth-int" is authentication and data
integrity. "auth-conf" is authentication, data integrity and confidentiality.

instance.security.authenticator=org.apache.accumulo.server.security.handler.KerberosAuthenticator
Configures Accumulo to use the Kerberos principal as the Accumulo username/principal.

instance.security.authorizor=org.apache.accumulo.server.security.handler.KerberosAuthorizor
Configures Accumulo to use the Kerberos principal for authorization purposes.

Configures Accumulo to use the Kerberos principal for permission purposes

trace.token.type=org.apache.accumulo.core.client.security.tokens.KerberosToken
Configures the Accumulo Tracer to use the KerberosToken for authentication when serializing traces to the trace table.

trace.user=accumulo/_HOST@REALM
The tracer process needs valid credentials to serialize traces to Accumulo. While the other server processes are
creating a SystemToken from the provided keytab and principal, we can still use a normal KerberosToken and the same
keytab/principal to serialize traces. Like non-Kerberized instances, the table must be created and permissions granted
to the trace.user. The same _HOST replacement is performed on this value, substituting the FQDN for _HOST.

trace.token.property.keytab (no value by default)
You can optionally specify the path to a keytab file for the principal given in the trace.user property. If you don’t
set this path, it will default to the value given in general.kerberos.keytab.

general.delegation.token.lifetime=7d
The length of time that the server-side secret used to create delegation tokens is valid. After a server-side secret
expires, a delegation token created with that secret is no longer valid.

general.delegation.token.update.interval=1d
The frequency with which new server-side secrets should be generated to create delegation tokens for clients. Generating
new secrets reduces the likelihood of cryptographic attacks.

Although it should be a prerequisite, it is very important that you have DNS properly
configured for your nodes and that Accumulo is configured to use the FQDN. It
is extremely important to use the FQDN in each of the "hosts" files for each
Accumulo process: masters, monitors, slaves, tracers, and gc.
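A quick way to see what "_HOST" turns into on a node is to perform the same substitution by hand. The Java sketch below is illustrative only; Accumulo performs its own expansion, as described for general.kerberos.principal. It replaces _HOST with the canonical hostname that the JVM resolves for the local machine. If the result is not the expected FQDN, that points to a DNS or hosts-file problem.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative sketch of the "_HOST" substitution: the placeholder is replaced with the local
// host's fully-qualified domain name. Not Accumulo's implementation.
public class HostPrincipal {

    static String expand(String principalPattern, String fqdn) {
        return principalPattern.replace("_HOST", fqdn);
    }

    static String localFqdn() throws UnknownHostException {
        // The canonical name comes from a reverse lookup, so a broken DNS setup shows up here.
        return InetAddress.getLocalHost().getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        String pattern = args.length > 0 ? args[0] : "accumulo/_HOST@REALM";
        System.out.println(expand(pattern, localFqdn()));
    }
}
```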

Normally, no changes are needed in accumulo-env.sh to enable Kerberos. Typically, the krb5.conf
is installed on the local machine in /etc/, and the Java library implementations will look
here to find the necessary configuration to communicate with the KDC. Some installations
may require a different krb5.conf to be used for Accumulo: ACCUMULO_KRB5_CONF enables this.

ACCUMULO_KRB5_CONF can be configured to a directory containing a file named krb5.conf or
the path to the file itself. This will be provided to all Accumulo server and client processes
via the JVM system property java.security.krb5.conf. If the environment variable is not set,
java.security.krb5.conf will not be set either.
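The resolution rule for ACCUMULO_KRB5_CONF can be stated precisely in code. The Java helper below is hypothetical and only mirrors the behavior described above; Accumulo's actual handling may differ in detail. A directory resolves to the krb5.conf inside it, a file path is used as-is, and nothing is set when the variable is absent.

```java
import java.io.File;
import java.util.Map;

// Hypothetical helper mirroring the ACCUMULO_KRB5_CONF rule described above.
public class Krb5ConfResolver {

    /** Returns the value for java.security.krb5.conf, or null if it should stay unset. */
    static String resolve(Map<String, String> env) {
        String value = env.get("ACCUMULO_KRB5_CONF");
        if (value == null || value.isEmpty()) {
            return null;                   // leave the JVM's default lookup (e.g. /etc/krb5.conf)
        }
        File f = new File(value);
        // A directory means "the krb5.conf inside it"; anything else is taken as the file itself.
        return f.isDirectory() ? new File(f, "krb5.conf").getPath() : f.getPath();
    }

    public static void main(String[] args) {
        String path = resolve(System.getenv());
        if (path != null) {
            System.setProperty("java.security.krb5.conf", path);
        }
        System.out.println("java.security.krb5.conf = " + System.getProperty("java.security.krb5.conf"));
    }
}
```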

KerberosAuthenticator

The KerberosAuthenticator is an implementation of the pluggable security interfaces
that Accumulo provides. It builds on top of the default ZooKeeper-based implementation,
but removes the need to create user accounts with passwords in Accumulo for clients. As
long as a client has a valid Kerberos identity, they can connect to and interact with
Accumulo, but without any permissions (e.g. cannot create tables or write data). Leveraging
ZooKeeper removes the need to change the permission handler and authorizor, so other Accumulo
functions regarding permissions and cell-level authorizations do not change.

It is extremely important to note that, while user operations like SecurityOperations.listLocalUsers(),
SecurityOperations.dropLocalUser(), and SecurityOperations.createLocalUser() will not return
errors, these methods do not behave as they do in normal installations, as they will only operate on
users which have, at one point in time, authenticated with Accumulo using their Kerberos identity.
The KDC is still the authoritative entity for user management. The previously mentioned methods
are provided as they simplify management of users within Accumulo, especially with respect
to granting Authorizations and Permissions to new users.

Administrative User

Out of the box (without Kerberos enabled), Accumulo has a single user with administrative permissions, "root".
This user is used to "bootstrap" other users, creating less-privileged users for applications using
the system. In Kerberos, to authenticate with the system, it’s required that the client presents Kerberos
credentials for the principal (user) the client is trying to authenticate as.

Because of this, an administrative user named "root" would be useless in an instance using Kerberos,
because it is very unlikely to have Kerberos credentials for a principal named root. When Kerberos is
enabled, Accumulo will prompt for the name of a user to grant the same permissions as what the root
user would normally have. The name of the Accumulo user to grant administrative permissions to can
also be given by the -u or --user options.

If you are enabling Kerberos on an existing cluster, you will need to reinitialize the security system in
order to replace the existing "root" user with one that can be used with Kerberos. These steps should be
completed after you have done the previously described configuration changes and will require access to
a complete accumulo-site.xml, including the instance secret. Note that this process will delete all
existing users in the system; you will need to reassign user permissions based on Kerberos principals.

Ensure Accumulo is not running.

Given the path to an accumulo-site.xml with the instance secret, run the security reset tool. If you are
prompted for a password, you can just hit return, since it won’t be used.

Verifying secure access

To verify that servers have correctly started with Kerberos enabled, ensure that the processes
are actually running (they should exit immediately if login fails) and verify that you see
something similar to the following in the application log.

Impersonation

Impersonation is functionality which allows a certain user to act as another. One direct application
of this concept within Accumulo is the Thrift proxy. The Thrift proxy is configured to accept
user requests and pass them on to Accumulo, enabling client access to Accumulo via any Thrift-compatible
language. When the proxy is running with SASL transports, this enforces that clients present a valid
Kerberos identity to make a connection. In this situation, the Thrift proxy server does not have
access to the secret key material needed to make a secure connection to Accumulo as the client;
it can only connect to Accumulo as itself. Impersonation, in this context, refers to the ability
of the proxy to authenticate to Accumulo as itself, but act on behalf of an Accumulo user.

Accumulo supports basic impersonation of end-users by a third party via static rules in Accumulo’s
site configuration file. Two properties define these rules; their values are semicolon-separated lists
which are aligned by index. The first element in the user impersonation property value matches the first
element in the host impersonation property value, and so on.

For example, the rules can state that $PROXY_USER can impersonate user1 and user2 only from host1.domain.com
or host2.domain.com, while $PROXY_USER2 can impersonate user2 and user4 from any host.

In these examples, the value $PROXY_USER is the Kerberos principal of the server which is acting on behalf of a user.
Impersonation is enforced by the Kerberos principal and the host from which the RPC originated (from the perspective
of the Accumulo TabletServers/Masters). An asterisk (*) can be used to specify all users or all hosts (depending on the context).
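Because the two rule lists are aligned by index, it is easy to pair a user rule with the wrong host rule. The Java sketch below shows one way such rules can be evaluated. The value syntax it parses is an assumption, since this section does not show the property values: proxy:user1,user2 entries in the user property and comma-separated hosts in the host property, with rules separated by semicolons. The class is not Accumulo's implementation. The main method encodes the two example rules from the text.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of index-aligned impersonation rules. ASSUMED value syntax (the property names and
// exact format are not given in this section):
//   users: "proxyA:user1,user2;proxyB:user2,user4"   hosts: "host1,host2;*"
// The i-th user rule is paired with the i-th host rule; "*" matches any user or host.
public class ImpersonationRules {

    static final class Rule {
        final String proxy;
        final Set<String> users;
        final Set<String> hosts;
        Rule(String proxy, Set<String> users, Set<String> hosts) {
            this.proxy = proxy;
            this.users = users;
            this.hosts = hosts;
        }
    }

    static List<Rule> parse(String userRules, String hostRules) {
        String[] u = userRules.split(";");
        String[] h = hostRules.split(";");
        if (u.length != h.length) {
            throw new IllegalArgumentException("user and host rules must align by index");
        }
        List<Rule> rules = new ArrayList<>();
        for (int i = 0; i < u.length; i++) {
            int colon = u[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected proxy:users in " + u[i]);
            }
            rules.add(new Rule(u[i].substring(0, colon).trim(),
                    toSet(u[i].substring(colon + 1)), toSet(h[i])));
        }
        return rules;
    }

    private static Set<String> toSet(String csv) {
        Set<String> items = new HashSet<>();
        for (String item : csv.split(",")) {
            items.add(item.trim());
        }
        return items;
    }

    private static boolean matches(Set<String> allowed, String value) {
        return allowed.contains("*") || allowed.contains(value);
    }

    /** True if some rule for this proxy allows the user from the host the RPC came from. */
    static boolean canImpersonate(List<Rule> rules, String proxy, String user, String originHost) {
        for (Rule r : rules) {
            if (r.proxy.equals(proxy) && matches(r.users, user) && matches(r.hosts, originHost)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse("$PROXY_USER:user1,user2;$PROXY_USER2:user2,user4",
                                 "host1.domain.com,host2.domain.com;*");
        System.out.println(canImpersonate(rules, "$PROXY_USER", "user1", "host1.domain.com"));  // true
        System.out.println(canImpersonate(rules, "$PROXY_USER", "user1", "host3.domain.com"));  // false
        System.out.println(canImpersonate(rules, "$PROXY_USER2", "user4", "anywhere.example")); // true
    }
}
```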

Delegation Tokens

Within Accumulo services, the primary task in implementing delegation tokens is the generation and distribution
of a shared secret among all Accumulo TabletServers and the Master. The secret key allows for generation
of delegation tokens for users and verification of delegation tokens presented by clients. If a server
process is unaware of the secret key used to create a delegation token, the client cannot be authenticated.
Because the secret key is distributed through ZooKeeper asynchronously (typically on the order of seconds), the
value for general.delegation.token.update.interval should be on the order of hours to days to reduce the
likelihood of servers rejecting valid clients because they have not yet seen a new secret key.
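To illustrate why a server that has not yet received a new secret key rejects valid clients, here is a minimal sketch of shared-secret token verification. It assumes an HMAC-SHA256 construction, integer key ids and a string identifier, none of which are taken from Accumulo's actual token format; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

public class DelegationTokenSketch {
  private static final String ALGORITHM = "HmacSHA256"; // assumption for this sketch

  /** Secret keys this server has received so far, indexed by key id. */
  private final Map<Integer, SecretKey> knownKeys = new HashMap<>();

  /** Called when a new secret key reaches this server (in Accumulo, via ZooKeeper). */
  public void learnKey(int keyId, SecretKey key) {
    knownKeys.put(keyId, key);
  }

  public static SecretKey newKey() throws GeneralSecurityException {
    return KeyGenerator.getInstance(ALGORITHM).generateKey();
  }

  /** The token's password is an HMAC of its identifier under the shared secret. */
  public static byte[] password(SecretKey key, String identifier) throws GeneralSecurityException {
    Mac mac = Mac.getInstance(ALGORITHM);
    mac.init(key);
    return mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
  }

  /** Verification is only possible with the key that created the token. */
  public boolean verify(int keyId, String identifier, byte[] presented) throws GeneralSecurityException {
    SecretKey key = knownKeys.get(keyId);
    if (key == null) {
      return false; // key not propagated yet: a valid client is rejected
    }
    // Constant-time comparison of the recomputed and presented passwords
    return MessageDigest.isEqual(password(key, identifier), presented);
  }

  public static void main(String[] args) throws GeneralSecurityException {
    SecretKey key1 = newKey(); // generated centrally in this sketch
    DelegationTokenSketch tserverA = new DelegationTokenSketch();
    DelegationTokenSketch tserverB = new DelegationTokenSketch();
    tserverA.learnKey(1, key1); // tserverB has not seen key 1 yet

    String identifier = "user=alice;keyId=1"; // hypothetical identifier format
    byte[] pw = password(key1, identifier);
    System.out.println(tserverA.verify(1, identifier, pw)); // true
    System.out.println(tserverB.verify(1, identifier, pw)); // false
  }
}
```

The longer the update interval, the smaller the fraction of time during which some server is in tserverB's position.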

To support authentication with both Kerberos credentials and delegation tokens, the SASL Thrift
server accepts connections using either the GSSAPI or the DIGEST-MD5 mechanism. The DIGEST-MD5 mechanism
enables authentication as a normal username and password exchange, which DelegationTokens leverage.
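To make the "normal username and password exchange" concrete, the following sketch runs a DIGEST-MD5 handshake entirely in-process with the JDK's javax.security.sasl API. It is not Accumulo's transport code: the protocol name "accumulo", the server name "localhost" and the sample credentials are assumptions, and with a delegation token the username and password would be derived from the token rather than chosen by a person.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

public class DigestMd5Demo {

  /** Runs one DIGEST-MD5 exchange and returns the authorization ID the server accepted. */
  public static String authenticate(String user, char[] clientPw, char[] serverPw) throws SaslException {
    Map<String, Object> props = new HashMap<>();
    props.put(Sasl.QOP, "auth"); // authentication only, comparable to rpc.sasl.qop=auth

    CallbackHandler clientCb = callbacks -> {
      for (Callback cb : callbacks) {
        if (cb instanceof NameCallback) {
          ((NameCallback) cb).setName(user);
        } else if (cb instanceof PasswordCallback) {
          ((PasswordCallback) cb).setPassword(clientPw);
        } else if (cb instanceof RealmCallback) {
          RealmCallback rc = (RealmCallback) cb;
          rc.setText(rc.getDefaultText()); // accept the realm offered by the server
        } else {
          throw new UnsupportedCallbackException(cb);
        }
      }
    };
    CallbackHandler serverCb = callbacks -> {
      for (Callback cb : callbacks) {
        if (cb instanceof NameCallback) {
          NameCallback nc = (NameCallback) cb;
          nc.setName(nc.getDefaultName()); // the username sent by the client
        } else if (cb instanceof RealmCallback) {
          RealmCallback rc = (RealmCallback) cb;
          rc.setText(rc.getDefaultText());
        } else if (cb instanceof PasswordCallback) {
          ((PasswordCallback) cb).setPassword(serverPw); // the password the server expects
        } else if (cb instanceof AuthorizeCallback) {
          AuthorizeCallback ac = (AuthorizeCallback) cb;
          ac.setAuthorized(ac.getAuthenticationID().equals(ac.getAuthorizationID()));
        } else {
          throw new UnsupportedCallbackException(cb);
        }
      }
    };

    SaslServer server = Sasl.createSaslServer("DIGEST-MD5", "accumulo", "localhost", props, serverCb);
    SaslClient client = Sasl.createSaslClient(
        new String[] {"DIGEST-MD5"}, null, "accumulo", "localhost", props, clientCb);
    if (server == null || client == null) {
      throw new SaslException("DIGEST-MD5 is not available in this JVM");
    }
    try {
      byte[] challenge = server.evaluateResponse(new byte[0]); // server sends the first challenge
      byte[] response = client.evaluateChallenge(challenge);  // client answers with a password digest
      byte[] rspauth = server.evaluateResponse(response);     // throws SaslException on a bad password
      client.evaluateChallenge(rspauth);                      // client checks the server's proof
      return server.getAuthorizationID();
    } finally {
      client.dispose();
      server.dispose();
    }
  }

  public static void main(String[] args) throws SaslException {
    System.out.println(authenticate("token-identifier", "token-password".toCharArray(),
        "token-password".toCharArray()));
  }
}
```

The password itself never crosses the wire; only digests derived from it do, which is why the server side must be able to reproduce the password (for delegation tokens, from the shared secret key).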

Since delegation tokens are a weaker form of authentication than Kerberos credentials, user access
to obtain delegation tokens from Accumulo is protected with the DELEGATION_TOKEN system permission. Only
users with the system permission are allowed to obtain delegation tokens. It is also recommended
to configure confidentiality with SASL, using the rpc.sasl.qop=auth-conf configuration property, to
ensure that prying eyes cannot view the DelegationToken as it passes over the network.

15.4.2. Clients

Create client principal

Like the Accumulo servers, clients must also have a Kerberos principal created for them. The
primary difference from a server principal is that principals for users are created
with a password and are not qualified to a specific instance (host).

kadmin.local -q "addprinc $user"

The above will prompt for a password, which will be used to authenticate $user.
The user can verify that they can authenticate with the KDC using the command kinit $user.
Upon entering the correct password, a local credentials cache will be created, which can be used
to authenticate with Accumulo, access HDFS, etc.

The user can verify the state of their local credentials cache by using the command klist.

Configuration

The second thing clients need to do is to set up their client configuration file. By
default, this file is stored in ~/.accumulo/config, $ACCUMULO_CONF_DIR/client.conf or
$ACCUMULO_HOME/conf/client.conf. Accumulo utilities also allow you to provide your own
copy of this file in any location using the --config-file command line option.

Three items need to be set to enable access to Accumulo:

instance.rpc.sasl.enabled=true

rpc.sasl.qop=auth

kerberos.server.primary=accumulo

Each of these properties must match the configuration of the Accumulo servers; this is
required to set up the SASL transport.
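A quick pre-flight check can catch mismatches before the first failed connection. The sketch below is a hypothetical helper, not an Accumulo tool: it takes the client properties plus the QOP and Kerberos principal the servers were configured with (assumed known to the administrator), and derives the expected primary from the principal's primary/instance@REALM form. The principal accumulo/_HOST@EXAMPLE.COM is an example value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class SaslClientConfigCheck {

  /** Kerberos principals have the form primary/instance@REALM; returns the primary part. */
  public static String primaryOf(String principal) {
    return principal.split("[/@]", 2)[0];
  }

  /** Returns the client keys that are missing or disagree with the servers' settings. */
  public static List<String> problems(Properties client, String serverQop, String serverPrincipal) {
    List<String> bad = new ArrayList<>();
    if (!"true".equals(trimmed(client, "instance.rpc.sasl.enabled"))) {
      bad.add("instance.rpc.sasl.enabled");
    }
    if (!serverQop.equals(trimmed(client, "rpc.sasl.qop"))) {
      bad.add("rpc.sasl.qop");
    }
    if (!primaryOf(serverPrincipal).equals(trimmed(client, "kerberos.server.primary"))) {
      bad.add("kerberos.server.primary");
    }
    return bad;
  }

  private static String trimmed(Properties p, String key) {
    String v = p.getProperty(key);
    return v == null ? null : v.trim(); // Properties keeps trailing whitespace
  }

  public static void main(String[] args) throws IOException {
    Properties client = new Properties();
    client.load(new StringReader(
        "instance.rpc.sasl.enabled=true\nrpc.sasl.qop=auth\nkerberos.server.primary=accumulo\n"));
    // Hypothetical server settings: QOP "auth" and principal accumulo/_HOST@EXAMPLE.COM
    System.out.println(problems(client, "auth", "accumulo/_HOST@EXAMPLE.COM")); // prints []
  }
}
```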

Verifying Administrative Access

At this point you should have enough configured on the server and client side to interact with
the system. You should verify that the administrative user you chose earlier can successfully
interact with the system.

Logging in via kinit with a password is the simplest approach, but any login method that caches Kerberos tickets
should work.

DelegationTokens with MapReduce

To use DelegationTokens in a custom MapReduce job, the call to the setConnectorInfo() method
on AccumuloInputFormat or AccumuloOutputFormat should be the only necessary change. Instead
of providing an instance of a KerberosToken, the user must call SecurityOperations.getDelegationToken
using a Connector obtained with that KerberosToken, and pass the resulting DelegationToken to
setConnectorInfo instead of the KerberosToken. It is expected that the user launching
the MapReduce job is already logged in to Kerberos, either via a keytab or via a locally cached
Kerberos ticket-granting ticket (TGT).

If the user passes a KerberosToken to the setConnectorInfo method, the implementation will
attempt to obtain a DelegationToken automatically, but this does have limitations
based on the other MapReduce configuration methods already called and permissions granted
to the calling user. It is best for the user to acquire the DelegationToken on their own
and provide it directly to setConnectorInfo.

Users must have the DELEGATION_TOKEN system permission to call the getDelegationToken
method. The obtained delegation token is only valid for the requesting user for a period
of time dependent on Accumulo’s configuration (general.delegation.token.lifetime).

It is also possible to obtain and use DelegationTokens outside of the context
of MapReduce.

A Connector created with a DelegationToken performs each operation as the original user, but without
their Kerberos credentials.

For the duration of validity of the DelegationToken, the user must take the necessary precautions
to protect the DelegationToken from prying eyes as it can be used by any user on any host to impersonate
the user who requested the DelegationToken. YARN ensures that passing the delegation token from the client
JVM to each YARN task is secure, even in multi-tenant instances.

15.4.3. Debugging

Q: I have valid Kerberos credentials and a correct client configuration file but
I still get errors like:

A: When you have a valid client configuration and Kerberos TGT, it is possible that the search
path for your local credentials cache is incorrect. Check the value of the KRB5CCNAME environment
variable, and ensure it matches the value reported by klist.
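The comparison can be automated in a client application. This sketch assumes MIT Kerberos, whose klist prints a line such as "Ticket cache: FILE:/tmp/krb5cc_1000" (Heimdal's wording differs), and treats an optional FILE: prefix as equivalent to a bare path. The helper names are hypothetical.

```java
public class CredentialCacheCheck {

  /** Strips a leading "FILE:" cache type so that "FILE:/tmp/x" and "/tmp/x" compare equal. */
  public static String normalize(String cacheName) {
    if (cacheName == null) {
      return null;
    }
    String s = cacheName.trim();
    return s.startsWith("FILE:") ? s.substring("FILE:".length()) : s;
  }

  /** Extracts the cache name from an MIT klist line such as "Ticket cache: FILE:/tmp/krb5cc_1000". */
  public static String fromKlistLine(String line) {
    String prefix = "Ticket cache:";
    String t = line.trim();
    return t.startsWith(prefix) ? normalize(t.substring(prefix.length())) : null;
  }

  /** True if KRB5CCNAME points at the same cache that klist reports. */
  public static boolean sameCache(String krb5ccname, String klistLine) {
    String env = normalize(krb5ccname);
    return env != null && env.equals(fromKlistLine(klistLine));
  }

  public static void main(String[] args) {
    String env = System.getenv("KRB5CCNAME"); // null when the variable is unset
    System.out.println("KRB5CCNAME=" + env);
    if (args.length > 0) {
      // Pass the "Ticket cache: ..." line from klist as the first argument.
      System.out.println("matches klist: " + sameCache(env, args[0]));
    }
  }
}
```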

Q: I thought I had everything configured correctly, but my client/server still fails to log in.
I don’t know what is actually failing.

A: Add the following system property to the JVM invocation:

-Dsun.security.krb5.debug=true

This enables extensive Kerberos debugging output at the JVM level, which is often sufficient to
diagnose high-level configuration problems. Client applications can add this system property by
hand to the command line; for Accumulo server processes and applications started using the accumulo
script, add the property to ACCUMULO_GENERAL_OPTS in $ACCUMULO_CONF_DIR/accumulo-env.sh.

Additionally, you can increase the log4j levels on org.apache.hadoop.security, which includes the
Hadoop UserGroupInformation class and will produce some high-level debug statements. This
can be controlled in your client application, or using $ACCUMULO_CONF_DIR/generic_logger.xml.

Q: All of my Accumulo processes successfully start and log in with their
keytab, but they are unable to communicate with each other, showing the
following errors:

2015-01-12 14:47:27,055 [transport.TSaslTransport] ERROR: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:53)
at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.accumulo.core.rpc.UGIAssumingTransport.open(UGIAssumingTransport.java:49)
at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:357)
at org.apache.accumulo.core.rpc.ThriftUtil.createTransport(ThriftUtil.java:255)
at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.getTableMap(LiveTServerSet.java:106)
at org.apache.accumulo.master.Master.gatherTableInformation(Master.java:996)
at org.apache.accumulo.master.Master.access$600(Master.java:160)
at org.apache.accumulo.master.Master$StatusThread.updateStatus(Master.java:911)
at org.apache.accumulo.master.Master$StatusThread.run(Master.java:901)
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:710)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
... 16 more
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:192)
at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:203)
at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:309)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:115)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:454)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:641)
... 19 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:143)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:66)
at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:61)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
... 25 more

or

2015-01-12 14:47:29,440 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 10 more

A: As previously mentioned, the hostname, and consequently the address on which each Accumulo process is bound and listening,
is extremely important when negotiating a SASL connection. This problem commonly arises when the Accumulo
servers are not configured to listen on the address denoted by their FQDN.

Q: After configuring my system for Kerberos, server processes come up normally and I can interact with the system. However,
when I attempt to use the "Recent Traces" page on the Monitor UI I get a stacktrace similar to:

java.lang.AssertionError: AuthenticationToken should not be null
at org.apache.accumulo.monitor.servlets.trace.Basic.getScanner(Basic.java:139)
at org.apache.accumulo.monitor.servlets.trace.Summary.pageBody(Summary.java:164)
at org.apache.accumulo.monitor.servlets.BasicServlet.doGet(BasicServlet.java:63)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:738)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:568)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:462)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
at java.lang.Thread.run(Thread.java:745)

A: This indicates that the Monitor has not been able to successfully log in a client-side user to read from the trace table. Accumulo allows the TraceServer to rely on the property general.kerberos.keytab as a fallback when logging in the trace user if the trace.token.property.keytab property isn’t defined. Some earlier versions of Accumulo did not perform the same fallback for the Monitor’s use of the trace user. The end result is that if you configure general.kerberos.keytab but not trace.token.property.keytab, you will end up with a system that properly logs trace information but can’t view it.

Ensure you have set trace.token.property.keytab to point to a keytab for the principal defined in trace.user in the accumulo-site.xml file for the Monitor, since that should work in all versions of Accumulo.
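The difference in behavior described in this answer can be summarized as two small resolution functions. This is an illustrative sketch of the logic as stated above, not Accumulo source; the class name and the keytab path are hypothetical.

```java
import java.util.Properties;

public class TraceKeytabResolution {

  /** TraceServer, as described above: fall back to general.kerberos.keytab when needed. */
  public static String traceServerKeytab(Properties site) {
    String keytab = site.getProperty("trace.token.property.keytab");
    return keytab != null ? keytab : site.getProperty("general.kerberos.keytab");
  }

  /** Monitor in some earlier versions: no fallback, so null means the trace user cannot log in. */
  public static String earlyMonitorKeytab(Properties site) {
    return site.getProperty("trace.token.property.keytab");
  }

  public static void main(String[] args) {
    Properties site = new Properties();
    site.setProperty("general.kerberos.keytab", "/etc/security/keytabs/accumulo.keytab"); // hypothetical path
    System.out.println("TraceServer keytab: " + traceServerKeytab(site)); // uses the fallback
    System.out.println("Monitor keytab: " + earlyMonitorKeytab(site));    // null: traces logged but not viewable
  }
}
```

Setting trace.token.property.keytab explicitly makes both functions return the same value, which is why that setting works across versions.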

16. Administration

16.1. Hardware

Because we are essentially running two or three systems simultaneously, layered
across the cluster (HDFS, Accumulo and MapReduce), it is typical for hardware to
consist of 4 to 8 cores and 8 to 32 GB of RAM. This allows each running process to have
at least one core and 2 to 4 GB of RAM.

One core running HDFS can typically keep 2 to 4 disks busy, so each machine may
have as little as 2 x 300GB disks and as much as 4 x 1TB or 2TB disks.

It is possible to make do with less than this, such as with 1U servers with 2 cores and 4GB
each, but in this case it is recommended to run only up to two processes per
machine, i.e. DataNode and TabletServer or DataNode and MapReduce worker, but
not all three. The constraint here is having enough available heap space for all the
processes on a machine.

16.2. Network

Accumulo communicates via remote procedure calls over TCP/IP for both passing
data and control messages. In addition, Accumulo uses HDFS clients to
communicate with HDFS. To achieve good ingest and query performance, sufficient
network bandwidth must be available between any two machines.

In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo will
use the following default ports. Please make sure that they are open, or change
their value in conf/accumulo-site.xml.

Table 1. Accumulo default ports

Port   Description                                     Property Name
4445   Shutdown Port (Accumulo MiniCluster)            n/a
4560   Accumulo monitor (for centralized log display)  monitor.port.log4j
9997   Tablet Server                                   tserver.port.client
9999   Master Server                                   master.port.client
12234  Accumulo Tracer                                 trace.port.client
42424  Accumulo Proxy Server                           n/a
50091  Accumulo GC                                     gc.port.client
50095  Accumulo HTTP monitor                           monitor.port.client
10001  Master Replication service                      master.replication.coordinator.port
10002  TabletServer Replication service                replication.receipt.service.port

In addition, the user can provide 0 as a port value, and an ephemeral port will be chosen instead. An
ephemeral port is likely to be unique and not already bound. Thus, configuring ports to
use 0 instead of an explicit value should, in most cases, work around any issues with
running multiple distinct Accumulo instances (or any other process which tries to use the
same default ports) on the same hardware.
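The effect of a port value of 0 can be seen with a plain java.net.ServerSocket: the operating system assigns a free port at bind time, and the process must then report which one it got. This is a generic JDK illustration of ephemeral binding, not Accumulo's server code; the class name is hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {

  /** Binds to port 0 so the OS picks a free ephemeral port, then reports it. */
  public static int bindEphemeral() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    // Two listeners open at the same time never collide, because each gets its own port.
    try (ServerSocket a = new ServerSocket(0); ServerSocket b = new ServerSocket(0)) {
      System.out.println("a=" + a.getLocalPort() + " b=" + b.getLocalPort());
    }
  }
}
```

The trade-off is discoverability: clients can no longer assume a fixed port, so an ephemeral port only works where the actual port is advertised elsewhere (for Accumulo servers, via ZooKeeper).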

16.3. Installation

Download a binary distribution of Accumulo and install it to a directory on a disk with
sufficient space.

Repeat this step on each machine in your cluster. Typically, the same <install directory>
is chosen for all machines in the cluster. When you configure Accumulo, the $ACCUMULO_HOME
environment variable should be set to /path/to/<install directory>/accumulo-X.Y.Z.

16.4. Dependencies

Accumulo requires HDFS and ZooKeeper to be configured and running
before starting. Password-less SSH should be configured between at least the
Accumulo master and TabletServer machines. It is also a good idea to run Network
Time Protocol (NTP) within the cluster to ensure nodes' clocks don’t get too out of
sync, which can cause problems with automatically timestamped data.

16.5. Configuration

Accumulo is configured by editing several Shell and XML files found in
$ACCUMULO_HOME/conf. The structure closely resembles Hadoop’s configuration
files.

Logging is primarily controlled using the log4j configuration files,
generic_logger.xml and monitor_logger.xml (or their corresponding
.properties version if the .xml version is missing). The generic logger is
used for most server types, and is typically configured to send logs to the
monitor, as well as log files. The monitor logger is used by the monitor, and
is typically configured to log only errors the monitor itself generates,
rather than all the logs that it receives from other server types.

16.5.1. Edit conf/accumulo-env.sh

Accumulo needs to know where to find the software it depends on. Edit accumulo-env.sh
and specify the following:

Enter the location of the installation directory of Accumulo for $ACCUMULO_HOME

Enter your system’s Java home for $JAVA_HOME

Enter the location of Hadoop for $HADOOP_PREFIX

Choose a location for Accumulo logs and enter it for $ACCUMULO_LOG_DIR

Enter the location of ZooKeeper for $ZOOKEEPER_HOME

By default Accumulo TabletServers are set to use 1GB of memory. You may change
this by altering the value of $ACCUMULO_TSERVER_OPTS. Note that the syntax is that of
Java JVM command line options (for example, -Xmx1g). This value should be less than the physical
memory of the machines running TabletServers.

There are similar options for the master’s memory usage and the garbage collector
process. Reduce these if they exceed the physical RAM of your hardware and
increase them, within the bounds of the physical RAM, if a process fails because of
insufficient memory.

Note that you will be specifying the Java heap space in accumulo-env.sh. You should
make sure that the total heap space used for the Accumulo tserver and the Hadoop
DataNode and TaskTracker is less than the available memory on each slave node in
the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop
NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate
machines to allow them to use more heap space. If you are running these on the
same machine on a small cluster, likewise make sure their heap space settings fit
within the available memory.
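Checking that the heaps fit is simple arithmetic once the -Xmx values are extracted from each process's JVM options. The sketch below is a hypothetical helper; it understands only the -Xmx<n>[k|m|g] form and uses the last -Xmx occurrence, matching the usual HotSpot behavior. The option strings and the 16GB machine size are example values.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeapBudget {
  private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

  /** Returns the -Xmx value in bytes from a JVM options string, or -1 if none is present. */
  public static long maxHeapBytes(String jvmOpts) {
    Matcher m = XMX.matcher(jvmOpts);
    long result = -1;
    while (m.find()) { // keep the last occurrence
      long n = Long.parseLong(m.group(1));
      switch (m.group(2).toLowerCase(Locale.ROOT)) {
        case "k": n *= 1024L; break;
        case "m": n *= 1024L * 1024; break;
        case "g": n *= 1024L * 1024 * 1024; break;
        default: break; // no suffix: plain bytes
      }
      result = n;
    }
    return result;
  }

  /** True if the combined maximum heaps are below the machine's physical memory. */
  public static boolean fits(long physicalBytes, String... jvmOptsPerProcess) {
    long total = 0;
    for (String opts : jvmOptsPerProcess) {
      long heap = maxHeapBytes(opts);
      if (heap < 0) {
        throw new IllegalArgumentException("no -Xmx in: " + opts);
      }
      total += heap;
    }
    return total < physicalBytes;
  }

  public static void main(String[] args) {
    long physical = 16L * 1024 * 1024 * 1024; // hypothetical 16GB slave node
    // Hypothetical heap settings for the tserver, DataNode and TaskTracker
    System.out.println(fits(physical, "-Xmx4g -Xms4g", "-Xmx1g", "-Xmx2g")); // true: 7GB < 16GB
  }
}
```

Note that heap is not the whole footprint: the native map (next section) lives outside the Java heap, so its size should also be subtracted from the memory available to these processes.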

16.5.2. Native Map

The tablet server uses a data structure called a MemTable to store sorted key/value
pairs in memory when they are first received from the client. When a minor compaction
occurs, this data structure is written to HDFS. The MemTable will default to using
memory in the JVM, but a JNI version, called the native map, can be used to significantly
improve performance by utilizing the memory space of the native operating system. The
native map also avoids the performance implications brought on by garbage collection
in the JVM by causing it to pause much less frequently.

Building

32-bit and 64-bit Linux and Mac OS X versions of the native map can be built
from the Accumulo bin package by executing
$ACCUMULO_HOME/bin/build_native_library.sh. If your system’s
default compiler options are insufficient, you can add additional compiler
options to the command line, such as options for the architecture. These will be
passed to the Makefile in the environment variable USERFLAGS.

Examples:

$ACCUMULO_HOME/bin/build_native_library.sh

$ACCUMULO_HOME/bin/build_native_library.sh -m32

After building the native map from the source, you will find the artifact in
$ACCUMULO_HOME/lib/native. Upon starting up, the tablet server will look
in this directory for the map library. If the file is renamed or moved from its
target directory, the tablet server may not be able to find it. The system can
also locate the native maps shared library by setting LD_LIBRARY_PATH
(or DYLD_LIBRARY_PATH on Mac OS X) in $ACCUMULO_HOME/conf/accumulo-env.sh.

Native Maps Configuration

As mentioned, Accumulo will use the native libraries if they are found in the expected
location and tserver.memory.maps.native.enabled is set to true (which is the default).
Using the native maps over JVM maps yields a noticeable improvement in ingest rates; however,
certain configuration variables are important to modify when increasing the size of the
native map.

To adjust the size of the native map, increase the value of tserver.memory.maps.max.
By default, the maximum size of the native map is 1GB. When increasing this value, it is
also important to adjust the values of table.compaction.minor.logs.threshold and
tserver.walog.max.size. table.compaction.minor.logs.threshold is the maximum
number of write-ahead log files that a tablet can reference before they will be automatically
minor compacted. tserver.walog.max.size is the maximum size of a write-ahead log.

The maximum size of the native maps for a server should be less than the product
of the write-ahead log maximum size and minor compaction threshold for log files:

(tserver.walog.max.size) * (table.compaction.minor.logs.threshold) > (tserver.memory.maps.max)

This formula ensures that minor compactions won’t be automatically triggered before the native
maps can be completely saturated.

Consequently, when increasing the size of the write-ahead logs, it can also be important
to increase the HDFS block size that Accumulo uses when creating the files for the write-ahead log.
This is controlled via tserver.wal.blocksize. A basic recommendation is that when
tserver.walog.max.size is larger than 2GB, set tserver.wal.blocksize to 2GB.
Increasing the block size to a value larger than 2GB can result in decreased write
performance to the write-ahead log file, which will slow ingest.
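The sizing rule and the block-size recommendation above can be checked with a few lines of arithmetic. This is a hypothetical helper, not part of Accumulo; sizes are in bytes, and the 8GB/4GB values in main are examples only.

```java
import java.util.OptionalLong;

public class NativeMapSizing {
  static final long GB = 1024L * 1024 * 1024;

  /** Checks tserver.walog.max.size * table.compaction.minor.logs.threshold > tserver.memory.maps.max. */
  public static boolean walCapacityExceedsMap(long mapsMax, long walogMaxSize, int logsThreshold) {
    return walogMaxSize * logsThreshold > mapsMax;
  }

  /** Smallest table.compaction.minor.logs.threshold that satisfies the check above. */
  public static int minimumLogsThreshold(long mapsMax, long walogMaxSize) {
    return (int) (mapsMax / walogMaxSize + 1);
  }

  /** The 2GB tserver.wal.blocksize recommended above; empty when the WAL is not larger than 2GB. */
  public static OptionalLong blocksizeForLargeWal(long walogMaxSize) {
    return walogMaxSize > 2 * GB ? OptionalLong.of(2 * GB) : OptionalLong.empty();
  }

  public static void main(String[] args) {
    long mapsMax = 8 * GB; // hypothetical tserver.memory.maps.max
    long walog = 4 * GB;   // hypothetical tserver.walog.max.size
    System.out.println(walCapacityExceedsMap(mapsMax, walog, 3));     // true: 12GB > 8GB
    System.out.println(minimumLogsThreshold(mapsMax, walog));         // 3
    System.out.println(blocksizeForLargeWal(walog).getAsLong() / GB); // 2
  }
}
```

With an 8GB native map and 4GB logs, a threshold of 2 gives exactly 8GB, which does not satisfy the strict inequality; that is why the minimum computed here is one more than the integer quotient.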

16.5.3. Cluster Specification

On the machine that will serve as the Accumulo master:

Write the IP address or domain name of the Accumulo Master to the $ACCUMULO_HOME/conf/masters file.

Write the IP addresses or domain names of the machines that will be TabletServers in $ACCUMULO_HOME/conf/slaves, one per line.

Note that if using domain names rather than IP addresses, DNS must be configured
properly for all machines participating in the cluster. DNS can be a confusing source
of errors.

16.5.4. Accumulo Settings

Specify appropriate values for the following settings in
$ACCUMULO_HOME/conf/accumulo-site.xml :

The instance needs a secret to enable secure communication between servers. Configure your
secret and make sure that the accumulo-site.xml file is not readable to other users.
For alternatives to storing the instance.secret in plaintext, please read the
Sensitive Configuration Values section.
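Whether accumulo-site.xml is readable by other users can be checked with java.nio on POSIX file systems. The sketch below is deliberately strict (it flags group read access as well as world read access); relax it if your deployment intentionally shares the file with a service group. The default path in main is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class SiteFilePermissions {

  /** True if neither the group nor other users can read the file (POSIX file systems only). */
  public static boolean ownerOnlyReadable(Path file) throws IOException {
    Set<PosixFilePermission> perms = Files.getPosixFilePermissions(file);
    return !perms.contains(PosixFilePermission.GROUP_READ)
        && !perms.contains(PosixFilePermission.OTHERS_READ);
  }

  public static void main(String[] args) throws IOException {
    // Pass your own $ACCUMULO_CONF_DIR/accumulo-site.xml as the first argument.
    Path site = Paths.get(args.length > 0 ? args[0] : "conf/accumulo-site.xml");
    System.out.println(site + (ownerOnlyReadable(site) ? ": OK" : ": readable by other users"));
  }
}
```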

Some settings can be modified via the Accumulo shell and take effect immediately, but
some settings require a process restart to take effect. See the configuration documentation
(available in the docs directory of the tarball and in Configuration Management) for details.

One aspect of Accumulo’s configuration which differs from the rest of the Hadoop
ecosystem is that the server-process classpath is determined in part by multiple values. A
bootstrap classpath is based solely on the accumulo-start.jar, Log4j and $ACCUMULO_CONF_DIR.

A second classloader is used to dynamically load all of the resources specified by general.classpaths
in $ACCUMULO_CONF_DIR/accumulo-site.xml. This value is a comma-separated list of regular-expression
paths which are all loaded into a secondary classloader. This includes Hadoop, Accumulo and ZooKeeper
jars necessary to run Accumulo. When this value is not defined, a default value is used which attempts
to load Hadoop from multiple potential locations depending on how Hadoop was installed. It is strongly
recommended that general.classpaths is defined and limited to only the necessary jars to prevent
extra jars from being unintentionally loaded into Accumulo processes.
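To see which jars a general.classpaths value would pull in, a sketch like the following can expand it. It assumes, for illustration, that only the final path component of each comma-separated entry is a regular expression (as in an entry like /opt/hadoop/share/hadoop/common/[^.].*.jar) and that environment variables have already been expanded; Accumulo's own classloader may differ in details. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class ClasspathResolver {

  /** Expands a comma-separated list of directory/regex entries into matching files. */
  public static List<Path> resolve(String classpaths) throws IOException {
    List<Path> result = new ArrayList<>();
    for (String entry : classpaths.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      Path p = Paths.get(trimmed);
      Path dir = p.getParent();
      if (dir == null || !Files.isDirectory(dir)) {
        continue; // missing locations are skipped in this sketch
      }
      // The last path component is treated as a regular expression over file names.
      Pattern name = Pattern.compile(p.getFileName().toString());
      try (Stream<Path> files = Files.list(dir)) {
        files.filter(f -> name.matcher(f.getFileName().toString()).matches())
            .sorted()
            .forEach(result::add);
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    for (Path jar : resolve(args.length > 0 ? args[0] : "")) {
      System.out.println(jar);
    }
  }
}
```

Running it against a candidate value before deploying is a cheap way to confirm that the list is "limited to only the necessary jars" as recommended above.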

16.5.5. Hostnames in configuration files

Accumulo has a number of configuration files which can contain references to other hosts in your
network. All of the "host" configuration files for Accumulo (gc, masters, slaves, monitor,
tracers) as well as instance.volumes in accumulo-site.xml must contain some host reference.

While IP address, short hostnames, or fully qualified domain names (FQDN) are all technically valid, it
is good practice to always use FQDNs for both Accumulo and other processes in your Hadoop cluster.
Failing to consistently use FQDNs can have unexpected consequences in how Accumulo uses the FileSystem.

One common way this problem can be observed is in applications that use Bulk Ingest. The Accumulo
Master coordinates moving the Bulk Ingest input files into an Accumulo-managed directory. However,
Accumulo cannot safely move files across different Hadoop FileSystems. This is problematic because
Accumulo also cannot reliably determine that two differently named FileSystems are in fact the same one.
For example, while both 127.0.0.1:8020 and localhost:8020 might be valid identifiers for the same HDFS instance,
Accumulo identifies localhost:8020 as a different HDFS instance than 127.0.0.1:8020.
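The pitfall follows from identifying a FileSystem by how its URI is written rather than by where it resolves. The sketch below shows such a naive, name-based comparison using java.net.URI; it is an illustration of the problem, not Accumulo's or Hadoop's code, and the class name and URIs are hypothetical.

```java
import java.net.URI;

public class FileSystemIdentity {

  /** Naive identity: scheme plus authority exactly as written, with no DNS resolution. */
  public static String naiveId(String uri) {
    URI u = URI.create(uri);
    return u.getScheme() + "://" + u.getAuthority();
  }

  public static boolean sameFileSystem(String a, String b) {
    return naiveId(a).equals(naiveId(b));
  }

  public static void main(String[] args) {
    // Both names may reach the same NameNode, yet a name-based comparison treats them as different.
    System.out.println(sameFileSystem("hdfs://localhost:8020/accumulo", "hdfs://127.0.0.1:8020/bulk"));
  }
}
```

Resolving names to addresses would not fully fix this either (a host can have several addresses), which is why consistently writing the same FQDN everywhere is the practical remedy.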

16.5.6. Deploy Configuration

Copy the masters, slaves, accumulo-env.sh, and if necessary, accumulo-site.xml
from the $ACCUMULO_HOME/conf/ directory on the master to all the machines
specified in the slaves file.

16.5.7. Sensitive Configuration Values

Accumulo has a number of properties that can be specified via the accumulo-site.xml
file which are sensitive in nature; instance.secret and trace.token.property.password
are two common examples. If compromised, either of these properties can
result in data being leaked to users who should not have access to that data.

In Hadoop-2.6.0, a new CredentialProvider class was introduced which serves as a common
implementation to abstract away the storage and retrieval of passwords from plaintext
storage in configuration files. Any Property marked with the Sensitive annotation
is a candidate for use with these CredentialProviders. For versions of Hadoop which lack
these classes, the feature is simply unavailable.

A comma separated list of CredentialProviders can be configured using the Accumulo Property
general.security.credential.provider.paths. Each configured URL will be consulted
when the Configuration object for accumulo-site.xml is accessed.

16.5.8. Using a JavaKeyStoreCredentialProvider for storage

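One of the implementations provided in Hadoop-2.6.0 is a Java KeyStore CredentialProvider.
The alias of each entry in the KeyStore is the Accumulo Property key name. For example, the
instance.secret can be stored under the alias instance.secret using Hadoop's credential command
(hadoop credential create instance.secret -provider <KeyStore URL>), and the same KeyStore URL
can then be set in general.security.credential.provider.paths.

This configuration will then transparently extract the instance.secret from
the configured KeyStore and avoids storing the sensitive property in human-readable form.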

A KeyStore can also be stored in HDFS, which will make the KeyStore readily available to
all Accumulo servers. If the local filesystem is used, be aware that each Accumulo server
will expect the KeyStore in the same location.
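To show what "each entry is the Property key name" means, the sketch below stores a property value as a secret-key entry in a JCEKS keystore using only the JDK's java.security.KeyStore API, then reads it back. It is a stand-in for illustration, not Hadoop's JavaKeyStoreProvider; the class name, the "AES" algorithm label and the passwords are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.KeyStore;
import javax.crypto.spec.SecretKeySpec;

public class KeyStoreSecretDemo {

  /** Stores 'value' under the alias 'property' and returns the serialized keystore. */
  public static byte[] store(String property, String value, char[] storePassword) throws Exception {
    KeyStore ks = KeyStore.getInstance("JCEKS");
    ks.load(null, storePassword); // start from an empty keystore
    // The algorithm name is only a label here; the key bytes are the secret value itself.
    SecretKeySpec secret = new SecretKeySpec(value.getBytes(StandardCharsets.UTF_8), "AES");
    ks.setEntry(property, new KeyStore.SecretKeyEntry(secret), new KeyStore.PasswordProtection(storePassword));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ks.store(out, storePassword);
    return out.toByteArray();
  }

  /** Reads the value stored under 'property' back out of a serialized keystore. */
  public static String load(byte[] keystore, String property, char[] storePassword) throws Exception {
    KeyStore ks = KeyStore.getInstance("JCEKS");
    ks.load(new ByteArrayInputStream(keystore), storePassword);
    KeyStore.SecretKeyEntry entry =
        (KeyStore.SecretKeyEntry) ks.getEntry(property, new KeyStore.PasswordProtection(storePassword));
    return new String(entry.getSecretKey().getEncoded(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws Exception {
    char[] pw = "keystore-password".toCharArray(); // hypothetical keystore password
    byte[] ks = store("instance.secret", "my-instance-secret", pw);
    System.out.println(load(ks, "instance.secret", pw));
  }
}
```

The keystore file is still a secret in its own right: anyone holding it and its password can recover the value, so its location and permissions deserve the same care as accumulo-site.xml.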

16.5.9. Client Configuration

In version 1.6.0, Accumulo includes a new type of configuration file known as a client
configuration file. One problem with the traditional "site.xml" file that is prevalent
throughout Hadoop is that it is a single file used by both clients and servers. This makes
it very difficult to protect secrets that are only meant for the server processes while
allowing the clients to connect to the servers.

The client configuration file is a subset of the information stored in accumulo-site.xml
meant only for consumption by clients of Accumulo. By default, Accumulo checks the following
locations for a client configuration:

${ACCUMULO_CONF_DIR}/client.conf

/etc/accumulo/client.conf

/etc/accumulo/conf/client.conf

~/.accumulo/config

These files are Java Properties files. They can currently contain information about
ZooKeeper servers, RPC properties (such as SSL or SASL connectors), and distributed tracing
properties. Valid properties are defined by the ClientProperty enum contained in the client API.
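The lookup over these locations can be sketched with java.nio and java.util.Properties. The sketch checks the candidates in the order listed above and loads the first one that exists; that first-match policy is an assumption for illustration (Accumulo may combine several files), and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Properties;

public class ClientConfLocator {

  /** Candidate locations in the order listed above; accumuloConfDir may be null. */
  public static List<Path> candidates(String accumuloConfDir, String userHome) {
    List<Path> paths = new ArrayList<>();
    if (accumuloConfDir != null) {
      paths.add(Paths.get(accumuloConfDir, "client.conf"));
    }
    paths.add(Paths.get("/etc/accumulo/client.conf"));
    paths.add(Paths.get("/etc/accumulo/conf/client.conf"));
    paths.add(Paths.get(userHome, ".accumulo", "config"));
    return paths;
  }

  /** Loads the first candidate that exists as a Java Properties file. */
  public static Optional<Properties> loadFirst(List<Path> candidates) throws IOException {
    for (Path p : candidates) {
      if (Files.isRegularFile(p)) {
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(p)) {
          props.load(r);
        }
        return Optional.of(props);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) throws IOException {
    List<Path> c = candidates(System.getenv("ACCUMULO_CONF_DIR"), System.getProperty("user.home"));
    System.out.println(loadFirst(c).map(Properties::toString).orElse("no client configuration found"));
  }
}
```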

16.5.10. Custom Table Tags

Accumulo has the ability for users to add custom tags to tables. This allows
applications to set application-level metadata about a table. These tags can be
anything from a table description to administrator notes or a creation date.
This is done by naming and setting a property with the prefix table.custom.*.

Currently, table properties are stored in ZooKeeper. This means that the number
and size of custom properties should be restricted to the order of tens of properties
at most, with no property exceeding 1MB in size. ZooKeeper’s performance can be
very sensitive to an excessive number of nodes and the sizes of the nodes. Applications
which make use of custom properties should take these warnings into
consideration. There is no enforcement of these warnings via the API.
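Since the API does not enforce these limits, an application can check its own tags before setting them. The sketch below flags custom properties larger than 1MB and a total count above a budget; the budget of 50 is a hypothetical stand-in for "tens of properties", and the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CustomTagCheck {
  static final String PREFIX = "table.custom.";
  static final int MAX_CUSTOM_PROPERTIES = 50;    // hypothetical budget for "tens of properties"
  static final int MAX_VALUE_BYTES = 1024 * 1024; // 1MB ceiling from the guidance above

  /** Returns warnings for custom table properties; Accumulo itself does not enforce these limits. */
  public static List<String> warnings(Map<String, String> tableProps) {
    List<String> out = new ArrayList<>();
    int custom = 0;
    for (Map.Entry<String, String> e : tableProps.entrySet()) {
      if (!e.getKey().startsWith(PREFIX)) {
        continue; // not a custom tag
      }
      custom++;
      int size = e.getValue().getBytes(StandardCharsets.UTF_8).length;
      if (size > MAX_VALUE_BYTES) {
        out.add(e.getKey() + " is " + size + " bytes");
      }
    }
    if (custom > MAX_CUSTOM_PROPERTIES) {
      out.add(custom + " custom properties exceed the budget of " + MAX_CUSTOM_PROPERTIES);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("table.custom.description", "Daily web logs"); // hypothetical tag
    props.put("table.split.threshold", "1G");               // ordinary table property, ignored
    System.out.println(warnings(props)); // prints []
  }
}
```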

16.6. Initialization

Accumulo must be initialized to create the structures it uses internally to locate
data across the cluster. HDFS is required to be configured and running before
Accumulo can be initialized.

Once HDFS is started, initialization can be performed by executing
$ACCUMULO_HOME/bin/accumulo init. This script will prompt for a name
for this instance of Accumulo. The instance name is used to identify a set of tables
and instance-specific settings. The script will then write some information into
HDFS so Accumulo can start properly.

The initialization script will prompt you to set a root password. Once Accumulo is
initialized it can be started.

16.7. Running

16.7.1. Starting Accumulo

Make sure Hadoop is configured on all of the machines in the cluster, including
access to a shared HDFS instance. Make sure HDFS is running, and that ZooKeeper is
configured and running on at least one machine in the cluster.
Start Accumulo using the bin/start-all.sh script.

To verify that Accumulo is running, check the Status page as described in
Monitoring. In addition, the Shell can provide some information about the status of
tables via reading the metadata tables.

16.7.2. Stopping Accumulo

To shut down cleanly, run bin/stop-all.sh and the master will orchestrate the
shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may
take some time for particular configurations.

16.7.3. Adding a Node

Make sure the host in question has the new configuration, or else the tablet
server won’t start; at a minimum this needs to be on the host(s) being added,
but in practice it’s good to ensure consistent configuration across all nodes.

16.7.4. Decommissioning a Node

If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet
server. Accumulo will automatically rebalance the tablets across the available tablet servers.

$ACCUMULO_HOME/bin/accumulo admin stop <host(s)> {<host> ...}

Alternatively, you can ssh to each of the hosts you want to remove and run:

$ACCUMULO_HOME/bin/stop-here.sh

Be sure to update your $ACCUMULO_HOME/conf/slaves (or $ACCUMULO_CONF_DIR/slaves) file to
account for the removal of these hosts. Bear in mind that the monitor will not re-read the
slaves file automatically, so it will report the decommissioned servers as down; it’s
recommended that you restart the monitor so that the node list is up to date.

16.7.5. Restarting processes on a node

Occasionally, it might be necessary to restart the processes on a specific node. In addition
to the start-all.sh and stop-all.sh scripts, Accumulo contains scripts to start/stop all processes
on a node and start/stop a given process on a node.

start-here.sh and stop-here.sh will start/stop all Accumulo processes on the current node. The
necessary processes to start/stop are determined via the "hosts" files (e.g. slaves, masters, etc).
These scripts expect no arguments.

start-server.sh can also be useful in starting a given process on a host.
The first argument to the script is the hostname of the machine. Use the same host name that
you specified in the hosts file (if you specified the FQDN in the masters file, use the FQDN, not
the short name). The second argument is the name of the process to start (e.g. master, tserver).

The steps described to decommission a node can also be used (without removing the host
from the $ACCUMULO_HOME/conf/slaves file) to gracefully stop a node. This will
ensure that the TabletServer is cleanly stopped and that recovery will not need to be performed
when the tablets are re-hosted.

16.7.6. Running multiple TabletServers on a single node

With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
node. This decision should be made carefully and with much deliberation, as Accumulo is designed
to scale to using tens of GB of RAM and tens of CPU cores.

To run multiple TabletServers on a single host, it is necessary to create multiple Accumulo configuration
directories. Ensuring that these properties are appropriately set (and remain consistent) is left as an
exercise for the user.

Normally, setting a value of 0 for these configuration properties is sufficient. In some
environments, the ports used by Accumulo must be well-known for security reasons, which requires a
separate copy of the configuration files to use a static port for each TabletServer instance.
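A port value of 0 generally means that the operating system assigns a free ephemeral port when the server binds its socket, which is why several TabletServers on one host need not collide. The sketch below is plain Java (java.net.ServerSocket), not Accumulo code, and only demonstrates that operating-system behavior; the class name EphemeralPortDemo is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical demo class: shows what binding to port 0 does at the OS level.
class EphemeralPortDemo {
    // Binds two listeners to port 0 at the same time and returns the ports
    // the operating system actually assigned to them.
    static int[] bindTwoEphemeralPorts() {
        // try-with-resources closes both sockets after their ports are read.
        try (ServerSocket first = new ServerSocket(0);
             ServerSocket second = new ServerSocket(0)) {
            return new int[] {first.getLocalPort(), second.getLocalPort()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = bindTwoEphemeralPorts();
        System.out.println("first listener got port " + ports[0]);
        System.out.println("second listener got port " + ports[1]);
    }
}
```

The trade-off is visible here: each bind receives a different, unpredictable port, which is exactly what environments that require well-known ports cannot accept.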

It is also necessary to update the following exported variables in accumulo-env.sh.

ACCUMULO_LOG_DIR

The values for these variables are left up to the user to define; there are no constraints
other than ensuring that the directory exists and that the user running Accumulo has permission
to read from and write to that directory.

Accumulo’s provided scripts for stopping a cluster operate under the assumption that one process
is running per host. As such, starting and stopping multiple TabletServers on one host requires
more effort on the part of the user. It is important to ensure that ACCUMULO_CONF_DIR is correctly
set for the instance of the TabletServer being started.

To stop TabletServers, the normal stop-all.sh will stop all instances of TabletServers across all nodes.
Using the kill command provided by your operating system is one option for stopping a single instance on
a single node. stop-server.sh can be used to stop all TabletServers on a single node.

16.8. Monitoring

16.8.1. Accumulo Monitor

The Accumulo Monitor provides a web UI for monitoring the status and health of
Accumulo components, available at
http://monitorhost:50095/.

Things highlighted in yellow may be in need of attention.
If anything is highlighted in red on the monitor page, it is something that definitely needs attention.

The Overview page contains some summary information about the Accumulo instance, including the version, instance name, and instance ID.
There is a table labeled Accumulo Master with current status, a table listing the active Zookeeper servers, and graphs displaying various metrics over time.
These include ingest and scan performance and other useful measurements.

The Master Server, Tablet Servers, and Tables pages display metrics grouped in different ways (e.g. by tablet server or by table).
Metrics typically include number of entries (key/value pairs), ingest and query rates.
The number of running scans, major and minor compactions are in the form number_running (number_queued).
Another important metric is hold time, which is the amount of time a tablet has been waiting but unable to flush its memory in a minor compaction.

The Server Activity page graphically displays tablet server status, with each server represented as a circle or square.
Different metrics may be assigned to the nodes' color and speed of oscillation.
The Overall Avg metric is only used on the Server Activity page, and represents the average of all the other metrics (after normalization).
Similarly, the Overall Max metric picks the metric with the maximum normalized value.

The Garbage Collector page displays a list of garbage collection cycles, the number of files found of each type (including deletion candidates in use and files actually deleted), and the length of the deletion cycle.
The Traces page displays data for recent traces performed (see the following section for information on Tracing).
The Recent Logs page displays warning and error logs forwarded to the monitor from all Accumulo processes.
Also, the XML and JSON links provide metrics in XML and JSON formats, respectively.

16.8.2. SSL

SSL may be enabled for the monitor page by setting the following properties in the accumulo-site.xml file:

If the Accumulo conf directory has been configured (in particular the accumulo-env.sh file must be set up), the generate_monitor_certificate.sh script in the Accumulo bin directory can be used to create the keystore and truststore files with random passwords.
The script will print out the properties that need to be added to the accumulo-site.xml file.
The stores can also be generated manually with the Java keytool command, whose usage can be seen in the generate_monitor_certificate.sh script.

If desired, the SSL ciphers allowed for connections can be controlled via the following properties in accumulo-site.xml:

monitor.ssl.include.ciphers
monitor.ssl.exclude.ciphers

If SSL is enabled, the monitor URL can only be accessed via https.
This also allows you to access the Accumulo shell through the monitor page.
The left navigation bar will have a new link to Shell.
An Accumulo user name and password must be entered for access to the shell.

16.9. Metrics

Accumulo is capable of using the Hadoop Metrics2 library and is configured by default to use it. Metrics2 is a library
which allows for routing of metrics generated by registered MetricsSources to configured MetricsSinks. Examples of sinks
that are implemented by Hadoop include file-based logging, Graphite and Ganglia. All metric sources are exposed via JMX
when using Metrics2.

Prior to Accumulo 1.7.0, JMX endpoints could be exposed in addition to file-based logging of those metrics configured via
the accumulo-metrics.xml file. This mechanism can still be used by setting general.legacy.metrics to true in accumulo-site.xml.

16.9.1. Metrics2 Configuration

Metrics2 is configured by examining the classpath for a file that matches hadoop-metrics2*.properties. The example configuration
files that Accumulo provides for use include hadoop-metrics2-accumulo.properties as a template which can be used to enable
file, Graphite or Ganglia sinks (some minimal configuration required for Graphite and Ganglia). Because the Hadoop configuration is
also on the Accumulo classpath, be sure that you do not have multiple Metrics2 configuration files. It is recommended to consolidate
metrics in a single properties file in a central location to remove ambiguity. The contents of hadoop-metrics2-accumulo.properties
can be added to a central hadoop-metrics2.properties in $HADOOP_CONF_DIR.
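Because any hadoop-metrics2*.properties file on the classpath may be picked up, it can help to check the configuration directories for duplicates before starting the processes. The following sketch is a hypothetical helper (Metrics2ConfigCheck is not part of Accumulo or Hadoop) that applies java.nio glob matching to directories you pass in, for example $HADOOP_CONF_DIR and the Accumulo conf directory; it does not look inside jars on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Accumulo or Hadoop.
class Metrics2ConfigCheck {
    // Returns every file named like hadoop-metrics2*.properties found directly
    // inside the given directories (subdirectories and jars are not searched).
    static List<Path> findMetrics2Configs(List<Path> dirs) {
        List<Path> found = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // skip missing directories and non-directory entries
            }
            try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(dir, "hadoop-metrics2*.properties")) {
                for (Path p : stream) {
                    found.add(p);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Path> dirs = new ArrayList<>();
        for (String arg : args) {
            dirs.add(Paths.get(arg));
        }
        List<Path> configs = findMetrics2Configs(dirs);
        if (configs.size() > 1) {
            System.out.println("WARNING: more than one Metrics2 configuration file: " + configs);
        } else {
            System.out.println("Metrics2 configuration files found: " + configs);
        }
    }
}
```

Passing both configuration directories as arguments prints a warning when more than one matching file is found, which is the ambiguity the consolidation advice above is meant to avoid.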

As a note for configuring the file sink, the provided path should be absolute. A relative path or file name will be created relative
to the directory in which the Accumulo process was started. External tools, such as logrotate, can be used to prevent these files
from growing without bound.
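Assuming the file sink opens its path with the standard Java file APIs, a relative path is resolved against the working directory of the JVM (the user.dir system property), which is the directory the process was started from. The sketch below shows only that JVM behavior; SinkPathResolution and the example paths are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper showing how the JVM resolves a configured file path.
class SinkPathResolution {
    // Returns the absolute path a configured file name would end up at.
    // Relative paths are resolved against the JVM's working directory.
    static Path effectivePath(String configured) {
        return Paths.get(configured).toAbsolutePath();
    }

    public static void main(String[] args) {
        // An absolute path is used as-is.
        System.out.println(effectivePath("/var/log/accumulo/tserver.metrics"));
        // A bare file name lands in whatever directory the process was started from.
        System.out.println(effectivePath("tserver.metrics"));
        System.out.println("working directory: " + System.getProperty("user.dir"));
    }
}
```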

Each server process should have log messages from the Metrics2 library about the sinks that were created. Be sure to check
the Accumulo processes’ log files when debugging missing metrics output.

16.10. Tracing

It can be difficult to determine why some operations are taking longer
than expected. For example, you may be looking up items with very low
latency, but sometimes the lookups take much longer. Determining the
cause of the delay is difficult because the system is distributed, and
the typical lookup is fast.

Accumulo has been instrumented to record the time that various
operations take when tracing is turned on. When tracing is enabled, it
follows all the requests made on behalf of the user throughout
the distributed infrastructure of Accumulo, and across all threads of
execution.

These time spans will be inserted into the trace table in
Accumulo. You can browse recent traces from the Accumulo monitor
page. You can also read the trace table directly like any
other table.

The design of Accumulo’s distributed tracing follows that of
Google’s Dapper.

16.10.1. Tracers

To collect traces, Accumulo needs at least one server listed in
$ACCUMULO_HOME/conf/tracers. The server collects traces
from clients and writes them to the trace table. The Accumulo
user that the tracer connects to Accumulo with can be configured with
the following properties
(see the Configuration section for setting Accumulo server properties)

The ZooKeeper path (trace.zookeeper.path) is /tracers by default. If
multiple Accumulo instances are sharing the same ZooKeeper
quorum, take care to configure Accumulo with unique values for
this property.

16.10.2. Configuring Tracing

Traces are collected via SpanReceivers. The default SpanReceiver
configured is org.apache.accumulo.core.trace.ZooTraceClient, which
sends spans to an Accumulo Tracer process, as discussed in the
previous section. This default can be changed to a different span
receiver, or additional span receivers can be added in a
comma-separated list, by modifying the property

trace.span.receivers

Individual span receivers may require their own configuration
parameters, which are grouped under the trace.span.receiver.*
prefix. ZooTraceClient uses the following properties. The first
three properties are populated from other Accumulo properties,
while the remaining ones should be prefixed with
trace.span.receiver. when set in the Accumulo configuration.

Note that to configure an Accumulo client for tracing, including
the Accumulo shell, the client configuration must be given the same
trace.span.receivers, trace.span.receiver.*, and trace.zookeeper.path
properties as the servers have.

Hadoop can also be configured to send traces to Accumulo, as of
Hadoop 2.6.0, by setting properties in Hadoop’s core-site.xml
file. Instead of using the trace.span.receiver.* prefix, Hadoop
uses hadoop.htrace.*. The Hadoop configuration does not have
access to Accumulo’s properties, so the
hadoop.htrace.tracer.zookeeper.host property must be specified.
The zookeeper timeout defaults to 30000 (30 seconds), and the
zookeeper path defaults to /tracers. An example of configuring
Hadoop to send traces to ZooTraceClient is

The accumulo-core, accumulo-tracer, accumulo-fate and libthrift
jars must also be placed on Hadoop’s classpath.

Adding additional SpanReceivers

Zipkin
has a SpanReceiver supported by HTrace and popularized by Twitter
that users looking for a more graphical trace display may opt to use.
The following steps configure Accumulo to use org.apache.htrace.impl.ZipkinSpanReceiver
in addition to the Accumulo’s default ZooTraceClient, and they serve as a template
for adding any SpanReceiver to Accumulo:

Add the Jar containing the ZipkinSpanReceiver class file to
$ACCUMULO_HOME/lib/. It is critical that the Jar is placed in
lib/ and NOT in lib/ext/ so that the new SpanReceiver class
is visible to the same class loader as htrace-core.

Your SpanReceiver may require additional properties, and if so these should likewise
be placed in the ClientConfiguration (if applicable) and Accumulo’s accumulo-site.xml.
Two such properties for ZipkinSpanReceiver, listed with their default values, are

The sampler (such as Sampler.ALWAYS) for the trace should only be specified with a top-level span,
and subsequent spans will be collected depending on whether that first span was sampled.
Don’t forget to specify a Sampler at the top-level span
because the default Sampler only samples when part of a pre-existing trace,
which will never occur in a client that never specifies a Sampler.
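The sampling rules above can be illustrated with a small toy model. ToySpan below is hypothetical and is not the HTrace API: a top-level span asks a sampler once, every child span inherits that decision, and a top-level span created without a sampler is never sampled because there is no pre-existing trace for it to join.

```java
import java.util.function.BooleanSupplier;

// Hypothetical toy model of trace sampling; these are not HTrace classes.
class ToySpan {
    final boolean sampled;

    private ToySpan(boolean sampled) {
        this.sampled = sampled;
    }

    // A top-level span consults its sampler exactly once for the whole trace.
    static ToySpan topLevel(BooleanSupplier sampler) {
        return new ToySpan(sampler.getAsBoolean());
    }

    // With no sampler, a span is sampled only if it joins an existing trace;
    // a top-level span has no trace to join, so it is never sampled.
    static ToySpan topLevelWithoutSampler() {
        return new ToySpan(false);
    }

    // Child spans never consult a sampler; they inherit the parent's decision.
    ToySpan child() {
        return new ToySpan(sampled);
    }

    public static void main(String[] args) {
        ToySpan always = topLevel(() -> true);
        System.out.println("explicit sampler, grandchild sampled: " + always.child().child().sampled);
        System.out.println("no sampler, child sampled: " + topLevelWithoutSampler().child().sampled);
    }
}
```

In this model, a client that never specifies a sampler records nothing, because the "no pre-existing trace" case applies to every top-level span it creates.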

Like Dapper, Accumulo tracing supports user-defined annotations to associate additional data with a Trace.
Checking whether tracing is currently active is necessary when using a sampler other than Sampler.ALWAYS.

It is also possible to add timeline annotations to your spans.
This associates a string with a given timestamp between the start and stop times for a span.

...
writeScope.getSpan().addTimelineAnnotation("Initiating Flush");

Some client operations may have a high volume within your
application. As such, you may wish to only sample a percentage of
operations for tracing. The CountSampler can be used to
help enable tracing for 1-in-1000 operations.
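The 1-in-1000 idea can be sketched with a plain counter. EveryNthSampler below is a hypothetical, deterministic stand-in written for illustration; it is not HTrace's CountSampler, whose exact selection logic is not described here and may be randomized. As with any sampler, the decision would be made once, for the top-level span.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical deterministic 1-in-N sampler, written only to illustrate the
// idea; it is not HTrace's CountSampler.
class EveryNthSampler {
    private final long frequency;
    private final AtomicLong calls = new AtomicLong();

    EveryNthSampler(long frequency) {
        if (frequency < 1) {
            throw new IllegalArgumentException("frequency must be at least 1");
        }
        this.frequency = frequency;
    }

    // Returns true for the first call and then once every `frequency` calls.
    // AtomicLong keeps the count correct when called from several threads.
    boolean next() {
        return calls.getAndIncrement() % frequency == 0;
    }

    public static void main(String[] args) {
        EveryNthSampler sampler = new EveryNthSampler(1000);
        int traced = 0;
        for (int op = 0; op < 5000; op++) {
            if (sampler.next()) {
                traced++; // only these operations would start a traced top-level span
            }
        }
        System.out.println(traced + " of 5000 operations traced");
    }
}
```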

16.10.4. Viewing Collected Traces

To view collected traces, use the "Recent Traces" link on the Monitor
UI. You can also programmatically access and print traces using the
TraceDump class.

Trace Table Format

This section is for developers looking to use data recorded in the trace table
directly, above and beyond the default services of the Accumulo monitor.
Please note the trace table format and its supporting classes
are not in the public API and may be subject to change in future versions.

Each span received by a tracer’s ZooTraceClient is recorded in the trace table
in the form of three entries: span entries, index entries, and start time entries.
Span and start time entries record full span information,
whereas index entries provide indexing into span information
useful for quickly finding spans by type or start time.

Each entry is illustrated by a description and sample of data.
In the description, a token in quotes is a String literal,
whereas other tokens are span variables.
Parentheses group parts together, to distinguish colon characters inside the
column family or qualifier from the colon that separates column family and qualifier.
We use the format row columnFamily:columnQualifier columnVisibility value
(omitting timestamp which records the time an entry is written to the trace table).

The parentSpanId is "" for the root span of a trace.
The spanBinaryEncoding is a compact Apache Thrift encoding of the original Span object.
This allows clients (and the Accumulo monitor) to recover all the details of the original Span
at a later time, by scanning the trace table and decoding the value of span entries
via TraceFormatter.getRemoteSpan(entry).

The trace table has a formatter class by default (org.apache.accumulo.tracer.TraceFormatter)
that changes how span entries appear from the Accumulo shell.
Normal scans to the trace table do not use this formatter representation;
it exists only to make span entries easier to view inside the Accumulo shell.

The service and sender are set by the first call of each Accumulo process
(and instrumented client processes) to DistributedTrace.enable(…)
(the sender is autodetected if not specified).
The description is specified in each span.
The start time and the elapsed time (stop - start)
are recorded in milliseconds as long values serialized to a string in hex.
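A small sketch of that encoding follows, assuming output in the style of Java's Long.toHexString (lowercase, no zero padding), which this section does not specify. TraceTimeHex is a hypothetical helper, not TraceFormatter's code.

```java
// Hypothetical helper illustrating the hex encoding of span times; the exact
// formatting used in the trace table (case, padding) is an assumption here.
class TraceTimeHex {
    // Serializes a millisecond value as a hex string, e.g. 255 -> "ff".
    static String encode(long millis) {
        return Long.toHexString(millis);
    }

    // Parses the hex string back into the original long value.
    static long decode(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis(); // span start, ms since the epoch
        long stop = start + 1;                    // a span lasting 1 millisecond
        System.out.println("start   = " + encode(start));
        System.out.println("elapsed = " + encode(stop - start)); // elapsed is stop - start
        System.out.println("decoded start matches: " + (decode(encode(start)) == start));
    }
}
```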

The following classes may be run from $ACCUMULO_HOME while Accumulo is running
to provide insight into trace statistics. These require
accumulo-trace-VERSION.jar to be provided on the Accumulo classpath
($ACCUMULO_HOME/lib/ext is fine).

When an Accumulo process dies, the watcher will look at the logs and exit codes
to determine how the process failed and either restart or fail depending on the
recent history of failures. The restarting policy for various failure conditions
is configurable through the *_TIMESPAN and *_RETRIES variables in accumulo-env.sh.
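The interaction of a *_TIMESPAN and *_RETRIES pair can be sketched as a sliding window, under the assumption (not confirmed by this section) that the watcher gives up once more than RETRIES failures of one kind occur within TIMESPAN seconds. RestartPolicy and the values in its main method are hypothetical; this is not the watcher's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window restart policy for one failure type, illustrating
// a *_TIMESPAN / *_RETRIES pair; the watcher's exact rules may differ.
class RestartPolicy {
    private final long timespanSeconds;
    private final int retries;
    private final Deque<Long> recentFailures = new ArrayDeque<>();

    RestartPolicy(long timespanSeconds, int retries) {
        this.timespanSeconds = timespanSeconds;
        this.retries = retries;
    }

    // Records a failure at time nowSeconds and returns true if the process
    // should be restarted, false if the watcher should give up.
    boolean onFailure(long nowSeconds) {
        // Drop failures that are older than the window.
        while (!recentFailures.isEmpty()
                && nowSeconds - recentFailures.peekFirst() >= timespanSeconds) {
            recentFailures.pollFirst();
        }
        recentFailures.addLast(nowSeconds);
        // Keep restarting while failures inside the window stay within the budget.
        return recentFailures.size() <= retries;
    }

    public static void main(String[] args) {
        RestartPolicy policy = new RestartPolicy(600, 2); // hypothetical values
        long[] failureTimes = {0, 10, 20, 700};
        for (long t : failureTimes) {
            System.out.println("failure at " + t + "s -> restart: " + policy.onFailure(t));
        }
    }
}
```

A failure burst exhausts the retry budget, while a later isolated failure (after the window has passed) is restarted again; this is the behavior that lets the watcher absorb occasional false positives without looping forever on a persistent fault.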

16.13. Recovery

In the event of TabletServer failure or an error while shutting Accumulo down, some
mutations may not have been minor compacted to HDFS properly. In this case,
Accumulo will automatically reapply such mutations from the write-ahead log
either when the tablets from the failed server are reassigned by the Master (in the
case of a single TabletServer failure) or the next time Accumulo starts (in the event of
failure during shutdown).

Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing
updates. The sort status of each file is displayed on the
Accumulo monitor status page. Once the recovery is complete, any
tablets involved should return to an “online” state. Until then, those tablets will be
unavailable to clients.
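The reason for sorting can be illustrated with a simplified model. In the sketch below, LogEntry and RecoverySortSketch are hypothetical; real write-ahead log entries and their sort order are not described in this section. Once entries are sorted by tablet and then by write sequence, each tablet's missing updates form one contiguous run that can be replayed in the order it was written.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified log entry: owning tablet, write sequence number,
// and an opaque description of the update.
final class LogEntry {
    final String tablet;
    final long sequence;
    final String update;

    LogEntry(String tablet, long sequence, String update) {
        this.tablet = tablet;
        this.sequence = sequence;
        this.update = update;
    }
}

class RecoverySortSketch {
    // Sorts by tablet, then by sequence, so each tablet's updates become one
    // contiguous run in the order they were originally written.
    static List<LogEntry> sortForRecovery(List<LogEntry> log) {
        List<LogEntry> sorted = new ArrayList<>(log);
        sorted.sort(Comparator.comparing((LogEntry e) -> e.tablet)
                .thenComparingLong(e -> e.sequence));
        return sorted;
    }

    // Collects one tablet's updates from a sorted log, stopping as soon as the
    // scan moves past that tablet's run.
    static List<String> updatesFor(List<LogEntry> sorted, String tablet) {
        List<String> updates = new ArrayList<>();
        for (LogEntry e : sorted) {
            int cmp = e.tablet.compareTo(tablet);
            if (cmp < 0) {
                continue; // entries for tablets that sort earlier
            }
            if (cmp > 0) {
                break; // past this tablet's contiguous run
            }
            updates.add(e.update);
        }
        return updates;
    }
}
```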

The Accumulo client library is configured to retry failed mutations and in many
cases clients will be able to continue processing after the recovery process without
throwing an exception.

16.14. Migrating Accumulo from non-HA Namenode to HA Namenode

The following steps will allow a non-HA instance to be migrated to an HA instance. Consider an HDFS URL
hdfs://namenode.example.com:8020 which is going to be moved to hdfs://nameservice1.

Before moving HDFS over to the HA namenode, use $ACCUMULO_HOME/bin/accumulo admin volumes to confirm
that the only volume displayed is the volume from the current namenode’s HDFS URL.

After verifying the current volume is correct, shut down the cluster and transition HDFS to the HA nameservice.

Edit $ACCUMULO_HOME/conf/accumulo-site.xml to notify Accumulo that a volume is being replaced. First,
add the new nameservice volume to the instance.volumes property. Next, add the
instance.volumes.replacements property in the form old new. It is important not to include
the volume that is being replaced in instance.volumes; otherwise, Accumulo could continue
to write to the volume.
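Before restarting, the two properties can be sanity-checked against each other. The helper below is a hypothetical sketch, not an Accumulo tool: it assumes instance.volumes is a comma-separated list and that instance.volumes.replacements holds comma-separated old new pairs separated by whitespace, compares values as exact strings, and flags a replaced volume that is still listed in instance.volumes or a new volume that is missing from it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-restart check for a volume migration; not an Accumulo tool.
class VolumeMigrationCheck {
    // instance.volumes: a comma-separated list of volume URIs.
    static Set<String> parseVolumes(String instanceVolumes) {
        Set<String> volumes = new LinkedHashSet<>();
        for (String v : instanceVolumes.split(",")) {
            if (!v.trim().isEmpty()) {
                volumes.add(v.trim());
            }
        }
        return volumes;
    }

    // Assumes instance.volumes.replacements holds comma-separated "old new"
    // pairs. Returns the problems found; an empty list means no problems.
    static List<String> check(String instanceVolumes, String replacements) {
        Set<String> volumes = parseVolumes(instanceVolumes);
        List<String> problems = new ArrayList<>();
        for (String pair : replacements.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                problems.add("malformed replacement pair: " + trimmed);
                continue;
            }
            if (volumes.contains(parts[0])) {
                problems.add("replaced volume is still in instance.volumes: " + parts[0]);
            }
            if (!volumes.contains(parts[1])) {
                problems.add("new volume is missing from instance.volumes: " + parts[1]);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("hdfs://nameservice1",
                "hdfs://namenode.example.com:8020 hdfs://nameservice1"));
    }
}
```

With the example URLs from this section, the main method prints an empty list; adding hdfs://namenode.example.com:8020 back to instance.volumes would produce one reported problem.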

Some erroneous GarbageCollector messages may still be seen for a small period while data is transitioning to
the new volumes. This is expected and can usually be ignored.

16.15. Achieving Stability in a VM Environment

For testing, demonstration, and even operational uses, Accumulo is often
installed and run in a virtual machine (VM) environment. The majority of
long-term operational uses of Accumulo are on bare-metal clusters. However, the
core design of Accumulo and its dependencies do not preclude running stably for
long periods within a VM. Many of Accumulo’s operational robustness features to
handle failures like periodic network partitioning in a large cluster carry
over well to VM environments. This guide covers general recommendations for
maximizing stability in a VM environment, including some failure
modes that are more common when running in VMs.

16.15.1. Known failure modes: Setup and Troubleshooting

In addition to the general failure modes of running Accumulo, VMs can introduce a
couple of environmental challenges that can affect process stability. Clock
drift is something that is more common in VMs, especially when VMs are
suspended and resumed. Clock drift can cause Accumulo servers to assume that
they have lost connectivity to the other Accumulo processes and/or lose their
locks in Zookeeper. VM environments also frequently have constrained resources,
such as CPU, RAM, network, and disk throughput and capacity. Accumulo generally
deals well with constrained resources from a stability perspective (optimizing
performance will require additional tuning, which is not covered in this
section); however, there are some limits.

Physical Memory

One of those limits has to do with the Linux out-of-memory killer. A common
failure mode in VM environments (and in some bare metal installations) is when
the Linux out-of-memory killer decides to kill processes in order to avoid a
kernel panic when provisioning a memory page. This often happens in VMs due to
the large number of processes that must run in a small memory footprint. In
addition to the Linux core processes, a single-node Accumulo setup requires a
Hadoop Namenode, a Hadoop Secondary Namenode, a Hadoop Datanode, a Zookeeper
server, an Accumulo Master, an Accumulo GC and an Accumulo TabletServer.
Typical setups also include an Accumulo Monitor, an Accumulo Tracer, a Hadoop
ResourceManager, a Hadoop NodeManager, provisioning software, and client
applications. Between all of these processes, it is not uncommon to
over-subscribe the available RAM in a VM. We recommend setting up VMs without
swap enabled, so that rather than performance grinding to a halt when physical
memory is exhausted, the kernel will select processes to kill, in a way that can
appear random, in order to free up memory.

Calculating the maximum possible memory usage is essential in creating a stable
Accumulo VM setup. Safely engineering memory allocation for stability is then a
matter of bringing the calculated maximum memory usage under the physical
memory by a healthy margin. The margin is to account for operating system-level
operations, such as managing processes, maintaining virtual memory pages, and
file system caching. When the Linux out-of-memory killer finds your process, you
will probably only see evidence of that in /var/log/messages. Out-of-memory
process kills do not show up in Accumulo or Hadoop logs.

To calculate the max memory usage of all Java virtual machine (JVM) processes,
add the maximum heap size (often limited by a -Xmx… argument, such as in
accumulo-env.sh) and the off-heap memory usage. Off-heap memory usage
includes the following:

"Permanent Space", where the JVM stores Classes, Methods, and other code elements. This can be limited by a JVM flag such as -XX:MaxPermSize=100m, and is typically tens of megabytes.

Code generation space, where the JVM stores just-in-time compiled code. This is typically small enough to ignore.

Socket buffers, where the JVM stores send and receive buffers for each socket.

Thread stacks, where the JVM allocates memory to manage each thread.

Direct memory space and JNI code, where applications can allocate memory outside of the JVM-managed space. For Accumulo, this includes the native in-memory maps that are allocated with the tserver.memory.maps.max parameter in accumulo-site.xml.

Garbage collection space, where the JVM stores information used for garbage collection.

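You can assume that each Hadoop and Accumulo process will use ~100-150MB of
off-heap memory, plus the in-memory map of the Accumulo TServer process. A
simple estimate of physical memory requirements is therefore the sum of the maximum
heap and off-heap usage of every process, plus the TServer in-memory map and a safety margin.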
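That estimate can be written as a small calculation. In the sketch below, MemoryBudget is a hypothetical helper and the heap sizes are illustrative values, not recommendations; it uses 150MB per process, the upper end of the off-heap estimate above, and leaves the safety margin for operating-system needs to be added on top of the result.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: sum of heap limits, plus per-process off-heap overhead,
// plus the TabletServer's native in-memory map. Add a safety margin on top.
class MemoryBudget {
    static long requiredMb(Map<String, Long> maxHeapMbByProcess,
                           long offHeapMbPerProcess,
                           long tserverNativeMapMb) {
        long total = tserverNativeMapMb;
        for (long heapMb : maxHeapMbByProcess.values()) {
            total += heapMb + offHeapMbPerProcess;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative -Xmx values (MB) for a single-node setup.
        Map<String, Long> heaps = new LinkedHashMap<>();
        heaps.put("NameNode", 1024L);
        heaps.put("SecondaryNameNode", 512L);
        heaps.put("DataNode", 512L);
        heaps.put("ZooKeeper", 256L);
        heaps.put("Master", 512L);
        heaps.put("GC", 256L);
        heaps.put("TabletServer", 1024L);
        long needed = requiredMb(heaps, 150, 512);
        System.out.println("Estimated need: " + needed + " MB before the safety margin");
    }
}
```

With these illustrative numbers the estimate comes to 5658 MB (4096 MB of heap, 1050 MB of off-heap overhead, 512 MB of native map), before any margin and before adding optional processes such as the Monitor, Tracer, or YARN daemons.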

These calculations can add up quickly with the large number of processes,
especially in constrained VM environments. To reduce the physical memory
requirements, it is a good idea to reduce maximum heap limits and turn off
unnecessary processes. If you’re not using YARN in your application, you can
turn off the ResourceManager and NodeManager. If you’re not expecting to
re-provision the cluster frequently, you can turn off or reduce provisioning
processes such as Salt Stack minions and masters.

Disk Space

Disk space is primarily used for two operations: storing data and storing logs.
While Accumulo generally stores all of its key/value data in HDFS, Accumulo,
Hadoop, and Zookeeper all store a significant amount of logs in a directory on
a local file system. Care should be taken to make sure that (a) limitations to
the amount of logs generated are in place, and (b) enough space is available to
host the generated logs on the partitions that they are assigned. When space is
not available to log, processes will hang. This can cause interruptions in
availability of Accumulo, as well as cascade into failures of various
processes.

Hadoop, Accumulo, and Zookeeper use log4j as a logging mechanism, and each of
them has a way of limiting the logs and directing them to a particular
directory. Logs are generated independently for each process, so when
considering the total space you need to add up the maximum logs generated by
each process. Typically, a rolling log setup in which each process can generate
something like ten 100MB files is instituted, resulting in a maximum file system
usage of 1GB per process. Default setups for Hadoop and Zookeeper are often
unbounded, so it is important to set these limits in the logging configuration
files for each subsystem. Consult the user manual for each system for
instructions on how to limit generated logs.
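Because logs are bounded per process, the worst case for a partition is the per-process maximum multiplied by the number of processes logging to it. A minimal sketch of that arithmetic, with a hypothetical LogBudget helper and a hypothetical count of nine processes:

```java
// Hypothetical helper: worst-case local log usage for rolling logs, assuming
// every process eventually fills all of its rolled files.
class LogBudget {
    static long maxLogMb(int processes, int filesPerProcess, long fileSizeMb) {
        return (long) processes * filesPerProcess * fileSizeMb;
    }

    public static void main(String[] args) {
        // Ten 100MB files per process, as described above.
        System.out.println(maxLogMb(1, 10, 100) + " MB per process");
        // Nine processes sharing one log partition (a hypothetical count).
        System.out.println(maxLogMb(9, 10, 100) + " MB for 9 processes");
    }
}
```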

Zookeeper Interaction

Accumulo is designed to scale up to thousands of nodes. At that scale,
intermittent interruptions in network service and other rare failures of
compute nodes become more common. To limit the impact of node failures on
overall service availability, Accumulo uses a heartbeat monitoring system that
leverages Zookeeper’s ephemeral locks. There are several conditions that can
occur that cause Accumulo processes to lose their Zookeeper locks, some of which
are true interruptions to availability and some of which are false positives.
Several of these conditions become more common in VM environments, where they
can be exacerbated by resource constraints and clock drift.

Accumulo includes a mechanism, known as the Watcher, to limit the impact of the
false positives. The watcher monitors Accumulo processes and will restart
them when they fail for certain reasons. The watcher can be configured within
the accumulo-env.sh file inside of Accumulo’s configuration directory. We
recommend using the watcher to monitor Accumulo processes, as it will restore
the system to full capacity without administrator interaction after many of the
common failure modes.

16.15.2. Tested Versions

Another large consideration for Accumulo stability is to use versions of
software that have been tested together in a VM environment. Any cluster of
processes that have not been tested together is likely to expose running
conditions that vary from the environments individually tested in the various
components. For example, Accumulo’s use of HDFS includes many short block
reads, which differs from the more common full file read used in most
map/reduce applications. We have found that certain versions of Accumulo and
Hadoop include bugs that greatly affect overall stability. In
our testing, Accumulo 1.6.2, Hadoop 2.6.0, and Zookeeper 3.4.6 resulted in
stable VM clusters that did not fail during a month of testing, while Accumulo 1.6.1,
Hadoop 2.5.1, and Zookeeper 3.4.5 had a mean time between failures of less than
a week under heavy ingest and query load. We expect that results will vary with
other configurations, and you should choose your software versions with that in
mind.

17. Multi-Volume Installations

This is an advanced configuration setting for very large clusters
under a lot of write pressure.

The HDFS NameNode holds all of the metadata about the files in
HDFS. For fast performance, all of this information needs to be stored
in memory. A single NameNode with 64GB of memory can store the
metadata for tens of millions of files. However, when scaling beyond a
thousand nodes, an active Accumulo system can generate lots of updates
to the file system, especially when data is being ingested. The large
number of write transactions to the NameNode, and the speed of a
single edit log, can become the limiting factor for large scale
Accumulo installations.

You can see the effect of slow write transactions when the Accumulo
Garbage Collector takes a long time (more than 5 minutes) to delete
the files Accumulo no longer needs. If your Garbage Collector
routinely runs in less than a minute, the NameNode is performing well.

However, if you do begin to experience slow-down and poor GC
performance, Accumulo can be configured to use multiple NameNode
servers. The configuration “instance.volumes” should be set to a
comma-separated list, using full URI references to different NameNode
servers:

The introduction of multiple volume support in 1.6 changed the way Accumulo
stores pointers to files. It now stores fully qualified URI references to
files. Before 1.6, Accumulo stored paths that were relative to a table
directory. After an upgrade these relative paths will still exist and are
resolved using instance.dfs.dir, instance.dfs.uri, and Hadoop configuration in
the same way they were before 1.6.

If the URI for a namenode changes (e.g. the namenode was running on host1 and is
moved to host2), then Accumulo will no longer function. Even if Hadoop and
Accumulo configurations are changed, the fully qualified URIs stored in
Accumulo will still contain the old URI. To handle this, Accumulo has the
instance.volumes.replacements configuration property for replacing URIs stored in its metadata.
For example, it can be configured to replace ns1 with nsA and ns2 with nsB in
Accumulo metadata. For this property to take effect, Accumulo will need to be
restarted.
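The effect of such a replacement on stored file references can be sketched as a prefix rewrite. VolumeReplacer below is hypothetical and not Accumulo's implementation; it assumes the property holds comma-separated old new pairs, and the volume paths and file names used with it are illustrative. It only rewrites on a path boundary, so that a volume such as hdfs://ns1 does not also match hdfs://ns10.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of rewriting stored file URIs with old -> new
// volume pairs; not Accumulo's implementation.
class VolumeReplacer {
    private final Map<String, String> replacements = new LinkedHashMap<>();

    // Assumes the property value is comma-separated "old new" pairs.
    VolumeReplacer(String property) {
        for (String pair : property.split(",")) {
            String[] parts = pair.trim().split("\\s+");
            if (parts.length == 2) {
                replacements.put(stripSlash(parts[0]), stripSlash(parts[1]));
            }
        }
    }

    private static String stripSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    // Replaces the volume prefix only on a path boundary; URIs on other
    // volumes are returned unchanged.
    String rewrite(String fileUri) {
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            String old = e.getKey();
            if (fileUri.equals(old) || fileUri.startsWith(old + "/")) {
                return e.getValue() + fileUri.substring(old.length());
            }
        }
        return fileUri;
    }

    public static void main(String[] args) {
        VolumeReplacer r = new VolumeReplacer(
                "hdfs://ns1/accumulo hdfs://nsA/accumulo, hdfs://ns2/accumulo hdfs://nsB/accumulo");
        System.out.println(r.rewrite("hdfs://ns1/accumulo/tables/1/t-0001/F0000.rf"));
        System.out.println(r.rewrite("hdfs://ns10/accumulo/tables/1/t-0001/F0000.rf"));
    }
}
```

Here the first URI is rewritten to hdfs://nsA/accumulo/tables/1/t-0001/F0000.rf, while the second, which lives on a different volume, is left as-is.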

Using viewfs or HA namenode, introduced in Hadoop 2, offers another option for
managing the fully qualified URIs stored in Accumulo. Viewfs and HA namenode
both introduce a level of indirection in the Hadoop configuration. For
example, assume viewfs://nn1 maps to hdfs://nn1 in the Hadoop configuration.
If viewfs://nn1 is used by Accumulo, then it’s easy to map viewfs://nn1 to
hdfs://nnA by changing the Hadoop configuration without doing anything to Accumulo.
A production system should probably use an HA namenode. Viewfs may be useful on
a test system with a single non-HA namenode.

You may also want to configure your cluster to use Federation,
available in Hadoop 2.0, which allows DataNodes to respond to multiple
NameNode servers, so you do not have to partition your DataNodes by
NameNode.

18. Troubleshooting

18.1. Logs

Q: The tablet server does not seem to be running!? What happened?

Accumulo is a distributed system. It is supposed to run on remote
equipment, across hundreds of computers. Each program that runs on
these remote computers writes down events as they occur, into a local
file. By default, this is defined in
$ACCUMULO_HOME/conf/accumulo-env.sh as ACCUMULO_LOG_DIR.

A: Look in the $ACCUMULO_LOG_DIR/tserver*.log file. Specifically, check the end of the file.

Q: The tablet server did not start and the debug log does not exist! What happened?

When the individual programs are started, the stdout and stderr output
of these programs are stored in .out and .err files in
$ACCUMULO_LOG_DIR. Often, when there are missing configuration
options, files or permissions, messages will be left in these files.

18.2. Monitor

There’s a small web server that collects information about all the
components that make up a running Accumulo instance. It will highlight
unusual or unexpected conditions.

A: Point your browser to the monitor (typically the master host, on port 50095). Is anything red or yellow?

Q: My browser is reporting connection refused, and I cannot get to the monitor

The monitor program’s output is also written to .err and .out files in
the $ACCUMULO_LOG_DIR. Look for problems in this file if the
$ACCUMULO_LOG_DIR/monitor*.log file does not exist.

A: The monitor program is probably not running. Check the log files for errors.

Q: My browser hangs trying to talk to the monitor.

Your browser needs to be able to reach the monitor program. Often
large clusters are firewalled, or use a VPN for internal
communications. You can use SSH to proxy your browser to the cluster,
or consult with your system administrator to gain access to the server
from your browser.

It is sometimes helpful to use a text-only browser to sanity-check the
monitor while on the machine running the monitor:

$ links http://localhost:50095

A: Verify that you are not firewalled from the monitor if it is running on a remote host.

Q: The monitor responds, but there are no numbers for tservers and tables. The summary page says the master is down.

The monitor program gathers all the details about the master and the
tablet servers through the master. It will be mostly blank if the
master is down.

A: Check for a running master.

18.3. HDFS

Accumulo reads from and writes to the Hadoop Distributed File System.
Accumulo needs this file system available at all times for normal operations.

Q: Accumulo is having problems “getting a block blk_1234567890123.” How do I fix it?

This troubleshooting guide does not cover HDFS, but in general, you
want to make sure that all the datanodes are running and that an fsck
check finds the file system clean.

If fsck reports corrupt files, run it against those individual files
to locate their block references. Use those references to search the
name node and individual data node logs to determine which servers
those blocks were assigned to, and then try to fix any underlying file
system issues on those nodes.
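
A minimal Python sketch of that check, assuming the hdfs command-line client is installed and configured on the host (older Hadoop releases spell it hadoop fsck). It relies on the final verdict line of the fsck report, which in the Hadoop versions we are familiar with reads "The filesystem under path '...' is HEALTHY" (or "is CORRUPT"); treat that wording as an assumption to verify against your Hadoop version. The helper names are our own.

```python
import subprocess


def fsck_report_is_healthy(report):
    """Look for the verdict line of an fsck report and check that it says HEALTHY."""
    for line in report.splitlines():
        if line.startswith("The filesystem under path"):
            return line.rstrip().endswith("is HEALTHY")
    return False  # no verdict line: treat as unhealthy and read the full report


def run_fsck(path="/accumulo"):
    """Run fsck on path and return its report (assumes `hdfs` is on PATH)."""
    result = subprocess.run(["hdfs", "fsck", path], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    report = run_fsck()
    print("clean" if fsck_report_is_healthy(report) else report)
```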

On a larger cluster, you may need to increase the maximum number of Xcievers (concurrent transfer threads) on the HDFS DataNodes.

18.4. Zookeeper

Q: accumulo init is hanging. It says something about talking to zookeeper.

Zookeeper is also a distributed service. You will need to ensure that
it is up. You can run the zookeeper command line tool to connect to
any one of the zookeeper servers:

$ zkCli.sh -server zoohost
...
[zk: zoohost:2181(CONNECTED) 0]

It is important to see the word CONNECTED! If you only see
CONNECTING you will need to diagnose zookeeper errors.

A: Check to make sure that zookeeper is up, and that
$ACCUMULO_HOME/conf/accumulo-site.xml has been pointed to
your zookeeper server(s).

Q: Zookeeper is running, but it does not say CONNECTED

Zookeeper processes talk to each other to elect a leader. All updates
go through the leader and propagate to a majority of all the other
nodes. If a majority of the nodes cannot be reached, zookeeper will
not allow updates. Zookeeper also limits the number of connections to a
server from any other single host. By default, this limit can be as small as 10
and can be reached in some everything-on-one-machine test configurations.

You can check the election status and connection status of clients by
asking the zookeeper nodes for their status. Connect to each zookeeper
server and send it the four-letter stat command.
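
The same information can be gathered with a short Python sketch. It sends stat over a plain TCP socket to port 2181 (the port shown in the zkCli.sh example above), pulls out the Mode and Connections fields, and checks whether a strict majority of the ensemble is serving. The host names are hypothetical and the field names are those we expect in a stat reply. Newer ZooKeeper releases may refuse four-letter commands that are not whitelisted on the server; in that case the reply carries no Mode line and the sketch counts the server as not serving.

```python
import socket


def four_letter_word(host, port=2181, cmd=b"stat", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(reply):
    """Extract the Mode and Connections fields from a stat reply."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("Mode", "Connections"):
            fields[key] = value.strip()
    return fields


def has_quorum(ensemble_size, serving):
    """ZooKeeper accepts updates only while a strict majority is serving."""
    return serving >= ensemble_size // 2 + 1


if __name__ == "__main__":
    hosts = ["zoohost1", "zoohost2", "zoohost3"]  # hypothetical ensemble
    serving = 0
    for host in hosts:
        try:
            fields = parse_stat(four_letter_word(host))
        except OSError as err:
            print(host, "unreachable:", err)
            continue
        print(host, fields)
        if fields.get("Mode") in ("leader", "follower", "standalone"):
            serving += 1
    print("quorum:", has_quorum(len(hosts), serving))
```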

A: Check zookeeper status, verify that it has a quorum, and has not exceeded maxClientCnxns.

Q: My tablet server crashed! The logs say that it lost its zookeeper lock.

Tablet servers reserve a lock in zookeeper to maintain their ownership
over the tablets that have been assigned to them. Part of their
responsibility for keeping the lock is to send zookeeper a keep-alive
message periodically. If the tablet server fails to send a message in
a timely fashion, zookeeper will remove the lock and notify the tablet
server. If the tablet server does not receive a message from
zookeeper, it will assume its lock has been lost, too. If a tablet
server loses its lock, it kills itself, because the rest of the system
already assumes it is dead.

A: Investigate why the tablet server did not send a timely message to
zookeeper.

18.4.1. Keeping the tablet server lock

Q: My tablet server lost its lock. Why?

The primary reason a tablet server loses its lock is that it has been pushed into swap.

A large java program (like the tablet server) may have a large portion
of its memory image unused. The operating system will favor pushing
this allocated, but unused memory into swap so that the memory can be
re-used as a disk buffer. When the java virtual machine decides to
access this memory, the OS will begin flushing disk buffers to return that
memory to the VM. This can cause the entire process to block long
enough for the zookeeper lock to be lost.

A: Configure your system to reduce the kernel parameter swappiness from the default (60) to zero.
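
As a minimal sketch (Linux only, and our own helper rather than an Accumulo utility), the current value can be read from /proc/sys/vm/swappiness and compared against the recommendation. On most Linux systems the setting itself is changed with sysctl -w vm.swappiness=0 and made persistent in /etc/sysctl.conf.

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the kernel's current vm.swappiness value (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def swappiness_warning(value, recommended=0):
    """Return a warning if swappiness is above the recommended value, else None."""
    if value > recommended:
        return (f"vm.swappiness is {value}; lower it to {recommended} so the "
                "tablet server's memory is less likely to be pushed into swap")
    return None


if __name__ == "__main__":
    print(swappiness_warning(read_swappiness()) or "swappiness OK")
```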

Q: My tablet server lost its lock, and I have already set swappiness to
zero. Why?

Be careful not to over-subscribe memory. This can be easy to do if
your accumulo processes run on the same nodes as hadoop’s map-reduce
framework. Remember to add up:

size of the JVM for the tablet server

size of the in-memory map, if using the native map implementation

size of the JVM for the data node

size of the JVM for the task tracker

size of the JVM times the maximum number of mappers and reducers

size of the kernel and any support processes

If a 16G node can run 2 mappers and 2 reducers, and each can be 2G,
then there is only 8G for the data node, tserver, task tracker and OS.
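
The arithmetic is simple enough to keep in a small script. This Python sketch reproduces the example above and adds a per-component budget; every size other than the 16G node and the 2G task JVMs is hypothetical, chosen to show an oversubscribed node.

```python
GIB = 1024 ** 3


def left_for_services(node_total, task_jvm, max_mappers, max_reducers):
    """Memory left once every MapReduce task slot is running a full-size JVM."""
    return node_total - task_jvm * (max_mappers + max_reducers)


def oversubscription(node_total, components):
    """Bytes by which the components exceed the node; zero or less means they fit."""
    return sum(components.values()) - node_total


if __name__ == "__main__":
    # The example from the text: a 16G node with 2 mappers and 2 reducers of 2G each.
    print(left_for_services(16 * GIB, 2 * GIB, 2, 2) // GIB, "G left")  # prints: 8 G left

    # Hypothetical budget for everything on that node.
    budget = {
        "tablet server JVM": 4 * GIB,
        "native in-memory map": 1 * GIB,
        "data node JVM": 1 * GIB,
        "task tracker JVM": 1 * GIB,
        "mapper and reducer JVMs (4 x 2G)": 8 * GIB,
        "kernel and support processes": 2 * GIB,
    }
    print("over by", oversubscription(16 * GIB, budget) // GIB, "G")  # prints: over by 1 G
```

A positive result means some component has to shrink before the node can run without swapping.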

A: Reduce the memory footprint of each component until it fits comfortably.

Q: My tablet server lost its lock, swappiness is zero, and my node has lots of unused memory!

The JVM memory garbage collector may fall behind and cause a
"stop-the-world" garbage collection. On a large memory virtual
machine, this collection can take a long time. This happens more
frequently when the JVM is getting low on free memory. Check the logs
of the tablet server for its garbage collection debug messages, which report the free memory as freemem.

When freemem becomes small relative to the amount of memory
needed, the JVM will spend more time finding free memory than
performing work. This can cause long delays in sending keep-alive
messages to zookeeper.

A: Ensure the tablet server JVM is not running low on memory.

18.5. Tools

The accumulo script can be used to run classes from the command line.
This section shows how a few of the utilities work, but there are many
more.

There’s a class that will examine an accumulo storage file and print
out basic metadata.

18.6. System Metadata Tables

Accumulo tracks information about tables in metadata tables. The metadata for
most tables is contained within the metadata table in the accumulo namespace,
while metadata for that table is contained in the root table in the accumulo
namespace. The root table is composed of a single tablet, which does not
split, so it is also called the root tablet. Information about the root
table, such as its location and write-ahead logs, is stored in ZooKeeper.

Every tablet gets its own row. Every row starts with the table id, followed by
either ; and the end-row split point of that tablet, or < for the last tablet of
the table, which has no end row.
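
A minimal Python sketch of how such a row can be taken apart. It assumes rows are handled as text, that table ids never contain ; or < (true of ids such as 3 or !0 in this guide), and that < marks the last tablet of a table, which has no end row; since < sorts after ;, that tablet sorts after every other tablet of the same table. The function name is our own.

```python
import re

# table id, then ';' plus the end row, or '<' alone for the table's last tablet
_TABLET_ROW = re.compile(r"([^;<]+)([;<])(.*)", re.DOTALL)


def parse_metadata_row(row):
    """Split a tablet's metadata row into (table_id, end_row or None)."""
    match = _TABLET_ROW.fullmatch(row)
    if match is None:
        raise ValueError(f"not a tablet row: {row!r}")
    table_id, marker, end_row = match.groups()
    if marker == "<":
        if end_row:
            raise ValueError(f"unexpected text after '<': {row!r}")
        return table_id, None  # last tablet: no end row (+inf)
    return table_id, end_row


if __name__ == "__main__":
    for row in ("3<", "3;m", "!0<"):
        print(row, "->", parse_metadata_row(row))
```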

file:/default_tablet/F000009y.rf [] 186,1

File entry for this tablet. This tablet contains a single file reference. The
file is /accumulo/tables/3/default_tablet/F000009y.rf. It contains 1
key/value pair, and is 186 bytes long.
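
The file: entry can be decoded mechanically. This Python sketch, with a helper name of our own, splits the value into size and entry count and resolves a relative qualifier against the table directory, following the example above. It assumes the /accumulo/tables base directory and that a qualifier containing :// is already an absolute URI, since files created after an upgrade to 1.6.0 are referenced by absolute paths.

```python
def parse_file_entry(qualifier, value, table_id, tables_dir="/accumulo/tables"):
    """Return (path, size_in_bytes, entry_count) for one file: metadata entry."""
    size, entries = (int(part) for part in value.split(","))
    if "://" in qualifier:
        path = qualifier  # already an absolute URI
    else:
        path = f"{tables_dir}/{table_id}{qualifier}"  # relative to the table directory
    return path, size, entries


if __name__ == "__main__":
    # The entry shown above, for table id 3.
    print(parse_file_entry("/default_tablet/F000009y.rf", "186,1", "3"))
```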

last:13fe86cd27101e5 [] 127.0.0.1:9997

Last location for this tablet. It was last held on 127.0.0.1:9997, and the
unique tablet server lock data was 13fe86cd27101e5. The default balancer
will tend to put tablets back on their last location.

loc:13fe86cd27101e5 [] 127.0.0.1:9997

The current location of this tablet.

log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 [] 127.0. …

This tablet has a reference to a single write-ahead log. This file can be found in
/accumulo/wal/127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995. The value
of this entry could refer to multiple files. This tablet’s data is encoded as
6 within the log.

srv:dir [] /default_tablet

Files written for this tablet will be placed into
/accumulo/tables/3/default_tablet.

srv:flush [] 1

Flush id. This table has successfully completed the flush with the id of 1.

srv:lock [] tservers/127.0.0.1:9997/zlock-0000000001$13fe86cd27101e5

This is the lock information for the tablet server holding the present lock.
Whenever this entry is updated, the information is checked against zookeeper,
which prevents a metadata update from a tablet server that no longer holds its
lock.

srv:time [] M1373998392323

This indicates the time type (M for milliseconds or L for logical) and the timestamp of the most recently written key in this tablet. It is used to ensure automatically assigned key timestamps are strictly increasing for the tablet, regardless of the tablet server’s system time.
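
A small Python sketch for reading the srv:time value. The helper names are our own; converting to a date only makes sense for the M (milliseconds) type and assumes the number counts milliseconds since the Unix epoch. For the example above it gives 2013-07-16 18:13:12 UTC.

```python
from datetime import datetime, timezone


def parse_time_entry(value):
    """Split a srv:time value such as 'M1373998392323' into (type, number)."""
    kinds = {"M": "milliseconds", "L": "logical"}
    if not value or value[0] not in kinds:
        raise ValueError(f"unexpected time value: {value!r}")
    return kinds[value[0]], int(value[1:])


def as_utc(millis):
    """Interpret a millisecond timestamp as a UTC datetime (M type only)."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


if __name__ == "__main__":
    kind, number = parse_time_entry("M1373998392323")
    print(kind, number, as_utc(number).isoformat())
```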

~tab:~pr [] \x00

The end-row marker for the previous tablet (prev-row). The first byte
indicates the presence of a prev-row. This tablet has the range (-inf, +inf),
so it has no prev-row (or end row).
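
Decoding the prev-row value comes down to checking that first byte. In this Python sketch a leading 0 byte means there is no prev-row, as in the example above; treating a leading 1 byte as "prev-row present, followed by its bytes" is our assumption about the encoding, and the function name is our own.

```python
def decode_prev_row(value):
    """Decode a ~tab:~pr value; returns None when the tablet has no prev-row."""
    if not value:
        raise ValueError("empty prev-row value")
    flag, rest = value[0], value[1:]
    if flag == 0:
        return None  # first tablet of the table: its range starts at -inf
    if flag == 1:
        return rest  # assumed "present" flag; rest is the prev-row bytes
    raise ValueError(f"unexpected prev-row flag: {flag}")


if __name__ == "__main__":
    print(decode_prev_row(b"\x00"))  # the example above: no prev-row
```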

Besides these columns, you may see:

rowId future:zooKeeperID location

Tablet has been assigned to a tablet server, but not yet loaded.

~del:filename

When a tablet server is done using a file, it will create a delete marker in the appropriate metadata table, unassociated with any tablet. The garbage collector will remove the marker, and the file, when no other reference to the file exists.

~blip:txid

Bulk-Load In Progress marker.

rowId loaded:filename

A file has been bulk-loaded into this tablet, however the bulk load has not yet completed on other tablets, so this marker prevents the file from being loaded multiple times.

rowId !cloned

A marker that indicates that this tablet has been successfully cloned.

rowId splitRatio:ratio

A marker that indicates a split is in progress, and the files are being split at the given ratio.

rowId chopped

A marker that indicates that the files in the tablet do not contain keys outside the range of the tablet.

rowId scan

A marker that prevents a file from being removed while there are still active scans using it.

18.7. Simple System Recovery

Q: One of my Accumulo processes died. How do I bring it back?

The easiest way to bring all services online for an Accumulo instance is to run the start-all.sh script.

$ bin/start-all.sh

This process will check the process listing, using jps on each host, before attempting to restart a service on the given host.
Typically, this check is sufficient except in the face of a hung/zombie process. For large clusters, it may be
undesirable to ssh to every node in the cluster to ensure that all hosts are running the appropriate processes; in that case, start-here.sh may be of use.

$ ssh host_with_dead_process
$ bin/start-here.sh

start-here.sh should be invoked on the host which is missing a given process. Like start-all.sh, it will start all
necessary processes that are not currently running, but only on the current host and not cluster-wide. Tools such as pssh or
pdsh can be used to automate this process.
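
As an alternative to pssh or pdsh, a few lines of Python can run start-here.sh on a list of hosts in parallel. This is a sketch: the host names and the /opt/accumulo install path are hypothetical, it assumes password-less ssh to every host, and the BatchMode=yes option makes ssh fail rather than prompt for a password.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def start_here_command(host, accumulo_home="/opt/accumulo"):
    """Build the ssh command that runs start-here.sh on one host."""
    remote = f"cd {shlex.quote(accumulo_home)} && bin/start-here.sh"
    return ["ssh", "-o", "BatchMode=yes", host, remote]


def start_here_everywhere(hosts, accumulo_home="/opt/accumulo", workers=16):
    """Run start-here.sh on every host in parallel; return {host: exit status}."""
    def run(host):
        return host, subprocess.run(start_here_command(host, accumulo_home)).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, hosts))


if __name__ == "__main__":
    hosts = ["tserver1", "tserver2", "tserver3"]  # hypothetical host names
    for host, status in start_here_everywhere(hosts).items():
        print(host, "ok" if status == 0 else f"exit {status}")
```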

start-server.sh can also be used to start a process on a given host; however, it is not generally recommended for
users to issue this directly as the start-all.sh and start-here.sh scripts provide the same functionality with
more automation and are less prone to user error.

A: Use start-all.sh or start-here.sh.

Q: My process died again. Should I restart it via cron or tools like supervisord?

A: A repeatedly dying Accumulo process is a sign of a larger problem. Typically these problems are due to a
misconfiguration of Accumulo or over-saturation of resources. Blind automation of any service restart inside of Accumulo
is generally an undesirable situation as it is indicative of a problem that is being masked and ignored. Accumulo
processes should be stable on the order of months and not require frequent restart.

18.8. Advanced System Recovery

18.8.1. HDFS Failure

Q: I had a disastrous HDFS failure. After bringing everything back up, several tablets refuse to go online.

Data written to tablets is written into memory before being written into indexed files. In case the server
is lost before the data is saved into an indexed file, all data stored in memory is first written into a
write-ahead log (WAL). When a tablet is re-assigned to a new tablet server, the write-ahead logs are read to
recover any mutations that were in memory when the tablet was last hosted.

If a write-ahead log cannot be read, then the tablet is not re-assigned. All it takes is for one of
the blocks in the write-ahead log to be missing. This is unlikely unless multiple data nodes in HDFS have been
lost.

A: Get the WAL files online and healthy. Restore any data nodes that may be down.

Q: How do I find out which tablets are offline?

A: Use accumulo admin checkTablets

$ bin/accumulo admin checkTablets

Q: I lost three data nodes, and I’m missing blocks in a WAL. I don’t care about data loss, how
can I get those tablets online?

See the discussion in System Metadata Tables, which shows a typical metadata table listing.
The entries with a column family of log are references to the WAL for that tablet.
If you know which WAL is bad, you can find all the references with a grep in the shell.

Note: the colon (:) is omitted when specifying the row cf cq for the delete command.

The master will automatically discover the tablet no longer has a bad WAL reference and will
assign the tablet. You will need to remove the reference from all the tablets to get them
online.
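
Finding the references can also be done offline from saved scan output. This Python sketch assumes each line has the form row family:qualifier [visibility] value, as in the listing in System Metadata Tables but with the row included, and returns the row, family and qualifier of every log entry naming the bad WAL. Printed space-separated, each result has the row cf cq form, without the colon, that the note above describes for the delete command. The function name and the input format are our assumptions.

```python
import sys


def wal_references(scan_lines, bad_wal):
    """Return (row, family, qualifier) for every log: entry naming bad_wal."""
    refs = []
    for line in scan_lines:
        parts = line.split(None, 3)  # row, family:qualifier, [visibility], value
        if len(parts) < 2:
            continue
        row, column = parts[0], parts[1]
        family, sep, qualifier = column.partition(":")
        if sep and family == "log" and bad_wal in qualifier:
            refs.append((row, family, qualifier))
    return refs


if __name__ == "__main__":
    # Hypothetical usage: pipe saved metadata scan output in, pass the bad WAL id.
    for row, family, qualifier in wal_references(sys.stdin, sys.argv[1]):
        print(row, family, qualifier)  # "row cf cq", without the colon
```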

Q: The metadata (or root) table has references to a corrupt WAL.

This is a much more serious state, since losing updates to the metadata table will result
in references to old files which may not exist, or lost references to new files, resulting
in tablets that cannot be read, or large amounts of data loss.

The best hope is to restore the WAL by fixing HDFS data nodes and bringing the data back online.
If this is not possible, the best approach is to re-create the instance and bulk import all files from
the old instance into new tables.

A complete set of instructions for doing this is outside the scope of this guide,
but the basic approach is:

Use tables -l in the shell to discover the table name to table id mapping

Stop all accumulo processes on all nodes

Move the accumulo directory in HDFS out of the way:
$ hadoop fs -mv /accumulo /corrupt

Re-initialize accumulo

Recreate tables, users and permissions

Import the directories under /corrupt/tables/<id> into the new instance

Q: One or more HDFS Files under /accumulo/tables are corrupt

Accumulo maintains multiple references to the tablet files, in the metadata
tables and within the tablet server hosting the file. This makes it difficult to
reliably just remove those references.

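The directory structure in HDFS for tables follows the general pattern /accumulo/tables/<table id>/<tablet directory>/<file>.rf (for example, /accumulo/tables/3/default_tablet/F000009y.rf).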

If files under /accumulo/tables are corrupt, the best course of action is to
recover those files in HDFS (see the section on HDFS). Once these recovery efforts
have been exhausted, the next step depends on where the missing file(s) are
located. Different actions are required when the bad files are in Accumulo data
table files or if they are metadata table files.

Data File Corruption

When an Accumulo data file is corrupt, the most reliable way to restore Accumulo
operations is to replace the missing file with an “empty” file so that
references to the file in the METADATA table and within the tablet server
hosting the file can be resolved by Accumulo. An empty file can be created using
the CreateEmpty utility.

The process is to delete the corrupt file and then move the empty file into its
place. (The generated empty file can be copied and used multiple times if necessary and does not need
to be regenerated each time.)

If the corrupt files are metadata table files (under the path
/accumulo/tables/!0; see System Metadata Tables), you will need to rebuild
the metadata table by initializing a new instance of Accumulo and then importing
all of the existing data into the new instance. This is the same procedure as
recovering from a zookeeper failure (see ZooKeeper Failure), except that
you will have the benefit of having the existing user and table authorizations
that are maintained in zookeeper.

You can use the DumpZookeeper utility to save this information for reference
before creating the new instance. You will not be able to use RestoreZookeeper
because the table names and references are likely to be different between the
original and the new instances, but it can serve as a reference.

A: If the files cannot be recovered, replace corrupt data files with empty
RFiles to allow references in the metadata table and in the tablet servers to be
resolved. Rebuild the metadata table if the corrupt files are metadata files.

18.8.2. ZooKeeper Failure

Q: I lost my ZooKeeper quorum (hardware failure), but HDFS is still intact. How can I recover my Accumulo instance?

ZooKeeper, in addition to its lock-service capabilities, also serves to bootstrap an Accumulo
instance from some location in HDFS. It contains the pointers to the root tablet in HDFS which
is then used to load the Accumulo metadata tablets, which then loads all user tables. ZooKeeper
also stores all namespace and table configuration, the user database, the mapping of table IDs to
table names, and more across Accumulo restarts.

Presently, the only way to recover such an instance is to initialize a new instance and import all
of the old data into the new instance. The easiest way to tackle this problem is to first recreate
the mapping of table ID to table name and then recreate each of those tables in the new instance.
Set any necessary configuration on the new tables, and add split points to them so that each new table starts
closer to the number of splits the old table had, rather than with no splits at all.

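The directory structure in HDFS for tables follows the general pattern /accumulo/tables/<table id>/<tablet directory>/<file>.rf (for example, /accumulo/tables/3/default_tablet/F000009y.rf).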

For each table, make a new directory that you can move (or copy if you have the HDFS space to do so)
all of the rfiles for a given table into. For example, to process the table with an ID of 1, make a new directory,
say /new-table-1 and then copy all files from /accumulo/tables/1/*/*.rf into that directory. Additionally,
make a directory, /new-table-1-failures, for any failures during the import process. Then, from the Accumulo shell, run the
importdirectory command into the new table, telling Accumulo not to reset the timestamps.

Any RFiles that failed to load will be placed in /new-table-1-failures. RFiles that were successfully
imported will no longer exist in /new-table-1. For failures, move them back to the import directory and retry
the importdirectory command.

It is extremely important to note that this approach may introduce stale data back into
the tables. For a few reasons, RFiles may exist in the table directory which are candidates for deletion but have
not yet been deleted. Additionally, deleted data which was not compacted away, but still exists in write-ahead logs if
the original instance was somehow recoverable, will be re-introduced in the new instance. Table splits and merges
(which also include the deleteRows API call on TableOperations) are also vulnerable to this problem. This process should
not be used if these are unacceptable risks. It is possible to try to re-create a view of the accumulo.metadata
table to prune out files that are candidates for deletion, but this is a difficult task that also may not be entirely accurate.

Likewise, it is also possible that data loss may occur from write-ahead log (WAL) files which existed on the old table but
were not minor-compacted into an RFile. Again, it may be possible to reconstruct the state of these WAL files to
replay data not yet in an RFile; however, this is a difficult task and is not implemented in any automated fashion.

A: The importdirectory shell command can be used to import RFiles from the old instance into a newly created instance,
but extreme care should go into the decision to do this as it may result in reintroduction of stale data or the
omission of new data.

18.9. Upgrade Issues

Q: I upgraded from 1.4 to 1.5 to 1.6 but still have some WAL files on local disk. Do I have any way to recover them?

A: Yes, you can recover them by running the LocalWALRecovery utility on each node that needs recovery performed. The utility
will default to using the directory specified by logger.dir.walog in your configuration, or can be
overridden by using the --local-wal-directories option on the tool.

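Accumulo’s file naming convention, in which each file name starts with a single-letter prefix
indicating how the file was created (such as the F in F000009y.rf), allows you to see the basic
structure of the files from their names alone, and to reason about what should happen to them next
just by scanning their entries in the metadata tables.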

For example, if you see multiple files with M prefixes, the tablet is, or was, up against its
maximum file limit, so it began merging memory updates with files to keep the file count reasonable. This
slows down ingest performance, so knowing there are many files like this tells you that the system
is struggling to keep up with ingest vs the compaction strategy which reduces the number of files.
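
Counting prefixes across a tablet’s files is an easy first diagnostic. In this Python sketch the prefix meanings are our understanding of Accumulo’s convention (only F and M appear in this guide), so check them against the documentation for your version before relying on them; the helper names are our own.

```python
import posixpath
from collections import Counter

# Our understanding of the prefixes; confirm against your Accumulo version.
PREFIX_MEANING = {
    "F": "flush (minor compaction)",
    "M": "merging minor compaction",
    "C": "major compaction of some of a tablet's files",
    "A": "major compaction of all of a tablet's files",
    "I": "bulk import",
}


def prefix_counts(paths):
    """Count a tablet's files by the first letter of each file name."""
    return Counter(posixpath.basename(path)[:1] for path in paths)


def describe(paths):
    """Readable summary, most common prefix first, e.g. '2 x M (merging minor compaction)'."""
    counts = prefix_counts(paths)
    return [f"{n} x {p} ({PREFIX_MEANING.get(p, 'unknown')})" for p, n in counts.most_common()]
```

Several M files on one tablet is the pattern described above: the tablet has reached its file limit and compactions are falling behind ingest.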

Appendix A: Configuration Management

A.1. Configuration Overview

All accumulo properties have a default value in the source code. Properties can also be set
in accumulo-site.xml and in zookeeper on a per-table or system-wide basis. If properties are set in more than one location,
accumulo will choose the property with the highest precedence. This order of precedence is described
below (from highest to lowest):

A.1.1. Zookeeper table properties

Table properties are applied to the entire cluster when set in zookeeper using the accumulo API or shell. While table properties take precedence over system properties, both will override properties set in accumulo-site.xml.

Table properties consist of all properties with the table.* prefix. Table properties are configured on a per-table basis using the following shell command:

config -t TABLE -s PROPERTY=VALUE

A.1.2. Zookeeper system properties

System properties are applied to the entire cluster when set in zookeeper using the accumulo API or shell. System properties consist of all properties with a yes in the Zookeeper Mutable column in the table below. They are set with the following shell command:

config -s PROPERTY=VALUE

If a table.* property is set using this method, the value will apply to all tables except those configured on a per-table basis (which have higher precedence).

While most system properties take effect immediately, some require a restart of the process; these are indicated in the Zookeeper Mutable field.

A.1.3. accumulo-site.xml

Accumulo processes (master, tserver, etc) read their local accumulo-site.xml on start up. Therefore, changes made to accumulo-site.xml must be rsynced across the cluster and processes must be restarted to apply changes.

Certain properties (indicated by a no in Zookeeper Mutable) cannot be set in zookeeper and can only be set in this file. The accumulo-site.xml also allows you to configure tablet servers with different settings.

A.1.4. Default Values

All properties have a default value in the source code. This value has the lowest precedence and is overridden if set in accumulo-site.xml or zookeeper.

While the default value is usually optimal, there are cases where a change can increase query and ingest performance.
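
The precedence rules amount to a lookup through layers. This Python sketch is an illustration, not Accumulo’s implementation: each argument is a plain dictionary standing in for one configuration source, per-table ZooKeeper settings are only consulted for table.* properties, and the values in the usage are made up.

```python
def resolve_property(name, table_zk, system_zk, site_xml, defaults):
    """Return the effective value of a property, checking the highest precedence first.

    table_zk: this table's per-table ZooKeeper settings
    system_zk: system-wide ZooKeeper settings
    site_xml: the local accumulo-site.xml
    defaults: built-in default values
    """
    layers = [system_zk, site_xml, defaults]
    if name.startswith("table."):
        layers.insert(0, table_zk)  # per-table settings only apply to table.* properties
    for layer in layers:
        if name in layer:
            return layer[name]
    raise KeyError(name)


if __name__ == "__main__":
    # Hypothetical values for one table and one server setting.
    table_zk = {"table.split.threshold": "2G"}
    system_zk = {"table.split.threshold": "1500M"}
    site_xml = {"tserver.cache.data.size": "256M"}
    defaults = {"table.split.threshold": "1G", "tserver.cache.data.size": "128M"}
    for name in ("table.split.threshold", "tserver.cache.data.size"):
        print(name, "=", resolve_property(name, table_zk, system_zk, site_xml, defaults))
```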

A.1.5. ZooKeeper Property Considerations

When storing properties in ZooKeeper, consider the limitations of ZooKeeper itself with respect to the
number of nodes and the size of the node data. Custom table properties and options for Iterators configured on tables
are two areas in which there aren’t any failsafes built into the API that can prevent the user from making this mistake.

While these properties have the ability to add some much needed dynamic configuration tools, use cases which might fall
into these warnings should be reconsidered.

A.2. Configuration in the Shell

The config command in the shell allows you to view the current system configuration. You can also use the -t option to view a table’s configuration.

rpc.javax.net.ssl.trustStore

rpc.javax.net.ssl.trustStorePassword

Password used to encrypt the SSL truststore. Leave blank to use no password

Type: STRING
Zookeeper Mutable: no
Default Value: empty

rpc.javax.net.ssl.trustStoreType

Type of SSL truststore

Type: STRING
Zookeeper Mutable: no
Default Value: jks

rpc.sasl.qop

The quality of protection to be used with SASL. Valid values are auth, auth-int, and auth-conf

Type: STRING
Zookeeper Mutable: no
Default Value: auth

rpc.ssl.cipher.suites

Comma separated list of cipher suites that can be used by accepted connections

Type: STRING
Zookeeper Mutable: no
Default Value: empty

rpc.ssl.client.protocol

The protocol used to connect to a secure server, must be in the list of enabled protocols on the server side (rpc.ssl.server.enabled.protocols)

Type: STRING
Zookeeper Mutable: no
Default Value: TLSv1

rpc.ssl.server.enabled.protocols

Comma separated list of protocols that can be used to accept connections

Type: STRING
Zookeeper Mutable: no
Default Value: TLSv1,TLSv1.1,TLSv1.2
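
A quick consistency check between rpc.ssl.client.protocol and rpc.ssl.server.enabled.protocols, as a Python sketch with a helper name of our own: it verifies that the client protocol appears in the server’s comma-separated list, using the default values shown above.

```python
def client_protocol_allowed(client_protocol, server_enabled_protocols):
    """True if rpc.ssl.client.protocol is in rpc.ssl.server.enabled.protocols."""
    enabled = {p.strip() for p in server_enabled_protocols.split(",") if p.strip()}
    return client_protocol.strip() in enabled


if __name__ == "__main__":
    # Default values from the property listing above.
    print(client_protocol_allowed("TLSv1", "TLSv1,TLSv1.1,TLSv1.2"))  # prints: True
```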

rpc.useJsse

Use JSSE system properties to configure SSL rather than the rpc.javax.net.ssl.* Accumulo properties

Type: BOOLEAN
Zookeeper Mutable: no
Default Value: false

A.3.2. instance.*

Properties in this category must be consistent throughout a cloud. This is enforced and servers won’t be able to communicate if these differ.

instance.dfs.dir

Deprecated. HDFS directory in which accumulo instance will run. Do not change after accumulo is initialized.

Type: ABSOLUTEPATH
Zookeeper Mutable: no
Default Value: /accumulo

instance.dfs.uri

Deprecated. A url accumulo should use to connect to DFS. If this is empty, accumulo will obtain this information from the hadoop configuration. This property will only be used when creating new files if instance.volumes is empty. After an upgrade to 1.6.0 Accumulo will start using absolute paths to reference files. Files created before a 1.6.0 upgrade are referenced via relative paths. Relative paths will always be resolved using this config (if empty using the hadoop config).

Type: URI
Zookeeper Mutable: no
Default Value: empty

instance.rpc.sasl.allowed.host.impersonation

One-line configuration property controlling the network locations (hostnames) that are allowed to impersonate other users

Type: STRING
Zookeeper Mutable: no
Default Value: empty

instance.rpc.sasl.allowed.user.impersonation

One-line configuration property controlling what users are allowed to impersonate other users

instance.rpc.ssl.clientAuth

instance.rpc.ssl.enabled

Use SSL for socket connections from clients and among accumulo services. Mutually exclusive with SASL RPC configuration.

Type: BOOLEAN
Zookeeper Mutable: no
Default Value: false

instance.secret

A secret unique to a given instance that all servers must know in order to communicate with one another. It should be changed prior to the initialization of Accumulo. To change it after Accumulo has been initialized, use the ChangeSecret tool and then update conf/accumulo-site.xml everywhere. Before using the ChangeSecret tool, make sure Accumulo is not running and you are logged in as the user that controls Accumulo files in HDFS. To use the ChangeSecret tool, run the command: ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret

Type: STRING
Zookeeper Mutable: no
Default Value: DEFAULT

instance.security.authenticator

The authenticator class that accumulo will use to determine if a user has privilege to perform an action

instance.volumes

A comma separated list of dfs uris to use. Files will be stored across these filesystems. If this is empty, then instance.dfs.uri will be used. After adding uris to this list, run accumulo init --add-volume and then restart tservers. If entries are removed from this list then tservers will need to be restarted. After a uri is removed from the list Accumulo will not create new files in that location; however, Accumulo can still reference files created at that location before the config change. To use a comma or other reserved characters in a URI use standard URI hex encoding. For example replace commas with %2C.

Type: STRING
Zookeeper Mutable: no
Default Value: empty

instance.volumes.replacements

Since accumulo stores absolute URIs, changing the location of a namenode could prevent Accumulo from starting. The property helps deal with that situation. Provide a comma separated list of uri replacement pairs here if a namenode location changes. Each pair should be separated with a space. For example, if hdfs://nn1 was replaced with hdfs://nnA and hdfs://nn2 was replaced with hdfs://nnB, then set this property to hdfs://nn1 hdfs://nnA,hdfs://nn2 hdfs://nnB. Replacements must be configured for use. To see which volumes are currently in use, run accumulo admin volumes -l. To use a comma or other reserved characters in a URI use standard URI hex encoding. For example replace commas with %2C.

Type: STRING
Zookeeper Mutable: no
Default Value: empty
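
The pair syntax is easy to get wrong, so here is a Python sketch that parses the property value and applies it to a file URI. It splits on commas first and then decodes %-escapes such as %2C, as described above. The prefix-matching rule (a replacement applies when the URI equals the old value or continues it with a /) is our assumption about how replacements are matched, not a description of Accumulo’s code, and the helper names are our own.

```python
from urllib.parse import unquote


def parse_replacements(value):
    """Parse 'old new,old new' into [(old, new), ...], decoding %-escapes."""
    pairs = []
    for item in value.split(","):
        if not item.strip():
            continue
        parts = item.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'old new', got {item!r}")
        pairs.append((unquote(parts[0]), unquote(parts[1])))
    return pairs


def apply_replacements(uri, pairs):
    """Rewrite uri with the first matching pair (assumed prefix-match rule)."""
    for old, new in pairs:
        old, new = old.rstrip("/"), new.rstrip("/")
        if uri == old or uri.startswith(old + "/"):
            return new + uri[len(old):]
    return uri


if __name__ == "__main__":
    pairs = parse_replacements("hdfs://nn1 hdfs://nnA,hdfs://nn2 hdfs://nnB")
    print(apply_replacements("hdfs://nn1/accumulo/tables/3/default_tablet/F000009y.rf", pairs))
```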

instance.zookeeper.host

Comma separated list of zookeeper servers

Type: HOSTLIST
Zookeeper Mutable: no
Default Value: localhost:2181

instance.zookeeper.timeout

Zookeeper session timeout; max value when represented as milliseconds should be no larger than 2147483647

Type: TIMEDURATION
Zookeeper Mutable: no
Default Value: 30s
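
TIMEDURATION values such as 30s can be checked against the 2147483647 ms limit with a short Python sketch. The unit suffixes (ms, s, m, h, d) and the treatment of a bare number as seconds are our assumptions about the TIMEDURATION format; the examples in this appendix only use s and m. The helper names are our own.

```python
import re

_MS_PER_UNIT = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
INT32_MAX = 2_147_483_647


def duration_to_ms(text):
    """Convert a TIMEDURATION string such as '30s' or '10m' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m|h|d)?\s*", text)
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    number, unit = match.groups()
    return int(number) * _MS_PER_UNIT[unit or "s"]  # assumed: bare numbers are seconds


def valid_zookeeper_timeout(text):
    """instance.zookeeper.timeout must fit in a signed 32-bit millisecond count."""
    return duration_to_ms(text) <= INT32_MAX
```

With these assumptions, 24d (2,073,600,000 ms) is still accepted, while 25d (2,160,000,000 ms) is over the limit.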

A.3.3. instance.rpc.sasl.impersonation.* (Deprecated)

Deprecated. Prefix that allows configuration of users that are allowed to impersonate other users

A.3.4. general.*

Properties in this category affect the behavior of accumulo overall, but do not have to be consistent throughout a cloud.

general.classpaths

A list of all of the places to look for a class. Order does matter, as it will look for the jar from the first location to the last. Please note, hadoop conf and hadoop lib directories NEED to be here, along with accumulo lib and zookeeper directory. Supports full regex on filename alone.

general.kerberos.keytab

Path to the kerberos keytab to use. Leave blank if not using kerberized hdfs

Type: PATH
Zookeeper Mutable: no
Default Value: empty

general.kerberos.principal

Name of the kerberos principal to use. _HOST will automatically be replaced by the machine’s hostname in the hostname portion of the principal. Leave blank if not using kerberized hdfs

Type: STRING
Zookeeper Mutable: no
Default Value: empty
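
A Python sketch of the _HOST substitution, useful for checking which principal a given host will end up with. Lowercasing the host name, and substituting only when the whole host component is _HOST, are assumptions modeled on how Hadoop handles _HOST, not statements about Accumulo’s code; the function name is our own.

```python
import socket


def resolve_principal(principal, hostname=None):
    """Replace a _HOST host component with this machine's (lowercased) hostname."""
    name, at, realm = principal.partition("@")
    primary, slash, instance = name.partition("/")
    if slash and instance == "_HOST":
        instance = (hostname or socket.getfqdn()).lower()
    return primary + slash + instance + at + realm


if __name__ == "__main__":
    print(resolve_principal("accumulo/_HOST@EXAMPLE.COM"))
```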

general.kerberos.renewal.period

The amount of time between attempts to perform Kerberos ticket renewals. This does not equate to how often tickets are actually renewed (which is performed at 80% of the ticket lifetime).

Type: TIMEDURATION
Zookeeper Mutable: no
Default Value: 30s

general.legacy.metrics

Use the old metric infrastructure configured by accumulo-metrics.xml, instead of Hadoop Metrics2

Type: BOOLEAN
Zookeeper Mutable: no
Default Value: false

general.max.scanner.retry.period

The maximum amount of time that a Scanner should wait before retrying a failed RPC

Type: TIMEDURATION
Zookeeper Mutable: no
Default Value: 5s

general.rpc.timeout

Time to wait on I/O for simple, short RPC calls

Type: TIMEDURATION
Zookeeper Mutable: no
Default Value: 120s

general.security.credential.provider.paths

Comma-separated list of paths to CredentialProviders

Type: STRING
Zookeeper Mutable: no
Default Value: empty

general.server.message.size.max

The maximum size of a message that can be sent to a server.

Type: MEMORY
Zookeeper Mutable: no
Default Value: 1G

general.server.simpletimer.threadpool.size

The number of threads to use for server-internal scheduled tasks

Type: COUNT
Zookeeper Mutable: no
Default Value: 1

general.vfs.cache.dir

Directory to use for the vfs cache. The cache will keep a soft reference to all of the classes loaded in the VM. This should be on local disk on each node with sufficient space. It defaults to ${java.io.tmpdir}/accumulo-vfs-cache-${user.name}

tserver.archive.walogs

tserver.assignment.concurrent.max

The number of threads available to load tablets. Recoveries are still performed serially.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 2

tserver.assignment.duration.warning

The amount of time an assignment can run before the server will print a warning along with the current stack trace. Meant to help debug stuck assignments

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 10m

tserver.bloom.load.concurrent.max

The number of concurrent threads that will load bloom filters in the background. Setting this to zero will make bloom filters load in the foreground.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 4

tserver.bulk.assign.threads

The master delegates bulk file processing and assignment to tablet servers. After the bulk file has been processed, the tablet server will assign the file to the appropriate tablets on all servers. This property controls the number of threads used to communicate with the other servers.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1

tserver.bulk.process.threads

The master will task a tablet server with pre-processing a bulk file prior to assigning it to the appropriate tablet servers. This configuration value controls the number of threads used to process the files.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1

tserver.bulk.retry.max

The number of times the tablet server will attempt to assign a file to a tablet as the tablet migrates and splits.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 5

tserver.bulk.timeout

The time to wait for a tablet server to process a bulk import request.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

tserver.cache.data.size

Specifies the size of the cache for file data blocks.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 128M

tserver.cache.index.size

Specifies the size of the cache for file indices.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 512M

tserver.client.timeout

Time to wait for clients to continue scans before closing a session.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 3s

tserver.compaction.major.concurrent.max

The maximum number of concurrent major compactions for a tablet server.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 3

tserver.compaction.major.delay

Time a tablet server will sleep between checking which tablets need compaction.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 30s

tserver.compaction.major.thread.files.open.max

Maximum number of files a major compaction thread can open at once.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 10

tserver.compaction.major.trace.percent

The fraction of major compactions to trace.

Type: FRACTION
Zookeeper Mutable: yes
Default Value: 0.1

tserver.compaction.minor.concurrent.max

The maximum number of concurrent minor compactions for a tablet server.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 4

tserver.compaction.minor.trace.percent

The fraction of minor compactions to trace.

Type: FRACTION
Zookeeper Mutable: yes
Default Value: 0.1

tserver.compaction.warn.time

When a compaction has not made progress for this time period, a warning will be logged.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 10m

tserver.default.blocksize

Specifies a default block size for the tablet server caches.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1M

tserver.dir.memdump

A long-running scan could hold memory that has already been minor compacted. To prevent this, the in-memory map is dumped to a local file and the scan is switched to that local file. The scan cannot switch to the minor compacted file because that file may have been modified by iterators. The file dumped to the local directory is an exact copy of what was in memory.

Type: PATH
Zookeeper Mutable: yes
Default Value: /tmp

tserver.files.open.idle

Tablet servers leave previously used files open for future queries. This setting determines how long an unused file is kept open before it is closed.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1m

tserver.hold.time.max

The maximum time for a tablet server to be in the "memory full" state. If the tablet server cannot write out memory in this much time, it will assume there is some failure local to its node, and quit. A value of zero is equivalent to forever.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

tserver.memory.manager

tserver.memory.maps.max

Maximum amount of memory that can be used to buffer data written to a tablet server. Two other properties can effectively limit memory usage: table.compaction.minor.logs.threshold and tserver.walog.max.size. Ensure that table.compaction.minor.logs.threshold * tserver.walog.max.size >= this property.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1G
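The sizing rule above is plain arithmetic. The following Java sketch checks it for a given set of values. The class and method names are hypothetical (not part of Accumulo's API), and sizes are passed as byte counts assuming binary units (1G = 2^30 bytes).

```java
/** Hypothetical check of the sizing rule for tserver.memory.maps.max. */
final class MemoryMapSizing {
    static final long GIB = 1L << 30; // 1G, assuming binary units

    /**
     * True when table.compaction.minor.logs.threshold * tserver.walog.max.size
     * is at least tserver.memory.maps.max (all sizes in bytes).
     */
    static boolean isConsistent(long logsThreshold, long walogMaxSize, long memoryMapsMax) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(logsThreshold, walogMaxSize) >= memoryMapsMax;
    }

    public static void main(String[] args) {
        // Defaults from this appendix: threshold 3, walog.max.size 1G, maps.max 1G
        System.out.println(isConsistent(3, GIB, GIB));     // true  (3G >= 1G)
        // Raising maps.max to 4G without changing the other two breaks the rule
        System.out.println(isConsistent(3, GIB, 4 * GIB)); // false (3G < 4G)
    }
}
```

With the defaults listed in this appendix (3 logs of 1G each against a 1G map), the rule holds; raising tserver.memory.maps.max alone past 3G would break it.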

tserver.memory.maps.native.enabled

An in-memory data store for Accumulo, implemented in C++, that increases the amount of data Accumulo can hold in memory and avoids Java GC pauses.

tserver.metadata.readahead.concurrent.max

The maximum number of concurrent metadata read-ahead operations that will execute.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 8

tserver.migrations.concurrent.max

The maximum number of concurrent tablet migrations for a tablet server.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1

tserver.monitor.fs

When enabled, the tablet server monitors its file systems and kills itself when one switches from read-write (rw) to read-only (ro). This is usually an indication that Linux has detected a bad disk.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: true

tserver.mutation.queue.max

Deprecated. See tserver.total.mutation.queue.max. The amount of memory to use to store write-ahead-log mutations per session before flushing them. Since the buffer is per write session, consider the maximum number of concurrent writers when configuring. When using Hadoop 2, Accumulo will call hsync() on the WAL. For a small number of concurrent writers, increasing this buffer size decreases the frequency of hsync calls. For a large number of concurrent writers, a small buffer size is fine because of group commit.

tserver.server.threads.minimum

tserver.session.idle.max

When a tablet server’s SimpleTimer thread triggers to check idle sessions, this configurable option will be used to evaluate scan sessions to determine if they can be closed due to inactivity.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1m

tserver.session.update.idle.max

When a tablet server’s SimpleTimer thread triggers to check idle sessions, this configurable option will be used to evaluate update sessions to determine if they can be closed due to inactivity.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1m

tserver.sort.buffer.size

The amount of memory to use when sorting logs during recovery.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 200M

tserver.tablet.split.midpoint.files.max

To find a tablet's split points, all index files are opened. This setting determines how many index files can be opened at once. When there are more index files than this setting, multiple passes must be made, which is slower. However, opening too many files at once can cause problems.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 30

tserver.total.mutation.queue.max

The amount of memory used to store write-ahead-log mutations before flushing them.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 50M

tserver.wal.blocksize

The size of the HDFS blocks used to write to the write-ahead log. If zero, it will be 110% of tserver.walog.max.size (that is, try to use just one block).

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 0
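The zero-value rule for tserver.wal.blocksize can be expressed as a small calculation. This is a hypothetical helper, not Accumulo code; it truncates to whole bytes and does not model any further rounding that Accumulo or HDFS might apply.

```java
/** Hypothetical helper for the effective write-ahead log block size. */
final class WalBlockSize {
    /**
     * Returns tserver.wal.blocksize if it is non-zero; otherwise 110% of
     * tserver.walog.max.size, so that a full log fits in a single HDFS block.
     */
    static long effective(long walBlockSize, long walogMaxSize) {
        if (walBlockSize != 0) {
            return walBlockSize;
        }
        return (long) (walogMaxSize * 1.1); // cast truncates the fractional byte
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30; // the 1G default for tserver.walog.max.size
        System.out.println(effective(0, oneGiB));          // 1181116006
        System.out.println(effective(256L << 20, oneGiB)); // 268435456
    }
}
```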

tserver.wal.replication

The replication factor to use when writing the write-ahead log to HDFS. If zero, the HDFS default replication setting is used.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 0

tserver.wal.sync

Use the SYNC_BLOCK create flag to sync WAL writes to disk. Prevents problems recovering from sudden system resets.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: true

tserver.wal.sync.method

Deprecated. Use table.durability instead.

Type: STRING
Zookeeper Mutable: yes
Default Value: hsync

tserver.walog.max.age

The maximum age for each write-ahead log.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 24h

tserver.walog.max.size

The maximum size for each write-ahead log. See the comment for property tserver.memory.maps.max.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1G

tserver.walog.maximum.wait.duration

The maximum amount of time to wait after a failure to create a WAL file.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

tserver.walog.tolerated.creation.failures

The maximum number of failures tolerated when creating a new WAL file within the period specified by tserver.walog.failures.period. Exceeding this number of failures in the period causes the TabletServer to exit.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 50

tserver.walog.tolerated.wait.increment

The amount of time to wait between failures to create a WAL file.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1000ms

tserver.workq.threads

The number of threads for the distributed work queue. These threads are used for copying failed bulk files.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 2

A.3.7. tserver.replication.replayer.*

Allows configuration of the implementation used to apply replicated data.

A.3.8. logger.*

Properties in this category affect the behavior of the write-ahead logger servers.

logger.dir.walog

This property is only needed if Accumulo was upgraded from version 1.4 or earlier. During the upgrade to 1.5, this property is used to copy any earlier write-ahead logs into DFS. In 1.6+, this property is used by the LocalWALRecovery utility in the event that something went wrong with that earlier upgrade. It is possible to specify a comma-separated list of directories.

Type: PATH
Zookeeper Mutable: yes
Default Value: walogs

A.3.9. gc.*

Properties in this category affect the behavior of the Accumulo garbage collector.

gc.cycle.delay

Time between garbage collection cycles. In each cycle, old files no longer in use are removed from the filesystem.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

gc.cycle.start

Time to wait before attempting to garbage collect any old files.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 30s

gc.file.archive

Archive any files/directories instead of moving them to the HDFS trash or deleting them.

trace.zookeeper.path

A.3.12. trace.span.receiver.*

A.3.13. trace.token.property.*

The prefix used to create a token for storing distributed traces. For each property required by trace.token.type, place this prefix in front of it.

A.3.14. table.*

Properties in this category affect tablet server treatment of tablets, but can be configured on a per-table basis. Setting these properties in the site file overrides the default globally for all tables, not for any specific table. However, both the default and the global setting can be overridden per table using the table operations API or the shell, which stores the overridden value in ZooKeeper. Restarting the Accumulo tablet servers after setting these properties in the site file causes the global setting to take effect. However, properties that are set on a table in ZooKeeper can only be changed through the API or the shell.
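The precedence described in the paragraph above can be sketched as a three-layer lookup. This is a hypothetical illustration, not Accumulo's configuration code: the class name is invented, the three layers are plain maps, and only the layers named in this paragraph (per-table value in ZooKeeper, site file, built-in default) are modeled.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the lookup order for a table.* property. */
final class TablePropertyLookup {
    static String resolve(String key,
                          Map<String, String> perTable,
                          Map<String, String> siteFile,
                          Map<String, String> defaults) {
        if (perTable.containsKey(key)) {
            return perTable.get(key);   // set per table via the API or the shell
        }
        if (siteFile.containsKey(key)) {
            return siteFile.get(key);   // global override from the site file
        }
        return defaults.get(key);       // built-in default, or null if unknown
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Collections.singletonMap("table.split.threshold", "1G");
        Map<String, String> site = Collections.singletonMap("table.split.threshold", "2G");
        Map<String, String> tableA = new HashMap<>();
        Map<String, String> tableB = Collections.singletonMap("table.split.threshold", "4G");

        System.out.println(resolve("table.split.threshold", tableA, site, defaults)); // 2G
        System.out.println(resolve("table.split.threshold", tableB, site, defaults)); // 4G
    }
}
```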

table.balancer

This property can be set to allow the LoadBalanceByTable load balancer to change the load balancer used for this table.

table.bloom.enabled

table.bloom.error.rate

table.bloom.hash.type

The bloom filter hash type.

Type: STRING
Zookeeper Mutable: yes
Default Value: murmur

table.bloom.key.functor

A function that can transform the key prior to insertion into, and checks against, the bloom filter. org.apache.accumulo.core.file.keyfunctor.RowFunctor, org.apache.accumulo.core.file.keyfunctor.ColumnFamilyFunctor, and org.apache.accumulo.core.file.keyfunctor.ColumnQualifierFunctor are allowable values. One can extend any of these classes to perform specialized parsing of the key.

table.bloom.load.threshold

A file's bloom filter is loaded only after this many seeks that would actually use it have occurred. Set this to zero to load bloom filters as soon as a file is opened.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1

table.bloom.size

Bloom filter size, as a number of keys.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1048576

table.cache.block.enable

Determines whether the file block cache is enabled.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

table.cache.index.enable

Determines whether the index cache is enabled.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: true

table.classpath.context

Per-table classpath context.

Type: STRING
Zookeeper Mutable: yes
Default Value: empty

table.compaction.major.everything.idle

After a tablet has been idle (no mutations) for this time period it may have all of its files compacted into one. There is no guarantee an idle tablet will be compacted. Compactions of idle tablets are only started when regular compactions are not running. Idle compactions only take place for tablets that have one or more files.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1h

table.compaction.major.ratio

Minimum ratio of total input size to maximum input file size for running a major compaction. When adjusting this property, you may want to also adjust table.file.max; the goal is to avoid the situation where only merging minor compactions occur.

Type: FRACTION
Zookeeper Mutable: yes
Default Value: 3
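To make the ratio concrete, the sketch below applies it to a list of file sizes: a set of files qualifies when ratio * (largest file) <= (total size), and the largest file is dropped until that holds. This is a simplified, hypothetical illustration (the class name is invented); Accumulo's own compaction strategy may apply additional rules, for example around table.file.max, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified, hypothetical sketch of the ratio test behind table.compaction.major.ratio. */
final class MajorCompactionRatio {
    /**
     * Starting from all of a tablet's files, repeatedly drops the largest file until
     * ratio * (largest remaining size) <= (total remaining size). Returns the sizes
     * of the selected files, or an empty list if fewer than two files remain.
     */
    static List<Long> select(List<Long> fileSizes, double ratio) {
        List<Long> candidates = new ArrayList<>(fileSizes);
        Collections.sort(candidates); // ascending, so the largest file is last
        while (candidates.size() > 1) {
            long total = 0;
            for (long size : candidates) {
                total += size;
            }
            long largest = candidates.get(candidates.size() - 1);
            if (largest * ratio <= total) {
                return candidates;
            }
            candidates.remove(candidates.size() - 1); // too lopsided: drop the largest file
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // One big file plus three small ones, with the default ratio of 3
        System.out.println(select(Arrays.asList(100L, 10L, 10L, 10L), 3)); // [10, 10, 10]
        System.out.println(select(Arrays.asList(100L, 10L), 3));           // []
    }
}
```

In the first call, compacting the 100-byte file together with three 10-byte files would mostly rewrite the large file, so it is left out and only the three small files are merged.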

table.compaction.minor.idle

After a tablet has been idle (no mutations) for this time period it may have its in-memory map flushed to disk in a minor compaction. There is no guarantee an idle tablet will be compacted.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

table.compaction.minor.logs.threshold

When there are more than this many write-ahead logs against a tablet, it will be minor compacted. See the comment for property tserver.memory.maps.max.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 3

table.durability

The durability used to write to the write-ahead log. Legal values are: none, which skips the write-ahead log; log, which sends the data to the write-ahead log, but does nothing to make it durable; flush, which pushes data to the file system; and sync, which ensures the data is written to disk.

Type: DURABILITY
Zookeeper Mutable: yes
Default Value: sync

table.failures.ignore

If you want queries for your table to hang or fail when data is missing from the system, set this to false. When this is set to true, missing data will be reported, but queries will still run, possibly returning a subset of the data.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

table.file.blocksize

Overrides the Hadoop dfs.block.size setting so that files have better query performance. The maximum value for this is 2147483647.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 0B

table.file.compress.blocksize

Similar to the Hadoop io.seqfile.compress.blocksize setting, so that files have better query performance. The maximum value for this is 2147483647. (This setting is the size threshold prior to compression, and applies even when compression is disabled.)

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 100K

table.file.compress.blocksize.index

Determines how large index blocks can be in files that support multilevel indexes. The maximum value for this is 2147483647. (This setting is the size threshold prior to compression, and applies even when compression is disabled.)

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 128K

table.file.compress.type

One of gz, lzo, or none.

Type: STRING
Zookeeper Mutable: yes
Default Value: gz

table.file.max

Determines the maximum number of files each tablet in a table can have. When adjusting this property, you may want to consider adjusting table.compaction.major.ratio also. Setting this property to 0 makes it default to tserver.scan.files.open.max-1, which prevents a tablet from having more files than can be opened. Setting this property low may throttle ingest and increase query performance.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 15

table.file.replication

Determines how many replicas to keep of a table's files in HDFS. When this value is less than or equal to 0, HDFS defaults are used.

table.replication

table.scan.max.memory

The maximum amount of memory that will be used to cache results of a client query/scan. Once this limit is reached, the buffered data is sent to the client.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 512K

table.security.scan.visibility.default

The security label that will be assumed at scan time if an entry does not have a visibility set.
Note: An empty security label is displayed as []. The scan results will show an empty visibility even if the visibility from this setting is applied to the entry.
CAUTION: If a particular key has an empty security label AND its table’s default visibility is also empty, access will ALWAYS be granted for users with permission to that table. Additionally, if this field is changed, all existing data with an empty visibility label will be interpreted with the new label on the next scan.

Type: STRING
Zookeeper Mutable: yes
Default Value: empty

table.split.endrow.size.max

Maximum size of the end row.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 10K

table.split.threshold

When the combined size of a tablet's files exceeds this amount, the tablet is split.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1G

table.walog.enabled

Deprecated. Use table.durability=none instead.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: true

A.3.15. table.custom.*

Prefix to be used for user-defined arbitrary properties.

A.3.16. table.constraint.*

Properties in this category are per-table properties that add constraints to a table. These properties start with the category prefix, followed by a number, and their values correspond to a fully qualified Java class that implements the Constraint interface.
For example:
table.constraint.1 = org.apache.accumulo.core.constraints.MyCustomConstraint
and:
table.constraint.2 = my.package.constraints.MySecondConstraint
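A minimal sketch of how the numbered constraint properties can be collected from a table's configuration, assuming the properties are available as a plain map. The class name ConstraintConfig is hypothetical; the sketch only gathers class names keyed by their number and does not load or run any Constraint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that collects table.constraint.<number> properties. */
final class ConstraintConfig {
    static final String PREFIX = "table.constraint.";

    /** Maps each constraint number to the configured class name. */
    static TreeMap<Integer, String> constraints(Map<String, String> tableProps) {
        TreeMap<Integer, String> byNumber = new TreeMap<>();
        for (Map.Entry<String, String> e : tableProps.entrySet()) {
            String key = e.getKey();
            if (!key.startsWith(PREFIX)) {
                continue;
            }
            String suffix = key.substring(PREFIX.length());
            if (!suffix.matches("\\d{1,9}")) {
                continue; // the category prefix must be followed by a number
            }
            byNumber.put(Integer.parseInt(suffix), e.getValue().trim());
        }
        return byNumber;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("table.constraint.1", "org.apache.accumulo.core.constraints.MyCustomConstraint");
        props.put("table.constraint.2", "my.package.constraints.MySecondConstraint");
        props.put("table.split.threshold", "1G"); // unrelated property, ignored
        System.out.println(constraints(props));
    }
}
```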

A.3.17. table.iterator.*

Properties in this category specify iterators that are applied at various stages (scopes) of interaction with a table. These properties start with the category prefix, followed by a scope (minc, majc, scan, etc.), followed by a period, followed by a name, as in table.iterator.scan.vers or table.iterator.scan.custom. The value of each property is a number indicating the order in which the iterator is applied, followed by a comma and a class name, such as:
table.iterator.scan.vers = 10,org.apache.accumulo.core.iterators.VersioningIterator
These iterators can take options, set through additional properties named like the iterator's property but suffixed with a period, opt, another period, and an option name.
For example, table.iterator.minc.vers.opt.maxVersions = 3
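The naming scheme above can be parsed mechanically. The following sketch (hypothetical class IteratorConfig, assuming the properties are available as a plain map) groups each iterator's priority, class name, and options for one scope. It does not instantiate or apply any iterator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical parser for table.iterator.<scope>.<name> properties and their options. */
final class IteratorConfig {
    static final String PREFIX = "table.iterator.";
    static final String OPT = ".opt.";

    /** One parsed iterator: priority, class name, and options. */
    static final class ParsedIterator {
        final int priority;
        final String className;
        final Map<String, String> options = new TreeMap<>();

        ParsedIterator(int priority, String className) {
            this.priority = priority;
            this.className = className;
        }
    }

    static Map<String, ParsedIterator> parse(Map<String, String> props, String scope) {
        String scopePrefix = PREFIX + scope + ".";
        Map<String, ParsedIterator> iterators = new TreeMap<>();
        // Pass 1: table.iterator.<scope>.<name> = <priority>,<class>
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (!e.getKey().startsWith(scopePrefix)) continue;
            String name = e.getKey().substring(scopePrefix.length());
            if (name.contains(".")) continue; // an option; handled in pass 2
            String[] parts = e.getValue().split(",", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected <priority>,<class>: " + e.getValue());
            }
            iterators.put(name, new ParsedIterator(Integer.parseInt(parts[0].trim()), parts[1].trim()));
        }
        // Pass 2: table.iterator.<scope>.<name>.opt.<option> = <value>
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (!e.getKey().startsWith(scopePrefix)) continue;
            String rest = e.getKey().substring(scopePrefix.length());
            int optAt = rest.indexOf(OPT);
            if (optAt < 0) continue;
            ParsedIterator it = iterators.get(rest.substring(0, optAt));
            if (it != null) {
                it.options.put(rest.substring(optAt + OPT.length()), e.getValue());
            }
        }
        return iterators;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("table.iterator.minc.vers", "10,org.apache.accumulo.core.iterators.VersioningIterator");
        props.put("table.iterator.minc.vers.opt.maxVersions", "3");
        ParsedIterator vers = parse(props, "minc").get("vers");
        // prints: 10 org.apache.accumulo.core.iterators.VersioningIterator {maxVersions=3}
        System.out.println(vers.priority + " " + vers.className + " " + vers.options);
    }
}
```

Two passes are used so that an option is attached correctly regardless of the order in which the map returns its entries.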

A.3.18. table.iterator.scan.*

Convenience prefix to find options for the scan iterator scope.

A.3.19. table.iterator.minc.*

Convenience prefix to find options for the minc iterator scope.

A.3.20. table.iterator.majc.*

Convenience prefix to find options for the majc iterator scope.

A.3.21. table.group.*

Properties in this category are per-table properties that define locality groups in a table. These properties start with the category prefix, followed by a name, followed by a period, and followed by a property for that group.
For example table.group.group1=x,y,z sets the column families for a group called group1. Once configured, group1 can be enabled by adding it to the list of groups in the table.groups.enabled property.
Additional group options may be specified for a named group by setting table.group.<name>.opt.<key>=<value>.
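As an illustration of how table.group.* and table.groups.enabled fit together, this hypothetical helper returns the column families of each enabled group, assuming the properties are available as a plain map. Groups that are defined but not enabled are ignored, and per-group .opt. properties are not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper that resolves enabled locality groups from table.group.* properties. */
final class LocalityGroupConfig {
    static Map<String, Set<String>> enabledGroups(Map<String, String> props) {
        Map<String, Set<String>> groups = new TreeMap<>();
        String enabled = props.getOrDefault("table.groups.enabled", "");
        for (String rawName : enabled.split(",")) {
            String name = rawName.trim();
            if (name.isEmpty()) continue;
            // table.group.<name> = comma-separated column families
            String families = props.get("table.group." + name);
            if (families == null) continue; // enabled but not defined: skipped here
            Set<String> set = new TreeSet<>();
            for (String family : families.split(",")) {
                if (!family.trim().isEmpty()) set.add(family.trim());
            }
            groups.put(name, set);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("table.group.group1", "x,y,z");
        props.put("table.group.group2", "a");         // defined but not enabled
        props.put("table.groups.enabled", "group1");
        System.out.println(enabledGroups(props)); // {group1=[x, y, z]}
    }
}
```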

A.3.22. table.majc.compaction.strategy.opts.*

Properties in this category are used to configure the compaction strategy.

A.3.23. table.replication.target.*

A mapping of the other systems to which this table's data should be replicated. The key suffix is the identifying cluster name and the value is an identifier for a location on the target system, e.g. the ID of the table on the target to replicate to.

A.3.24. general.vfs.context.classpath.*

Properties in this category define a classpath. These properties start with the category prefix, followed by a context name. The value is a comma-separated list of URIs. Full regular expressions are supported on the filename alone. For example, general.vfs.context.classpath.cx1=hdfs://nn1:9902/mylibdir/*.jar. You can enable post delegation for a context, which will load classes from the context first instead of the parent first. Do this by setting general.vfs.context.classpath.<name>.delegation=post, where <name> is your context name. If delegation is not specified, it defaults to loading from the parent classloader first.

A.3.25. replication.*

Properties in this category affect the replication of data to other Accumulo instances.

replication.driver.delay

Amount of time to wait before the replication work loop begins in the master.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 0s

replication.max.unit.size

Maximum size of data to send in a replication message.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 64M

replication.max.work.queue

Upper bound of the number of files queued for replication.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1000

replication.name

Name of this cluster with respect to replication. Used to identify this instance to other peers.

Type: STRING
Zookeeper Mutable: yes
Default Value: empty

replication.receipt.service.port

Listen port used by the Thrift service in the tablet server that listens for replication.

replication.work.assignment.sleep

replication.work.attempts

Number of attempts to try to replicate some data before giving up and letting it naturally be retried later.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 10

replication.work.processor.delay

Amount of time to wait before first checking for replication work; not useful outside of tests.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 0s

replication.work.processor.period

Amount of time to wait before re-checking for replication work; not useful outside of tests.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 0s

replication.worker.threads

Size of the thread pool that each tablet server devotes to replicating data.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 4

A.3.26. replication.peer.*

Properties in this category control which systems data can be replicated to.

A.3.27. replication.peer.user.*

The username to provide when authenticating with the given peer.

A.3.28. replication.peer.password.*

The password to provide when authenticating with the given peer.

A.3.29. replication.peer.keytab.*

The keytab to use when authenticating with the given peer.

A.4. Property Types

A.4.1. duration

A non-negative integer optionally followed by a unit of time (whitespace disallowed), as in 30s.
If no unit of time is specified, seconds are assumed. Valid units are ms, s, m, h, and d for milliseconds, seconds, minutes, hours, and days.
Examples of valid durations are 600, 30s, 45m, 30000ms, 3d, and 1h.
Examples of invalid durations are 1w, 1h30m, 1s 200ms, ms, '' (the empty string), and a.
Unless otherwise stated, the max value for the duration represented in milliseconds is 9223372036854775807.
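The duration rules can be captured with a short regular expression. The sketch below is a hypothetical converter (not Accumulo's own parser) that returns the value in milliseconds and rejects each of the invalid examples listed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical converter for the duration property type. */
final class DurationProperty {
    // Digits, then an optional unit; whitespace and compound values are not allowed.
    private static final Pattern DURATION = Pattern.compile("(\\d+)(ms|s|m|h|d)?");

    /** Returns the duration in milliseconds, or throws IllegalArgumentException. */
    static long toMillis(String text) {
        Matcher m = DURATION.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid duration: '" + text + "'");
        }
        long value = Long.parseLong(m.group(1)); // NumberFormatException if too many digits
        String unit = m.group(2) == null ? "s" : m.group(2); // no unit means seconds
        long multiplier;
        switch (unit) {
            case "ms": multiplier = 1L; break;
            case "s":  multiplier = 1_000L; break;
            case "m":  multiplier = 60_000L; break;
            case "h":  multiplier = 3_600_000L; break;
            default:   multiplier = 86_400_000L; break; // "d"
        }
        return Math.multiplyExact(value, multiplier); // ArithmeticException past Long.MAX_VALUE
    }

    public static void main(String[] args) {
        System.out.println(toMillis("45m")); // 2700000
        System.out.println(toMillis("600")); // 600000, since seconds are assumed
    }
}
```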

A.4.2. date/time

A date/time string in the format YYYYMMDDhhmmssTTT, where TTT is the 3-character time zone.
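In Java, this format maps onto a SimpleDateFormat pattern. The sketch assumes that hh denotes a 24-hour clock (pattern HH) and that TTT is a zone abbreviation such as GMT; the class name is hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

/** Hypothetical parser for the date/time property type (YYYYMMDDhhmmssTTT). */
final class DateTimeProperty {
    /** Returns milliseconds since the epoch, or throws IllegalArgumentException. */
    static long toEpochMillis(String text) {
        // Assumes hh is a 24-hour clock (HH) and TTT is a zone abbreviation such as GMT.
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmssz", Locale.US);
        format.setLenient(false); // reject out-of-range fields instead of rolling them over
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("invalid date/time: '" + text + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("20150101120000GMT")); // 1420113600000
    }
}
```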

A.4.3. memory

A positive integer optionally followed by a unit of memory (whitespace disallowed), as in 2G.
If no unit is specified, bytes are assumed. Valid units are B, K, M, G, for bytes, kilobytes, megabytes, and gigabytes.
Examples of valid memory sizes are 1024, 20B, 100K, 1500M, and 2G.
Examples of invalid memory sizes are 1M500K, 1M 2K, 1MB, 1.5G, 1,024K, '' (the empty string), and a.
Unless otherwise stated, the max value for the memory represented in bytes is 9223372036854775807.
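The memory rules above can be captured with a regular expression and a bit shift. This hypothetical converter assumes binary multiples (1K = 1024 bytes, 1M = 1024K, 1G = 1024M) and uppercase unit letters only; it accepts 0 because some defaults in this appendix (such as 0B) use it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical converter for the memory property type. */
final class MemoryProperty {
    // Digits, then at most one unit letter; no whitespace, decimals, or separators.
    private static final Pattern MEMORY = Pattern.compile("(\\d+)([BKMG])?");

    /** Returns the size in bytes, or throws IllegalArgumentException. */
    static long toBytes(String text) {
        Matcher m = MEMORY.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid memory size: '" + text + "'");
        }
        long value = Long.parseLong(m.group(1));
        String unit = m.group(2) == null ? "B" : m.group(2); // no unit means bytes
        int shift;
        switch (unit) {
            case "K": shift = 10; break; // assumed: 1K = 1024 bytes
            case "M": shift = 20; break;
            case "G": shift = 30; break;
            default:  shift = 0; break;  // "B"
        }
        return Math.multiplyExact(value, 1L << shift);
    }

    public static void main(String[] args) {
        System.out.println(toBytes("100K")); // 102400
        System.out.println(toBytes("2G"));   // 2147483648
    }
}
```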

A.4.4. host list

A comma-separated list of hostnames or ip addresses, with optional port numbers.
Examples of valid host lists are localhost:2000,www.example.com,10.10.1.1:500 and localhost.
Examples of invalid host lists are '' (the empty string), :1000, and localhost:80000.
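A hypothetical validator for the host list format, assuming each entry is a hostname or IPv4 address optionally followed by :port, with ports limited to 1-65535. IPv6 literals are not handled.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for the host list property type. */
final class HostListProperty {
    // Letters, digits, dots and hyphens; must start and end with a letter or digit.
    private static final Pattern HOST = Pattern.compile("[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?");

    static boolean isValid(String text) {
        if (text.isEmpty()) {
            return false;
        }
        for (String entry : text.split(",", -1)) { // -1 keeps empty trailing entries
            String host = entry;
            int colon = entry.lastIndexOf(':');
            if (colon >= 0) {
                host = entry.substring(0, colon);
                String port = entry.substring(colon + 1);
                if (!port.matches("\\d{1,5}")) {
                    return false;
                }
                int p = Integer.parseInt(port);
                if (p < 1 || p > 65535) {
                    return false; // e.g. localhost:80000
                }
            }
            if (!HOST.matcher(host).matches()) {
                return false; // empty or malformed host, e.g. ":1000"
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("localhost:2000,www.example.com,10.10.1.1:500")); // true
        System.out.println(isValid("localhost:80000"));                              // false
    }
}
```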

A.4.5. port

A positive integer in the range 1024-65535, not already in use or specified elsewhere in the configuration.

A.4.6. count

A non-negative integer in the range 0-2147483647.

A.4.7. fraction/percentage

A floating point number that represents either a fraction or, if suffixed with the % character, a percentage.
Examples of valid fractions/percentages are 10, 1000%, 0.05, 5%, 0.2%, and 0.0005.
Examples of invalid fractions/percentages are '' (the empty string), 10 percent, and Hulk Hogan.
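A hypothetical converter for the fraction/percentage type: plain numbers are returned as-is and a trailing % divides by 100. A regular expression is checked first because Double.parseDouble on its own would also accept strings such as 1e3 or NaN, which the examples above do not allow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical converter for the fraction/percentage property type. */
final class FractionProperty {
    // A plain decimal number with an optional trailing percent sign.
    private static final Pattern NUMBER = Pattern.compile("(\\d+(\\.\\d*)?|\\.\\d+)(%)?");

    static double toFraction(String text) {
        Matcher m = NUMBER.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid fraction/percentage: '" + text + "'");
        }
        double value = Double.parseDouble(m.group(1));
        return m.group(3) != null ? value / 100.0 : value; // "5%" -> 0.05
    }

    public static void main(String[] args) {
        System.out.println(toFraction("5%"));    // 0.05
        System.out.println(toFraction("1000%")); // 10.0
    }
}
```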

A.4.8. path

A string that represents a filesystem path, which can be either relative or absolute to some directory. The filesystem depends on the property. The following environment variables will be substituted: ACCUMULO_HOME and ACCUMULO_CONF_DIR.

A.4.9. absolute path

An absolute filesystem path. The filesystem depends on the property. This is the same as path, but enforces that its root is explicitly specified.

A.4.10. java class

A fully qualified java class name representing a class on the classpath.
For example, java.lang.String rather than String.

A.4.11. java class list

A list of fully qualified java class names representing classes on the classpath.
For example, java.lang.String rather than String.

A.4.12. durability

One of none, log, flush or sync.

A.4.13. string

An arbitrary string of characters whose format is unspecified and interpreted based on the context of the property to which it applies.