Since CrateDB has sensible defaults, there is no configuration needed at all
for basic operation.

CrateDB is mainly configured via a configuration file, which is located at
config/crate.yml. The vanilla configuration file distributed with the
package lists all available settings as comments, along with their default
values.

The location of the config file can be specified upon startup like this:

sh$ ./bin/crate -Cpath.conf=/path/to/config/directory

Any setting can be configured either by the config file or via the -C
option upon startup.

For example, configuring the cluster name using command line properties works
like this:

sh$ ./bin/crate -Ccluster.name=cluster

This is exactly the same as setting the cluster name in the config file:

cluster.name = cluster

Settings are applied in the following order, where later sources override
earlier ones:

http.port

Runtime:no

This defines the TCP port range to which the CrateDB HTTP service will be
bound. It defaults to 4200-4300. The first free port in this range is
always used. If this is set to an integer value, it is treated as a single
explicit port.

The HTTP protocol is used for the REST endpoint which is used by all clients
except the Java client.

http.publish_port

Runtime:no

The port HTTP clients should use to communicate with the node. It is
necessary to define this setting if the bound HTTP port (http.port) of
the node is not directly reachable from outside, e.g. when running behind a
firewall or inside a Docker container.
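For example, when running inside a Docker container whose HTTP port is mapped
to a different port on the host, the published port could be set like this (the
port numbers are illustrative):

```yaml
# bind the HTTP service to 4200 inside the container
http.port: 4200
# tell clients to connect via the port mapped on the Docker host
http.publish_port: 42200
```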

transport.tcp.port

Runtime:no

This defines the TCP port range to which the CrateDB transport service will
be bound. It defaults to 4300-4400. The first free port in this range is
always used. If this is set to an integer value, it is treated as a single
explicit port.

The transport protocol is used for internal node-to-node communication.

transport.publish_port

Runtime:no

The port that the node publishes to the cluster for its own discovery. It is
necessary to define this setting when the bound transport port
(transport.tcp.port) of the node is not directly reachable from outside,
e.g. when running behind a firewall or inside a Docker container.

psql.port

Runtime:no

This defines the TCP port range to which the CrateDB Postgres service will
be bound. It defaults to 5432-5532. The first free port in this range is
always used. If this is set to an integer value, it is treated as a single
explicit port.
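Taken together, the three services could be pinned to explicit single ports in
crate.yml, for example:

```yaml
http.port: 4200           # REST endpoint
transport.tcp.port: 4300  # internal node-to-node communication
psql.port: 5432           # Postgres wire protocol
```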

path.conf

Runtime:no

Filesystem path to the directory containing the configuration files
crate.yml and log4j2.properties.

path.data

Runtime:no

Filesystem path to the directory where this CrateDB node stores its data
(table data and cluster metadata).

Multiple paths can be set by using a comma separated list and each of these
paths will hold full shards (instead of striping data across them). In case
CrateDB finds striped shards at the provided locations (from CrateDB
<0.55.0), these shards will be migrated automatically on startup.
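As a sketch, multiple data paths could be configured like this (paths are
illustrative):

```yaml
# each path holds complete shards; data is not striped across paths
path.data: /mnt/disk1/crate,/mnt/disk2/crate
```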

path.logs

Runtime:no

Filesystem path to a directory where log files should be stored.

Can be used as a variable inside log4j2.properties.

For example:

appender:
  file:
    file: ${path.logs}/${cluster.name}.log

path.repo

Runtime:no

A list of filesystem or UNC paths where repositories of type
fs may be stored.

Without this setting a CrateDB user could write snapshot files to any
directory that is writable by the CrateDB process. To safeguard against this
security issue, the possible paths have to be whitelisted here.
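For example (paths are illustrative):

```yaml
# only these directories may back repositories of type "fs"
path.repo: /mnt/backups,/mnt/snapshots
```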

CrateDB performs poorly when the JVM starts swapping: you should ensure that
it never swaps. If set to true, CrateDB will use the mlockall
system call on startup to ensure that the memory pages of the CrateDB process
are locked into RAM.

Defines the allowed origins of a request. * allows any origin (which can be
a substantial security risk). By prepending a /, the string will be
treated as a regular expression. For example, /https?:\/\/crate.io/ will
allow requests from http://crate.io and https://crate.io. This
setting disallows any origin by default.
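The regular expression example above could be configured like this, assuming
the http.cors.* setting names (a sketch, not authoritative):

```yaml
http.cors.enabled: true
# allow requests from http://crate.io and https://crate.io only
http.cors.allow-origin: /https?:\/\/crate.io/
```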

This setting defines the maximum number of elements an array can have so
that the != ANY(), LIKE ANY() and NOT LIKE ANY() operators
can be applied to it.

Note

Increasing this value to a large number (e.g. 10M) and applying those
ANY operators to arrays of that length can lead to heavy memory
consumption, which could cause nodes to crash with OutOfMemory exceptions.

All currently applied cluster settings can be read by querying the
sys.cluster.settings column. Most cluster settings can be changed at
runtime using the SET/RESET statement. Whether this is possible is
documented for each setting.

A boolean indicating whether or not to collect statistical information about
the cluster.

Note

Enabling the collection of statistical information incurs a performance
penalty, as details about every job and operation across the cluster will
cause data to be inserted into the corresponding system tables.

stats.jobs_log_size

Default:10000

Runtime:yes

The maximum number of job records to be kept in the sys.jobs_log table on each node.

A job record corresponds to a single SQL statement to be executed on the
cluster. These records are used for performance analytics. A larger job log
produces more comprehensive stats, but uses more RAM.

Older job records are deleted as newer records are added, once the limit is
reached.
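Since this is a runtime setting, it could, for example, be adjusted with a
SET statement (the value is illustrative):

```sql
-- keep up to 25000 job records per node
SET GLOBAL PERSISTENT "stats.jobs_log_size" = 25000;
```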

stats.operations_log_size

Default:10000

Runtime:yes

The maximum number of operations records to be kept in the
sys.operations_log table on each node.

A job consists of one or more individual operations. Operations records are
used for performance analytics. A larger operations log produces more
comprehensive stats, but uses more RAM.

Older operations records are deleted as newer records are added, once the
limit is reached.

Setting this value to 0 disables collecting operations information.

stats.operations_log_expiration

Default:0s (disabled)

Runtime:yes

Entries of sys.operations_log are cleared by a periodic job when they
are older than the specified expiration time. This setting overrides
stats.operations_log_size. If the value is set to 0, the time based
log entry eviction is disabled.
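As a runtime setting, the expiration could be enabled like this (the interval
is illustrative):

```sql
-- evict operations log entries older than four hours
SET GLOBAL TRANSIENT "stats.operations_log_expiration" = '4h';
```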

This field expects a time value either as a long or double or alternatively
as a string literal with a time suffix (ms, s, m, h, d,
w).

If the value provided is 0 then the refresh is disabled.

Note

Using a very small value can cause a high load on the cluster.

Settings that control the behaviour of the stats circuit breaker. There are
two breakers in place: one for the jobs log and one for the operations log.
The breaker limit can be set for each of them.

By default, when the CrateDB process stops, it simply shuts down, possibly
making some shards unavailable. This leads to a red cluster state and causes
queries that require the now unavailable shards to fail. In order to safely
shut down a CrateDB node, the graceful stop procedure can be used.

The following cluster settings can be used to change the shutdown behaviour of
nodes of the cluster:

cluster.graceful_stop.min_availability

Default:primaries

Runtime:yes

Allowed Values:none|primaries|full

none: No minimum data availability is required. The node may shut down
even if records are missing after shutdown.

primaries: At least all primary shards need to be available after the node
has shut down. Replicas may be missing.

full: All records and all replicas need to be available after the node
has shut down. Data availability is full.

Note

This option is ignored if there is only 1 node in a cluster!

cluster.graceful_stop.reallocate

Default:true

Runtime:yes

true: The graceful stop command allows shards to be reallocated
before shutting down the node in order to ensure the minimum data
availability set with min_availability.

false: The graceful stop command will fail if the cluster would need
to reallocate shards in order to ensure the minimum data availability set
with min_availability.

Note

Make sure you have enough nodes and enough disk space for the
reallocation.

cluster.graceful_stop.timeout

Default:2h

Runtime:yes

Defines the maximum waiting time for the reallocation process to finish. The
force setting defines the behaviour when the shutdown process runs into
this timeout.

The timeout expects a time value either as a long or double or alternatively
as a string literal with a time suffix (ms, s, m, h, d,
w).

cluster.graceful_stop.force

Default:false

Runtime:yes

Defines whether a graceful stop should force stopping of the node if it
runs into the timeout specified with the
cluster.graceful_stop.timeout setting.
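Since these are runtime settings, the shutdown behaviour could be adjusted
before a maintenance window, for example (values are illustrative):

```sql
SET GLOBAL TRANSIENT "cluster.graceful_stop.min_availability" = 'full';
SET GLOBAL TRANSIENT "cluster.graceful_stop.timeout" = '4h';
SET GLOBAL TRANSIENT "cluster.graceful_stop.force" = true;
```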

discovery.zen.minimum_master_nodes

Default:1

Runtime:yes

Set to ensure a node sees N other master-eligible nodes in order to be
considered operational within the cluster. It is recommended to set this to
a value higher than 1 when running more than 2 nodes in the cluster.
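For a cluster of three master-eligible nodes, a quorum of (3 / 2) + 1 = 2 is a
common choice to avoid split-brain scenarios (a sketch for crate.yml):

```yaml
# quorum for 3 master-eligible nodes: (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```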

discovery.zen.ping_timeout

Default:3s

Runtime:yes

Set the time to wait for ping responses from other nodes when discovering.
Set this option to a higher value on a slow or congested network to minimize
discovery failures.

discovery.zen.publish_timeout

Default:30s

Runtime:yes

Time a node is waiting for responses from other nodes to a published cluster
state.

Note

Multicast used to be an option for node discovery, but was deprecated in
CrateDB 1.0.3 and removed in CrateDB 1.1.

CrateDB has built-in support for several different mechanisms of node
discovery. The simplest mechanism is to specify a list of hosts in the
configuration file.

discovery.zen.ping.unicast.hosts

Default:not set

Runtime:no
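A static list of unicast hosts might be configured in crate.yml like this
(hostnames are illustrative; 4300 is the default transport port):

```yaml
discovery.zen.ping.unicast.hosts:
  - node1.example.com:4300
  - node2.example.com:4300
  - node3.example.com:4300
```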

Currently there are two other discovery types: via DNS and via EC2 API.

When a node starts up with one of these discovery types enabled, it performs a
lookup using the settings for the specified mechanism listed below. The hosts
and ports retrieved from the mechanism will be used to generate a list of
unicast hosts for node discovery.

The same lookup is also performed by all nodes in a cluster whenever the master
is re-elected (see Cluster Meta Data).

all allows all shard allocations, the cluster can allocate all kinds of
shards.

none allows no shard allocations at all. No shard will be moved or
created.

primaries only primaries can be moved or created. This includes existing
primary shards.

new_primaries allows allocations for new primary shards only. This means
that for example a newly added node will not allocate any replicas. However
it is still possible to allocate new primary shards for new indices. Whenever
you want to perform a zero downtime upgrade of your cluster you need to set
this value before gracefully stopping the first node and reset it to all
after starting the last updated node.
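The zero downtime upgrade procedure described above could be sketched with
runtime statements like these:

```sql
-- before gracefully stopping the first node
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'new_primaries';

-- after the last upgraded node has been started again
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'all';
```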

Note

This allocation setting has no effect on the recovery of primary shards! Even
when cluster.routing.allocation.enable is set to none, nodes will
recover their unassigned local primary shards immediately after restart, in
case the recovery.initial_shards setting is satisfied.

cluster.routing.allocation.allow_rebalance

Default:indices_all_active

Runtime:yes

Allowed Values:always|indices_primary_active|indices_all_active

Allows control over when rebalancing will happen, based on the total state of
all the index shards in the cluster. Defaults to indices_all_active to
reduce chatter during initial recovery.

cluster.routing.allocation.cluster_concurrent_rebalance

Default:2

Runtime:yes

Define how many concurrent rebalancing tasks are allowed cluster wide.

cluster.routing.allocation.node_initial_primaries_recoveries

Default:4

Runtime:yes

Defines the number of initial primary shard recoveries that are allowed per
node. Since the local gateway is used most of the time, these recoveries are
fast, so more of them can be handled per node without creating excessive
load.

Defines node attributes which will be used for allocation awareness of shards
and their replicas. For example, let’s say we have defined an attribute
rack_id and we start 2 nodes with node.rack_id set to rack_one, and
deploy a single table with 5 shards and 1 replica. The table will be fully
deployed on the current nodes (5 shards and 1 replica each, 10 shards in
total).

Now, if we start two more nodes, with node.rack_id set to rack_two,
shards will relocate to even the number of shards across the nodes, but a
shard and its replica will not be allocated in the same rack_id value.
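The rack example above could be configured like this (attribute name and
values are user-defined):

```yaml
# in crate.yml on every node
cluster.routing.allocation.awareness.attributes: rack_id

# on the nodes in the first rack
node.rack_id: rack_one
```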

The awareness attributes can hold several values.

cluster.routing.allocation.awareness.force.*.values

Runtime:no

Attributes on which shard allocation will be forced. * is a placeholder
for the awareness attribute, which can be defined using the
cluster.routing.allocation.awareness.attributes setting. Let’s say we
configured an awareness attribute zone and the values zone1,zone2
here, start 2 nodes with node.zone set to zone1 and create a table
with 5 shards and 1 replica. The table will be created, but only 5 shards
will be allocated (with no replicas). Only when we start more nodes with
node.zone set to zone2 will the replicas be allocated.
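The zone example above corresponds to a configuration like this (names and
values are illustrative):

```yaml
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
```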

All these values are relative to one another. The first three are used to
compose three separate weighting functions into one. The cluster is balanced
when no allowed action can bring the weights of each node closer together by
more than the fourth setting. Actions might not be allowed, for instance, due
to forced awareness or allocation filtering.

cluster.routing.allocation.balance.shard

Default:0.45f

Runtime:yes

Defines the weight factor for shards allocated on a node (float). Raising
this raises the tendency to equalize the number of shards across all nodes in
the cluster.

cluster.routing.allocation.balance.index

Default:0.55f

Runtime:yes

Defines a factor to the number of shards per index allocated on a specific
node (float). Increasing this value raises the tendency to equalize the
number of shards per index across all nodes in the cluster.

cluster.routing.allocation.balance.threshold

Default:1.0f

Runtime:yes

Minimal optimization value of operations that should be performed
(non-negative float). Increasing this value will cause the cluster to be less
aggressive about optimising the shard balance.

cluster.routing.allocation.disk.watermark.low

Default:85%

Runtime:yes

Defines the lower disk threshold limit for shard allocations. New shards will
not be allocated on nodes with disk usage greater than this value. It can
also be set to an absolute bytes value (e.g. 500mb) to prevent the
cluster from allocating new shards on nodes with less free disk space than
this value.

cluster.routing.allocation.disk.watermark.high

Default:90%

Runtime:yes

Defines the higher disk threshold limit for shard allocations. The cluster
will attempt to relocate existing shards to another node if the disk usage on
a node rises above this value. It can also be set to an absolute bytes value
(e.g. 500mb) to relocate shards away from nodes with less free disk space
than this value.

By default, the cluster will retrieve information about the disk usage of the
nodes every 30 seconds. This can also be changed by setting the
cluster.info.update.interval setting.
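As a sketch, the watermarks could be set in crate.yml (the values shown are
the defaults):

```yaml
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
# how often disk usage information is refreshed
cluster.info.update.interval: 30s
```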

indices.recovery.max_bytes_per_sec

Default:40mb

Runtime:yes

Specifies the maximum number of bytes that can be transferred per second
during shard recovery. Throttling can be disabled by setting it to 0. This
setting allows you to control the network usage of the recovery process.
Higher values may result in higher network utilization, but also in a faster
recovery process.

indices.recovery.retry_delay_state_sync

Default:500ms

Runtime:yes

Defines the time to wait after an issue caused by cluster state syncing
before retrying to recover.

indices.recovery.retry_delay_network

Default:5s

Runtime:yes

Defines the time to wait after an issue caused by the network before retrying
to recover.

indices.recovery.internal_action_timeout

Default:15m

Runtime:yes

Defines the timeout for internal requests made as part of the recovery.

indices.recovery.internal_action_long_timeout

Default:30m

Runtime:yes

Defines the timeout for internal requests made as part of the recovery that
are expected to take a long time. Defaults to twice
internal_action_timeout.

The Query circuit breaker will keep track of the used memory during the
execution of a query. If a query consumes too much memory or if the cluster is
already near its memory limit it will terminate the query to ensure the cluster
keeps working.

indices.breaker.query.limit

Default:60%

Runtime:yes

Specifies the limit for the query breaker. Provided values can either be
absolute values (interpreted as a number of bytes), byte sizes (e.g. 1mb) or
a percentage of the heap size (e.g. 12%). A value of -1 disables breaking
the circuit while still accounting for memory usage.
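For example, the breaker limit could be lowered at runtime (the value is
illustrative):

```sql
SET GLOBAL TRANSIENT "indices.breaker.query.limit" = '50%';
```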

indices.breaker.query.overhead

Default:1.09

Runtime:no

A constant that all data estimations are multiplied with to determine a final
estimation.

The gateway persists cluster meta data on disk every time the meta data
changes. This data is stored persistently across full cluster restarts and
recovered after nodes are started again.

gateway.expected_nodes

Default:-1

Runtime:no

The setting gateway.expected_nodes defines the number of nodes that
need to be started for the cluster state to be recovered immediately. The
value of the setting should be equal to the number of nodes in the cluster,
because you only want the cluster state to be recovered after all nodes are
started.

gateway.recover_after_time

Default:0ms

Runtime:no

The gateway.recover_after_time setting defines the time to wait before
starting the recovery once the number of nodes defined in
gateway.recover_after_nodes are started. The setting is relevant if
gateway.recover_after_nodes is less than gateway.expected_nodes.

gateway.recover_after_nodes

Default:-1

Runtime:no

The gateway.recover_after_nodes setting defines the number of nodes that
need to be started before the cluster state recovery will start. Ideally the
value of the setting should be equal to the number of nodes in the cluster,
because you only want the cluster state to be recovered once all nodes are
started. However, the value must be greater than half of the expected number
of nodes in the cluster.
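For a three-node cluster, the gateway settings might be combined like this
(the wait time is illustrative):

```yaml
gateway.expected_nodes: 3
gateway.recover_after_nodes: 2
gateway.recover_after_time: 5m
```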

CrateDB comes with Log4j 2 out of the box. Logging is configured via the
configuration file located at config/log4j2.properties, which ships with
a small working example configuration.

It’s possible to set the log level of loggers at runtime. This is particularly
useful when debugging problems and there is a need to increase the log level
without wanting to restart nodes. Logging settings are cluster wide and
override the logging configuration of nodes defined in their
log4j2.properties.

The RESET statement is also supported, with the limitation that the
reset of a logging override only takes effect after a cluster restart.

To set the log level you can use the regular SET statement, for
example:

SET GLOBAL TRANSIENT "logger.action" = 'INFO';

The logging setting consists of the prefix logger and a variable suffix
which defines the name of the logger that the log level should be applied to.
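For example, to raise the log level of a logger named io.crate.planner (the
logger name is illustrative):

```sql
SET GLOBAL TRANSIENT "logger.io.crate.planner" = 'DEBUG';
```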

Specifies the home directory of the installation. It is used to find default
file paths like config/crate.yml and the default data directory
location. This variable is usually defined by the start-up script shipped
with the distribution. In most cases it is the parent directory of the
directory containing the bin/crate executable.

CRATE_HOME:

Home directory of CrateDB installation. Used to refer
to default config files, data locations, log files, etc. All
configured relative paths will use this directory as a parent.

This variable specifies the amount of memory that can be used by the JVM.

The value of the environment variable can be suffixed with g or m. For
example:

CRATE_HEAP_SIZE=4g

Certain operations in CrateDB require a lot of records to be held in memory
at a time. If the amount of heap that can be allocated by the JVM is too low,
these operations will fail with an OutOfMemory exception.

So it’s important to choose a value high enough for the intended use case.
However, there are two limitations:

Be aware that there is another user of memory besides CrateDB’s heap: the
underlying storage engine, Lucene. Lucene leverages the operating system for
caching in-memory data structures by design. Lucene indexes are split into
several segment files; every file is immutable and will never change. This
makes them very cache-friendly, and the underlying OS will keep hot segments
resident in memory for faster access. So if all system memory is assigned to
CrateDB’s heap, there won’t be any left over for Lucene, which can cause
serious performance impacts.

Note

A good recommendation is to assign 50% of the available memory to CrateDB’s
heap while leaving the other 50% free. This memory will not go unused:
Lucene will use whatever is left over.

Compressed object pointers (compressed oops) are pointers to Java objects in
the heap that consume only 32 bits, which saves a lot of space. The actual
native 64 bit pointers are computed by scaling the 32 bit value by a factor
of 8 and adding it to a base heap address. This allows the JVM to address
about 32 GB of heap.

If you configure your heap to more than 32 GB, compressed oops can no longer
be used. In effect, there will be much less space available in the heap, as
object pointers now consume twice as much memory.

This boundary should be considered an upper bound for the heap size of any JVM
application.

Note

In order to ensure that Compressed Oops are used no matter what JVM
CrateDB runs on, configuring the heap to a value less than or equal to 30.5
GB (30500m) is suggested, as some JVMs only support Compressed Oops
up to that value.

If hardware with much more RAM is available, it is suggested to run more than
one CrateDB instance on that machine with each one having a heap size of around
30.5 GB (30500m). But still leave half of the available RAM to Lucene.

In this case consider adding cluster.routing.allocation.same_shard.host: true
to your config. This will prevent allocating the primary and a replica of the
same shard on the same machine, even if more than one instance is running on
it.

CrateDB uses a hybrid MMap/NIO file system storage directory to store its
indices. Out-of-memory exceptions and failing bootstrap checks may be caused by
the operating system’s default limit on max_map_count (the maximum number
of memory map areas a process may have) being too low.

On Linux, you can increase the limit by running the following command:

sh$ sudo sysctl -w vm.max_map_count=262144

To set this value permanently, update the vm.max_map_count setting in
/etc/sysctl.conf. You can then verify the value after rebooting by running:

sh$ sysctl vm.max_map_count

Authentication settings (auth.host_based.*) are node settings, which means
that their values apply only to the node where they are applied and different
nodes may have different authentication settings.

auth.host_based.enabled

Runtime:no

Default:false

Setting to enable or disable Host Based Authentication (HBA). It is disabled
by default.

The auth.host_based.config setting is a group setting that can have zero,
one or multiple groups that are defined by their group key (${order}) and
their fields (user, address, method, protocol, ssl).
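A sketch of such a configuration in crate.yml (users, addresses and order
keys are illustrative):

```yaml
auth:
  host_based:
    enabled: true
    config:
      a:
        # trust local connections from the "crate" user
        user: crate
        address: 127.0.0.1
        method: trust
      b:
        # require a password for everything else
        method: password
```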

${order}:

An identifier that is used as a natural order key when looking up the host
based configuration entries. For example, an order key of a will be
looked up before an order key of b. This key guarantees that the entry