The cassandra.yaml file is the main
configuration file for DataStax Enterprise. The dse.yaml file is the primary
configuration file for security, DSE Search, DSE Graph, and DSE
Analytics.

Important: After changing properties in the
cassandra.yaml file, you must restart
the node for the changes to take effect.

Syntax

For the properties in each section, the parent setting has zero leading spaces. Each child entry is indented by at least two spaces. Adhere to YAML syntax and retain the spacing.

Literal default values are shown as
literal.

Calculated values are shown as
calculated.

Default values that are not defined are shown as
Default: none.

Internally defined default values are described.

Note: Default values can be defined
internally, commented out, or have implementation
dependencies on other properties in the
cassandra.yaml file.
Additionally, some commented-out values may not
match the actual default values. The commented out
values are recommended alternatives to the default
values.

If you have changed any of the default directories during
installation, set these properties to the new locations. Make sure you have root
access.

data_file_directories

The directory where table data is stored on disk. The
database distributes data evenly across the
location, subject to the granularity of the
configured compaction strategy. If not set, the
directory is
$DSE_HOME/data/data.

commitlog_directory

The directory where the commit log is stored. If not set, the directory is $DSE_HOME/data/commitlog.

For
optimal write performance, place the commit log on
a separate disk partition, or ideally on a
separate physical device from the data file
directories. Because the commit log is append
only, a hard disk drive (HDD) is
acceptable.

Default: /var/lib/cassandra/commitlog

cdc_raw_directory

The directory where the change data capture (CDC) commit
log segments are stored on flush. DataStax
recommends a physical device that is separate from
the data directories. If not set, the directory is
$DSE_HOME/data/cdc_raw.

Default: /var/lib/cassandra/cdc_raw

hints_directory

The directory in which hints are stored. If not set, the
directory is $CASSANDRA_HOME/data/hints.

Default: /var/lib/cassandra/hints

saved_caches_directory

The directory location where table key and row caches
are stored. If not set, the directory is
$DSE_HOME/data/saved_caches.

Default: /var/lib/cassandra/saved_caches
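If you moved these directories during installation, the settings might look like the following sketch. The property names match the entries above; the paths are illustrative only.

data_file_directories:
    - /data1/cassandra/data
commitlog_directory: /commitlog/cassandra/commitlog
cdc_raw_directory: /data1/cassandra/cdc_raw
hints_directory: /data1/cassandra/hints
saved_caches_directory: /data1/cassandra/saved_caches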

Commonly used properties

Properties most frequently used when configuring DataStax
Enterprise.

Before starting a node for the first time, DataStax recommends that you
carefully evaluate your requirements.

A class that implements the IEndpointSnitch interface.
The database uses the snitch to locate nodes and
route requests.

Important: Use only
snitch implementations bundled with DSE.

DseSimpleSnitch

Appropriate only for
development deployments. Proximity is determined
by DSE workload, which places transactional,
analytics, and search nodes into their separate
datacenters. Does not recognize datacenter or rack
information.

Ec2Snitch

For EC2 deployments in a single region. Loads region and availability zone information from the Amazon EC2 API. The region is treated as the datacenter, the availability zone is treated as the rack, and only private IP addresses are used. For this reason, Ec2Snitch does not work across multiple regions.

Ec2MultiRegionSnitch

Uses the public IP as
the broadcast_address to allow cross-region
connectivity. This means you must also set seed addresses to the public IP and open
the storage_port or ssl_storage_port on the public IP
firewall. For intra-region traffic, the database
switches to the private IP after establishing a
connection.

RackInferringSnitch

Proximity is determined
by rack and datacenter, which are assumed to
correspond to the 3rd and 2nd octet of each node's
IP address, respectively. Best used as an example
for writing a custom snitch class (unless this
happens to match your deployment
conventions).

GoogleCloudSnitch

Use for deployments on
Google Cloud
Platform across one or more regions. The
region is treated as a datacenter and the
availability zones are treated as racks within the
datacenter. All communication occurs over private
IP addresses within the same logical
network.
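For example, a deployment on Google Cloud Platform might select the snitch with a single line; endpoint_snitch is assumed here as the standard cassandra.yaml key for this setting.

endpoint_snitch: GoogleCloudSnitch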

The addresses of hosts that are designated as contact
points in the cluster. A joining node contacts one
of the nodes in the -seeds list to learn the
topology of the ring.

Important: Use
only seed provider implementations bundled with
DSE.

class_name - The class that handles the seed
logic. It can be customized, but this is typically
not
required.

Default: org.apache.cassandra.locator.SimpleSeedProvider

- seeds - A comma delimited list
of addresses that are used by gossip for
bootstrapping new nodes joining a cluster. If your
cluster includes multiple nodes, you must change
the list from the default value to the IP address
of one of the nodes.

Default: "127.0.0.1"

Attention: Making every node a seed node is not
recommended because of increased maintenance and reduced gossip performance. Gossip
optimization is not critical, but it is recommended to use a small seed list
(approximately three nodes per datacenter).
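A minimal seed provider entry, using the standard cassandra.yaml layout, might look like the following sketch; the IP addresses are illustrative and should list a few nodes per datacenter.

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.1.10,10.10.1.11,10.10.2.10"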

Memtable settings

Total permitted memory to use for memtables. When this
threshold is exceeded, writes are not accepted until
a flush completes. A flush is triggered based on
memtable_cleanup_threshold. When not set:

If the deprecated settings are not present,
the calculated default is 1/4 of the heap
size.

If the deprecated memtable_heap_space_in_mb or
memtable_offheap_space_in_mb settings are present,
an error is logged and the appropriate value is
used based on memtable_allocation_type.
Remove the deprecated
settings.

The number of memtable flush writer threads. These
threads are blocked by disk I/O, and each one holds
a memtable in memory while blocked. Set this value
to the smaller of the number of disks or number of
cores, with a minimum of 2 and a maximum of 8.

Tip: Four flush writers is generally enough
to flush on a fast disk array mounted as a single
data directory. Adding more flush writers will
result in smaller, more frequent flushes that
introduce more compaction overhead. There is a
direct tradeoff between the number of memtables
that can be flushed concurrently and flush size
and frequency. More flush writers is not better.
Ensure that you set enough flush writers to
prevent stalls waiting for flushing to free
memory.

Replaced by memtable_space_in_mb. The amount of off-heap memory allocated for
memtables. The database uses the total of this
amount and the value of memtable_heap_space_in_mb to
set a threshold for automatic memtable flush.

Default: calculated 1/4 of heap size (2048)
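As an illustrative sketch only, a node that caps memtable memory at 2 GB and uses four flush writers (per the tip above) might set the current, non-deprecated properties as follows; memtable_flush_writers is assumed to be the property name for the flush writer count.

memtable_space_in_mb: 2048
memtable_flush_writers: 4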

Common automatic backup settings

incremental_backups: false
snapshot_before_compaction: false

incremental_backups

Whether to enable incremental backups.

true - Enable incremental backups to create a
hard link to each SSTable flushed or streamed
locally in a backups
subdirectory of the keyspace data. Incremental
backups enable storing backups off site without
transferring entire snapshots.

Important: The database does not
automatically clear incremental backup files.
DataStax recommends setting up a process to clear
incremental backup hard links each time a new
snapshot is created.

commitlog_sync

The method the database uses to acknowledge writes relative to syncing the commit log to disk:

periodic - Send ACK signal for writes immediately. The commit log is synced to disk every commitlog_sync_period_in_ms.

group - Send ACK signal for writes after the commit log has been flushed to disk. Wait up to commitlog_sync_group_window_in_ms between flushes.

batch - Send ACK signal for writes after the
commit log has been flushed to disk. Each incoming
write triggers the flush task.

Default: periodic

commitlog_sync_period_in_ms

Use with commitlog_sync: periodic. Time
interval between syncing the commit log to disk.
Periodic syncs are acknowledged
immediately.

Default: 10000

commitlog_sync_group_window_in_ms

Use with commitlog_sync: group. The
time that the database waits between flushing the
commit log to disk. DataStax recommends using
group instead of
batch.

Default: commented out (1000)

commitlog_sync_batch_window_in_ms

Deprecated. Use with commitlog_sync:
batch. The maximum length of time that
queries may be batched
together.

Default: commented out (2)
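The following sketch shows the default periodic mode paired with its sync period, with the group alternative commented out; values mirror the defaults described above.

commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
# commitlog_sync: group
# commitlog_sync_group_window_in_ms: 1000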

commitlog_segment_size_in_mb

The size of an individual commitlog file segment. A
commitlog segment may be archived, deleted, or
recycled after all its data has been flushed to
SSTables. This data can potentially include
commitlog segments from every table in the system.
The default size is usually suitable, but for
commitlog archiving you might want a finer
granularity; 8 or 16 MB is reasonable.

The maximum size of a mutation before the mutation is rejected. Before increasing the size of the commitlog segments, investigate why the mutations are larger than expected. Look for underlying issues with access patterns and data model, because increasing the commitlog segment size is a limited fix. When not set, the default is calculated as (commitlog_segment_size_in_mb * 1024) / 2.
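For illustration, assuming a 32 MB commitlog segment size (an assumed value, not stated here), the calculation works out to:

(32 * 1024) / 2 = 16384 KB (16 MB per mutation)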

Default: calculated

commitlog_total_space_in_mb

Total space for commit logs. If the total space used by
all commit logs exceeds this threshold, the database
rounds up to the next nearest segment multiple and
flushes memtables to disk for the oldest commitlog
segments, removing those log segments from the
commit log. This flushing reduces the amount of data
to replay on start-up, and prevents infrequently
updated tables from keeping commitlog segments
indefinitely. If the
commitlog_total_space_in_mb is
small, the result is more flush activity on
less-active tables.

Change-data-capture (CDC) space settings

Whether to enable change data capture (CDC)
functionality on a per-node basis. This modifies the
logic used for write path allocation rejection.

true - Use CDC functionality. Mutations that include a CDC-enabled table are rejected when the space limit in cdc_raw_directory is reached.

false - Standard behavior; mutations are never rejected.

Default: false

cdc_total_space_in_mb

Total space to use for change-data-capture (CDC) logs on disk. If the space allocated for CDC exceeds this value, the database throws WriteTimeoutException on mutations that include CDC-enabled tables. A CDCCompactor (a consumer) is responsible for parsing the raw CDC logs and deleting them when parsing is completed.

Default: calculated (4096 or 1/8th of the total space of the drive where the cdc_raw_directory resides)

cdc_free_space_check_interval_ms

Interval between checks for new available space for
CDC-tracked tables when the cdc_total_space_in_mb threshold is reached
and the CDCCompactor is running behind or
experiencing back pressure. When not set, the
default is
250.
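A sketch of a node with CDC turned on, using the space limit and check interval described above; cdc_enabled is assumed to be the property name for the per-node toggle.

cdc_enabled: true
cdc_total_space_in_mb: 4096
# cdc_free_space_check_interval_ms: 250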

The number of concurrent compaction processes allowed to
run simultaneously on a node, not including
validation compactions for anti-entropy repair. Simultaneous
compactions help preserve read performance in a
mixed read-write workload by limiting the number of
small SSTables that accumulate during a single
long-running compaction. If your data directories
are backed by SSDs,
increase this value to the number of cores. If compaction is running too slowly or too fast, adjust compaction_throughput_mb_per_sec first.

Important: Increasing concurrent
compactors leads to more use of available disk
space for compaction, because concurrent
compactions happen in parallel, especially for
STCS. Ensure that adequate disk space is available
before increasing this configuration.

Generally, the calculated default value is appropriate and
does not need adjusting. DataStax recommends contacting the DataStax Services team before changing this value.

Default: calculated (the smaller of the number of disks or the number of cores, with a minimum of 2 and a maximum of 8)

concurrent_validations

Number of simultaneous repair validations to allow. When
not set, the default is unbounded. Values less than
one are interpreted as unbounded.

Default: commented out (0) unbounded

concurrent_materialized_view_builders

Number of materialized view builder tasks that are allowed to run concurrently. When a view is created, the node ranges are split into (num_processors * 4) builder tasks and submitted to this executor.

Default: 2

sstable_preemptive_open_interval_in_mb

The size of the SSTables to trigger preemptive opens.
The compaction process opens SSTables before they
are completely written and uses them in place of the
prior SSTables for any range previously written.
This process helps to smoothly transfer reads
between the SSTables by reducing cache churn and
keeps hot rows hot.

Important: A low
value has a negative performance impact and will
eventually cause heap pressure and GC activity.
The optimal value depends on hardware and
workload.

Default: 50

Cache and index settings

column_index_size_in_kb: 16
# file_cache_size_in_mb: 4096

column_index_size_in_kb

Granularity of the index of rows within a partition. For
huge rows, decrease this setting to improve seek
time. Lower density nodes might benefit from
decreasing this value to 4, 2, or
1.

Default: 16

file_cache_size_in_mb

Maximum memory to use for buffer pooling and SSTable
chunk cache. 32 MB is reserved for pooling buffers; the remaining memory is the cache for holding recent or frequently used index pages and uncompressed
SSTable chunks. This pool is allocated off heap and
is in addition to the memory allocated for heap.
When manually tuning the size, consider:

Throttle for the throughput of all outbound streaming
file transfers on a node. The database does mostly
sequential I/O when streaming data during bootstrap
or repair which can saturate the network connection
and degrade client (RPC) performance. When not set,
the value is 200
Mbps.

Note: Should be
set to a value less than or equal to
stream_throughput_outbound_megabits_per_sec
since it is a subset of total
throughput.

Default: commented out (200)

streaming_keep_alive_period_in_secs

Interval to send keep-alive messages. The stream session
fails when a keep-alive message is not received for
2 keep-alive cycles. When not set, the default is
300 seconds (5 minutes) so that a stalled stream
times out in 10
minutes.

Default: commented out (300)

streaming_connections_per_host

Maximum number of connections per host for streaming.
Increase this value when you notice that joins are
CPU-bound, rather than network-bound. For example, a
few nodes with large files. When not set, the
default is
1.

Default: commented out (1)

Fsync settings

trickle_fsync: true
trickle_fsync_interval_in_kb: 10240

trickle_fsync

When set to true, fsync forces the operating system to flush dirty buffers at the interval specified by trickle_fsync_interval_in_kb. Enable this parameter to prevent sudden dirty buffer flushing from impacting read latencies. Recommended for use with SSDs, but not with HDDs.

Default: false

trickle_fsync_interval_in_kb

The size of the fsync in
kilobytes.

Default: 10240

max_value_size_in_mb

The maximum size of any value in SSTables. SSTables are
marked as corrupted when the threshold is exceeded.

Default: 256

Thread Per Core (TPC) parameters

#tpc_cores:
# tpc_io_cores:
io_global_queue_depth: 128

tpc_cores

Number of TPC event loops. If not set, the
default is the number of cores (processors on the
machine) minus one.

Threshold to log a warning message when any
multiple-partition batch size exceeds this value in
kilobytes.

CAUTION: Increasing this
threshold can lead to node
instability.

Default: 64

batch_size_fail_threshold_in_kb

Threshold to fail and log WARN on any multiple-partition
batch whose size exceeds this value. The default
value is 10X the value of
batch_size_warn_threshold_in_kb.

Default: 640

unlogged_batch_across_partitions_warn_threshold

Threshold to log a WARN message on any batches not of
type LOGGED that span across more partitions than
this limit.

Default: 10

broadcast_address

The public IP address this node uses to broadcast
to other nodes outside the network or across regions in multiple-region EC2
deployments. If this property is commented out, the node uses the same IP address or
hostname as listen_address. A node does not need a separate broadcast_address in a
single-node or single-datacenter installation, or in an EC2-based network that
supports automatic switching between private and public communication. It is
necessary to set a separate listen_address and broadcast_address on a node with
multiple physical network interfaces or other topologies where not all nodes have
access to other nodes by their private IP addresses. For specific configurations,
see the instructions for listen_address.

The token to start the contiguous range. Set this
property for single-node-per-token architecture, in
which a node owns exactly one contiguous range in
the ring space. Setting this property overrides
num_tokens.

If your installation is not using vnodes, or this node's num_tokens is set to 1 or is commented out, always set an initial_token value when setting up a production cluster for the first time and when adding capacity. See Generating tokens.

Set this property for virtual node token architecture.
Determines the number of token ranges to assign to
this virtual node
(vnode). Use a number between 1 and 128, where
1 disables vnodes. When the token
number varies between nodes in a datacenter, the
vnode logic assigns a proportional number of ranges
relative to other nodes in the datacenter. In
general, if all nodes have equal hardware
capability, each node should have the same
num_tokens value.

Random selection algorithm: Assign token ranges randomly. A higher num_tokens value increases the probability that the data and workload are evenly distributed. A recommended starting value is 128. The random algorithm is enabled if allocate_tokens_for_local_replication_factor is commented out.

Note: Over time, loads in a
datacenter using this algorithm become unevenly
distributed. The random selection algorithm method
is not recommended by DataStax. Instead,
use the allocation algorithm.

Allocation algorithm: Assign token
ranges using the allocation algorithm which
optimizes the workload balance using the target
keyspace replication factor. Enabled when the
allocate_tokens_for_local_replication_factor is set. DataStax recommends setting the number
of tokens to 8 to distribute the workload with
~10% variance between nodes.

Note: All other
nodes in the datacenter must have the same token
architecture, that is single-token, random
algorithm vnode or allocation algorithm vnode
architecture.

Default: 1 (disabled)

allocate_tokens_for_local_replication_factor

The target replication factor (RF) of keyspaces in the
datacenter when adding a vnode to an existing
cluster or setting up nodes in a new datacenter.
Triggers the recommended algorithmic
allocation for the RF and
num_tokens for this node. The
allocation algorithm attempts to choose tokens in a
way that optimizes replicated load over the nodes in
the datacenter for the specified RF. The load
assigned to each node is close to proportional to
the number of vnodes.

Note: The allocation algorithm is supported only
for the Murmur3Partitioner and RandomPartitioner
partitioners. The Murmur3Partitioner is the
default partitioning strategy for new DSE clusters
and the right choice for new clusters in almost
all
cases.

Default: 3
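For example, a new datacenter whose keyspaces use replication factor 3 and that follows the recommendation of 8 tokens per node might set, on every node in that datacenter:

num_tokens: 8
allocate_tokens_for_local_replication_factor: 3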

partitioner

The class that distributes rows (by partition key)
across all nodes in the cluster. Any
IPartitioner may be used,
including your own as long as it is in the class
path. For new clusters use the default
partitioner.

DataStax Enterprise provides the following partitioners for backward compatibility:

tracetype_query_ttl

TTL for different trace types used during logging of the query process.

Default: 86400

tracetype_repair_ttl

TTL for different trace types used during logging of the repair process.

Default: 604800

Advanced automatic backup setting

auto_snapshot: true

auto_snapshot

Whether to enable snapshots of the data before
truncating a keyspace or dropping a table. To
prevent data loss, DataStax strongly advises using
the default setting. If you set auto_snapshot to
false, you lose data on truncation or
drop.

When creating or modifying tables, you can enable or disable the row
cache for that table by setting the caching parameter. Other row
cache tuning and configuration options are set at the global (node)
level. The database uses these settings to automatically distribute
memory for each table on the node based on the overall workload and
specific table usage. You can also configure the save periods for
these caches globally.

(Only applies to BIG format SSTables) Threshold for the
total size of all index entries for a partition that
the database stores in the partition key cache. If
the total size of all index entries for a partition
exceeds this amount, the database stops putting
entries for this partition into the partition key
cache.

row_cache_class_name

Important: Use only row cache provider implementations bundled with DSE.

When not set, the default is org.apache.cassandra.cache.OHCProvider (fully off-heap).

Default: commented out (org.apache.cassandra.cache.OHCProvider)

row_cache_size_in_mb

Maximum size of the row cache in memory. The row cache can save time, but it is space-intensive because it contains the entire row. Use the row cache only for hot rows or static rows. If you reduce the size, the hottest keys may not be loaded on startup.

0 - disable row caching

MB - Maximum size of the
row cache in memory

Default: 0 (disabled)

row_cache_save_period

The number of seconds that rows are kept in cache.
Caches are saved to saved_caches_directory. This setting has
limited use as described in
row_cache_size_in_mb.

Counter cache helps to reduce counter locks' contention for hot counter
cells. In case of RF = 1 a counter cache hit causes the database to
skip the read before write entirely. With RF > 1 a counter cache
hit still helps to reduce the duration of the lock hold, helping
with hot counter cell updates, but does not allow skipping the read
entirely. Only the local (clock, count) tuple of a counter cell is
kept in memory, not the whole counter, so it is relatively cheap.

Note: If you reduce the counter cache size, the database may not load the hottest keys on start-up.

counter_cache_size_in_mb

When no value is set, the database uses the smaller of 2.5% of the heap or 50 megabytes (MB). If your system performs counter deletes and relies on low gc_grace_seconds, disable the counter cache. To disable, set to 0.

Default: calculated

counter_cache_save_period

The time, in seconds, after which the database saves the
counter cache (keys only). The database saves caches
to saved_caches_directory.

Default: 7200 (2 hours)

counter_cache_keys_to_save

Number of keys from the counter cache to save. When not
set, the database saves all
keys.

Default: commented out (disabled, saves all keys)

Tombstone settings

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000

When executing a scan, within or across a partition, the database must keep tombstones in memory so it can return them to the coordinator. The coordinator uses tombstones to ensure that other
replicas know about the deleted rows. Workloads that generate
numerous tombstones may cause performance problems and exhaust the
server heap. Adjust these thresholds only if you understand the
impact and want to scan more tombstones. You can adjust these
thresholds at runtime using the
StorageServiceMBean.

read_request_timeout_in_ms

Default: 5000. How long the coordinator waits for read operations to complete before timing out.

range_request_timeout_in_ms

Default: 10000. How long the coordinator waits for
sequential or index scans to complete before timing
it out.

aggregated_request_timeout_in_ms

How long the coordinator waits for sequential or index scans to complete. Lowest acceptable value is 10 ms. This timeout applies to aggregated queries such as SELECT COUNT(*), MIN(x), and so on.

Default: 120000 (2 minutes)

write_request_timeout_in_ms

How long the coordinator waits for write requests to
complete with at least one node in the local
datacenter. Lowest acceptable value is 10 ms.

How long the coordinator waits for counter writes to
complete before timing it
out.

Default: 5000 (5 seconds)

cas_contention_timeout_in_ms

How long the coordinator continues to retry a CAS
(compare and set) operation that contends with other
proposals for the same row. If the coordinator
cannot complete the operation within this timespan,
it aborts the
operation.

Default: 1000 (1 second)

truncate_request_timeout_in_ms

How long the coordinator waits for a truncate (the
removal of all data from a table) to complete before
timing it out. The long default value allows the
database to take a snapshot before removing the
data. If auto_snapshot is disabled (not
recommended), you can reduce this time.

How much to increase the cross-datacenter timeout
(write_request_timeout_in_ms +
cross_dc_rtt_in_ms) for requests that
involve only nodes in a remote datacenter. This
setting is intended to reduce hint pressure.

Tip: DataStax recommends using
LOCAL_* consistency levels (CL)
for read and write requests in multi-datacenter
deployments to avoid timeouts that may occur when
remote nodes are chosen to satisfy the CL, such as
QUORUM.

Default: commented out (0)

slow_query_log_timeout_in_ms

Default: 500. How long before a node logs slow queries. SELECT queries that exceed this value generate an aggregated log message to identify slow queries. To disable, set to 0.

Whether to enable operation timeout information exchange
between nodes to accurately measure request
timeouts. If this property is disabled, the replica
assumes any requests are forwarded to it instantly
by the coordinator. During overload conditions this
means extra time is required for processing
already-timed-out requests.

CAUTION: Before enabling this property make sure NTP
(network time protocol) is installed and the times
are synchronized among the nodes.

internode_send_buff_size_in_bytes

The sending socket buffer size in bytes for inter-node calls. The buffer size (along with internode_recv_buff_size_in_bytes) is limited by net.core.wmem_max. If this property is not set, net.ipv4.tcp_wmem determines the buffer size. For more details, run man tcp and refer to:

/proc/sys/net/core/wmem_max

/proc/sys/net/core/rmem_max

/proc/sys/net/ipv4/tcp_wmem

/proc/sys/net/ipv4/tcp_rmem

Default: not set

internode_recv_buff_size_in_bytes

The receiving socket buffer size in bytes for inter-node
calls.

Default: not set

internode_compression

Controls whether traffic between nodes is compressed.
Valid values:

all - Compresses all traffic

dc - Compresses traffic between datacenters
only

none - No compression.

Default: dc

inter_dc_tcp_nodelay

Whether to enable tcp_nodelay for inter-datacenter
communication. When disabled, the network sends
larger, but fewer, network packets. This reduces
overhead from the TCP protocol itself. However,
disabling inter_dc_tcp_nodelay may
increase latency by blocking cross datacenter
responses.

The port where the CQL native transport listens for
clients. For security reasons, do not expose this
port to the internet. Firewall it if
needed.

Default: 9042

native_transport_max_frame_size_in_mb

The maximum allowed size of a frame. Frame (requests)
larger than this are rejected as
invalid.

Default: 256

native_transport_max_concurrent_connections

The maximum number of concurrent client connections.

Default: -1 (unlimited)

native_transport_max_concurrent_connections_per_ip

The maximum number of concurrent client connections per
source IP address.

Default: -1 (unlimited)

native_transport_address

When left blank, uses the configured hostname of the
node. Unlike the listen_address,
this value can be set to 0.0.0.0, but you must set
the native_transport_broadcast_address to a value
other than 0.0.0.0.

Note: Set
native_transport_address OR
native_transport_interface, not both.

Default: localhost

native_transport_interface

IP aliasing is not supported.

Note: Set
native_transport_address OR
native_transport_interface, not both.

Default: eth0

native_transport_interface_prefer_ipv6

Use IPv4 or IPv6 when interface is specified by name.

false - use first IPv4 address.

true - use first IPv6 address.

When only a single address is used, that
address is selected without regard to this
setting.

Default: commented out (false)

native_transport_broadcast_address

Native transport address to broadcast to drivers and
other DSE nodes. This cannot be set to 0.0.0.0.
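For example, to listen on all interfaces while broadcasting a routable address to drivers, a node might use the following sketch; the broadcast IP is illustrative.

native_transport_address: 0.0.0.0
native_transport_broadcast_address: 10.10.1.5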

high_ratio

When outgoing mutations are below this value, they are rate limited according to the incoming rate decreased by the factor (described below). When above this value, the rate limiting is increased by the factor.

Default: 0.90

factor

A number between 1 and 10. When backpressure is below
high ratio, outgoing mutations are rate limited
according to the incoming rate decreased by the
given factor; if above high ratio, the rate
limiting is increased by the given
factor.

Default: 5

flow

The flow speed to apply rate limiting:

FAST - rate limited to the speed of the
fastest replica

SLOW - rate limit to the speed of the slowest
replica

Default: FAST
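In stock cassandra.yaml files these three parameters are nested under a back_pressure_strategy entry; the following sketch assumes that layout and the RateBasedBackPressure class, with the default values described above.

back_pressure_strategy:
    - class_name: org.apache.cassandra.net.RateBasedBackPressure
      parameters:
        - high_ratio: 0.90
          factor: 5
          flow: FAST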

dynamic_snitch_badness_threshold

The performance threshold for dynamically routing client
requests away from a poorly performing node.
Specifically, it controls how much worse a poorly
performing node has to be before the dynamic snitch
prefers other replicas. A value of 0.2 means the
database continues to prefer the static snitch
values until the node response time is 20% worse
than the best performing node. Until the threshold
is reached, incoming requests are statically routed
to the closest replica as determined by the snitch.

Default: 0.1

dynamic_snitch_reset_interval_in_ms

Time interval after which the database resets all node
scores. This allows a bad node to recover.

Default: 600000

dynamic_snitch_update_interval_in_ms

The time interval, in milliseconds, between the
calculation of node scores. Because score
calculation is CPU intensive, be careful when
reducing this interval.

A blacklist of datacenters that will not perform hinted
handoffs. To disable hinted handoff on a certain
datacenter, add its name to this list.

Default: commented out
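Assuming the standard property name hinted_handoff_disabled_datacenters, a node that skips hints for one datacenter might use the following sketch; the datacenter name is illustrative.

hinted_handoff_disabled_datacenters:
    - DC1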

max_hint_window_in_ms

Maximum amount of time during which the database
generates hints for an unresponsive node. After this
interval, the database does not generate any new
hints for the node until it is back up and
responsive. If the node goes down again, the
database starts a new interval. This setting can
prevent a sudden demand for resources when a node is
brought back online and the rest of the cluster
attempts to replay a large volume of hinted
writes.

Maximum amount of traffic per delivery thread in kilobytes per second. This rate reduces proportionally to the number of nodes in the cluster. For example, if there are two nodes in the cluster, each delivery thread uses the maximum rate. If there are three, each node throttles to half of the maximum, since two nodes are expected to deliver hints simultaneously.

The time, in milliseconds, to wait before flushing hints
from internal buffers to disk.

Default: 10000

max_hints_delivery_threads

Number of threads the database uses to deliver hints. In
multiple datacenter deployments, consider increasing
this number because cross datacenter handoff is
generally slower.

Default: 2

max_hints_file_size_in_mb

The maximum size for a single hints file, in
megabytes.

Default: 128

hints_compression

The compressor for hint files. Supported compressors: LZ4, Snappy, and Deflate. When not set, the database does not compress hints files.

Default: LZ4Compressor

batchlog_replay_throttle_in_kb

Total maximum throttle, in KB per second, for batchlog replay. Throttling is reduced proportionally to the number of nodes in the cluster.

Default: 1024

batchlog_endpoint_strategy

Strategy to choose the batchlog storage endpoints.

random_remote - Default, purely random.
Prevents the local rack, if possible. Same
behavior as earlier releases.

dynamic_remote - Uses DynamicEndpointSnitch to
select batchlog storage endpoints. Prevents the
local rack, if possible. This strategy offers the
same availability guarantees as random_remote, but
selects the fastest endpoints according to the
DynamicEndpointSnitch. DynamicEndpointSnitch
tracks reads but not writes. Write-only, or
mostly-write, workloads might not benefit from
this strategy. Note: this strategy will fall back
to random_remote if dynamic_snitch is not
enabled.

dynamic - Mostly the same as dynamic_remote,
except that local rack is not excluded, which
offers lower availability guarantee than
random_remote or dynamic_remote. Note: this
strategy will fall back to random_remote if
dynamic_snitch is not enabled.

The authentication backend. The only supported
authenticator is DseAuthenticator for external
authentication with multiple authentication schemes
such as Kerberos, LDAP, and internal authentication.
Authenticators other than DseAuthenticator are
deprecated and not supported. Some security features
might not work correctly if other authenticators are
used. See authentication_options in
dse.yaml.

The authorization backend. Authorizers other than
DseAuthorizer are not supported. DseAuthorizer
supports enhanced permission management of
DSE-specific resources. Authorizers other than
DseAuthorizer are deprecated and not supported. Some
security features might not work correctly if other
authorizers are used. See Authorization options in dse.yaml.

Important: Use only authorization
implementations bundled with
DSE.

Default: com.datastax.bdp.cassandra.auth.DseAuthorizer

system_keyspaces_filtering

Whether to enable system keyspace filtering so that
users can access and view only schema information
for rows in the system and system_schema keyspaces
to which they have access.

The DSE Role Manager supports LDAP roles and internal
roles supported by the CassandraRoleManager. Role
options are stored in the dse_security keyspace.
When using the DSE Role Manager, increase the
replication factor of the dse_security keyspace.
Role managers other than DseRoleManager are
deprecated and not supported. Some security features
might not work correctly if other role managers are
used.

Important: Use only role manager
implementations bundled with
DSE.

Default: com.datastax.bdp.cassandra.auth.DseRoleManager

roles_validity_in_ms

Validity period for roles cache in milliseconds.
Determines how long to cache the list of roles
assigned to the user; users may have several roles,
either through direct assignment or inheritance (a
role that has been granted to another role). Adjust
this setting based on the complexity of your role
hierarchy, tolerance for role changes, the number of
nodes in your environment, and activity level of the
cluster.

Fetching permissions can be an expensive
operation, so this setting allows flexibility.
Granted roles are cached for authenticated
sessions in AuthenticatedUser.
After the specified time elapses, role validity is
rechecked. Disabled automatically when internal
authentication is not enabled when using DseAuthenticator.

0 - disable role caching

milliseconds - how long to
cache the list of roles assigned to the user

Default: 120000 (2 minutes)

roles_update_interval_in_ms

Refresh interval for roles cache. After this interval,
cache entries become eligible for refresh. On next
access, the database schedules an async reload, and
returns the old value until the reload completes. If
roles_validity_in_ms is non-zero,
then this value must also be non-zero. When not set,
the default is the same value as
roles_validity_in_ms.

Default: commented out (120000)

permissions_validity_in_ms

How long permissions in cache remain valid to manage
performance impact of permissions queries. Fetching
permissions can be resource intensive. Set the cache
validity period to your security tolerances. The
cache is used for the standard authentication and
the row-level access control (RLAC) cache. The cache
is quite effective at small durations.

0 - disable permissions cache

milliseconds - time, in
milliseconds

CAUTION: REVOKE does not automatically invalidate cached permissions. Permissions are invalidated the next time they are refreshed.

Default: 120000 (2 minutes)

permissions_update_interval_in_ms

Sets refresh interval for the standard authentication
cache and the row-level access control (RLAC) cache.
After this interval, cache entries become eligible
for refresh. On next access, the database schedules an async reload and returns the old value until the reload completes. If permissions_validity_in_ms is non-zero, this value must also be non-zero. When not set, the default is the same value as permissions_validity_in_ms.

Default: commented out (2000)

permissions_cache_max_entries

The maximum number of entries that are held by the
standard authentication cache and row-level access
control (RLAC) cache. With the default value of
1000, the RLAC permissions cache can have up to 1000
entries in it, and the standard authentication cache
can have up to 1000 entries. This single option
applies to both caches. To size the permissions
cache for use with Setting up Row Level Access Control (RLAC), use
this formula:

Inter-node encryption options. If enabled, you must also
generate keys and provide the appropriate key and
truststore locations and passwords. No custom
encryption options are supported.

Tip: The passwords used in these options must match
the passwords used when generating the keystore
and truststore. For instructions on generating
these files, see Creating a Keystore
to Use with JSSE.

Encryption options for inter-node communication using the TLS_RSA_WITH_AES_128_CBC_SHA cipher suite for authentication, key exchange, and encryption of data transfers. Use the DHE/ECDHE ciphers, such as TLS_DHE_RSA_WITH_AES_128_CBC_SHA, if running in FIPS 140 (Federal Information Processing Standard) compliant mode.

all - Encrypt all inter-node communications

none - No encryption

dc - Encrypt the traffic between the datacenters (server only)

rack - Encrypt the traffic between the racks (server only)

Default: none

keystore

Relative path from DSE installation directory or
absolute path to the Java keystore (JKS) suitable
for use with Java Secure Socket Extension (JSSE),
which is the Java version of the Secure Sockets
Layer (SSL), and Transport Layer Security (TLS)
protocols. The keystore contains the private key
used to encrypt outgoing
messages.

Default: resources/dse/conf/.keystore

keystore_password

Password for the keystore. This must match the password
used when generating the keystore and
truststore.

Valid types are JKS, JCEKS, PKCS12, or PKCS11. For file-based keystores, use PKCS12.

Default: commented out (JKS)

truststore_type

Valid types are JKS, JCEKS, PKCS12, or PKCS11.

Default: commented out (JKS)

cipher_suites

Supported ciphers:

TLS_RSA_WITH_AES_128_CBC_SHA

TLS_RSA_WITH_AES_256_CBC_SHA

TLS_DHE_RSA_WITH_AES_128_CBC_SHA

TLS_DHE_RSA_WITH_AES_256_CBC_SHA

TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA

TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA

Default: commented out

require_client_auth

Whether to enable certificate authentication for
node-to-node (internode) encryption. When not set,
the default is
false.

Default: commented out (false)

require_endpoint_verification

Whether to verify the connected host and the host name
in the certificate
match. When not set, the default is false.

Default: commented out (false)
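Putting the options above together, a sketch of datacenter-level inter-node encryption might look like the following, assuming the standard server_encryption_options block; paths and passwords are illustrative and must match your keystore and truststore.

server_encryption_options:
    internode_encryption: dc
    keystore: resources/dse/conf/.keystore
    keystore_password: cassandra
    truststore: resources/dse/conf/.truststore
    truststore_password: cassandra
    # require_client_auth: true
    # require_endpoint_verification: true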

Client-to-node encryption
options

Client-to-node encryption protects in-flight data from client
machines to a database cluster using SSL (Secure Sockets Layer) and establishes
a secure channel between the client and the coordinator node.

Whether to enable client-to-node encryption. You must also generate keys and provide the appropriate key and truststore locations and passwords. No custom encryption options are supported.

Advanced settings:

enabled

Whether to enable client-to-node
encryption.

Default: false

optional

Whether to allow unsecured connections when client
encryption is
enabled.

Default: false

keystore

Relative path from DSE installation directory or
absolute path to the Java keystore (JKS) suitable
for use with Java Secure Socket Extension (JSSE),
which is the Java version of the Secure Sockets
Layer (SSL), and Transport Layer Security (TLS)
protocols. The keystore contains the private key
used to encrypt outgoing
messages.

Default: resources/dse/conf/.keystore

keystore_password

Password for the
keystore.

Default: cassandra

require_client_auth

Whether to enable certificate authentication for client-to-node encryption. When not set, the default is false.
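A sketch of the options above, assuming the standard client_encryption_options block; the path and password are illustrative and must match your keystore.

client_encryption_options:
    enabled: true
    optional: false
    keystore: resources/dse/conf/.keystore
    keystore_password: cassandra
    # require_client_auth: true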

Whether to enable user defined functions (UDFs), code
that is executed inside Cassandra daemons. UDFs can
present a security risk, since they are executed on
the server side. UDFs are executed in a sandbox to
control what code can be executed. See the DataStax
blog post User Defined
Functions.

Whether to enable sandbox for asynchronous JavaScript
UDF execution. Does not apply to Java UDFs.

true - Enabled. Only one instance of a
function can run at one time. Asynchronous
execution prevents UDFs from running too long or
forever and destabilizing the cluster.

false - Disabled. Allows multiple instances of
the same function to run simultaneously.

CAUTION: Disabling asynchronous UDF execution implicitly disables the Java security manager. You must monitor the read timeouts for JavaScript UDFs that run too long or forever, which can cause the cluster to destabilize.

Default: true

user_defined_function_warn_micros

Threshold in microseconds (CPU time). When a UDF runs
too long and this threshold is exceeded, a warning
is logged and sent to the client. Java UDFs always
issue a warning. Scripted UDFs log a warning only if
enable_user_defined_functions_threads is
set to true.

Default: 500

user_defined_function_fail_micros

Threshold in microseconds (CPU time). When a fatal UDF
run-time situation is detected and this threshold is
exceeded, the UDF is stopped. Java UDFs always throw
an exception and stop. Scripted UDFs throw an
exception and stop only if enable_user_defined_functions_threads is
set to true.

Default: 10000

user_defined_function_warn_heap_mb

Threshold in MB for heap allocations. When this
threshold is exceeded, a warning is logged and sent
to the client. Java UDFs always issue a warning.
Scripted UDFs log a warning only if enable_user_defined_functions_threads is
set to true.

Default: 200

user_defined_function_fail_heap_mb

Threshold in MB for heap allocations. When this
threshold is exceeded, the UDF is stopped.
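A sketch combining the UDF settings above with their documented thresholds; enable_user_defined_functions is assumed to be the property name for the overall toggle, and the fail heap value shown is illustrative.

enable_user_defined_functions: true
enable_user_defined_functions_threads: true
user_defined_function_warn_micros: 500
user_defined_function_fail_micros: 10000
user_defined_function_warn_heap_mb: 200
user_defined_function_fail_heap_mb: 500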

Options to tune continuous paging that pushes pages,
when requested, continuously to the client:

Maximum memory
used:

max_concurrent_sessions ⨉ max_session_pages ⨉ max_page_size_mb

Default: calculated (60 ⨉ 4 ⨉ 8 = 1920 MB)

Guidance

Because memtables and SSTables are used by the
continuous paging query, you can define the
maximum period of time during which memtables
cannot be flushed and compacted SSTables cannot be
deleted.

If fewer threads exist than sessions, a
session cannot execute until another one is
swapped out.

Distributed queries (CL > ONE or non-local
data) are swapped out after every page, while
local queries at CL = ONE are swapped out after
max_local_query_time_ms.

max_concurrent_sessions

The maximum number of concurrent sessions. Additional
sessions are rejected with an unavailable
error.

Default: 60

max_session_pages

The maximum number of pages that can be buffered for
each session. If the client is not reading from the
socket, the producer thread is blocked after it has
prepared
max_session_pages.

Default: 4

max_page_size_mb

The maximum size of a page, in MB. If an individual CQL
row is larger than this value, the page can be
larger than this value.

Default: 8

max_local_query_time_ms

The maximum time for a local continuous query to run.
When this threshold is exceeded, the session is
swapped out and rescheduled. Swapping and
rescheduling ensures the release of resources that
prevent the memtables from flushing and ensures
fairness when max_threads <
max_concurrent_sessions. Adjust when high write
workloads exist on tables that have continuous
paging requests.

Default: 5000

client_timeout_sec

How long the server will wait, in seconds, for clients
to request more pages if the client is not reading
and the server queue is full.

Default: 600

cancel_timeout_sec

How long to wait before checking if a paused session can be resumed. Continuous paging sessions are paused because of backpressure or when the client has not requested more pages with backpressure updates.

Default: 5

paused_check_interval_ms

How long to wait, in milliseconds, before checking if a continuous paging session can be resumed, when that session is paused because of backpressure.

Default: 1
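A sketch of the continuous paging options with the defaults described above; this assumes they are nested under a continuous_paging block as in stock DSE cassandra.yaml files.

continuous_paging:
    max_concurrent_sessions: 60
    max_session_pages: 4
    max_page_size_mb: 8
    max_local_query_time_ms: 5000
    client_timeout_sec: 600
    cancel_timeout_sec: 5
    paused_check_interval_ms: 1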

Fault detection setting

# phi_convict_threshold: 8

phi_convict_threshold

The sensitivity of the failure detector on an
exponential scale. Generally, this setting does not need
adjusting.