Information about configuring DataStax Enterprise, including using virtual nodes; setting up security; storing
and accessing data exclusively from memory; setting up distributed data replication from remote clusters;
running multiple DataStax Enterprise nodes on a single host machine; and automating the movement of data
across different types of storage media.

Virtual node (vnode) configuration

A description of virtual nodes (vnodes), how to use them in different types of
datacenters, and steps for disabling them.

Virtual nodes simplify many tasks in DataStax Enterprise. They eliminate the
need to determine the partition range (calculate and assign tokens), and they make it
easier to rebalance the cluster when adding or removing nodes and to replace dead nodes.
For a complete description of virtual nodes and how they work, see Virtual nodes.

DataStax Enterprise requires the same token architecture on all nodes in a
datacenter: the nodes must either all be vnode-enabled or all use single-token
architecture. Architecture can vary between the datacenters of a cluster. For
example, a single cluster can contain:

A transaction-only datacenter running OLTP.

A single-token architecture analytics datacenter (no vnodes).

A search datacenter with vnodes.

Guidelines for using virtual nodes

Whether virtual nodes (vnodes) are enabled or disabled depends on the initial
cassandra.yaml settings. There are two methods of
distributing token ranges. DataStax recommends using the allocation algorithm. Use
the same method on all systems in the datacenter.

Allocation algorithm

Optimizes token range distribution between nodes and racks in the
datacenter based on the keyspace replication factor (allocate_tokens_for_local_replication_factor) of the
datacenter. Distributes the token ranges proportionately using the num_tokens settings. All
systems in the datacenter should have the same
num_tokens setting unless hardware performance
varies between systems. To distribute more of the workload to the higher-performance
hardware, increase the number of tokens for those
systems.
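
The proportional distribution described above can be illustrated with a small sketch. It assumes, as a simplification, that each node's share of the token ranges is roughly its num_tokens value divided by the total across the datacenter; the node names and the expected_share helper are hypothetical and not part of DataStax Enterprise.

```python
def expected_share(num_tokens_by_node):
    """Approximate each node's share of the token ranges.

    Simplifying assumption: ownership is proportional to num_tokens.
    """
    total = sum(num_tokens_by_node.values())
    if total <= 0:
        raise ValueError("at least one node must own a token")
    return {node: n / total for node, n in num_tokens_by_node.items()}


# Hypothetical datacenter: node3 runs on higher-performance hardware,
# so it is given twice as many tokens as the other two nodes.
shares = expected_share({"node1": 8, "node2": 8, "node3": 16})
for node, share in shares.items():
    print(f"{node}: {share:.0%}")
```

With these example values, node3 would be expected to take about half of the workload and the other two nodes about a quarter each.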

The allocation algorithm efficiently balances the
workload using fewer tokens; when systems are added to a datacenter,
the algorithm maintains the balance. Using a higher number of tokens
more evenly distributes the workload, but also significantly
increases token management overhead.

CAUTION: When adding multiple nodes to the cluster using the allocation algorithm,
ensure that nodes are added one at a time. If nodes are added
concurrently, the algorithm assigns the same tokens to different
nodes.

DataStax recommends using 8 vnodes (tokens).
This distributes the workload between systems with a ~10% variance
and has minimal impact on performance. Set the number of vnode
tokens (num_tokens) based on the workload
distribution requirements of the datacenter:

Table 1. Allocation algorithm workload distribution variance

Replication factor   4 vnode (tokens)   8 vnode (tokens)   64 vnode (tokens)   128 vnode (tokens)
2                    ~17.5%             ~12.5%             ~3%                 ~1%
3                    ~14%               ~10%               ~2%                 ~1%
5                    ~11%               ~7%                ~1%                 ~1%
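
One way to read Table 1 is as a lookup: given a replication factor and the largest workload variance you can tolerate, pick the smallest listed num_tokens value that meets it. The sketch below encodes only the approximate figures from the table; the smallest_num_tokens helper is hypothetical and does not interpolate between the listed values.

```python
# Approximate workload variance (percent) copied from Table 1,
# keyed by replication factor, then by num_tokens.
VARIANCE_PCT = {
    2: {4: 17.5, 8: 12.5, 64: 3.0, 128: 1.0},
    3: {4: 14.0, 8: 10.0, 64: 2.0, 128: 1.0},
    5: {4: 11.0, 8: 7.0, 64: 1.0, 128: 1.0},
}


def smallest_num_tokens(replication_factor, max_variance_pct):
    """Return the smallest listed num_tokens whose variance is within
    max_variance_pct, or None if no listed value qualifies."""
    row = VARIANCE_PCT.get(replication_factor)
    if row is None:
        raise KeyError(f"replication factor {replication_factor} is not in Table 1")
    for num_tokens in sorted(row):
        if row[num_tokens] <= max_variance_pct:
            return num_tokens
    return None


print(smallest_num_tokens(3, 10))   # 8
print(smallest_num_tokens(2, 5))    # 64
print(smallest_num_tokens(5, 0.5))  # None
```

Choosing the smallest qualifying value reflects the trade-off stated earlier: more tokens even out the workload but increase token management overhead.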

Enabling vnodes

(Recommended) To use the allocation algorithm, uncomment allocate_tokens_for_local_replication_factor
in cassandra.yaml and set it to the target replication factor for the keyspaces in the datacenter. If the
replication factor varies between keyspaces, alternate between the replication factor (RF) settings.

Disabling vnodes

Important: If you do not use vnodes, you must make sure that each node is
responsible for roughly an equal amount of data. To do so, assign each node an
initial_token value, calculating the tokens for each datacenter as described in
Generating tokens.
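
As a rough illustration of what evenly spaced tokens look like, the sketch below computes initial_token values for a single datacenter. It assumes the Murmur3Partitioner, whose token range runs from -2^63 to 2^63 - 1; the murmur3_initial_tokens helper is hypothetical, and the Generating tokens procedure remains the reference, particularly for clusters with multiple datacenters.

```python
def murmur3_initial_tokens(node_count):
    """Return evenly spaced initial_token values for one datacenter.

    Assumes the Murmur3Partitioner token range of -2**63 to 2**63 - 1.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**64 // node_count
    return [i * step - 2**63 for i in range(node_count)]


tokens = murmur3_initial_tokens(4)
for index, token in enumerate(tokens):
    print(f"node {index}: initial_token = {token}")
```

For four nodes this yields -9223372036854775808, -4611686018427387904, 0, and 4611686018427387904, so each node is responsible for a quarter of the token range.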