
Partitioners

A partitioner determines how data is distributed across the nodes in the cluster
(including replicas).

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:

Package installations: /etc/dse/cassandra/cassandra.yaml

Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml


A partitioner is a function for deriving a token representing a partition key, typically
by hashing. Each row of data is then distributed across the cluster by the value of the
token. Any IPartitioner may be used, including your own, as long as
it is in the classpath.
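To make the idea concrete, here is a minimal, illustrative sketch of hash-based partitioning on a token ring. This is not Cassandra's implementation: the real Murmur3Partitioner uses MurmurHash3, while this toy uses MD5 (available in Python's standard library) as a stand-in hash, and the node names and token values are invented for the example.

```python
import bisect
import hashlib

def token(partition_key: str) -> int:
    """Derive a 64-bit signed token from a partition key by hashing.
    MD5 stands in here for Cassandra's MurmurHash3."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

class ToyTokenRing:
    """A toy token ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort (token, node) pairs to form the ring.
        self.ring = sorted((t, n) for n, t in node_tokens.items())
        self.tokens = [t for t, _ in self.ring]

    def owner(self, partition_key: str) -> str:
        """A key belongs to the first node whose token is >= the key's
        token, wrapping around to the lowest token past the ring's end."""
        t = token(partition_key)
        i = bisect.bisect_left(self.tokens, t) % len(self.tokens)
        return self.ring[i][1]

# Hypothetical three-node cluster with one token per node.
ring = ToyTokenRing({"node1": -2**62, "node2": 0, "node3": 2**62})
print(ring.owner("alice"))  # always the same node for the same key
```

Because the placement of a row depends only on the hash of its partition key, any node can compute which replicas own a given row without coordination.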

The default Murmur3Partitioner uses tokens to help assign equal portions
of data to each node and evenly distribute data from all the tables throughout the ring
or other grouping, such as a keyspace. This is true even if the
tables use different partition keys, such as
user names or timestamps. Because each part of the hash range receives an equal number
of partitions on average, the read and write requests to the cluster are evenly
distributed and load balancing is simplified. For more information, see Consistent hashing.

Virtual nodes: Use either the allocation algorithm or the random selection
algorithm to specify the number of tokens distributed to nodes within the
datacenter. All systems in the datacenter must use the same algorithm.
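For reference, virtual-node behavior is controlled in cassandra.yaml. The sketch below shows the commonly used settings; option names and defaults vary between releases, so verify them against your version's cassandra.yaml before relying on them.

```yaml
# cassandra.yaml -- illustrative vnode settings (verify names for your release).

# Number of tokens assigned to this node.
num_tokens: 16

# Uncomment to use the allocation algorithm instead of random selection;
# tokens are then chosen to balance load for the given replication factor.
# allocate_tokens_for_local_replication_factor: 3
```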

Note: Be aware that partitioners are not compatible with each other. Data partitioned with
one partitioner cannot easily be converted for use with another partitioner.

DataStax Enterprise supplies the following partitioners:

Murmur3Partitioner (default)

Uniformly distributes data across the cluster based on
MurmurHash hash values. This hashing function creates a
64-bit hash value of the partition key with a possible range from
-2^63 to +2^63-1. This partitioner is the right choice
for new clusters in almost all cases and is 3 to 5 times more performant than
the RandomPartitioner.

When using this partitioner, you can page through all rows using the token function in a CQL query.
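For example, token-based paging typically selects a page of rows, then binds the last partition key seen into the next query. The table and column names below are hypothetical:

```cql
-- Illustrative only: page through a hypothetical users table,
-- where id is the partition key. For the first page, omit the
-- WHERE clause; for each later page, bind the last id returned.
SELECT id, name FROM users
WHERE token(id) > token(?)
LIMIT 1000;
```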

RandomPartitioner

This partitioner is included for backwards compatibility. It uniformly
distributes data across the nodes using an MD5 hash value of the row key.
The possible range of hash values is from 0 to 2^127-1. Because it
uses a cryptographic hash, which isn't required by the database, it takes longer
to generate the hash value than the Murmur3Partitioner.
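The MD5-based token derivation can be sketched as follows. This is an approximation for illustration, not the exact RandomPartitioner code path: it interprets the 16-byte MD5 digest as a signed integer and takes the absolute value, which lands in roughly the 0 to 2^127-1 range described above.

```python
import hashlib

def random_partitioner_token(partition_key: bytes) -> int:
    """Approximate a RandomPartitioner-style token: the absolute value
    of the key's 128-bit MD5 digest, read as a signed integer."""
    digest = hashlib.md5(partition_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

t = random_partitioner_token(b"alice")
print(t)  # a non-negative integer below roughly 2^127
```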

When using this partitioner, you can page through all rows using the token function in a CQL query.

ByteOrderedPartitioner

This partitioner is included for backwards compatibility. This partitioner
orders rows lexically by key bytes. It is not recommended because it requires
significant administrative overhead to load balance the cluster, sequential
writes can cause hot spots, and balancing for one table can result in uneven
distribution for another table in the same cluster.

DataStax recommends using the default partitioner, Murmur3Partitioner.
You can change the partitioner in the cassandra.yaml
file.
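The partitioner is set cluster-wide by fully qualified class name. A minimal fragment, showing the default:

```yaml
# cassandra.yaml -- set the partitioner before loading any data;
# data partitioned one way cannot easily be converted for another partitioner.
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
```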