
Documentation for configuring and using DSEFS distributed data replication.

Using DSEFS

Steps to use DSEFS and configure data replication, along with related functions such as setting the Kafka log retention.

You must configure data replication. Optionally, you can configure multiple DSEFS file systems in a datacenter and perform other functions, such as setting the Kafka log retention.

DSEFS does not span datacenters. Create a separate DSEFS instance in each datacenter, as
described in the steps below.

dse.yaml

The location of the
dse.yaml file depends on the type of
installation:

Package installations

/etc/dse/dse.yaml

Tarball installations

installation_location/resources/dse/conf/dse.yaml

DSEFS limitations

Know these limitations when you
configure and tune DSEFS. The following functionality and features are not supported:

Encryption.

Use operating system access controls to protect the local DSEFS data
directories.

File system consistency checks (fsck) and file repair, which have only
limited support. Running fsck re-replicates blocks that were
under-replicated because a node was taken out of the cluster.

Forced rebalancing, although the cluster will eventually reach balance.

Checksums.

Automatic backups.

Multi-datacenter replication.

Symbolic links (soft links, symlinks) and hardlinks.

Snapshots.

Procedure

Configure replication for the metadata and the data blocks.

You must set the replication factor appropriately to prevent data loss in the case of node
failure. Set replication factors for both the metadata and the data blocks. A
replication factor of 3 for data blocks is suitable for most use cases.

Globally: set replication for the metadata in the dsefs keyspace that is
stored in the database.

For example, use a CQL statement to configure a replication factor of 3 on the
Analytics datacenter using
NetworkTopologyStrategy:
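The statement itself is not shown above; a sketch along these lines (assuming the datacenter is named Analytics, as in the example) would set that replication:

```sql
ALTER KEYSPACE dsefs
  WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'Analytics': 3};
```

Run the statement with cqlsh, then restart the DSEFS nodes in that datacenter so the change takes effect.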

For example, in a cluster with multiple datacenters, the keyspace names
dsefs1 and dsefs2 define separate file systems in each
datacenter.
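In dse.yaml, the keyspace backing a DSEFS instance is selected with the keyspace_name setting under dsefs_options. A minimal sketch for the first datacenter, using the keyspace names from the example above (verify option names against your dse.yaml version):

```yaml
# dse.yaml (excerpt) -- nodes in the first datacenter
dsefs_options:
    enabled: true
    keyspace_name: dsefs1   # nodes in the second datacenter would use dsefs2
```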

When bouncing a streaming application, verify the Kafka
log configuration (especially log.retention.check.interval.ms and
log.retention.bytes). Ensure the Kafka log retention policy is
robust enough to cover the length of time expected to bring the application and its consumers back
up.

For example, if the log retention policy deletes or rolls logs very frequently to save disk
space, users are likely to encounter issues when attempting to recover from a checkpoint
that references offsets no longer retained in the Kafka logs.
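As a sketch, the broker-side settings below govern how long offsets remain recoverable. The values are illustrative, not recommendations; size the retention window to exceed your worst-case application restart time:

```properties
# Kafka server.properties (illustrative values)
log.retention.hours=168                  # keep log segments for 7 days
log.retention.bytes=-1                   # no size-based limit per partition
log.retention.check.interval.ms=300000   # scan for deletable segments every 5 minutes
```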