Configure a Cassandra Cluster on Ubuntu Bionic Beaver

Cassandra Overview

Apache Cassandra is a replicated NoSQL database and an ideal solution for situations that require maximum data redundancy, uptime and horizontal scaling across multiple servers. It is an open source application that can easily be managed from a simple command line interface using the Cassandra Query Language (CQL), which is very similar to Structured Query Language (SQL) and therefore easy to learn for users who are already familiar with SQL.

Important: You have to agree to the license terms for Oracle Java when installing it.

Repeat the steps above on three instances in total.

Configuring Additional Nodes

Cassandra's configuration files are located in the /etc/cassandra directory. The file cassandra.yaml contains most of the Cassandra configuration, such as the ports used, file locations and seed node IP addresses.

The key points to edit are:

cluster_name: Any name you choose to describe the cluster. All members of a cluster must use the same name.

num_tokens: This value represents the number of virtual nodes within a Cassandra instance. It is used to partition the data and to spread it throughout the cluster. A good starting value is 256.

seeds: These are the IP addresses of the cluster's seed servers. Seed nodes serve as known places to obtain cluster information (such as a list of nodes in the cluster). All active nodes hold this information, which avoids a single point of failure; the seeds are simply known locations that can be relied on to have it while other machines come and go. It is recommended to have 3 seed nodes per datacenter.

listen_address: This is the IP address that Cassandra will listen on for internal (Cassandra to Cassandra) communication. The software will try to guess your machine's IP address if you leave it blank, but it’s best to specify it yourself. This value is specific to each node.

rpc_address: This is the IP address that Cassandra will listen on for client-based communication, such as through the CQL protocol. This value is different on each node.

endpoint_snitch: Represents the ‘snitch’ used by Cassandra. A snitch tells Cassandra which datacenter and rack a node belongs to within a cluster. Various types can be used here; refer to the official documentation for more information on this topic.
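
As an illustration of the settings listed above, the relevant part of cassandra.yaml on the first node might look like the following sketch. The cluster name and all IP addresses (10.0.0.1 to 10.0.0.3) are hypothetical placeholders, and GossipingPropertyFileSnitch is assumed here because it is the snitch that reads the cassandra-rackdc.properties file; adapt every value to your own setup.

```yaml
# Hypothetical example values; replace them with your own.
cluster_name: 'My Cluster'        # must be identical on every node
num_tokens: 256                   # number of virtual nodes on this instance

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # comma-separated list of seed node IPs, same list on every node
          - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"

listen_address: 10.0.0.1          # this node's IP for node-to-node traffic
rpc_address: 10.0.0.1             # this node's IP for client (CQL) traffic

# Assumption: this snitch takes DC and rack from cassandra-rackdc.properties
endpoint_snitch: GossipingPropertyFileSnitch
```

On the other nodes, only listen_address and rpc_address would differ in this sketch; the cluster name, seed list and snitch stay the same.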

To be fault tolerant and to minimize the risk of data loss or downtime, Cassandra distributes data across the cluster. Whenever possible, it stores data and its backup copies on different racks or in different datacenters, so that even the failure of an entire datacenter has minimal impact on the production environment.

2. Edit the /etc/cassandra/cassandra-rackdc.properties file on each node and set the DC and rack information. You can use your own naming standard to determine the location of each node.

On Node 1:

dc=dc1
rack=rack1

On Node 2:

dc=dc1
rack=rack1

On Node 3:

dc=dc1
rack=rack2

3. Remove the file /etc/cassandra/cassandra-topology.properties, as it is not used: