As we know, DC/OS provides minimal service packages for running on Apache Mesos, such as Cassandra. The Cassandra service only gives us the functionality for storing data; it does not ship the native tools such as `nodetool`, `cqlsh`, `cassandra-stress`, and others. Today, we are looking at how to use `nodetool` to monitor DC/OS Cassandra service nodes or the whole cluster.

Brief:

Nodetool is a native command-line interface for managing and monitoring a Cassandra cluster. While our Cassandra service is running on DC/OS, we can manage and monitor the cluster with nodetool by executing it from the Cassandra Docker image on any of the DC/OS Cassandra nodes, as shown in the steps below:

Note: Some nodetool commands may not work if the Cassandra version and the nodetool version are incompatible. For example, with Cassandra version 3.0 and nodetool version 3.10, nodetool will not be able to run all commands.
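A quick sanity check for this mismatch is to compare the major.minor part of the two versions before relying on nodetool. The version strings below are hypothetical placeholders; in practice, capture them with `cassandra -v` on the node and `nodetool version` (which prints a line like `ReleaseVersion: 3.0.14`).

```shell
#!/bin/sh
# Hypothetical version strings for illustration; in a real setup,
# capture them from `cassandra -v` and `nodetool version`.
CASSANDRA_VERSION="3.0.14"
NODETOOL_VERSION="3.10"

# Compare major.minor only; a mismatch there is what breaks commands.
cass_mm=$(echo "$CASSANDRA_VERSION" | cut -d. -f1,2)
tool_mm=$(echo "$NODETOOL_VERSION" | cut -d. -f1,2)

if [ "$cass_mm" = "$tool_mm" ]; then
  echo "versions compatible"
else
  echo "version mismatch: cassandra $cass_mm vs nodetool $tool_mm"
fi
```

With the sample values above, the script reports a mismatch, which is exactly the 3.0 vs 3.10 situation described in the note.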

Below, we look at some of the nodetool commands that are important from a monitoring perspective. Nodetool has many more commands that we are not discussing here; for reference, please click on this link.

1. CASSANDRA CLUSTER STATUS

To check Cassandra cluster health or status using nodetool, execute the command below:

sudo docker run -t --net=host cassandra:3.0 nodetool -p 7199 status

OUTPUT:

--  Address  Load      Tokens  Owns (effective)  Host ID  Rack
UN           23.26 GB  256     21.7%                      rac1
UN           20.48 GB  256     19.7%                      rac1
UN           21.62 GB  256     19.7%                      rac1

U indicates that the node is Up.

N indicates that the node is in the Normal state.

Address: the IP address of the DC/OS Cassandra node.

Load: the amount of file system data under the Cassandra data directory, updated every 90 seconds.
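When you are watching more than a handful of nodes, it can help to summarise this output rather than read it row by row. Below is a small sketch that parses sample status rows with awk; the addresses and host IDs are made-up placeholders, and the row layout assumes the column order shown above.

```shell
#!/bin/sh
# Sample "nodetool status" data rows (status, address, load, unit,
# tokens, owns, host-id, rack). Addresses and IDs are placeholders.
cat <<'EOF' > /tmp/status.txt
UN  10.0.0.1  23.26 GB  256  21.7%  aaaa  rac1
UN  10.0.0.2  20.48 GB  256  19.7%  bbbb  rac1
DN  10.0.0.3  21.62 GB  256  19.7%  cccc  rac1
EOF

# Count nodes that are Up/Normal.
up=$(grep -c '^UN' /tmp/status.txt)
echo "up-normal nodes: $up"

# Sum the reported load (GB column) across all nodes.
awk '{ total += $3 } END { printf "total load: %.2f GB\n", total }' /tmp/status.txt
```

With the sample rows above, this prints two Up/Normal nodes and a total load of 65.36 GB; a DN (Down/Normal) row is included to show that the count filters it out.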

5. MONITOR CLUSTER THREADS AND PENDING PROCESS

Cassandra is based on a Staged Event-Driven Architecture (SEDA). Different tasks are separated into stages that are connected by a messaging service. Each stage has a queue and a thread pool. Some stages skip the messaging service and queue tasks immediately on a different stage when it exists on the same node. If the next stage is too busy, these queues can back up and cause performance bottlenecks.
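The per-stage queues can be inspected with `nodetool tpstats`, run the same way as the status command above (`sudo docker run -t --net=host cassandra:3.0 nodetool -p 7199 tpstats`). Its output lists each thread pool with Active, Pending, Completed, and Blocked counts. The sketch below parses made-up sample rows and flags stages whose Pending queue is backing up; the threshold of 10 is an arbitrary illustration, not a recommended value.

```shell
#!/bin/sh
# Made-up sample "nodetool tpstats" rows (pool name, active, pending,
# completed, blocked, all-time blocked).
cat <<'EOF' > /tmp/tpstats.txt
MutationStage      2  15  1837492  0  0
ReadStage          1   0   992811  0  0
CompactionExecutor 1  28     4410  0  0
EOF

# Flag any stage with a growing Pending queue (threshold is arbitrary).
awk '$3 > 10 { print $1 " has " $3 " pending tasks" }' /tmp/tpstats.txt
```

With the sample data, MutationStage and CompactionExecutor are flagged, which is the kind of backlog the SEDA description above warns about.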