Monitor performance metrics in the OpsCenter Dashboard. Real-time and historical performance metrics are available at
different granularities: cluster-wide, per node, per table (column family), or storage tier.

Configure alert thresholds for Cassandra cluster-wide, table, and operating system metrics in the Alerts area of OpsCenter.
This proactive monitoring feature is available for DataStax Enterprise clusters.

Alert metrics

Commonly watched alert metrics

Commonly watched metrics are available from the main Notify me when
choice menu in the Add Alert dialog.

Metric

Definition

Node down

When a node does not respond to requests, OpsCenter marks the node as down. To
determine whether a node is down, each agent gets a list of the nodes that its own
node suspects are down, based on information that Cassandra returns via JMX. Using
the status reported by the other nodes, opscenterd then determines whether a node is
truly down or is simply flapping and erroneously reporting all other nodes as down.
Nodes marked with a down status are clearly indicated in the Nodes Ring View. For
more visibility, see adding an alert for down nodes.

Write requests

The number of write requests per second. Monitoring the number of writes over a
given time period can give you an idea of system write workload and usage
patterns.

Write request latency

The response time (in milliseconds) for successful write operations. The time
period starts when a node receives a client write request, and ends when the node
responds back to the client.

Read requests

The number of read requests per second. Monitoring the number of reads over a
given time period can give you an idea of system read workload and usage
patterns.

Read request latency

The response time (in milliseconds) for successful read operations. The time
period starts when a node receives a client read request, and ends when the node
responds back to the client.

CPU usage

The percentage of time that the CPU was busy, which is calculated by
subtracting the percentage of time the CPU was idle from 100 percent.

Load

Load is a measure of the amount of work that a computer system performs. An
idle computer has a load number of 0 and each process using or waiting for CPU time
increments the load number by 1.
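
The CPU usage definition above is simple arithmetic. The following is a minimal Python sketch of it, assuming the idle percentage has already been sampled from the operating system. The function name `cpu_busy_percent` is hypothetical and is not part of OpsCenter.

```python
def cpu_busy_percent(idle_percent: float) -> float:
    """Return CPU usage as 100 percent minus the idle percentage."""
    if not 0.0 <= idle_percent <= 100.0:
        raise ValueError("idle_percent must be between 0 and 100")
    return 100.0 - idle_percent


# Example: a CPU that was idle for 82.5% of the sampling interval
# was busy for the remaining 17.5%.
print(cpu_busy_percent(82.5))  # 17.5
```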

Advanced Cassandra alert metrics

Heap max

The maximum amount of shared memory allocated to the JVM heap for Cassandra
processes.

Heap used

The amount of shared memory in use by the JVM heap for Cassandra
processes.

JVM CMS collection count

The number of concurrent mark-sweep (CMS) garbage collections performed by the
JVM per second.

JVM ParNew collection count

The number of parallel new-generation garbage collections performed by the JVM
per second.

JVM CMS collection time

The time spent performing CMS garbage collections, in milliseconds per second
(ms/sec).

JVM ParNew collection time

The time spent performing ParNew garbage collections in ms/sec.

Data size

The size of table data (in gigabytes) that has been loaded/inserted
into Cassandra, including any storage overhead and system metadata.

Compactions pending

The number of compaction operations that are queued and waiting for system
resources in order to run. The optimal number of pending compactions is 0 (or at
most a very small number). A value greater than 0 indicates that read operations are
in I/O contention with compaction operations, which usually manifests itself as
declining read performance.

Total bytes compacted

The amount of SSTable data compacted, in bytes per second.

Total compactions

The number of compactions (minor or major) performed per second.

Flush sorter tasks pending

The flush sorter process performs the first step in the overall process of
flushing memtables to disk as SSTables. The optimal number of pending flushes is 0
(or at most a very small number).

Flushes pending

The flush process flushes memtables to disk as SSTables. This metric shows the
number of memtables queued for the flush process. The optimal number of pending
flushes is 0 (or at most a very small number).

Gossip tasks pending

Cassandra uses a protocol called gossip to discover location and state
information about the other nodes participating in a Cassandra cluster. In
Cassandra, the gossip process runs once per second on each node and exchanges state
messages with up to three other nodes in the cluster. Gossip tasks pending shows the
number of gossip messages and acknowledgments queued and waiting to be sent or
received. The optimal number of pending gossip tasks is 0 (or at most a very small
number).

Hinted hand-off pending

While a node is offline, other nodes in the cluster will save hints about rows
that were updated during the time the node was unavailable. When a node comes back
online, its corresponding replicas will begin streaming the missed writes to the
node to catch it up. The hinted hand-off pending metric tracks the number of hints
that are queued and waiting to be delivered once a failed node is back online again.
High numbers of pending hints are commonly seen when a node is brought back online
after some down time. Viewing this metric can help you determine when the recovering
node has been made consistent again.

Internal response pending

The number of pending tasks from various internal tasks such as nodes joining
and leaving the cluster.

Manual repair tasks pending

The number of operations still to be completed when you run anti-entropy repair
on a node. It will only show values greater than 0 when a repair is in progress. It
is not unusual to see a large number of pending tasks when a repair is running, but
you should see the number of tasks progressively decreasing.

Memtable postflushers pending

The memtable post flush process performs the final step in the overall process
of flushing memtables to disk as SSTables. The optimal number of pending flushes is
0 (or at most a very small number).

Migrations pending

The number of pending tasks from system methods that have modified the schema.
Schema updates have to be propagated to all nodes, so pending tasks for this metric
can manifest in schema disagreement errors.

Miscellaneous tasks pending

The number of pending tasks from other miscellaneous operations that are not
run frequently.

Read requests pending

The number of read requests that have arrived in the cluster but are waiting
to be handled. During low or moderate read load, you should see 0 pending read
operations (or at most a very low number).

Read repair tasks pending

The number of read repair operations that are queued and waiting for system
resources in order to run. The optimal number of pending read repairs is 0 (or at
most a very small number). A value greater than 0 indicates that read repair
operations are in I/O contention with other operations.

Replicate on write tasks pending

When an insert or update to a row is written, the affected row is replicated to
all other nodes that manage a replica for that row. This is called the
ReplicateOnWriteStage. This metric tracks the pending tasks
related to this stage of the write process. During low or moderate write load, you
should see 0 pending replicate on write tasks (or at most a very low
number).

Request response pending

Streaming of data between nodes happens during operations such as bootstrap and
decommission when one node sends large numbers of rows to another node. The metric
tracks the progress of the streamed rows from the receiving node.

Streams pending

Streaming of data between nodes happens during operations such as bootstrap and
decommission when one node sends large numbers of rows to another node. The metric
tracks the progress of the streamed rows from the sending node.

Write requests pending

The number of write requests that have arrived in the cluster but are waiting
to be handled. During low or moderate write load, you should see 0 pending write
operations (or at most a very low number).
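
Several of the pending-task definitions above share two rules of thumb: a pending count should stay at or near 0, and during a repair or hint replay the count should progressively decrease. The following Python sketch shows one way to express both checks. The names `PENDING_LIMIT`, `exceeds_limit`, and `is_draining` are hypothetical, the limit of 5 is an arbitrary stand-in for "a very small number", and the sample values are made up for illustration. None of this reflects how OpsCenter evaluates alerts internally.

```python
from typing import Sequence

# Assumption: "a very small number" is taken to mean 5 or fewer; tune per cluster.
PENDING_LIMIT = 5


def exceeds_limit(pending: int, limit: int = PENDING_LIMIT) -> bool:
    """True when a pending-task count is above the chosen limit."""
    return pending > limit


def is_draining(samples: Sequence[int]) -> bool:
    """True when consecutive samples never increase and end lower than they began."""
    if len(samples) < 2:
        return False
    never_increases = all(b <= a for a, b in zip(samples, samples[1:]))
    return never_increases and samples[-1] < samples[0]


# Illustrative samples only, not real cluster data.
repair_pending = [120, 95, 60, 22, 0]
print(exceeds_limit(3))             # False
print(exceeds_limit(40))            # True
print(is_draining(repair_pending))  # True
print(is_draining([10, 14, 30]))    # False
```

A backlog that exceeds the limit but is draining, such as repair tasks during an active repair, is usually less worrying than one that keeps growing.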

Advanced table alert metrics

Local writes

The write load on a table measured in operations per second. This
metric includes all writes to a given table, including write requests
forwarded from other nodes.

Local write latency

The response time in milliseconds for successful write operations on a table.
The time period starts when a node receives a write request, and ends when the node
responds.

Local reads

The read load on a table measured in operations per second. This metric
includes all reads to a given table, including read requests forwarded from
other nodes.

Local read latency

The response time in microseconds for successful read operations on a table.
The time period starts when a node receives a read request, and ends when the node
responds.

Table key cache hits

The number of read requests that resulted in the requested row key being found
in the key cache.

Table key cache requests

The total number of read requests on the row key cache.

Table key cache hit rate

The key cache hit rate indicates the effectiveness of the key cache for a given
table by giving the percentage of cache requests that resulted in a cache
hit.

Table row cache hits

The number of read requests that resulted in the read being satisfied from the
row cache.

Table row cache requests

The total number of read requests on the row cache.

Table row cache hit rate

The row cache hit rate indicates the effectiveness of the row cache for a given
table by giving the percentage of cache requests that resulted in a cache
hit.

Table bloom filter space used

The size of the bloom filter files on disk.

Table bloom filter false positives

The absolute number of false positives, which occur when the bloom filter
reports that a row exists but the row does not actually exist.

Table bloom filter false positive ratio

The fraction of all bloom filter checks resulting in a false positive.

Live disk used

The current size of live SSTables for a table. It is expected that SSTable size
will grow over time with your write load as compaction processes continue doubling
the size of SSTables. Monitor the current state of compaction for a given table
using this metric together with SSTable count.

Total disk used

The current size of the data directories for the table, including space
occupied by obsolete objects that has not yet been reclaimed.

SSTable count

The current number of SSTables for a table. When table memtables are persisted
to disk as SSTables, this metric increases to the configured maximum before the
compaction cycle is repeated. Monitor the current state of compaction for a given
table using this metric together with live disk used.

Pending reads and writes

The number of pending reads and writes on a table. Pending operations indicate
Cassandra is not keeping up with the workload. A value of zero indicates healthy
throughput.
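
The cache hit rate and bloom filter false positive ratio defined above are both simple quotients of two counters. The following Python sketch shows the arithmetic, assuming the raw counts are already available. The function names are hypothetical, and returning 0.0 when no requests have been made is an assumption chosen here to avoid division by zero.

```python
def cache_hit_rate_percent(hits: int, requests: int) -> float:
    """Percentage of cache requests that resulted in a cache hit."""
    if requests == 0:
        return 0.0  # Assumption: report 0 when the cache was never queried.
    return 100.0 * hits / requests


def bloom_false_positive_ratio(false_positives: int, checks: int) -> float:
    """Fraction of all bloom filter checks that were false positives."""
    if checks == 0:
        return 0.0  # Assumption: report 0 when no checks were made.
    return false_positives / checks


# Illustrative counts only, not real cluster data.
print(cache_hit_rate_percent(900, 1000))    # 90.0
print(bloom_false_positive_ratio(5, 1000))  # 0.005
```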

From the Nodes section of OpsCenter, select different views (Ring or List) of the nodes that comprise a DataStax Enterprise
cluster and perform node management. View the status of agents, install and upgrade agents, and troubleshoot any agent
issues in the Agents view.

OpsCenter manages multiple DataStax Enterprise clusters with a single install of the central opscenterd server. Administer
your clusters using the options available from the Cluster Actions menu. Generate reports from the Help menu.

Cluster metrics monitor cluster performance at a high level. Cluster metrics are aggregated across all nodes in the cluster.
OpsCenter tracks a number of cluster-wide metrics for read performance, write performance, memory, and capacity.

Pending task metrics track requests that have been received by a node but are waiting to be processed. An accumulation
of pending tasks on a node can indicate a potential bottleneck in performance and should be investigated.

Table (formerly column family) metrics allow drilling down and locating specific areas of application workloads that are
the source of performance issues. If you notice a performance trend at the OS or cluster level, viewing table metrics
can provide a more granular level of detail.

DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.

Apache Cassandra, Apache, Tomcat, Lucene, Solr, Hadoop, Spark, TinkerPop, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.