Snapshots

The basic way to backup Cassandra is to take a snapshot. Since sstables are immutable, and since the snapshot command flushes data from memory before taking the snapshot, this will provide a complete backup.

Use nodetool snapshot to take a snapshot of sstables. You can specify a particular keyspace as an optional argument to the command, like nodetool snapshot keyspace1. This will produce a snapshot for each table in the keyspace, as shown in this sample output from nodetool listsnapshots:

The first column is the snapshot name, to refer to the snapshot in other nodetool backup commands. You can also specify tables in the snapshot command.

The output at the end of the list of snapshots — for example, Total TrueDiskSpaceUsed: 5.42 MiB — shows, as the name suggests, the actual size of the snapshot files, as calculated using the walkFileTree Java method. Verify this by adding up the files within each snapshots directory under your data directory keyspace/tablename (e.g., du -sh /var/lib/cassandra/data/keyspace1/standard1*/snapshots).

To make the snapshots more human readable, you can tag them. Running nodetool snapshot -t 2018June05b_DC1C1_keyspace1 keyspace1 results in a more obvious snapshot name as shown in this output from nodetool listsnapshots:

The default snapshot name is already a timestamp (number of milliseconds since the Unix epoch), but it’s a little hard to read. You could get the best of both worlds by doing something like (depending on your operating system): nodetool snapshot -t keyspace1_date +”%s” keyspace1. I like how the results of listsnapshots sorts that way, too. In any case, with inevitable snapshot automation, the human-readable factor becomes largely irrelevant.

You may also see snapshots in this listing that you didn’t take explicitly. By default, auto_snapshot is turned on in the cassandra.yaml configuration file, causing a snapshot to be taken anytime a table is truncated or dropped. This is an important safety feature, and it’s recommended that you leave it enabled. Here’s an example of a snapshot created when a table is truncated:

To preserve disk space (or cost), you will want to eventually delete snapshots. Use nodetool clearsnapshot with the -t flag and the snapshot name (recommended, to avoid deleting all snapshots). Specifying — and the keyspace name will additionally filter the deletion to the keyspace specified. For example, nodetool clearsnapshot -t 1528233451291 — keyspace1 will remove just the two snapshot files listed above, as reported in this sample output:

Requested clearing snapshot(s) for [keyspace1] with snapshot name [1528233451291]

Note that if you forget the -t flag or the — you will get undesired results. Without the -t flag, the command will not read the snapshot name, and without the — delimiter, you will end up deleting all snapshots for the keyspace. Check syntax carefully.

The sstables are not tied to any particular instance of Cassandra or server, so you can pass them around as needed. (For example, you may need to populate a test server.) If you put an sstable in your data directory and run nodetool refresh, it will load into Cassandra. Here’s a simple demonstration:

This simple command has obvious implications for your backup and restore automation.

Incrementals

Incremental backups are taken automatically — generally more frequently than snapshots are scheduled — whenever sstables are flushed from memory to disk. This provides a more granular point-in-time recovery, as needed.

There’s not as much operational fun to be had with incremental backups. Use nodetool statusbackup to show if they are Running or not. By default, unless you’ve changed the cassandra.yaml configuration file, they will be not running. Turn them on with nodetool enablebackup and turn them off with nodetool disablebackup.

A nodetool listbackups command doesn’t exist, but you can view the incremental backups in the data directory under keyspace/table/backups. The backups/snapshots nomenclature is truly confusing, but you could think of snapshots as something you do, and backups as something that happen.

Restoring from incrementals is similar to restoring from a snapshot — copying the files and running nodetool refresh — but incrementals require a snapshot.

These various nodetool commands can be used in combination in scripts to automate your backup and recovery processes. Don’t forget to monitor disk space and clean up the files created by your backup processes.

PYTHIAN®, LOVE YOUR DATA®, and ADMINISCOPE® are trademarks and registered trademarks owned by Pythian in North America and certain other countries, and are valuable assets of our company. Other brands, product and company names on this website may be trademarks or registered trademarks of Pythian or of third parties. Use of trademarks without permission is strictly prohibited.