cbbackupmgr tutorial

A quick guide to using cbbackupmgr

DESCRIPTION

A tutorial that gives examples of how to use all of the commands in
cbbackupmgr effectively.

TUTORIAL

In this tutorial we will show how to take backups and restore data using
cbbackupmgr. This tutorial uses a cluster that has both the travel-sample and
beer-sample buckets installed and requires modifying some of the documents in
the travel-sample bucket. To make it easier to set up a cluster and edit/get
documents, scripts are provided at
http://github.com/couchbaselabs/backup-tutorial, where you can find the scripts
corresponding to your version of Couchbase. We will reference other scripts in
this GitHub repository later in the tutorial, so it is recommended that you
download them. The only requirement for running the scripts is that you have
curl installed. To automatically set up the cluster in the state required for
this tutorial, download and install Couchbase and then run the 01-initialize.sh
script. If you do not want to use this script then you can navigate through the
Couchbase setup process, initialize the cluster with all available services, and
install the travel-sample and beer-sample sample data buckets.

Using this cluster we will show how the incremental/merge approach taken by
cbbackupmgr reduces time and overhead on your cluster.

Configuring a Backup

Before getting started with cbbackupmgr we must first decide which directory
will store all of our backups. This directory is referred to as the backup
archive. The backup archive contains one or more backup repositories. These
backup repositories are where your backups will be contained. The easiest way to
think of a backup repository is that it corresponds directly to a single cluster
that you want to back up. The backup repository also contains a configuration
for how to back that cluster up. A backup repository is created by using the
config subcommand. In this tutorial we will use a backup archive located at
/data/backup. The backup archive is automatically created if the directory
specified is empty. Below is an example of how to create a backup repository
called "cluster" which will back up all data and index definitions from all
buckets in the target cluster.
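
As a minimal sketch, creating the "cluster" backup repository might look like the following. It assumes the long-form --archive and --repo flags and relies on the config defaults to include all buckets. The snippet prints the command and only runs it when cbbackupmgr is found on the PATH.

```shell
# Values taken from this tutorial.
ARCHIVE=/data/backup
REPO=cluster

# No filtering flags are passed, so the repository keeps the default
# configuration (assumed here to cover all buckets).
CONFIG_CMD="cbbackupmgr config --archive $ARCHIVE --repo $REPO"
echo "$CONFIG_CMD"

# Only execute when cbbackupmgr is actually installed.
if command -v cbbackupmgr >/dev/null 2>&1; then
  $CONFIG_CMD
fi
```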

One of the most important aspects of backup repository creation is that we can
configure that backup repository in many different ways to change the way
backups in each backup repository are taken. Let’s say we want a separate backup
of only the index definitions in the travel-sample bucket. To do this we can
create a separate backup repository called "single".
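
One way to sketch the "single" backup repository is shown here. The --include-buckets and --disable-data flags are assumptions about the config options in your release, so confirm them against the cbbackupmgr-config page before relying on them.

```shell
# Values taken from this tutorial.
ARCHIVE=/data/backup
REPO=single
BUCKET=travel-sample

# Assumed flags: --include-buckets limits the repository to one bucket and
# --disable-data skips the documents, leaving only index definitions.
SINGLE_CMD="cbbackupmgr config --archive $ARCHIVE --repo $REPO --include-buckets $BUCKET --disable-data"
echo "$SINGLE_CMD"

# Only execute when cbbackupmgr is actually installed.
if command -v cbbackupmgr >/dev/null 2>&1; then
  $SINGLE_CMD
fi
```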

The config subcommand provides many options for customizing how you back up
your data. See the cbbackupmgr-config page for more information
about what options are available and how they are used.

Backing up a Cluster

Now that we have created some backup repositories we should take a look at our
backup archive to see what it looks like. The easiest way to do this is to use
the list subcommand. This subcommand is used to examine a backup archive and
gives information about how much data is stored in it. To see the entire backup
archive we can run the command below.

$ cbbackupmgr list -a /data/backup

Size Items Name
0B - /
0B - + cluster
0B - + single

The list subcommand gives us a directory printout of all of the backup
repositories and backups in our backup archive. Since there are no backups yet,
we only see our backup repositories listed in the output of this command. There
is also information about how much disk space each folder and file uses and, if
applicable, how many items are backed up in those folders/files. More
information about the list subcommand can be found in the
cbbackupmgr-list page.

Now that we have our backup repositories configured it’s time to start taking
backups. Since the backup repository contains all of the configuration
information for how the backup should be taken we just need to specify the
backup repository name and the information for the target cluster we intend to
back up. Below is an example of how to take a backup on the "cluster" backup
repository. We will assume that we have our cluster running on localhost.
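
A backup of the "cluster" backup repository could be sketched as follows. The address couchbase://127.0.0.1 and the Administrator/password credentials are placeholder assumptions. Depending on the release, the address flag may be spelled --cluster or --host, so check the help output of the backup subcommand for your version.

```shell
# Values taken from this tutorial.
ARCHIVE=/data/backup
REPO=cluster

# Assumed cluster address and credentials; replace with your own.
CLUSTER=couchbase://127.0.0.1
CB_USER=Administrator
CB_PASS=password

BACKUP_CMD="cbbackupmgr backup --archive $ARCHIVE --repo $REPO --cluster $CLUSTER --username $CB_USER --password $CB_PASS"
echo "$BACKUP_CMD"

# Only execute when cbbackupmgr is actually installed.
if command -v cbbackupmgr >/dev/null 2>&1; then
  $BACKUP_CMD
fi
```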

When the backup command is executed it will by default print out a progress bar,
which is helpful for understanding how long your backup will take to complete
and the rate of data movement. While the backup is running the progress bar will
give an estimated time to completion, but this will change to the average backup
rate when the backup finishes. Information is also provided on the total data
and items already backed up and the current rate of data movement. If the backup
completes successfully you will see the "Backup completed successfully" message
as the last line printed.

Let’s also run the backup on the "single" backup repository to see how the two
backup runs differ.
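
Because the configuration lives in the backup repository, the same sketch works for "single" with only the repository name changed. As before, the cluster address, credentials, and the --cluster spelling are assumptions.

```shell
# Values taken from this tutorial; only the repository name differs.
ARCHIVE=/data/backup
REPO=single

# Assumed cluster address and credentials; replace with your own.
CLUSTER=couchbase://127.0.0.1
CB_USER=Administrator
CB_PASS=password

SINGLE_BACKUP_CMD="cbbackupmgr backup --archive $ARCHIVE --repo $REPO --cluster $CLUSTER --username $CB_USER --password $CB_PASS"
echo "$SINGLE_BACKUP_CMD"

# Only execute when cbbackupmgr is actually installed.
if command -v cbbackupmgr >/dev/null 2>&1; then
  $SINGLE_BACKUP_CMD
fi
```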

Since the "single" backup repository is only configured to back up index
definitions for the travel-sample bucket, we do not see a progress bar for
the beer-sample bucket. We can also see that the backup executed more quickly
since there was much less data to actually back up.

Since we now have backups in our backup archive let’s take a look at how the
state of our backup archive has changed by using the list subcommand.

Now that we have taken some backups the output of the list subcommand is much
more useful. We can see that our "cluster" backup repository contains one backup
with a name corresponding to the time the backup was taken. That backup also
contains two buckets and we can see various files in each of those buckets with
their size and item counts. The "single" backup repository also contains one
backup, but this backup only contains the travel-sample bucket and contains 0
data items.

One of the most important features of cbbackupmgr is that it is an
incremental-only backup utility. This means that once we have backed up some
data we will never need to back it up again. In order to simulate some changes
on the cluster we can run the 02-modify.sh script from the backup-tutorial
github repository mentioned at the beginning of the tutorial. If you do not have
this script then you will need to modify two documents and add two new documents
to the travel-sample bucket. After we have modified some data we will run the
backup subcommand on the "cluster" backup repository again.

In this backup notice that since we updated two items and created two items,
those four items are all that we need to back up during this run. If we list the
backup archive using the list subcommand then we will see that the backup
archive looks something like what is below.
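
To inspect only the "cluster" backup repository, the listing can be narrowed as sketched here. The --repo flag on the list subcommand is an assumption to verify against the cbbackupmgr-list page. No output is reproduced, since backup names and sizes depend on your own run.

```shell
# Values taken from this tutorial.
ARCHIVE=/data/backup
REPO=cluster

# Assumed flag: --repo restricts the listing to a single backup repository.
LIST_CMD="cbbackupmgr list --archive $ARCHIVE --repo $REPO"
echo "$LIST_CMD"

# Only execute when cbbackupmgr is actually installed.
if command -v cbbackupmgr >/dev/null 2>&1; then
  $LIST_CMD
fi
```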

Restoring a Backup

Now that we have some backup data let’s restore that data back to the cluster.
In order to restore data we just need to know the name of the backup that we
want to restore. To find the name we can again use the list subcommand to see
what is in our backup archive. The backup name will always be a timestamp. For
example, let’s say we want to restore the 2016-03-22T10_26_08.933579821-07_00
backup from the "cluster" backup repository. In order to do this we run the
command below.
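
A sketch of such a restore, using the --start, --end and --force-updates flags that this section describes, might look like this. The cluster address, credentials, and the --cluster spelling are assumptions to adapt to your environment and release.

```shell
# Values taken from this tutorial.
ARCHIVE=/data/backup
REPO=cluster
BACKUP=2016-03-22T10_26_08.933579821-07_00

# Assumed cluster address and credentials; replace with your own.
CLUSTER=couchbase://127.0.0.1
CB_USER=Administrator
CB_PASS=password

# Restoring a single backup: --start and --end name the same backup.
RESTORE_CMD="cbbackupmgr restore --archive $ARCHIVE --repo $REPO --cluster $CLUSTER --username $CB_USER --password $CB_PASS --start $BACKUP --end $BACKUP --force-updates"
echo "$RESTORE_CMD"

# Only execute when cbbackupmgr is actually installed.
if command -v cbbackupmgr >/dev/null 2>&1; then
  $RESTORE_CMD
fi
```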

In the command above we use the --start and --end flags to specify the range of
backups we want to restore. Since we are only restoring one backup we specify
the same value for both --start and --end. We also added the --force-updates
flag in order to skip Couchbase conflict resolution. This tells cbbackupmgr to
force overwrite key-value pairs being restored even if the key-value pair on the
cluster is newer than the one being restored. If we look at the two values that
we updated on the cluster we will now see that they have been reverted to
what they were at the time we took the initial backup. If you used the script in
the backup-tutorial GitHub repository to update documents then you can use the
03-inspect.sh script to see the state of the updated documents after the
restore.

The restore subcommand also allows you to exclude backed-up data from the
restore and provides various other options. See the
cbbackupmgr-restore page for more information on restoring data.

Merging backups

Using an incremental backup solution means that each backup we take increases
disk usage. Since disk space is not infinite we need to be able to reclaim this
disk space. In order to do this we use the merge subcommand to merge two or
more backups together. Since we have two backups in the "cluster" backup
repository we will merge these backups together using the command below.
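
The merge could be sketched as below. FIRST_BACKUP_NAME and SECOND_BACKUP_NAME are hypothetical placeholders for the oldest and newest backup names reported by the list subcommand. The --start and --end flags are assumed to delimit the merge range in the same way they do for restore.

```shell
# Values taken from this tutorial.
ARCHIVE=/data/backup
REPO=cluster

# Hypothetical placeholders: substitute the real backup names from `list`.
START=FIRST_BACKUP_NAME
END=SECOND_BACKUP_NAME

MERGE_CMD="cbbackupmgr merge --archive $ARCHIVE --repo $REPO --start $START --end $END"
echo "$MERGE_CMD"

# Only execute when cbbackupmgr is installed and the placeholders were replaced.
if command -v cbbackupmgr >/dev/null 2>&1 && [ "$START" != FIRST_BACKUP_NAME ]; then
  $MERGE_CMD
fi
```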

We can see from the list command that there is now a single backup in the
"cluster" backup repository. This backup has a name that reflects the name of
the most recent backup in the merge. It also has 31593 data items in the
travel-sample bucket. This is two more items than the original backup we took
because the second backup had two new items. The two items that were updated
were de-duplicated during the merge so they do not add extra items to the count
displayed by the list subcommand.

For more information on how the merge command works as well as information on
other ways the merge command can be used see the cbbackupmgr-merge
page.

Removing a Backup Repository

If we no longer need a backup repository then we can use the remove subcommand
to remove it. Below is an example showing how to remove the "cluster" backup
repository.
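
A minimal sketch of removing the "cluster" backup repository follows, assuming the remove subcommand takes the same --archive and --repo flags as the other subcommands. Removal deletes the backups it contains, so check the repository name before running it.

```shell
# Values taken from this tutorial.
ARCHIVE=/data/backup
REPO=cluster

# Assumed to delete the repository and every backup stored in it.
REMOVE_CMD="cbbackupmgr remove --archive $ARCHIVE --repo $REPO"
echo "$REMOVE_CMD"

# Only execute when cbbackupmgr is actually installed.
if command -v cbbackupmgr >/dev/null 2>&1; then
  $REMOVE_CMD
fi
```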