Archiving Data with Snapshots in LVM2

Sometimes we use a technology even though we're
unaware of its full features and capabilities and how they may be able
to benefit us. One such feature is the data snapshot. The snapshot is
a single state (that is, a copy) of a storage volume at a particular point
in time. A volume can refer to a disk device or partition. The snapshot
is primarily a data backup technology, and its advantages are most
apparent with larger storage capacities. For
instance, full backups of an entire volume can take a long time and also
use large amounts of storage space, even for files that will remain
unchanged for some time to come. Also, when performing a data backup
of entire volumes or subsets of volumes in a symmetric multiprocessing
environment, write operations still may continue to modify the file data
on that volume, preventing atomicity and, in turn, possibly leading to data
corruption. One way around the latter is to take the volume off-line or
mark it read-only prior to the archival process, but in high-availability
production environments, that may never be an option. This is where the
snapshot comes in.

Used to avoid downtime and retain atomicity, the snapshot provides
a read-only copy of a specified volume at a specific point in time,
usually implemented by a copy-on-write mechanism. Some vendors and
software implementations are known to support write commands via a
concept known as branching snapshots, in which diverging versions of
data are created via an extremely complex system of pointers, all based
on the original snapshot. A write to either the snapshot or the original
volume is not seen by the other. Copy-on-write works as follows: when a
volume with an active snapshot is written to, the original, unchanged
data block(s) or file data (in the case of a file-based snapshot) are
first copied to the space allocated for the snapshot. Once the original
data has been copied over, the original volume is updated with the new
data. When the snapshot volume is mounted, a system of pointers lets it
present the preserved data saved in the snapshot area together with the
blocks on the parent volume that have not changed. With such a technique,
it becomes possible to archive valuable data incrementally without losing
productivity or risking data corruption.
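
The copy-on-write sequence described above can be sketched with a toy
model in which every "block" is a small file. This is only an illustration
of the idea under that assumption; the directory layout and the function
names origin_write and snapshot_read are invented for the example and say
nothing about how any real implementation stores its data.

```shell
#!/bin/sh
# Toy copy-on-write model: each "block" is a file named by its block number.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/origin" "$work/cow"

# Fill the origin volume with three blocks. "Taking" the snapshot copies
# nothing: the copy-on-write area simply starts out empty.
for n in 0 1 2; do echo "old-$n" > "$work/origin/$n"; done

# Write to the origin: preserve the old block in the COW area first
# (only on the first overwrite), then update the origin in place.
origin_write() {  # usage: origin_write BLOCK DATA
    [ -e "$work/cow/$1" ] || cp "$work/origin/$1" "$work/cow/$1"
    echo "$2" > "$work/origin/$1"
}

# Read through the snapshot: use the preserved copy if one exists,
# otherwise fall through to the unchanged block on the origin.
snapshot_read() {  # usage: snapshot_read BLOCK
    if [ -e "$work/cow/$1" ]; then
        cat "$work/cow/$1"
    else
        cat "$work/origin/$1"
    fi
}

origin_write 1 "new-1"
origin_write 1 "newer-1"

echo "origin block 1:     $(cat "$work/origin/1")"
echo "snapshot block 1:   $(snapshot_read 1)"
echo "snapshot block 2:   $(snapshot_read 2)"
echo "blocks in COW area: $(ls "$work/cow" | wc -l | tr -d ' ')"
```

Note that the second write to block 1 copies nothing further, which is
why the snapshot area needs room only for blocks that have changed at
least once.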

The use of snapshot technologies can be seen in a variety of environments,
ranging from external storage controllers, filesystems, virtual machines
(such as VMware, VirtualBox and so on), databases and even volume managers,
which is the focus of this article. Here, I cover the snapshot
feature found in LVM2 and how to manage it, all from the command line.

Note:

LVM2 refers to a collection of user-space tools that provide logical
volume management on Linux.

The Linux Logical Volume Manager

The second generation of the Linux Logical Volume Manager (LVM2)
is a logical volume manager capable of pooling multiple
storage devices together to represent a single volume or volumes, either in a
striped or mirrored fashion. Everything is created and managed on a layer-by-layer
basis. First is the physical volume. It is followed
by the volume group and then the mountable logical volume itself. Most
mainstream Linux distributions have the LVM2 userland tools
preinstalled. If you find that they are not installed on your distribution,
download and install them via your distribution's package repository.

The idea is similar in concept to the Redundant Array of
Independent Disks (RAID), and although LVM2 does not support any
parity-driven striping, it still adds value. For instance, LVM2
allows for the uninterrupted addition, removal and replacement of storage
devices. It makes for easy dynamic resizing of volume groups and logical
volumes. Most important, it supports the snapshot, the focus of this
article. As of LVM2, snapshot volumes also accept write operations.

As mentioned earlier, LVM2 volumes utilize a layered structure—that is,
physical volumes (or PVs) must be created from a physical disk device. This
can be accomplished with the pvcreate command followed by the list of
physical partitions to label for LVM2 usage:

$ sudo pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

With the newly labeled physical volumes, volume groups (or VGs) need
to be created with the vgcreate command, followed by a name for the volume
group and then a list of all physical volumes to use.
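
A minimal sketch of this step, assuming the group is called VolGroup (the
name used later in this article) and is built from the four partitions
labeled above. The run function is an illustrative dry-run helper, not an
LVM2 tool: it prints each command and executes it only when DRY_RUN=0
is set.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

VG=VolGroup   # volume group name

# Create the volume group from the four labeled physical volumes.
run sudo vgcreate "$VG" /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```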

By default, volume groups are located in the /dev directory path. It
is with this volume group that logical volumes (or LVs) can be created,
formatted with a filesystem, mounted and, in turn, used for file I/O
operations. The best feature of creating logical volumes is that you
can use some or all of the available capacity of the VG. For instance,
to create a 1GB LV from the 4GB VG, invoke the lvcreate command with
the name of the VG and a size for the LV. When an LV is created, a
device node for it appears in the /dev directory path under the volume
group's name.
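
As a sketch of the 1GB case, assuming the volume group is named VolGroup
and choosing lv_data as a purely hypothetical LV name (the run function is
an illustrative dry-run helper that prints commands unless DRY_RUN=0
is set):

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

VG=VolGroup
LV=lv_data   # hypothetical name for the new logical volume

# -L sets the size, -n the name; the last argument is the VG to carve from.
run sudo lvcreate -L 1G -n "$LV" "$VG"

# The resulting device node lives under the volume group's name.
echo "device node: /dev/$VG/$LV"
```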

The LV created above is nonredundant. To create an LV with mirroring
capabilities, invoke the lvcreate command with the -m option, for
instance to create a 500MB mirrored LV.
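
A sketch of the mirrored case, again assuming a volume group named
VolGroup and a hypothetical LV name, lv_mirror. The run function is an
illustrative dry-run helper, not part of LVM2.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

VG=VolGroup
LV=lv_mirror   # hypothetical name for the mirrored LV

# -m 1 asks for one mirror in addition to the original, so the data is
# kept in two copies; the VG therefore needs more than one PV.
run sudo lvcreate -L 500M -m 1 -n "$LV" "$VG"
```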

Note that a list of all logical volumes, volume groups and physical
volumes with detailed volume information can be displayed with the
lvdisplay, vgdisplay and pvdisplay commands.

LVM2 Snapshots

Now that I've covered a brief summary of how LVM2 is structured and
managed, it's time to focus on the snapshot feature. It is worth
noting that the LVM2 snapshot feature can be used only on LVM2-managed
logical volumes. Assuming that an LV already exists, possibly the partition
for the / directory path, a second LV needs to be created for the
snapshot of the original logical volume. With regard to size, another
great feature of the snapshot is that the snapshot volume does not have
to be equal in size to the original volume. It can be half the size of
the original volume or even less, in which case it can hold only that much
changed data. By default, LVM2 will disable the snapshot automatically if
the snapshot LV ever fills up. The amount of storage space necessary
depends on how much data is modified while the snapshot exists. If the
snapshot size equals
the size of the original LV, it never will overflow, and snapshot service
will not be interrupted. In the worst-case scenario, if it is found that space
is running out on the snapshot, the LV always can be resized dynamically
to a larger capacity.
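
One way to act before a snapshot fills up is to compare its usage against
a threshold and grow it with lvextend. The sketch below assumes a snapshot
named rootsnapshot in VolGroup, an arbitrary 80% threshold and a 500MB
increment; the usage figure is a hard-coded sample, and the run and
needs_extend functions are illustrative helpers, not LVM2 tools.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

SNAP=VolGroup/rootsnapshot   # assumed snapshot LV
THRESHOLD=80                 # hypothetical fill percentage at which to grow

# Succeeds when a usage figure such as "83.41" has reached the threshold.
# Only the integer part is compared, which is precise enough here.
needs_extend() {  # usage: needs_extend PERCENT
    [ "${1%%.*}" -ge "$THRESHOLD" ]
}

# On a live system the figure could be read with something like:
#   sudo lvs --noheadings -o data_percent "$SNAP"
# (the output is padded with spaces and would need trimming first).
used="83.41"   # sample value for illustration only

if needs_extend "$used"; then
    run sudo lvextend -L +500M "/dev/$SNAP"
fi
```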

Define the size to allocate for the snapshot. Create the snapshot by
using the lvcreate command, with the size followed by the snapshot
switch (-s), the name for the snapshot and the original LV to snapshot.
The snapshot is created in the same VG as the original LV. In this
example, only 500MB is allocated for modified data. Realistically, this
is too small for production use, but it serves its purpose here.
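
A sketch of the snapshot creation, using the snapshot name rootsnapshot
that appears later in this article. The original LV's name, lv_root, is an
assumption made for the example, and run is an illustrative dry-run helper
rather than an LVM2 tool.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

ORIGIN=/dev/VolGroup/lv_root   # hypothetical original LV holding /
SNAP=rootsnapshot              # name for the snapshot LV

# -L reserves 500MB for changed data, -s marks the new LV as a snapshot,
# and the final argument names the LV being snapshotted.
run sudo lvcreate -L 500M -s -n "$SNAP" "$ORIGIN"
```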

If the original LV is written to, the copy-on-write mechanism first
copies the affected original data from the original volume to the
snapshot volume and only then updates the original volume with the new
data. To better understand the mechanics behind the snapshot, mount the
snapshot volume, so that it can be accessed like any other mounted device.
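
Mounting the snapshot might look like the following sketch. The mount
point matches the path used in the tar example later in this article; the
run function is an illustrative dry-run helper that prints commands unless
DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

SNAP_DEV=/dev/VolGroup/rootsnapshot   # snapshot device node
SNAP_MNT=/mnt/VolGroup/rootsnapshot   # directory to mount it on

run sudo mkdir -p "$SNAP_MNT"
run sudo mount "$SNAP_DEV" "$SNAP_MNT"
```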

Here is a simple exercise to verify that the snapshot is functional: write
to the original volume, that is, modify an existing file or add or remove
a file. The original data for modified files will still be present on the
mounted snapshot. A file added to the original volume will not appear on
the snapshot, and a file removed from the original volume will remain on
the snapshot. The same logic applies in reverse if the snapshot data is
modified: the original volume remains unaltered.
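
The exercise can be sketched as a small check that reports where a file
exists. It assumes the original LV is mounted on / and the snapshot on
/mnt/VolGroup/rootsnapshot; the test file name and the check_file and run
functions are hypothetical helpers invented for this illustration.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

ORIGIN=/                          # mount point of the original LV
SNAP=/mnt/VolGroup/rootsnapshot   # mount point of the snapshot
FILE=etc/snapshot-test            # hypothetical file, relative to the root

# Report whether a relative path exists under each mount point.
check_file() {  # usage: check_file RELATIVE_PATH
    for root in "$ORIGIN" "$SNAP"; do
        if [ -e "${root%/}/$1" ]; then
            echo "$root: present"
        else
            echo "$root: absent"
        fi
    done
}

# Add a file to the original volume after the snapshot was taken ...
run sudo touch "${ORIGIN%/}/$FILE"
# ... then compare the two views of the filesystem.
check_file "$FILE"
```

When the commands are actually executed with the snapshot mounted, the
file should be reported as present on the original volume and absent on
the snapshot.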

In some versions of various Linux distributions, including Red Hat
Enterprise Linux (also in the beta release of RHEL 6), CentOS and even
SUSE Linux, there is a known problem when attempting to remove or
deactivate logical volumes. The LV cannot be removed, and the following
error message is returned: Can't remove open logical volume
"rootsnapshot". Invoking dmsetup info -c rootsnapshot on the command
line returns the status of the LV and confirms the error message. To
work around this, use the dmsetup command followed by the lvremove
command. Then confirm that the LV has been removed with the lvdisplay
command.
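
A sketch of that workaround follows. It assumes the snapshot is first
unmounted, and that its device-mapper name is VolGroup-rootsnapshot, which
follows the usual VG-LV naming but should be confirmed with dmsetup ls on
the system in question. Additional mapped devices may need removing,
depending on the setup. The run function is an illustrative dry-run
helper, not an LVM2 tool.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

VG=VolGroup
LV=rootsnapshot
DM_NAME="$VG-$LV"   # assumed device-mapper name; verify with `dmsetup ls`

run sudo umount "/mnt/$VG/$LV"       # a mounted LV still counts as open
run sudo dmsetup ls                  # list mapped devices to confirm the name
run sudo dmsetup remove "$DM_NAME"   # drop the device-mapper entry
run sudo lvremove "/dev/$VG/$LV"     # then remove the logical volume
run sudo lvdisplay "$VG"             # the snapshot should no longer be listed
```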

In some cases, it is advisable to ensure that enough storage space is
allocated for the snapshot or (as discussed below) a backup directory
that will contain all of the archived snapshot data for restoring
purposes. To extend an existing volume group, a new PV needs to be
labeled. To do so, identify the physical storage device, and using fdisk,
sfdisk or parted, create a partition of the desired size. Verify the
partition by reading back the partition table. Then, create the PV and
add it to the VG with the vgextend command.
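
These steps could be sketched as follows, assuming the new partition is
/dev/sde1 on the disk /dev/sde and the group is named VolGroup, as in the
vgreduce example in this article. The run function is an illustrative
dry-run helper that prints commands unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

VG=VolGroup
DISK=/dev/sde    # assumed new storage device
PART=/dev/sde1   # partition created on it beforehand

run sudo fdisk -l "$DISK"         # read back the partition table
run sudo pvcreate "$PART"         # label the partition as a physical volume
run sudo vgextend "$VG" "$PART"   # add the new PV to the volume group
```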

If at some point the PV needs to be removed from a VG, use the vgreduce
command followed by the names of the VG and the PV:

$ sudo vgreduce VolGroup /dev/sde1

If the VG is being extended for the purpose of creating a backups
directory to archive routine snapshots, follow the normal lvcreate
procedure: define the name, size and VG for the desired LV. Then, format
the LV with a filesystem, and for file I/O accessibility, mount it to a
directory path.
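
A sketch of the backups volume, in which the LV name (backups), the 2GB
size, the ext4 filesystem and the mount point are all assumptions chosen
for illustration. The run function is a dry-run helper, not an LVM2 tool.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

VG=VolGroup
LV=backups                  # hypothetical LV name
MNT=/mnt/VolGroup/backups   # hypothetical mount point

run sudo lvcreate -L 2G -n "$LV" "$VG"   # create the LV
run sudo mkfs.ext4 "/dev/$VG/$LV"        # format it with a filesystem
run sudo mkdir -p "$MNT"
run sudo mount "/dev/$VG/$LV" "$MNT"     # make it available for file I/O
```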

In the event of failure or if older revisions of files need to be
retrieved, the archived snapshot can be used to restore the original
data contents. This is an ideal backup strategy when running a
high-availability production environment, because no downtime is
required. The backup does not necessarily need to be written to a
file: using the tar or dd commands, the snapshot can be written
directly to another physical storage device, including a tape drive:

$ sudo tar -cf /dev/st0 /mnt/VolGroup/rootsnapshot
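
For the file-based variant, the mounted snapshot can be archived into a
backups directory instead of a tape device. The sketch below assumes a
backups filesystem mounted on /mnt/VolGroup/backups and a date-stamped
archive name; both are illustrative choices, as is the run dry-run helper.

```shell
# Hypothetical helper: print the command by default; DRY_RUN=0 executes it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

SNAP_MNT=/mnt/VolGroup/rootsnapshot   # mounted snapshot
BACKUP_DIR=/mnt/VolGroup/backups      # hypothetical backups mount point
STAMP=$(date +%Y%m%d)                 # date stamp for the archive name
ARCHIVE="$BACKUP_DIR/rootsnapshot-$STAMP.tar.gz"

# -C changes into the snapshot first, so paths inside the archive are
# relative to the snapshot's root rather than to /mnt/VolGroup.
run sudo tar -czf "$ARCHIVE" -C "$SNAP_MNT" .
```

Storing relative paths makes it possible to extract the archive into any
target directory when restoring.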

Summary

LVM2 comes prepackaged with some of the more common Linux-based
distributions. In some cases, it even is used as part of the default
filesystem layout. Its snapshot feature is one of those lesser-known
treasures that really can be used to one's advantage, ranging from
personal to larger-scale environments. All it takes is a little time,
a little knowledge and a plan for design, deployment and configuration.

dmsetup(8)

dmsetup(8) is a low-level tool used to manage logical devices that use
the device-mapper driver. The LVM2 user-space toolset relies heavily on
the device-mapper kernel module and support library.

Petros Koutoupis is a full-time Linux kernel, device driver and
application developer for embedded and server platforms. He has been
working in the data storage industry for more than six years and enjoys discussing
the same technologies.