Archiving Data with Snapshots in LVM2

We often use a technology without knowing its full features and
capabilities or how they might benefit us. One such feature is the data
snapshot. A snapshot is a single state (that is, a copy) of a storage
volume at a particular point in time. A volume can refer to a disk device
or partition. The snapshot is primarily a data backup technology, and it
is most advantageous with larger storage capacities. For
instance, full backups of an entire volume can take a long time and also
use large amounts of storage space, even for files that will remain
unchanged for some time to come. Also, when performing a data backup
of entire volumes or subsets of volumes in a symmetric multiprocessing
environment, write operations still may continue to modify the file data
on that volume, preventing atomicity and, in turn, possibly leading to data
corruption. The latter can be avoided by taking the volume off-line or
marking it read-only before the archival process, but in high-availability
production environments, that may never be an option. This is where the
snapshot comes in.

Used to avoid downtime and retain atomicity, the snapshot provides
a read-only copy of a specified volume at a specific point in time,
usually implemented by a copy-on-write mechanism. Some vendors and
software implementations are known to support write commands via a
concept known as branching snapshots, in which diverging versions of
data are created via an extremely complex system of pointers, all based
on the original snapshot. When you write to a snapshot or to the original
volume, the write is not seen by the other. Copy-on-write works as
follows: when a volume with a snapshot is written to, the original,
unchanged data block(s) or file data (in the case of a file-based
snapshot) are first copied to the space allocated for the snapshot. Once
the original data has been copied over, the original volume is updated
with the new data. When the snapshot volume is mounted, it uses a system
of pointers to read unmodified data from the parent volume and the
preserved original data from the snapshot's own space. With such a
technique, it becomes possible to archive valuable data incrementally
without losing productivity or risking data corruption.
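
The copy-on-write sequence described above can be sketched as a toy shell
model. This is only an illustration under stated assumptions, not how LVM2
is implemented: small files stand in for disk blocks, and the directory
names and the write_origin and read_snapshot helpers are hypothetical.

```shell
#!/bin/sh
# Toy copy-on-write model: small files stand in for disk blocks.
work=$(mktemp -d)
mkdir "$work/origin" "$work/cow"
for i in 0 1 2 3; do printf 'old%s' "$i" > "$work/origin/$i"; done

# Write to the origin: preserve the old block once, then overwrite in place.
write_origin() {
    [ -e "$work/cow/$1" ] || cp "$work/origin/$1" "$work/cow/$1"
    printf '%s' "$2" > "$work/origin/$1"
}

# Read through the snapshot: saved copy if one exists, else the origin's block.
read_snapshot() {
    if [ -e "$work/cow/$1" ]; then
        cat "$work/cow/$1"
    else
        cat "$work/origin/$1"
    fi
}

write_origin 1 new1      # first write to block 1: old1 is saved to cow/
write_origin 1 newer1    # second write: nothing more is copied
origin_view=$(cat "$work/origin/0" "$work/origin/1" "$work/origin/2" "$work/origin/3")
snapshot_view=$(for i in 0 1 2 3; do read_snapshot "$i"; done)
echo "origin:   $origin_view"
echo "snapshot: $snapshot_view"
rm -rf "$work"
```

The origin ends up as old0newer1old2old3, while the snapshot still reads
old0old1old2old3. Only the one modified block consumed snapshot space,
which is why a snapshot can be far smaller than the volume it tracks.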

Snapshot technologies are used in a variety of environments, including
external storage controllers, filesystems, virtual machines (such as
VMware and VirtualBox), databases and even volume managers, which are
the focus of this article. Here, I cover the snapshot feature found in
LVM2 and how to manage it, all from the command line.

Note:

LVM2 refers to a collection of user-space tools that provide logical
volume management on Linux.

The Linux Logical Volume Manager

The second generation of the Linux Logical Volume Manager (LVM2)
is a logical volume manager capable of pooling multiple
storage devices together to represent a single volume or volumes, either in a
striped or mirrored fashion. Everything is created and managed on a layer-by-layer
basis. First is the physical volume. It is followed
by the volume group and then the mountable logical volume itself. Most
mainstream Linux distributions have the LVM2 userland tools
preinstalled. If they are not installed on your distribution,
install them from your distribution's package repository.
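
A quick way to check for the tools is sketched below. It assumes the main
LVM2 binary is named lvm and lives in one of the usual system directories,
and that the package is called lvm2, as it commonly is. The install
commands it prints are examples, so adapt them to your package manager.

```shell
# Look for the lvm binary in the usual system directories.
lvm_state=missing
for dir in /sbin /usr/sbin /usr/local/sbin /bin /usr/bin; do
    if [ -x "$dir/lvm" ]; then
        lvm_state=installed
        echo "found $dir/lvm"
        break
    fi
done

# Nothing is installed automatically; only a suggestion is printed.
if [ "$lvm_state" = missing ]; then
    echo "LVM2 not found; try: sudo apt-get install lvm2 (or: sudo yum install lvm2)"
fi
```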

The idea is similar in concept to the Redundant Array of
Independent Disks (RAID), and although LVM2 does not support any
parity-driven striping, it still adds value. For instance, LVM2
allows for the uninterrupted addition, removal and replacement of storage
devices. It makes for easy dynamic resizing of volume groups and logical
volumes. Most important, it supports the snapshot, the focus of this
article. As of LVM2, snapshot volumes also support write operations.

As mentioned earlier, LVM2 volumes utilize a layered structure—that is,
physical volumes (or PVs) must be created from a physical disk device. This
can be accomplished with the pvcreate command followed by the list of
physical partitions to label for LVM2 usage:

$ sudo pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

With the newly labeled physical volumes, volume groups (or VGs) need
to be created with the vgcreate command, followed by a name for the volume
group and then a list of all physical volumes to use:
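
A minimal sketch of that step, assuming the hypothetical volume group name
vg0 and the four partitions labeled with pvcreate earlier. Because
vgcreate changes on-disk metadata, the sketch only prints the command
unless RUN=1 is set. The RUN switch is an illustration device, not part of
LVM2.

```shell
VG=vg0                                          # hypothetical volume group name
PVS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1"   # partitions labeled with pvcreate
vg_cmd="vgcreate $VG $PVS"

# Dry run by default: print the command; RUN=1 executes it with sudo.
if [ "${RUN:-0}" = 1 ]; then
    sudo $vg_cmd
else
    echo "$vg_cmd"
fi
```

Once the group exists, vgs or vgdisplay reports its total and free
capacity.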

By default, the volume groups are located in the /dev directory path. It
is with this volume group that logical volumes (or LVs) can be created,
formatted with a filesystem, mounted and, in turn, used for file I/O
operations. The best feature of creating logical volumes is that you
can use some or all available capacity of the VG. For instance, to create
a 1GB LV from the 4GB VG, use the lvcreate command with the name of the
VG and a size for the LV. When an LV is created, a device node for it
appears in the /dev directory path under the volume group's name:
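
A minimal sketch of the 1GB case, assuming a volume group with the
hypothetical name vg0. As a precaution, it prints the command unless RUN=1
is set. That switch is only an illustration device.

```shell
VG=vg0        # hypothetical volume group name
LV_SIZE=1G    # size of the new logical volume
lv_cmd="lvcreate -L $LV_SIZE $VG"

# Dry run by default: print the command; RUN=1 executes it with sudo.
if [ "${RUN:-0}" = 1 ]; then
    sudo $lv_cmd
else
    echo "$lv_cmd"
fi
```

With no -n option, LVM2 picks a default name of the form lvol#, so under
these assumptions the first volume would typically appear as
/dev/vg0/lvol0, ready to be formatted and mounted.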

The example above showcases the creation of a nonredundant LV. To create
an LV with mirroring capabilities, invoke the lvcreate command with the
-m option. The example below creates a 500MB mirrored LV:
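
A minimal sketch of the mirrored case, assuming the hypothetical names vg0
for the volume group and lv1 for the new volume. Here -m 1 requests one
mirror copy in addition to the original, and -n sets the LV name. The
command is printed rather than run unless RUN=1 is set, which is only an
illustration device.

```shell
VG=vg0          # hypothetical volume group name
LV_NAME=lv1     # hypothetical logical volume name
mirror_cmd="lvcreate -L 500M -m 1 -n $LV_NAME $VG"

# Dry run by default: print the command; RUN=1 executes it with sudo.
if [ "${RUN:-0}" = 1 ]; then
    sudo $mirror_cmd
else
    echo "$mirror_cmd"
fi
```

A mirror needs its copies on separate physical volumes, so the volume
group must contain at least two PVs with enough free space.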

Petros Koutoupis is a full-time Linux kernel, device-driver and
application developer for embedded and server platforms. He has been
working in the data storage industry for more than six years and enjoys
discussing the same technologies.
