Understanding Archivers

In the next few articles, I'd like to take a look at backups and archiving
utilities.

If you're like I was when I started using Unix, you may find the words tar,
cpio, and dump intimidating. I certainly did, and a quick peek at their respective man pages did not alleviate my fears.

So I quickly convinced myself that I really didn't need to learn how those utilities worked. After all, I didn't even own a tape drive on my home FreeBSD system. Yes, I knew that backups were really, really important, but surely I could just copy the files I needed as I needed them.

I've since learned that copying files is actually the hard way to do a
backup, and it doesn't make it easy for me to back up everything I should
on a regular basis. In today's article, I'd like to introduce the concept
of archiving, the archiving utilities that are available, and some of the
differences between them. In the next few articles, I'll continue by
demonstrating how to use each of these archiving utilities.

I'm currently logged in as the user "dru". I'll cd into my home
directory, take a look at its contents, and then use cp -r to make a
recursive copy into a directory named backup:

You'll note that the last modified time for the file time.pl has been
changed to the time I made the recursive copy, rather than the last
time this file was actually modified, which was back in February.
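Python's shutil module shows the same effect, so here is a small sketch. As a rough analogy (not an exact equivalent of cp), shutil.copy copies a file's data and permission bits but gives the copy a fresh modification time, while shutil.copy2 also carries the timestamps over, much like cp -p. The file name time.pl and the 30-day backdating are just for illustration.

```python
import os
import shutil
import tempfile
import time

def compare_copy_times():
    """Return the mtimes of an old file and of two copies of it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "time.pl")
        with open(src, "w") as f:
            f.write("print scalar localtime;\n")
        # Pretend the script was last modified 30 days ago.
        old = time.time() - 30 * 24 * 3600
        os.utime(src, (old, old))

        # shutil.copy: data and permission bits only, so the copy gets a new mtime.
        plain = shutil.copy(src, os.path.join(tmp, "plain.pl"))
        # shutil.copy2: also copies the timestamps, roughly like cp -p.
        kept = shutil.copy2(src, os.path.join(tmp, "kept.pl"))

        return {name: os.stat(p).st_mtime
                for name, p in (("original", src), ("copy", plain), ("copy2", kept))}

for name, mtime in compare_copy_times().items():
    print(f"{name:8} {time.ctime(mtime)}")
```

The copy made with shutil.copy shows today's date, just as the cp -r copy did; the shutil.copy2 copy keeps the original date.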

This may or may not be a big deal to you if you are only interested in backing up
some files in your own home directory. However, it could certainly cause
confusion if this were the backup solution for larger portions of your
FreeBSD system.

There are other considerations when using cp -r to back up files.
What if I wanted to back up files for several users? I would probably
do the backup as the superuser. Let's see what happens if I repeat that
copy, but this time as the superuser:

You'll note that both the backup directory and the time.pl file are now owned by the user who did the copy, in this case root. This situation could have
been avoided if I had remembered to include the -p switch, which tells cp to
preserve the original owner, group, permissions, and modification times.

Just imagine the nightmare if I had backed up each user's
home directory as the superuser using cp -r; I would have to readjust the
ownership and possibly the permissions of any file that needed to be restored, plus the original file modification times would still be unknown.
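To see exactly what a backup lost, you could compare it against the original. The following Python sketch uses a hypothetical helper, metadata_diffs, that walks the source tree and reports every file or directory whose owner, group, permission bits, or modification time differs in the copy. Here shutil.copytree stands in for cp, which is an approximation rather than a reproduction of cp's behavior; the directory names are illustrative.

```python
import os
import shutil
import stat
import tempfile

def metadata_diffs(src_root, dst_root):
    """Return sorted (relative path, field) pairs where a copy differs from the original."""
    diffs = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            s = os.lstat(src)
            d = os.lstat(os.path.join(dst_root, rel))
            fields = {
                "owner": (s.st_uid, d.st_uid),
                "group": (s.st_gid, d.st_gid),
                "mode": (stat.S_IMODE(s.st_mode), stat.S_IMODE(d.st_mode)),
                "mtime": (s.st_mtime_ns, d.st_mtime_ns),
            }
            diffs.extend((rel, f) for f, (a, b) in fields.items() if a != b)
    return sorted(diffs)

def compare_backups():
    with tempfile.TemporaryDirectory() as tmp:
        home = os.path.join(tmp, "dru")
        os.makedirs(os.path.join(home, "perlscripts"))
        script = os.path.join(home, "perlscripts", "time.pl")
        with open(script, "w") as f:
            f.write("print scalar localtime;\n")
        os.chmod(script, 0o750)
        for path in (script, os.path.join(home, "perlscripts")):
            os.utime(path, (1_000_000_000, 1_000_000_000))  # an old timestamp

        # Roughly like cp -r: file data and mode bits are copied, file mtimes are not.
        rough = shutil.copytree(home, os.path.join(tmp, "backup1"),
                                copy_function=shutil.copy)
        # copy2 (the default) also carries file timestamps over; note that
        # shutil never changes ownership, unlike cp -p run as root.
        careful = shutil.copytree(home, os.path.join(tmp, "backup2"))
        return metadata_diffs(home, rough), metadata_diffs(home, careful)

print(compare_backups())
```

An empty list means the copy matches the original on every field checked. When run as root against other users' files, the owner and group checks are the ones that would flag the problem described above.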

If that's still not a big deal to you, consider how I would back up my
entire home directory using cp -r. I do NOT want to do it this way, even
though it seems logical enough:

cd
cp -r . backup

If I do try this, my hard drive will churn for an eerily long period
of time before giving me an error message that includes several screens'
worth of the word backup and a complaint that the name is too long. This
happens because cp goes into an endless loop whenever the destination
lies inside the source being copied, whether directly in it or in one of
its subdirectories. It copies backup to backup/backup, then to
backup/backup/backup, and so on, until the pathname grows past the maximum
length the system allows or the disk runs out of space.
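A backup script can guard against this trap by refusing to run when the destination lies inside the source. Here is a minimal Python sketch; the function name dest_inside_source is hypothetical.

```python
import os

def dest_inside_source(source, dest):
    """True if dest is source itself or anywhere beneath it."""
    # realpath makes both paths absolute and resolves symlinks and "..",
    # so different spellings of the same directory still match.
    # The destination does not need to exist yet.
    src = os.path.realpath(source)
    dst = os.path.realpath(dest)
    return os.path.commonpath([src, dst]) == src

if dest_inside_source(".", "backup"):
    print("refusing: backup would end up inside the tree being copied")
```

Comparing whole path components with os.path.commonpath, rather than testing string prefixes, keeps a sibling such as /usr/home/dru2 from being mistaken for a subdirectory of /usr/home/dru.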

So how would I back up my entire home directory? This is where things start
to involve a bit more work, and I start to get the gnawing suspicion that
there has to be an easier way to accomplish this. This will work:

but will quickly become time-consuming and inconvenient as the number of
files in my home directory continues to grow. I could get a bit fancier by
coming up with wildcard expressions that represent all of the files and
directories in my home directory, but I would still be doing things the
hard way.

This is where the concept of archiving, and utilities designed to do
archiving, come into play. So what exactly is an archive? It is a file
containing a collection of other files in a structure that preserves the
contents, permissions, timestamps, owner, group, and pathnames of the
original files so they can be reconstructed at a later time. In other
words, archiving utilities can copy all of the files and subdirectories
within a directory and later recreate that original directory structure
without losing any permissions or modification times along the way.
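Python's tarfile module makes it easy to see this metadata for yourself. The sketch below, which assumes a throwaway file standing in for time.pl, archives one file under a relative pathname and then reads back what the tar header recorded about it.

```python
import io
import os
import tarfile
import tempfile

def archive_and_inspect():
    """Archive one file and return the metadata recorded in its tar header."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "time.pl")
        with open(path, "w") as f:
            f.write("print scalar localtime;\n")
        os.chmod(path, 0o640)
        os.utime(path, (1_000_000_000, 1_000_000_000))

        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            # Store the file under a relative pathname inside the archive.
            tar.add(path, arcname="perlscripts/time.pl")

        buf.seek(0)
        with tarfile.open(fileobj=buf, mode="r") as tar:
            member = tar.getmember("perlscripts/time.pl")
            return {
                "name": member.name,
                "mode": oct(member.mode),
                "mtime": member.mtime,
                "uid": member.uid,
                "gid": member.gid,
                "size": member.size,
            }

print(archive_and_inspect())
```

The pathname, permission bits, modification time, and numeric owner and group all travel inside the archive alongside the file's 24 bytes of content.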

This is even more interesting once you realize that there are
devices that don't even know what a filesystem is or how to read a
filesystem hierarchy. We are used to thinking of our files as living in a
filesystem hierarchy. For example, my time.pl file lives in the
perlscripts directory, which is a subdirectory of my home directory (dru),
which is a subdirectory of the home directory, which is a subdirectory of
the /usr filesystem, or:

/usr/home/dru/perlscripts/time.pl

Any device that can contain a filesystem and therefore understand a
filesystem hierarchy is known as a block device. The hard drive that
contains your FreeBSD operating system is an example of a block device.

However, there are devices that do not understand what a filesystem
hierarchy is. Consider how a tape device works. When you write data to
a tape, the bytes are simply passed to the tape one after the other,
or sequentially. There is no filesystem, and no concept that the file time.pl
belongs within the perlscripts directory. Such devices are known as
character devices and are often called "raw."
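You can ask the operating system which kind of device a file represents. In this Python sketch, stat.S_ISBLK and stat.S_ISCHR test the file type; the helper name device_kind is hypothetical. Note that recent FreeBSD releases no longer provide block device nodes at all (disks show up as character devices), so "block" may never be printed there.

```python
import os
import stat

def device_kind(path):
    """Classify path as 'block', 'character', or 'not a device'."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "not a device"

for p in ("/dev/null", "/"):
    print(p, device_kind(p))
```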

Archiving utilities can back up to either a block or a character device.
The archive file itself contains all of the information required to
recreate the original file hierarchy; that information is saved along with
your data. This means you can back up your data to a character device such
as a tape drive, and later restore it to a block device such as
your hard drive.
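The following Python sketch illustrates that idea with tarfile's streaming modes ("w|" and "r|"), which write and read the archive strictly in order, much as a tape would. An in-memory buffer stands in for the tape drive; that substitution, the directory names, and the function name are assumptions for illustration only.

```python
import io
import os
import tarfile
import tempfile

def backup_then_restore_elsewhere():
    with tempfile.TemporaryDirectory() as tmp:
        home = os.path.join(tmp, "home")
        os.makedirs(os.path.join(home, "perlscripts"))
        with open(os.path.join(home, "perlscripts", "time.pl"), "w") as f:
            f.write("print scalar localtime;\n")

        # "w|" writes a non-seekable stream of blocks, the way data reaches
        # a tape. The BytesIO object stands in for the tape here.
        tape = io.BytesIO()
        with tarfile.open(fileobj=tape, mode="w|") as tar:
            tar.add(os.path.join(home, "perlscripts"), arcname="perlscripts")

        # Read the stream back in order ("r|") and recreate the hierarchy
        # under a completely different directory.
        restore = os.path.join(tmp, "restore")
        os.makedirs(restore)
        tape.seek(0)
        # Newer Pythons accept an extraction filter; "tar" keeps tar-like behavior.
        extra = {"filter": "tar"} if hasattr(tarfile, "tar_filter") else {}
        with tarfile.open(fileobj=tape, mode="r|") as tar:
            tar.extractall(restore, **extra)

        with open(os.path.join(restore, "perlscripts", "time.pl")) as f:
            return len(tape.getvalue()), f.read()

size, content = backup_then_restore_elsewhere()
print(size, repr(content))
```

Nothing about the original directory layout lives in the "tape" except what the archive recorded, yet the perlscripts/time.pl hierarchy comes back intact in the new location.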

There are several archiving utilities that come with your FreeBSD system.
I will be covering tar, cpio, pax, dd, and dump/restore. Let's
see what the whatis command has to say about each of these utilities:

Note that tar, cpio, and pax are considered to be archivers. We'll
see that tar is the easiest to use when you want to back up entire directory
structures. In contrast, cpio is the easiest command to use
when you want to pick and choose which files to back up. The pax command
combines the features of both, with a bit of added functionality thrown in.

The dd utility is interesting: it can actually convert data as it
backs it up. We'll see that this can be invaluable when, say, backing up
files from a PC to a SPARC, since the two architectures store multibyte
values in opposite byte orders. Finally, the dump command is designed to
back up an entire filesystem, not just a directory structure.
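As a sketch of the kind of conversion involved, here is a Python version of a byte swap like the one dd's conv=swab option performs. It only helps with data made of 16-bit values; the function name swab is borrowed for illustration, and leaving a trailing odd byte untouched is an assumption modeled on dd's documented behavior.

```python
import struct

def swab(data: bytes) -> bytes:
    """Swap each pair of adjacent bytes; a trailing odd byte is left as-is."""
    out = bytearray(data)
    n = len(data) - len(data) % 2      # length of the largest even prefix
    out[0:n:2] = data[1:n:2]           # odd-index bytes move to even slots
    out[1:n:2] = data[0:n:2]           # even-index bytes move to odd slots
    return bytes(out)

little = struct.pack("<H", 0x1234)    # how an x86 PC lays out 0x1234
big = struct.pack(">H", 0x1234)       # how a SPARC lays out 0x1234
print(little.hex(), "->", swab(little).hex())   # 3412 -> 1234
```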

I want to discuss a few more items, though, before we start using each of
these commands. Most of these commands assume that you will be backing up
to a SCSI tape drive, but they let you change this default with a switch.
Even if you don't have a tape drive, it is useful to understand the naming
syntax your FreeBSD system uses for tape devices.

Like other Unix systems, FreeBSD stores information regarding devices in
the /dev directory. Let's do a long listing of the first few files in
this directory:

Notice the difference in the fifth field of that long listing. The first
few files show their size in bytes; for example, the file MAKEDEV is
43405 bytes in size. However, the last five files show "117," or "116," instead. Note also that these files are character devices; you can
tell because the file type character at the very beginning of the line,
just before the permissions, is c. Directories have a file type of d, and
regular files have a file type of -.
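A short Python sketch can pull out that same type character: stat.filemode builds the permission string that ls -l prints, and its first character is the file type. The helper name type_letter is hypothetical.

```python
import os
import stat
import tempfile

def type_letter(path):
    """Return the file-type character that ls -l prints before the permissions."""
    # lstat, like ls -l, describes a symbolic link itself rather than its target.
    return stat.filemode(os.lstat(path).st_mode)[0]

for p in ("/dev/null", "/dev"):
    print(type_letter(p), p)

with tempfile.NamedTemporaryFile() as f:
    print(type_letter(f.name), "(a regular temporary file)")
```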

The device files in the /dev directory are really just pointers to a
driver contained in the kernel for the device that each device file represents.
This means that these files are actually empty; they are just pointers. The
value in what is normally the size field of ls -l represents a pair of
numbers written as "major_number, minor_number". For example, the device file acd1c has a major number of 117. The major number indicates which driver
should be used; the minor number gives any additional information about
the device to the driver.
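The same major and minor numbers can be read programmatically. This Python sketch (the helper name major_minor is hypothetical) unpacks a device file's st_rdev field with os.major and os.minor; the numbers you get depend on your operating system and will not match the acd1c example.

```python
import os
import stat

def major_minor(path):
    """Return the (major, minor) pair ls -l shows in place of a device file's size."""
    st = os.stat(path)
    if not (stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)):
        raise ValueError(f"{path} is not a device file")
    # st_rdev packs both numbers together; os.major and os.minor unpack them.
    return os.major(st.st_rdev), os.minor(st.st_rdev)

maj, mnr = major_minor("/dev/null")
print(f"/dev/null: major {maj}, minor {mnr}, size {os.stat('/dev/null').st_size}")
```

The reported size of 0 reflects the point above: the device file holds no data of its own.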

The MAKEDEV file in this directory is really a shell script used to make the device files. If you want to find out what a device file
refers to, read the comments at the beginning of this file. For example,
to see which devices refer to tape devices, I'll open the file with more
and then, at the more prompt, type /tape to search for the word tape:

more /dev/MAKEDEV
/tape

And I'll find that the following tape drives are supported on my FreeBSD system:

Each of these has an associated man page which you can read if you have one
of these tape devices.

If I look for these devices in the /dev directory, I'll note that they
usually come with some additional letters:

ls /dev | grep wt
nrwt0
nrwt0b
nrwt0c
nrwt0d
rwt0
rwt0b
rwt0c
rwt0d

Most tape devices (but not all) include the letter "r", indicating
that they are a "raw" or character device. By default, a tape device
rewinds after you back up to it, meaning your backup will be overwritten if
you do another backup to that tape. To prevent this default behavior, use
the device that includes the letter "n", for no rewind.

Occasionally, a device will also include an "e", meaning that it will eject the tape once the backup is complete.
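To make the naming rules concrete, here is a hypothetical Python helper that decodes only the prefix letters described above (n, r, and e). It has to be told the driver name, and it leaves any trailing letters, such as the c in nrwt0c, undecoded, since their meaning isn't covered here.

```python
import re

def decode_tape_name(name, driver):
    """Split a tape device name such as 'nrwt0c' into the parts described above."""
    m = re.fullmatch(r"([enr]*)" + re.escape(driver) + r"(\d+)([a-z]*)", name)
    if m is None:
        raise ValueError(f"{name!r} does not look like a {driver} device")
    prefix, unit, suffix = m.groups()
    return {
        "no_rewind": "n" in prefix,
        "raw": "r" in prefix,
        "eject": "e" in prefix,
        "unit": int(unit),
        "suffix": suffix,  # left undecoded; its meaning varies by driver
    }

for dev in ("rwt0", "nrwt0", "nrwt0c"):
    print(dev, decode_tape_name(dev, "wt"))
```

So nrwt0 is the raw, no-rewind device for the first wt tape drive, and writing a second backup through it appends after the first rather than overwriting it.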

The last thing I want to mention in today's article is the difference
between absolute and relative pathnames. Since an archiving utility will
save the pathname of a file and use that pathname information when
recreating the file, it is important to know the difference between the two
types of pathnames.

If a pathname begins with a /, it is an absolute pathname. This is usually considered to be a bad thing in a backup, as you will only be
able to restore that file to the original directory it came from. If the
file still exists there, the restored copy will overwrite it, and you will
lose any changes you've made to that file since you backed it up.
Even if you are in a different directory when you restore that file, it
will still be restored to its original location.

If a pathname begins with ./, or with no / at all, it is a relative
pathname. This is usually considered to be a good thing in a backup, as the
file can be restored anywhere. You simply cd to the directory you want
to restore the file to, and the archiver treats the stored pathname as
relative to the current directory as it restores the file.
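The difference can be sketched in a few lines of Python. The hypothetical restore_target function models an archiver that restores names exactly as stored, as described above; many modern tar implementations strip a leading / by default precisely to avoid this problem.

```python
from pathlib import PurePosixPath

def restore_target(stored_name, cwd):
    """Where an archiver that honors stored names would write a file on restore."""
    stored = PurePosixPath(stored_name)
    if stored.is_absolute():
        # Absolute: the current directory is ignored; the file goes back
        # exactly where it came from.
        return str(stored)
    # Relative: the stored name is joined onto the current directory
    # (a leading "./" is simply dropped).
    return str(PurePosixPath(cwd) / stored)

print(restore_target("/usr/home/dru/perlscripts/time.pl", "/tmp/restore"))
print(restore_target("./perlscripts/time.pl", "/tmp/restore"))
```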

In next week's article, we'll continue this series by demonstrating how to
use the tar utility.

Dru Lavigne
is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.