cpio

In this month's column, Eric moves beyond find to cover duplicating files and directory trees using the versatile cpio command. cpio uses space on tape more efficiently than tar and is an excellent alternative for creating archives on platforms that do not have the GNU utilities available. Read on for a thorough discussion of cpio and its three modes of operation: Pass-through, Create and Extract.

Create Mode

Create mode creates archive files. (This is also referred to
as “copy-out” mode.) cpio accepts a list of file names, just as
it does in pass-through mode. But instead of creating duplicate
files in another area, it creates an archive and sends it to
standard output.

Since it is sent to standard output, the archive can be
redirected to any device or file, such as a tape, diskette, or
regular file.

$ find /export/home -depth \
| cpio --create > /dev/fd0

This creates an archive of the /export/home directory tree on
the floppy drive at /dev/fd0. Of course, the /export/home area
probably won't fit on one floppy, but cpio prompts for another
device or file name when each floppy is filled, so it can be
replaced, and the user can type the device name again. (Note that
find's -depth switch is still
recommended to prevent possible problems when the archive is
extracted.)

When it comes to creating archives, cpio has many options.
One of the most important is the format of the archive.

bin (default): the binary
format encodes files in a non-portable method. Therefore, it is not
suited for exchanging files between Linux on a PC and Linux on
other architectures such as Alpha or Power PC.

odc: old (POSIX.1) portable
format. This is portable across platforms, but is not suited for
file systems with more than 65536 inodes, which means most of
today's larger hard disks.

newc: new portable format.
This is portable across platforms, and has no inherent limit on
number of inodes.

crc: new portable format, with
a checksum added.

tar: compatible with tar, but
only supports file names up to 100 characters.

The format is selected with the --format option, so
--format=crc uses the crc format for the archive. The --message
option, which works in both create and extract mode, replaces the
default prompt; with --message="Insert next disk and type /dev/fd0",
cpio prompts the user with that text as each floppy is filled.

There are many other options available for the creation of
archives, which I will cover later.

Even though GNU tar does have many of the advantages of cpio,
the ability to use find to specify the files to be backed up
provides much more flexibility than shell wildcards. [You can do
this with tar, too, but you have to send the output of find into a
file and use that file as an “include file” for tar—ED]

Extract Mode

Extract mode (also referred to as “copy-in” mode) extracts
files from archives. This mode is inconsistent with the other two,
since file names are specified on the command line, instead of via
a list on standard input.

$ cpio --extract < /dev/fd0

This command restores all of the files from the archive in
/dev/fd0, since no file names were specified. If the archive spans
more than one volume, cpio will prompt for each volume the same way
it does when archives are created. The --message option can be
used to override the default message, as in create mode.

cpio automatically recognizes archive formats during
extraction, so it is not necessary to specify them on the command
line.

The path passed to cpio by find is stored in the archive.
Therefore it is important to pay attention to how find is
used.

$ find . -depth | cpio --create > /tmp/archive

This creates an archive that extracts into the present
working directory.

$ find /export/home -depth | cpio --create \
> /tmp/archive

This creates an archive that will try to extract to
/export/home, regardless of the circumstances. If the
-d option is specified the directory is created
if it does not already exist. (If /export/home does not exist and
-d is omitted, the extraction will fail.)

Anything specified on the command line that is not an option
is treated as a filename pattern. Patterns are shell-style
wildcards matched against the whole stored path name, so they
should be quoted to keep the shell from expanding them.

$ cpio --extract "*back*" < /dev/fd0

This will extract files in the archive that have
back in their name. No other files will be
restored. Multiple patterns can also be specified.

$ cpio --extract "*back*" "*save*" < /dev/fd0

This will extract files with “back” or “save” in their
names.

In addition to providing patterns on the command line, they
can be provided as lines in a file. The file is specified with the
--pattern-file=filename
option. This provides a lot of flexibility in restoring files,
since the actual path does not have to be known: a wildcard in a
pattern also matches the / characters in a path. Frequently
restored patterns can be stored in a file.

The --nonmatching option inverts the selection: only files
that do not match any of the patterns are extracted.

It may help to see the contents of the archive before
extracting anything from it.

$ cpio --list < /dev/fd0

The --list option lists the contents of
the archive. Adding --verbose produces a long listing, and the
option --numeric-uid-gid forces that listing to show user and
group IDs numerically, instead of trying to resolve the names with
the passwd and group files.

Instead of standard input and output the archive can be sent
to (or extracted from) a file.

$ find /export/home -depth | cpio --create \
--file=/vol/archive

This option works either for creating or extracting archives.
To use a remote tape drive, specify the user name and hostname
before the filename, in the form user@host:filename. (The user
must have access to the remote host without a password. This can
be done by using the file .rhosts.)

In pass-through mode, adding --preserve-modification-time and
--reset-access-time to a copy of the current directory into
/vol/copy carries the modification times of the old files over to
the new ones and leaves the access times on the original files
untouched.

By default, when operating in copy-in (extract) or pass-through
mode, cpio will not replace an existing file that is newer than
(or the same age as) the copy being written; it reports that the
file was not created and moves on. The --unconditional option
overrides that behavior:

$ cpio --extract --unconditional "*back*" "*save*" \
< /dev/fd0

The --dereference option copies the file
pointed to by a symbolic link, instead of the link itself, in
archive creation and pass-through mode.

The --rename option will prompt the user
to interactively rename each file. This only works in extract
mode.

When acting as a system administrator, it is sometimes useful
to restore an archive or duplicate a directory and change the user
or group id of the target in the process.

$ cpio --extract --owner=eric.staff < /dev/fd0

This will restore the archive on /dev/fd0 and set the owner
of all the extracted files to eric and the group to staff. Only
root may use this option. If the group is left out entirely
(--owner=eric), the group will not be changed. If the . separator
is kept but no group follows it (--owner=eric.), the group will be
set to the user's login group.

Another option related to file ownership is
--no-preserve-owner. This is the default
behavior for non-root users. Files will belong to the user copying
or extracting them, instead of the original user. For root the
default is to preserve ownership.

There are also advanced options related to transferring data
between big-endian and little-endian architectures and for
controlling I/O buffer sizes to optimize performance.

The -depth option to find ensures that directory names are output after the names of the files in them, not before. In combination with the --make-directories (or just -d) and --preserve-modification-time (or just -m) options to cpio, this results in cpio preserving the original modification time of both files and directories.

This works because cpio will create a directory automatically while writing the files inside it; only after it is done writing all the directory contents does it visit the directory itself to set its attributes, which includes resetting the modification time.

You are missing a couple of other important options, though: the -print0 option to find and the --null (or just -0) option to cpio cause find and cpio to write and read the list of filenames terminated by a null character instead of a newline. Since most Linux filesystems allow names to contain newlines, this is important to properly archive such files and avoids doing something very bad with a file named like:
