Understanding CPIO

In the previous article, I demonstrated the usage of the tar archiver
utility. This week I'll continue by introducing the cpio archiver
utility.

While both tar and cpio will achieve the same results, the cpio
utility approaches things a little bit differently. The tar utility
assumes that you want to recursively archive everything under
the specified directory or directories, meaning that you have to
explicitly tell tar if you want to exclude certain portions of that
directory structure. In contrast, the cpio utility expects to be
explicitly told which files or directories you wish to archive; this
behavior is commonly referred to as "receiving from standard input." In other
words, cpio expects to receive a list that contains one file per line, and if
you remember from the "Finding Things in Unix" and "Find: Part Two" articles, that is exactly the type of list that the find utility creates.
The ls utility can also create this type of list, meaning that you will
usually see either ls or find used in conjunction with cpio. And
since cpio archives the list of files it receives from standard
input, you usually use a pipe (|) whenever you create an archive with the
cpio utility.

The tar utility also assumes that you want to write the archive to
your first SCSI tape drive, unless you explicitly specify a file using
the f switch. In contrast, the cpio utility writes a new archive to what is
known as standard output, and reads an existing archive from standard
input. This means that you will be using a redirector (> when creating,
< when listing or extracting) whenever you work with a cpio
archive file. Again, that file may be an actual file, or it may be your
floppy, or it may be a tape device, since in Unix everything is a file.

This may sound a bit more complicated at first, but a few examples should
convince you that it really isn't.

Let's start by creating a cpio archive. In the last article, I created a
test user account and created a directory structure named www in this
user's home directory so I would have some files on which to practice using the
archiving utilities. I'll log in as the test user, cd into the www directory,
and see what happens if I use the ls command with the cpio utility:

cd www
ls | cpio -ov > backup.cpio

You'll note that I first changed into the directory that contained the
files I wished to archive. I used the ls utility to make a list of the
files in the current directory and used a pipe (|) to send that list to
the cpio utility. The o switch invokes what is known as "copy-out mode,"
which tells cpio to create an archive. The v switch tells cpio to be
verbose, meaning it will list each file as it archives it. Finally, I
used the > redirector to write the results (the archive) to a file called
backup.cpio. I can call this file anything I like; I chose to give it a
cpio extension to remind me that it is a cpio backup file. I can verify the
file type using the file utility:

file backup.cpio
backup.cpio: cpio archive

Instead of using the redirector, I could have also used the F switch to
specify which file to write the archive to. So the following command will
achieve the same results:

ls | cpio -ov -F backup.cpio

Once the archive was created, cpio told me how many blocks it wrote to
the archive; in my case, it was 48 blocks.

So to create an archive, use the o switch or copy-out mode. To either view
or extract the contents of the archive, use what is known as "copy-in mode." You invoke this mode by using the i switch. If you just want
to view the contents of the archive, also include the t switch, which
will list the contents of the archive without extracting them:

cpio -it < backup.cpio

You'll note that this time I used the other redirector (<), as I wanted
the contents of the backup.cpio file to be sent to the cpio utility. I
can also include the v switch, if I want to see a verbose listing of
the backup:

cpio -itv < backup.cpio

Remember that it is important to view the contents of an archive before
attempting to restore it, as you want to ensure that the files don't begin
with a /.

To restore this archive, I simply cd into the directory to which I'd like to
restore the archive, and repeat the above command without the t
switch. I'll cd back into my home directory, create a directory named
backup, and do the restore there:

cd
mkdir backup
cd backup
cpio -iv < ~/www/backup.cpio

You'll note something interesting if you try this exercise yourself; if
you use the ls -F command, you'll see that you did indeed restore all of
the files and directories that were in the www directory. But if you cd
into any of those subdirectories, you'll note that they are empty. Even
more interestingly, if you try to remove any of those subdirectories with
rm, you still have to use the R switch, as they are still valid directories.

What happened here? Since the cpio utility received its file list from the
ls utility (and ls, as used here, lists only the names in the current
directory), cpio was unaware of all of the files that existed below the
current directory. Remember, cpio will only archive the files that are
sent to it in a list. This may seem odd at first, but it is an ideal way to
archive just the files in the current directory. In order to do this with
the tar utility, you would have to create an exclude file, as tar wants
to recursively copy everything in and below the current directory.

This doesn't mean that cpio can't archive recursively; it simply means
that if you want to just archive the current directory, you use ls and if
you want to archive recursively, you use find instead.

Let's try that backup and restore again, this time using the find
utility. First, I'll remove the old backup and empty out the backup
directory:

cd
rm www/backup.cpio
rm -R backup/*

Then I'll cd into the directory I wish to back up (www) and archive its contents:

cd www
find -d . -print | cpio -ov > backup.cpio

When using the find utility with cpio, it is always a good idea to
include either the -d or the -depth switch. Remember from the find
article that this switch lists the contents of a directory before the
directory itself, which prevents directory permissions from interfering
with a backup. When using this switch, either put -d right after the word
find and before the directory to search (in this case, "."), or put
-depth after the directory to search, like so:

find . -depth -print | cpio -ov > backup.cpio

So as a recap on the find command, I told find to search the current
directory (".") and to "print" its contents; the | was used to send those
contents to the cpio utility, which created an archive (-o) and wrote
that archive to a file called backup.cpio. When I created this archive,
I noted that cpio wrote 43097 blocks, which is many more than the 48 I
received with the ls command.