Unix Lab

The tar and jar utilities

Archiving tools

In this module we will look at two tools that are used for archiving
files. Just what is "archiving" anyway? To archive something means to save
it in some kind of long-term storage. In the computer world, this usually
means doing something like making copies of important files and storing them
on a backup device such as tape or a spare disk drive.

We will look at two archiving tools.
The first is a Unix utility called tar that has been
around for a long time. The tar utility is designed to pull together
Unix files and directories and wrap them up in such a way that they can easily be
stored.

The second is a utility called jar that is used primarily to
pull together Java class files and wrap them up in such a way that they can
easily be stored.

Uses for archive utilities

So, what's useful about being able to wrap up files and directories?

Think about wanting to save some of the work you have been developing as
part of a project for the past month. There are multiple files and directories
that you want to back up. One way would be to simply copy the files one
at a time to a backup device such as a spare disk drive. Sometimes the spare
disk is attached to another host computer, so this may require moving
these files across a network.

Using tar allows you to create a single file that contains
everything you want to back up. This single file can be copied to the backup
device as one item with a single command even though the file may contain many
files and directories within it. These special files created by the tar
utility are called tar files.

The original files are not changed in the process: tar copies their contents,
together with each file's name and attributes such as permissions and
modification time, into the archive in a special format. Of course, once you
have created one of these special files, you would like to be able to recover
the original files whenever you want. The tar utility takes care of that too.

A similar utility for Java class files is also available. This utility is called
jar and its uses are similar to the uses of tar. Files that
have been "archived" by the jar utility are called jar files.
jar files have the nice property that the Java Virtual Machine can access
the class files within them without having to "un-archive" the class files. This
makes these files easy to use. In particular, they can be downloaded from a
server as one file (which saves the time that would be spent requesting each
class file separately) and used in Web applications.

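Creating tar files

Assuming you have a directory called project that contains the two files
Patterns.java and PatternMaker.java, cd to the project directory and type:

tar -cf patternBackup.tar Patterns.java PatternMaker.java

You should now see that, in addition to the two java files, the project
directory contains a file called patternBackup.tar.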

Let's analyze what happened.

The tar command takes many options, of which we will look at only a few. The
c option tells tar that you want to create a tar file, and the
f option names the tar file that you are about to create (we called it
patternBackup.tar). The option letters are simply clumped together after a
single dash, with no spaces between them. Because the archive name follows f,
f should be the last letter in the clump.
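A quick way to see how clumped options behave is to try a few spellings side by side. The sketch below is an illustration only: it assumes a POSIX shell with GNU or BSD tar, works in a scratch directory made by mktemp, and uses placeholder files (hypothetical contents) in place of the real Patterns.java and PatternMaker.java. The last command shows what goes wrong when f is not last in the clump.

```shell
#!/bin/sh
# Work in a throwaway directory so real files cannot be clobbered.
cd "$(mktemp -d)" || exit 1

# Placeholder stand-ins for the lab's Java files (contents are hypothetical).
echo 'class Patterns {}' > Patterns.java
echo 'class PatternMaker {}' > PatternMaker.java

# Clumped options after a single dash...
tar -cf clumped.tar Patterns.java PatternMaker.java

# ...mean the same as separate options, each with its own dash.
tar -c -f separate.tar Patterns.java PatternMaker.java

# Mistake: f is not last, so it takes "v" as the archive name.
# tar writes an archive called v and complains (silenced here) that it
# cannot find a file named patternBackup.tar to add.
tar -cfv patternBackup.tar Patterns.java PatternMaker.java 2>/dev/null

ls
```

Listing the directory afterwards shows a stray archive named v and no patternBackup.tar, which is the usual symptom of this mistake.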

Following the options, we list the files that we want to wrap up into this
tar file; we chose Patterns.java and PatternMaker.java.

You can also use wildcards in the file names, as in:

tar -cf patternBackup.tar *.java
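It is the shell, not tar, that expands the wildcard: tar only ever receives the list of matching file names. The sketch below makes this visible by putting echo in front of the command. It assumes a scratch directory from mktemp and placeholder files with hypothetical contents; notes.txt is added only to show that it is not matched.

```shell
#!/bin/sh
# Work in a throwaway directory.
cd "$(mktemp -d)" || exit 1

# Placeholder files (contents are hypothetical).
echo 'class Patterns {}' > Patterns.java
echo 'class PatternMaker {}' > PatternMaker.java
echo 'scratch notes' > notes.txt

# echo prints the command line after the shell has expanded *.java,
# which is exactly what tar would receive.
echo tar -cf patternBackup.tar *.java
# prints: tar -cf patternBackup.tar PatternMaker.java Patterns.java

# Now run it for real; notes.txt is not matched, so it is not archived.
tar -cf patternBackup.tar *.java
```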

Finally, there is another option that you can use to see what tar is doing
while it's doing it: the v option. For example, adding it to the previous
command gives:

tar -cvf patternBackup.tar *.java

What you will see is:

PatternMaker.java
Patterns.java

This shows you the names of the files that are being placed in the tar file.

Extracting tar files

Keeping the tar file in the project directory, delete the two
java files. We will now see how to extract these two files
from the tar file.

Now type:

tar -xf patternBackup.tar Patterns.java

The x option tells tar to extract files from a
tar file. The f option is as before; it names the tar
file. The last parameter is the name of the file we want to extract from the
tar file.
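The whole cycle of archiving, losing the originals, and pulling back a single file can be tried safely in a scratch directory. This is a sketch under stated assumptions: a POSIX shell with GNU or BSD tar, and placeholder files whose contents are hypothetical. The member name given after the archive must match the name exactly as it was stored.

```shell
#!/bin/sh
# Work in a throwaway directory.
cd "$(mktemp -d)" || exit 1

# Build the archive from placeholder files (contents are hypothetical).
echo 'class Patterns {}' > Patterns.java
echo 'class PatternMaker {}' > PatternMaker.java
tar -cf patternBackup.tar Patterns.java PatternMaker.java

# Simulate losing the originals; the archive itself is unaffected.
rm Patterns.java PatternMaker.java

# x = extract, f = archive name; only the named member is restored.
tar -xf patternBackup.tar Patterns.java

ls
```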

If you issue the ls command, you will see that you now have the tar
file and the Patterns.java file in the project directory.

Delete the java file again and type:

tar -xf patternBackup.tar

Now use ls to see what you have. What does tar do when you don't supply
a list of files to extract?

The v option can (and should) be used with the x option as well. Go back
and delete the java files, and use the following command to recover them:

tar -xvf patternBackup.tar

tar and working with directories of files

The tar utility has the ability to store entire directory trees of files.
In this section we will see how that works.

Start with just the Patterns.java and the PatternMaker.java
files in the project directory (delete the tar file from the
exercises above). Now cd to the directory in which the project directory is
contained. If you created the project directory from your home directory, then
you should be in your home directory.

Now type:

tar -cvf project.tar project

As you know by now, the c option is to create a new tar file.
The v option lets you see what tar is doing as it does it. The
f option names the tar file being created. The last parameter, project,
names a directory to be archived.

When tar sees that you want to archive a directory, it will archive every file
and every subdirectory in that directory.

What you should see when you type the previous command is:

project/
project/PatternMaker.java
project/Patterns.java

This shows you that tar archived the project directory including the
two files within it.

Be sure you have the tar file in the same directory as the project directory.
If you use the ls -F command, you should see project.tar listed next to
project/ (ls -F marks directories with a trailing slash).

Delete the entire contents of the project directory and rmdir the project
directory itself.

Now type:

tar -xvf project.tar

Now issue an ls command. What do you see?
Use the cd command to visit the project subdirectory. What do
you see?

What you should see is that the subdirectory and its contents have been completely
restored.
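The same round trip for a directory can be scripted in one go. This sketch assumes a POSIX shell with GNU or BSD tar; a scratch directory from mktemp plays the role of your home directory, and the two Java files are placeholders with hypothetical contents. rm -r stands in for the manual rm and rmdir steps above.

```shell
#!/bin/sh
# A throwaway directory standing in for your home directory.
cd "$(mktemp -d)" || exit 1

mkdir project
echo 'class Patterns {}' > project/Patterns.java
echo 'class PatternMaker {}' > project/PatternMaker.java

# Archive the directory itself; tar descends into it automatically.
tar -cvf project.tar project

# Remove the directory and everything in it.
rm -r project

# Recreate project/ and its contents from the archive.
tar -xvf project.tar
ls -F project
```

The order in which tar lists the member files can vary between systems, so don't rely on it.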

See if tar will archive directories within directories. For example, create
a subdirectory in the project directory (call it data) and place
a simple text file in the data subdirectory.

Now go back to the parent directory of project and delete the tar
file that you created earlier. Now issue the appropriate command to archive the
project subdirectory.

Delete the project directory contents starting with the data
subdirectory and moving upwards.

Use the tar command to extract the project directory and check
the contents of the project subdirectory. What happened?

Checking the contents of tar files

Often, software for Unix systems is made available in the form of a tar file.
Before you extract the contents of the tar file, you should see what's contained
within it. One of the options to the tar command lets you do this.

Pick the tar file from the previous exercise (project.tar) and issue
the following command:

tar -tf project.tar

What you should see is a table of contents of the tar file.
You can always view the contents of the tar file before you extract any
files. You may decide to extract the entire contents or just a few files.
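Listing first and then extracting only what you need might look like the sketch below. It assumes a POSIX shell with GNU or BSD tar and builds a small placeholder project.tar (hypothetical file contents) so that it runs on its own. Note that the member names include the leading project/ directory, and tar recreates that directory when it extracts the file.

```shell
#!/bin/sh
# Work in a throwaway directory.
cd "$(mktemp -d)" || exit 1

# Build a placeholder project.tar, then remove the originals.
mkdir project
echo 'class Patterns {}' > project/Patterns.java
echo 'class PatternMaker {}' > project/PatternMaker.java
tar -cf project.tar project
rm -r project

# t = table of contents: list the members without extracting anything.
tar -tf project.tar

# Having seen the member names, extract just one of them.
tar -xf project.tar project/Patterns.java
```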

The jar utility

The jar utility is similar to the tar utility except that it
is used for Java class files. Actually, its primary purpose has been
to pull together an applet and all its support files (including image files, for
example) into a single archive file that can be downloaded all at once by a
requesting Web client. Not only does the jar utility pull these files together,
but it also compresses them so that there is less data
to download.

Creating jar files

Assuming you have created a project directory as in the material
on the tar utility, go to the project directory and make sure it
contains only the two files Patterns.java and PatternMaker.java.

The first thing we will do is compile the Java files to generate the
corresponding class files:

javac PatternMaker.java Patterns.java

You should now see that the two class files are present in the
project directory.

Now type:

jar -cvf Pattern.jar PatternMaker.class Patterns.class

The options for the jar command have the same meaning as in the tar
command. In particular, we are creating (c option) and viewing the process as
jar carries it out (the v option) and naming the archive file (the
f option) as Pattern.jar.

The first line of the verbose output, added manifest, refers to a special
file (the manifest) that the jar utility places in the jar file; it contains
information about the contents of the jar file. We won't worry about it in
this tutorial.

The remaining lines report each of the files we named in the command as the
jar utility compresses it and adds it to the archive. For each file, we are
told its original size (in bytes), its compressed size, and the percentage by
which compression reduced it.

Extracting files from jar archive files

Now let's see how to recover the files from the compressed archive. If you
understood how to do this with the tar utility, you can do it with the jar
utility.

Start by deleting the .class files in the project directory. Then type:

jar -xvf Pattern.jar

If you use the ls command, you will see that the class files are back,
but so is a new directory called META-INF. This is where information about
the files would be placed if we had used this feature. In our case, the
directory contains a single file, MANIFEST.MF, that holds only a couple of
lines of information. You can ignore this file and the
directory in what you will be doing in this class.

Try using the t option to view the table of contents of the jar file.

Using jar with directories of files

Just as with the tar utility, with the jar utility we can pull
together whole directories of files into a single jar file.

Start with the project directory (as above) with just the two
.class files within it.

Now cd to the parent directory of the project directory and
type:

jar -cf project.jar project

Use the -t option of the jar command to see the contents of
project.jar. What do you find?

Now delete everything in the project subdirectory and delete the directory
itself (keep the project.jar file). See if you can reconstruct the
project subdirectory again. What's the command?

One of the features of jar files is the ability to use them as containers
of Java packages. A Java package is a group of logically related classes.
Think of packages as libraries of classes that you can download and use in your
Java applications.

In fact, in the module on compiling, we used a jar file
containing a Java package called archipelago (see
the section on importing
packages).

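The important thing is to make these jar files visible to the Java compiler
and the Java Virtual Machine. This is handled with the CLASSPATH environment
variable, which lists the directories and jar files to search for classes.
A jar file is listed by its own path, not just by the directory that contains it.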