Zip Anywhere

Downloading and browsing a lot of stuff from the Internet is one part; storing
it and retrieving relevant information from it is another. Retrieval is tough
and needs more thought, but storage is simple and hence should be automated. The
usual model for storing downloads is to keep a separate folder for each topic,
all inside one 'downloads' folder. (I usually prefix the folder name with an
underscore '_' so that it appears first when sorted.)

Once, while trying to burn a CD of some 600 MB of downloads (web pages and
software), I was faced with an interesting problem. CD formats are quirky and
sometimes incompatible. Come to think of it, that is natural: CDs usually
transcend OS borders. We may well copy files from MS Windows and then read
them on Linux (assuming they are all documents in HTML or PDF format). If so,
what should the format of the CD be? For this portability reason, CDs have
their own file system, standardized by ISO as ISO 9660. This standard restricts
how deeply directories can be nested (eight levels) and the maximum length of a
file name (8.3 names at the strictest level, about 31 characters otherwise). Of
course there are MS Windows friendly extensions like 'Joliet', but these too
have restrictions.

The first time I burned a CD, I was oblivious to these nuances. So when I went
ahead with burning my 600 MB of downloads, I faced a harangue of insults from
my CD burning software. Being in a hurry, I looked around for a tool that could
rename all the long file names to short, meaningful ones. Obviously this was
a doomed search; moreover, how was I to handle my deep folder structure? I
looked for a solution anxiously, as I had to leave that place shortly. Also,
being conservative, I wanted to comply with the basic CD rules so that the disc
would be readable on most machines.

Then it struck me: why not zip each folder before burning the CD?!
This was indeed a good solution, as it sidesteps both my long file name problem
and the deep folder problem. In fact, from then on it has become my constant
practice to zip before archiving to a CD. Of course, it is prudent to keep the
zips small, so make a lot of root level folders instead of sub folders (I
usually prefix the flattened folders with the grouping name, as in the picture).

But this means I need a flexible zipping solution within easy reach. I tried
the command line tools of WinZip, but they lacked a way to dynamically specify
all the folders inside a folder. I tried Apache Ant, but that too did not have
a handy way to iterate over the sub folders inside a folder. In fact, if these
tools provided a way to iterate over the files in all the sub folders, we would
have a potent tool (maybe this should be the next goal). Anyway, not having a
canned tool for iterating over the file system, I was forced to resort to
programming. In Java, this meant I needed a way to zip programmatically.

Java Zip API

The java.util package has two sub packages for this purpose:
java.util.zip and java.util.jar. For those who don't
know, Jar files are just plain zip files with some optional extra meaning for a
special folder (META-INF) and a few special files (typically
MANIFEST.MF). The javadoc says,

JAR file is a file format based on the popular ZIP file format and is used
for aggregating many files into one. A JAR file is essentially a zip file
that contains an optional META-INF directory... In many cases, JAR files are
not just simple archives of java classes files and/or resources...
The META-INF directory, if it exists, is used to store package and extension
configuration data, including security, versioning, extension and services.
J2SE 1.4.1 - JAR File Specification

So we just need to concentrate on the zipping, and the other things are easy.
But the jar package does have some cute methods to associate a compressed file
with an entry in the manifest file. So zipping is not everything in the Java
world! Anyway, for our archiving example above, zipping and file system
recursion are more than enough.

Recursion over sub folders

Recursively iterating over the file system is a pretty simple and useful task.
This is possible because Java treats both a file and a folder as a
java.io.File. The idea is just to check whether the given File
object isDirectory(), and if so, loop over the contents of that
directory/folder and recurse into each of them in turn.
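
The idea above can be sketched roughly as follows. This is only a minimal
illustration, not the code from the tool: the class name FolderWalker and its
method names are made up for this example, and it uses present-day Java syntax
(generics, for-each) rather than what was available in J2SE 1.4.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named only for this illustration.
class FolderWalker {

    // Collects every regular file found under 'start', depth first.
    static List<File> listFiles(File start) {
        List<File> found = new ArrayList<>();
        walk(start, found);
        return found;
    }

    private static void walk(File current, List<File> found) {
        if (current.isDirectory()) {
            File[] children = current.listFiles();
            // listFiles() returns null when the folder cannot be read.
            if (children == null) {
                return;
            }
            for (File child : children) {
                walk(child, found); // recurse into each child in turn
            }
        } else if (current.isFile()) {
            found.add(current);
        }
    }
}
```

Calling FolderWalker.listFiles(new File("downloads")) would then return every
file below that folder, however deeply nested. Folders themselves are never
added to the result; they only steer the recursion.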

The same recursion logic can be effectively used for zipping a given folder or
a file. If it is a folder, nothing is written for the folder itself; you only
recurse into its contents. If it is a file, you write it to the
java.util.zip.ZipOutputStream.

Zipping a file

Zipping a file involves two steps. One, put a ZipEntry, which is just the
destination file path to be recorded inside the zip file (you can play tricks
here!). And two, write the file's content to the ZipOutputStream, which also
compresses it.

    // Close the outermost stream.
    // It takes care of the inner ones.
    zos.close();
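
Putting the folder recursion and the two zipping steps together, a folder
could be zipped along these lines. Treat this as a sketch under assumptions
rather than the tool's actual code: FolderZipper, zip and add are names
invented here, the buffer size is arbitrary, and try-with-resources (a later
Java feature) stands in for the explicit zos.close() call.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, named only for this illustration.
class FolderZipper {

    // Zips 'source' (a file or a folder) into 'zipFile'.
    // Assumes zipFile does not live inside source.
    static void zip(File source, File zipFile) throws IOException {
        // Closing the outermost stream also closes the FileOutputStream inside it.
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipFile))) {
            add(source, source.getName(), zos);
        }
    }

    private static void add(File current, String entryName, ZipOutputStream zos)
            throws IOException {
        if (current.isDirectory()) {
            File[] children = current.listFiles();
            if (children == null) {
                return; // unreadable folder
            }
            for (File child : children) {
                // Entry names inside a zip use '/' as the separator on every OS.
                add(child, entryName + "/" + child.getName(), zos);
            }
            return; // nothing is written for the folder itself
        }
        // Step one: put the entry, i.e. the path recorded inside the zip.
        zos.putNextEntry(new ZipEntry(entryName));
        // Step two: write the content; the stream compresses it on the way.
        try (InputStream in = new FileInputStream(current)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                zos.write(buffer, 0, n);
            }
        }
        zos.closeEntry();
    }
}
```

Because the entry name is built by hand, this is also the place to "play
tricks", for example dropping the top folder name or adding a prefix. Note that
in this sketch an empty folder leaves no trace in the archive. To get one zip
per topic, one could loop over the children of the 'downloads' folder and call
FolderZipper.zip for each of them, writing the zips somewhere outside it.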

UnZipping a file

Unzipping is just the reverse of these two steps. One, get the
ZipEntry from the zip file.
And two, read the content from the zip file and write it to the file system.
This reading also decompresses it. Of course, writing the file content based
on its zip entry needs some File manipulation. Also, since
all of this is based on InputStream and OutputStream,
you can read the compressed stream from anywhere and write it anywhere.
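
The reverse direction might look like the sketch below. Again, this is an
illustration under assumptions, not the tool's code: FolderUnzipper and unzip
are invented names, and the check against entries that point outside the
target folder is an extra precaution added here, not something the article
describes.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, named only for this illustration.
class FolderUnzipper {

    // Extracts every entry of 'zipFile' below 'targetDir'.
    static void unzip(File zipFile, File targetDir) throws IOException {
        String targetRoot = targetDir.getCanonicalPath() + File.separator;
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFile))) {
            ZipEntry entry;
            // Step one: get the next ZipEntry from the zip file.
            while ((entry = zis.getNextEntry()) != null) {
                File out = new File(targetDir, entry.getName());
                // Refuse entries such as "../x" that would land outside targetDir.
                if (!out.getCanonicalPath().startsWith(targetRoot)) {
                    throw new IOException("Entry escapes target folder: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    out.mkdirs();
                    continue;
                }
                // Not every zip lists its folders, so create parents on demand.
                File parent = out.getParentFile();
                if (parent != null) {
                    parent.mkdirs();
                }
                // Step two: read (and thereby decompress) the content and write it out.
                try (OutputStream os = new FileOutputStream(out)) {
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = zis.read(buffer)) != -1) {
                        os.write(buffer, 0, n);
                    }
                }
            }
        }
    }
}
```

The "File manipulation" mentioned above is mostly the mapping from the entry
name to a path below the target folder and the creation of missing parent
folders. Since ZipInputStream wraps any InputStream, the FileInputStream here
could just as well be a network or in-memory stream.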

For a long time I had been putting off writing this tool. But when Venu
asked for a neat task that could be useful, I gave him this one. Though Venu is
a newcomer to Java (hardly two months now), he is strong in C++ and completed
this tool within a few days, working just in his spare time. Thanks to him, this
article and the zip tool
(Tattvum-20030727-ZipUnZip.zip)
are before you.

To test this tool, run it in a test folder which contains one or more non-empty
folders, and then use it for your regular archiving. Please note that this
is a very early release, but it works fine with the 300 MB+ of downloads!
The known bugs are that it doesn't handle empty folders (it stops), and that
unzip needs some tweaking to work with zips created by WinZip. Comments and bug
reports are welcome.