CLI Magic: Antiword

It's a simple fact of life. Most people have not yet found their
way to freedom, and instead try to make themselves as comfortable
as they can be in captivity. They suffer the burdens of the
monopoly with only an occasional wince at the thought of malware or
a curse because of their latest crash. This almost always means
their word processing is done with MS Word. We've got to learn to
deal with them, and their documents. Antiword is a very handy
little tool for doing just that. Shake the cobwebs of the GUI out
of your head and meet me at the CLI and I'll show you how to use
it.

Antiword is a GPLd,
multiplatform labor of love by Adri van Os, with assistance from
others in porting and making it available on RISC OS, FreeBSD,
BeOS, OS/2, Mac OS X, Amiga, VMS, NetWare, Plan 9, EPOC, Zaurus,
DOS, and Windows. Those in addition to Linux/Unix, of course.
Antiword is still in beta, but I found it completely usable as it
is. The current release on Debian is 0.35.1.

Building and installing Antiword from developer source is not
difficult, but it is a little different than normal. If you can't
find the right binary for your distribution, download the latest
source tarball from the site. Version 0.36.1 is the release I
grabbed. After decompressing the tarball, enter the subdirectory
created by tar and type make, then make install
as a normal user. There are many platform-specific
Makefiles included in the tarball, but the default in our download
is for Linux. By the way, the make install process
creates a bin directory in your home directory, and
puts the executable there.
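Here is a rough sketch of those build steps as a shell session. The tarball name is my assumption, based on the version number; check what you actually downloaded. The on_path helper is a made-up convenience that tells you whether the bin directory in your home directory is already on your PATH.

```shell
# Hypothetical tarball name, inferred from the version mentioned above.
TARBALL=antiword-0.36.1.tar.gz

# Report whether a directory ($1) is listed in a PATH-style string ($2).
on_path() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Only attempt the build if the tarball has actually been downloaded.
if [ -f "$TARBALL" ]; then
    tar xzf "$TARBALL"
    # Assumes tar unpacks into a directory named after the tarball.
    cd "${TARBALL%.tar.gz}" || exit 1
    make
    make install    # run as a normal user; the executable lands in $HOME/bin
    cd ..
fi

# Remind the user to extend PATH if $HOME/bin is not on it yet.
if ! on_path "$HOME/bin" "$PATH"; then
    echo "Add $HOME/bin to your PATH to run antiword from anywhere."
fi
```

If the reminder appears, adding a line such as PATH="$HOME/bin:$PATH" to your shell startup file is the usual fix.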

No man pages are included with the developer release, but simply
entering antiword without any arguments produces a
handy little cheat sheet explaining how to use it.

The cheat sheet shows various options for the conversion. We
can create a straight text file, PDF, PostScript, or XML. That's a
pretty impressive range of options for a beta. But how well does it
work? That's the real question. Let's give it a whirl with some
real-world documents.

I downloaded an MS Word 6 document from the Oracle web site. The
first test was to convert to plain text, like this:

antiword -t Linux_DB.doc > LDB.txt

Paging through the resulting text document, I noticed that the
graphics were missing, but other than that, the text was well
formatted and perfectly legible. Then I tried the PDF and
PostScript options (using antiword -a letter Linux_DB.doc > LDB.pdf
and antiword -p letter Linux_DB.doc > LDB.ps, respectively).
Again, the images were missing, but other than that, the
conversions seemed to have worked perfectly.
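If you have a pile of Word files to get through, the three conversions above can be wrapped in a small script. This is only a sketch: the function names out_name and convert_doc are my own invention, the letter paper size is carried over from my tests, and the script assumes every input file name ends in .doc.

```shell
# Derive an output name: out_name Linux_DB.doc pdf -> Linux_DB.pdf
out_name() {
    printf '%s.%s\n' "${1%.doc}" "$2"
}

# Convert one Word file to text, PDF, and PostScript on letter-size paper.
convert_doc() {
    antiword -t "$1"        > "$(out_name "$1" txt)"
    antiword -a letter "$1" > "$(out_name "$1" pdf)"
    antiword -p letter "$1" > "$(out_name "$1" ps)"
}

# Convert every file named on the command line, if antiword is installed.
if command -v antiword >/dev/null 2>&1; then
    for doc in "$@"; do
        convert_doc "$doc"
    done
fi
```

Saved under a name of your choosing and run with a list of .doc files as arguments, it leaves a .txt, .pdf, and .ps file next to each original.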

I decided to tinker with the -i option for PostScript
conversions, and sure enough, using -i2 faithfully
reproduced the images from the original as well as the text. You
can see a screenshot of the PostScript data viewed with GhostView
alongside.

Other attempts on other MS Word documents did not always result
in the images being included in the conversion. Possibly they were
created with earlier versions of MS Word, as the image feature is
only supposed to work on documents created by MS Word 6 and
later.

One last option to mention. The -s argument tells
Antiword to show any comments hidden by MS Word in the original
document. There have been a number of embarrassing
slips by various firms who have found out too late that these
"hidden" comments can be brought back to visibility by people who
were never intended to see them.
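One way to see exactly what -s brings back is to dump the document twice, once without the option and once with it, and compare the two. The sketch below does that with diff. The helper name hidden_only and the two output file names are made up for illustration, and I have not verified how the hidden material is laid out in every kind of document, so treat the result as a starting point.

```shell
# Print lines that appear only in the second dump (the one made with -s).
# Arguments: text dump without -s, text dump with -s.
hidden_only() {
    diff "$1" "$2" | sed -n 's/^> //p'
}

# Dump a document twice and report what -s revealed.
if command -v antiword >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    antiword -t "$1"    > visible.txt
    antiword -t -s "$1" > with_hidden.txt
    hidden_only visible.txt with_hidden.txt
fi
```

An empty result means the two dumps were identical, so the document had nothing tucked away that Antiword could find.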

Antiword is a valuable tool when you want to see, or to print,
an MS Word document quickly, without waiting for a huge word
processing app to load itself into memory. It's not quite soup yet in
some ways, but I'm going to keep an eye on it. When it can handle PDFs
and images without a hitch, it's good to go as far as I'm
concerned.