wvWare

The wv library itself

Download

The general overview of wv can be found on the home page. This section goes into some
more detail about the file format. Word documents from version 6 upwards are stored in an OLE2
file format wrapper, with the internal Word format enclosed inside it. Each main format has two
subtypes of Word file format, fast save and full save, otherwise known as the complex and simple
formats. The complex format is quite difficult to implement correctly, but I believe that wv has
achieved this. The reasoning behind the creation of something as monstrous as the fast-save
file format by Microsoft is unknown to me, but it is very awkward.
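
As a rough illustration of the layering described above, the following Python sketch checks for the OLE2 wrapper and then for the complex (fast-save) flag. It is not wv's code. The 8-byte signature is the standard OLE2 compound file header; the flag offset and mask are assumptions taken from Microsoft's published description of the Word binary format, and the function names are hypothetical. Extracting the WordDocument stream from the OLE2 container is not shown.

```python
import struct

# Signature found in the first 8 bytes of an OLE2 compound file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Assumption (from Microsoft's published format description, not from this
# page): the 16-bit flags word at byte offset 10 of the WordDocument stream
# carries the "complex" (fast-save) flag in bit 0x0004.
FIB_FLAGS_OFFSET = 10
FCOMPLEX_MASK = 0x0004


def is_ole2(header: bytes) -> bool:
    """Return True if the given file header starts with the OLE2 signature."""
    return header[:8] == OLE2_MAGIC


def is_fast_saved(word_document_stream: bytes) -> bool:
    """Return True if the already-extracted WordDocument stream is flagged complex."""
    (flags,) = struct.unpack_from("<H", word_document_stream, FIB_FLAGS_OFFSET)
    return bool(flags & FCOMPLEX_MASK)
```

A caller would read the first 8 bytes of the .doc file for is_ole2, and pass the bytes of the inner WordDocument stream to is_fast_saved.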

What do you need

All that is required is the source, but if you want to be able to handle embedded WMF files
(which you do), then you need to have the following installed

You need to have libpng installed. If ImageMagick is not found, wv will attempt to find and use
libpng itself. If ImageMagick is installed, wv will use it instead and hope that it was linked
against libpng. If this turns out to be false, you should reinstall ImageMagick with PNG support,
or failing that install libpng and run wv's configure as ./configure --without-Magick

In general, after you install zlib, libpng and ImageMagick, all you have to do is
./configure
make
The INSTALL file in the distribution has all the building details you need to know.

Charset Conversion

The text of a Word 8 document is often stored in Unicode; wv will convert this
to UTF-8 so that it can be read in Netscape and other modern browsers. In older Word
documents (and under certain conditions in Word 8) the text is stored in one of the Windows codepages.
By default wv will promote this text to Unicode and convert it to UTF-8.
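
The promotion step described above can be sketched in a few lines of Python. This only illustrates the idea and is not wv's internal converter. The function names are hypothetical, and the use of UTF-16LE for the 16-bit Unicode text is an assumption.

```python
def codepage_to_utf8(raw: bytes, codepage: int = 1252) -> bytes:
    """Promote text in a Windows codepage to Unicode, then emit UTF-8."""
    text = raw.decode(f"cp{codepage}")  # codepage bytes -> Unicode
    return text.encode("utf-8")         # Unicode -> UTF-8


def unicode_text_to_utf8(raw: bytes) -> bytes:
    """Convert 16-bit Unicode text (assumed little-endian UTF-16) to UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")
```

For example, the byte 0xE9 in codepage 1252 is "é", which comes out as the two UTF-8 bytes 0xC3 0xA9.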

Some users dislike UTF-8 as an output format and wish to convert to different output
formats such as koi8-r and the standard iso-8859-1.

wv contains an internal charset converter which can promote all Windows codepages to Unicode
and can convert Unicode to:

utf-8

iso-8859-15

koi8-r

tis-620

(read the wvHtml manpage to see if any others have been added)
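
To see what converting Unicode to one of the charsets listed above means in practice, here is a small Python sketch using Python's own codecs. It is an illustration only, not wv's converter. The function name is hypothetical, and replacing unrepresentable characters with "?" is a choice made for this sketch; how wv treats such characters is not stated here.

```python
# The four output charsets listed above, by the names Python's codecs accept.
SUPPORTED_CHARSETS = ("utf-8", "iso-8859-15", "koi8-r", "tis-620")


def convert_unicode(text: str, charset: str) -> bytes:
    """Encode Unicode text in one of the listed charsets.

    Characters the target charset cannot represent become "?".
    """
    if charset not in SUPPORTED_CHARSETS:
        raise ValueError(f"unsupported charset: {charset}")
    return text.encode(charset, errors="replace")
```

The euro sign, for instance, maps to the single byte 0xA4 in iso-8859-15, but has no equivalent in koi8-r and is replaced there.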

wv will always be able to do the above conversions. In addition, if wv finds during configure that
your system has an iconv implementation which can do the above conversions, then wv will be
able to use all the other conversions that your iconv can handle. In practice this only happens if
you have glibc 2.1 or above on your system (Red Hat 6.0 and above). In this scenario you have
a multitude of conversions from Unicode to many other character sets. If you have iconv support,
experiment with the iconv program that is on your system.
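
One way to experiment is to drive the system's iconv program from a short script. The Python sketch below assumes an iconv executable that accepts the usual -f and -t options; the function name is hypothetical. It returns None when iconv is missing or cannot perform the requested conversion.

```python
import shutil
import subprocess
from typing import Optional


def iconv_convert(text: str, to_charset: str) -> Optional[bytes]:
    """Convert text from UTF-8 to to_charset using the system iconv program.

    Returns None if iconv is not installed or the conversion fails.
    """
    exe = shutil.which("iconv")
    if exe is None:
        return None
    try:
        proc = subprocess.run(
            [exe, "-f", "UTF-8", "-t", to_charset],
            input=text.encode("utf-8"),
            capture_output=True,
            check=False,
        )
    except OSError:
        return None
    return proc.stdout if proc.returncode == 0 else None
```

Running iconv -l on the command line lists the character sets your iconv knows about.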