Writing Man Pages in HTML

HTML is a cool way to look at the Linux man pages. Here's how to do it.

The Web invasion of the Internet
continues at a breakneck pace. In the space of just a few years,
HTML has become one of the most widely supported document formats.
Currently, there's a gold-fever-driven effort by a cast of thousands
to reformat older sources of information for presentation on the
Web. The required technology is relatively simple to understand
and inexpensive to assemble. Linux is an ideal development and
delivery environment: it is low cost and reliable, and the necessary
software is included in several competing Linux distributions.

In this article I'll discuss a solution I've written for
serving up old documents in the new medium: automated translation.
The documents to be converted are the Unix man pages. Man pages are
highly structured documents in which major headings and references
to other documents are easily recognized. The files making up the
man page system are already organized into a rigid set of
hierarchies using a formalized naming system, so it is easy to
deliver the entire existing man system and document format via the
Web without reorganizing its content or overall structure. I've
assembled some pieces of software that translate the old Unix
manual format into HTML while preserving the old style and
organization. Web technologies also make further enhancements
possible: documents can be cross-linked, alternate forms of
index can be generated automatically, and full-text searches can
be supported.
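To give a feel for what automated translation involves, here is a minimal Python sketch that converts a handful of common -man macros (.SH, .B, .I and the paragraph macros) into HTML. It is not Verhoeven's man2html, which handles far more of the troff language. The macro subset, the function name man_to_html, and the choice to silently drop unrecognized requests are assumptions made for illustration.

```python
import html

# Hypothetical sketch: single-line font macros mapped to HTML tags.
FONT_MACROS = {".B": "B", ".I": "I"}
PARAGRAPH_MACROS = {".PP", ".P", ".LP"}


def man_to_html(source: str) -> str:
    """Translate a tiny subset of -man macros into HTML (illustrative only)."""
    out = []
    for line in source.splitlines():
        macro, _, arg = line.partition(" ")
        # Strip surrounding quotes, as in .SH "SEE ALSO", then escape <, >, &.
        arg = html.escape(arg.strip().strip('"'), quote=False)
        if macro == ".SH":
            out.append(f"<H2>{arg}</H2>")                # section heading
        elif macro in FONT_MACROS:
            tag = FONT_MACROS[macro]
            out.append(f"<{tag}>{arg}</{tag}>")          # bold or italic text
        elif macro in PARAGRAPH_MACROS:
            out.append("<P>")                            # new paragraph
        elif line.startswith((".", "'")):
            continue                                     # other requests: ignored here
        else:
            out.append(html.escape(line, quote=False))   # ordinary text line
    return "\n".join(out)
```

A real converter must also deal with inline escapes such as \fB, macros that apply to the following line, tables, and much more; the sketch only shows the basic line-by-line mapping from macro to tag.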

I've named the package vh-man2html. It is
designed to be activated as a set of CGI (Common Gateway Interface)
scripts from a web server, using the same technology that drives HTML
forms. This means that vh-man2html can be used to serve man pages
to other hosts on your LAN or on the Internet. The principal
component of vh-man2html is Richard Verhoeven's man2html
translator, augmented by several scripts that generate indexes and
facilitate searches. vh-man2html also has some supporting scripts
that allow you to drive Netscape from the Unix command line, so
that vh-man2html can actually replace “man” on systems that
include Netscape.
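The CGI side can be pictured with the minimal sketch below. The web server runs the script once per request and passes whatever follows the “?” in the URL through the QUERY_STRING environment variable. The script then writes its header lines, a blank line, and the HTML body to standard output. The query format (?ls or ?1+ls), the function name cgi_response, and the placeholder page bodies are assumptions for illustration, not vh-man2html's actual interface.

```python
import html
import os
import re
import sys
from urllib.parse import unquote_plus

# Hypothetical query format: "name" or "section+name", e.g. ?ls or ?1+ls.
# Only plain page names and section numbers are accepted; anything with a
# slash, angle bracket, or other unexpected character is rejected.
SAFE_WORD = re.compile(r"[A-Za-z0-9_.+:-]+")


def cgi_response(query_string: str) -> str:
    """Return a complete CGI response (header lines, blank line, body)."""
    words = unquote_plus(query_string).split()
    if not words:
        # No query: a real installation would send the main search page here.
        body = "<H1>Manual Pages</H1>"
    elif all(SAFE_WORD.fullmatch(w) for w in words):
        # A real installation would locate the page and run the converter here.
        body = f"<H1>{html.escape(' '.join(words))}</H1>"
    else:
        return "Status: 400 Bad Request\nContent-type: text/plain\n\nbad query\n"
    return "Content-type: text/html\n\n" + body + "\n"


if __name__ == "__main__":
    # The web server passes the part of the URL after "?" in QUERY_STRING.
    sys.stdout.write(cgi_response(os.environ.get("QUERY_STRING", "")))
```

Rejecting unexpected characters before a query goes anywhere near the file system is the important design point. Any script reachable from the Internet must assume its input is hostile.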

What Does It Look Like?

If you have access to the Web, you can see vh-man2html in
action at:

http://www.caldera.com/cgi-bin/man2html

where the man pages for Caldera 1.0 are available on-line.

Figure 1: Main vh-man2html Web Page

Figure 2: Man Page Generated by vh-man2html

Figure 3: Portion of Name-Description Index

Figure 4: Portion of Name-Only Index

Figure 5: Full-Text Search Result

Figure 1 shows the main vh-man2html web page, which provides
direct access to individual man pages or to three different kinds
of index: name-description, name-only and full-text search. In
the main window you can enter a man page name, or a name and
section number, or narrow things down even further by specifying
the hierarchy or the full path name.
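The lookup behind that form can be sketched as follows. The sketch assumes the traditional layout, in which each hierarchy (such as /usr/man) holds subdirectories man1 through man9 and the section 1 page for ls lives in man1/ls.1. The default hierarchy list and the function find_man_pages are hypothetical. Supplying a section narrows the search to one subdirectory, passing a single hierarchy narrows it to that tree, and an absolute path skips the search altogether.

```python
from pathlib import Path

# Assumed default hierarchies, searched in order; a real system would take
# these from MANPATH or its man configuration file.
DEFAULT_HIERARCHIES = ("/usr/man", "/usr/local/man", "/usr/X11R6/man")


def find_man_pages(name, section=None, hierarchies=DEFAULT_HIERARCHIES):
    """Return paths of pages matching a name, name+section, or full path."""
    if name.startswith("/"):
        # A full path name bypasses the search entirely.
        path = Path(name)
        return [path] if path.is_file() else []
    matches = []
    for root in map(Path, hierarchies):
        if not root.is_dir():
            continue
        if section:
            dirs = [root / f"man{section}"]          # e.g. /usr/man/man1
        else:
            dirs = [d for d in sorted(root.iterdir())
                    if d.is_dir() and d.name.startswith("man")]
        for d in dirs:
            if not d.is_dir():
                continue
            # Pages are named <name>.<section>, possibly with a suffix like .gz.
            matches.extend(p for p in sorted(d.iterdir())
                           if p.is_file() and p.name.startswith(name + "."))
    return matches
```

Matching on name + "." keeps a request for ls from also returning lsattr.1, while still accepting compressed pages such as ls.1.gz.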

Figure 2 shows a man page generated by the man2html
converter. The converter has translated the man formatting macros to
approximately equivalent HTML tags. It has also turned every
reference to another man page into a hypertext link. The
converter also generates an index of the page's section headings,
which is useful when reading larger man pages. Text highlighting and
font changes are correctly translated, and any tables present are
translated into HTML tables. At this time, man2html doesn't
translate equations described with eqn, but since very few man pages
use eqn, this is not a major drawback.
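Both features are easy to picture with regular expressions. In the sketch below, link_references turns any name(section) reference into a link back to the CGI script, and add_heading_index anchors each section heading and prepends a list of links to them. The URL scheme, the sectN anchor names, and the assumption that references appear as plain text rather than wrapped in font tags are illustrative choices. They are not necessarily how man2html itself works.

```python
import itertools
import re
from urllib.parse import quote

# Hypothetical URL scheme for generated links; vh-man2html's may differ.
CGI_URL = "/cgi-bin/man2html"

# A cross-reference looks like name(section): ls(1), fopen(3), g++(1).
XREF = re.compile(r"\b([A-Za-z_][\w.+-]*)\(([1-9n][a-z]*)\)")
HEADING = re.compile(r"<H2>(.*?)</H2>")


def link_references(text: str) -> str:
    """Wrap every name(section) reference in a link back to the CGI script."""
    return XREF.sub(
        lambda m: f'<A HREF="{CGI_URL}?{m.group(2)}+{quote(m.group(1))}">'
                  f"{m.group(0)}</A>",
        text,
    )


def add_heading_index(page: str) -> str:
    """Anchor each <H2> heading and prepend a list of links to them."""
    headings = HEADING.findall(page)
    counter = itertools.count()
    page = HEADING.sub(
        lambda m: f'<H2><A NAME="sect{next(counter)}">{m.group(1)}</A></H2>',
        page,
    )
    items = "".join(f'<LI><A HREF="#sect{i}">{h}</A>\n'
                    for i, h in enumerate(headings))
    return f"<UL>\n{items}</UL>\n{page}"
```

Restricting the section to a digit or n keeps ordinary parenthesized text such as exit(0) from being linked. The page name is URL-encoded so that names like g++ survive the trip through the query string.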

Figure 3 shows part of a name-description index for section 1
of the man pages. The index includes an alphabetic sub-index and
links into the other sections of the manual. The name-only index in
Figure 4 is similar to the name-description index, except that it
is more compact.
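An index of this kind can be generated from a list of (name, section, description) entries, such as those in a whatis database. The sketch below groups entries by initial letter and emits the alphabetic sub-index as a row of jump links. When a name-only index is wanted, it omits the descriptions. The input format, the Misc bucket for names like [ that do not start with a letter, and the URL scheme are all assumptions.

```python
import html
from collections import defaultdict
from urllib.parse import quote

# Hypothetical URL scheme for page links.
CGI_URL = "/cgi-bin/man2html"
MISC = "Misc"  # bucket for names that do not start with a letter, such as [


def _link(name, section):
    """Link text looks like ls(1); the query is section+name."""
    return (f'<A HREF="{CGI_URL}?{section}+{quote(name)}">'
            f"{html.escape(name)}({section})</A>")


def build_index(entries, with_descriptions=True):
    """Build an HTML index from (name, section, description) tuples."""
    groups = defaultdict(list)
    for name, section, desc in entries:
        key = name[0].upper() if name[:1].isalpha() else MISC
        groups[key].append((name, section, desc))
    keys = sorted(k for k in groups if k != MISC)
    if MISC in groups:
        keys.append(MISC)

    # Alphabetic sub-index: one jump link per letter that has entries.
    lines = [" ".join(f'<A HREF="#{k}">{k}</A>' for k in keys)]
    for k in keys:
        lines.append(f'<H2><A NAME="{k}">{k}</A></H2>')
        group = sorted(groups[k], key=lambda e: (e[0].lower(), e[1]))
        if with_descriptions:
            # Name-description index: one line per page.
            lines.extend(f"{_link(n, s)} - {html.escape(d)}<BR>"
                         for n, s, d in group)
        else:
            # Name-only index: all links for a letter on a single line.
            lines.append(" ".join(_link(n, s) for n, s, _ in group))
    return "\n".join(lines)
```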

Figure 5 shows the result of a full text search for pages
containing a reference to “cdrom”. Links are generated to each
page matched.
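A brute-force version of that search fits in a few lines. It walks every file under the man hierarchies, decompresses .gz pages on the fly, and keeps those whose text contains the search term. This is a swapped-in technique for illustration, not a description of how vh-man2html performs its searches. A production setup would normally consult a prebuilt text index rather than rereading every page on each query. The function names and the URL form used in the results page are hypothetical.

```python
import gzip
import html
from pathlib import Path


def full_text_search(term, hierarchies):
    """Return pages whose text contains term (case-insensitive)."""
    needle = term.lower()
    hits = []
    for root in map(Path, hierarchies):
        if not root.is_dir():
            continue
        for page in sorted(root.rglob("*")):
            if not page.is_file():
                continue
            # Compressed pages are read through gzip; others as plain text.
            opener = gzip.open if page.suffix == ".gz" else open
            try:
                with opener(page, "rt", errors="replace") as f:
                    if needle in f.read().lower():
                        hits.append(page)
            except (OSError, EOFError):
                continue  # unreadable or corrupt file: skip it
    return hits


def results_page(term, hits):
    """Render one link per matching page (hypothetical URL scheme)."""
    items = "".join(f'<LI><A HREF="/cgi-bin/man2html{html.escape(str(p))}">'
                    f"{html.escape(p.name)}</A>\n" for p in hits)
    return f"<H1>Pages containing {html.escape(term)}</H1>\n<UL>\n{items}</UL>\n"
```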

These figures illustrate one of the key advantages of serving
up the man pages via the Web: a variety of access paths can be
presented in an integrated form—a one-stop-shop interface.