Plans for Noweb 3

April, 1999
Norman Ramsey

Don Knuth coined the term ``literate programming'' to describe the art
of programming primarily for the human reader, and only secondarily
for the machine.
Literate programming is supported by many tools, all of which provide
some way for authors to interleave program source code with well
typeset documentation.
Most tools also support automatic or semi-automatic cross-referencing
of source code.
Only four or five literate-programming tools are widely used, and
noweb may
be the most widely used of all.
It is certainly the most widely used literate-programming tool that is independent of
the target programming language, and it was the first such tool.

Noweb emphasizes simplicity, extensibility, and
language-independence.
Noweb has the simplest markup of any literate-programming
tool, making it easy for authors to understand the tool and to create
literate programs.
Noweb uses a pipelined architecture, which makes it possible
for expert users to extend the system without recompiling it,
using the programming language of their choice.
Users write extensions as Unix programs and
use command-line options to insert them into the noweb pipeline.
Users of noweb have written extensions for prettyprinting,
conditional compilation, language-dependent cross-reference, etc.
The pipelined architecture also makes it easy to support multiple
styles of documentation; noweb is unique in supporting
plain TeX, LaTeX, HTML, and troff.
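
To make the idea of an extension concrete, here is a minimal sketch of a pipeline stage written in Python. It assumes that the pipeline carries a line-oriented stream in which every line begins with an @-keyword; the particular keywords used here (``@begin code'', ``@end code'', ``@text'') and the function names are assumptions chosen for illustration, not a specification of the representation noweb uses between stages.

```python
import sys

def strip_code_whitespace(lines):
    """Pass the stream through unchanged, except that trailing blanks are
    removed from @text lines that appear inside code chunks.

    The keywords (@begin code, @end code, @text) are illustrative
    assumptions about the stream format, not a documented interface.
    """
    in_code = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@begin code"):
            in_code = True
        elif line.startswith("@end code"):
            in_code = False
        elif in_code and line.startswith("@text "):
            line = "@text " + line[len("@text "):].rstrip()
        yield line

def main(stream_in=sys.stdin, stream_out=sys.stdout):
    # A stage is an ordinary Unix filter: read standard input,
    # write standard output, and know nothing about its neighbors.
    for line in strip_code_whitespace(stream_in):
        stream_out.write(line + "\n")

# Small in-memory demonstration; a real stage would simply call main().
sample = ["@begin code 1", "@text x = 1   ", "@end code 1"]
for out in strip_code_whitespace(sample):
    print(out)
```

Because the stage touches only its standard input and output, it could be written equally well in awk, Icon, or any other language, which is the point of the pipelined design.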

Noweb is structured as a collection of C programs, shell
scripts, awk scripts, and Icon programs, connected together by Unix
pipelines.
Noweb can be difficult to install; installers may have to
work around bugs in vendors' implementations of awk, and installers
must get Icon (available for free from the University of
Arizona) to exploit all of the capabilities of the system.
Porting Noweb to the DOS or Windows platform requires either some
effort to replace shell scripts or the purchase of a commercial shell.

Noweb's main competitor in the market for
language-independent literate-programming tools is nuweb,
whose design was inspired by noweb, but which is structured
as a monolithic C program.
As a result, nuweb is not extensible, but it is easy to port,
and it runs quickly.
Noweb can run slowly when it is necessary to fork many
pipeline stages, some of which run in interpreted languages.
(As hardware speeds have continued to increase, this speed
disadvantage is less important today than in 1997, when the plans for
Noweb 3 were being laid.)
Noweb can process nuweb files, but nuweb
users continue to prefer nuweb because of its speed and
ease of installation.

Noweb's cross-referencing capability extends to HTML; a
reader of a literate program can use a Web browser to click on an
identifier and jump to the identifier's definition (and
documentation).
This capability has proven very useful, but it is limited to single
documents.
When large programs are composed of many separately compiled modules,
it is awkward, to say the least, to process the entire program as a
single document. (Such documents may run to hundreds of pages, even
for a program of modest size, say 10,000 lines.)
Users would much prefer to browse one document per module, and to be
able to follow references between documents, but noweb does
not currently support this model.

In sum, the three improvements that noweb's users would most like to
see implemented are

1. Ability to make cross-references between documents.

2. Easier porting and installation.

3. Improved performance.

The Noweb 3 effort is focusing on the last two improvements; the
first improvement awaits an able student or collaborator with an
interest in thinking deeply about cross-reference.

I intend to realize these improvements by replacing the shell
scripts and Icon programs with code written in the embedded language
Lua.
I have chosen Lua primarily because
the implementation is small, clear, and simple (about 6000 lines of
C code), and it works on both
Unix and Microsoft systems.
To avoid working with a moving target, I have cloned Lua
version 2.5.
Extending it to support case statements has resulted in the language
``Lua2.5+nw,'' which I expect to be able to maintain indefinitely.
The language itself is quite clean, and
it can readily be extended to support special types and operators as
needed for functionality and performance.

Thus, to run a Noweb 3 program, you call the ``no'' binary,
written in C, which in turn calls a Lua script to weave or tangle.
This Lua script builds and executes a pipeline, which may include
C stages, Lua stages, and external stages.
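
The difference between internal and external stages can be modeled in a few lines. The sketch below is written in Python rather than Lua, and every name in it (run_pipeline, upcase, number_lines, reverse_cmd) is hypothetical; it shows only the general shape of a driver that runs some stages as in-process function calls and forks a process only for stages that live outside the binary. It is not the Noweb 3 implementation.

```python
import subprocess
import sys

def run_pipeline(stages, lines):
    """Run a list of lines through each stage in order.

    A stage is either a callable taking and returning a list of lines
    (standing in for an internal C or Lua stage) or a list of strings
    naming a command (standing in for an external stage).
    """
    for stage in stages:
        if callable(stage):
            # Internal stage: a plain function call, no new process.
            lines = list(stage(lines))
        else:
            # External stage: start a process and talk to it over pipes.
            text = "".join(line + "\n" for line in lines)
            result = subprocess.run(stage, input=text, capture_output=True,
                                    text=True, check=True)
            lines = result.stdout.splitlines()
    return lines

def upcase(lines):
    return [line.upper() for line in lines]

def number_lines(lines):
    return [f"{i}: {line}" for i, line in enumerate(lines, start=1)]

# An external stage, here a tiny Python one-liner that reverses line order.
reverse_cmd = [sys.executable, "-c",
               "import sys; sys.stdout.writelines(reversed(sys.stdin.readlines()))"]

print(run_pipeline([upcase, reverse_cmd, number_lines], ["a", "b"]))
```

In this model only the middle stage pays the cost of starting a process; the other two are ordinary calls, which is the kind of saving that motivates moving stages out of shell scripts and into the binary.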

Garret Prestwood has contributed substantially to the implementation
of Noweb 3.