Usage

Requirements

Author

Originally written by Philip L. Bewig, ported to CHICKEN by Christian Kellermann.

License

GPL-2 or later

History

0.1

Initial port

0.2

Add missing lload to export list, fix category in .meta

Note

The following has been converted from schemeweb's own literate sources. The conversion is close to the original HTML Document which can be found in the egg distribution (download the egg with chicken-install -r schemeweb ). It is likely that bugs have been introduced in the formatting while converting it to wiki syntax. These bugs are all Christian's and not Philip's...

Literate Documentation of schemeweb

Literate programming is the art of writing computer programs in such a way that the source code is presented clearly and engagingly for a human reader while still making it available to the compiler. Literate programming embeds source code within its technical documentation; it permits the code to be presented in an order that suits the human reader, then re-orders the code for compilation.

This note describes SchemeWeb, a literate programming system specialized for the Scheme programming language.

Literate Programming

Donald Knuth invented literate programming during the development of his TeX typesetting program in 1981 in response to a challenge from Tony Hoare to publish the source code to TeX. Knuth admits to being terrified at the prospect of publishing his code, but, drawing on the ideas of holon programming developed by Pierre Arnoul de Marneffe, Knuth developed the original WEB system that combined program code and technical documentation into a single document that provided a clear, engaging description to a human reader and simultaneously provided source code to the compiler.

Knuth suggests that the structure of a software program may be thought of as a web made up of many interconnected pieces. The documentation of the program describes each piece of the web individually, specifying how it relates to its neighbors. That description is provided in prose form, for the human reader, and in code form, for the compiler; the combination of the two forms a whole that is greater than the sum of its parts.

The prose description of the software program proceeds in an order that makes sense to the writer (and hopefully conveys that sense to the reader). In most cases, compilers have rules about the order in which they must see the various parts of a program that conflict with the order that the author chooses for explication. Thus, literate programming systems provide a means of automatically extracting the source code from the web and re-ordering it for the compiler; the program that does this is called tangle, because it takes the neatly-ordered document and jumbles it into a mess that only a compiler could understand. At the same time, a weave program reformats the web using the available tools of a typesetter, intertwining prose and code to produce pleasing documentation for the human reader of the web; the name suggests taking the individual strands from a web and using them to produce silken cloth.

SchemeWeb

SchemeWeb is a minimal literate programming system that provides two features specifically intended to simplify programming in Scheme. First, a chunk beginning with a left-parenthesis or semi-colon is automatically taken to be Scheme code; there is no need to mark the code in any way. A convenient side-effect of this feature is that a file containing only Scheme code is a valid SchemeWeb file, allowing SchemeWeb source files and traditional Scheme code files to co-exist gracefully. Second, each input file expands to a single load-object, so it may be loaded directly into a running top-level with a single command. Thus, SchemeWeb is trivially easy to adopt; just write a SchemeWeb source file where you would otherwise use a traditional Scheme code file.

Literate Scheme source files

The input file for a SchemeWeb program has extension .lss and consists of a series of chunks, which are a maximal sequence of non-empty lines separated by a maximal sequence of empty lines.

Text chunks are blocks of prose that describe the code or otherwise help the user understand the program. Text chunks are important to the human reader of a literate program, but are ignored by the compiler that processes it.

Unnamed code chunks are chunks distinguished by having a left-parenthesis or semi-colon as the first non-whitespace character on their first line. Unnamed code chunks contain Scheme source code, and are extracted into the tangled output in the order they appear in the web input. Unnamed code chunks can also contain references to named code chunks written as the name enclosed within chevrons: <<chunk name>>.

Named code chunks are chunks whose first line consists of a chevron-quoted chunk-name followed by an equals-sign, like this:

<<chunk name>>=

The line may have leading or trailing whitespace; any whitespace between the chevrons is significant. The remaining lines of a named chunk are its body.

A SchemeWeb program is converted to Scheme code by ignoring text chunks, concatenating unnamed code chunks in order, and replacing references to named code chunks with their body text. The chunk-replacement process is recursive; if a named code chunk refers in its body text to another named code chunk, the referenced chunk will appear in place of the reference. As a consequence, the code chunks form a directed acyclic graphy, rooted at the unnamed code chunk. A named code chunk may be introduced in pieces; all of the named code chunks with the same name are concatenated into a single body text in the order they appear in the input.

As an aid to the formatter, text chunks may contain display code within double-brackets, such as [[(cons obj rest)]]. The formatter treats such code by formatting it in a monospace font, as in this note. Because a chunk is only code if its first line is a named reference or if it begins with a left-parenthesis or semi-colon, it is possible to include lengthy blocks of display code in a literate program (or temporarily comment-out a block of code) by writing a block of code with double-brackets on the first and last lines, like this:

[[(multiple lines
(of)(display code))]]

which will be displayed as:

| (multiple lines
| (of)
| (display code))

SchemeWeb weaves its output into html. Text chunks are displayed in Arial, code (whether in chunks or in display) is displayed in Courier, and display chunks are offset to the right of a vertical line to show they are not part of the tangled output. Other formatting commands may be passed to the browser by embedding html tags into text chunks.

This note is itself a SchemeWeb literate Scheme source file, and contains the complete source code of the SchemeWeb literate programming system.

Tangle, weave and lload

SchemeWeb provides three user-callable procedures.

The tangle procedure (tangle input output) takes two arguments and returns a string; it may write a file or port as a side-effect. Both arguments are optional; if output is given, input must also be given. Both arguments may contain either ports or strings containing filenames. If the arguments are given as strings, the corresponding ports are opened; if the filename extension is omitted from the input filename, it defaults to .lss, and if the filename extension is omitted from the output filename, it defaults to .ss. If no input is specified, input is taken from the current input port. Output is written to a file or port only if the output argument is specified; regardless, the tangled output is returned as the value of the function.

The weave procedure (weave input) takes a single argument which may be either a port or a string containing a filename. If the argument is given as a string, the corresponding port is opened; if the filename extension is omitted from the input filename; it defaults to .lss. Weave writes output to a file whose name is constructed using the same basename as the input file and extension .html, and returns no value.

The lload procedure (lload filename) takes a single string argument and returns nothing; it loads the Scheme source code in a SchemeWeb literate Scheme source file into a running Scheme top-level as a side-effect. If the filename extension is omitted from the input filename, it defaults to .lss. Lload calls tangle to extract the Scheme source code from the .lss file.

The weave procedure provided by SchemeWeb is considerably less featureful than the weave function of other literate programming systems such as CWEB and noweb. As described in his Usenet posting cited in the references, the author believes that most working code is never published outside the text editor where it is written, so fancy weaving is seldom needed. Other weaves provide such features as code pretty-printing, indexing of variable and function names, and cross-referencing of chunk definitions and usages, but when the code is viewed from within the programmer's text editor, a simple search replaces all the indexing and cross-referencing features, and pretty-printing only causes confusion by making the on-screen and printed versions of the program look different. In fact, the author believes that the point of diminishing returns is quickly reached when adding features to weave, and it is unlikely any of those other features will ever be added to SchemeWeb. In fact, the author disclaims any responsibility for weave on the grounds he never uses it.

The SchemeWeb implementation

SchemeWeb provides the three procedures tangle, weave, and lload. It also uses many local procedures, but these are defined within the three exported procedures to avoid namespace pollution. The three procedures are described below.

Tangling works in two phases. First, the input is read, and code chunks, both named and unnamed, are stored in a dictionary; that work is done by the call to build. Then, a recursive process performs depth-first search through the code-chunk call-graph, writing code as it proceeds, starting at a chunk named "" (the unnamed chunk); that work is done by the call to tangl. The dictionary is implemented as an association-list; a-lists have linear time complexity, but unless the input is very large, that will have little effect on the actual running time of the program.

The recursive tangling phase

Tangl is the recursive function that performs depth-first search through the code-chunk call-graph, writing output as it goes. The four arguments to tangl are the name of the code-chunk to be tangled, the a-list that stores the code-chunk dictionary, the indent that preceeds each line, and the port on which output is written. The indent is interesting; tangle works hard to make its output look nice. Tangl first looks up the code associated with the named chunk, then loops through each code line, parsing it to look for calls to additional code-chunks. If it finds a chunk-call, tangl calls itself recursively to expand the code of the called chunk, being careful to set the indent.

Tangl calls parse-line to extract chunk-call references from code lines. Parse-line returns three values: the prefix that preceeds a chunk-call reference, the call-name of a chunk-call reference, and the suffix that follows a chunk-call reference. If the code line has no chunk-call reference, the call-name and suffix will be empty strings.

The dictionary building phase

The build function takes an input port and returns a dictionary. Build loops over each chunk on the input port until it finds a null? chunk indicating the end of the input. Code chunks, whether named or unnamed, are added to the dictionary; text chunks are ignored. Build calls get-name to extract the chunk name from the first line of named chunks.

The add-dict function adds a chunk to the dictionary. It loops through the dictionary a-list, looking for a chunk with the appropriate name; if none exists, it creates a new entry in the a-list. Add-dict calls dedent to remove leading space from the chunk, so that SchemeWeb authors can indent code chunks to make them more easily visible, if desired.

The dedent function takes a list of strings and removes leading whitespace they have in common. Dedent looks at the first character of each string in the list, and if all strings have the same first character, each string will be shortened; dedent continues to call itself recursively until all leading whitespace has been removed.

Weave

Weave takes a single argument, the name of the .lss SchemeWeb literate Scheme source file. The mainline code of weave opens and closes needed files and calls weeve to do the actual work of creating output.

Weeve takes the base-name of the file being woven (which it writes to the title of the .html output file, the input file, and the output file. It writes a header and footer. In between, it loops over each chunk, classifies it, and calls the appropriate function to write it to the output.

Functions that write chunks

There are four functions that write chunks, one for each type of chunk. The three that write code change the font face to courier, but are careful to set the font face to arial before they leave, because that is the default font used to display text. These procedures do all the setting of fonts and writing of newlines, but call other functions to actually write code and text.

Functions that write code and text

These functions actually write code and text. They are called in various contexts by the four functions that write chunks, and are careful to never write a newline, because various contexts may require that they are called within a line, not to write an entire line.

These functions call a utility function quote-html that changes literal less-than, greater-than and ampersand symbols to their html equivalents. Note that the ampersand replacement must come first because it is part of the replacement text of the other symbols.

Port-handling functions

These functions open input ports and output ports. The parameter may be either a port or a string. If it is a string, the appropriate port is opened, perhaps using a default extension. If it is a port, it is returned unchanged. These functions use several non-R5RS extensions &mdash; file-exists?, delete-file, and error &mdash; which exist in most Scheme systems.

Function string-index finds the first occurrence of a search string in a target string, starting after a specified position within the string. It returns the index within the search string where the target string begins, or #f if the search string does not appear in the designated portion of the target string. The algorithm is simple brute-force search.

Function string-replace examines a target string, replacing all occurrences of a search string with a replacement string; it returns a newly-allocated string containing all the replacements. If the search string is not found within the target string, no replacements are made, and the original target string is returned.

Functions that read ports

SchemeWeb is concerned with chunks that consist of lines. The following two functions read lines and chunks (called paragraphs) from an input port. Note that read-line is careful to handle lines terminated by a carriage return, a line feed, or both, so it can be used on a variety of operating systems.

Obtaining and installing SchemeWeb

SchemeWeb is available from pbewig.googlepages.com The SchemeWeb literate Scheme source file is available as schemeweb.lss. The woven version of SchemeWeb is available as schemeweb.html. The tangled Scheme source code is available as schemeweb.ss. To install SchemeWeb, obtain schemeweb.ss, copy it to a convenient directory, and say (load "schemeweb.ss"); this is most usefully done in a standard prelude loaded each time the Scheme system is started.

References

Philip L. Bewig, The Essence of Literate Programming, comp.programming.literate, May 27, 1996. Describes the chunking mechanism as essential, and suggests that plain ascii text, not fancy formatting, is sufficient for literate programmers who view their work from inside a text editor rather than in print.

Donald E. Knuth, Literate Programming, Center for the Study of Language and Information, 1992 (ISBN 0937073814). Reprints many of the early papers by Knuth and others that define and describe literate programming.