Using the Library of Common Code

Abstract

The architecture of the Library of Common Code
is both flexible and open, which makes it usable in many different
contexts. This paper describes some important practical aspects of how
to use the Library and what the current API provides for threads,
streams and other basic concepts in the Library. The description
contains a set of examples and is based on version 3.1, which is to be
released in June 1995. Furthermore, it is intended that this paper can
serve as a basis for discussions on how to improve the API of the
Library of Common Code. For a more detailed discussion of the
architecture of the Library and a full listing of the functionality and
the various class definitions provided, the reader is referred to the
Internals and Programmer's Guide and the paper
Towards a Uniform Library of
Common Code.

The Common Code Library
is a general code base that can be used as a basis for building a large
variety of World-Wide Web applications. Its main purpose is to provide
services to transmit data objects rendered in many different media types
either to or from a remote server using the most popular Internet access
methods or the local file system. It is written in plain C and is
especially designed to be portable to a large set of different
platforms. Version 3.1 supports more than 20 Unix flavors, VMS, Windows
NT and Windows 95, and work is ongoing to support Power Macintosh
platforms.

Plain C does not support an object-oriented model; it merely makes it
possible to emulate one. Nevertheless, many of the data structures in
the Library are modeled on a class notation. This leads to situations
where forced type casting is required in order to use a reference to a
subclass where a superclass is expected. The forced type casting
problem, and inheritance in general, would be solved if the Library
were written in an object-oriented programming language, but
object-oriented languages are currently not standardized well enough to
avoid losing part of the portability in the transition. Several
intermediate solutions are under consideration in which one or more
object-oriented APIs built on top of the Library provide the
application programmer with a cleaner interface. However, the purpose
of this document is to describe the current API, with a large set of
practical hints about using and modifying the behavior of the Library.
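The class notation and the forced casts it requires can be illustrated with a small example. The type and function names below are hypothetical, not the Library's own: a subclass embeds its superclass as its first member, so a pointer to the subclass can be converted to a pointer to the superclass, and a type tag guards the opposite conversion.

```c
#include <stddef.h>

/* Hypothetical superclass shared by every kind of anchor. */
typedef struct {
    const char *address;    /* the URL this anchor stands for */
    int         is_parent;  /* run-time type tag used for safe downcasts */
} GenericAnchor;

/* Hypothetical subclass: the superclass MUST be the first member, so a
 * pointer to ParentAnchor can be converted to a pointer to GenericAnchor. */
typedef struct {
    GenericAnchor base;
    const char   *title;
} ParentAnchor;

/* Written against the superclass; accepts any anchor after an upcast. */
static const char *anchor_address(const GenericAnchor *anchor)
{
    return anchor->address;
}

/* Downcast: the compiler cannot check this cast, so the type tag does. */
static const char *anchor_title(const GenericAnchor *anchor)
{
    if (!anchor->is_parent)
        return NULL;
    return ((const ParentAnchor *) anchor)->title;
}
```

A call such as anchor_address((const GenericAnchor *) &parent) compiles only because of the explicit cast. That cast is the price of emulating inheritance in C, and it is the kind of cast an object-oriented layer on top of the Library would hide.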

Many of the features of the Library are demonstrated in the Line Mode Browser, which is a simple dumb-terminal client built
right on top of the Library. Even though this application is usable as
an independent Web application, its main purpose is to serve as a working
example of how the Library can be used. However, it is important to
note that the Line Mode Browser is only one way of using the
Library, and many other applications may want to use it in other ways.

The development of the World-Wide Web Library of Common Code was started
by Tim Berners-Lee in
1990. Ari Luotonen, Jean-Francois Groff, and Håkon W. Lie have
contributed, and today the Library is a multifunctional code base
with a large amount of knowledge about network programming and
portability built into it, thanks to a large number of people on the
Internet.

The Library can be obtained as
a distribution packet from the Library status page,
which includes all source files and documentation on how to unpack and
compile the Library using the BUILD
script. This is a simple script which is common to all W3C source
code distributions. If the BUILD script recognizes the platform, it
creates a
Makefile containing both platform-dependent and platform-independent
information and compiles the specified modules; if not, the script can
(often without major difficulties) be modified to support the new
platform. This is all explained in the
documentation on the BUILD script. The Library is
compiled simply by typing:

./BUILD library

As new versions of the Library are released frequently, it is
recommended to verify that the version is up to date by looking at the
file Version.Make in the Implementation
directory and comparing it with the information given on the
Library status page.

Before starting on the design phase of
an application, it is advantageous to have an overview of the fundamental
concepts of the Library and how it interacts with the application.
Broadly, the Library is divided into four categories of functions, as
indicated in the figure:

Core Entity

This is the fundamental part of the Library. The core entity is not
a closed entity but an open frame construction that provides hooks for
the dynamic modules. It consists of an access manager, a thread manager,
a stream manager, a cache manager and some fundamental data structures.
The contents of the core entity itself are largely internal to the
Library, but the hooks are public and initialized dynamically. Many of
the sections throughout this paper refer to the core
entity and explain the interaction between the core entity and an
application.

Dynamic Modules

The dynamic modules can be enabled or disabled dynamically
during execution of an application. They consist of a set of converter
streams and protocol modules, which are explained in the sections Access Methods and Stream Interfaces. There
are several ways to initialize the dynamic modules:

Through a configuration file (often called a rule file) which is
parsed at start-up time

Using static initialization functions which are created at compile
time

By initializing the modules during execution as the application
requires them

The Library has a set of default, static initialization functions
in the HTInit
module; by default, they enable all the dynamic modules in the
Library.

Application Specific Modules

These modules are often specific to client applications and include
functions that require user interaction, management of history lists,
etc. The implementation of these modules is often simple and intended
for simple character-based applications like the Line Mode Browser, and more advanced clients will often have to
override them. This is explained in detail in the sections Keeping Track of History and User Prompts and
Confirmations.

Generic Modules

The Library provides a large set of generic utility modules, such as
various container classes, parsing modules, etc. These modules are
characterized by being publicly available to the application programmer,
so that they can easily be employed in the application implementation. The
reader is referred to the Internals and
Programmer's Guide for a description of these modules.

In
version 3.0, the include file WWWLib.h is the only include file that is required in
order to use the Library. This file gives access to all the functionality that
is publicly available, but as the architecture is very open, this includes
most of the modules in the Library itself. Apart from this, only two
functions are necessary in order to initialize and clean up the Library,
respectively:

HTLibInit()

This function initializes memory, file descriptors, interrupt
handlers, etc., and it calls the static initialization functions for the
dynamic modules. This behavior can be changed either by overriding the HTInit
module or by using a preprocessor directive, as explained in the sections Override a Library Module and Global Flags,
respectively.

HTLibTerminate()

Cleans up the memory and closes open file descriptors.

It is essential that HTLibInit() is the first
call to the Library and HTLibTerminate() the last, as
the behavior is otherwise undefined. HTLibInit() calls a
set of internal and external initialization functions. The external
functions handle the initialization of the dynamic modules and are
placed in the HTInit module.

Library
version 3.0 has been designed using a new
thread concept, which allows
an application to handle requests in a constrained asynchronous manner
using non-blocking I/O and an eventloop. As a result, I/O operations
such as establishing a TCP connection to a remote server and reading
from the network can be handled without making the user wait until the
operation has terminated. Instead, the user can issue new requests,
interrupt ongoing requests, scroll a document, etc. Version 3.1 of the
Library has an enhanced thread model, as it supports writing large amounts
of data from the application to the network, also using non-blocking I/O
operations. This is useful in multithreaded server applications and
in client applications that support remote collaborative work
through the HTTP methods PUT
and POST. The Library has been designed to support threads on a
wide set of platforms with or without native support for threads, and
this section describes how Library threads can be used by the
application and how the API is designed to support other thread models.

Thread Interfaces

The Library provides three different modes in the
thread API, and it is necessary to be aware of these modes in the design
phase of an application, as they have a significant impact on the
architecture of the application. There is no sharp distinction
between the three modes; the choice depends on the architecture of the
application, and an application can often change mode depending on
the action requested by the user. The three modes and how they
can be used are described in the following:

Base Mode

This mode is strictly single-threaded and requires no special
considerations in the design of the application. The difference between
this mode and the other two is that all sockets are made blocking
instead of non-blocking. This is done simply by enabling a flag in the
HTRequest structure.
The Library still expects the application to define the set of callback
functions described in the section Providing
Callback Functions, but they can be defined as dummy functions
without any content. The mode preserves compatibility with
single-threaded World-Wide Web applications; however, it does not
provide interruptible I/O, as this requires an active eventloop either
inside or outside the Library.
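At the socket level, the difference between Base Mode and the other two modes comes down to the POSIX O_NONBLOCK flag. The Library sets this through its own request flag; the sketch below shows only the underlying mechanism, assuming a POSIX system with fcntl(). The helpers set_blocking and try_read are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch a descriptor between blocking (Base Mode) and non-blocking
 * (Active and Passive Mode) I/O. Returns 0 on success, -1 on error. */
static int set_blocking(int fd, int blocking)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    if (blocking)
        flags &= ~O_NONBLOCK;
    else
        flags |= O_NONBLOCK;
    return fcntl(fd, F_SETFL, flags);
}

/* On a non-blocking descriptor with no data available, read() fails with
 * EAGAIN or EWOULDBLOCK instead of suspending the caller; this is what
 * lets an eventloop go on serving other requests. *would_block is set to
 * 1 in that case. Returns the result of read(). */
static ssize_t try_read(int fd, char *buf, size_t len, int *would_block)
{
    ssize_t n = read(fd, buf, len);
    *would_block = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return n;
}
```

In Base Mode the same read() would simply wait for data, which is why that mode cannot offer interruptible I/O without an eventloop.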

Active Mode

In this mode the eventloop containing a select() call
is placed in the Library, in the
HTEvent module. The mode can be used either by character-based
applications with a limited capability for user interaction, or by
more advanced GUI clients where the window widget allows
redirection of user events to one or more sockets that can be recognized
by the select() call. It is important to note that even
though all sockets are non-blocking, the select() function
blocks if no sockets are pending. The HTThread module contains a thread scheduler which gives highest
priority to redirected user events, which allows
smooth operation of GUI applications with a fast response time. The mode is currently used by the
Arena Client and the Line Mode Browser. This mode has a major impact on the design of
the application, as the eventloop is based on callback functions which
must be provided by the application. The section Providing
Callback Functions explains this architecture in more detail.
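The core of such an eventloop can be sketched in POSIX C. This is not the HTEvent module. It is a hypothetical minimal loop step that assumes one descriptor carries redirected user events and an array holds the network descriptors. When several descriptors are ready at once, the user descriptor is reported first, mirroring the priority the HTThread scheduler gives to user events. The caller would then dispatch each reported descriptor to its callback.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until at least one descriptor is readable and store the ready
 * descriptors in `out` in service order: the user-event descriptor first,
 * then the network descriptors. Returns how many were stored, or -1 if
 * select() failed. With timeout == NULL, select() blocks indefinitely. */
static int wait_ready(int user_fd, const int *net_fds, int n_net,
                      int *out, struct timeval *timeout)
{
    fd_set readers;
    int maxfd = user_fd;
    int count = 0;
    int i;

    FD_ZERO(&readers);
    FD_SET(user_fd, &readers);
    for (i = 0; i < n_net; i++) {
        FD_SET(net_fds[i], &readers);
        if (net_fds[i] > maxfd)
            maxfd = net_fds[i];
    }

    if (select(maxfd + 1, &readers, NULL, NULL, timeout) < 0)
        return -1;

    /* Redirected user events get the highest priority. */
    if (FD_ISSET(user_fd, &readers))
        out[count++] = user_fd;
    for (i = 0; i < n_net; i++)
        if (FD_ISSET(net_fds[i], &readers))
            out[count++] = net_fds[i];
    return count;
}
```

Passing a NULL timeout reproduces the blocking behavior described above. `out` must have room for n_net + 1 entries.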

Passive mode

This mode is intended for applications where user events cannot be
redirected to a socket, or where there is already an eventloop that cannot
work together with the eventloop in the Library. The major difference
from the Active Mode is that instead of using the eventloop
defined in the HTEvent
module, the application replaces this module, as described
in the section Modules to Overwrite. All
socket descriptor arrays (referenced using the FD_XXX
macros) are still handled internally in the HTThread module, but because the replacement provides the same set of functionality as
the HTEvent module, the external eventloop can obtain the
information required for its own select() call. The Passive Mode
has the same impact on the application architecture as the Active
Mode, except for the eventloop, as all Library interactions with the
application are based on callback functions.

Providing Callback Functions

The thread model in the
Library is designed to work with native thread interfaces but
can also be used in a non-threaded environment. In the latter case, the
Library handles the creation and termination of threads internally
without any interaction required by the application. The thread model is
based on callback routines which must be supplied by the application, as
indicated in the figure:

The dashed lines
from the eventloop to some of the access modules symbolize that the
access method is not yet implemented using non-blocking I/O, but the
eventloop is still a part of the call stack. This shows that it
is possible to use blocking I/O within the eventloop.

User Event Handlers

As described in the section Use of Threads, an application can register
a set of user event handlers to handle events on a set of sockets
that the application uses to carry actions taken by the user, for
example interrupting a request, starting a new request, or
scrolling a page.

Event Termination

This function is called every time a request is terminated.

Timeout Handler

In the active mode, the select() function in the
Library eventloop is blocking, so that if no actions are pending on
any active registered socket

Control the Library

The
application is free to take any action in any of the callback functions,
including actions involving the Library. However, some actions affect the current
state of the Library, for example when a new request is issued or a request
is interrupted. This information must be handed back to the Library
using the return values of the callback functions.
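The following sketch shows what such return values might look like. All names are hypothetical, and the Library's own callback signatures and return codes are defined in its headers and may differ. A user event handler maps each event to a status code, and the eventloop side acts on the code instead of inspecting application state.

```c
#include <stddef.h>

/* Hypothetical status codes returned by an application callback. */
typedef enum {
    EV_CONTINUE,     /* nothing changed, keep going */
    EV_INTERRUPT,    /* the user interrupted the current request */
    EV_NEW_REQUEST,  /* the user issued a new request */
    EV_QUIT          /* leave the eventloop */
} EventStatus;

typedef EventStatus (*UserEventHandler)(int key);

/* Hypothetical handler for a character-based client. */
static EventStatus line_mode_handler(int key)
{
    switch (key) {
    case 'i': return EV_INTERRUPT;
    case 'g': return EV_NEW_REQUEST;
    case 'q': return EV_QUIT;
    default:  return EV_CONTINUE;
    }
}

/* Eventloop side: feed user events to the handler, count interrupts and
 * stop on EV_QUIT. A real loop would also start a request on
 * EV_NEW_REQUEST. Returns the number of events consumed. */
static size_t dispatch_events(UserEventHandler handler, const int *keys,
                              size_t n, int *interrupts)
{
    size_t i;
    *interrupts = 0;
    for (i = 0; i < n; i++) {
        EventStatus status = handler(keys[i]);
        if (status == EV_INTERRUPT)
            (*interrupts)++;
        else if (status == EV_QUIT)
            return i + 1;
    }
    return n;
}
```

Keeping the decision in the return value means the handler never has to call back into the loop that invoked it.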

Interrupting a request

The interrupt handler implemented for active mode is
non-eager, as it is a part of the select function in the socket
eventloop. That is, an interrupt through standard input is caught when
the executing thread is about to execute a blocking I/O operation, such
as a read from the Internet, and execution is handed back to the
eventloop. This is acceptable because the user is not blocked even
though the interrupt is not caught right away, so prompt handling is not as
critical as in a single-threaded environment. In passive mode the client
has complete control over when to catch interrupts from the user and
also how and when to handle them.
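Non-eager interrupt handling can be sketched as follows, assuming a POSIX system and a user who interrupts by typing on a descriptor such as standard input. The function names are hypothetical. Rather than reacting the moment input arrives, the loop polls the user descriptor with a zero timeout only between units of work, the points where control would return to the eventloop.

```c
#include <sys/select.h>
#include <unistd.h>

/* Return 1 if input is waiting on user_fd, 0 if not, -1 on error.
 * A zero timeout turns select() into a non-blocking poll. */
static int interrupt_pending(int user_fd)
{
    fd_set readers;
    struct timeval zero = { 0, 0 };
    int ready;

    FD_ZERO(&readers);
    FD_SET(user_fd, &readers);
    ready = select(user_fd + 1, &readers, NULL, NULL, &zero);
    if (ready < 0)
        return -1;
    return ready > 0 && FD_ISSET(user_fd, &readers);
}

/* Process up to n_steps units of work, checking for an interrupt only
 * between steps. Returns the number of steps completed. */
static int run_until_interrupted(int user_fd, int n_steps)
{
    int step;
    for (step = 0; step < n_steps; step++) {
        if (interrupt_pending(user_fd) == 1)
            break;      /* caught at the next eventloop point */
        /* ... one non-blocking read or write would go here ... */
    }
    return step;
}
```

An interrupt typed while a step is running is therefore noticed only before the next step, which is exactly the non-eager behavior described above.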

The Library handles a wide
set of Internet protocols as well as access to the local file system.
The current set of supported access methods is: HTTP, FTP, Gopher,
telnet, rlogin, NNTP and WAIS. All protocol modules are dynamic modules,
and each module can be bound to an access scheme dynamically, as
described in the section Get Started. As an example,
the URL:

http://www.w3.org/

has the access scheme http
and can be bound to the HTTP
module. The binding between a protocol module and an access method
is done at run time, and by default the Library enables all the access
schemes that it provides services for during initialization, as explained
in the section Get Started. The application can change
the default behavior by providing its own initialization of the binding
between protocol modules and access methods. This can be done in order
to build applications with a limited set of Internet access methods
or to add new protocol modules to the Library.
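The run-time binding between access schemes and protocol modules amounts to a lookup table keyed on the scheme name. The sketch below uses hypothetical names; the Library's own registration functions live in HTInit and the access modules. It extracts the scheme from a URL and returns the bound protocol handler, so an application can restrict the set of schemes, or add its own, simply by changing what it binds.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

typedef int (*ProtocolLoad)(const char *url);

typedef struct {
    char         scheme[16];
    ProtocolLoad load;
} ProtocolBinding;

#define MAX_BINDINGS 16
static ProtocolBinding bindings[MAX_BINDINGS];
static size_t n_bindings = 0;

/* Bind (or rebind) a scheme such as "http" to a protocol module. */
static int bind_protocol(const char *scheme, ProtocolLoad load)
{
    size_t i;
    for (i = 0; i < n_bindings; i++)
        if (strcasecmp(bindings[i].scheme, scheme) == 0) {
            bindings[i].load = load;
            return 0;
        }
    if (n_bindings == MAX_BINDINGS ||
        strlen(scheme) >= sizeof bindings[0].scheme)
        return -1;
    strcpy(bindings[n_bindings].scheme, scheme);
    bindings[n_bindings].load = load;
    n_bindings++;
    return 0;
}

/* Find the protocol module for a URL such as "http://www.w3.org/". */
static ProtocolLoad find_protocol(const char *url)
{
    const char *colon = strchr(url, ':');
    size_t len, i;
    if (colon == NULL)
        return NULL;
    len = (size_t) (colon - url);
    for (i = 0; i < n_bindings; i++)
        if (strlen(bindings[i].scheme) == len &&
            strncasecmp(bindings[i].scheme, url, len) == 0)
            return bindings[i].load;
    return NULL;
}

/* Hypothetical stand-in for the HTTP module; 0 means success here. */
static int http_load(const char *url)
{
    (void) url;
    return 0;
}
```

An application that should only speak HTTP would bind "http" and nothing else. A URL with any other scheme then finds no module and can be rejected, or redirected to a proxy or gateway.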

One special case is the support for access to WAIS databases. If native support for access to a WAIS database is
desired, the application must be linked with a WAIS library, in which case the
HTWAIS.c module is compiled into the Library as the interface
between the Library of Common Code and the WAIS library. This is
done by enabling the support in
Makefile.include, which is the platform-specific part of the
Makefile created by the
BUILD script. If direct WAIS support is not present, the
Library looks for a WAIS
gateway to handle the request, and if no WAIS gateway is
specified using environment variables, the
default destination is wais://www.w3.org:8001/,
where a WAIS gateway is accepting connections.

An application can
also indirectly support an access method by redirecting the request to
either a proxy or a gateway. The difference between a proxy server and a
gateway is described in the
Internals and Programmer's Guide,
but it does not affect the application using the Library, and the
redirection is normally transparent to the user. The Library supports
both proxies and gateways through classes of environment variables, and all requests can be redirected to a proxy
or a gateway, even requests on the local file system. Of course, the
Library can also be used in proxy or gateway applications, which in
turn can use other proxies or gateways, so that a single request can be
passed through a series of intermediate agents. Proxies and gateways are
defined using the following set of environment variables:

WWW_<access>_GATEWAY

MORE. Note that a WAIS gateway can be defined in this way to change
the default gateway at wais://www.w3.org:8001/.

<access>_proxy

MORE

no_proxy

MORE

where <access> is the specific access scheme.
Proxy servers have precedence over gateways, so if both a proxy server
and a gateway have been defined for a specific access scheme, the proxy
server is selected to handle the request.

It
is important to note that the usage of proxy servers or gateways is an
extension to the binding between an access scheme and a protocol module.
An application can be set up to redirect all URLs with a specific access
scheme without knowing about the semantics of the URLs or how to access
the information directly.
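The precedence rule can be sketched as a small lookup with hypothetical function names. The sketch builds the names <access>_proxy and WWW_<access>_GATEWAY from the access scheme, prefers a proxy over a gateway, and falls back to the default WAIS gateway mentioned above. The handling of no_proxy is an assumption, because its semantics are not spelled out here: it is treated as a comma-separated list of host-name suffixes, the common convention, and it is applied to proxies only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed semantics: no_proxy is a comma-separated list of host-name
 * suffixes; returns 1 if `host` ends with one of them. */
static int host_in_no_proxy(const char *host)
{
    const char *list = getenv("no_proxy");
    size_t host_len = strlen(host);

    while (list != NULL && *list != '\0') {
        const char *end = strchr(list, ',');
        size_t len = end ? (size_t) (end - list) : strlen(list);
        if (len > 0 && len <= host_len &&
            strncmp(host + host_len - len, list, len) == 0)
            return 1;
        list = end ? end + 1 : NULL;
    }
    return 0;
}

/* Decide where a request for scheme `access` on `host` should go.
 * Returns a proxy or gateway address, or NULL for direct access. */
static const char *select_intermediary(const char *access, const char *host)
{
    char name[64];
    const char *value;

    snprintf(name, sizeof name, "%s_proxy", access);
    value = getenv(name);
    if (value != NULL && !host_in_no_proxy(host))
        return value;                    /* proxies take precedence */

    snprintf(name, sizeof name, "WWW_%s_GATEWAY", access);
    value = getenv(name);
    if (value != NULL)
        return value;

    /* Assumes the Library was built without native WAIS support. */
    if (strcmp(access, "wais") == 0)
        return "wais://www.w3.org:8001/";
    return NULL;
}
```

Because the lookup only looks at the scheme and the host, the application needs to know nothing about the semantics of the URL, which is the point made above.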

Streams are objects used to transport data internally in the Library
between the application, the network, and the local file system. Streams
are characterized by accepting sequences of characters but the action
executed on a character sequence is specific for each stream. The very
generic definition of streams makes their usage almost unlimited and the
Library has a large set of streams used to serve many purposes. The
Library streams can be divided into groups depending on their behavior:

Converters

Streams that can be used to convert data from one media type to
another.

As of
version 3.1 of the Library, streams are also used to transport data from
the application to the network, which enables users to send data from the
client application to the remote server and hence do collaborative work
with remote users using HTTP as the transport carrier.
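The idea of a stream as an object that accepts character sequences and acts on them can be sketched with a table of function pointers. The Stream and StreamClass types below are hypothetical simplifications, not the Library's stream definitions. An upper-casing converter stream is chained in front of a sink stream that collects its output in a buffer, which is how streams are combined into pipes.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct Stream Stream;

/* The "class": the operations every stream provides. */
typedef struct {
    void (*put_block)(Stream *me, const char *data, size_t len);
} StreamClass;

struct Stream {
    const StreamClass *isa;
    Stream *target;    /* next stream in the pipe, if any */
    char   *buf;       /* used by the sink only */
    size_t  len, cap;
};

/* Sink: append data to a fixed buffer, truncating when it is full. */
static void sink_put_block(Stream *me, const char *data, size_t len)
{
    size_t i;
    for (i = 0; i < len && me->len + 1 < me->cap; i++)
        me->buf[me->len++] = data[i];
    me->buf[me->len] = '\0';
}

/* Converter: upper-case each character and pass it down the pipe. */
static void upper_put_block(Stream *me, const char *data, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        char c = (char) toupper((unsigned char) data[i]);
        me->target->isa->put_block(me->target, &c, 1);
    }
}

static const StreamClass SinkClass  = { sink_put_block };
static const StreamClass UpperClass = { upper_put_block };
```

The writer of the data only ever calls put_block on the first stream; what happens further down the pipe, conversion, storage or network output, is hidden behind the class table.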

Setting up Converters

Converters can be set up at run time just like the
access methods. The Library contains a set of default initialization
functions, which are placed in the
HTInit module.
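A run-time converter registry can be sketched as a table of (input media type, output media type, converter) entries that is searched when a data object arrives. The names and the single wildcard rule below are hypothetical, and a string stands in for a stream constructor to keep the example short. The Library's own default set is registered by the functions in HTInit.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *from;       /* input media type, e.g. "text/plain" */
    const char *to;         /* output media type, or "*" for any */
    const char *converter;  /* stands in for a stream constructor */
} ConverterRule;

#define MAX_RULES 32
static ConverterRule rules[MAX_RULES];
static size_t n_rules = 0;

static int add_converter(const char *from, const char *to,
                         const char *converter)
{
    if (n_rules == MAX_RULES)
        return -1;
    rules[n_rules].from = from;
    rules[n_rules].to = to;
    rules[n_rules].converter = converter;
    n_rules++;
    return 0;
}

/* An exact match wins; otherwise the first rule whose output is "*". */
static const char *find_converter(const char *from, const char *to)
{
    const char *wildcard = NULL;
    size_t i;
    for (i = 0; i < n_rules; i++) {
        if (strcmp(rules[i].from, from) != 0)
            continue;
        if (strcmp(rules[i].to, to) == 0)
            return rules[i].converter;
        if (wildcard == NULL && strcmp(rules[i].to, "*") == 0)
            wildcard = rules[i].converter;
    }
    return wildcard;
}
```

Replacing the default initialization then simply means filling the table with a different set of rules before the first request is issued.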

Changing the Destination for Data

Changing the Format of a Stream

Error Stream

Caching is a required
part of any efficient Internet access application, as it saves bandwidth
and improves access performance significantly in almost all types of
access. The Library supports two different types of cache: the memory
cache and the file cache. The two types differ in several ways, which
reflect their two main purposes: the memory cache is for short-term
storage of graphic objects, whereas the file cache is for intermediate-term
storage of data objects. Often it is desirable to have both a
memory and a file version of a cached document, so the two types do not
exclude each other. The following paragraphs explain how the two
caches can be maintained in the Library.

Memory Cache

The
memory cache is largely managed by the application, as it simply consists
of keeping the graphic objects described by the HyperDoc
structure in memory as the user keeps requesting new documents. Before a
request is processed over the net, the anchor object is searched for a
HyperDoc structure, and a new request is issued only if
this is not present or if the Library has explicitly been asked to reload
the document, which is described in the section Short Circuiting the
Cache.

As the management of the graphic objects is
handled by the application, it is also up to the application to handle the
garbage collection of the memory cache. The
Line Mode Browser has a very simple
policy for how long graphic objects stay in memory. It
is determined by a constant in the GridText module and is by default set to 5 documents. This approach
can be made much more advanced, and the memory garbage collection can be
determined by the size of the graphic objects, when they expire, etc.,
but the API is the same no matter how the garbage collector is
implemented. To free a graphic object, do the following: MORE MORE
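A garbage collector of the Line Mode Browser kind can be sketched as a fixed-size ring of the most recently created documents. Everything below is hypothetical: Document and Anchor stand in for the application's HyperDoc definition and the Library's anchor, and MAX_CACHED mirrors the default of 5 documents (the actual name of the GridText constant is not given here). When a document is evicted, it is freed and the anchor's link to it is reset to NULL.

```c
#include <stdlib.h>

#define MAX_CACHED 5

typedef struct Anchor Anchor;

typedef struct {
    Anchor *anchor;     /* back link to the anchor it renders */
    /* ... rendered text, formatting, etc. ... */
} Document;

struct Anchor {
    Document *document; /* NULL when not in the memory cache */
};

static Document *cache[MAX_CACHED];
static int next_slot = 0;

/* Free a document and clear the anchor's reference to it. */
static void free_document(Document *doc)
{
    if (doc == NULL)
        return;
    doc->anchor->document = NULL;
    free(doc);
}

/* Create a document for an anchor that has none, evicting the oldest
 * cached document if all slots are in use. */
static Document *cache_new_document(Anchor *anchor)
{
    Document *doc = malloc(sizeof *doc);
    if (doc == NULL)
        return NULL;
    free_document(cache[next_slot]);   /* oldest entry, if any */
    doc->anchor = anchor;
    anchor->document = doc;
    cache[next_slot] = doc;
    next_slot = (next_slot + 1) % MAX_CACHED;
    return doc;
}
```

A policy based on object size or expiry dates would replace only the choice of which slot to evict; the freeing step and the NULL reset stay the same.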

File Cache

The file cache is intended for intermediate-term storage
of documents or data objects that cannot be represented by the HyperDoc
structure, which is referenced by the HTAnchor object. As
the definition of the HyperDoc structure is left to the
application, there is no explicit rule for which graphic objects
cannot be described by HyperDoc, but often these are binary
objects such as images.

The file cache in the Library is a very
simple implementation in the sense that no intelligent garbage
collection has been defined. The goal has been to gain experience
with the file cache in the CERN proxy server, implemented by Ari Luotonen,
before a garbage collector is implemented in the Library.

An
important difference between the memory cache and the file cache is the
format of the data kept. In the memory cache, the cached objects are
graphic objects ready to be displayed to the user. In the file cache, the
data object is stored along with its metainformation, so that important
header information like Expires, Last-Modified, Language etc. is a part
of the stored object. All metainformation describing a graphic object in
memory is stored in the anchor object, as described in the section What is an Anchor?

In situations where a cached document is known to be stale,
it is desirable to flush any existing version of the document in either the
memory cache or the file cache and perform a reload from the
authoritative server. This can, for example, be the case if an Expires
header has been defined for the document when returned from the origin
server. Short circuiting the cache can be done by enabling the XXX flag
in the Request
structure, in which case the access module immediately issues the
request instead of searching the local cache.
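The decision made before a request goes to the network can be sketched as below. The reload field stands for the flag in the Request structure, whose name is not given in this paper, and the types and function are hypothetical. The point is the order of the checks: a reload bypasses both caches; otherwise the memory cache is consulted first, then the file cache.

```c
#include <stddef.h>

typedef enum { FROM_MEMORY, FROM_FILE_CACHE, FROM_NETWORK } LoadSource;

typedef struct {
    void *document;       /* HyperDoc in memory, or NULL */
    int   in_file_cache;  /* a copy with metainformation is on disk */
} AnchorState;

typedef struct {
    int reload;           /* hypothetical "short circuit the cache" flag */
} RequestFlags;

static LoadSource choose_source(const AnchorState *anchor,
                                const RequestFlags *req)
{
    /* Stale copies are replaced by the response from the origin server. */
    if (req->reload)
        return FROM_NETWORK;
    if (anchor->document != NULL)
        return FROM_MEMORY;
    if (anchor->in_file_cache)
        return FROM_FILE_CACHE;
    return FROM_NETWORK;
}
```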

All URLs registered in the
Library are bound to an anchor
object, which contains metainformation about the data object that the
URL identifies, for example the natural language used, when the data
expires, the title, the media type, etc. The anchor structure maintains a
snapshot, or mini-web, of all the links a user has been in touch with
when browsing the web.

At this point
most of the design issues have been addressed, and it is now
possible to use the Library to exchange information between the
application and the Internet. The Library provides a set of functions
that can be used to request a URI either on a remote server or on the
local file system. The access method binds the URL to a specific
protocol module, as described in the section
Access Methods, and the stream pipes define the
data flow for incoming and outgoing data.

Handling the Request structure

Selecting the Method

Searching a URL

Receiving an Entity

Sending an Entity

Experimenting with the HTTP Module

The anchor structure is a generic superclass used for both
parent anchors and child anchors. Both types
have a specific structure which is a subclass of the generic structure.
It contains all information about relations among URIs and whether they
have been loaded or not.

The HyperDoc structure is only declared in the
Library; the real definition is left to the client application. For the
Line Mode Browser, it is
defined in the GridText
module, where it is called _HText. It contains all
information needed to present and manage a
graphic object. The client is responsible for allocating and freeing
all graphic objects, and how long to keep them is a trade-off between speed and
available resources. When an object is freed, the link from the anchor
structure must be set to NULL. The dotted line
symbolizes that the client is free to create a
HyperDoc object including a link to the
request structure.
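In C this split is the incomplete-type pattern: the Library only needs a forward declaration in order to hold a pointer, while the client supplies the full definition. The sketch below is hypothetical (the real declaration and the GridText definition differ). It shows a client-defined document that keeps the optional link to its request, and a free function that resets the anchor's link to NULL.

```c
#include <stdlib.h>

/* --- "Library" side: declaration only ------------------------------ */
typedef struct HyperDoc HyperDoc;     /* incomplete type */
typedef struct { int id; } Request;   /* stand-in for the request */
typedef struct {
    HyperDoc *document;               /* the Library keeps only a pointer */
} Anchor;

/* --- "Client" side: the real definition ----------------------------- */
struct HyperDoc {
    Anchor  *anchor;
    Request *request;                 /* optional link to the request */
    int      n_lines;                 /* whatever the client needs */
};

static HyperDoc *hyperdoc_new(Anchor *anchor, Request *request)
{
    HyperDoc *doc = calloc(1, sizeof *doc);
    if (doc == NULL)
        return NULL;
    doc->anchor = anchor;
    doc->request = request;
    anchor->document = doc;
    return doc;
}

static void hyperdoc_free(HyperDoc *doc)
{
    if (doc == NULL)
        return;
    doc->anchor->document = NULL;     /* the anchor must not dangle */
    free(doc);
}
```

Library code compiled against the declaration alone can store and compare HyperDoc pointers but cannot look inside them, so each client is free to choose its own representation.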

HTErrorMsg

This module generates and formats the messages on the error stack. If the
application wants its own format for the messages, then this module can be
overwritten.

HTAlert

See also Description of HTAlert. All communication from the Library to the
user goes through this module. It contains functions for prompting for a
user name, etc. Obviously, this module must be overwritten by GUI clients.

Global Variables

Global variables have until recently been in
widespread use throughout the Library, but as this often conflicts with a
multithreaded environment, many global variables have been replaced with
thread-safe representations. However, many modules still contain
state-independent global variables defining display options, global
time-outs, trace options, etc. Typical examples are the module that
generates
directory listings for
HTTP, FTP, and local file access to directories, and the error handling module.

Only two specific global variables are
to be mentioned in this paper as they must be defined in the application
before linking with the library, and they must be assigned values with
specific semantics.

HTAppName

A string defining the name of the application. This value is used
in the User-Agent field in the HTTP
Protocol and it must obey the semantics for this field.

HTAppVersion

A string defining the version of the application. The value is also
used in the User-Agent field and must obey the general
semantics for this field.
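The following sketch shows how the two values might be combined into a User-Agent header. The definitions below are assumptions made for a self-contained example: the Library's headers declare these variables, possibly with different types, and ExampleBrowser is a made-up name. HTTP expects the field to contain a product token of the form name/version, so neither value should contain spaces or slashes.

```c
#include <stdio.h>
#include <string.h>

/* Defined by the application (types and values assumed for this sketch). */
const char *HTAppName    = "ExampleBrowser";
const char *HTAppVersion = "1.0";

/* Format "User-Agent: <name>/<version>" into buf. Returns 0 on success,
 * -1 if a value would break the product token or buf is too small. */
static int format_user_agent(char *buf, size_t size)
{
    int n;
    if (strpbrk(HTAppName, " /") != NULL ||
        strpbrk(HTAppVersion, " /") != NULL)
        return -1;
    n = snprintf(buf, size, "User-Agent: %s/%s", HTAppName, HTAppVersion);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```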

Environment Variables

The Library supports a set of
environment variables which are used to define a few important features of
the Library.

Only one other environment variable is of importance in
the Library:
WWW_HOME. This variable is used by the helper function
HTHomeAnchor() to find the address of the default document
to load when a client application is started. If no WWW_HOME
variable has been specified at run time, the Library tries the
values of the following preprocessor defines:

PERSONAL_DEFAULT

LOCAL_DEFAULT_FILE

LAST_RESORT
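The lookup order can be sketched as follows. The values given to the three preprocessor defines are placeholders chosen for the example, not the Library's defaults. Trying them in the order listed and using the first readable file is also an assumption, and home_address is a hypothetical name; HTHomeAnchor() is the Library's actual entry point.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Placeholder values; a build can override them with -D options. */
#ifndef PERSONAL_DEFAULT
#define PERSONAL_DEFAULT   "WWW/default.html"   /* relative to $HOME */
#endif
#ifndef LOCAL_DEFAULT_FILE
#define LOCAL_DEFAULT_FILE "/usr/local/lib/WWW/default.html"
#endif
#ifndef LAST_RESORT
#define LAST_RESORT        "http://www.w3.org/"
#endif

/* Return the address of the first document to load. `buf` receives the
 * personal default path when that file is used. A real client would turn
 * a local path into a file: URL before creating an anchor for it. */
static const char *home_address(char *buf, size_t size)
{
    const char *home_doc = getenv("WWW_HOME");
    const char *home_dir = getenv("HOME");

    if (home_doc != NULL && *home_doc != '\0')
        return home_doc;
    if (home_dir != NULL) {
        int n = snprintf(buf, size, "%s/%s", home_dir, PERSONAL_DEFAULT);
        if (n > 0 && (size_t) n < size && access(buf, R_OK) == 0)
            return buf;
    }
    if (access(LOCAL_DEFAULT_FILE, R_OK) == 0)
        return LOCAL_DEFAULT_FILE;
    return LAST_RESORT;
}
```

Using #ifndef around each define lets a site or a user choose different defaults at compile time without editing the source.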

Preprocessor Defines

This section is dedicated to a
set of examples that show some of the functionality of the Library as
described in the previous sections. The Line
Mode Browser is, as mentioned in the introduction, a working example
of most of the functionality provided by the Library but as it contains
almost 5000 lines of code, it is often difficult to extract the right
examples. The following examples are not intended to be complete but to
clarify the API needed to use the Library.