University of Cork

TEI-WWW workshop

As originally proposed at ACH-ALLC in Washington earlier this year,
Peter Flynn of the Curia Project at the University of Cork organized a
two-day meeting with the general aim of creating a dialogue between the
TEI and the developers of the World Wide Web, one of the most rapidly
growing computer systems since the Internet itself. WWW is a
distributed hypertext system running at some improbably large number of
sites worldwide, which uses a very simple SGML tagset called HTML (it
has been rather unkindly characterized as "Pidgin-SGML"). WWW itself
consists of a markup language (HTML), a set of Internet protocols (FTP,
HTTP etc) and a naming scheme for objects or resources (the "Universal
Resource Locator" or URL). A number of browsers are now available
which use these components. Mosaic, developed at NCSA, is probably the
most impressive: running on Mac, X and Windows it offers a fully
graphical interface with just about everything current technology can
support. Lynx, developed at the Computer Science dept at U of Kansas,
is at the opposite extreme, assuming only a VT100 (there is also a
WWW-mode for EMACS!). I will not attempt here to describe WWW in
operation. Web browsers are freely available by anonymous FTP all over
the place: if you haven't tried it out already, and can't see what all
the fuss is about, then you should stop reading now, get yourself a
browser and do so forthwith.
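
As a rough illustration of the naming scheme mentioned above, the sketch
below splits a URL into the protocol, host and path that a browser would
act on. It uses Python's standard urllib.parse module; the describe_url
helper is a hypothetical name introduced only for this example, and the
URL is the TEI overview address cited in the meeting notes.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into protocol, host and path (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. http or ftp
        "host": parts.hostname,     # the server holding the resource
        "path": parts.path,         # the resource's location on that server
    }


info = describe_url("http://curia.ucc.ie/curia/doc/tei.html")
print(info)
```

The same function handles an ftp address just as well, which is the point
of the scheme: one syntax names resources regardless of the protocol used
to fetch them.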

The two-day meeting was attended by Chris Wilson (NCSA); Lou Montulli
(Lynx, U Kansas); Bill Perry (EMACS, Indiana University); Dave Raggett
(Hewlett-Packard; HTML+) and myself for TEI. Various representatives of
the Curia project, notably Patricia Kelly from the Royal Irish Academy,
were also present. I gave a short presentation about the TEI, focussing
mostly on contextual issues but also including some detailed technical
stuff about bases and toppings and X-pointer syntax, which seemed to be
well received. Dave Raggett then talked us through the current HTML+
draft, which started off a very wide-ranging discussion. This continued
during the second day of the meeting, but was at least partially nailed
down in the shape of a brief report (see below) which should be
somewhere in the Web by the time you read this one.

To their credit, most WWW people seem painfully aware of the
limitations of the current HTML specification, which was very much an
experimental DTD hacked together in haste and in ignorance of the finer
points of SGML (or indeed the blunter ones). HTML+, which Dave Raggett
has been working on for the last year or so, attempts to improve on it
without sacrificing too much of its flexibility. This draft will
eventually progress to Internet RFC status; there is also talk of an
IETF working group co-chaired by Raggett and Tim Berners-Lee (of CERN;
onlie begettor of the Web) to steer this process through.

The Cork meeting was an interesting opportunity for the developers of
three of the major Web browsers to meet face to face and argue over
some of the design decisions implicit in the HTML+ spec. To some extent
this did happen, though the discussion was rather anarchic and
unstructured. It was also a good opportunity for the TEI to encourage
development of HTML+ in a TEI-convergent manner, and this I think was
achieved. Several of the changes accepted, at least in principle, will
make it much easier to transform TEI documents into HTML, if not vice
versa. Some practical issues about how WWW should handle TEI-conformant
documents were also resolved.

Outside the meeting, this was also a good opportunity to find out more
about the Curia project itself. My hasty assessment is that this
project still has some way to go. There is a clear awareness of the many
different ways in which it could develop, and a tremendous enthusiasm. I
think the project would benefit from some detailed TEI consultancy
before too much more P1-conformant material is created. It also offers
interesting contrastive opportunities with other corpus-building
activities, chiefly because of its enormous diachronic spread, and its
polyglot nature.

<!-- This uses the HTML dtd -->
<title>WWW/TEI Meeting</title>
<h1>Notes from WWW/TEI Meeting</h1>
<h3>Action Items/Recommendations</h3>
<list>
<li>HTML 1.0 should be documented to define the behavior of existing browsers,
and should be frozen as agreed upon at the WWW Developers' Conference.
<list>
<li>Features to be documented, implemented and specified include
collapsing spaces, underline, alt attribute, BR, HR, ISMAP...
<li>HTML IETF spec needs to be updated by CERN, as well as existing
documentation
</list>
<li>Future HTML+ browsers need not support HTML 1.0 features after a reasonable
amount of time.
As an aid in transition, the HTML+ spec/DTD will not include any deprecated
features of HTML 1.0.
<list>
<li>HTML 1.0 deprecated features
<list>
<li>nextid
<li>method, rel, rev, effect from &lt;A&gt; tag (but
not from the &lt;LINK&gt; tag)
<li>blockquote --&gt; quote
<li>There was a feeling that the &lt;img&gt; tag will
be superseded by the &lt;fig&gt; tag, although its
deprecation was not agreed upon.
<li>menu list --&gt; ul
<li>dir list --&gt; ul
</list>
</list>
<li>The intention of HTML+ is to support generic SGML-compliant authoring
tools, and authors are recommended to use this software with the HTML+
DTD for the creation/maintenance of documents.
<li>Browsers may implement different levels of HTML+ conformance.
<list>
<li>Level 0 implementation
<list>
<li>HTML 1.0 spec referenced above
</list>
<li>Level 1 implementation
<list>
<li>Partial fill-out forms
<li>New entity definitions (in section 5.1 of HTML+ draft)
</list>
<li>Level 2 implementation
<list>
<li>Additional presentation tags (sub, sup, strike) &amp;
logical emphasis
<li>Full forms support (incl. type checking)
<li>Generic emphasis tag
</list>
<li>Level 3 implementation
<list>
<li>Figures
<li>NOTEs and admonishments
</list>
<li>Further levels to be specified
</list>
<li>Authoring tools are expected to conform to the HTML+ DTD and are
<b>NOT</b> to support deprecated features.
<li>We expect the HTML+ DTD to be developed incrementally. The HTML+
internet draft will make clear which features are now stable and
which are still subject to change. The DTD will be structured to
reflect this.
<ol>
<li>HTML+ will work with the SGML reference concrete syntax.
<li>The entity sets will be user-specifiable (in the long run).
<li>HTML+ will support nested divisions or containers.
<li>There will be a number of new features
<dl>
<dt><b>Figures &amp; Images</b>
<dd>&lt;fig&gt; may be able to subsume the role of &lt;img&gt;.
<dt><b>Generic highlighting tag</b>
<dd>The &lt;em&gt; tag will be used with a set of
three or four defined attributes, each of which is
guaranteed a distinct presentation.
<dt><b>Generic roles</b>
<dt><b>Support for undefined elements</b> (user extensions) (render)
<dd>
<dt><b>Tables</b>
<dd>This is now stable.
<dt><b>Math</b>
<dd>for research
</dl>
</ol>
<li>HTML/TEI
<list>
<li>It was felt the correct way to convert between TEI and HTML
was to do it on the server side using a conversion filter.
<li>This server will also provide a hypertext link to download the
raw TEI text.
<li>We (WWW developers and TEI people) will strive together to
converge functionality between HTML* and TEI, as well as to
produce this server/filter system.
</list>
<li>Links to:
<list>
<li>HTML spec
<li>HTML+ spec
<li><a href="http://curia.ucc.ie/curia/doc/tei.html">TEI overview</a>
</list>
</list>
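
The server-side conversion filter recommended under HTML/TEI might look
something like the following minimal sketch. It is an illustration under
stated assumptions, not the system the meeting agreed on: the TAG_MAP
table, the function names and the /raw/sample.tei link are all
hypothetical, and the input is treated as well-formed XML so that
Python's standard xml.etree.ElementTree can parse it (real TEI SGML
would need an SGML parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-to-HTML element mapping; illustrative only, not taken
# from the meeting notes or from any TEI or HTML+ draft.
TAG_MAP = {"head": "h1", "p": "p", "hi": "em", "list": "ul", "item": "li"}


def convert(elem):
    """Recursively copy a TEI element into an HTML element, renaming tags."""
    # Elements with no entry in TAG_MAP fall back to a generic container.
    out = ET.Element(TAG_MAP.get(elem.tag, "div"))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(convert(child))
    return out


def tei_to_html(tei_source, raw_url):
    """Render a TEI fragment as HTML and append a link to the raw TEI text."""
    body = convert(ET.fromstring(tei_source))
    body.tag = "body"
    # The notes ask the server to offer the unconverted TEI text as well.
    link = ET.SubElement(body, "a", {"href": raw_url})
    link.text = "Download the raw TEI text"
    return ET.tostring(body, encoding="unicode", method="html")


sample = "<text><head>Title</head><p>Some <hi>marked</hi> prose.</p></text>"
html = tei_to_html(sample, "/raw/sample.tei")
print(html)
```

Keeping the mapping in a single table means the browser never needs to
know anything about TEI: only the filter changes as HTML+ and TEI
converge.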