The editors-in-chief of Markup Languages: Theory & Practice interview Joan Smith, who was instrumental in promoting SGML, especially in Europe.

"Joan Smith is Chairman of the SGML Technologies Group of a pan-European companies, with subsidiaries in Brussels and Luxembourg; the largest group of companies specializing in SGML in Europe. She founded the International SGML Users' Group, has written numerous books and papers on SGML, and has organized conferences on SGML. She has recently been accepted as a Freeman of the Worshipful Company of Information Technologists of the City of London. She is a Fellow of the British Computer Society, a Member of the Institute of Directors, and was the first European to receive the GCA's Tekkie award and later the GCA's International SGML Award."

"Businesses and organizations are increasingly finding that HTML (Hyper-Text Markup Language)
offers no help whatsoever in managing the information on their web sites. SGML (Standard Generalized
Markup Language) provides the flexibility and reuse lacking in HTML. However, SGML alone does not
address the problems involved in maintaining online document repositories. Although traditional database
management systems are clumsy at managing hyperlinked documents, a system combining SGML, database
technology, and the protocols of the Web can provide a reasonably robust environment for developing and
maintaining a web site. Two possible site designs employing SGML are discussed and evaluated with respect to
a set of design objectives and choices. The likely impact of the emerging XML (Extensible Markup Language)
standard on web site design is also discussed."

"Sites 1 and 2 illustrate a dilemma that today's web site developers to take advantage of the benefits of
SGML. On the one hand, they can rely heavily on SGML's ability to represent data in an application-specific,
structured manner
and on CGI to dynamically generate browser-ready web output in response to SGML database queries. While
such a site design enables users to quickly find information through application-specific queries and is easier to
maintain than a
collection of HTML documents, it requires extra effort on the part of content providers, additional server
overhead, and the implementation of hyperlinking if links to off-site web pages are desired. On the other hand,
web site developers
may choose to minimize the burden on content providers and to maximize server performance, interoperability
with web search engines, and linkage with other web sites. In this case, they must sacrifice application-specific
structured query
capability and implement tools for managing entities and maintaining hyperlinks. The emerging XML standards
promise to provide web site developers with the best of both worlds, allowing them to enjoy most of the
benefits of SGML
while not sacrificing the convenience of HTML and interoperability with the rest
of the Web. If XML is ultimately successful, not only will it be easier for web site
developers to use SGML, but also they will be able to take advantage of newly
available capabilities to make their content easier for users to read and easier for
web clients and other desktop applications to interpret."

"The syntax of XML is simple enough that it is possible to parse an XML document into a list of its
markup and text items using a single regular expression. Such a shallow parse of an XML document can be very
useful for the construction of a variety of lightweight XML processing tools. However, complex regular
expressions can be difficult to construct and even more difficult to read. Using a form of literate programming
for regular expressions, this paper documents a set of XML shallow parsing expressions that can be used as a
basis for simple, correct, efficient, robust and language-independent XML shallow parsing. Complete shallow
parser implementations of less than 50 lines each in Perl, JavaScript and Lex/Flex are also given."
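
[Illustrative sketch, not from the paper:] The idea of a shallow parse driven by a single regular expression can be sketched in Python as follows. The pattern below is a deliberately simplified assumption, not the paper's REX expressions: it does not handle a DOCTYPE declaration with an internal subset, and a stray "<" that matches no alternative is silently skipped. The names SHALLOW_XML and shallow_parse are hypothetical.

```python
import re

# Simplified shallow-parsing pattern (an assumption for illustration;
# the REX expressions documented in the paper are more complete).
SHALLOW_XML = re.compile(
    r"""
      <!--.*?-->                          # comment
    | <!\[CDATA\[.*?\]\]>                 # CDATA section
    | <\?.*?\?>                           # processing instruction
    | <(?:[^<>"']|"[^"]*"|'[^']*')*>      # tag or simple declaration
    | [^<]+                               # text between markup
    """,
    re.DOTALL | re.VERBOSE,
)

def shallow_parse(document):
    """Split a document into a flat list of markup and text items."""
    return SHALLOW_XML.findall(document)

sample = '<p class="a>b">Hello <!-- note --><br/>world</p>'
items = shallow_parse(sample)
for item in items:
    print(repr(item))
```

Because every alternative matches a contiguous item, joining the items of a well-formed input reproduces the input, which is what makes such a list usable by lightweight tools that only need to touch some of the items.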

[From the conclusion:] "The simplicity of the shallow parsing model based on regular expressions
suggests some interesting possible directions for development of XML.
First of all, a shallow parsing representation such as that produced by REX could
be a useful reference representation for a revised XML specification. Such a reference
representation would have the advantage of providing a language-independent
approach to shallow parsing encoded in the standard, with a
language-independent implementation framework based on regular expressions.
Furthermore, it may be possible to relax certain XML restrictions that can be
easily accommodated by regular-expression processing, such as the restriction
that attribute values must always be quoted. However, possibilities such as these must be carefully weighed by the overall XML development community."
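
[Illustrative sketch, not from the paper:] To make the remark about unquoted attribute values concrete, the following Python fragment shows how a regular expression could accept quoted and unquoted values alike. The pattern ATTRIBUTE and the function attributes are hypothetical, the name syntax is simplified, and unquoted values are not legal XML 1.0; this only illustrates why such a relaxation is easy for regular-expression processing.

```python
import re

# Attribute name, "=", then a double-quoted, single-quoted, or bare value.
# The bare-value alternative is the hypothetical relaxation; XML 1.0
# itself requires the quotes.
ATTRIBUTE = re.compile(
    r"""([A-Za-z_:][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'<>=]+))"""
)

def attributes(tag):
    """Return the attributes of a start tag as a dict of name -> value."""
    found = {}
    for name, double_quoted, single_quoted, bare in ATTRIBUTE.findall(tag):
        found[name] = double_quoted or single_quoted or bare
    return found

attrs = attributes("""<img src=logo.gif alt="Company logo" width='40'>""")
print(attrs)
```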

"Named Entity recognition involves identifying expressions which refer to (for example) people,
organizations, locations, or artifacts in texts. This paper reports on the development of a Named Entity
recognition system developed fully within the XML paradigm. In the section 'Named Entity recognition' we
describe the nature of the Named Entity recognition task and the complexities involved. The system we
developed was entered as part of a DARPA-sponsored competition, and we will
briefly describe the nature of that competition.
We then give an overview of the design philosophy behind our Named Entity
recognition system and describe the various XML tools that were used both in the
development of the system and that make up the runtime system (section "LTG
text handling tools"), and give a detailed description of how these tools were used
to recognize temporal and numerical expressions (section "TIMEX, NUMEX")
and names of people, organizations and locations (section "ENAMEX"). We conclude
with a description of the results we achieved in the competition, and
how these compare to other systems (section 'Conclusion'), and give details on
the availability of the system (section 'Availability')."

[System description:] "One of the design features of the system which sets it apart from other Named
Entity recognition systems is that it is designed fully within the SGML paradigm:
the system is composed of several tools which are connected via a pipeline with
data encoded in SGML or XML. This allows the same tool to apply different
strategies to different parts of the texts using different resources. The tools do not
convert from SGML into an internal format and back, but operate at the SGML
or XML level. Our system does not rely heavily on lists or gazetteers but instead treats
information from such lists as "likely" and concentrates on finding contexts in
which such likely expressions are definite. In fact, the first phase of the enamex
analysis uses virtually no lists but still achieves substantial recall. The system is document centered. This
means that at each stage the system
makes decisions according to a confidence level that is specific to that processing
stage, and draws on information from other parts of the document. The system is
hybrid, applying symbolic rules and statistical partial matching techniques in an
interleaved fashion. A runtime version of the system described here is available for free at
http://www.ltg.ed.ac.uk/software/ne/.
We also have a set of tools which can be used to develop a Named Entity
recognition system. The tool suite is called LT TTT, and is available from
http://www.ltg.ed.ac.uk/software/ttt/. LT TTT consists of lttok, ltstop and
fsgmatch, a number of resource files for tokenization, for end-of-sentence
disambiguation, and for the recognition of temporal expressions, and tools for
extending these resource grammars or for creating new ones.
It also has a visual interface which uses XSL style sheets to render the XML
Named Entity annotation in a form that is easier to inspect.
The part of speech tagger is available as a separate tool. See
http://www.ltg.ed.ac.uk/software/pos/."
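
[Illustrative sketch, not from the paper:] The pipeline architecture described above, in which each tool reads marked-up text and adds further markup without converting to a private internal format, can be sketched in Python. This is not the LT TTT tool set: the stage functions, the W element, and the date rule are assumptions chosen for illustration, with TIMEX borrowed from the section names in the abstract.

```python
import re
import xml.etree.ElementTree as ET

MONTHS = {"January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"}

def tokenize(root):
    """Stage 1: replace the raw text with one <W> element per word."""
    words = (root.text or "").split()
    root.text = None
    for word in words:
        ET.SubElement(root, "W").text = word
    return root

def mark_dates(root):
    """Stage 2: wrap 'day month year' runs of <W> elements in <TIMEX>."""
    i = 0
    while i + 2 < len(root):
        window = list(root)[i:i + 3]
        if all(w.tag == "W" for w in window):
            day, month, year = (w.text for w in window)
            if day.isdigit() and month in MONTHS and re.fullmatch(r"\d{4}", year):
                timex = ET.Element("TIMEX", TYPE="DATE")
                for w in window:
                    root.remove(w)
                    timex.append(w)
                root.insert(i, timex)
        i += 1
    return root

def run_pipeline(xml_text, stages=(tokenize, mark_dates)):
    """Pass the document through each stage; every stage sees and emits XML."""
    root = ET.fromstring(xml_text)
    for stage in stages:
        root = stage(root)
    return ET.tostring(root, encoding="unicode")

result = run_pipeline("<TEXT>Meeting on 12 May 1998 in Edinburgh</TEXT>")
print(result)
```

Because every stage consumes and produces the same kind of marked-up document, stages can be reordered, repeated, or replaced independently, which is the property the authors emphasize.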

"Wizards have been a part of workstation products since the early 1990s. A wizard
is a task-oriented dialog that guides the user through a given task, automating as
much of that task as possible. A typical wizard panel has a graphic area on the
left, a set of navigation buttons on the bottom, and an area on the right that
contains any text and controls needed for the task at hand."

"IBM's TaskGuide technology gives Technical Writers and Human Factors professionals the ability to create wizards. Based on the premise that task analysis is the most difficult part of creating an effective wizard, our tools let you focus on design, not writing code. This paper discusses the basics of wizard technology, followed by a discussion of the XML-based system we have created. We cover some of the key design decisions we had to make, and introduce some of the unique features of our product. We also discuss the changes we have made to our product as technology has changed around us. Finally, we demonstrate a recursive document, a wizard that creates another wizard."

"IBM's TaskGuide technology allows technical writers to create wizard panels
without programming. These panels are created dynamically based on the
information in wizard scripts. Our approach lets wizard writers focus on the truly
difficult tasks of task analysis and technical writing, rather than on the mundane
aspects of programming a graphical interface. As our technology has grown over
time, the basic skills learned to create wizards with our first driver are still useful
and effective today."
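
[Illustrative sketch, not from the paper:] The idea of generating wizard panels dynamically from a script, so that the writer never programs the interface, might look like this in Python. The element and attribute names (wizard, panel, text, field) and the function build_panels are hypothetical; the abstracts do not describe the actual TaskGuide script vocabulary.

```python
import xml.etree.ElementTree as ET

# A hypothetical wizard script; the real TaskGuide markup is not shown
# in the source and may differ entirely.
SCRIPT = """
<wizard title="Add a printer">
  <panel title="Choose a port">
    <text>Select the port the printer is attached to.</text>
    <field name="port" label="Port"/>
  </panel>
  <panel title="Name the printer">
    <text>Type a name for the printer.</text>
    <field name="name" label="Printer name"/>
  </panel>
</wizard>
"""

def build_panels(script):
    """Turn a wizard script into panel descriptions with navigation buttons."""
    panels = ET.fromstring(script).findall("panel")
    built = []
    for index, panel in enumerate(panels):
        buttons = []
        if index > 0:
            buttons.append("Back")
        buttons.append("Next" if index < len(panels) - 1 else "Finish")
        buttons.append("Cancel")
        built.append({
            "title": panel.get("title"),
            "text": panel.findtext("text", default=""),
            "fields": [field.get("name") for field in panel.findall("field")],
            "buttons": buttons,
        })
    return built

panels = build_panels(SCRIPT)
for panel in panels:
    print(panel["title"], panel["buttons"])
```

The navigation buttons are derived from each panel's position rather than written into the script, which is one way a driver can take the interface details off the writer's hands.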

"In this paper, the system used for the editorial process of the European Union's budget is described, both from a functional and a technical point of view. It will be shown how the choice of SGML as the key technology has had an impact on the overall architecture as well as on individual modules which constitute the system. The description is based on the current status of the system. Future developments are discussed briefly."

"The editorial process of the budget of the European Union is an annual, on-going process in which different players such as authors, translators, reviewers and a printer all operate in a common environment to enter, translate, and review data needed to produce the budget. The budget itself is published on paper and on the Web. The system, designed to fulfill requirements for the timely delivery of high-quality documents, together with short production times, and hence minimized costs, is entirely SGML-based. It has evolved to a complete and mature production environment. In this paper an overview of the architecture of the system is given as well as a description of the rationale behind the key technical choices that were made. It highlights certain aspects of SGML, such as concurrency and links, which are explained by illustrating their use in the budget application. The need for reliability and stability is shown to have led to a client/server system in which SGML acts as the backbone of the modules which govern the production workflow. These modules communicate with each other through SGML-formatted messages. This application has been made possible through the use of a full-featured SGML parser and an associated application language that combine to make a powerful SGML engine. In a final section, future developments, some of which are currently being developed, are briefly discussed."

"The declarations for predefined &amp; and &lt; entities provided in section 4.6, Predefined Entities, of the XML Recommendation may be confusing at first sight because the leading ampersand in each numeric character reference is itself escaped as a complete numeric character reference. [shows how <!ENTITY my-amp "&#38;#38;"> will eventually yield strings like "AT&T" (internally) in an application after reparsing...]

"Neil Bradley has been working with generic markup applications for over ten years; his offering, The XML Companion, benefits accordingly. His treatment covers the same range of issues as other overviews, but the text itself is refreshingly free of statements of unanchored principle (what XML 'should' be) and
prognostication, instead presenting the actual state of things and concentrating on
what is known by markup practitioners to work. Likewise, he is much more accurate
and forthright than many other general references in indicating which technologies
are stable (for example, the DTD syntax of XML 1.0 is not subject to
change and will not suddenly be replaced by 'XML-Data', even while a new
schema language is in the works) and which are soft or still under development
(like XSL). He is also more consistently successful in exposing core ideas, rather
than depending on examples (plucked from wherever) to be self-explanatory. . ."

"XML: The Annotated Specification is the shortest and most manageable of the books under review, and the quality of information in it is good; its scope is also narrower. Unlike the other books, it is not a general reference; Bob DuCharme concentrates exclusively on the syntax of XML languages (both instance and DTD syntaxes) as defined in the February 1998 W3C Recommendation (which appears in the book verbatim, intermixed with commentary). DuCharme, while not a member of the committee that wrote the specification itself, was party to discussions about its design when it was in progress, and is thus in a good position to present an interpretation without compromising the specification's 'actual meaning'. This book will be of greatest interest and most benefit, naturally, to the technical user who has a reason to be concerned with details of the standard itself, rather than with one or another implementation or application of it. . ."

"XML In Plain English is a digest of information from available specifications presented in directory form, so that one could, for example, look up 'children' in the XML Syntax section and find out how the XML Specification uses the term. Included are sections on XML Syntax (information derived from the February 1998 XML Specification), XLink and XPointer (1998 Working Drafts), Cascading Style Sheets (CSS1 and CSS2), the DSSSL-O subset of DSSSL (August 1998), Appendixes on Unicode and XML Editors and Utilities, and a Glossary. . ."

"Pitts-Moultis and Kirk's XML Black Book, billed as a 'comprehensive reference', tries to cover the full range of XML-related issues. It contains six parts, variously approaching high- and low-level problems of document modeling, system design and implementation, style sheet technologies, application development and so on. Within these parts the chapters, with titles like 'Implementing XML in a Corporate Environment' or 'Creating Content in XML', each contain an 'In Depth' and an 'Immediate Solutions' section. . ."