W. F. Hammond: The GELLMU Archive

A Brief Introduction to Regular GELLMU

William F. Hammond
Department of Mathematics & Statistics
The University at Albany
Albany, NY 12222, USA

September 2000
Last revision:
(still needs revision as well as a better proofreading)

Note: Since this document was written, there have been some enhancements to the system it describes. For more information see the GELLMU Manual (http://www.albany.edu/~hammond/gellmu/glman/glman.html).

Another MarkUp Language?

The purpose of regular GELLMU is to provide a markup language with provision for mathematics under a system design that is structured enough to admit rigorous, modularly-staged, automatic processing into many other formats while at the same time being a comfortable input format for those accustomed to using LaTeX. (GELLMU stands for Generalized Extensible LaTeX-Like MarkUp.) There is not a set list of translation target formats. Examples of possible translation targets include:

- HTML
- HTML as enhanced by MathML
- Texinfo
- ME (a roff package)
- text/plain (for optimal de-TeX-ing)
- audio stream
- search engine
- index engine

GELLMU is not LaTeX, although its style of markup resembles that of LaTeX.

An example of how there is provision for "enrichment of information" lies in the markup used for a quoted phrase in the source for this document (http://www.albany.edu/~hammond/gellmu/igl/iglm.glm). (The base for relative URI references given here may be found at the end of the printed form of this document.) Whereas the usual LaTeX way of writing the phrase in the last sentence would be

    ``enrichment of information'',

and that markup would be meaningful in regular GELLMU, the markup

    \quophrase{enrichment of information}

can be detected more easily by a processor looking for quophrase contents, while the separate double-quoting indicators may or may not be adequate for such a processor seeking to glean similar information. Furthermore, the effort by a processor to glean such information might be spoiled by the occurrence of one or more unbalanced double-quoting indicators. Possibly even more useful ways of marking the same phrase that still would yield the same printed appearance, depending on the overall context, might be:

    \jargon{enrichment of information}

or

    \topic{enrichment of information}.

An important characteristic of this markup is that the processing of a document always begins with a faithful translation to an instance under the umbrella of Standard Generalized Markup Language (SGML). It is widely known in the text processing community (though not widely observed in the mathematical community) that sufficiently structured markup languages under SGML can be robustly converted to many other formats. GELLMU has been designed to provide a bridge from LaTeX-like markup to the world of SGML.

The translation from source markup to SGML markup is performed by my program called the GELLMU Syntactic Translator. Once that translation is performed the author's document exists in the world of SGML, and in that world many programs, almost all of which are very highly configurable, are available for automatic processing.

Beyond the syntactic translation from source markup to SGML, the first release of materials from this project will contain a didactic "article" SGML document type, and some of the statements herein are, for illustration, non-explicit allusions to the article document type, which is just one example of an SGML document type in the advanced GELLMU category.

Realistically today, SGML is a document standard for in-house work, and its subset XML ("subcategory" might be a better term here than "subset") has become the only shareable form of SGML. Major SGML document types such as DocBook and TEI have canonical translations to XML. Still one does not abandon SGML in an authoring environment, because some of the features of full SGML provide substantial convenience for authors.

What added convenience there is in using LaTeX-like markup as a writing interface for XML is greatly aided by taking advantage of a few of the features in full SGML that are not available in XML. On the other hand, basic GELLMU is available for the direct preparation of an XML document as the direct output of the syntactic translator.

How did GELLMU arise?

Aside from markup languages under SGML, another example of a markup language that is structured enough to admit robust conversion is Texinfo, which is the markup language of the GNU Documentation System (http://www.gnu.org/). The "Info" side of Texinfo is the Info hypertext self-documentation system within the popular fully programmable editing system GNU Emacs. Documents authored in Texinfo can be automatically converted into both Info (for viewing in a GNU Emacs buffer) and TeX (for printing).

The Info system is a hypertext system that predates the World Wide Web (WWW) and the latter's default presentation markup language known as HTML, which falls under the SGML regime.
One should, therefore, not be surprised to learn that with the dawn of HTML Lionel Cons of CERN (the birthplace of the World Wide Web) furnished the world with a Perl program for converting a Texinfo document into HTML with (internal) hypertext references.

Although Texinfo is a TeX-like markup language, it makes wide use of the character "@" as command-introducer where in TeX one would (normally) use "\". It seems that many LaTeX authors find this superficial difference uncomfortable. Ulrik Vieth of the TeX Users Group (TUG) (http://www.tug.org/) wrote an Emacs Lisp program that served to convert into Texinfo a LaTeX file written in January 1998 as the draft report (ftp://ctan.tug.org/tex-archive/tds/draft-standard/tds-0.9995.tar.gz) of the TUG Working Group on a directory structure for TeX systems.

This example, which was an ad hoc conversion, played an important role in my arrival at this design for a bridge from LaTeX to SGML.

Translation of GELLMU Markup

GELLMU is essentially, but for appearance, a markup language under SGML. This means that after a document is translated by the syntactic translator into the corresponding markup language falling formally under SGML, any general-purpose SGML processing tool can be used.

The project's prototype production system is built around "Regular GELLMU", which uses features for enhanced authoring convenience in the syntactic translator that require some assumptions about the SGML document type under which the author is writing. "Basic GELLMU", on the other hand, may be used as a simple LaTeX-like markup interface with a large number of document types, including XML document types.

Two SGML processing tools that are easily available form the core of the prototype production system:

- James Clark's nsgmls, an SGML parser that is part of his SP library package (ftp://ftp.jclark.com/pub/sp/).
- David Megginson's Perl program sgmlspl that is an interface for his Perl library SGMLSPM. It may be found at CPAN (http://www.cpan.org/modules/by-authors/David_Megginson/) or at his office (http://www.megginson.com/).

sgmlspl is a programmer-friendly interface for writing simple or complex event-driven translators (to other formats) that operate on nsgmls output.

In the prototype production system a document under the didactic article SGML document type made with the GELLMU Syntactic Translator from LaTeX-like source markup is parsed by nsgmls and then passed through a simple sgmlspl translation to yield an instance under the corresponding XML document type.

In the prototype production system the XML version of an article may be translated either to HTML or to LaTeX. Other translations are possible. The two existing translations are done with sgmlspl.

But a number of other freely available translation tools may be considered for the XML version of a GELLMU article, including:

- James Clark's jade, an engine for DSSSL with a wide variety of possible translation targets.
- James Clark's xt, an XSLT engine also with a wide variety of possible translation targets.
- David Carlisle's xmltex, a package for TeX, the Program, that enables its user to write what amounts to a style sheet in TeX, rather than, say, XSL, for formatting (with its own parse) an instance of an XML document type in TeX (actually DVI, since one does not get a chance to see a TeX source image).

Advanced GELLMU vs. Basic GELLMU

This document is about regular GELLMU. It is important first to know a bit about "basic" mode and "advanced" mode.
What is called "regular" in this document is an instance of advanced mode. The term "regular" is being used here to reference the didactic article document type, which is an SGML document type that admits the more elaborate and succinct handling of advanced mode while not being in any way incompatible with basic mode.

Neither the basic nor the advanced mode involves in any way adoption of the language of LaTeX. (But many command names under the didactic document type mimic LaTeX command names.)

To use either basic or advanced mode one must be familiar with the SGML document type for which one is writing. Ordinary HTML is an example where the use of basic mode is indicated. Only a very few of the features in advanced markup that are not part of basic markup make any sense for use in the direct preparation of HTML with LaTeX-like input. (Note: With several minor exceptions, one related to the direct writing of SGML attributes, which cannot contain markup and which do not have many parallels in LaTeX, and another related to the way of escaping the character "\", everything about basic mode also applies to advanced mode.)

Basic GELLMU uses the special characters "\", "{", and "}" along with LaTeX-like argument/option syntax, where braces immediately following a command name indicate command arguments and square brackets, i.e., "[" and "]", indicate command options. A command corresponds to an SGML element, and in basic mode a command may have at most one argument, the content of which corresponds to SGML element content, and at most one option, the content of which corresponds to a list of SGML attribute specifications. Thus for example, in basic mode for HTML one may use the markup

    \a[href="http://www.w3.org/"]{The World Wide Web Consortium}

to form the HTML anchor:

    <a href="http://www.w3.org/">The World Wide Web Consortium</a>

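The basic-mode correspondence just described (command name to element name, option to attribute list, argument to element content) can be illustrated with a few lines of Python. This is only a minimal sketch and not the GELLMU Syntactic Translator itself, which is an Emacs Lisp program: the names BASIC_COMMAND and basic_to_tagged are hypothetical, nested braces are not handled, and escapes, comments, and the ";" and ":" terminators are ignored.

```python
import re

# Hypothetical pattern for one basic-mode command: \name, an optional
# [option], then exactly one {argument}. Nested braces are not handled.
BASIC_COMMAND = re.compile(
    r"\\([A-Za-z][A-Za-z0-9]*)"   # command name -> element name
    r"(?:\[([^\]]*)\])?"          # at most one option -> attribute specifications
    r"\{([^{}]*)\}"               # at most one argument -> element content
)

def basic_to_tagged(source):
    """Rewrite each basic-mode command in source as a tagged element."""
    def replace(match):
        name, attributes, content = match.groups()
        attribute_text = f" {attributes}" if attributes else ""
        return f"<{name}{attribute_text}>{content}</{name}>"
    return BASIC_COMMAND.sub(replace, source)

print(basic_to_tagged(r'\a[href="http://www.w3.org/"]{The World Wide Web Consortium}'))
```

Under these assumptions the call reproduces the anchor shown above. A regular expression suffices here only because basic mode allows at most one option and one argument per command; anything nested would call for a real parser.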
(The formation of anchors with the didactic document type in advanced mode is slightly more complicated because characters such as "=", which may acquire special (and overloaded) semantic significance in mathematical contexts, are held for delayed evaluation as empty elements and because the syntactic translator, which does not recognize command names, regards this usage in advanced mode as multiple argument/option syntax, which is not part of basic mode.)

An example of the distinction between basic and advanced GELLMU is that in advanced mode it is possible and easy to arrange to have a blank line, as in LaTeX, represent the beginning of a paragraph. In basic mode for classical HTML one must use \p to begin a paragraph, and for the XML version of HTML one must also provide markup for the end of every paragraph, which may be done in several ways. (Note: There is a way, with the setting of several variables for the syntactic translator, to have blank lines begin new paragraphs in basic input for HTML.)

For some of the details on using the basic markup with HTML see "Using the GELLMU Syntactic Translator to Write HTML" (http://www.albany.edu/~hammond/gellmu/ghtml.html).

Some Fundamentals on Regular GELLMU

The Markup Commands

There are several kinds of commands:

- A maximal string introduced by "\" followed by a letter and otherwise consisting only of letters and numbers. Command names are case sensitive. Such a command is a container, corresponding to an SGML element, if it is immediately followed, without intervening white space, by the character "{". In that case the delimited zone of containment normally ends with the subsequent balancing character "}". (LaTeX-like multiple argument/option chains deserve more discussion; for now it will suffice to point out that the use of the \anch command in this document for making anchors is an example, and, of course, LaTeX's \frac command is another example. For the present discussion these commands are considered to be containers.) Such a command corresponds to an SGML defined-empty element if it is followed immediately without intervening white space by the character ";". In any other case there is syntactical ambiguity. However, the syntactic translator will produce a corresponding SGML open tag unless the logical variable gellmu-xml-strict has been set. (This variable is normally not set for advanced GELLMU.)

  Note on letters: For the first alpha release "letter" means something matching the regular expression [A-Za-z]. This will not be a limitation of the syntactic translator after the alpha releases. I believe that GNU Emacs, from version 20, is fully capable of meeting the needs of GELLMU for supporting international character sets. At present, it is possible to use Emacs-word constituent characters in names defined with \newcommand, and there is no reason why the definition strings cannot involve standard SGML entity notation. It is also possible to define symbolic SGML entities in the SGML internal declaration subset, which is the content of an option (at most one) that follows the unique argument to the meta-command \documenttype. Additionally, the SGML version of the didactic article document type provides commands for the document preamble that may be used to construct an internal declaration subset for the XML version of the document. (Of course, someone using this markup to edit directly for a document type under XML would need to use the \documenttype option provided for the SGML internal declaration subset.)

  Note on numbers: The first number, if any, must not be 0 since such names are reserved for use by the syntactic translator.

- Certain single characters, among them "\", "{", "}", "~", "^", "_", "&", "$", and "%", have command meanings that are similar to their meanings in LaTeX. The characters ";" and ":" are ordinary characters that have special meaning at the end of a command name. The character "#" is also a special character used, as with LaTeX, in \newcommand templates. In source for the didactic article document type any non-alphanumeric ASCII character may be escaped (referenced for itself) with a named command, e.g. "%" may be referenced for itself as \pct;. This is necessary in order to provide delayed evaluation for ultimate translation to one of many possible ultimate formats.

The following language meanings apply to both basic and regular
markup:

- "\": Command introducer.
  Escape in basic mode: "\\". This escape is incorrect in regular mode. For the didactic article document type the escape is "\bsl;". For other document types one may resort to an entity reference if there is an aversion to the provision of a corresponding defined-empty element or if one lacks control of the document type.
- "{": Command argument opener.
  Escape: "\{"
- "}": Command argument closer.
  Escape: "\}"
- "[": Command option opener.
- "]": Command option closer.
- ";": Command terminator for commands corresponding to defined-empty SGML elements; otherwise an ordinary character. Its use as a command terminator is invisible and may be omitted optionally in some contexts. This syntax is analogous to the use of ";" as an entity reference terminator in HTML.
- ":": Special purpose command terminator used for indicating the close of a command-zone; otherwise an ordinary character. Its use as a command terminator is invisible.
- "%": Comment introducer (in force to end of line).

Any command is terminated by a non-alphanumeric character. There is syntactic ambiguity unless the terminator is one of "{", "[", ";", or ":". This kind of syntactic ambiguity is not permitted in the direct editing for an XML document type with GELLMU input. The terminator can be a blank space, but, if so, the blank space becomes invisible.

The following language meanings apply to regular markup with allusion to the didactic article document type:

- "~": Non-breaking interword space.
  Equivalent: \nbs;, cf. &nbsp; in HTML
- "^": Superscript command.
  Equivalent: \sup
- "_": Subscript command.
  Equivalent: \sub
- "&": Dual use: tabular cells and entity introducer. "&" introduces an entity reference if it is followed by anything other than white space. It is used, as in LaTeX, as a tabular cell delineator when it is followed by white space.
- "$": Toggle inline math mode.
  Equivalent: "\tmath{ . . . }"
  Nearly equivalent: "\( . . . \)" or "\math{ . . . }".
  Note: It is possible to merge the inline math and tmath zones at any level of processing beyond the syntactic translator. (These are indeed the same in LaTeX.) The syntactic translator resists the temptation here to go beyond syntax, and with the didactic article document type the formatting to LaTeX inserts the markup "\,", for a small horizontal space, before and after math, but not before and after tmath.

Certain strings of plain text in regular GELLMU that are part of legacy practice under LaTeX:

- "--": A short dash as used with a range of numbers, e.g., 1--2.
  Equivalent: \rdash
- "---": A long dash as used for punctuation, e.g., a dash --- like this.
  Equivalent: \pdash
- "\ " (backslash followed by a blank): Blank interword space.
  Equivalent: \spc;
- "\,": Small horizontal space.
  Equivalent: \hsp;
- "\\": May be used at the end of a line of input for a forced line break. In a tabular environment (with the didactic article document type, as in LaTeX) it begins a new tabular row. Any other use is deprecated, and will result in translation to the defined-empty element bsl corresponding to the ASCII character "\" with a warning from the syntactic translator.
  Note: The use of tabular is deprecated and is not supported beyond "lrc". Use table, which is not similar to LaTeX's table, instead of tabular.
  Equivalents: \brk; for a line break outside of a tabular environment.
  Note: The syntactic translator simply outputs the SGML defined-empty element brk0, which belongs to its reserved name space. The dual use of brk0 involves some SGML chicanery that is resolved during translation to the XML version of the article document type, where tabular is converted to table and non-tabular use of brk0 is converted to brk.
- A blank line: Begin new paragraph command.
  Equivalent: \parb. Nearly equivalent: \par
- "``": Left (double) quotation mark.
  Equivalent: \ldq
- "''": Right (double) quotation mark.
  Equivalent: \rdq
- "\(": Begin-math-mode command.
  Equivalent: \begin{math}
- "\)": End-math-mode command.
  Equivalent: \end{math}
- "\[": Begin-displaymath-mode command.
  Equivalent: \begin{displaymath}
- "\]": End-displaymath-mode command.
  Equivalent: \end{displaymath}
- "\begin{name} . . . \end{name}": This usage is equivalent to "\name . . . \name:", which, in turn, is equivalent to "\name{ . . . }".

Multiple Argument/Option Syntax

An essential point in the present design is that the whole system is built from components, each of which has its own function. In the prototype production system based on the didactic article document type the output from each stage is available for examination and, where necessary, intervention.
However, such use of intervention is intended only for temporary expedient use while a GELLMU system is being designed or enhanced. As with LaTeX, enhancement is an ongoing process. Consistent with this design the syntactic translator operates with knowledge of syntax but little or no knowledge of language.

Multiple argument/option syntax has been built into advanced mode as part of the overall idea of providing, where sensible, LaTeX-like features in a precise user markup interface for writing in document types under SGML and XML.

What are the rules for converting the multiple argument/option syntax in source markup into SGML? Direct conversion by the syntactic translator of this type of usage into XML is not available because such conversion requires some language knowledge and the program does not operate with knowledge of language at that level. One obtains an XML version of a document in the prototype production system by using a translator with minimal knowledge of the command vocabulary to create the XML version from an SGML version that is the immediate output of the syntactic translator.

In multiple argument/option syntax, which is much like that of LaTeX, arguments and options follow command names. Arguments are delimited by braces, i.e., "{" and "}", and options by square brackets, i.e., "[" and "]". There must be no white space between the arguments and options nor between the command name and the first member of an argument/option sequence.

Each command with a multiple argument/option sequence is translated to an open tag whose name is the name of the command. Each argument is translated to an ag0 element and each option to an op0 element lying in GELLMU's reserved name space. There are two exceptional cases:

1. The first argument or option is an option inside which the very first character is a colon, i.e., ":". This is the method provided in advanced mode for the direct entry of an SGML attribute sequence. The entire contents of the option string, apart from the leading ":", which is discarded, are understood to be a sequence of SGML attributes for the SGML element whose name is the name of the command. There is no syntax check of the attribute contents by the syntactic translator. Such an attribute option is not treated as an op0 element. In particular, an attribute option is correctly followed immediately by a semicolon, i.e., the character ";", if and only if the corresponding SGML element is a defined-empty element under the SGML document type. Since SGML attributes correspond to very little of classical LaTeX, attribute options may be entirely ignored.

2. The first argument is the only argument and there are no options apart from a possible attribute option. This case, which is extremely common, is exceptional relative to argument/option handling since the sole argument simply becomes element content without an ag0 wrapper.

Mathematics

The Greek letter Γ is marked up, as in LaTeX, with \Gamma.

The didactic article document type provides mathematical markup that is similar to that of LaTeX.
For example, one may use the markup

    \[ 2^{p-1} \equiv 1 \pmod{p^2}\eos \]

to speak of the congruence

    2^{p-1} \equiv 1 \pmod{p^2}.

In this example the markup \eos is a formal end-of-sentence. (Note: Regular GELLMU recognizes "." followed by two blank spaces or by a newline as end-of-sentence markup when these occur outside of mathematical contexts.)

In a math environment, input text "abc" is set the same as the input text "a b c" following the presumption that each glyph is a separate symbol. At one point I had planned to provide for a distinction between these two forms of input for article. However, I am concerned about author confusion; so there will be provision for having "abc" be a math symbol, but an instance will be invoked as \abc.

In math mode all symbols are assumed to be single characters as in LaTeX. A \mathsym command (part of language design that, for the moment, has no extant implementation) in the document preamble may be used to specify that a string is to be regarded as a single mathematical entity whenever it occurs in math mode. This contrasts with LaTeX in that there is no formal declaration of math symbols.

\mathsym will be a variant of \newcommand, which is a meta-command that is handled entirely by the syntactic translator with no trace in the output. However, the design of \mathsym, which will also be a meta-command but which will leave marks in the output, is not yet decided. The design issues surround how to provide the author who is its user with flexible open means of planting semantic information in the output without undue verbosity. The syntax might be simply

    \mathsym{symbol-name}{presentation-definition}[semantic-information-without-markup]

Omitting the option could make author usage equivalent to that of \newcommand, but still trace information planted in the output of the syntactic translator could be sufficient that in the future a derived MathML object or a derived XML formatted object produced by translation and viewed in Mozilla (http://www.mozilla.org/) might enable the reader to ascertain the symbol-name and then launch a search for other occurrences. The option would serve as a mechanism for the author to pass semantic enrichment information to downstream formatters.

Examples

The following table refers to the didactic article document type. It demonstrates how source is translated by the GELLMU Syntactic Translator to the SGML version of article and from there is translated using sgmlspl to the XML version of article.

    Source   \emph{this}
    SGML     <emph>this</emph>
    XML      <emph>this</emph>

    Source   \tex;
    SGML     <tex>
    XML      <tex/>

    Source   $\frac{2}{3}$
    SGML     <tmath><frac><ag0>2</ag0><ag0>3</ag0></tmath>
    XML      <tmath><frac><numr>2</numr><denm>3</denm></frac></tmath>

    Source   \anch[href="nil"]{Null}\anch:
    SGML     <anch><op0>href<eqs>"nil"</op0><ag0>Null</ag0></anch>
    XML      <anch><anchref>href="nil"</anchref><anchv>Null</anchv></anch>
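The conversion rules for multiple argument/option syntax, including the two exceptional cases, can be sketched in Python as follows. This is an illustration under simplifying assumptions rather than the actual syntactic translator: the names COMMAND, GROUP, translate_command, and translate are hypothetical, arguments and options are assumed not to nest, the "\name:" zone closer is not handled, and characters such as "=" are left as they are instead of being held for delayed evaluation as empty elements (so the option in the anch example keeps a literal "=").

```python
import re

# A command name followed, with no white space, by a run of
# {argument} and [option] groups. Assumption: groups do not nest.
COMMAND = re.compile(r"\\([A-Za-z][A-Za-z0-9]*)((?:\{[^{}]*\}|\[[^\[\]]*\])+)")
GROUP = re.compile(r"\{([^{}]*)\}|\[([^\[\]]*)\]")

def translate_command(match):
    name, groups = match.group(1), match.group(2)
    parts = [("arg", g.group(1)) if g.group(1) is not None else ("opt", g.group(2))
             for g in GROUP.finditer(groups)]
    attributes = ""
    # Exceptional case 1: a leading option that starts with ":" is an
    # SGML attribute sequence, not an op0 element.
    if parts and parts[0][0] == "opt" and parts[0][1].startswith(":"):
        attributes = " " + parts[0][1][1:].strip()
        parts = parts[1:]
    open_tag = f"<{name}{attributes}>"
    # Exceptional case 2: a sole argument becomes plain element content.
    if len(parts) == 1 and parts[0][0] == "arg":
        return f"{open_tag}{parts[0][1]}</{name}>"
    # General case: an open tag only, then ag0/op0 wrappers in source order.
    body = "".join(f"<ag0>{text}</ag0>" if kind == "arg" else f"<op0>{text}</op0>"
                   for kind, text in parts)
    return open_tag + body

def translate(source):
    return COMMAND.sub(translate_command, source)

for sample in (r"\emph{this}", r"\frac{2}{3}", r'\anch[href="nil"]{Null}'):
    print(translate(sample))
```

In the general case only an open tag is emitted, following the rule stated earlier; in the SGML column of the table the end of frac is left to the SGML parser to infer, while the end tag of anch comes from the explicit "\anch:" in the source.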

One may find many other examples of GELLMU markup in the project archive (http://www.albany.edu/~hammond/gellmu/).

Notes

Although the meta-command \newcommand, which is handled internally by the syntactic translator, is now available, the variant meta-command \mathsym is not yet available.

This document is still at draft stage. It is itself a GELLMU document. The following versions are available:

- source: http://www.albany.edu/~hammond/gellmu/igl/iglm.glm
- SGML: http://www.albany.edu/~hammond/gellmu/igl/iglm.sgml
- XML: http://www.albany.edu/~hammond/gellmu/igl/iglm.xml
- XHTML with MathML: http://www.albany.edu/~hammond/gellmu/igl/iglm.xhtml
- classic HTML: http://www.albany.edu/~hammond/gellmu/igl/iglm.html
- LaTeX: http://www.albany.edu/~hammond/gellmu/igl/iglm.ltx
- DVI: http://www.albany.edu/~hammond/gellmu/igl/iglm.dvi