******************************************************
ACTIVATING CHARACTERS FOR INTERNATIONAL USE
******************************************************
--- by Laurent Siebenmann
Foreword
The following note was written in a moment of enthusiasm
in June 1990, when it appeared politically possible to get
a quick update of AmSTeX and LamSTeX to alleviate
difficulties obstructing the activation of ;:! and ?
for the needs of French language users. These hopes were
disappointed, but the problem has not gone away and I hope that
by publishing in TeXMag, I will stir up some reactions that
will in one way or another catalyse a solution. TeXhax would be
a good medium for discussion; it would be helpful to hear in
particular about the needs of other languages and of other formats.
Abstract
Over the past six months, in correspondence with Mike
Spivak and Bernard Gaulle, I have been trying to sort out the
problems posed by European typography for punctuation and accents
under AmSTeX and LamSTeX (the two formats created by Spivak).
This has involved a multiplicity of problems caused by changing the
category of characters from 12 (=other) to 13 (=active) and back.
Through the hurly-burly of individual problems I now perceive a
reasonable partition of responsibilities between format designers
(such as Spivak) and national user groups (such as Gutenberg, the
French group presided over by Gaulle). Timely coordination among all
format designers could greatly simplify the elaboration of
national style files --- by providing a few standard low-level
macros that facilitate category change.
§1. Problems posed by activation.
How do the problems with active characters arise? We can
illustrate this simply by focusing on the semicolon; the story for
the three other marks :!? (and perhaps some other characters) is
quite similar.
The semicolon in French prose typography requires more
space before it than in English. (How much, and of what sort, is a
matter for French typographers to decide.)
There is a well-known TeX mechanism that allows this: one
assigns the semicolon category 13 (=active), for example by a
command \catcode`\;=13. The active semicolon has the
`intelligent' behavior of a TeX macro, whereas with the original
category 12 (=other) it is a `dumb character'. Thus the French
typographer can issue a command
\def;{}
to modify the behavior of the semicolon. When one leaves French
and enters another language one can either change the macro or
revert to category 12 (=other).%
% Remark: There are two other possible solutions that
should be mentioned.
a) Since the early days of TeX, many French typists have been
trained to type a tilde before the semicolon in prose, since that
provides an unbreakable space (under essentially all formats).
Now, the tilde is an active character under essentially all
formats, and its expansion can be altered to provide exactly the
desired space in case the following character is a semicolon.
This solution is typographically sound and does not require
category change of any character at all; there are TeXperts who
would consider the matter settled! However, this solution is less
convenient because the tilde is *required*; typists who encounter
simpler typing of punctuation (without a tilde, as in English or
with a space instead of a tilde) are disconcerted. In everyday
matters such as punctuation, TeX owes the typist the most
ergonomic solution; the recommended solution above would normally
behave well whether or not the typist explicitly indicates space
before the semicolon.
b) The generalised kerns and ligatures of TeX 3.0 may offer a
solution since the concepts are very powerful. However, one
wants a portable solution, so I would not recommend using a
special collection of virtual fonts for the job. Further, if one
wants some stretch in the space preceding the semicolon, then I
fail to see how these new features will help. (The stretch is
definitely there for the colon as used in Le Monde.) Nevertheless
this approach should be kept in mind.
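Remedy a) can be sketched in Plain TeX as follows. This is a minimal sketch and not a tested French style. It uses \futurelet to look at the token after the tilde. The names \plaintie and \tiecheck are hypothetical, and \thinspace only stands in for whatever space French typographers choose.

```tex
% Plain TeX sketch; \plaintie and \tiecheck are hypothetical names
% and \thinspace stands in for the French space.
\let\plaintie=~ % keep Plain's meaning of the active ~
\def~{\futurelet\next\tiecheck}
\def\tiecheck{\ifx\next;% is the next token a semicolon?
    \nobreak\thinspace % then ~; gives an unbreakable thin space
  \else
    \plaintie % any other ~ keeps Plain's tie
  \fi}
```

No category code changes anywhere in this sketch, which is the attraction of a). The drawback noted above also remains: a semicolon typed without the tilde gets no extra space.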
In the beginning there was Plain TeX. Under Plain the
activation approach works in a perfectly straightforward way. And
there is not the slightest need to alter the situation.
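The Plain TeX case can be made concrete with a minimal sketch. It assumes that a thin unbreakable space is the desired French spacing; that is an assumption, and the real choice belongs to typographers. The switch names \frenchsemicolon and \englishsemicolon are hypothetical. The meaning of the active semicolon is given once, globally, inside a group where ; is temporarily active. The switches then only change the category code.

```tex
% Plain TeX sketch; \thinspace is an assumed stand-in for the
% French space, and the two switch names are hypothetical.
{\catcode`\;=\active % ; is active only inside this group
 \gdef;{\ifhmode\unskip\nobreak\thinspace\fi\string;}}
% \unskip drops a typed space; \string; prints a category-12 ;
\def\frenchsemicolon{\catcode`\;=\active}
\def\englishsemicolon{\catcode`\;=12 }
```

A typist would write \frenchsemicolon before French text and \englishsemicolon after it. Because \unskip absorbs a space typed before the semicolon, "mot ;" and "mot;" print alike.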
Under AmSTeX and LamSTeX, as they officially stand today,
change of category of the semicolon causes problems. I always
presume that a standard (unadulterated) format is used, built upon
standard Plain TeX. A criterion of portability for .tex files
dictates this; see §3 below. My contention will
ultimately be that these and other formats should be coherently
revised to be essentially as flexible as Plain TeX; see §2. There
are basically two problems:
1) The semicolon appears explicitly in many macro
expansions of AmSTeX and LamSTeX, and these are semicolons of
category 12. Recall that TeX permanently assigns a category code
among 0,...,15 to each character as it is being read in (see
The TeXbook, Chapter 8). Some of these semicolons should remain in
category 12, for example those in error messages. Others should
switch to category 13 for as long as the user makes the semicolon
active, for example semicolons to be printed as such.
This first problem is a clear sign that their author
was blissfully unaware of European needs while writing these
formats!
Fortunately I have a very simple revision of the formats
that provides a remedy. One can define a private macro
\semicolon@ and put it in place of the semicolons that should
change category. Initially, one gives a definition
\let\semicolon@=; (while the semicolon has category 12). Just
after the user switches the semicolon to category 13, he should
directly or indirectly reiterate the command \let\semicolon@=; so
that \semicolon@ now stands for the category-13 semicolon, and
conversely when the semicolon reverts to category 12. Recall that, internally,
characters are tagged by their category code so that in a very
real sense category 13 and category 12 semicolons are distinct
characters that exist simultaneously.
2) The semicolon appears explicitly in the syntax
surrounding some macros of LamSTeX. For example, in LamSTeX,
\cgaps{3;2;2.5} sets the first three column gaps in a commutative
diagram in terms of a standard gap. Under the original standard
LamSTeX, the semicolon really means the category 12 semicolon.
Such macros will fail when the semicolon is assigned category 13.
In such cases I again propose a change of the format. At
the moment when the semicolon gets category 13 we, roughly
speaking, reiterate the relevant definitions involving the
semicolon in their syntax. For efficiency, a number of macro definitions are
rearranged so that those explicitly involving the semicolon in
their syntax are extremely short.
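The \semicolon@ device of problem 1) can be sketched in a few lines of a format's source. Only \semicolon@ comes from the proposal. The names \makesemicolonactive@, \makesemicolonother@ and \pair@ are hypothetical. One point needs care: \let copies the *current* meaning of the active semicolon. The switch must therefore run after the user's \def;, and again whenever that definition changes.

```tex
% Part of a hypothetical format source; only \semicolon@ is from the text.
\catcode`\@=11 % allow @ in control-sequence names
\let\semicolon@=; % initially the category-12 semicolon
% A format macro that prints a semicolon uses \semicolon@, not ;
\def\pair@#1#2{#1\semicolon@\ #2}
% Switches: \let copies the current meaning of ;, so call them
% after the active semicolon has been (re)defined.
{\catcode`\;=\active
 \gdef\makesemicolonactive@{\catcode`\;=\active \let\semicolon@=;}}
\def\makesemicolonother@{\catcode`\;=12 \let\semicolon@=;}
\catcode`\@=12 % @ back to its usual category
```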
This (incomplete) discussion for the semicolon is
illustrative for all four punctuation marks (;:!?). For a more
thorough discussion, see the technical report []. See also
[GPAMS]: Germans often make the " (double-quote) character active
and use it to provide the umlaut accent; this involves a special
little problem since the " character is used by TeX itself for
indicating hexadecimal numbers.
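The double-quote difficulty can be sketched in Plain TeX. Once " is active, a hexadecimal constant such as \char"7B hands 7 to the active " as its argument. One possible workaround is a macro, here called \hexprefix (a hypothetical name), that expands to a category-12 double quote. This is not necessarily the solution adopted in [GPAMS]. It has to be a macro rather than a \let: TeX expands macros while scanning a number, but it recognises only an explicit " as the hexadecimal prefix. The umlaut definition is a simplified stand-in for a real German style.

```tex
% Plain TeX sketch; \hexprefix is a hypothetical name.
\def\hexprefix{"} % defined while " still has category 12
% Simplified stand-in for a German active ": "a gives \"a (a-umlaut).
{\catcode`\"=\active
 \gdef"#1{\"#1}}
% With " active, write \char\hexprefix 7B instead of \char"7B.
```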
I should add some comments concerning auxiliary macro
files; they pose the same sort of difficulties. Those macro files
that are official adjuncts of a major format will hopefully be
carefully revised by the wizards in charge of the format.
When it comes to the innumerable unsupported style files,
the casual TeXpert may find himself suddenly responsible for
adapting them; in that case, there are a couple of nuances:
(a) It may be necessary to quickly hack together a revised
version of a style file.
(b) It may be desirable to produce an auxiliary file rather than
alter the style file.
Here is some advice. To begin, look carefully through the macro
file for any definition involving a character whose category code
you have to change. When the category change causes trouble at
this point, an almost universal remedy is to *restate* the
definition in an appropriate category-code environment. This may
cost a great deal of space, but it is usually quickly
arranged. Further, the restatements can be put in an auxiliary
file. Punctuation to be printed on the screen can be preceded by
\string, which in effect forces category 12.
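Both remedies can be sketched on a hypothetical style-file macro \pair whose parameter text is delimited by a semicolon. The restatement \pairactive is the same definition, read while ; is active. Whoever activates the semicolon would also execute \let\pair=\pairactive. The help message uses \string, so its semicolon prints correctly whatever category ; had when the file was read. All names here are illustrative.

```tex
% A macro from a hypothetical style file, read while ; has category 12;
% its parameter text is delimited by that category-12 semicolon.
\def\pairother#1;#2\endpair{(#1, #2)}
% The same definition restated with an active ; as delimiter.
{\catcode`\;=\active
 \gdef\pairactive#1;#2\endpair{(#1, #2)}}
\let\pair=\pairother % switch to \pairactive whenever ; is made active
% \string; yields a category-12 ; however the file itself was read.
\def\pairhelp{\message{Usage: \string\pair\space a\string; b\string\endpair}}
```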
§2. An inter-format standard for category change?
One of the major strengths of elaborate formats is their
ability to shield the end user, and even the typographer, from the
intricate inner workings of TeX. What I have said up to this
point is still unsatisfactory inasmuch as some detailed knowledge
of the workings of AmSTeX and LamSTeX appears to be required.
This requirement can be eliminated by installing a high-level
interface to category change. I propose to define the following
standard public macros, referred to below as the macros (*), in the
next updates of AmSTeX and LamSTeX:
\semicolonactive \semicolonother
\colonactive \colonother
\exclamationmarkactive \exclamationmarkother
\questionmarkactive \questionmarkother
\Quoteactive \Quoteother
...(and some others? cf. §4)
--- whose job is to make the extra format-specific
modifications that are necessary when the category-code changes
envisaged above are made.
For reasons that may become clearer in §3, I hope that
each format will define these macros, or at least those
necessary in that format.
As the use of TeX evolves and spreads, more characters
may be added to this list. (The above list is based on current
French and German needs.)
Then for example, to adapt AmSTeX to the needs of
the Terpenty Coast Republic, where just the semicolon has to be
active, the national TeX users' group would sponsor a national
style file terp.sty whose contents might be roughly as follows.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% terp.sty (Terpenty Coast national style file)
%% July 1990 (alpha version)
%%
%% Hyphenation: supposes TeX version 3
\ifx\terpentine\undefined
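\csname newlanguage\endcsname\terpentine
% (\newlanguage is \outer in Plain TeX, so it may not appear
% literally inside a conditional that TeX might skip)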
{\language=\terpentine
\input terp.hyph
% file of form \patterns{...}\hyphenation{...}
% that establishes Terpentine hyphenation
% to be used when language is Terpentine
}
\fi
%% Punctuation:
\ifx\semicolonactive\undefined % true for Plain
\gdef\semicolonactive{\relax}\fi
\ifx\semicolonother\undefined % true for Plain
\gdef\semicolonother{\relax}\fi
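{\catcode`\;=13 % ; must be active while \terppunct is read,
                % so that \def; below defines the active ;
\gdef\terppunct{\catcode`\;=13
\def;{}% the Terpentine behavior of ; goes here
\semicolonactive}}
\def\nonterppunct{\catcode`\;=12
\semicolonother}
\def \terps{\language=\terpentine\terppunct}
\def \noterps{\language=0\nonterppunct}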
\endinput
If Terpentine hyphenation has already been installed (for
language number \terpentine), nothing happens in the hyphenation
part.
Otherwise there is an attempt to install it. Because
of the (implicit) use of the \patterns primitive, it will then
normally be necessary to process this file using INITEX, the
enriched initialisation version of the TeX program. At the **
prompt of INITEX, one could type:
**&amstex \input terp.sty \dump
This assumes amstex.fmt is an available precompiled format, and
it quickly produces a new precompiled format, Tpamstex.fmt, say.
Once terp.sty has been input, TeX will respond to the
command \terps by providing national hyphenation and punctuation
for Terpenty Coast nationals and to \noterps by returning to
Knuth's punctuation and English hyphenation. These two are the
only macros above that the everyday user in Terpenty will employ.
(Many members of the Terpentine TeX users group feel that
the national features offered by this file are woefully
incomplete and an ad hoc group is exploring numerous
elaborations.)
Note that this file contains no specific reference to
the format used. All that is required is that the macros
\semicolonactive and \semicolonother from my proposed list be
suitably defined, i.e. so as to prevent undesirable side-effects
when the semicolon switches category to 13 and back to 12, respectively.
For other nationalities, the national style file could, I
hope, be similar in structure (so far as punctuation is
concerned), and similarly independent of format.
Thus the above file will work as described with AmSTeX
and LamSTeX and hopefully with any other format if and when
the changes I propose are implemented.
As a stop-gap measure while awaiting that happy time, I
have provided [], for both AmSTeX and LamSTeX, suitable
*external* definitions of all the macros in the list (*). Such
external definitions would have to be input before the national
style file. While the revisions for AmSTeX and LamSTeX proposed
to define (*) cost negligible space (a couple of K octets in
all), these external definitions are ugly and bulky: in all about
6K for AmSTeX and 14K for LamSTeX. Retrofit is costly!
To give the flavor, here is the definition of
\semicolonactive as it is currently proposed for a revision of
LamSTeX.
{% group to localize category changes
\catcode`\@=11 % makes @ a letter (needed before \active@ is read)
\catcode`\;=\active@ % makes ; active
\gdef\semicolonactive{%
%definition is global but effect is local
\let\semicolon@=;% see problem (1) above
\let\ds@\ds@active % see problem (2) above
% ... plus five similar lets using in place of \ds@
% the following: \dtX, \dtY, \cgaps, \rgaps, \gaps@
} % end of the definition of \semicolonactive
} % this ends the group with special local catcodes
The new macro \ds@active is defined in revised LamSTeX by
\def\ds@active(#1,#2){\ds@@{#1}{#2}}
where \ds@@ is essentially a preexisting 1989 LamSTeX macro.
The definition of \semicolonother for (revised) LamSTeX
is entirely similar but `other' replaces `active'.
Since the semicolon is the most troublesome of ;:!?, this
gives a pretty honest overview of the whole solution, and
indicates that it is reasonably tidy.
My fondest hope is that the approach to the activation of
characters that I have sketched for AmSTeX and LamSTeX will prove
suitable for other formats and that a consensus among all formats
will be possible.
§3. Criteria for implementing national styles.
One main purpose of the interformat macros (*) we are
proposing is to facilitate the construction of national style
files.
There are almost too many ways to go about the task of
implementing a national style. But various natural criteria I
list below happily narrow down the choices, and make a consensus
more likely. Hopefully, my proposals above for related category
changes are in harmony with all of them.
I am indebted to Bernard Gaulle, president of GUTenberg,
the French users' group, for mentioning several criteria on the
basis of his experience in writing a provisional French style
file for LaTeX (see []).
1) Portability.
It is vital that one be able to send an article of a
given language anywhere in the world and have it printed at its
destination with all its national style features intact. At a
stroke this precludes national styles based on a revision of one
or more formats.
The first functional francisation of Plain TeX was based
on a modification of Plain TeX done in Strasbourg by Desarmenien
et al. in the mid-eighties. A marvel in its time, but
non-portable. Portability has been greatly assisted by the
multilingual hyphenation in version 3 of TeX (and of MLTeX before
it). The tricky bilingual hyphenation table of Desarmenien of
42K octets can now be replaced by a French hyphenation file of
about 5K (also by Desarmenien). Hopefully, French articles will
soon be able to move around the world with less than 10K of extra
baggage --- in the form of a hyphenation file plus a style
file.
Since, in many computer centers, standard formats are
most easily available as precompiled binary formats, it is
desirable that any national style file be loadable after the
standard format. This strongly influences design. So far as
possible, it should not matter when the style file is loaded.
The need to use INITEX, the unfamiliar initialisation
version of TeX, to input any file that uses \patterns is a
regrettable inconvenience that is sure to scare off many users
wishing to exploit national styles beyond their national
frontiers. Fortunately, some implementations of TeX assimilate
the functions of INITEX in TeX itself, for example, Textures on
the Macintosh.
2) A clear division of responsibilities between format designers
and implementors of national styles.
TeX is used for many languages, and TeX version 3 is
expected to be used for many more. It is unreasonable to expect
format designers to go on writing macros specific to national
groups. It is equally unreasonable to ask national groups to
cede such responsibilities to format designers. Instead, format
designers should provide the low-level tools --- for example
the macros (*) --- to allow local TeXperts to independently
implement their national styles. A couple of slogans are apt:
Ma\^{\i}tres chez nous! [Masters in our own house!] (Ren\'e L\'evesque, Parti Qu\'eb\'ecois)
Give us the tools and we'll do the job. (an Anglo-American)
3) Independence of format.
One national style file should apply to many (hopefully
all!) formats. It is difficult to decide where to draw a line in
elaborating a national style file. This criterion may give
strong hints. German.sty for LaTeX by H. Partl et al. is one
style that is notable for having applied to both LaTeX and
Plain; with my approach it could apply to AmSTeX and LamSTeX
too.
4) Mutual compatibility (two facets)
Language independence for national style file design.
There are some very international people and organizations, and
for their sake, it would be nice if, once you understand one
sufficiently complicated national style file, you understand them
all.
Multilingual works such as conference proceedings can
greatly benefit from mutual functional compatibility of all the
language style files involved. This means that their commands can
be mixed in one typescript.
5) Simplicity.
Since problems will inevitably arise, simplicity should be
preserved to give ordinary mortals half a chance to solve them!
§4. Concluding remarks.
My concrete proposal, namely that the macros (*)
be defined by any format in which activation of the characters
;:!?" causes problems that do not occur in Plain TeX, is a very
conservative one. Indeed, it is nothing but an orderly return to
the liberties available in Plain TeX! While it is clearly
motivated by French and German needs, all nationalities are put
on an equal footing.
Active punctuation has been used by Knuth to implement
what is called *hanging punctuation*, see TeXbook, Appendix D.
This is the practice of letting punctuation protrude into the
margin (on the grounds that this produces more aesthetic
alignments). There is sufficient support for this practice that
`what you see is what you get' word processing on Macintosh
microcomputers will shortly offer this feature.
This hanging punctuation brings home two points:
(i) Active punctuation can be of interest for all
languages, indeed for matters that are not language specific.
(ii) The list of characters whose activation ought to be
facilitated by macros (*) should probably be extended to provide
for hanging punctuation (and perhaps other applications).
Specifically,
\periodactive \periodother
\commaactive \commaother
\lquoteactive \lquoteother
\rquoteactive \rquoteother
are desirable additions. As Knuth observes, the comma and the
period are particularly awkward (even under Plain!) because the
period (or, alternatively, the comma) is used in the specification of
dimensions, as in \vskip 3.5truein; when . is made active, one
is seemingly obliged to type something like \vskip 3\pnt5truein
instead, where \pnt is a macro expanding to the category 12 period.
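The \pnt workaround can be tried in Plain TeX. In this sketch the active period gets a crude stand-in for a hanging-punctuation definition that prints the period with zero width. That definition is an assumption, not Knuth's macro from Appendix D. \pnt has to be a macro, not a \let: the number scanner expands macros but accepts only an explicit period as the decimal point.

```tex
% Plain TeX sketch; \pnt follows the text, the hanging definition is
% only a crude stand-in (an assumption, not Knuth's macro).
\def\pnt{.} % expands to a category-12 period
{\catcode`\.=\active
 \gdef.{\rlap{\pnt}}} % print the period with zero width
\begingroup
\catcode`\.=\active
\vskip 3\pnt5truein % fine: \pnt expands to an explicit period
% \vskip 3.5truein would fail here: the active . expands to an \hbox,
% which ends the number 3 and leaves no unit of measure
\endgroup
```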
Hopefully, each format designer will in future
document the freedoms and constraints that apply to activation of
characters under his format, in particular for those in the list
";:!?,.`'
This note has not mentioned the new possibility, under
TeX version 3, of exploiting extended ASCII codes 128-255. French
punctuation does not seem to benefit directly. German accents can
certainly be handled by using them; but an optimized seven-bit
classic ASCII standard will remain useful for information
exchange, notably by email. My experience does suggest that the
nascent eight-bit TeX standard (or standards?) for codes 128-255
should include conventions for category and category change.
How difficult are the macros (*) to implement for other
formats? (The double quote " and the semicolon ; proved somewhat
painful in LamSTeX, but the rest were easy.) Do the macros (*)
really give the best available solution to the ``activation
problem''? Which characters should one be able to make active for
other languages? (I.e. is the list ";:!?,.`' adequate?)
I solicit comments on the macros (*) from readers so that
the best consensus will crystallize.
REFERENCES
= L. Siebenmann, LamSTeX un nouveau formateur de M. Spivak, Cahiers Gutenberg,
no 6, Juillet, 1990, pages 25-33.
--- FPAMS.TEX 17K
--- FPLAMS.TEX 27K
--- GPAMS.TEX 11K
These three files were written at my request by Mike Spivak in
spring 1990. They are available by email or by ftp
130.84.128.100 alias rsovax.circe.fr; Login: anonymous; pwd:
anything; directory: [anonymous.siebenmann]
Laurent Siebenmann
Mathematique, Bat. 425,
Univ de Paris-Sud,
91405-Orsay,
France
lcs@matups.matups.fr
LS@FrMaP711.bitnet (weekends)
siebenmann@LALCLS.decnet.cern.ch (reliable)
Fax number: 33-1-6941-6221
RELEVANT ADDRESSES
1) Mike Spivak, Texplorators, 1703 W. Alabama, Suite 450-273,
PO Box 27703-273, Houston Texas, 77027.
2) Rainer Schoepf, Konrad-Zuse-Zentrum fuer
Informationstechnik Berlin, Heilbronner Strasse 10
D-1000 Berlin 31, Germany
Rainer Schoepf adapted AmSTeX to LaTeX for the AMS.
3) Michael J. Downes, Amer. Math. Soc., 201 Charles Street,
Providence, RI 02904, USA.
Mike Downes has recently worked on AmSTeX.