Notes from TUG2008 in Cork: Day 2

2008-07-22

These are rough notes from a public event, and any errors or stupidity should be attributed to me and my poor note taking; I hope these notes are useful despite their obvious flaws, and everything should be double checked :)

Unicode and TeX

Arthur Reutenauer

9:05 [5 mins late]

XeTeX does on the fly translation from UTF8 to TeX’s “legacy”
encodings.

RFC 4646 is a language naming scheme standard that covers everything;
the ISO 2 or 3 character codes don’t cover enough language
variants. E.g., the “UK” language could be British English or Ukrainian.
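
A minimal sketch of how tags in this scheme separate the two readings of
“UK”: "en-GB" is English with a region subtag, while "uk" is a language
subtag in its own right. The function name and the handling below are my
own simplification (language, optional script, optional region only), not
a full RFC 4646 parser.

```python
def parse_language_tag(tag):
    """Split a simple language tag into language, script and region subtags.

    Simplified sketch: handles only 'language[-Script][-REGION]' and
    silently ignores variants, extensions and private-use subtags.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            # Four letters: a script subtag such as 'Latn' or 'Cyrl'.
            result["script"] = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            # Two letters or three digits: a region subtag such as 'GB'.
            result["region"] = part.upper()
    return result


british = parse_language_tag("en-GB")   # English, region GB
ukrainian = parse_language_tag("uk")    # Ukrainian; 'uk' here is a language, not a region
```

The point is only that the language and the region live in different
positions of the tag, so they can't be confused.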

xindy: UTF8 indexes

Joachim Schrod

9:30

If you create an index, that usually means page numbers. But not
always; music pieces have names, Bibles have named sections that
matter. Ranges over structured location references. xindy allows for
this. We have a declarative style language for both declaring these
locations and for defining the output style. We have pre-made modules
for common tasks.
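
To make “ranges over structured location references” concrete, here is a
rough sketch in plain Python. It is not xindy's style language or its
algorithm; the tuple representation of a location and the function name
are assumptions of mine for illustration.

```python
def collapse_ranges(locations):
    """Merge consecutive locations into (first, last) ranges.

    Each location is a tuple of integers, e.g. (12,) for page 12 or
    (3, 16) for chapter 3, verse 16. Two locations count as consecutive
    when they share every component but the last, and the last component
    differs by exactly 1.
    """
    ranges = []
    for loc in sorted(set(locations)):
        if ranges:
            first, last = ranges[-1]
            if loc[:-1] == last[:-1] and loc[-1] == last[-1] + 1:
                # Extend the current range instead of starting a new one.
                ranges[-1] = (first, loc)
                continue
        ranges.append((loc, loc))
    return ranges


# Plain page numbers and two-level (chapter, verse) references.
pages = collapse_ranges([(1,), (2,), (3,), (7,), (9,), (10,)])
verses = collapse_ranges([(1, 1), (1, 2), (2, 1), (2, 2), (2, 3)])
```

The same function handles flat page numbers and two-level references
because a range is only allowed to run within one enclosing unit.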

Perhaps the most important contribution of xindy is its theoretical
model for index creation. Something that LuaTeX could take on?

We have a set of predefined languages - even Klingon ;) - although
that’s not in Unicode! ;p - but this isn’t a very wide selection, it’s
Eurocentric (because it’s a community effort).

…

We have markup normalisation for the index; we made a TeX introductory
book whose index entries have just “\MF” and so on instead of
“\index{METAFONT@\MF}”.
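
A rough sketch of the idea behind markup normalisation, in plain Python
rather than xindy's own rule syntax: rewrite rules turn the markup in an
index entry into the text it should sort under, so the author does not
have to supply a separate sort key. The rule list and function name are
hypothetical.

```python
import re

# Hypothetical normalisation rules as (pattern, replacement) pairs.
# Real xindy style files express such rules in their own syntax.
RULES = [
    (re.compile(r"\\MF\b"), "METAFONT"),
    (re.compile(r"\\TeX\b"), "TeX"),
    (re.compile(r"[{}]"), ""),
]


def sort_key(index_entry):
    """Derive the sort key for an index entry by applying each rule in order."""
    key = index_entry
    for pattern, replacement in RULES:
        key = pattern.sub(replacement, key)
    return key.strip().lower()


entries = [r"\TeX", r"\MF", "kerning"]
ordered = sorted(entries, key=sort_key)
```

Here the entry written as \MF sorts under “metafont” without anyone
writing “METAFONT@” by hand.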

…

Do we need a ‘Cork’ math font encoding?

Ulrik Vieth

10:00

Returning to Cork this year, I thought about the last time, when the
Cork encoding was developed. It provided a model for more 8 bit font
encodings, supported many European languages, and started further
developments. Its complete 7 bit ASCII support was good … but it had
some shortcomings: it didn’t follow any other standards like ISO Latin
1 or 2; input and output encodings were different (solved in 93/94 by
LaTeX2e with inputenc and fontenc); it created a lot of local encoding
forks (solved by the TeX Gyre fonts); and it left out text symbols and
the glyphs commonly available in PostScript fonts. So there was a big
mess of font encodings.

This is only resolved by moving to Unicode and OpenType fonts. The TeX
Gyre project provides a consistent implementation of many encodings,
with a root in Unicode/OpenType.

Today, TeX is transitioning again - from DVI/PS to PDF, scalable fonts
have replaced bitmap PK fonts, Unicode and OpenType are replacing 8
bit encoded fonts thanks to the new engines that are widely
available.

The 7 bit text and math fonts were developed at the same time, DEK
needed them to typeset TAOCP. 8 bit text fonts were developed by
European users for their own needs but math fonts weren’t. There are
reasons for doing them though, and attempts were made: the ‘Aston’
project in 1993 and then the ‘newmath’ prototype in 1997/98.

OpenType math in MS Office 2007: while we were waiting for STIX fonts,
MS added a MATH table to OpenType, and Cambria Math font is a
reference implementation.

There is acceptance of OpenType math: many concepts and ideas from TeX
were adopted by Microsoft, it’s officially still experimental but
already a de facto standard, FontForge and XeTeX already support it,
LuaTeX is likely to follow. It’s likely that OpenType Math support will
be adopted in new TeX engines and new TeX fonts. And Unicode sorts out
the issue of ‘math font encodings’ - the issue is now developing
OpenType Math fonts.

The OpenType font format: developed by Adobe and Microsoft, it’s a
vendor controlled specification and isn’t really open; it has concepts
from Type1 and TrueType fonts; the table structure of TrueType; uses
Unicode encoding; advanced typographic features like glyph positioning
GPOS and glyph ….

The OpenType MATH table: Font specific global parameters, and some
have direct relations to TeX parameters, and others are
simplifications, although a few TeX parameters don’t have clear
correspondence. TeX engines can use some workarounds for that. And
glyph specific metric information.

Optical sizing is important for super/sub scripts, and METAFONT fonts
typically have 5/7/10pt designs adjusted for readability.

Challenges presented by OpenType Math fonts: the scope of the project;
a huge set of geometric symbols and alphabetical font shapes to be
designed. There are organisational issues: the font extends across
multiple Unicode planes (> 16 bits) and there are size variants and
optical sizes to be packaged in un-encoded slots. And there are
technical issues: matching fontdimens and other TeX parameters to the
MATH table, mapping TFMs to glyph-specific metrics, and font
substitutions too.

10:30

Q: 10-20 person years put into the OpenType MATH stuff, including
Cambria implementation. They don’t claim their MATH table is generic;
it’s specific to Cambria, and it’s an ongoing and infinite task…

A: Sure

Q: You left out something in the summary: Interface issues. It’s useful
to have Unicode math, and STIX fonts. But what about higher level
interfaces?

A: Sure

Three Typefaces for Mathematics

Dan Rhatigan

10:50

This is not about technology, it’s about design issues.

I’ve been typesetting for a long time; using a lot of core
configurations for dealing with math; as I get more into type design,
I knew I had problems with type as a compositor/designer. So I was
casting about for things to look at for this, and I found 3 case
studies that bring up different issues.

Tricky things about maths? Legibility in paragraphs is different to
that in equations, they combine multiple styles, scripts and symbols
and the positioning and spacing is a kind of script of its own, moving
vertically and horizontally and even back and forth.

Legibility, of letters, and readability of paragraphs.

….

Here’s a hand set equation using Modern Series 7, and here’s a
machine-set equation using Times Series 569. You can see the x height
was normalised and other changes, but the big thing was the italic’s
slant was changed, 4° to be more upright. Times had a 16° slant which
is quite a lot.

Here’s photos of the pattern drawings, with shapes highlighted, and
overlapped, and you can really see the difference.

So Knuth also made a font of Modern Series 7, Computer Modern, and
then had an idea for a new kind of approach, a CONTRAST of style,
rather than a seamless blend. Zapf did the drawings that the typeface
was based on, but there was a rich correspondence between Zapf and DEK
also, and the design pushed the boundaries of the technology it was
meant for. “An upright italic with a casual twist” that reflected the
tone of handwriting a mathematician would use. Eliminating the problem
of how to fit all the pieces together with a slanted shape. It has the
characteristics of an italic shape, though. The calligraphic forms
also help. A notion Zapf got behind was not capturing a sense of fine
formal broad-nib calligraphy, but the rough quick pen work of someone
jotting down an equation. They started with book typography but moved
away from it in the process.

There were problems making the subtleties of Zapf’s drawings come
across in the digitisation with METAFONT by a team at Stanford. Here’s
photos of the final drawings that Zapf submitted. There were subtle
modulations.

The team decided to draw the OUTLINES with METAFONT instead of a
stroke/nib skeleton/flesh model.

…

Cambria, the default Math font in Office 2007+ until more math fonts
are developed. A focus on ClearType rendering; curves that move
quickly from horizontal to vertical, avoiding large diagonal gestures
wherever possible - so things render sharp and crisp on screen with
ClearType.

Minion Math

I wanted a math font that improves over existing math fonts: something
that is very consistent (Computer Modern uses some AMS Math glyphs…)
and comprehensive and versatile (not just one width, one optical size,
one weight)

Why start with Minion? I like it. It has Greek letters and optical
sizes already. 1990 Adobe font, had Multiple Master versions,
and then Greek glyphs.

Weights: Regular-Medium-Semi bold-Bold

Optical Sizes: Display-Subhead-Regular-Caption-Tiny

In the final release, they will offer full Unicode math support, full
math alphabets, and a real Math italic. I plan to fill the Unicode
block for mathematical characters totally.

…

Consistent look, consistent metrics.

Q: legal status?

A: yes I have a legal agreement, I’m licensed to use their trademark
and to publish my font.

Cuneiform with METAFONT

Starting point for cuneiform is the basic elements, the wedges. I
didn’t scan images of clay tablets, I’ve constructed the shapes in 3
variants, Classic, Filled and Academic.

I used MetaType1 to produce Type 1 fonts, then FontForge to generate
OpenType, and I also use t1utils and others for the final result.

The MetaType1 package was developed for the TeX Gyre project, and it
runs MetaPost (any available version) to produce EPS files with
outlines for all the glyphs, and collects the data together into one
Type1 file. The MetaPost source files describe the glyph designs, and
then additional macros are defined in a MetaType1 macro extension or
appended by the user to combine them into a font.

TODO: I wish MetaType1 would be extended to MetaOpenType to produce
OpenType directly.

Meta-Designing Parametrized Arabic Fonts For AlQalam

Ameer M Sherif

Hossam A H Fahmy

Here’s a reed pen nib, the traditional Arabic writing tool. Here’s the
Naskh style of Arabic script, written right to left, and most letters
connect - only 6 do not. And you have the same word written wider or
shorter to justify the line as you like. It’s not justified by the
spaces between the words, as in Latin, but inside the words. There are
a lot of ligatures, the same letter can have a very different shape
depending on its position in a word. The 2nd and 3rd line of this
slide are images from an Arabic calligraphy handbook.

There are other styles of Arabic; like roman, italic, fraktur for
Latin. In Naskh you have a unit like an em, a scalable unit, and the
base pen nib shape is a square at 45°.

A vertical stroke is not really vertical, it’s not just two points, but
4 points describe it well, “z1..z2..z3..z4”; a 5th point is often
redundant, although sharp bends and asymmetric strokes can require
one.

DEK used a simple set of primitives, and parametrised them to get a
large set of glyphs. We want primitives to make letters more flexible
and better connected.

We used 3 kinds of primitives:

Some are used without any modifications in many letters

Some are dynamic but change shape only a little

Some are dynamic and change a lot

There are ‘approximate’ directions in calligraphy books, where
ligatures are pretty different shapes to their component
characters. METAFONT isn’t that smart yet, to learn over time ;), so
we have to put that into the design. These are the 2nd kind above.

The 3rd kind are tricky; eg the “kashida” that doesn’t belong to one
of the two letters, it’s a connection between the two. OpenType is
buggy; you cannot have glyphs that change width on the fly; you have
to predefine sizes. But the line-breaking algorithm ought to tell the
font what width of Arabic character it wants.

The best OpenType fonts in Arabic, from DecoType in Holland, have a
predefined width. This will create poor connections between joined up
glyphs, but if you can have a smart font and line breaker, it will be
smooth. (?)

Urdu is totally oblique, and so you need to look at the different
Arabic writing styles for each font. Arabic is the most commonly used
script after Latin; used for about 15 languages.

Taco and Hans were asking about when Arabic letters stack up; The
baseline is the base; for combining letters, we benefit from the
declarative nature of METAFONT. The horizontal positioning starts from
the right, the vertical positioning starts from the left at the
baseline, and the writing starts from the right.

Flexing and contracting with kashidas is a matter of personal taste of
a calligrapher, so with type it’s something the type
designer/typographer can decide. The length of the kashida is the
length of the word, minus the minimum width of the letters.
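
As a worked example of that last sentence, here is a small sketch of the
arithmetic. The function name, the units and the numbers are made up for
illustration and are not measurements from AlQalam.

```python
def kashida_length(target_word_width, minimum_letter_widths):
    """Return the total kashida length for one word.

    This is the width the word should occupy minus the sum of the
    minimum widths of its letters.
    """
    needed = target_word_width - sum(minimum_letter_widths)
    if needed < 0:
        raise ValueError("word cannot be set narrower than its minimum letter widths")
    return needed


# Hypothetical widths in arbitrary font units.
extra = kashida_length(1000, [250, 300, 200])
```

With these made-up numbers the three letters need at least 750 units, so
250 units are left to be taken up by kashidas.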

We wrote a simple GUI for this: it reads input word(s) and parses them
into character streams, lists the chars, lets you manually select the
letter-forms and length, then outputs files with the selected
letter-forms, lengths and order in word(s), and finally runs METAFONT
and a DVI viewer. So we get complete words out of METAFONT using these
primitives.

We tested 16 words with 30 people on a comfort scale of 1 to 5, and
took the mean of their opinions. We used Simplified Arabic and
Traditional Arabic, that Microsoft ship, and DecoType Naskh - said to
be the best available - and ours. We get 3.9/5, DecoType gets 3.2/5,
trad 2.4 and simple 2.3. The big difference is the kerning, and DecoType
isn’t doing a good job with the kerning right now.

Future?

We want automatic selection of the most suitable glyph shapes and
sizes.

We want contextual analysis to choose the form, and line justification
analysis to choose the size and ligatures. This will take a whole
paragraph, and process the whole thing. You won’t know the shape of
the first character of the first line until you’ve taken into account
the last character of the last line. Very complex!

We want to meta-design all possible letter forms

We want to automatically place dots and other diacritic marks.

We’re not sure if it’s worth modelling the ink spread and movement
speed of human calligraphers.

We want to embed METAFONT sources into PDFs; if you want to re-flow
things, you need to re-justify them. So the sources of METAFONT should
be available in the PDF, and then the PDF viewers would have a METAFONT
engine to re-typeset the paragraphs. PDF viewers have an OpenType
engine, so why not a METAFONT engine? METAFONT is much much better
than the tables of OpenType.

Finally, we want to support other Arabic writing styles. We haven’t
finished this one yet, but plan to move forward.

Q: Tom Milo (behind DecoType) has the ACE text layout engine as an
InDesign plug-in that uses a special font format to set text, and these
fonts can be ‘frozen’ into OpenType fonts for general use.

A:

Writing Gregg Shorthand with LaTeX and METAFONT

Gregg shorthand was made in 1888; the current version is the centennial
version. It is a simplified alphabet for phonetic writing and brief
forms and phrases. text2gregg.php at
http://www3.rz.tu-clausthal.de/~rzsjs/steno/Gregg.php shows how this
works; let’s input “once upon a time there was a family that lived
happily ever after” with a proof of 23 to make it larger.

…

Gregg has a lot of ligatures, and so we need to join curved (C) and
vertical (V) strokes together - basically in 3 ways - CV VC and
CVC. We use Hermite Interpolation for Bezier Splines to do this
smoothly. …
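
For reference, the standard conversion from Hermite data (two end points
plus the tangent at each) to the control points of a cubic Bezier
segment looks like this. It is a sketch of the textbook formula only;
the talk did not say how the tangents at the joins are chosen, and the
function names and sample numbers are mine.

```python
def hermite_to_bezier(p0, t0, p1, t1):
    """Convert Hermite data to the four control points of a cubic Bezier.

    p0 and p1 are the end points; t0 and t1 are the tangent vectors there.
    The inner control points sit one third of a tangent away from each end.
    """
    c1 = (p0[0] + t0[0] / 3.0, p0[1] + t0[1] / 3.0)
    c2 = (p1[0] - t1[0] / 3.0, p1[1] - t1[1] / 3.0)
    return [tuple(map(float, p0)), c1, c2, tuple(map(float, p1))]


def bezier_point(ctrl, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    b0, b1, b2, b3 = ctrl
    s = 1.0 - t
    return tuple(
        s**3 * a + 3 * s**2 * t * b + 3 * s * t**2 * c + t**3 * d
        for a, b, c, d in zip(b0, b1, b2, b3)
    )


# Hypothetical join: leave (0, 0) heading right, arrive at (3, 3) heading up.
ctrl = hermite_to_bezier((0, 0), (3, 0), (3, 3), (0, 3))
start = bezier_point(ctrl, 0.0)
end = bezier_point(ctrl, 1.0)
```

Because neighbouring segments share the end point and the tangent at a
join, the strokes meet without a visible corner.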

CAVE CANEM - a Pompeii before 79AD, there is old Roman cursive, DEK,
Herout-Mikulik, Gregg and Pitman. These meta-notations or shorthand
notations are machine drawn; and do not confuse pen stenography with
machine stenographer products in the US.

There is a book “Gregg shorthand adapted to Irish”, with a copy
inscribed “courtesy of John R. Gregg 1930” (?)

My talk

Multidimensional Text

John Plaice

15:20

What is text? In many ways we are stuck in the typewriter age; most
formatting systems assume input and output strongly resemble each
other: typewriter, telegraph, WYSIWYG, TeX/LaTeX, Unicode/XML.

Between a sequence of typeset glyphs and the characters that generate
it, there is such a resemblance.

What do we know? We need to move from one representation to another
with only the inherent complexity of each process… We need separate
input, output and internal representations (note the plural)

Chris Rowley (Kyoto 2003) wrote about this.

The solution already exists, it took a while to invent it and realise
it was already invented: AVMs, or Attribute Value Matrices, AKA
feature structures. Everything is an attribute-value list; values
themselves can be AVMs. Any value is reachable through an index
(“iterator”).
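
A minimal sketch of an AVM as nested attribute-value lists, using plain
Python dictionaries. The attribute names and the lookup helper are
invented for illustration; they are not taken from the talk.

```python
def avm_get(avm, path):
    """Follow a sequence of attribute names through nested AVMs."""
    value = avm
    for attribute in path:
        value = value[attribute]
    return value


# Hypothetical AVM describing one character of text. Values may be
# atoms (strings, numbers) or further AVMs.
glyph = {
    "char": "a",
    "font": {"family": "Minion", "size": {"value": 10, "unit": "pt"}},
    "language": "en-GB",
}

# The path plays the role of the index that reaches any value.
size = avm_get(glyph, ["font", "size", "value"])
```

The list of attribute names acts as the index: any value, however deeply
nested, is reachable by one such path.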