WRT today's teleconference:
What follows is very very brief. If people want more details or have
specific questions, please let me know.
African languages fall into four categories:
1) languages supported by Unicode.
E.g. Hausa and Pulaar (using Latin script).
2) languages supported by Unicode, but requiring additional support in
rendering systems.
E.g. Yoruba, Ife, Dinka, Nuer, etc.
This can include correct placement of combining diacritics based on
languages' typographic conventions, or stacking of combining diacritics.
Ife offers a challenging example.
Some notes under construction that may illustrate some of the issues:
http://www.openroad.net.au/languages/african/ife-2.html
http://www.openroad.net.au/languages/african/dinka-4.html
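To give a feel for the encoding side of this (the rendering is the hard part), here's a rough Python sketch using a vowel with two diacritics of the kind Yoruba and Ife orthographies need; the particular letter is just an illustrative choice:

```python
import unicodedata

# 'e' + combining dot below (U+0323) + combining acute accent (U+0301).
# Unicode only encodes the sequence; stacking the marks correctly on
# screen is the job of the rendering system and the font.
decomposed = "e\u0323\u0301"

# NFC folds base + dot-below into the precomposed U+1EB9; no precomposed
# character exists for the full combination, so the acute remains a
# combining mark that the font must position above the base.
composed = unicodedata.normalize("NFC", decomposed)
print([unicodedata.name(ch) for ch in composed])
```

The combining classes (220 for dot below, 230 for acute) are what give the sequence a canonical order, but they say nothing about visual placement.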
This is an issue for font rendering technologies (AAT/ATSUI, Uniscribe
and Graphite for example). OpenType has features (e.g. MarkToBase,
MarkToMark) that are designed for correct positioning of combining
diacritics. Support for this in Uniscribe is currently under
development. (Not sure of the status of AAT/ATSUI in this regard).
In some cases (Dinka and Nuer, for instance) the existing combining
diacritics in some fonts are adequate for lowercase characters (though
not optimal), but entirely unsuitable for uppercase characters. In
other cases, like Ife, where diacritic stacking is required, it is a
crucial concern, one that will be alleviated as the new versions of the
font rendering technologies become widespread.
Additionally, some African languages use alternative glyphs for certain
characters (the most common example is uppercase ENG). It is possible to
create alternative glyphs for different languages/typographic traditions
within an OpenType font. Unfortunately, current software is unable to
interact sufficiently with the font rendering systems to allow use of
language-specific features within fonts.
At least that's my current understanding.
3) languages that have some characters that are not present in Unicode.
E.g. Dagera (Burkina Faso), Hausa/Pulaar/etc. in Ajami (Arabic script).
There has been a fair amount of discussion recently on Ajami on the
Unicode-Afrique, A12N Collaboration and H-Hausa mailing lists.
4) scripts currently not supported by Unicode.
E.g. N'ko, Vai, Tifinagh, etc.
With respect to HTML, one issue is how to identify languages when there
is no ISO 639-1 code or IANA language code. How should the "x-"
convention be used in practical settings?
For an example:
http://home.vicnet.net.au/~andrewc/samples/nuer.htm
I've used the convention "x-sil-" to indicate Ethnologue language
codes. Although that's neither here nor there.
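For what it's worth, the convention amounts to something like this (a Python sketch; "nus" is assumed here to be the Ethnologue code for Nuer):

```python
def sil_lang_tag(ethnologue_code):
    # "x-" marks a private-use language tag; "sil-" namespaces it to
    # Ethnologue (SIL) codes, which are conventionally uppercase.
    return "x-sil-" + ethnologue_code.upper()

# Hypothetical markup for a Nuer passage:
print('<p lang="%s">...</p>' % sil_lang_tag("nus"))
```

Nothing standard-conformant consumes these tags yet; the point is just to record the language unambiguously in the markup.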
Other key issues include charset identification in the absence of
"defined" character encodings.
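In practice the fallback is usually Unicode itself; a minimal sketch of declaring and producing UTF-8 (the sample string is a hypothetical e with dot below plus acute):

```python
# With no registered legacy charset for these orthographies, one
# practical route is to serve UTF-8 and declare it explicitly.
charset_decl = ('<meta http-equiv="Content-Type" '
                'content="text/html; charset=utf-8">')

sample = "\u1eb9\u0301"          # e with dot below + combining acute
encoded = sample.encode("utf-8") # the byte sequence actually served
print(charset_decl)
print(encoded)
```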
A useful starting point is the "A12N gateway" http://www.bisharat.net/A12N/
Andrew
Andrew Cunningham
Multilingual Technical Officer
OPT, Vicnet,
State Library of Victoria
Australia
andrewc@vicnet.net.au