Thursday, 14 November 2013

For years I have bemoaned the incomplete and broken implementation of script-specific font configuration in Internet Explorer. The ability to manually configure what font to use for what Unicode script is a killer feature for me, and something that in my opinion should make Internet Explorer vastly superior to Chrome, which does not allow the user to choose what font to use by default for particular Unicode scripts (in the absence of a font being explicitly specified by the page being read). For multilingual users, especially those who work with more obscure scripts and languages, I find that Internet Explorer generally provides a much better experience, with fewer annoying little boxes for unsupported characters. (I have had bad experiences with Firefox in the past, but reinstalled it for this blog post and was pleasantly surprised by its multiscript support, which is much better than I remember.)

Tag Cloud for the BabelStone Blog as viewed with Internet Explorer 10

Tag Cloud for the BabelStone Blog as viewed with Chrome 30

Tag Cloud for the BabelStone Blog as viewed with Firefox 25

IE6 through IE10 support font configuration for 37 languages or scripts. (This is a little better than Firefox 25, which allows font configuration for 32 languages or regions.)

Configurable Languages in IE10 under Windows 7

| Language/Script | Scripts | Unicode version | Fonts listed |
|---|---|---|---|
| Arabic | | 1.0 | (various) |
| Armenian | | 1.0 | Arial Unicode MS, Sylfaen, Tahoma |
| Bengali | | 1.0 | Arial Unicode MS, Shonar Bangla, Vrinda |
| Braille | | 3.0 | Segoe UI Symbol |
| Canadian Syllabic | | 3.0 | Euphemia |
| Cherokee | | 3.0 | Plantagenet Cherokee |
| Chinese Simplified | Han | 1.0 | (various) |
| Chinese Traditional | Han and Bopomofo | 1.0 | (various) |
| Cyrillic | | 1.0 | (various) |
| Devanagari | | 1.0 | Aparajita, Arial Unicode MS, Kokila, Mangal, Utsaah |
| Ethiopic | | 3.0 | Nyala |
| Georgian | | 1.0 | Arial Unicode MS, Sylfaen |
| Greek | | 1.0 | (various) |
| Gujarati | | 1.0 | Arial Unicode MS, Shruti |
| Gurmukhi | | 1.0 | Arial Unicode MS, Raavi |
| Hebrew | | 1.0 | (various) |
| Japanese | Han and Hiragana/Katakana | 1.0 | (various) |
| Kannada | | 1.0 | Arial Unicode MS, Tunga |
| Khmer | | 3.0 | DaunPenh, Khmer UI, MoolBoran |
| Korean | Han and Hangul | 1.0 | (various) |
| Lao | | 1.0 | Arial Unicode MS, DokChampa, Lao UI |
| Latin based | | 1.0 | (various) |
| Malayalam | | 1.0 | Arial Unicode MS, Kartika |
| Mongolian | | 3.0 | |
| Myanmar | | 3.0 | |
| Ogham | | 3.0 | Segoe UI Symbol |
| Oriya | | 1.0 | Arial Unicode MS, Kalinga |
| Runic | | 3.0 | Segoe UI Symbol |
| Sinhala | | 3.0 | Iskoola Pota |
| Syriac | | 3.0 | Estrangelo Edessa |
| Tamil | | 1.0 | Arial Unicode MS, Latha, Vijaya |
| Telugu | | 1.0 | Arial Unicode MS, Gautami, Vani |
| Thaana | | 3.0 | MV Boli |
| Thai | | 1.0 | (various) |
| Tibetan | | 2.0 | Arial Unicode MS, Microsoft Himalaya |
| User Defined | | | (various) |
| Yi | | 3.0 | Microsoft Yi Baiti |

As you can see, this list does not include any languages with Unicode scripts introduced later than Unicode version 3.0, which was released in September 1999, but it does include all Unicode scripts available in Unicode 3.0 (Bopomofo is presumably subsumed within Chinese Traditional). When IE6 was released in August 2001 this list was pretty much up to date, and only lacked three scripts added in Unicode 3.1 (Deseret, Gothic and Old Italic), which had been released in March 2001, after IE6 had gone beta.

This was a great start, and suggested that IE was going to provide cutting-edge support for Unicode scripts as they were encoded. However, it seems that no-one took ownership of this feature, and it was left to languish for the next twelve years. When IE10 was released in August 2012, thirteen years after Unicode 3.0, it still only allowed font configuration for the original list of 37 languages.

At the same time as no-one was updating the font configuration feature for the 62 new scripts that were added to Unicode between 3.1 and 6.1 (released in January 2012), no-one was fixing any bugs in the font configuration feature either. As discussed in Michael Kaplan's blog post The importance of Tagalog to Burmese, aka "Of course I'd lie to you, I'm a font!" (18 April 2008), the main bugs in the feature stem from the way that IE populates the list of fonts for each language. It lists those fonts that (a) have the appropriate Unicode Subset Bitfield bit set, and (b) also have a mapping for a sample Unicode character for the script. Unfortunately, in the case of Myanmar (Burmese), the sample character used is U+1700 ᜀ TAGALOG LETTER A, a character from the historic Philippine script Tagalog (Baybayin), which was encoded in Unicode 3.2. The reason for this mistake is that the list of Unicode 3.0 sample characters used by IE was based on draft code charts, and the Myanmar script was relocated from its originally proposed location starting at U+1700 to a new location starting at U+1000 when it was actually encoded. This means that no Myanmar font will show up on the list of Myanmar fonts unless it redundantly includes a mapping for the Tagalog character at U+1700. In the case of Mongolian, no sample character is listed at all, which is even worse than the situation for Myanmar, as no font can ever pass the test for supporting Mongolian; so although Microsoft has shipped a Mongolian font ("Mongolian Baiti") since Windows Vista, this font does not show up on the list of Mongolian fonts in IE10 and earlier.
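To make the two-condition test concrete, here is a minimal model of it. This is not IE's code: the bit numbers are the OS/2 ulUnicodeRange assignments from the OpenType specification, while the `SCRIPT_TESTS` table, the `Font` class, `listed_fonts()` and the three fonts are hypothetical, built only to reproduce the Myanmar and Mongolian behaviour described above.

```python
from dataclasses import dataclass, field

# OS/2 ulUnicodeRange bit numbers, as assigned in the OpenType specification.
MYANMAR_BIT = 74
MONGOLIAN_BIT = 81

# Hypothetical per-script test data modelled on the behaviour described above:
# Myanmar's sample character is U+1700 (TAGALOG LETTER A, from the draft charts),
# and Mongolian has no sample character at all.
SCRIPT_TESTS = {
    "Myanmar": (MYANMAR_BIT, 0x1700),
    "Mongolian": (MONGOLIAN_BIT, None),
}


@dataclass
class Font:
    name: str
    range_bits: set[int]  # ulUnicodeRange bits the font claims to cover
    codepoints: set[int] = field(default_factory=set)  # characters in its cmap


def listed_fonts(script, fonts):
    """Return the names of the fonts that pass both conditions for `script`."""
    bit, sample = SCRIPT_TESTS[script]
    return [
        f.name
        for f in fonts
        if bit in f.range_bits  # condition (a): subset bit is set
        and sample is not None  # no sample character means nothing can pass
        and sample in f.codepoints  # condition (b): sample character is mapped
    ]


MYANMAR_BLOCK = set(range(0x1000, 0x10A0))
FONTS = [
    Font("Myanmar font A", {MYANMAR_BIT}, MYANMAR_BLOCK),
    # Like Padauk: also maps U+1700 (a dotted circle) purely to satisfy the test.
    Font("Myanmar font B", {MYANMAR_BIT}, MYANMAR_BLOCK | {0x1700}),
    Font("Mongolian font", {MONGOLIAN_BIT}, set(range(0x1800, 0x18B0))),
]

print(listed_fonts("Myanmar", FONTS))    # ['Myanmar font B']
print(listed_fonts("Mongolian", FONTS))  # []
```

A genuine Myanmar font that covers only its own block is rejected, the same font plus one stray U+1700 mapping is accepted, and the Mongolian list is empty whatever fonts are installed.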

Mongolian Font Configuration Dialog in IE10 under Windows 7

No fonts listed even though Windows 7 ships with the "Mongolian Baiti" font.

Myanmar Font Configuration Dialog in IE10 under Windows 7

"Noto Sans Tagalog" is listed although it does not cover Myanmar, and Martin Hosken's Padauk fonts for Myanmar are listed only because they deliberately include a dotted circle glyph mapped to U+1700.

When Internet Explorer 11 installed itself on my laptop recently, the first thing I did was check the font configuration setting, as I had done with IE7, IE8, IE9 and IE10 when they first appeared; but as no changes had been made for IE7 through IE10, I was not expecting anything new from IE11. Imagine my surprise, then, when I opened the font configuration dialog and discovered that the list of languages had been expanded from 37 to 55. That seems like one big step forward!

Configurable Languages in IE11 under Windows 7

| Language/Script | Unicode version | Fonts listed |
|---|---|---|
| Arabic | 1.0 | (various) |
| Armenian | 1.0 | Arial Unicode MS, Sylfaen, Tahoma |
| Bengali | 1.0 | Arial Unicode MS, Shonar Bangla, Vrinda |
| Bopomofo | 1.0 | Microsoft JhengHei |
| Braille | 3.0 | Segoe UI Symbol |
| Buginese | 4.1 | Leelawadee UI |
| Canadian Syllabic | 3.0 | Euphemia |
| Cherokee | 3.0 | Plantagenet Cherokee |
| Chinese Simplified | 1.0 | (various) |
| Chinese Traditional | 1.0 | (various) |
| Coptic | 4.1 | Segoe UI Symbol |
| Cyrillic | 1.0 | (various) |
| Deseret | 3.1 | Segoe UI Symbol |
| Devanagari | 1.0 | Aparajita, Arial Unicode MS, Kokila, Mangal, Utsaah |
| Ethiopic | 3.0 | Nyala |
| Georgian | 1.0 | Arial Unicode MS, Sylfaen |
| Glagolitic | 4.1 | Segoe UI Symbol |
| Gothic | 3.1 | Segoe UI Symbol |
| Greek | 1.0 | (various) |
| Gujarati | 1.0 | Arial Unicode MS, Shruti |
| Gurmukhi | 1.0 | Arial Unicode MS, Raavi |
| Hebrew | 1.0 | (various) |
| Japanese | 1.0 | (various) |
| Javanese | 5.2 | Javanese Text* |
| Kannada | 1.0 | Arial Unicode MS, Tunga |
| Khmer | 3.0 | DaunPenh, Khmer UI, MoolBoran |
| Korean | 1.0 | (various) |
| Lao | 1.0 | Arial Unicode MS, DokChampa, Lao UI |
| Latin based | 1.0 | (various) |
| Malayalam | 1.0 | Arial Unicode MS, Kartika |
| Mongolian | 3.0 | Mongolian Baiti |
| Myanmar | 3.0 | Myanmar Text* |
| New Tai Lue | 4.1 | Microsoft New Tai Lue |
| N'Ko | 5.0 | Ebrima |
| Ogham | 3.0 | Segoe UI Symbol |
| Ol Chiki | 5.1 | Nirmala UI |
| Old Italic | 3.1 | Segoe UI Symbol |
| Old Turkic | 5.2 | Segoe UI Symbol |
| Oriya | 1.0 | Arial Unicode MS, Kalinga |
| Osmanya | 4.0 | Ebrima |
| Phags-pa | 5.0 | Microsoft PhagsPa |
| Runic | 3.0 | Segoe UI Symbol |
| Sinhala | 3.0 | Iskoola Pota |
| Sora Sompeng | 6.1 | Nirmala UI* |
| Syriac | 3.0 | Estrangelo Edessa |
| Tai Le | 4.0 | Microsoft Tai Le |
| Tamil | 1.0 | Arial Unicode MS, Latha, Vijaya |
| Telugu | 1.0 | Arial Unicode MS, Gautami, Vani |
| Thaana | 3.0 | MV Boli |
| Thai | 1.0 | (various) |
| Tibetan | 2.0 | Arial Unicode MS, Microsoft Himalaya |
| Tifinagh | 4.1 | Ebrima |
| User Defined | | (various) |
| Vai | 5.1 | Ebrima |
| Yi | 3.0 | Microsoft Yi Baiti |

* Listed in IE11 under Windows 7 although not actually installed on Windows 7.

This is an impressive list, but a little odd. The list does not include all scripts added since Unicode 3.0, but only a selection of scripts added in Unicode versions 3.1 (March 2001), 4.0 (April 2003), 4.1 (March 2005), 5.0 (July 2006), 5.1 (April 2008), 5.2 (October 2009), and 6.1 (January 2012). In fact the list excludes some 46 scripts added to Unicode between 3.2 and 6.1:

Avestan (Unicode 5.2)

Balinese (Unicode 5.0)

Bamum (Unicode 5.2)

Batak (Unicode 6.0)

Brahmi (Unicode 6.0)

Buhid (Unicode 3.2)

Carian (Unicode 5.1)

Chakma (Unicode 6.1)

Cham (Unicode 5.1)

Cuneiform (Unicode 5.0)

Cypriot (Unicode 4.0)

Egyptian Hieroglyphs (Unicode 5.2)

Hanunoo (Unicode 3.2)

Imperial Aramaic (Unicode 5.2)

Inscriptional Pahlavi (Unicode 5.2)

Inscriptional Parthian (Unicode 5.2)

Kaithi (Unicode 5.2)

Kayah Li (Unicode 5.1)

Kharoshthi (Unicode 4.1)

Lepcha (Unicode 5.1)

Limbu (Unicode 4.0)

Linear B (Unicode 4.0)

Lisu (Unicode 5.2)

Lycian (Unicode 5.1)

Lydian (Unicode 5.1)

Mandaic (Unicode 6.0)

Meetei Mayek (Unicode 5.2)

Meroitic Cursive (Unicode 6.1)

Meroitic Hieroglyphs (Unicode 6.1)

Miao (Unicode 6.1)

Old Persian (Unicode 4.1)

Old South Arabian (Unicode 5.2)

Phoenician (Unicode 5.0)

Rejang (Unicode 5.1)

Samaritan (Unicode 5.2)

Saurashtra (Unicode 5.1)

Sharada (Unicode 6.1)

Shavian (Unicode 4.0)

Sundanese (Unicode 5.1)

Syloti Nagri (Unicode 4.1)

Tagalog (Unicode 3.2)

Tagbanwa (Unicode 3.2)

Tai Tham (Unicode 5.2)

Tai Viet (Unicode 5.2)

Takri (Unicode 6.1)

Ugaritic (Unicode 4.0)

Why exclude these 46 scripts? Well, the answer is that they are all scripts which Microsoft does not currently support at the font level. So it seems that Microsoft's thinking is that users should only be allowed to configure what font to use for what script if Microsoft provides a font for that script. If Microsoft does not currently provide a font for a particular script, but you have third-party fonts installed that cover that script, then hard luck. I have to say that this is a very disappointing attitude, and it makes things very frustrating for users like myself who are immensely grateful to Microsoft for supporting minor scripts such as Mongolian, Phags-pa, Tibetan and Yi, but who also wish to use scripts for which Microsoft does not yet provide support.
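The tally of excluded scripts can be checked with a simple set difference. This is a sketch under stated assumptions: the script-by-version data is the Unicode version history as assumed here, the IE11 set is read off the IE11 table above, and Coptic is left out of both sets because it was disunified from Greek in Unicode 4.1 rather than encoded as a brand-new script. With that choice, the number of scripts added between 3.1 and 6.1 comes to 62, matching the figure given earlier. Old Turkic, which does have an IE11 dialog, drops out of the excluded set, and the earliest excluded scripts are the four Philippine scripts from Unicode 3.2.

```python
# Scripts newly encoded in Unicode 3.1 to 6.1, grouped by version.
# Assumption: Coptic (disunified from Greek in 4.1) is not counted as new.
ADDED = {
    "3.1": {"Deseret", "Gothic", "Old Italic"},
    "3.2": {"Buhid", "Hanunoo", "Tagalog", "Tagbanwa"},
    "4.0": {"Cypriot", "Limbu", "Linear B", "Osmanya", "Shavian", "Tai Le",
            "Ugaritic"},
    "4.1": {"Buginese", "Glagolitic", "Kharoshthi", "New Tai Lue",
            "Old Persian", "Syloti Nagri", "Tifinagh"},
    "5.0": {"Balinese", "Cuneiform", "N'Ko", "Phags-pa", "Phoenician"},
    "5.1": {"Carian", "Cham", "Kayah Li", "Lepcha", "Lycian", "Lydian",
            "Ol Chiki", "Rejang", "Saurashtra", "Sundanese", "Vai"},
    "5.2": {"Avestan", "Bamum", "Egyptian Hieroglyphs", "Imperial Aramaic",
            "Inscriptional Pahlavi", "Inscriptional Parthian", "Javanese",
            "Kaithi", "Lisu", "Meetei Mayek", "Old South Arabian",
            "Old Turkic", "Samaritan", "Tai Tham", "Tai Viet"},
    "6.0": {"Batak", "Brahmi", "Mandaic"},
    "6.1": {"Chakma", "Meroitic Cursive", "Meroitic Hieroglyphs", "Miao",
            "Sharada", "Sora Sompeng", "Takri"},
}

# Post-3.0 scripts that get a dialog in IE11 (Coptic omitted, as above).
IE11_NEW = {"Buginese", "Deseret", "Glagolitic", "Gothic", "Javanese",
            "New Tai Lue", "N'Ko", "Ol Chiki", "Old Italic", "Old Turkic",
            "Osmanya", "Phags-pa", "Sora Sompeng", "Tai Le", "Tifinagh", "Vai"}

all_added = set().union(*ADDED.values())
excluded = all_added - IE11_NEW

print(len(all_added), len(excluded))  # 62 46
for version, scripts in ADDED.items():
    # Show which scripts from each Unicode version have no IE11 dialog.
    print(version, sorted(scripts - IE11_NEW))
```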

What about the Myanmar and Mongolian bugs? Finally fixed (or at least, so it seems) – another step forward!

Mongolian Font Configuration Dialog in IE11 under Windows 7

"Mongolian Baiti" font is finally listed.

Myanmar Font Configuration Dialog in IE11 under Windows 7

Microsoft's "Myanmar Text" font is listed, but so is "Noto Sans Tagalog"!

Hmm, something's not right here.

Firstly, the Myanmar configuration lists the "Myanmar Text" font, but the sample just shows boxes. Wait a minute, I don't have the "Myanmar Text" font installed on my Windows 7 laptop, because that font only ships with Windows 8 and later. And for that matter, I don't have the "Nirmala UI" font listed for Sora Sompeng or the "Javanese Text" font listed for Javanese either.

Secondly, the Myanmar configuration still lists the "Noto Sans Tagalog" font even though that font does not contain a single Myanmar character. A little experiment shows that when U+1700 is removed from the Padauk font, it is no longer listed under Myanmar in IE11. So it seems that the Myanmar bug has not been fixed at all, and the dialog has simply been hard-coded to include the "Myanmar Text" font statically, in addition to the fonts that are dynamically (but still incorrectly) enumerated.

Thirdly, although the Mongolian dialog now lists Microsoft's "Mongolian Baiti" font, it does not list any of the several other third-party Unicode Mongolian fonts installed on my system. I suspect that the Mongolian bug has not been fixed at all, but the dialog has simply been hard-coded to show the "Mongolian Baiti" font. I have a sinking feeling about this. Let's take a look at Phags-pa, as I recently and belatedly updated my Phags-pa fonts to work under Windows 7+. Will they be listed?

As I thought, only Microsoft's Phags-pa font is listed. My Phags-pa fonts are not listed even though they set the appropriate Unicode Subset Bitfield bit and cover all Phags-pa characters. However, my "BabelStone Phags-pa Book" font is listed under Latin based and User Defined, so it is not getting entirely ignored by IE11, only ignored for the specific script that it is designed for use with.
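For reference, the "appropriate bit" for Phags-pa is bit 53 of the 128-bit Unicode range field in a font's OS/2 table, which is stored as four 32-bit integers, ulUnicodeRange1 to ulUnicodeRange4 (fontTools exposes them under those names on `font["OS/2"]`). The sketch below, with hypothetical helper names, shows how a bit number maps onto those four fields; the bit numbers are taken from the OpenType specification.

```python
# Bit numbers from the OpenType OS/2 ulUnicodeRange table.
RANGE_BITS = {"Phags-pa": 53, "Myanmar": 74, "Mongolian": 81, "Tagalog": 84}


def range_field(bit):
    """Map a ulUnicodeRange bit number (0-127) to its 32-bit field and mask."""
    if not 0 <= bit <= 127:
        raise ValueError("ulUnicodeRange bits run from 0 to 127")
    return f"ulUnicodeRange{bit // 32 + 1}", 1 << (bit % 32)


def has_range_bit(ranges, bit):
    """`ranges` is (ulUnicodeRange1, ..., ulUnicodeRange4) as integers."""
    _, mask = range_field(bit)
    return bool(ranges[bit // 32] & mask)


for script, bit in RANGE_BITS.items():
    name, mask = range_field(bit)
    print(f"{script}: bit {bit} -> {name} & 0x{mask:08X}")
```

For example, a font with the Phags-pa bit set has 0x00200000 set in ulUnicodeRange2, so `has_range_bit((0, 1 << 21, 0, 0), 53)` is true.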

After a little investigation, it becomes clear that none of the eighteen new IE11 font configuration dialogs (for Bopomofo, Buginese, Coptic, Deseret, Glagolitic, Gothic, Javanese, New Tai Lue, N'Ko, Ol Chiki, Old Italic, Old Turkic, Osmanya, Phags-pa, Sora Sompeng, Tai Le, Tifinagh, Vai) lists any installed third-party fonts that cover the particular script. Furthermore, each of the eighteen dialogs lists only a single font, even in the case of Bopomofo, which is covered by more than ten Microsoft fonts in Windows 7, so no choice of font is possible. The inescapable conclusion is that the eighteen new font configuration dialogs in IE11 (and also the dialog for Mongolian) simply list a single hard-coded Microsoft font for each script (even if the listed font is not installed on the system), giving the user absolutely no choice whatsoever over font configuration for these scripts. In other words, the IE11 changes to font configuration are a facade thinly disguising a fake implementation. Who in Microsoft, I wonder, decided that a fake implementation that gives the user no choice (not even Hobson's choice, as you cannot decline the proffered Microsoft font) was in any way better than not having font configuration dialogs for these scripts at all?

So what initially looked like two steps forward turns out to have been an illusion, a cheap conjurer's trick, and in fact IE11 is not one iota better than IE6 was at allowing the user to configure what fonts to use for what scripts. Twelve years on and zero progress.

Postscript A

Does font configuration even work for scripts that have more than one font listed? Not always, at least not for Tibetan. The Tibetan configuration dialog allows you to choose between the "Arial Unicode MS" font (which has glyphs for Tibetan characters but no shaping behaviour, so combining vowel signs are rendered as spacing marks) and the "Microsoft Himalaya" font (which fully supports Tibetan shaping behaviour); but if you choose "Arial Unicode MS" (not a good choice, but if you offer users a choice they should be free to make a bad one), a web page with unstyled Tibetan text will still be rendered with "Microsoft Himalaya".

Tibetan Font Configuration Dialog in IE11 under Windows 7

"Arial Unicode MS" font is listed, but the font in the preview can't be "Arial Unicode MS" as it does not do joined-up Tibetan.

In fact, if you install a good third-party Unicode Tibetan font such as Chris Fynn's Jomolhari, it will be listed in the Tibetan font configuration dialog, but if you select it you will still only ever see "Microsoft Himalaya" used to render unstyled Tibetan text on web pages. So what's the point?

Postscript B

In the Phags-pa font configuration dialog shown above the sample Phags-pa text is ꡏꡡꡋꡂꡡꡙ ꡢꡠꡙꡠ mongol qele, which is a brave but flawed attempt to render Mongolian ᠮᠣᠨᠭᠭᠣᠯ ᠬᠡᠯᠡ mongɣol kele "Mongolian language" in the Phags-pa script. It is wrong on several counts:

In the Phags-pa script a space is always used to separate syllables not words, so there should be three spaces not one;

Mongolian ng should be represented by the single Phags-pa letter ꡃ nga;

Mongolian ɣ is normally represented using the Phags-pa letter ꡢ qa;

Mongolian k is normally represented using the Phags-pa letter ꡁ kha;

Mongolian e would probably be represented using the Phags-pa letter ꡦ ee here (Phags-pa script has two flavours of e, and although the Phags-pa spelling of Mongolian kele is not attested, by analogy with other Phags-pa Mongolian words ee would be expected).

It is unfortunate that this flawed spelling was chosen, as someone at Microsoft asked me for the autonym for Mongolian in Phags-pa script in 2011, and I suggested ꡏꡡꡃ ꡢꡡꡙ ꡁꡦ ꡙꡦ mong qol khė lė, which I believe to be much more authentic. Mind you, as the Phags-pa script was specifically devised for writing multiple languages, and during the Yuan dynasty was used for writing Chinese at least as much as for writing Mongolian, as well as for Sanskrit, Tibetan and Uyghur, in my opinion choosing "Mongolian language" as the sample text for Phags-pa is not quite right anyway.