Thursday, 23 February 2006

Michael Kaplan thinks that stacking diacritics up to the ceiling and down to the basement is really cool. I think so too, and was disappointed to find that it doesn't work with the current release of BabelPad. Well, with a couple of tweaks (allowing large line spacing values and centring the output vertically within the space between the previous and next lines), I've got stacking diacritics to display correctly in BabelPad, as you can see from the following screenshots of the letter a with 72 combining diacritics (33 above and 39 below). In both cases the font is Doulos SIL at 24 points, but the first screenshot shows what you get if you turn off Uniscribe and render everything as spacing characters, whilst the second one shows monumental stacking when you turn on Uniscribe (version 1.420.2600.2180 on my computer) and set BabelPad's line spacing to 12.0.
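For anyone who wants to build a similar test string, here is a minimal Python sketch. The particular diacritics used in the screenshots are not listed here, so the choice of the Combining Diacritical Marks block (U+0300..U+036F) and the helper names `combining_marks` and `stacked` are assumptions made purely for illustration. Marks are sorted into "above" and "below" by their canonical combining class (230 and 220 respectively).

```python
import unicodedata

def combining_marks(first=0x0300, last=0x036F):
    """Split the combining marks in a code point range into those that
    stack above (canonical combining class 230) and below (class 220)."""
    above, below = [], []
    for cp in range(first, last + 1):
        ccc = unicodedata.combining(chr(cp))
        if ccc == 230:
            above.append(chr(cp))
        elif ccc == 220:
            below.append(chr(cp))
    return above, below

def stacked(base, above, below):
    """Return the base letter followed by every mark, as one plain-text cluster."""
    return base + "".join(above) + "".join(below)

above, below = combining_marks()
text = stacked("a", above, below)
print(f"{len(above)} above, {len(below)} below, {len(text)} code points in total")
```

The exact counts depend on the version of the Unicode data bundled with your Python, and whether the result actually stacks depends entirely on the font and rendering engine.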

Unfortunately the new improved version of BabelPad used for these screenshots won't be coming out until May. It was scheduled for release at the end of March, as soon as Unicode 5.0 is released, but we now hear that the release of Unicode 5.0 is being delayed until May. As my working versions of BabelPad and BabelMap have already been upgraded internally to support 5.0 I can't release them until after 5.0 is out. On the one hand, this is rather annoying, as there are lots of bug fixes and improvements that I want to release as soon as possible; on the other hand, it gives me some desperately needed time to get everything else ready for 5.0, including my suite of Phags-pa fonts.

Anyhow, back to stacking diacritics. The letter a with 72 diacritics is certainly impressive, but not very useful in the real world. However, there is one script that I can think of that does occasionally require multiple stacking characters. This is Tibetan. Tibetan is normally written horizontally with consonant clusters stacking vertically (implemented as one full-sized consonant from the range 0F40..0F69 and zero or many subjoined consonants from the range 0F90..0FB9). Ordinary Tibetan text only has limited vertical stacking (usually just one subjoined consonant, but sometimes two), and can be rendered correctly using Uniscribe version 1.453.3665.0 or later and a competent OpenType Tibetan font (of which there are now several freely available). However, occasionally, in esoteric texts, consonants are piled up (or rather down) like some crazy Yertle stack. With the first version of Uniscribe to support Tibetan OpenType features, Tibetan stacks with many subjoined consonants do not render correctly with any Tibetan OpenType font (I'm using Microsoft's not-yet-released Ximalaya font for these examples).
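The encoding model described above (one full-sized consonant followed by zero or more subjoined consonants) can be sketched in a few lines of Python. This is only an illustration of the model: the function names `tibetan_stacks` and `depth` are hypothetical, and the sketch ignores vowel signs and punctuation, so it has nothing to do with how Uniscribe or any font actually shapes the text.

```python
BASE = range(0x0F40, 0x0F6A)       # full-sized consonants 0F40..0F69
SUBJOINED = range(0x0F90, 0x0FBA)  # subjoined consonants 0F90..0FB9

def tibetan_stacks(text):
    """Group text into consonant stacks: one full-sized consonant plus the
    subjoined consonants that immediately follow it. Anything else
    (vowel signs, tsheg, and so on) is skipped and ends the current stack."""
    stacks = []
    in_stack = False
    for ch in text:
        cp = ord(ch)
        if cp in BASE:
            stacks.append(ch)
            in_stack = True
        elif cp in SUBJOINED and in_stack:
            stacks[-1] += ch
        else:
            in_stack = False
    return stacks

def depth(stack):
    """Number of subjoined consonants hanging below the base consonant."""
    return len(stack) - 1

# "sngags": SA + subjoined NGA, then GA, then SA
word = "\u0F66\u0F94\u0F42\u0F66"
for stack in tibetan_stacks(word):
    print([f"U+{ord(c):04X}" for c in stack], "depth", depth(stack))
```

Ordinary words like this one give depths of 0, 1 or occasionally 2; the esoteric stacks are the ones where the depth runs much higher.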

But with the latest versions of Uniscribe (version 1.468.4011.0 or later), in conjunction with the Ximalaya font, it is possible to correctly render Tibetan stacks with many subjoined consonants. It's not terribly pretty, but I think it is pretty amazing.

The above are all real examples of complex stacking, taken from sngags kyi klog thabs shes rab mig 'byed སྔགས་ཀྱི་ཀློག་ཐབས་ཤེས་རབ་མིག་འབྱེད. However, there are still some complex stacks that cannot yet be rendered in plain text. For example, in some of the complex stacks in this text there is also a horizontal element, where one or more of the subjoined letters is followed horizontally by the letter NGA to make a subjoined syllable such as yang ཡང (the YA is vertically in line with the stack, but the NGA protrudes forward). At present there is no way of indicating horizontal progression at the subjoined level (and there probably never will be).

Also, with the "Ximalaya" font (or at least my version of it; maybe the version shipping with Vista [called Microsoft Himalaya] will have been improved) non-standard multiple vowel signs do not work well. For example, two i vowel signs will overlay each other, when they should each occupy their own space. As double i vowel signs are found in some abbreviations (which do not use abnormal stacking), the failure to render multiple vowel signs correctly is a little disappointing.

This comes as a surprise to most people, who do not naturally associate the swastika with the Chinese script. Of course, the swastika is not a Chinese invention, but was originally an ancient Indian religious symbol. It was introduced into China along with Buddhism, as the swastika was supposed to be one of the thirty-two marks of a Buddha. In the year 693 Empress Wu decreed that the swastika should henceforth be regarded as a Chinese character, to be pronounced the same as the character 萬 wàn "ten thousand".

The swastika thus entered the vast corpus of Chinese characters. The left-facing form is most common in Chinese usage, but both forms are found, as there was some disagreement amongst Chinese authorities as to which form was correct. The swastika, in either or both forms, is duly recorded in most large modern dictionaries (although only the left-facing form is found in the Kangxi Dictionary 康熙字典, where it has a very meagre entry). The two swastika characters were included in early Chinese encodings such as CNS 11643-1986, and so also included in the earliest version of Unicode as part of the CJK unified ideograph repertoire derived from the various legacy encodings.

The swastika character in Chinese does not have any meaning other than its own shape as an auspicious symbol, and so it is usually only used in the compound word wàn zì 卍字 (also often written as 萬字) "swastika character" to describe the swastika motif in the decorative arts. The following excerpt from the great 18th century novel Honglou Meng 紅樓夢 "A Dream of Red Mansions" illustrates the use of the swastika character in running text (the novel also includes a maid with the name of Wan'er 卍兒):

Yesterday when I opened the storeroom I saw quite a few rolls of vermilion cicada-wing patterned gauze in some big chests. There were all sorts of designs with sprigs of flowers, as well as designs with floating clouds and patterns of swastika and good fortune characters, and designs with butterflies fluttering amongst the flowers. The colours are bright, and the gauze is soft and light, the like of which I have never seen before.

N.B. In some editions the word biānfú 蝙蝠 "bat" is found in place of wànfú 卍福 "swastika and good fortune", the bat also being an auspicious emblem in Chinese. The name of the maid Wan'er 卍兒 is also written 萬兒 in some editions.

The swastika is also an important symbol in other cultures, particularly in Tibet, where the right-facing swastika 卐 is a symbol of changelessness and eternity for Buddhists, and the left-facing swastika 卍 is the main emblem of the native Bön བོན religion. The most common name for the swastika symbol in Tibetan is g.yung drung གཡུང་དྲུང་ (silent initial g), which is a word of uncertain etymology. By themselves, g.yung གཡུང་ means a cross between a cow and a yak, and drung དྲུང་ means "near to, in front of or beside", so literally the word g.yung drung would mean something like "in front of the cow-yak", which obviously makes no sense. However, in the ancient Zhang Zhung language that is partially preserved in the Bön tradition, the word for the swastika is drung mu, which obviously has some relationship to Tibetan g.yung drung, although the etymology of the Zhang Zhung word is equally obscure (mu means "sky, heaven" in Zhang Zhung, but the root meaning of drung is not clear).

As the swastika is not confined to Han usage, but is a symbol used by many other cultures, some would argue that the swastika signs should be encoded in Unicode as symbols for general usage, in the same way that U+262F YIN YANG ☯, U+262A STAR AND CRESCENT ☪, U+262D HAMMER AND SICKLE ☭, U+2629 CROSS OF JERUSALEM ☩ and many other such religious or political symbols are, and that U+534D and U+5350 should then be restricted to Han usage. This is unlikely to happen due to sensitivities over the misuse of the swastika symbol by one particular culture. Nevertheless, there are several problems that I see with only encoding the swastika ideographs and not encoding swastika symbols in their own right.

Firstly, the swastika ideographs are given a Unicode script property of "Han", which indicates that they are only intended for use in a Han ideographic context. However, other scripts have a legitimate claim to the use of the swastika, and the Unicode Standard explicitly states that the Tibetan script uses U+534D and U+5350 (TUS 4.0 p.257). This suggests to me that, out of the 70,000+ CJK ideographs currently encoded, U+534D and U+5350 alone should perhaps be given a script property of "Common". Michael Kaplan has suggested that it is a deficiency in the Unicode script property that characters must either belong to a single script only or else belong to all scripts, and thus it is not possible to specify that a character belongs to a particular subset of scripts, such as "Han and Tibetan" in the case of U+534D and U+5350. I guess that for many characters it is difficult to define the boundaries of script usage, and it is a lot simpler to just use "Common" rather than a potentially controversial or changing list of scripts.

Secondly, the glyphs for the ideographic swastikas are often drawn in an ideographic style which may not be suitable for non-Han usage.

Thirdly, because U+534D and U+5350 are hidden amongst the thousands of anonymous CJK ideographs, it is not easy for users to find them if they do not already know where to look. For example, searching for "swastika" in either Windows Character Map or BabelMap will not produce any results (though this will change in the next version of BabelMap), which would probably lead most people to suppose that there are no swastika symbols encoded in Unicode ... and perhaps they would be half right.
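To see why a name search comes up empty, here is a minimal Python sketch using the standard `unicodedata` module. The helper `find_by_name` is a hypothetical name of my own, and this is not a description of how Windows Character Map or BabelMap implement their search; it simply shows that the two ideographs carry names derived from their code points, with nothing for a search on "swastika" to match.

```python
import unicodedata

def find_by_name(term, first, last):
    """Return the code points in first..last whose character name contains term."""
    term = term.upper()
    hits = []
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if term in name:
            hits.append(cp)
    return hits

# The name is derived from the code point, not from what the character depicts.
print(unicodedata.name("\u534D"))                 # CJK UNIFIED IDEOGRAPH-534D
print(find_by_name("swastika", 0x4E00, 0x9FFF))   # []
```

A search tool can only get around this by consulting some extra source of keywords beyond the character names themselves.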

Addendum I

Looking through some old files I have just rediscovered some images of bon head marks, which are formed from the left-facing swastika. [2007-05-27 : these headmarks are actually used in the sMar-chen script, as discussed in my Zhang Zhung Scripts post.]

These marks are the equivalent of the head mark character U+0F04 ༄ TIBETAN MARK INITIAL YIG MGO MDUN MA, or perhaps more accurately the recently proposed archaic-style head mark character, pencilled in as U+0FD3 TIBETAN MARK INITIAL BRDA RNYING YIG MGO MDUN MA, with the curl styled into a swastika, and are used in bon religious texts. (I think that Tibetan head marks should perhaps be the topic for my next Tibetan Extensions blog.)

Addendum II: An Unauthorised History [2014-07-07]

When I wrote this blog post eight years ago I thought it unlikely that generic swastika signs would ever be encoded in Unicode "due to sensitivities over the misuse of the swastika symbol by one particular culture". But only a little more than a year later four generic swastika signs (two plain swastikas and two dotted swastikas) were proposed for encoding by Michael Everson as part of a larger proposal for Vedic Sanskrit. The swastikas have been entirely removed from the revised version of this proposal dated 2007-04-26 that is in the WG2 document registry (N3235), but the original proposal dated 2007-04-13 is still preserved on Everson's site:

This proposal was discussed at WG2 meeting 50 at Frankfurt, Germany during the week of 23–27 April 2007, although all such discussion and mention of the word "swastika" have been redacted from the minutes of the meeting. Luckily, I was present in person at the meeting, and remember much of the discussion. There was general consensus amongst the experts at the meeting that the four proposed characters should be encoded in principle, but there was concern that German anti-Nazi laws could land the ISO/IEC 10646 and Unicode Standards in legal trouble if the characters were encoded. However, the German delegates and meeting hosts assured the meeting that encoding these signs would not cause any problems as they are religious symbols, and use of religious swastika symbols was not illegal in Germany. Nevertheless, in order to ensure that users of the standard clearly understand that these swastika signs are intended for use as religious symbols, it was agreed that they should be encoded in the Tibetan block with Tibetan names. The following month, in response to the discussion at this meeting, Michael Everson, Chris Fynn, Peter Scharf and I submitted a proposal to encode four swastika signs (N3268):

Page 1 of N3268 (2007-05-09)

In this proposal the swastika signs are given a new identity as Tibetan symbols:

At the next WG2 meeting, in Hangzhou, China in September 2007, it was agreed to add these four characters to Amendment 6 of ISO/IEC 10646 for balloting by ISO national bodies (see N3353 §9.17). In their ballot comments the Indian national body requested that the characters be moved out of the Tibetan block, and placed in either a Devanagari block or a general symbols block, and renamed to use the name "swastika" (see N3476). The following year, Amendment 6 underwent a second Proposed Draft Amendment (PDAM) ballot, and in response to the concerns of the Indian national body the USA and Irish national bodies requested that the characters be renamed to use the terms "left svasti" and "right svasti" (deliberately avoiding the stigmatized "swastika") instead of the Tibetan names (see N3516). In the end, the four characters were kept in the Tibetan block, but with non-Tibetan names (although the Tibetan names were given as aliases), when Amendment 6 was published in October 2009, following two further rounds of ISO balloting:

Detail of Page 23 of ISO/IEC 10646:2003 Amendment 6 (2009-10-15)

The contents of Amendment 6, including these four svasti signs, were included in Unicode version 5.2, which was released on 1st October 2009.

However, five years after entering Unicode, these four characters are still very poorly represented in fonts, with a perhaps understandable reluctance of large companies to support characters which may be misconstrued as Nazi symbols. No fonts that ship with Windows XP, 7 or 8 include these characters, and on my computer only the following freeware fonts cover the four svasti characters:

Left-facing swastika in running Tangut text (the following character is the Tangut character for 'character', so the context is discussing the 'swastika character')

Woodblock edition of the Peacock Wisdom King Dharani 孔雀明王陀羅尼 (Mahāmāyuri)

Left-facing hollow swastika

Addendum IV: Swastikas in Other Scripts [2014-07-09]

Marchen Script

The Marchen script is one of several scripts within the Tibetan Bön tradition that were supposedly used to write the Zhang Zhung language of the Zhang Zhung culture that flourished in the western and northern parts of Tibet before the introduction of Buddhism into the country during the 7th century. The left-facing swastika ࿖ is the paramount symbol of the Bön religion, and is present in two characters in the script: the head mark (corresponding to ༄ in the Tibetan script and ꡴ in the Phags-pa script); and the letter nya. The Marchen script has been included in Amendment 2 of ISO/IEC 10646:2014, which is currently undergoing its first round of ISO balloting (the PDAM ballot). If it successfully passes the ISO balloting process, the Marchen script will be included in a future version of the Unicode Standard (currently scheduled for Unicode 8.0 in June 2015).

U+11C70 "MARCHEN HEAD MARK" and U+11C79 "MARCHEN LETTER NYA" are both based on the left-facing swastika

Nüshu Script

The Nüshu script 女书 is a script used exclusively by women in Jiangyong county of Hunan province to write the local dialect of Chinese. Many Nüshu characters are derived directly from Chinese characters by skewing the shape of the character, and the Nüshu character representing uoɯ³³ 万 (wàn), uoɯ⁴⁴ 弯 or 湾 (wān), va³³ 位 (wèi), and iu 约 (yuē) is based on the right-facing ideographic swastika character 卐 (wàn). One other Nüshu character looks as if it is based on a dotted right-facing ideographic swastika character, but it is actually derived from the Chinese character 断 (duàn), and is used to represent taŋ¹³ 断 (duàn), ta³³ 地 (dì), taŋ³³ 段 or 缎 (duàn), laŋ³⁵ 短 (duǎn), laŋ⁴⁴ 端 (duān), tuoɯ¹³ 但 (dàn), tai¹³ 动 (dòng), and ŋu¹³ 午 (wǔ). The Nüshu script (named "Nushu" for technical reasons) has also been included in Amendment 2 of ISO/IEC 10646:2014, and should be included in a future version of the Unicode Standard (currently scheduled for Unicode 8.0 in June 2015).

U+1B258 "NUSHU CHARACTER-1B258"* looks like it is based on a dotted swastika, but is actually derived from the character 断 (duàn)

* Nushu was originally included in ISO/IEC 10646:2014 Amendment 1, with character names based on the phonetic reading of the most frequent meaning of the character, so U+1B195 was named "NUSHU CHARACTER UOW33", and U+1B258 was named "NUSHU CHARACTER TANG13" (see N4484). However, in response to feedback from Japanese experts (see N4513) and ballot comments from the Japanese national body (see N4520), and after much heated debate, it was decided to use algorithmic names based on the hexadecimal code point of the character for all East Asian ideographic scripts, including Nüshu, Tangut, Jurchen and Khitan. Consequently, Nushu was moved back to Amendment 2 so that it can be balloted with the new algorithmic names.
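As a small illustration of what "algorithmic names based on the hexadecimal code point" means in practice, here is a minimal Python sketch. The function name `algorithmic_name` is hypothetical, and the sketch only mirrors the pattern visible in names such as "NUSHU CHARACTER-1B258"; it is not taken from the text of the standard.

```python
def algorithmic_name(prefix, code_point):
    """Build a name of the form '<prefix>-<code point in upper-case hex>'."""
    return f"{prefix}-{code_point:04X}"

print(algorithmic_name("NUSHU CHARACTER", 0x1B258))       # NUSHU CHARACTER-1B258
print(algorithmic_name("CJK UNIFIED IDEOGRAPH", 0x534D))  # CJK UNIFIED IDEOGRAPH-534D
```

The attraction of this scheme is that a name never needs to be debated or revised; the drawback, as with the CJK swastikas, is that the name says nothing about the character.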

Naxi Tomba Script

The Naxi Tomba script comprises a set of over a thousand pictographic glyphs. A preliminary proposal to encode 1,188 Naxi Tomba characters (N4043) includes a swastika character under the category of religious symbols at #290:

Addendum V: More CJK Swastikas [2014-11-05]

Are two CJK swastika characters enough? Not for Buddhists! A CJK version of the dotted right-facing swastika (with square mouths for dots) is found in some Buddhist texts collected in the Taishō Shinshū Daizōkyō 大正新脩大藏經 (1924–1934) (also known as the Taishō Tripiṭaka). This character has been proposed for encoding within CJK Unified Ideographs Extension F, and is currently on the ballot for ISO/IEC 10646:2014 Amd.2 (SC2 N4379). If all goes according to schedule then CJK-F should be encoded in Unicode 9.0 in June 2016 (but the swastika character provisionally assigned to U+2D14A may well be encoded at a different code point).