Tibetan script summary

This page provides basic information about the Tibetan script. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For similar information related to other scripts, see the Script comparison table.

The Tibetan script is used for writing the Tibetan, Dzongkha, Ladakhi and Sikkimese languages, spoken in Tibet, Bhutan, Nepal and India. It is also used for transcribing religious Sanskrit texts. The exact origin of the script is not clear; Tibetan Buddhism traditionally ascribes its creation to Minister Thon mi Sambhota in Northeast India, but Bon Po religious tradition cites Iranian or Central Asian origins. What is generally agreed upon is that it is ultimately derived from the Brahmi script, as evidenced by its syllabic structure, its use of diacritics to modify the vowel in a syllable, and its typically Brahmic canonical arrangement of the letters in phonological groups.

There are a number of different styles of writing Tibetan, which can be grouped into two main variants: dbu-can 'with a head', which is the most commonly used and is the less cursive of the two, and dbu-med 'headless', which includes the relatively careful dpe-yig 'book writing' and the rapid nkhyug-yig 'running writing'.

The Tibetan alphabet is an abugida used to write the Tibetic languages such as Tibetan, as well as Dzongkha, Sikkimese, Ladakhi, and sometimes Balti. The printed form of the alphabet is called uchen script while the hand-written cursive form used in everyday writing is called umê script.

The alphabet is very closely linked to a broad ethnic Tibetan identity, spanning areas in Tibet, Bhutan, India and Nepal. The Tibetan alphabet is of Indic origin and it is ancestral to the Limbu alphabet, the Lepcha alphabet, and the multilingual 'Phags-pa script. ...

The creation of the Tibetan alphabet is attributed to Thonmi Sambhota of the mid-7th century. Tradition holds that Thonmi Sambhota, a minister of Songtsen Gampo (569-649), was sent to India to study the art of writing, and upon his return introduced the alphabet. The form of the letters is based on an Indic alphabet of that period.

Three orthographic standardizations were developed. The most important, an official orthography aimed at facilitating the translation of Buddhist scriptures, emerged during the early 9th century. Standard orthography has not altered since then, while the spoken language has changed by, for example, losing complex consonant clusters. As a result, in all modern Tibetan dialects, in particular in the Standard Tibetan of Lhasa, there is a great divergence between current spelling (which still reflects the 9th-century spoken Tibetan) and current pronunciation. This divergence is the basis of an argument in favour of spelling reform, to write Tibetan as it is pronounced, for example, writing Kagyu instead of Bka'-rgyud. In contrast, the pronunciation of the Balti, Ladakhi and Burig languages adheres more closely to the archaic spelling.

Tibetan is an abugida, ie. consonants carry an inherent vowel sound a that is overridden using vowel signs. See the table to the right for a brief overview of features, taken from the Script Comparison Table.

Text runs horizontally from left to right.

There are various different Tibetan scripts, of two basic types: དབུ་ཅན་ dbu-can, pronounced uchen (with a head), and དབུ་མེད་ dbu-med, pronounced ume (headless). This page concentrates on the former. Pronunciations are based on the central, Lhasa dialect.

Traditional Tibetan text was written on pechas (dpe-cha དཔེ་ཆ་), loose-leaf sheets. Some of the characters used and formatting approaches are different in books and pechas.

One of the key distinguishing features of Tibetan is the set of separate code points for subjoined consonants, used where a syllable has multiple consonants. Of the 77 combining characters in the Tibetan block, 48 represent subjoined consonant forms.

Most Tibetan syllables contain stacked consonants or vowel signs, and very often they contain both together.

རྒྱུད་

An example showing a syllable with an initial stack of three consonants plus a vowel sign.
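As a rough illustration, the syllable above can be taken apart into its code points with Python's standard unicodedata module. The escape sequence below is my reading of the example (RA, subjoined GA, subjoined YA, vowel sign U, DA, tsek), so treat it as an assumption rather than a checked transcription.

```python
import unicodedata

# The example syllable, written as escapes so the stored order is explicit.
# Assumption: this sequence matches the example shown above.
SYLLABLE = "\u0f62\u0f92\u0fb1\u0f74\u0f51\u0f0b"

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for code, name in describe(SYLLABLE):
    print(code, name)
```

The first character is an ordinary letter, the next two are subjoined letters, and the vowel sign follows the whole stack.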

Character lists

The Tibetan script characters in Unicode 10.0 are contained in a single block, Tibetan (U+0F00–U+0FFF), not counting shared characters such as punctuation.

Native Tibetan words use 30 consonants, but the Tibetan block contains many more. Many of the extra consonants (and other characters) are used for transliteration of other languages, principally Sanskrit and Chinese. These include the retroflex and voiced aspirated consonants. A couple of characters are extensions for Balti.

The pronunciation of Tibetan words is typically much simpler than the orthography, which involves patterns of consonants. These reduce ambiguity and can affect pronunciation and tone.

The primary consonant is called the root consonant (or radical), and the other consonants in the syllable (which normally has up to 6 consonants in total) annotate or modify it. The following rules help identify the root:

a consonant with a vowel is always the root, unless it is the phrase connector འི, and letters with superscripts or subscripts are root consonants.

in a 2-consonant syllable with no vowel, the first consonant is always the root

in a 3-consonant syllable where the last consonant is not ས[U+0F66 TIBETAN LETTER SA], the second consonant is likely to be the root.

in a 4-consonant syllable, the second consonant is always the root.
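The rules above can be sketched as a small heuristic. This is only an illustration under my own assumptions: the function names and code point ranges are ones I chose, the syllable is split into "units" (one full-form letter plus any subjoined letters and vowel signs that follow it), and the result is the index of the unit that holds the root, or None where the rules above give no answer.

```python
def is_base(ch):
    """Full-form consonant letters in the Tibetan block."""
    return "\u0f40" <= ch <= "\u0f6c"

def is_subjoined(ch):
    """Subjoined consonant letters."""
    return "\u0f90" <= ch <= "\u0fbc"

def is_vowel_sign(ch):
    """Dependent vowel signs (a simplified range)."""
    return "\u0f71" <= ch <= "\u0f7d" or ch == "\u0f80"

A_CHUNG, VOWEL_I, SA = "\u0f60", "\u0f72", "\u0f66"

def split_units(syllable):
    """Group a syllable into units: one full-form letter plus its marks."""
    units = []
    for ch in syllable:
        if is_base(ch):
            units.append(ch)
        elif units and (is_subjoined(ch) or is_vowel_sign(ch)):
            units[-1] += ch
    return units

def guess_root(syllable):
    """Return the index of the unit holding the root, or None if unsure."""
    units = split_units(syllable)
    for i, unit in enumerate(units):
        if unit == A_CHUNG + VOWEL_I:
            continue  # the connector is never the root
        if len(unit) > 1:  # carries a vowel sign or a subjoined letter
            return i
    if len(units) in (1, 2):
        return 0
    if len(units) == 3 and units[-1] != SA:
        return 1  # only "likely", per the rule above
    if len(units) == 4:
        return 1
    return None
```

For a stacked unit the sketch does not say which letter inside the stack is the root, only that the root is in that unit.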

The following diagram shows characters in all of the syllabic positions, and lists the characters that can appear in each of the non-root locations. The word is འགྲེམས་སྟོན་ 'grems-ston ɖɹem-ton (exhibition).

Only two characters can appear in the secondary suffix location, according to Tibetan grammar, ས[U+0F66 TIBETAN LETTER SA] and ད[U+0F51 TIBETAN LETTER DA], and the latter is no longer officially found in modern Tibetan. A character in this position adds no sound, nor does it affect the sounds in the rest of the syllable, eg. བསྒྲུབས་ bsgrubs ɖɹúb (established), and གྱུརད་ gyurd kjùr (became).

The three characters that appear in the superscript location raise the tone pitch of the syllable, but are not pronounced themselves. Each superscript character can only be used with a specified set of root characters.

With superscript RA: རྐ ka, རྒ ga, རྔ ŋa, རྗ ʤa, རྙ ɲa, རྟ ta, རྡ da, རྣ na, རྦ ba, རྨ ma, རྩ tsa, རྫ dza

With superscript LA: ལྐ ka, ལྒ ga, ལྔ ŋa, ལྕ ca, ལྗ ʤa, ལྟ ta, ལྡ da, ལྤ pa, ལྦ ba, ལྷ lha

With superscript SA: སྐ ka, སྒ ga, སྔ ŋa, སྙ ɲa, སྟ ta, སྡ da, སྣ na, སྤ pa, སྦ ba, སྨ ma, སྩ tsa

Note that RA has a shape slightly different from its nominal shape in all combinations except རྙ and རླ. You should still use the normal RA character for the superscript. The font will make the needed adjustments of shape.

A standard stack has a standard consonant character at the top (although it may be slightly squeezed or adapted in shape), and one or more special subjoined consonant characters beneath it.

The topmost consonant in a stack always uses the standard character from the Unicode Tibetan block regardless of whether it is a root consonant or not, and consonants below it always use a character from the subjoined range.

See this example from the Unicode Standard of the word སྤྱིར་ spyir ʧí (general), which shows a stack with three consonants.
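A minimal sketch of the encoding rule for stacks, assuming letters are identified by their Unicode name suffix (SA, PA, YA and so on). The helper name build_stack is my own. It encodes the topmost letter as a full-form character and everything beneath it as subjoined characters, looked up by name with the standard unicodedata module.

```python
import unicodedata

def build_stack(names):
    """Build a consonant stack from letter names listed top to bottom.

    The first name is encoded as a full-form letter, the rest as subjoined
    letters, looked up by their Unicode character names.
    """
    if not names:
        return ""
    top = unicodedata.lookup(f"TIBETAN LETTER {names[0]}")
    below = [unicodedata.lookup(f"TIBETAN SUBJOINED LETTER {n}") for n in names[1:]]
    return top + "".join(below)

# spyir: the stack SA + PA + YA, then vowel sign I (U+0F72), then a final RA.
spyir = build_stack(["SA", "PA", "YA"]) + "\u0f72" + build_stack(["RA"])
print([f"U+{ord(ch):04X}" for ch in spyir])
```

Looking characters up by name avoids relying on any arithmetic relationship between the full-form and subjoined ranges.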

Unlike Indic scripts, there is no virama (or halant) used for Tibetan. Instead, there is a full form and a subjoined form of each consonant. The subjoined forms are combining characters. Avoiding the virama makes sense because the virama is not used by Tibetans, and the approach taken makes it easier to create the large number of stacks contained in Tibetan text.

Tibetan uses the word 'head' to refer to either the top-most consonant (ie. spatially) or the root consonant of a syllable, which may be a subjoined consonant. We therefore avoid this term here, and say 'root' or 'topmost'.

The following list shows the order in which characters should be typed, and stored in memory, for a set of stacked characters.

Both 'a-chung and a-chen can be used with vowel signs, in which case the a sound is replaced by that of the vowel.

'A-chung can also represent a nasal, so མཚམས་ mtshams (boundary) and མཐུན་ mthun (agreement) are often written འཚམས་ and འཐུན་.

'A-chung may also nasalise the juncture of two morphemes, as in དགེ་འདུན་ dge-'dun (Buddhist community), pronounced ɡenyn.

Other than loanwords, Tibetan only allows diphthongs in diminutive expressions. 'A-chung is used to write these, as in the following: མི་ mi (person) → མེའུ་ me'u (dwarf); རྡོ་ rdo (stone) → རྡེའུ་ rde'u (pebble).

A subjoined 'a-chung is used to express long vowels in loan words (Tibetan doesn't have them natively), such as those borrowed from Chinese, Hindi and Mongolian. For example, ཏཱ་བླ་མ་ tā-bla-ma (grand lama) (ta from Chinese), and ཤྲཱི་ śrī (wealth) from Sanskrit.

Usage tip: For this purpose you should use ཱ[U+0F71 TIBETAN VOWEL SIGN AA], and not ྰ[U+0FB0 TIBETAN SUBJOINED LETTER -A].

The Unicode Standard says of SUBJOINED LETTER -A:

U+0FB0 TIBETAN SUBJOINED LETTER -A ( a-chung ) should be used only in the very rare cases where a full-sized subjoined a-chung letter is required. The small vowel lengthening a-chung encoded as U+0F71 TIBETAN VOWEL SIGN AA is far more frequently used in Tibetan text, and it is therefore recommended that implementations treat this character (rather than U+0FB0) as the normal subjoined a-chung.
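One way to act on this advice is to flag uses of U+0FB0 for review rather than rewrite them blindly, since the full-sized form is occasionally intended. A minimal sketch, where the function name is hypothetical:

```python
SUBJOINED_A_CHUNG = "\u0fb0"  # TIBETAN SUBJOINED LETTER -A (rarely wanted)
VOWEL_SIGN_AA = "\u0f71"      # TIBETAN VOWEL SIGN AA (the usual choice)

def find_subjoined_a_chung(text):
    """Return the indexes where U+0FB0 occurs, for a human to review."""
    return [i for i, ch in enumerate(text) if ch == SUBJOINED_A_CHUNG]

sample = "\u0f4f\u0fb0\u0f0b"  # TA + subjoined -A + tsek
for index in find_subjoined_a_chung(sample):
    print(f"U+0FB0 at index {index}; consider U+{ord(VOWEL_SIGN_AA):04X}")
```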

Finally, 'a-chung can be used to disambiguate the location of an inherent vowel in a syllable. The sequence དག་ dag dàg (I) is interpreted as CVC. To express CCV add 'a-chung, eg. དགའ་ dga' gà (virtue).

Most consonants translate to the same basic sound unless they are modified by surrounding letters as mentioned above. In some cases, however, the pronunciation of a consonant is irregular. In particular, b is sometimes pronounced w, eg. རེ་བ་ re-ba re-wa (hope), དབང་ཆ་ dbang-ca wang-ʧa (power), and some words have an additional nasalisation which is not shown, eg. ད་ལྟ་ da-lta dan-ta (now).

The Tibetan block contains 15 vowel-signs, all of which are combining characters.

Standard Tibetan has five vowels, for which there are four characters, since one vowel, a, is inherent in the consonant.

-ི i, -ུ u, -ེ e, -ོ o

Non-inherent vowels are indicated by a single mark attached to and typed after a consonant or consonant stack. In the example སྤྱིར་ ʧí (general) the vowel sign that appears above the stack is typed after the three consonants that make up the stack.

Many of the characters in the Tibetan block are there for transcribing or transliterating non-Tibetan text. The Tibetan script provides for perfect mappings between Sanskrit and Tibetan, but Tibetan is also used to transliterate other languages, such as Chinese, Mongolian and English.

There are a number of consonants, including a range of aspirated consonants, and the following range of retroflex consonants.

GHA, DDHA, DHA, BHA, DZHA, KSSA, TTA, TTHA, DDA, NNA, SSA (each in a head form and a subjoined form)

Additional consonant characters for transliteration.

The retroflex consonants, which are reversed versions of Tibetan consonant shapes, are often used to distinguish loan words from sequences of Tibetan syllables. For example, ཁ་ཎ་ཌ་ kha-ṇa-ḍa (Canada), མོ་ཊ་ mo-ṭa (car).

In transliterated text consonants are sometimes stacked in ways that are not allowed in native Tibetan text.

There are also additional vowel signs between U+0F71 and U+0F7D for Sanskrit transcriptions, and several are compound shapes. The component parts of these compounds should normally be typed individually, rather than using the compound codepoints. The table below shows the characters, and indicates those whose use is discouraged and strongly discouraged.

The six compound consonants GHA, DDHA, DHA, BHA, DZHA and KSSA in the table above, used to represent the Indic consonants during transliteration, can be created by combining a head consonant with a subjoined HA, but the Unicode Standard recommends that the precomposed characters be used in order to maximise effectiveness of transmission and searching. I have suggested that this recommendation be changed in version 7, since many applications silently normalise text to the decomposed sequence.
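The normalisation behaviour mentioned above can be observed with Python's standard unicodedata module. This is a small sketch, not a statement about any particular application: it shows that NFC, which usually composes characters, still leaves these six letters decomposed because they are excluded from composition.

```python
import unicodedata

# The six precomposed letters named above.
PRECOMPOSED = {
    "GHA": "\u0f43",
    "DDHA": "\u0f4d",
    "DHA": "\u0f52",
    "BHA": "\u0f57",
    "DZHA": "\u0f5c",
    "KSSA": "\u0f69",
}

def survives_nfc(ch):
    """True if the character is unchanged by NFC normalisation."""
    return unicodedata.normalize("NFC", ch) == ch

for name, ch in PRECOMPOSED.items():
    decomposed = unicodedata.normalize("NFC", ch)
    print(name, survives_nfc(ch), [f"U+{ord(c):04X}" for c in decomposed])
```

The same happens to the compound vowel sign U+0F73, which normalises to U+0F71 followed by U+0F72, so typing the parts individually gives text that is already in normalised form.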

ར[U+0F62 TIBETAN LETTER RA] at the top of a stack usually has a reduced form, eg. རྐ rka. For transliterations it is sometimes desirable to retain the full form of RA where in Tibetan words it would be reduced.

Usage tip: To do this use ཪ[U+0F6A TIBETAN LETTER FIXED-FORM RA] instead of the normal RA, but only where the normal RA would not produce the full form anyway, ie. do not use it in eg. རྙ rnya, which has the full form already.
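The tip can be sketched as a small helper that picks which RA to put at the top of a stack when the full shape is wanted. The function name is hypothetical, and the set of exceptions is taken from the notes on this page (subjoined NYA and subjoined LA), not from a font or from the Unicode Standard.

```python
RA = "\u0f62"             # TIBETAN LETTER RA
FIXED_FORM_RA = "\u0f6a"  # TIBETAN LETTER FIXED-FORM RA

# Subjoined letters under which ordinary RA already keeps its full form,
# per the notes on this page: NYA (U+0F99) and LA (U+0FB3).
ALREADY_FULL = {"\u0f99", "\u0fb3"}

def full_form_ra_stack(subjoined):
    """Return RA plus the subjoined letter, forcing the full RA shape."""
    top = RA if subjoined in ALREADY_FULL else FIXED_FORM_RA
    return top + subjoined

print([f"U+{ord(c):04X}" for c in full_form_ra_stack("\u0f90")])  # over KA
print([f"U+{ord(c):04X}" for c in full_form_ra_stack("\u0f99")])  # over NYA
```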

By some interpretations, each of the following shapes has a value 0.5 less than the digit within which it appears. Used only in some traditional contexts, they appear as the last digit of a multidigit number, eg. ༤༬ represents 42.5. These are very rarely used, however, and other uses have been postulated. For more information see Numbers that Don't Add Up : Tibetan Half Digits, by Andrew West.
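Under that interpretation, the arithmetic can be sketched with the numeric values that the Unicode character database assigns to these characters (for example, 5/2 for TIBETAN DIGIT HALF THREE). The function name is my own, and the sketch assumes the half digit only ever appears in the last position.

```python
import unicodedata

def parse_half_digit_number(text):
    """Read Tibetan digits where the last one may be a half digit."""
    if not text:
        raise ValueError("empty number")
    value = 0
    for ch in text[:-1]:
        value = value * 10 + unicodedata.digit(ch)  # ordinary digits only
    # numeric() also covers the half digits, which have fractional values.
    return value * 10 + unicodedata.numeric(text[-1])

# DIGIT FOUR (U+0F24) followed by DIGIT HALF THREE (U+0F2C)
print(parse_half_digit_number("\u0f24\u0f2c"))
```

Here the 4 contributes 40 and the half three contributes 2.5, giving the 42.5 mentioned above.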

Glyphs in Tibetan script need to be adapted sometimes to suit the context in which the character is used. A particularly prevalent example is that of the letter ར[U+0F62 TIBETAN LETTER RA]. When used at the top of a stack it has an abbreviated form, as shown by the grey highlight in the example below on the left.

The example on the right shows what a normal RA looks like. This is the same underlying character. The shape is determined by rules in the font.

In pechas, Tibetan text is written inside a visible box which defines the margin of the page. In more recent publications this box may be invisible. Modern publications also use paragraphs. The initial line of a new paragraph may be indented.

Key divisions of the text are sections (or expressions (brjod-pa)) and topics (don-tshan), which do not necessarily equate to English phrases, sentences and paragraphs. Sections normally end with a shay, །[U+0F0D TIBETAN MARK SHAD], followed by a space. Topics (eg. headlines, verses, and longer paragraphs) are often terminated or separated with shay+space+shay.

Unicode provides ༎[U+0F0E TIBETAN MARK NYIS SHAD] as a means of regularising the spacing between the two shad marks, which tends to be slightly bigger than a normal space. The space between the shad marks can be stretched during justification, however, and it's not clear to me how that would work when using NYIS SHAD.

In a Chinese magazine publication I have, most articles contain no double shay as a delimiter. (The text is formatted in paragraphs.) I did find a double shay at the very end of one of the articles, and it was used at the end of each line on a page containing some verse-formatted folk literature. The same appears to apply for large parts of the Bhutanese newspapers I have, however there are other pages with plenty of double shays - some at the end of paragraphs, some inside paragraphs.

A line that ends with the root consonant ཀ[U+0F40 TIBETAN LETTER KA] or ག[U+0F42 TIBETAN LETTER GA] will normally swallow up the shay that immediately follows it, even if there is a vowel sign. For example, where you might expect to see a double shay, you might see ཀུ ། and སྐུ །. However, the shad is not omitted if these characters have a subscript, eg. གྲུ། །.

Examples of tsek not being used before shay, and of U+0F0C being used between NGA and shay.

Users may use an ordinary TSHEG between NGA and SHAD, but Unicode also provides a special non-breaking character that can be used instead, ༌[U+0F0C TIBETAN MARK DELIMITER TSHEG BSTAR]. The word 'delimiter' in the name is a misnomer.

Whitespace in Tibetan text should use U+00A0 NO-BREAK SPACE. Spaces in Tibetan text are usually wider than spaces in English text, and typically only occur after one of the following: །, ༑, ༔ or ཿ. However, numbers and embedded Western text are surrounded by smaller spaces, eg. ལོ་ ༢༠༠༡ ཤིང་བྱ་ཟླ་ ༩ ཚེས་ ༥ ཉིན་. Looks like this is also something that the application needs to take care of.

Normally, Tibetan only breaks after the tsek, and doesn't break after spaces.

Tibetan never breaks inside a syllable, and has no hyphenation. If a word is composed of multiple syllables, it is also preferable to avoid breaking a line in the middle of the word.

Line breaks do not occur after a tsek when it follows ང[U+0F44 TIBETAN LETTER NGA] (with or without a vowel sign) and precedes a shay, །[U+0F0D TIBETAN MARK SHAD]. The Unicode Standard also talks of other instances where Tibetan grammatical rules do not permit a break, but it isn't clear what those are.
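The basic rules (break after a tsek, not after spaces, and not between a tsek and a following shay) can be sketched as follows. This is a simplification of my own, not a full line breaking algorithm: it treats every tsek before a shay as non-breaking rather than checking for NGA, and it ignores the preference for keeping multi-syllable words together.

```python
TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG
SHAD = "\u0f0d"   # TIBETAN MARK SHAD

def break_opportunities(text):
    """Return indexes i where a line may break between text[i-1] and text[i]."""
    breaks = []
    for i in range(1, len(text)):
        # Only a tsek offers a break, and never when a shay follows it.
        if text[i - 1] == TSHEG and text[i] != SHAD:
            breaks.append(i)
    return breaks

# MA + tsek + NGA + tsek + shay: one break, after the first tsek.
print(break_opportunities("\u0f58\u0f0b\u0f44\u0f0b\u0f0d"))
```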

If the character after NGA is an ordinary INTER-SYLLABIC TSHEG, then applications need to ensure that lines do not break between the TSHEG and the SHAD. Text is likely to be more portable if content authors use the TSHEG BSTAR in these locations, instead of the normal TSHEG.
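A content pipeline could make this substitution automatically. The following is a minimal sketch using a regular expression. The function name is hypothetical, and the pattern only covers a plain NGA with optional vowel signs, which is an assumption on my part about what needs matching.

```python
import re

# NGA, optionally followed by vowel signs, then an ordinary tsek that sits
# immediately before a shay (the shay itself is not consumed).
_NGA_TSHEG_SHAD = re.compile("(\u0f44[\u0f71-\u0f7d\u0f80]*)\u0f0b(?=\u0f0d)")

def protect_tsheg_before_shad(text):
    """Replace the tsek between NGA and shay with U+0F0C TSHEG BSTAR."""
    return _NGA_TSHEG_SHAD.sub("\\1\u0f0c", text)

fixed = protect_tsheg_before_shad("\u0f44\u0f0b\u0f0d")
print([f"U+{ord(c):04X}" for c in fixed])
```

A tsek after NGA that is followed by another syllable is left alone, so normal break opportunities are not lost.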

Line breaks and rin chen spungs shad. In Tibetan, especially in pechas, it is considered a special case if the last syllable of an expression that is terminated by a shay breaks onto a new line. In that case the shay or double shay is replaced by rin chen spungs shad, ༑[U+0F11 TIBETAN MARK RIN CHEN SPUNGS SHAD]. At the end of a topic the rules say that only one shay should be converted, ie. ༑ །, however it is moderately popular to convert both, ie. ༑ ༑. This change serves as an optical indication that there is a left-over syllable at the beginning of the line that actually belongs to the preceding line.

This varies in the following cases:

when a line starts with ལེའུ། །, no rin chen spungs shad would be used, since le'u is pronounced as two syllables.

sometimes only the first of two shays is replaced, ie. ༑ །, but this style is considered less attractive.

some printed books do not use rin chen spungs shad replacements, however the majority of books seem to apply the same rules as are used with pechas.

In an environment where the width or content of the page can change, this feature poses a problem for the content author. The application needs to be able to automatically switch between the two styles of shad as a syllable moves on or off a new line when the page is resized or when preceding content is modified.
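To make the problem concrete, here is a rough sketch of the switch an application would need to run after each reflow. It assumes line breaking has already been done and the lines are available as strings, and the function name is my own. It converts only the first shay, and it does not handle the le'u exception or a shay swallowed after KA or GA.

```python
TSHEG = "\u0f0b"
SHAD = "\u0f0d"
RIN_CHEN_SPUNGS_SHAD = "\u0f11"

def mark_leftover_syllables(lines):
    """Swap the first shay for U+0F11 on lines that open with a stray syllable.

    `lines` is the text after line breaking, one string per line.
    """
    result = list(lines[:1])
    for prev, line in zip(lines, lines[1:]):
        shad_at = line.find(SHAD)
        head = line[:shad_at] if shad_at > 0 else ""
        # The previous line must have ended mid-expression, on a tsek.
        continues = prev.rstrip().endswith(TSHEG)
        # A single syllable: some text, with no tsek or space before the shay.
        single = bool(head) and TSHEG not in head and not any(c.isspace() for c in head)
        if continues and single:
            line = head + RIN_CHEN_SPUNGS_SHAD + line[shad_at + 1:]
        result.append(line)
    return result
```

Because the function works from the original text each time, re-running it after a resize restores the ordinary shay when the syllable moves back onto the previous line.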

The Unicode Standard adds: "Not only is rin-chen-spungs-shad used as the replacement for the shay but a whole class of “ornamental shays” are used for the same purpose. All are scribal variants on a rin-chen-spungs-shad, which is correctly written with three dots above it."

Method 1: inter-character spacing. Spacing between all characters should be adapted equally. Note that the width of the white-space character should not be changed significantly, so Tibetan texts use the non-breaking space mentioned above, which doesn't change width on justification.

Method 2: tsek padding. While hand writing, authors add small spaces across the text to get the line end as near as possible to the right margin. Where space remains at the margin, it may be left as is, if it is short. Otherwise, the remaining space will be filled with tseks to make the line as flush as possible with the right margin (there will usually still be a slight raggedness to the right edge of the text).

A page of a booklet showing tsek padding.

There are a couple of detailed rules about the use of tsek padding. Justifying tseks are almost always used when the line ends in a tsek. If, however, the line ends in a shay, there are a number of alternatives.

If the line ends with a single shay the shay is followed by spaces. Tsek padding is never applied after spaces. (See examples in the figure above.)

If the line ends in a double shay (with space between), it is unusual (though possible) to add tsek padding. Instead, the space between the shays is stretched or narrowed. (See examples in the figure below.) The same applies if the second shay was removed because it was preceded by a KA or GA.

Booklet pages showing double shay usage at the end of a line.

Use the control below to see how your browser justifies the text sample here. The gaps are all no-break spaces.

Over and above that described in the previous section, traditional Tibetan text uses very little punctuation, but there are a number of signs and symbols to choose from.

༈[U+0F08 TIBETAN MARK SBRUL SHAD] is used to separate texts that are equivalent to topics and subtopics, such as the start of a smaller text, the start of a prayer, a chapter boundary, or to mark the beginning and end of insertions into text in pechas.

This drul-shay is usually surrounded on both sides by the equivalent of about three non-breaking spaces (though no rule is specified). The drul-shay should not appear at the beginning of a new line and the whole structure of spacing-plus-shay needs to be kept together.

The use of these marks is not straightforward, since they attach to a syllable rather than a character and therefore to place them correctly the application needs to take syllable boundary positions into account. If entered as combining characters they can be added after the vowel-sign in a stack.

Application software has to ignore these characters for text processing, such as search and collation.

These characters may also be used in interspersed commentaries to tag the root text that is being commented on. An alternative is to set the tsek-bar being commented on in large type and the commentary in small type.