Tibetan and Himalayan Library - THL

Word Boundaries

Tibetan punctuation marks only the boundaries of syllables, not the boundaries of
words. A small dot follows each syllable, separating it from the next (in a few
exceptional cases, a single dot-delimited unit contains two syllables), but the
reader must determine which syllables combine to form a word. The following are
our general principles for rendering Tibetan personal names and technical terms
in the present context:

Monosyllabic words should be rendered as a single word and should not be
combined with other syllables: rkub is “kup,” khyi is “khyi,” and so forth.

Bisyllabic words should be rendered as a single word: lha sa is “Lhasa,”
bsod nams is “Sönam,” sngon po is “ngönpo,” and so forth.

Trisyllabic words are generally rendered as a single word: lha mo skyid is
“Lhamokyi,” and dpal ldan rgyal is “Pendengyel.” Note: be careful not to
combine the syllables of different words. Thus, bod rang skyong ljongs is
“Bö Rangkyong Jong,” bu ston rin chen grub is “Butön Rinchendrup,” and
ye shes ’od is “Yeshé Ö.”

For personal and place names, the first letter of each word should be
capitalized. Thus, “Sönam Rinchen” and not “Sönam rinchen.”

Grammatical particles should be rendered with the word with which they are
construed, usually the preceding word. Thus, chos kyi rnam grangs is
“chökyi namdrang,” gtan la phab pa is “tenla pappa,” and ’gyur med is
“gyurmé.” An example of a particle that precedes the word it modifies:
ma byas pa is “majepa.”
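The joining and capitalization principles above can be sketched in a few lines
of Python. This is a minimal illustration, not the THL conversion program: it
assumes the input has already been divided into words (with each particle
grouped with the word it is construed with) and that every syllable has already
been converted to its phonetic form, so it shows only how syllables combine into
words and how names are capitalized. The `Word` type and the function names are
hypothetical.

```python
from typing import List, Tuple

# Hypothetical input format: each word is (syllables, is_name). The syllables are
# assumed to be already converted to THL phonetics; no conversion happens here.
Word = Tuple[List[str], bool]


def render_word(syllables: List[str], is_name: bool) -> str:
    """Join the syllables of one word into a single unspaced word."""
    joined = "".join(syllables)
    # Personal and place names: capitalize the first letter of each word,
    # leaving the rest of the word unchanged.
    return joined[:1].upper() + joined[1:] if is_name else joined


def render_phrase(words: List[Word]) -> str:
    """Separate words (not syllables) with spaces."""
    return " ".join(render_word(sylls, is_name) for sylls, is_name in words)


# bsod nams rin chen: two words, both part of a personal name.
print(render_phrase([(["sö", "nam"], True), (["rin", "chen"], True)]))
# Sönam Rinchen

# chos kyi rnam grangs: the particle kyi is grouped with the preceding word.
print(render_phrase([(["chö", "kyi"], False), (["nam", "drang"], False)]))
# chökyi namdrang
```

Because the particle is simply part of a word's syllable list, a preceding
particle such as ma in ma byas pa is handled the same way: ["ma", "je", "pa"]
joins to "majepa".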

We will maintain a running list of individual words that are exceptions to the
rules stated above. For instance, the city rgyal rtse is actually pronounced
“Gyantsé” rather than “Gyentsé,” as the rules would dictate. We ask users to
send us such exceptions so that we can incorporate them into the conversion
program.
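One way to incorporate such an exception list into a converter is to consult it
before applying the rules. The sketch below assumes a dictionary keyed by Wylie
spelling and a rule-based converter passed in as a function; `EXCEPTIONS`,
`convert_word`, and the toy stand-in converter are hypothetical, and the only
real data point is the rgyal rtse example from the text.

```python
from typing import Callable, Dict

# Hypothetical exception list keyed by Wylie spelling. The single entry is the
# example given in the text; a real list would be maintained and extended.
EXCEPTIONS: Dict[str, str] = {
    "rgyal rtse": "Gyantsé",
}


def convert_word(wylie: str, rule_based: Callable[[str], str]) -> str:
    """Return a listed exception if there is one; otherwise apply the rules."""
    if wylie in EXCEPTIONS:
        return EXCEPTIONS[wylie]
    return rule_based(wylie)


def toy_rule_based(wylie: str) -> str:
    # Hypothetical stand-in for the rule-based converter: a tiny lookup table
    # holding what the rules would produce for two words.
    table = {"rgyal rtse": "Gyentsé", "lha sa": "Lhasa"}
    return table.get(wylie, wylie)


print(convert_word("rgyal rtse", toy_rule_based))  # Gyantsé
print(convert_word("lha sa", toy_rule_based))      # Lhasa
```

Checking the list first means an exception always wins over the rules, so adding
a word to the list is enough to correct its output.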

Ultimately, we need to apply a word list for the automated recognition of word
boundaries. In the meantime, we will mark up word boundaries so that the program
generates “Sönam Rinchen” rather than “Sönamrinchen.”
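For the automated recognition of word boundaries with a word list, one common
technique (shown here as an illustration, not as THL's method) is greedy
longest-match segmentation: at each position, take the longest run of syllables
that appears in the list, falling back to a single syllable. The word list and
the four-syllable limit below are assumptions. Greedy matching can choose the
wrong split when words overlap, and this sketch does not attach grammatical
particles to their words.

```python
from typing import List, Set

# Hypothetical word list of Wylie spellings; a real list would be far larger.
WORD_LIST: Set[str] = {"bsod nams", "rin chen", "lha sa", "bod", "rang skyong", "ljongs"}
MAX_WORD_SYLLABLES = 4  # assumption: no listed word is longer than four syllables


def segment(syllables: List[str]) -> List[List[str]]:
    """Greedy longest-match segmentation of syllables into words."""
    words: List[List[str]] = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first; a single syllable always matches,
        # so the inner loop always appends a word and advances i.
        for n in range(min(MAX_WORD_SYLLABLES, len(syllables) - i), 0, -1):
            candidate = syllables[i:i + n]
            if n == 1 or " ".join(candidate) in WORD_LIST:
                words.append(candidate)
                i += n
                break
    return words


print(segment("bsod nams rin chen".split()))
# [['bsod', 'nams'], ['rin', 'chen']]
```

Each resulting group could then be rendered as one word, giving two words for
bsod nams rin chen instead of one run-together string.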

Note: at present, the THL Simplified Phonetics system is geared towards
computer-generated output for Tibetan words and phrases. We are working to adapt
it for longer passages and entire texts, such as converting an entire liturgical
text for non-Tibetan speakers who want to chant the liturgy in Tibetan. This is
not yet possible, however, because a computer program has difficulty identifying
word boundaries in Tibetan texts. While we work towards resolving this problem,
our interim solution is for the program to process each syllable individually
and separate the syllables with spaces, ignoring the few rules that depend on
identifying word boundaries (ba becomes wa when it is the final syllable of a
word; é, with its acute accent, is used when it is the final sound of a word).
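The interim, syllable-by-syllable behavior described above can be sketched as
follows. The per-syllable table is a hypothetical stand-in for the real
converter; the point is that, with no word boundaries known, each syllable is
converted on its own and the results are joined with spaces, so the
word-dependent rules cannot be applied.

```python
from typing import Callable, List


def interim_render(syllables: List[str], convert_syllable: Callable[[str], str]) -> str:
    """Convert each syllable on its own and separate the results with spaces.

    Rules that need word boundaries (ba -> wa word-finally, word-final é)
    are not applied, because no word boundaries are known.
    """
    return " ".join(convert_syllable(s) for s in syllables)


# Hypothetical per-syllable table standing in for the real converter.
TOY_TABLE = {"bsod": "sö", "nams": "nam", "rin": "rin", "chen": "chen"}

print(interim_render("bsod nams rin chen".split(), lambda s: TOY_TABLE[s]))
# sö nam rin chen
```

Compare “Sönam Rinchen,” the rendering produced when word boundaries are marked
up.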


THL Simplified Phonetic Transcription of Standard Tibetan, by David Germano