The abbreviation "NLP" stands for natural language processing. "Natural language" means human language, not a programming language. In this case, we refer to processing the Myanmar language on computers.

Thursday, September 6, 2007

Word Break

A word is a unit of language that carries meaning and consists of one or more morphemes. Typically, a word consists of a root or stem and zero or more affixes. Words can be combined to create phrases, clauses and sentences (Wikipedia). Although English uses spaces to separate words, Myanmar writing does not require spaces between words.

So, word boundaries must be detected for the Myanmar language. Myanmar is a tonal and analytic language. Its writing system is syllabic, so the fundamental building blocks of the written language are syllables. Syllable boundaries can be determined with a rule-based approach.
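As an illustration of what a rule-based approach might look like, here is a minimal sketch in Python. It is one possible rule, not a complete method, and it assumes the text is Unicode-encoded Myanmar (not a legacy font encoding). The rule is: a new syllable starts at a consonant (U+1000 to U+1021) unless that consonant is followed by asat or virama, or is stacked under a preceding virama. The function name syllable_break is hypothetical, and independent vowels, digits, punctuation and some special spellings are not handled.

```python
import re

# Assumption: input is Unicode Myanmar text, not a legacy font encoding.
CONSONANT = "\u1000-\u1021"  # Myanmar consonant letters KA .. A
ASAT = "\u103A"              # marks a syllable-final consonant
VIRAMA = "\u1039"            # marks a stacked (subjoined) consonant

# A consonant starts a syllable unless it is stacked (preceded by virama)
# or is a final consonant (followed by asat or virama).
SYLLABLE_START = re.compile(
    "(?<!" + VIRAMA + ")[" + CONSONANT + "](?![" + ASAT + VIRAMA + "])"
)

def syllable_break(text):
    """Split text into chunks that each begin at a detected syllable start."""
    starts = [m.start() for m in SYLLABLE_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading characters as their own chunk
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

# The word "Myanmar": MA, medial RA, NA, asat, MA, vowel sign AA
myanmar = "\u1019\u103C\u1014\u103A\u1019\u102C"
print(syllable_break(myanmar))
```

Under this rule the example word is cut before the second MA, because the NA in between carries an asat and therefore closes the first syllable. Stacked spellings stay in one chunk, since the written form cannot be split at the virama.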

A word can be formed from one or more syllables.

Syllable breaking, which can be used for sorting, searching, text-to-speech and transliteration, can also serve as a basis for word breaking methods.
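To show how syllables can feed a word breaking method, here is a minimal sketch of dictionary-based longest matching over a list of syllables. Longest matching is only one possible technique and is chosen here for illustration. The function name word_break, the max_len limit and the tiny lexicon are all hypothetical, and a real system would need a proper word list and a way to handle ambiguous cases.

```python
def word_break(syllables, lexicon, max_len=4):
    """Greedy longest matching: join as many syllables as form a known word."""
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink one syllable at a time.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            # A single syllable is always accepted so the loop makes progress.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words

# Hypothetical input: three syllables and a one-word lexicon ("Myanmar").
syllables = ["\u1019\u103C\u1014\u103A", "\u1019\u102C", "\u1005\u102C"]
lexicon = {"\u1019\u103C\u1014\u103A\u1019\u102C"}
print(word_break(syllables, lexicon))
```

Here the first two syllables are joined because together they appear in the lexicon, and the last syllable is left as a word on its own.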

Word breaking can be used for spell checking, grammar checking, translation, line breaking, etc.