B.2 Unsupported

These languages, when written in the given script, are currently
unsupported by Aspell for one reason or another.

Code   Language Name   Script

ja     Japanese        Japanese
km     Khmer           Khmer
ko     Korean          Han, Hangul
lo     Lao             Lao
th     Thai            Thai
zh     Chinese         Han

B.2.1 The Thai, Khmer, and Lao Scripts

The Thai, Khmer, and Lao scripts present a different problem for
Aspell. The problem is not that there are more than 210 unique symbols,
but that there are no spaces between words. This means that there is no
easy way to split a sentence into individual words. However, it is
still possible to spell check these scripts; it is just a lot more
difficult. I will be happy to work with someone who is interested in
adding Thai, Khmer, or Lao support to Aspell, but it is not likely
something I will do on my own in the foreseeable future.
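
To illustrate why the lack of spaces makes this harder, here is a
minimal sketch of greedy longest-match segmentation against a word
list, written in Python. It is not part of Aspell and is only one of
several possible approaches. The function name segment_longest_match
and the toy Latin-letter word lists are assumptions made for
illustration; they stand in for a real Thai, Khmer, or Lao dictionary.

```python
def segment_longest_match(text, dictionary):
    """Split unspaced text into words by greedy longest match.

    Hypothetical helper for illustration only; not an Aspell API.
    """
    max_len = max((len(word) for word in dictionary), default=0)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one unknown character.
            words.append(text[i])
            i += 1
    return words


print(segment_longest_match("thecatsatonthemat",
                            {"the", "cat", "sat", "on", "mat"}))
print(segment_longest_match("themend", {"the", "them", "mend", "end"}))
```

With the second word list, "themend" is split as "them end" even
though "the mend" is just as valid. A real segmenter would have to
resolve this kind of ambiguity before any spell checking could begin.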

B.2.2 Languages which use Hànzi Characters

Hànzi characters are used to write Chinese, Japanese, and Korean, and
were once used to write Vietnamese. Each hànzi character represents a
syllable of a spoken word and also has a meaning. Since there are
around 3,000 of them in common usage, it is unlikely that Aspell will
be able to support spell checking languages written using hànzi
until full Unicode support is implemented. However, I am not even sure
if these languages need spell checking, since hànzi characters are
generally not entered directly. Furthermore, even if Aspell could
spell check hànzi, the existing suggestion strategy will not work well
at all, and thus a completely new strategy will need to be developed.
However, if it is the case that hànzi needs to be spell checked and
you know something about the issues involved, please feel free to contact
me.

B.2.3 Japanese

Modern Japanese is written in a mixture of hiragana,
katakana, kanji, and sometimes romaji. Hiragana
and katakana are both syllabaries unique to Japan, kanji is
a modified form of hànzi, and romaji uses the Latin alphabet.
With some work, Aspell should be able to check the non-kanji part of
Japanese text. However, based on my limited understanding of Japanese,
hiragana is often used at the end of kanji. Thus if Aspell were to
simply separate out the hiragana from kanji it would end up with a lot
of word endings which are not proper words and will thus be flagged as
misspellings. However, this can be fairly easily rectified, as text is
tokenized into words before it is converted into Aspell’s internal
encoding. In fact, some Japanese text is written entirely in one
script. For example, books for children and foreigners are sometimes
written entirely in hiragana. Thus, Aspell, in its current state, could
prove at least somewhat useful for spell checking Japanese.
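
The naive separation of hiragana from kanji described above can be
sketched by classifying each character by its Unicode block. This is a
Python illustration only, not Aspell's tokenizer. The names script_of
and script_runs are hypothetical, and the ranges cover just the
Hiragana (U+3040 - U+309F), Katakana (U+30A0 - U+30FF), and CJK
Unified Ideographs (U+4E00 - U+9FFF) blocks, ignoring extension blocks
and half-width forms.

```python
from itertools import groupby


def script_of(ch):
    """Classify one character by Unicode block (simplified)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def script_runs(text):
    """Group consecutive characters of the same script into runs."""
    return [(script, "".join(chars))
            for script, chars in groupby(text, key=script_of)]


# The verb "to eat": one kanji followed by a hiragana ending.
print(script_runs("食べる"))
```

For this input the kanji and the hiragana ending land in separate
runs. Checked on its own, the ending is not a word, which is the false
misspelling problem noted above.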

B.2.4 Hangul

Korean is generally written in Hangul or a mixture of Han and Hangul. In
Hangul, individual letters, known as jamo, are grouped together
in syllable blocks. Unicode allows Hangul to be stored in one of three
ways: (A) individual jamo letters (Hangul Compatibility Jamo, U+3130 -
U+318F), (D) decomposed jamo (Hangul Jamo, U+1100 - U+11FF), and (C)
precomposed syllable blocks (Hangul Syllables, U+AC00 - U+D7AF). In order
for Aspell to work with Hangul it needs to be in form A. Unfortunately, the
existing normalization code in Aspell will not be able to adequately
deal with converting Hangul from forms D and C to form A and back again.
However, once this code is written, Aspell should be able to spell check
Hangul without any problem.
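
As a rough sketch of one piece of that conversion, the following
Python code turns precomposed syllable blocks (form C) into decomposed
jamo (form D) using the arithmetic defined by the Unicode Standard. It
is not Aspell code, and the names decompose_syllable and to_form_d are
hypothetical. The further step from form D to form A would need a
separate jamo mapping table and is not shown.

```python
import unicodedata

# Constants from the Unicode Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT       # 11172 precomposed syllables in total


def decompose_syllable(ch):
    """Return the conjoining jamo for one precomposed syllable.

    Characters outside the Hangul Syllables block are returned as-is.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    tail = s_index % T_COUNT  # 0 means no trailing consonant
    jamo = chr(lead) + chr(vowel)
    if tail:
        jamo += chr(T_BASE + tail)
    return jamo


def to_form_d(text):
    """Convert every precomposed syllable in text to decomposed jamo."""
    return "".join(decompose_syllable(ch) for ch in text)


# Cross-check against the standard library's canonical decomposition.
assert to_form_d("한글") == unicodedata.normalize("NFD", "한글")
```

The cross-check at the end shows that for Hangul this arithmetic
matches Unicode canonical decomposition (NFD).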