On 25.07.2005 at 05:32, Sean Schertell wrote:
>> UTF-8 sucks for Japanese and Chinese texts mainly due to space
>> reasons. If anything makes sense, then it is UTF-16, which
>> Textmate also supports.
> Could you explain what you mean by "space reasons"?
Due to the way UTF-8 works, it uses 1 byte for US-ASCII characters,
but up to four bytes for everything else, depending on the Unicode
code point. Many alphabets can be encoded with two bytes per character
(especially the European ones, but also Hebrew or Arabic). Chinese and
Japanese characters require three bytes, or four for the rare ones
beyond U+FFFF.
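
A quick way to check these byte counts is a small Python sketch.
Python and the sample characters are just my picks for illustration,
and utf8_len is a made-up helper name:

```python
# Sample characters, written as escapes so the file itself stays ASCII.
samples = {
    "A": "Latin capital A (US-ASCII)",
    "\u00e9": "Latin small e with acute",
    "\u05d0": "Hebrew alef",
    "\u0639": "Arabic ain",
    "\u65e5": "CJK ideograph, code point below U+FFFF",
    "\U0002000b": "CJK ideograph, code point above U+FFFF",
}

def utf8_len(ch):
    """Return the number of bytes UTF-8 needs for the string ch."""
    return len(ch.encode("utf-8"))

for ch, name in samples.items():
    print(f"U+{ord(ch):04X} {name}: {utf8_len(ch)} byte(s)")
```
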
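UTF-16, on the other hand, encodes every code point up to U+FFFF in
two bytes and the rest in four (as a surrogate pair). Nearly all
Chinese and Japanese characters in common use fall into the two-byte
range. So that's why for Chinese or Japanese texts you will waste
space when using UTF-8 compared to UTF-16 (three bytes per character
instead of two), while with English and most European languages you
will save a lot of space using UTF-8 compared to UTF-16 (plain
US-ASCII text is half the size). And the latter was IMHO one of the
main reasons for developing UTF-8.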
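
To see the difference on actual text, here is a minimal Python sketch.
The two sample strings are arbitrary, encoded_sizes is a hypothetical
helper, and I use "utf-16-le" so that no byte order mark gets counted:

```python
def encoded_sizes(text):
    """Return (UTF-8 size, UTF-16 size) in bytes, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

english = "Hello, world"          # 12 US-ASCII characters
japanese = "\u65e5\u672c\u8a9e"   # 3 characters, "Japanese language"

for label, text in (("English", english), ("Japanese", japanese)):
    utf8, utf16 = encoded_sizes(text)
    print(f"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

The English sample comes out at 12 bytes in UTF-8 against 24 in
UTF-16, the Japanese one at 9 bytes against 6. Real documents usually
mix in US-ASCII markup, spaces and digits, so the ratio you actually
get depends on the content.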
You can read some more about it at http://en.wikipedia.org/wiki/UTF-8.
Patrice