>Subject: Wired 4.09 p. 130: Lost in Translation
>From: David Beroff <d4b@inet.bis.adp.com>
>Interesting 16-bit vs. 32-bit issue for
>characters. (I guess nobody seriously
>considered 24-bit characters?)
>
>Anyway, I have an even more radical idea.
>Could Unicode support variable-length
>characters, so that one or more Unicode
>values would mean "shift"? This would
>allow quite a number of Chinese (etc.)
>characters to be represented in the
>second Unicode byte-pair.
>
>Or am I being way too whimsical?
>
>-- David Beroff <d4b@bis.adp.com>

"Decomposed" Latin characters are an example of variable-length encoding.
In Implementation Level 1 of 10646, restricting oneself to the BMP, all
characters used in text are 16-bit characters, so A and A WITH ACUTE are
the same in that respect: each is exactly one character. When combining
characters are used in Implementation Level 3, characters' identities
cannot be trusted, because A might be A or it might be A WITH ACUTE, or it
might be A WITH ACUTE AND DOT BELOW, or it might be A WITH ACUTE AND
DIAERESIS AND DOT BELOW... Is there a limit?
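
To make the variable length concrete, here is a minimal sketch in Python
using the standard unicodedata module. The helper name spell_out is my own,
for illustration only; it lists the character names in a string, so the
one-character and two-character spellings of A WITH ACUTE can be compared.

```python
import unicodedata

def spell_out(text):
    """Return the Unicode name of every code point in text (hypothetical helper)."""
    return [unicodedata.name(ch) for ch in text]

precomposed = "\u00c1"   # LATIN CAPITAL LETTER A WITH ACUTE: one character
decomposed = "A\u0301"   # A followed by COMBINING ACUTE ACCENT: two characters

print(len(precomposed), spell_out(precomposed))
print(len(decomposed), spell_out(decomposed))

# The character database records how the single character maps to the pair.
print(unicodedata.decomposition(precomposed))
```

Both strings would display as the same letter, yet they differ in length,
which is the variable-length behaviour described above.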

Unicode and 10646 differ in this regard: Unicode assumes Level 3 all the
time. That seems to make software more complex, precisely because you
don't know when A is A and when it is something else, unless your software
keeps checking ahead, and ahead, and ahead until it finds something that's
not combining.
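
The checking ahead can be sketched roughly as follows, again in Python. This
is only an illustration under stated assumptions: the function name
combining_sequences is hypothetical, and it treats a character as combining
when unicodedata.combining() reports a non-zero combining class, which misses
the few combining marks whose class is zero.

```python
import unicodedata

def combining_sequences(text):
    """Group each base character with the combining characters that follow it.

    Assumption: a non-zero canonical combining class means "combining".
    """
    sequences = []
    for ch in text:
        if sequences and unicodedata.combining(ch) != 0:
            # Still inside the current sequence: attach the mark to its base.
            sequences[-1] += ch
        else:
            # Something that's not combining: a new sequence starts here.
            sequences.append(ch)
    return sequences

# A + COMBINING ACUTE ACCENT + COMBINING DOT BELOW, then a plain B.
text = "A\u0301\u0323B"
for seq in combining_sequences(text):
    print(len(seq), [unicodedata.name(ch) for ch in seq])
```

Nothing in the loop bounds how many combining characters may follow one
base, so the software cannot decide what the A is until it reaches the next
non-combining character or the end of the text.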