The statement that "it is impossible for a format to perform identically
in terms for instance of compactness or processing efficiency for a
language that can be entirely captured using a single byte per character
and for one that requires a multi-byte encoding" is untrue. It is
certainly possible to provide equally compact and efficient data for
languages like English and languages like Chinese. To do so, simply
choose an encoding form such as UTF-32 that does not favor one over
the other.
Such an encoding is suboptimal for English, but it would absolutely
have the characteristic that English and Chinese would be treated
equally efficiently.
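To make the UTF-32 point concrete, here is a minimal Python sketch. The
sample strings and the helper name `bytes_per_char` are illustrative
choices, not part of the discussion above. It counts encoded bytes per
character. The "utf-32-le" codec is used so that the 4-byte byte order
mark Python's plain "utf-32" codec prepends is not counted.

```python
def bytes_per_char(text: str, encoding: str) -> float:
    """Average number of encoded bytes per code point in `text`."""
    return len(text.encode(encoding)) / len(text)

english = "hello"   # illustrative sample: ASCII only
chinese = "你好世界"  # illustrative sample: four CJK ideographs

for label, sample in [("English", english), ("Chinese", chinese)]:
    # UTF-32 spends a fixed 4 bytes on every code point, whatever the script.
    print(label, bytes_per_char(sample, "utf-32-le"))
```

Both samples come out at exactly 4.0 bytes per character. That is
wasteful for English, but it is the same for both languages.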
The point of human language neutrality is precisely to avoid
favoring one language or script over another. UTF-8 encodes each ASCII
character in one byte but most Chinese characters in three, which
makes it an inappropriate choice here. UTF-32 is the most neutral, but
as a practical matter, I suspect no one would be too peeved by UTF-16,
which uses two bytes for English letters and for commonly used Chinese
characters alike. That's probably the most reasonable compromise for
textual data.
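The same kind of byte count can compare UTF-8 and UTF-16. This is a
rough Python sketch, again with illustrative sample strings and a
hypothetical helper `bytes_per_char`. It measures only encoded size,
not processing speed.

```python
def bytes_per_char(text: str, encoding: str) -> float:
    """Average number of encoded bytes per code point in `text`."""
    return len(text.encode(encoding)) / len(text)

samples = {"English": "hello", "Chinese": "你好世界"}  # illustrative samples
# The "-le" codec avoids counting the byte order mark that Python's plain
# "utf-16" codec prepends.
encodings = ["utf-8", "utf-16-le"]

table = {
    enc: {label: bytes_per_char(text, enc) for label, text in samples.items()}
    for enc in encodings
}
for enc, row in table.items():
    print(enc, row)
```

UTF-8 gives 1.0 byte per character for the English sample and 3.0 for
the Chinese one. UTF-16 gives 2.0 for both. The parity holds only for
characters in the Basic Multilingual Plane. Rarer ideographs outside it,
such as those in CJK Extension B, take four bytes (a surrogate pair) in
UTF-16. This is one reason UTF-16 is a compromise rather than fully
neutral.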
--
Elliotte Rusty Harold elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim