Stefan Monnier <monnier at iro.umontreal.ca> writes:
>> The various UTF encodings do not have this particular problem; if a UTF
>> string is valid, then it is a unique representation of a unicode string.
>> However, decoding is still a partial function and can fail.
> And while it is partly true, it is qualified by the problems relative to
> canonicalization (an "é" in Unicode can both be represented as "é" or as two
> chars (an e and an accent) and they should (ideally) compare equal).

In what sense "equal"? They are supposed to be equivalent as far
as the semantics of the text are concerned, but their representations
are clearly different and most programs distinguish them. In particular,
they are different filenames on both Unix and Windows. AFAIK MacOS
normalizes filenames, but using a slightly different algorithm than
Unicode's (perhaps just an older version).

IMHO it makes no sense to pretend that they are exactly the same when
strings consist of code points or lower-level units (and I don't
believe another choice for the default string type would be practical).
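
A minimal sketch of the distinction, in Python and using only the
standard unicodedata module. The variable names are just for
illustration, and the language is an arbitrary choice on my part:

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9)
composed = "\u00e9"
# "e" followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0301"

# As sequences of code points the two strings differ.
same_code_points = composed == decomposed

# Their UTF-8 encodings differ as well, so a file system that does not
# normalize names treats them as two distinct filenames.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# They compare equal only after both are brought to the same
# normalization form (NFC composes, NFD decomposes).
same_after_nfc = (unicodedata.normalize("NFC", composed)
                  == unicodedata.normalize("NFC", decomposed))
same_after_nfd = (unicodedata.normalize("NFD", composed)
                  == unicodedata.normalize("NFD", decomposed))

print(same_code_points, same_after_nfc, same_after_nfd)
print(composed_bytes, decomposed_bytes)
```

So equality "as text" has to be requested explicitly by normalizing
first; plain string comparison works on the code points as given.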
--
__("< Marcin Kowalczyk
\__/ qrczak at knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/