At 1:35 PM -0400 6/26/00, John Cowan wrote:
>Kevin Regan wrote:
>
>> If it is the usual case that documents are created in the normalized
>> form, then it does not seem like a big issue. What would happen
>> in the case of an editor or application written in Java (Unicode)?
>
>Most people do not have the capability of keyboarding separate accent
>marks anyhow (their keyboards generate the normalized forms).
But this is a gross oversimplification of how users might enter
non-canonicalized characters in a document. An easy example from
plane zero is U+00BC (VULGAR FRACTION ONE QUARTER). Microsoft Word
(and other programs) will insert this into a document as its
uncanonicalized form; Word will even do it behind your back unless
you turn off Word's default "helpful" auto-correction feature. U+00BC
decomposes (by its compatibility mapping) into U+0031 followed by
U+2044 followed by U+0034.
There are dozens of other common cases of easily entered
non-canonical forms, and thousands of less common cases that could
still be found without much effort.
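
For anyone who wants to check the U+00BC case, here is a minimal
sketch using Python's standard unicodedata module. The helper name
code_points is made up for illustration. One caveat: the Unicode
data shipped with Python tags this mapping as a compatibility
decomposition, so it shows up under NFKD rather than NFD; which
normalization form a given spec requires is outside what this
sketch assumes.

```python
import unicodedata

def code_points(s):
    """Hypothetical helper: render a string as a list of U+XXXX labels."""
    return ["U+%04X" % ord(c) for c in s]

quarter = "\u00bc"  # VULGAR FRACTION ONE QUARTER

# Raw decomposition field from the Unicode Character Database.
mapping = unicodedata.decomposition(quarter)

# Compatibility decomposition expands the fraction into three characters.
nfkd = unicodedata.normalize("NFKD", quarter)

# Canonical decomposition leaves the character as it is.
nfd = unicodedata.normalize("NFD", quarter)

print(mapping)
print(code_points(nfkd))
print(code_points(nfd))
```

The same check can be run over any text pasted out of a word
processor to see which characters change under a given form.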
--Paul Hoffman, Director
--Internet Mail Consortium