I'm entirely satisfied with this response.
#g
--
At 17:17 12/05/04 +0900, Martin Duerst wrote:
>Hello Graham,
>
>I have removed the uri list for this issue, because it's
>really iri-specific.
>This is issue
>
>At 12:02 04/05/10 +0100, Graham Klyne wrote:
>
>>Section 3.1:
>>
>>There is a subtlety here that is not obvious to one not well-versed in
>>Unicode specifics:
>>[[
>> Variant B) If the IRI is in some digital representation (e.g. an
>> octet stream) in some known non-Unicode character encoding:
>> Convert the IRI to a sequence of characters from the UCS
>> normalized according to NFC.
>>
>> Variant C) If the IRI is in a Unicode-based character encoding
>> (for example UTF-8 or UTF-16): Do not normalize. Move directly
>> to Step 2.
>>]]
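A rough Python sketch of how Variants B and C might be dispatched, assuming the IRI arrives as bytes with a known encoding label. The function name iri_to_ucs and the set of "Unicode-based" encodings are hypothetical choices for illustration. Both branches return decoded UCS characters, which is what Step 2 would then operate on:

```python
import codecs
import unicodedata

# Hypothetical, non-exhaustive set of canonical codec names treated as
# "Unicode-based" for Variant C.
UNICODE_ENCODINGS = {"utf-8", "utf-16", "utf-16-le", "utf-16-be", "utf-32"}

def iri_to_ucs(data: bytes, encoding: str) -> str:
    """Sketch of Variants B and C: return the UCS character sequence
    that Step 2 would be applied to."""
    chars = data.decode(encoding)
    # codecs.lookup() canonicalizes aliases, e.g. "UTF8" -> "utf-8".
    if codecs.lookup(encoding).name in UNICODE_ENCODINGS:
        # Variant C: Unicode-based encoding, so do not normalize.
        return chars
    # Variant B: known non-Unicode encoding, so normalize to NFC.
    return unicodedata.normalize("NFC", chars)

# windows-1258 bytes for "Viet Nam" with e-circumflex + combining dot below
print(iri_to_ucs(b"Vi\xea\xf2t Nam", "windows-1258") == "Vi\u1ec7t Nam")  # True
```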
>>
>>This raises two questions in my mind:
>>
>>(a) what is the implication of this NFC stuff; I think a brief example
>>would help.
>
>Non-Unicode encodings are, to varying degrees, prone to variability when
>transcoding. For example, when transcoding from the windows-1258
>charset (Vietnamese), you can either transcode codepoint-by-codepoint,
>or you can normalize. Vietnam, for instance, is written
> Vi&#x1EC7;t Nam
>i.e. a single "LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW" in
>Unicode (in particular NFC/NFKC), whereas in windows-1258, you have
>to use the following characters:
> Vi&#xEA;&#x323;t Nam
>i.e. "LATIN SMALL LETTER E WITH CIRCUMFLEX" followed by
>"COMBINING DOT BELOW", because the character &#x1EC7; just
>cannot be encoded in windows-1258. Similar issues exist
>with all other 8-bit encodings for Vietnamese. Encodings
>for other languages are also affected, but to a lesser extent.
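The Vietnamese example can be checked with Python's standard library. This sketch assumes Python's cp1258 codec (alias "windows-1258"), in which byte 0xEA is LATIN SMALL LETTER E WITH CIRCUMFLEX and 0xF2 is COMBINING DOT BELOW. Codepoint-by-codepoint transcoding keeps two characters, while NFC composes them into U+1EC7:

```python
import unicodedata

# "Viet Nam" as windows-1258 bytes: 0xEA = U+00EA, 0xF2 = U+0323
raw = b"Vi\xea\xf2t Nam"

# Codepoint-by-codepoint transcoding keeps base letter + combining mark.
by_codepoint = raw.decode("cp1258")
print([f"U+{ord(c):04X}" for c in by_codepoint[2:4]])  # ['U+00EA', 'U+0323']

# NFC composes them into the single precomposed character.
nfc = unicodedata.normalize("NFC", by_codepoint)
print(f"U+{ord(nfc[2]):04X}")  # U+1EC7
print(len(by_codepoint), len(nfc))  # 9 8
```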
>
>I have added a note using this example.
>
>
>>(b) by saying "Move directly to Step 2" it sounds as if this is saying
>>that step 2 should be operated directly on the "Unicode-based character
>>encoding" rather than on the UCS characters, which I don't think is what
>>you intend. I think something like this is intended:
>>[[
>> Variant C) If the IRI is in a Unicode-based character encoding
>> (for example UTF-8 or UTF-16): Do not normalize. Apply step 2
>> directly to the encoded Unicode character sequence.
>>]]
>
>This is a helpful clarification, and a good catch, which I have
>integrated (capitalizing 'Step' in 'Step 2').
>
>
>I have tentatively closed this issue; please tell me if the
>above changes address your issue.
>
>Regards, Martin.
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact