On 2010/10/23 6:40, Phillips, Addison wrote:
> (co-chair hat OFF)
[Editor hat off]
> Trac issue: http://trac.tools.ietf.org/wg/iri/trac/ticket/23
>
> So here is a slightly different version of the same proposal. Bjoern, does this suitably allay your concerns? I recognize that you would prefer us to eliminate the conversion step altogether, but some feel that the conversion step is not altogether obvious, even if it is intrinsic to what follows.
A background thought: I think it is perfectly fine to put a few strokes
up on a whiteboard (as we just did here at the editorial meeting) that
look like 'Björn', or to send 'Björn' in an email encoded in iso-8859-1
(as in this mail) and claim that these are characters from the
UCS/Unicode (they indeed are in the Unicode repertoire), and
nevertheless still have to do some conversion work to a Unicode form.
This wasn't obvious to me last time we discussed this issue.
Regards, Martin.
> ---
> An IRI or IRI reference is a sequence of characters from the UCS. For IRIs that are not already encoded in Unicode (as when written on paper, read aloud, or represented in a text stream using a legacy character encoding), convert the IRI to Unicode. Note that some character encodings or transcriptions can be converted to or represented by more than one sequence of Unicode characters. Ideally the resulting IRI would use a normalized form, such as Unicode Normalization Form C [UAX15] (see [Section 5] Normalization and Comparison), since that ensures a stable, consistent representation that is most likely to produce the intended results. Implementers and users are cautioned that, while denormalized character sequences are valid, they might be difficult for other users or processes to reproduce and might lead to unexpected results.
> ---
>
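
To make the conversion step in the quoted text concrete, here is a minimal
Python sketch. It assumes the IRI text arrives as bytes in a known legacy
encoding (iso-8859-1 here, as in this mail); the function name
legacy_bytes_to_nfc is hypothetical and only for illustration. It decodes
the bytes to Unicode and then applies Normalization Form C, which the
proposal suggests as the ideal form but does not require.

```python
import unicodedata


def legacy_bytes_to_nfc(raw: bytes, encoding: str = "iso-8859-1") -> str:
    """Hypothetical helper: decode legacy-encoded text, then normalize to NFC.

    Assumes the caller knows the source encoding; nothing here guesses it.
    """
    text = raw.decode(encoding)                 # legacy bytes -> Unicode characters
    return unicodedata.normalize("NFC", text)   # canonical composition (NFC)


# 'Bjoern' with o-umlaut as a single iso-8859-1 byte (0xF6).
latin1_bytes = b"Bj\xf6rn"
composed = legacy_bytes_to_nfc(latin1_bytes)

# The same name transcribed as 'o' followed by COMBINING DIAERESIS (U+0308).
decomposed = "Bjo\u0308rn"

# Both are valid Unicode sequences, but they differ code point by code point
# until they are normalized to the same form.
same_before = composed == decomposed
same_after = composed == unicodedata.normalize("NFC", decomposed)
```

The two spellings compare unequal as raw code point sequences and equal
after NFC, which is the "more than one sequence of Unicode characters"
situation the proposed text warns about.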
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp