Hello everybody,
Thanks to Mark for getting the minutes out so quickly (at least for this
call, compared to the average in the past).
On 2009/06/12 13:01, Mark Nottingham wrote:
> - DRAFT -
>
> W3C/IETF call
>
> 14 May 2009
> URI/IRI coordination
>
> John: it's been escalated to the IAB but no schedule yet.
Good to know. The sooner we know if the IAB has any concerns, what these
concerns are, who has them, and how we can talk to them, the better.
> IDNA bis is
> in a state of confusion but hoping we're making progress.
> ... one of the key questions is what kind of domain names are permitted
> in IRI and URI.
Currently, URIs (RFC 3986) permit ASCII-only domain names and IDNs
encoded in UTF-8 and percent-escaped. There is a backwards-compatibility
warning for the latter. You can test whether that is implemented in browsers at
http://www.w3.org/2004/04/uri-rel-test.html#reg-percent
Opera works. Firefox does something strange: it shows the decoded IDN in
the address field, but then says Address Not Found in the page area,
giving the percent-encoded version. Definitely confusing for users.
IE7 works only for http://www.w%33.org, which isn't really an IDN.
Safari gives a Network Error (dns_unresolved_hostname), but shows the
IDN, unescaped, in the page area, and the xn-- version in the address
bar. Pressing enter in the address bar resolves the page, and changes
the address bar to the real IDN.
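As a rough illustration of what a resolver has to do with such a
percent-escaped host before it can look it up in the DNS, here is a
minimal Python sketch. The function name reg_name_to_dns is made up for
this example. Python's built-in "idna" codec implements the RFC 3490
ToASCII operation but, as far as I know, does not enforce
UseSTD3ASCIIRules, so please read this as an approximation and not as a
description of what any of the browsers above actually do.

```python
from urllib.parse import unquote


def reg_name_to_dns(reg_name: str) -> str:
    """Turn a (possibly percent-escaped) URI reg-name into an ASCII DNS name.

    Hypothetical helper for illustration only.
    """
    # Undo the percent-escapes, interpreting the bytes as UTF-8.
    # errors="strict" raises on escapes that are not valid UTF-8.
    decoded = unquote(reg_name, encoding="utf-8", errors="strict")
    # Apply ToASCII (RFC 3490) to get the "xn--" form used for lookup.
    return decoded.encode("idna").decode("ascii")


# "%33" is just an escaped "3", so this is not really an IDN.
print(reg_name_to_dns("www.w%33.org"))
# A real IDN, given as percent-escaped UTF-8.
print(reg_name_to_dns("r%C3%A9sum%C3%A9.example.org"))
```

The first call gives www.w3.org, the second xn--rsum-bpad.example.org.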
> ... percent encoding in utf-8 in URI would be an issue for IRI as well.
And it is covered in RFC 3987. The IRI spec has the following to say about
the backwards-compatibility issue (apologies for the length):
   Systems accepting IRIs MAY convert the ireg-name component of an IRI
   as follows (before step 2 above) for schemes known to use domain
   names in ireg-name, if the scheme definition does not allow
   percent-encoding for ireg-name:

   Replace the ireg-name part of the IRI by the part converted using the
   ToASCII operation specified in section 4.1 of [RFC3490] on each
   dot-separated label, and by using U+002E (FULL STOP) as a label
   separator, with the flag UseSTD3ASCIIRules set to TRUE, and with the
   flag AllowUnassigned set to FALSE for creating IRIs and set to TRUE
   otherwise.

   The ToASCII operation may fail, but this would mean that the IRI
   cannot be resolved. This conversion SHOULD be used when the goal is
   to maximize interoperability with legacy URI resolvers. For example,
   the IRI

   "http://r&#xE9;sum&#xE9;.example.org"

   may be converted to

   "http://xn--rsum-bpad.example.org"

   instead of

   "http://r%C3%A9sum%C3%A9.example.org".

   An IRI with a scheme that is known to use domain names in ireg-name,
   but where the scheme definition does not allow percent-encoding for
   ireg-name, meets scheme-specific restrictions if either the
   straightforward conversion or the conversion using the ToASCII
   operation on ireg-name result in an URI that meets the scheme-
   specific restrictions.

   Such an IRI resolves to the URI obtained after converting the IRI and
   uses the ToASCII operation on ireg-name. Implementations do not have
   to do this conversion as long as they produce the same result.
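To make the two alternatives in this passage concrete, here is a small
Python sketch of both conversions of an ireg-name. The function names
are my own invention for this mail. Python's "idna" codec stands in for
the ToASCII operation; to my knowledge it does not apply
UseSTD3ASCIIRules, and the sketch does not distinguish the two
AllowUnassigned settings, so it only approximates what the spec asks for.

```python
def ireg_name_percent_encoded(ireg_name: str) -> str:
    """General mapping: percent-encode the UTF-8 bytes of each
    non-ASCII character and leave ASCII characters alone."""
    out = []
    for ch in ireg_name:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def ireg_name_to_ascii(ireg_name: str) -> str:
    """Alternative conversion: ToASCII on each dot-separated label,
    joined again with U+002E (FULL STOP)."""
    labels = ireg_name.split(".")
    # The "idna" codec raises UnicodeError if ToASCII fails for a label.
    return ".".join(label.encode("idna").decode("ascii") for label in labels)


host = "r\u00e9sum\u00e9.example.org"  # the host from the RFC example
print("http://" + ireg_name_to_ascii(host))
print("http://" + ireg_name_percent_encoded(host))
```

This prints the two URIs from the example in the quoted text, first the
xn--rsum-bpad form and then the r%C3%A9sum%C3%A9 form.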
and later down:
   Note: In practice, whether the general mapping (steps 1 and 2) or the
      ToASCII operation of [RFC3490] is used for ireg-name will not be
      noticed if mapping from IRI to URI and resolution is tightly
      integrated (e.g., carried out in the same user agent). But
      conversion using [RFC3490] may be able to better deal with
      backwards compatibility issues in case mapping and resolution are
      separated, as in the case of using an HTTP proxy.

   Note: Internationalized Domain Names may be contained in parts of an
      IRI other than the ireg-name part. It is the responsibility of
      scheme-specific implementations (if the Internationalized Domain
      Name is part of the scheme syntax) or of server-side
      implementations (if the Internationalized Domain Name is part of
      'iquery') to apply the necessary conversions at the appropriate
      point. Example: Trying to validate the Web page at
      http://r&#xE9;sum&#xE9;.example.org would lead to an IRI of
      http://validator.w3.org/check?uri=http%3A%2F%2Fr&#xE9;sum&#xE9;.
      example.org, which would convert to a URI of
      http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9.
      example.org. The server side implementation would be responsible
      for making the necessary conversions to be able to retrieve the
      Web page.
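The validator example in the second note can be traced with a short
Python sketch. Both function names are hypothetical, and I am not
claiming that the W3C validator is implemented this way; the sketch only
shows where the conversion would have to happen. It ignores ports and
userinfo in the nested URI, skips the normalization step of the IRI to
URI mapping, and uses Python's "idna" codec as an approximation of
ToASCII.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """General IRI-to-URI mapping, simplified: percent-encode the UTF-8
    bytes of every non-ASCII character, leave everything else as is."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def target_for_retrieval(request_uri: str) -> str:
    """Hypothetical server-side step: extract the 'uri' query parameter
    and convert the host of that nested URI with ToASCII."""
    query = urlsplit(request_uri).query
    target = parse_qs(query)["uri"][0]  # percent-decoded as UTF-8
    parts = urlsplit(target)
    ace_host = parts.hostname.encode("idna").decode("ascii")
    return urlunsplit(
        (parts.scheme, ace_host, parts.path, parts.query, parts.fragment)
    )


iri = "http://validator.w3.org/check?uri=http%3A%2F%2Fr\u00e9sum\u00e9.example.org"
uri = iri_to_uri(iri)
print(uri)
print(target_for_retrieval(uri))
```

The first line printed is the URI given in the note; the second is
http://xn--rsum-bpad.example.org, which is what the server would
actually have to resolve.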
> ... some coordination with html5 might be useful to be in sync.
Yes. It would be good if interested people subscribed to public-iri@w3.org.
> mnot: this is a serious issue indeed and the html5 has probably
> different ideas.
> ... would it be useful to collect the issues somewhere?
>
> John: yes.
>
> mnot: if you send me what you have, I'm willing to help. will also get
> some input from Thomas as well.
>
> <scribe> ACTION: John to send IRI issues to Mark
>
> <scribe> ACTION: Mark to put IRI issues in wiki
Which wiki is that? There is an issues list for the update of the IRI
spec at http://www.w3.org/International/iri-edit/#Issues.
> <JcK> plh: That is really URI/IRI issues in both case -- the URIs are
> the much harder problem in some ways.
It would definitely be good to get to know more about what people on the
call thought the issues were. It is extremely difficult to guess from
the minutes (a general problem with minutes).
> In some respects, the _only_ IRI
> problem is how much they can be treated as protocol elements and what
> that means
Here is what I wrote on the question of protocol elements vs. user
interface elements a couple of weeks ago in a widely circulated private
thread, slightly adapted:
Some people think that anything including non-ASCII characters is bound
to fail sooner or later, and therefore don't want to have them in any
protocol. Some people think that the more places IRIs get accepted, the
better. Personally, I tend somewhat toward the latter view, but it is
very clear that there are some protocols (and formats) where IRIs are
very appropriate, and others where they are not. Also, I think that the
terms 'protocol element' and 'user interface element' are valuable, but
because there are many protocols, and many user interfaces, it's not the
only distinction that counts.
To mention some specific examples, IRIs (or some variant thereof) are
used in HTML, in places that would probably rather be called 'protocol
elements' than 'user interface elements', with the caveat that HTML
isn't really a protocol but a format.
A strict 'user interface element only' view would prohibit that. But
the fact that IRIs currently work in most browsers, that people use
them, that people expect to be able to copy something from an
address/location field to an href attribute in an HTML document, and so
on, suggests that allowing IRIs in HTML 'protocol elements'
isn't overly harmful. Similar considerations, with less or different
baggage, apply e.g. to XML and Atom.
There may be analogies with other IETF work. For IDNs, you can see
punycode as the protocol element version, and call actual IDNs 'user
interface elements', and that view is certainly correct from a DNS
perspective. But a higher-layer protocol might easily use IDNs directly.
A good example here would be EAI (email address internationalization),
where indeed IDNs are used directly as right hand sides.
I think such a pragmatic view (make sure we know where IRIs are allowed,
and where not; maybe give some guidelines for protocol designers on how
to decide) will prevail in the end.
Regards, Martin.
> [NEW] ACTION: John to send IRI issues to Mark [recorded in
> http://www.w3.org/2009/05/14-ietf-minutes.html#action06]
> [NEW] ACTION: Mark to put IRI issues in wiki [recorded in
> http://www.w3.org/2009/05/14-ietf-minutes.html#action07]
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp