Proposed text changes

6.4 IRIs

An IRI (Internationalized Resource Identifier) within an RDF graph is a Unicode string [UNICODE] that conforms to the syntax defined in RFC 3987 [IRI]. IRIs are a generalization of URIs [URI]. Every absolute URI and URL is an IRI.

IRIs in the RDF abstract syntax MUST be absolute, and MAY contain a fragment identifier.

Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 of [IRI]. Further normalization MUST NOT be performed when comparing IRIs for equality.
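
As a minimal sketch of what this rule means in practice, equality reduces to a plain code-point comparison. The example is in Python, and the function name iris_equal is hypothetical rather than part of any specification.

```python
def iris_equal(a: str, b: str) -> bool:
    """Compare two IRIs character by character, with no normalization."""
    # Python's == on str compares code points; nothing is case-folded,
    # %-decoded, or Unicode-normalized before the comparison.
    return a == b


# Identical strings are equal.
print(iris_equal("http://example.org/a", "http://example.org/a"))
# Differences in case, %-encoding, or Unicode form make the IRIs distinct.
print(iris_equal("http://example.org/a", "http://EXAMPLE.org/a"))
print(iris_equal("http://example.org/%7Euser", "http://example.org/~user"))
print(iris_equal("http://example.org/caf\u00e9", "http://example.org/cafe\u0301"))
```

Only the first call returns True. The other three pairs might dereference to the same resource, but they are different IRIs as far as an RDF graph is concerned.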

NOTE: When IRIs are used in operations that are only defined for URIs, they must first be converted according to the mapping defined in section 3.1 of [IRI]. A notable example is retrieval over the HTTP protocol. The mapping involves UTF-8 encoding of non-ASCII characters, %-encoding of octets not allowed in URIs, and Punycode-encoding of domain names.
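
The three steps named in this note can be sketched roughly as follows. This is an illustration under stated assumptions, not a complete implementation of section 3.1 of [IRI]: the function name iri_to_uri and the SAFE character set are hypothetical, userinfo in the authority is ignored, and Python's built-in "idna" codec (which implements the older IDNA 2003 rules and lowercases the host) stands in for the Punycode step.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Assumption: characters left untouched by the %-encoding step. "%" is kept
# so that escapes already present in the IRI are not encoded a second time.
SAFE = "/?:@!$&'()*+,;=%~-._"


def iri_to_uri(iri: str) -> str:
    """Map an absolute IRI to a URI (simplified sketch, no userinfo)."""
    parts = urlsplit(iri)
    # Punycode-encode the domain name, label by label.
    netloc = (parts.hostname or "").encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    # quote() UTF-8 encodes non-ASCII characters, then %-encodes the octets.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE)
    fragment = quote(parts.fragment, safe=SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://bücher.example/straße?q=é"))
# prints http://xn--bcher-kva.example/stra%C3%9Fe?q=%C3%A9
```

The converted URI is only for use with URI-based operations such as HTTP retrieval. The IRI stored in the graph stays as it was.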

NOTE: Some concrete syntaxes permit relative IRIs as a shorthand for absolute IRIs, and define how to resolve the relative IRIs against a base IRI.
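
For illustration, resolution against a base IRI can be approximated with Python's urllib.parse.urljoin. This is a sketch only: urljoin implements reference resolution for URIs, each concrete syntax defines its own resolution rules, and the base and relative references below are made-up examples.

```python
from urllib.parse import urljoin

# Hypothetical base IRI, e.g. the location of the document being parsed.
BASE = "http://example.org/data/graph.ttl"

# Each relative reference is a shorthand; only the resolved absolute IRI
# would appear in the RDF abstract syntax.
for relative in ["#me", "other.ttl", "../people#alice"]:
    print(relative, "->", urljoin(BASE, relative))
```

With this base, "#me" resolves to http://example.org/data/graph.ttl#me and "../people#alice" resolves to http://example.org/people#alice.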

NOTE: Previous versions of RDF used the term “RDF URI reference” instead of “IRI” and allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, ‘“’ (double quote), and “ ” (space). In IRIs, these characters must be percent-encoded as described in section 2.1 of [URI].
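
A minimal sketch of how legacy data might be migrated: the function below percent-encodes exactly the ten characters listed in this note and leaves everything else untouched. The names LEGACY_CHARS and escape_legacy_chars are hypothetical.

```python
# The characters allowed in RDF URI references but not in IRIs:
# < > { } | \ ^ ` " and space.
LEGACY_CHARS = '<>{}|\\^`" '


def escape_legacy_chars(uriref: str) -> str:
    """Percent-encode the characters that IRIs no longer allow unescaped."""
    # All ten characters are ASCII, so each maps to a single %XX triplet.
    return "".join(
        f"%{ord(ch):02X}" if ch in LEGACY_CHARS else ch for ch in uriref
    )


print(escape_legacy_chars("http://example.org/my file<1>"))
# prints http://example.org/my%20file%3C1%3E
```

Because IRI equality is a plain string comparison, the escaped form is a different identifier from the original string, so the replacement would have to be applied consistently across a dataset.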

NOTE: Interoperability problems can be avoided by minting only IRIs that are normalized according to Section 5 of [IRI]. Non-normalized forms that should be avoided include:

Notable consequences

1. The characters “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, ‘“’ (double quote), and “ ” (space) were allowed in URIrefs, and are not allowed in IRIs, so any data containing these characters *unescaped* is now invalid. Data containing these characters in %-encoded form is fine.

2. There was a note stating that URIrefs are compatible with the anyURI datatype. This is no longer the case, as anyURI allows the characters above but IRIs do not, so the note is simply removed.

3. A note said: “The use of %-escaped characters in RDF URI references is strongly discouraged.” This is a problem: many completely reasonable URIs cannot be expressed as IRIs without %-encoding, for example this one: http://google.com/search?q=rdf%20semantics … I removed the note, and subsumed it into another note that discourages the use of %-encoding *iff the unencoded character is allowed in an IRI*.
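
The distinction drawn in item 3 can be illustrated with a small check that flags only those %-escapes whose unencoded character would have been allowed anyway. This is a partial sketch under stated assumptions: it looks only at the ASCII "unreserved" characters of [URI] (letters, digits, "-", ".", "_", "~"), it does not recognise multi-octet UTF-8 escapes that decode to non-ASCII characters allowed in IRIs, and the name needless_escapes is hypothetical.

```python
import re

# ASCII characters that never need %-encoding in a URI or IRI.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def needless_escapes(iri: str) -> list:
    """Return the %XX triplets that encode an ASCII unreserved character."""
    found = []
    for match in re.finditer(r"%([0-9A-Fa-f]{2})", iri):
        if chr(int(match.group(1), 16)) in UNRESERVED:
            found.append(match.group(0))
    return found


# "%7E" encodes "~", which could have been written directly.
print(needless_escapes("http://example.org/%7Euser"))
# "%20" encodes a space, which is not allowed unencoded, so nothing is flagged.
print(needless_escapes("http://google.com/search?q=rdf%20semantics"))
```

The first call reports ['%7E'] and the second reports an empty list, which matches the intent of the revised note: %-encoding is discouraged only where it is avoidable.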