> The document does sort of say that IRIs must be valid IRIs, as does
> rdf-concepts so it is a matter of how prominently to say it.
>
> Turtle:
> [[
> 6.2 RDF Term Constructors
>
> production type
> IRIREF IRI
>
> The characters between "<" and ">" are unescaped¹ to form the
> unicode string of the IRI.
> ]]
>
> so it says IRIREF produces an IRI and hence conformance checking is
> done. It's prominent though.
>
>
> Did you mean to say it's NOT prominent? At any rate, I think it's
> debatable. That passage could also be read to mean that the only thing
> required to produce an IRI is unescaping the stuff between '<' and '>'.
>
> -Alex
Sorry - yes I did mean not prominent and yes the exact meaning is not
immediately clear. I was just finding a place where it seems to rule
out non-IRIs.
> That makes sense. So what's the point of the IRIREF pattern being
> something more complex than /<[^ \t\n\r>]*>/ ? (Or even /<[^>]*>/, or
> -- if you have the nongreedy operator -- just /<.*?>/)
>
>
> I think the grammar as written is a happy compromise of rejecting input
> that is obviously not an IRI since it contains illegal characters,
> without introducing the full-blown complexity of RFC3987. Keeping in
> mind that not all environments will have access to an IRI library, I
> don't think it's appropriate to allow absolutely everything within the
> <> brackets.
+1
We had this debate in SPARQL 1.0, and the decision was exactly that: at
least reject impossible characters in the grammar token rule. And now it
enforces the \u rules in the tokenizer, which is good at such things.
Andy
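
For illustration, here is a minimal Python sketch of the trade-off
discussed above. It assumes the IRIREF token rule as given in the Turtle
grammar (no characters in U+0000-U+0020, none of < > " { } | ^ ` \, plus
\uXXXX and \UXXXXXXXX escapes) and compares it with the permissive
/<[^>]*>/ pattern. The names is_iriref and is_permissive are hypothetical
helpers, not part of any library, and neither performs full RFC 3987
validation.

```python
import re

# UCHAR escape: \uXXXX or \UXXXXXXXX with hex digits (assumed from the Turtle grammar).
UCHAR = r'(?:\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})'

# IRIREF token rule: '<', then characters outside U+0000-U+0020 and
# outside < > " { } | ^ ` \, or a UCHAR escape, then '>'.
IRIREF = re.compile(r'<(?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r')*>')

# The permissive alternative raised in the thread.
PERMISSIVE = re.compile(r'<[^>]*>')


def is_iriref(token: str) -> bool:
    """Return True if the whole token matches the IRIREF token rule."""
    return IRIREF.fullmatch(token) is not None


def is_permissive(token: str) -> bool:
    """Return True if the whole token matches the permissive pattern."""
    return PERMISSIVE.fullmatch(token) is not None


if __name__ == "__main__":
    samples = [
        "<http://example.org/a>",        # plain IRI
        "<http://example.org/a b>",      # contains a space
        "<http://example.org/\\u00E9>",  # well-formed \u escape
        "<http://example.org/\\x>",      # backslash that is not a \u or \U escape
        "<http://example.org/{id}>",     # braces are excluded by the token rule
    ]
    for s in samples:
        print(s, is_iriref(s), is_permissive(s))
```

The token rule rejects the space, the stray backslash and the braces,
all of which the permissive pattern lets through. It is still only a
character-level filter: a token such as <:::> matches it even though it
would presumably fail a full RFC 3987 check, which is the compromise
described above.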