Ian Hickson wrote:
> On Sun, 20 Jul 2008, Jonas Sicking wrote:
>> Ian Hickson wrote:
>>> On Sat, 19 Jul 2008, Jonas Sicking wrote:
>>>> According to the HTML5 spec, space is a valid character inside URLs.
>>> That wasn't intentional -- can you point to where it says that? The HTML5
>>> spec relies on spaces not being allowed in URLs in various places.
>> In section 2.3.2 (Parsing URLs):
>>
>> # Add all characters with codepoints less than or equal to U+0020 or
>> # greater than or equal to U+007F to the <unreserved> production.
>
> This is in the context of:
>
> # 2. Parse url in the manner defined by RFC 3986, with the following
> # exceptions:
>
> It isn't defining what's allowed. What's allowed is defined in the earlier
> section:
>
> # A URL is a valid URL if at least one of the following conditions holds:
> # ...
>
> ...which basically just says it's a valid URL if it's a valid URI or IRI
> (with some caveats in the case of IRIs to prevent legacy encoding
> behaviour from handling valid URLs in a way that contradicts the IRI
> spec). This doesn't allow spaces.
Hmm... I'm confused. From your answer and Maciej's, it sounds like the
algorithm doesn't specify what is valid, only how a URL is parsed. What is
the difference? Is validity just what an 'AC header validator' would
complain about?
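If I read the distinction right, it would look something like the rough
sketch below. The names looks_valid and lenient_path are made up for
illustration, the character check is only an approximation of the
RFC 3986 grammar (it does not check where each character appears), and
Python's urlsplit merely stands in for a lenient UA parser.

```python
import re
from urllib.parse import urlsplit

# Approximation only: characters RFC 3986 permits somewhere in a URI
# (unreserved, reserved, and percent-encoded triplets). Not the full grammar.
_RFC3986_CHARS = re.compile(
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)

def looks_valid(url: str) -> bool:
    """Hypothetical validator-style check: reject any disallowed character."""
    return _RFC3986_CHARS.fullmatch(url) is not None

def lenient_path(url: str) -> str:
    """Hypothetical UA-style handling: parse whatever was given."""
    return urlsplit(url).path

url = "http://example.org/a b"
print(looks_valid(url))    # a validator would complain about the space
print(lenient_path(url))   # a lenient parser still produces a path
```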
If so, that doesn't really buy much as far as forwards compatibility
goes. We have to be backwards compatible with what UAs accept, not what
validators accept.
However, doing something like what Maciej suggests, stopping the URL
parser at the first whitespace character, sounds like it would solve the
forwards compat issue.
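A minimal sketch of that idea, to make sure we mean the same thing. The
helper name truncate_at_whitespace is hypothetical, the set of characters
treated as whitespace is my assumption, and so is skipping leading
whitespace before looking for the end of the URL.

```python
import re

# Assumption: "whitespace" means space, tab, LF, FF and CR.
_WS_CHARS = " \t\n\f\r"
_WHITESPACE = re.compile(r"[ \t\n\f\r]")

def truncate_at_whitespace(value: str) -> str:
    """Return the part of value up to the first whitespace character.

    Leading whitespace is skipped first (an assumption, not from the
    spec), so the result is the first whitespace-delimited token.
    """
    stripped = value.lstrip(_WS_CHARS)
    match = _WHITESPACE.search(stripped)
    return stripped if match is None else stripped[:match.start()]

# Anything after the URL is ignored by the URL parser, which leaves room
# for a future syntax to append extra tokens.
print(truncate_at_whitespace("http://example.org/a some-future-token"))
```

With this, a current UA would see only the URL, and whatever follows the
first whitespace character stays available for later extensions.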
That said, if the HTML5 algorithm only considers the same URLs valid as
RFC 3986 does, is there a reason not to point directly to RFC 3986
instead? It seems like there is no reason to have more relaxed error
handling here than needed.
/ Jonas