What to do with percent-encoded bytes in the host? Examples (prepend
http:// for a full URL):
1. x%2ex
2. %80
3. %41
4. %C3%A9
(Append % to each of them for confusing results, e.g. Opera handles
%41% and %C3%A9% differently.
http://dump.testsuite.org/url/inspect.html can be used for testing.)
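For reference, here is a rough Python sketch of the raw bytes each example turns into once the percent-escapes are converted. It uses the standard library's urllib.parse.unquote_to_bytes only as a convenient stand-in; it is not meant to reflect what any browser does internally, and the names hosts and raw are just for illustration.

```python
from urllib.parse import unquote_to_bytes

# The four example hosts from the list above.
hosts = ["x%2ex", "%80", "%41", "%C3%A9"]

# Map each host to the bytes obtained by converting its percent-escapes.
raw = {host: unquote_to_bytes(host) for host in hosts}

for host, data in raw.items():
    # Try a strict UTF-8 decode so invalid sequences stand out.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None  # not valid UTF-8 (the %80 case)
    print(host, data, text)
```

Note that this particular function leaves a trailing lone % untouched, which is just one of several possible ways to treat the "append %" variants.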
There are a number of different approaches we could take (and most of
these seem to be implemented in one way or another):
1. Convert percent-encoded bytes to bytes, decode as UTF-8.
2. Convert percent-encoded bytes to bytes, but decode as UTF-8 only
those that form a valid sequence.
3. Ignore percent-encoded bytes.
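The three approaches can be sketched roughly as follows in Python. This is only an illustration under a few assumptions of mine: the function names are made up, approach 1 is shown as failing outright on a decoder error (the list does not say what should happen there), approach 2 is shown as leaving invalid bytes percent-encoded, and "ignore" in approach 3 is read as leaving the escapes untouched. None of this claims to match what any particular browser does.

```python
import codecs
from urllib.parse import unquote_to_bytes


def _keep_escaped(err):
    # Hypothetical error handler: put bytes that are not valid UTF-8
    # back into the output in percent-encoded form.
    bad = err.object[err.start:err.end]
    return "".join(f"%{b:02X}" for b in bad), err.end


codecs.register_error("keep-escaped", _keep_escaped)


def decode_all(host):
    # Approach 1: convert every escape to a byte, decode the result as
    # UTF-8. Assumption: an invalid sequence raises UnicodeDecodeError.
    return unquote_to_bytes(host).decode("utf-8")


def decode_valid_only(host):
    # Approach 2: convert escapes to bytes, decode only valid UTF-8
    # sequences and keep everything else percent-encoded.
    return unquote_to_bytes(host).decode("utf-8", errors="keep-escaped")


def leave_escaped(host):
    # Approach 3: ignore percent-encoded bytes, i.e. return the host as-is.
    return host
```

With these definitions, x%2ex becomes x.x under 1 and 2 and stays x%2ex under 3, while %80 fails under 1 and stays %80 under both 2 and 3.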
Chrome seems to do 1 (although with weird results when you hit a decoder error).
Opera and Safari seem to do 2.
Firefox seems to do 3.
Personally I'm leaning towards either 1 (without the weirdness) or 3.
The potential downside is that with either of those you can no
longer transmit bytes above 0x7F over DNS or an equivalent system.
Not sure if that's a problem in practice, as user agents seem to mostly
fail already...
--
http://annevankesteren.nl/