URL Fragments and Redirects

I’ve worked on the Internet Explorer team for six+ years, and on web sites for a decade longer, so I’m understandably excited when I come across a browser behavior I can’t explain. Last week, I encountered such a mystery, and it took me quite a while to figure out what was going on.

Background

Facebook tends to use URL Fragments in their URLs. For instance, a car dealer’s website includes a link to their Facebook page thusly:

http://www.facebook.com/#!/MBofWhitePlains?sk=app_192229990808929

The Fragment component of the URL is the end of the URL from the hash symbol (#) onward. URL Fragments are never sent to the server in the HTTP request; only JavaScript running in the page can see them. So, when your browser loads the URL above, the server sees only “http://www.facebook.com” in the request, and it’s the responsibility of JavaScript in the returned page to examine the URL to find the extra information in the Fragment.
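To make the split concrete, here is a minimal sketch using Python's standard urllib.parse module. The helper name request_target is made up for illustration; it only approximates what a browser puts on the request line versus what it keeps to itself.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> tuple[str, str]:
    """Return (path-and-query sent to the server, fragment kept client-side).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # An empty path is requested as "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target, parts.fragment


url = "http://www.facebook.com/#!/MBofWhitePlains?sk=app_192229990808929"
target, fragment = request_target(url)
print(target)    # what the server is asked for
print(fragment)  # what only script in the page can read
```

Note that the “?sk=…” portion comes after the hash symbol, so it is part of the Fragment rather than a query string, and it never reaches the server either.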

Clicking on the link will go to the specified URL:

…and then script on the page will redirect you to a final page which contains the “MBofWhitePlains” identifier in the URL path, clearing out the URL Fragment.

You may have heard that Facebook now offers an opt-in choice to always use HTTPS when loading Facebook:

If you set this option, Facebook will immediately return an HTTP/302 redirect to an HTTPS page if your browser ever requests a page using HTTP.

That’s a problem for this scenario: because the URL Fragment is never sent to the server, the server sends your browser a redirect to https://www.facebook.com, with no URL Fragment specified. Hence, when the redirected page is loaded, the URL Fragment is blank, and you’re left on the Facebook homepage.

Now, this made perfect sense to me—a simple limitation of the way Facebook is using URLs.

Except for one thing…

While Safari and Internet Explorer both behave as expected, Firefox, Chrome, and Opera were somehow landing on the HTTPS version of the car dealership’s Facebook page—not the homepage. This was a truly surprising outcome, and I spent a ton of time ensuring that the different behavior wasn’t related to Facebook performing User-Agent sniffing and returning different responses, or anything of the sort. It turns out that the code was the same, but the browser behavior was very different.

Peeking behind the curtain

After much debugging, I realized that Firefox, Chrome, and Opera will re-attach a URL Fragment after an HTTP/3xx redirection has taken place, even though that fragment was not present in the URL specified by the Location header on the redirection response.

The HTTP specification (RFC2616 and the active HTTPBIS revision) doesn’t specify proper behavior either, noting only that the behavior when the Location header itself contains a URL Fragment is not defined:

Note: This specification does not define precedence rules for the case where the original URI, as navigated to by the user agent, and the Location header field value both contain fragment identifiers. Thus be aware that including fragment identifiers might inconvenience anyone relying on the semantics of the original URI's fragment identifier.

…although almost all browsers appear to respect a URL Fragment specified on the redirect response. Specifically, if both the original URI and the redirect Location specify a fragment, Internet Explorer, Chrome, Firefox, and Safari will use the Fragment component from the Location header. Opera 11.01 will instead keep the Fragment component from the original URL; it only uses the Fragment component from the Location header if the original URL didn't contain a fragment at all. Opera 11.11 changed that behavior to match Chrome and Firefox.
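The two behaviors for a single redirect can be modeled in a few lines. This is only a sketch of the rules described above, not any browser's actual code; the function name redirect_target and its inherit flag are assumptions introduced here. With inherit=True it models the re-attaching behavior seen in Firefox, Chrome, and Opera; with inherit=False it models the behavior described for Internet Explorer and Safari. In both modes a fragment on the Location header wins.

```python
from urllib.parse import urldefrag, urljoin


def redirect_target(original: str, location: str, inherit: bool = True) -> str:
    """Model where one 3xx hop lands (hypothetical helper, not browser code)."""
    original_base, old_fragment = urldefrag(original)
    # Location may be relative, so resolve it against the original URL.
    resolved = urljoin(original_base, location)
    new_base, new_fragment = urldefrag(resolved)
    if new_fragment:
        # A fragment supplied by the Location header takes precedence.
        return resolved
    if inherit and old_fragment:
        # Re-attach the fragment from the URL the user navigated to.
        return new_base + "#" + old_fragment
    return new_base


original = "http://www.facebook.com/#!/MBofWhitePlains?sk=app_192229990808929"
print(redirect_target(original, "https://www.facebook.com/"))
print(redirect_target(original, "https://www.facebook.com/", inherit=False))
```

The first call keeps the “#!/MBofWhitePlains…” fragment on the HTTPS URL, so the page script can still find it; the second call drops it, which is how you end up on the homepage.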

Interesting stuff.

-Eric

Update: Internet Explorer 10 now preserves the fragment when loading a redirected resource, matching other browsers and the updated standards documents.

Update-to-the-Update: Internet Explorer 10 and IE11 behave differently than other browsers when there's no fragment on the first URL, there is on the first 302, and there's none on a second 302. (Test case)
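For reference, here is a sketch of what hop-by-hop fragment inheritance would yield for that three-step scenario. The example.com URLs and the follow_chain helper are hypothetical, and the code makes no claim about what any particular browser version actually does; it only applies the rule "use the Location fragment if present, otherwise carry forward the current one" at each hop.

```python
from urllib.parse import urldefrag, urljoin


def follow_chain(start: str, locations: list[str]) -> str:
    """Apply fragment inheritance across a chain of redirects (illustrative)."""
    current = start
    for location in locations:
        base, inherited = urldefrag(current)
        # Resolve the next hop, then split off any fragment it carries.
        target, fragment = urldefrag(urljoin(base, location))
        # A fragment on this hop wins; otherwise keep the inherited one.
        fragment = fragment or inherited
        current = target + ("#" + fragment if fragment else "")
    return current


# No fragment on the first URL, a fragment on the first 302, none on the second.
final = follow_chain("http://example.com/start", ["/hop1#frag", "/hop2"])
print(final)
```

Under this rule the fragment introduced by the first redirect survives the second one.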

Unless I'm mistaken, the Location field, as specified by section 14.30 of RFC 2616, is always an absolute URI. RFC3986 specifies that an absolute URI can contain a query string, but not a fragment. IE and Safari's implementation is the correct one, according to these two RFCs.

EricLaw [MSFT]

16 May 2011 7:37 PM

@DanielKi: Yeah, it's interesting. For what it's worth, HTTPBIS is updating RFC2616 to allow for relative URIs, since all browsers support these and many major sites use them.

Have you tested this with regular # (hash) fragments instead of #! (hash bang)? #! is a dirty workaround Google created to better crawl Ajax apps and it could be that Chrome/Firefox/Opera have special logic to reapply #! frags after a redirect since they are essentially a client-side query string.

@Zoompf: Nah, there's nothing special about #! when it comes to their behavior. I added another link to the test case to demonstrate this.

Nick

17 May 2011 12:22 PM

Ha! I love it when the "standards compliant" browsers bend the rules to get behavior which is probably more desirable in many circumstances. Seems to me this is the exact kind of thing that those same people railed against Microsoft on during the IE6 days.... ;)

Like I've said before -- show me someone who claims to have a standards compliant browser, and I'll show you a liar :)

Gustaf

17 May 2011 3:32 PM

Well, it is clear from the HTTPbis comment that they expect fragments from either URI to be used, since they explicitly state (unfortunately) that no precedence is defined when both URIs contain fragments. There would be no need for this comment if the Location header URI fragment overrides the original URI fragment (even with an empty URI).

But they should have stated it, and they should define the precedence as implementations clearly diverge.

@Björn: Thanks! That's a great reference, but it doesn't seem to be based on any Internet Standard. In this case, it cites tools.ietf.org/.../draft-bos-http-redirect-00, but that appears to have been expired before making progress toward standardization. The "Security Considerations" section of that draft also seems a bit sparse, as it imposes upon the web author the requirement that they avoid placing private information in fragments if there's the possibility of a redirection to an untrusted party.

I'll ask Mark whether the HTTPBIS revision can include this.

Ivan Vega

29 Jun 2011 11:35 AM

Great information. I also found out about this having the opposite problem, wanting to clear the URL fragment after a redirect.

I thought it would take me a long time to figure it out, so thanks for the solution!

In the future, the spec could probably be fixed faster and more reliably if this sort of issue was reported in the W3C bug tracker instead of just mentioned in a blog post. Adrian Bateman seems to be the one who reports most HTML5 bugs for Microsoft, so if you find further errors in HTML5, maybe you could ask him to file a bug if you don't want to create a Bugzilla account and so on.

@Aryeh: It's not entirely clear why HTML5 believes that this behavior is in their "jurisdiction", so to speak. The HTTPBIS guys have it tracked as an issue against their update to RFC2616 (and I'm generally inclined to think it more appropriately belongs there). But the jurisdictional issues here are not anything I'm experienced with or interested in.

As to the "speed" question: the problem has been known to the web standards community for 12 years (see the expired draft I linked) but interoperable behavior was never specified in a draft that made it through standardization. Roughly two months after I "just mentioned" the issue in a post, a proposal appears in a Standards-track draft spec in Last Call.

HTML5 has always been a kitchen-sink standard that's willing to include anything related to web browsers. Often some things are defined in other standards to some extent, but defined much more precisely in HTML5, with the HTML5 definition matching how browsers work much more closely. It's pretty typical that standards like HTTP will leave something undefined and HTML5 will explicitly require something.

I'm also not interested in jurisdictional issues, and nor is any of the WHATWG crowd. HTML5 is generally the best standard to follow because it's written by someone who writes very precisely, leaves nothing undefined, and tries to match real-world browser behavior above all else. Who writes what isn't really important as long as it's correct.

And yeah, I looked at the timeline again after I posted and realized your post was handled about as quickly as a bug report would have been. Speed turned out not to be an issue here at all. The risk of not filing a bug report is that the issue might fall through the cracks, but all's well that ends well. I didn't mean to be critical.

Do keep up the good work with IE's standards conformance, by the way. I realize it looks like I'm constantly criticizing when I post here, but IE9 was a quantum leap forward for the web, IE10 is shaping up to be one as well, and I'm trying to help here. I've filed two standards-related bugs in IE Connect, but I never received any response beyond a form letter, and I wasn't even notified of that by e-mail, so I got kind of discouraged. (Although now I see one wound up being fixed, which is good.)

Basically, you actually respond to what I write here a lot of the time, so I take the effort to leave feedback here when I think it would be helpful. Although sometimes posts of mine have silently disappeared, which is discouraging. If I got the kind of response on IE Connect that I got from you, or from Mozilla's bug tracker, I'd file more bugs there and point out issues less here.