User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2pre) Gecko/20081022 Minefield/3.1b2pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2pre) Gecko/20081022 Minefield/3.1b2pre
When you load a page in Firefox the location bar makes some magic and "decodes" the URL, making %c3%b6 for instance appearing as "ö". All this is fine, but when I just go to the location bar and press ENTER the encoding gets screwed up.
What I think Firefox does wrong:
It doesn't remember what encoding the page loaded had (UTF-8) and therefore just encodes it as ISO-8859-1 or something like this. It should "remember" the encoding.
Reproducible: Always
Steps to Reproduce:
1.Load the URL in the URL field, see that the textbox to the right contains "höjning"
2.Go to the location bar and press ENTER
3.See how the text box changes to h�jning
Actual Results:
The URL encoding changed
Expected Results:
The URL encoding stays the same

Presumably this happens because network.standard-url.encode-query-utf8 is false, which seems to be bug 333859. I guess this is a duplicate of that bug. Some context is in bug 261929.
IE 7 has the same bug, but doesn't automatically decode visited URIs, so it's more difficult to hit (you need to paste in the UTF-8 string). I wonder whether we should avoid decoding the query part...

Oops, misunderstood Gavin, who's proposing that we mitigate in Firefox.
The whole point of encoding the query strings is so that when someone types something like "höjning" into their location bar, it matches what they expect. If I'm understanding this right, though, when they then find the thing that looks as expected, hitting ENTER won't have predictable results.
To me that implies that not decoding the query string part will just mean that we won't get the right sort of smart location bar matches, which is a different problem.
I think the best solution is to have this fixed by bug 333859 ...

(In reply to comment #6)
> The whole point of encoding the query strings is so that when someone types
> something like "höjning" into their location bar, it matches what they expect.
> If I'm understanding this right, though, when they then find the thing that
> looks as expected, hitting ENTER won't have predictable results.
If the site expects it's parameters to be UTF-8 encoded, and the character is in the query string, yeah.
> To me that implies that not decoding the query string part will just mean that
> we won't get the right sort of smart location bar matches, which is a different
> problem.
I don't think the awesomebar is affected by this either way, since we explicitly decode URIs there.

(In reply to comment #7)
> I don't think the awesomebar is affected by this either way, since we
> explicitly decode URIs there.
I was wrong here... we store the correctly encoded URI, assuming you reached the site by actually submitting the search or clicking a link rather than just entering the URL in the URL bar, but when you recall that entry in the awesomebar we decode it again, triggering this bug.

Looks like it's windows-only because LoadURI() takes a wstring, and somehow before that gets passed to NS_NewURI that is assumed to be in the platform native character set (which is UTF-8 on Linux and Mac)?

Bug 393246 will fix this. It will have fewer side-effects than bug 333859.
Bug 393246 was backouted because of bug 432836. But I don't think bug 432836 is a valid claim because even Firefox 2 does not "work" if the system codepage is not Windows-1252.

Instead of waiting for a fix for bug 393246 or bug 333859, I see the simple solution to just only decode URLs when encoding them again would lead to the same URL.
For example when we encode using the system charset, only decode URLs that are in the system charset, too. If we encode only in UTF-8, only decode when the source URL is in UTF-8, too.

Comment on attachment 569980[details][diff][review]
Part 2: Use UTF-8 flag in the browser
>diff --git a/browser/base/content/tabbrowser.xml b/browser/base/content/tabbrowser.xml> <method name="loadOneTab">
>+ <parameter name="aIsUTF8"/>> <method name="addTab">
>+ <parameter name="aIsUTF8"/>
Please don't add to the actual parameter lists - just make any consumers that need to pass this parameter use the argument-object form (looks like they already all are).
>diff --git a/browser/base/content/urlbarBindings.xml b/browser/base/content/urlbarBindings.xml> function loadCurrent() {>+ if (!this.valueIsTyped)
>+ flags |= Ci.nsIWebNavigation.LOAD_FLAGS_URI_IS_UTF8;
Can you add a comment explaining the reasoning here? "If the value wasn't typed, we know that we decoded the value as UTF-8 (see losslessDecodeURI)" or somesuch.
>diff --git a/browser/base/content/utilityOverlay.js b/browser/base/content/utilityOverlay.js>@@ -194,16 +194,17 @@ function openLinkIn(url, where, params) >+ var aIsUTF8 = params.isUTF8;
Looks like you only implemented support for this for where=="tab" or "current" - this won't work if where=="window" (and probably "save"). That's probably not a big deal but it's worth a comment at least.
Looks good otherwise, will r+ with those addressed.

Created attachment 571337[details][diff][review]
Part 2: Use UTF-8 flag in the browser, v2
Resolved review comments.
> Looks like you only implemented support for this for where=="tab" or "current" - this won't work if where=="window" (and probably "save"). That's probably not a big deal but it's worth a comment at least.
Yeah, I couldn't find a way to open the non-typed url to a new window or save it from the location bar (not from the dropdown which works already). Added a comment.

Comment on attachment 571557[details][diff][review]
An automated test
>diff --git a/browser/base/content/test/Makefile.in b/browser/base/content/test/Makefile.in>+ browser_bug461304.js \
Please name the test "browser_urlbarEnter.js" instead (we'd like to discourage naming test files browser_bugXXXXXX).
>diff --git a/browser/base/content/test/browser_bug461304.js b/browser/base/content/test/browser_bug461304.js>+let gFocusManager = Cc["@mozilla.org/focus-manager;1"].
>+ getService(Ci.nsIFocusManager);
I think you could remove this from this test (and the associated checks), since that's already covered in other tests.
>+function locationBarEnter(aEvent, aClosure) {
>+ setTimeout(function() {
You can use executeSoon for this instead of setTimeout. But I wonder whether you need to do this here - can you do it only for the cleanup and runNextTest call that is in runNextTest? E.g.:
function runNextTest() {
...
addPageShowListener(function() {
locationBarEnter(test.event, function() {
test.check(tab);
executeSoon(function () {
// Clean up
while (gBrowser.tabs.length > 1)
gBrowser.removeTab(gBrowser.selectedTab)
runNextTest();
});
r=me with those addressed, thanks for the test!

Created attachment 571660[details][diff][review]
Test, for checkin
> But I wonder whether you need to do this here - can you do it only for the
> cleanup and runNextTest call that is in runNextTest? E.g.:
Without using executeSoon here, "current tab" case unexpectedly pass the test. I don't know the reason why.
Other points are addressed.

(In reply to Masatoshi Kimura [:emk] from comment #26)
> > But I wonder whether you need to do this here - can you do it only for the
> > cleanup and runNextTest call that is in runNextTest? E.g.:
> Without using executeSoon here, "current tab" case unexpectedly pass the
> test. I don't know the reason why.
"current tab" case unexpectedly pass the test even if unpatched.