Created attachment 327885
IBM advisory
Justin Schuh and Tom Cross of the IBM X-Force and Peter Williams of IBM Watson Labs reported this vulnerability to security@mozilla.org.
I have attached the full advisory they provided. I can confirm that this crashes Firefox 2.0.0.15 but not 3.0. I will attach a sample backtrace from the crash I'm seeing, which for me appears to be a null dereference with a small offset. This differs quite a bit from the advisory notes, so it would be great to get some additional eyes on this to figure out what's going on.

In a debug build I get the following assertions:
WARNING: NS_ENSURE_TRUE(IsUTF8(input)) failed, file c:/dev/ff2/mozilla/netwerk/dns/src/nsIDNService.cpp, line 267
###!!! ASSERTION: Not a UTF-8 string. This code should only be used for converting from known UTF-8 strings.: 'Error', file ../../../dist/include/string\nsUTF8Utils.h, line 264
###!!! ASSERTION: not a UTF8 string: 'Error', file ../../dist/include/string\nsUTF8Utils.h, line 151
###!!! ASSERTION: Input wasn't UTF8 or incorrect length was calculated: 'Error', file c:/dev/ff2/mozilla/xpcom/string/src/nsReadableUtils.cpp, line 262
And then an access violation in nsTAString_CharT::Length() where "this" is suspiciously equal to 0x00410041 ("AA") and of course contains garbage.
String code out of control, need to assume the worst.
Should the string code be more robust in the face of non-UTF8 (but thereby slower), or is it the responsibility of the callers (e.g. IDN in this case) to keep non-UTF8 out? It looks like we're taking the latter tack but that seems dangerous.
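To illustrate the caller-side option, here is a minimal sketch of the kind of validity check a caller like the IDN code would need before handing a string to conversion routines that assume well-formed UTF-8. This is purely illustrative (the function name LooksLikeValidUTF8 is hypothetical, and it is not Mozilla's IsUTF8()); for brevity it checks sequence lengths and continuation bytes only, and does not reject overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: reject truncated or malformed UTF-8 before it
// reaches conversion code that assumes validity. Checks lead bytes,
// sequence lengths, and continuation bytes; a real validator would
// also reject overlong encodings and encoded surrogates.
bool LooksLikeValidUTF8(const std::string& s) {
    size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        size_t len;
        if (c < 0x80)                len = 1;
        else if ((c & 0xE0) == 0xC0) len = 2;
        else if ((c & 0xF0) == 0xE0) len = 3;
        else if ((c & 0xF8) == 0xF0) len = 4;
        else return false;                     // stray continuation or invalid lead byte
        if (i + len > s.size()) return false;  // sequence truncated at end of string
        for (size_t j = 1; j < len; ++j) {
            if ((static_cast<unsigned char>(s[i + j]) & 0xC0) != 0x80)
                return false;                  // missing continuation byte
        }
        i += len;
    }
    return true;
}
```

The truncated-sequence case is exactly the one this bug produces: a string that ends partway through a multibyte character fails the `i + len > s.size()` check.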

The demo you were provided has hard-coded offsets for a locally opened file in Firefox 2.0.0.14 on Windows; the shellcode just pops up a command shell and kills the browser. If you need it, I can see if IBM/ISS will let me provide you a proof-of-concept updated for 2.0.0.16. It's easy to do technically, but I don't see the point since even the version with 2.0.0.14 offsets overwrites EIP with the "AA" filler string from the URL.
Also, I just checked the current code to see if the bug is still there. The surrogate pair handling in ConvertUTF16toUTF8::write() has changed a bit between versions, invalidating the proof-of-concept you were provided. However, an exploit may still be possible by landing the last byte of the input on a valid high-surrogate. I haven't looked at the code in a while, so I'm not sure if that vector is feasible from the improperly cracked URL.
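For anyone triaging, the hazard pattern described above can be sketched like this (illustrative only, not the actual ConvertUTF16toUTF8::write(); the helper name is hypothetical). If a converter sees a high surrogate as the final code unit and reads the next unit for the low surrogate without first checking the bound, it reads past the end of the input; the safe behavior is to detect the dangling high surrogate first.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a converter must check for this condition before
// reading buf[i + 1] to complete a surrogate pair, otherwise an input
// whose last code unit is a high (leading) surrogate causes a read past
// the end of the buffer.
bool EndsWithDanglingHighSurrogate(const char16_t* buf, size_t len) {
    if (len == 0) return false;
    char16_t last = buf[len - 1];
    return last >= 0xD800 && last <= 0xDBFF;  // high-surrogate range
}
```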

(In reply to comment #5)
> Should the string code be more robust in the face of non-UTF8 (but thereby
> slower), or is it the responsibility of the callers (e.g. IDN in this case) to
> keep non-UTF8 out? It looks like we're taking the latter tack but that seems
> dangerous.
I agree. The crash happens because nsIDNService checks for invalid UTF-8 in Normalize(), but not in ConvertUTF8toACE(). We can patch that specific case (and maybe that's all we should do for the 1.8 branch), but in general I think we should make the string code handle invalid input better.

(In reply to comment #11)
> Hm, but where does the invalid UTF-8 come from?
>
I provided a detailed explanation of the entire vulnerable code path in the advisory attached to this bug. However, the short answer is that you're handling whitespace inconsistently inside nsBaseURLParser::ParseURL(). So, when it cracks the URL the end of the string lands inside a multibyte character.
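As a toy illustration of that failure mode (not the actual nsBaseURLParser::ParseURL(); the function name is hypothetical): suppose one pass computes the URL's length after stripping one set of "whitespace" bytes, and a later pass cuts the string at that byte count using a different set. A byte-oriented cut ignores character boundaries, so the end of the cracked URL can land inside a multibyte UTF-8 sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: a byte-oriented cut, like a copy bounded by a
// length that an earlier pass computed with a mismatched whitespace
// test. It deliberately ignores UTF-8 character boundaries, so the
// result can end with a truncated multibyte sequence.
std::string CrackAtByteOffset(const std::string& url, size_t cut) {
    return url.substr(0, cut < url.size() ? cut : url.size());
}
```

Cutting one byte short of a URL that ends in a three-byte character leaves a two-byte fragment at the end, which is the invalid UTF-8 the IDN code then receives.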

> However, the short answer is that you're
> handling whitespace inconsistently inside nsBaseURLParser::ParseURL(). So, when
> it cracks the URL the end of the string lands inside a multibyte character.
So why shouldn't we fix ParseURL instead?