Nov 24, 2012

In academic research a problem is solved when it is fully understood and a solution is shown to work in a practical setting. If we define "XSS solved" as every instance of XSS eradicated from earth we will probably not see a solution in our lifetime. So, from a research perspective, is XSS solved already?

"XSS Is Solved"

One of the break-out sessions was on XSS. The day before, someone had voiced the opinion that XSS is already solved. The break-out session took the claim seriously and hashed it out.

From a principled standpoint, which is the typical standpoint of academic research, a problem like XSS is solved when a) we fully understand the problem and its underpinnings, and b) we have a PoC solution that is practical enough to be rolled out and has the potential to solve the problem fully.

Do we fully understand XSS and its underpinnings?

Important Papers on XSS

Looking at recent publications we arrived at the following short list that we felt summarizes how academia understands XSS today:

The conclusion was that yes, we think the understanding of XSS is fairly good. But we lack a definition of XSS that would summarize this understanding and allow new attack forms to be deemed XSS or Not XSS.

Current Definitions of XSS

Can you believe that? We still don't have a reasonable definition of XSS.

But it can easily be shot down. Do we need "web pages" to have XSS? Does an attack have to be "viewed by other users" to be XSS? More importantly, the Wikipedia definition doesn't say whether the attackers' scripts have to be executed, or in what context. With default CSP in place you can still inject the script into a page, right? With sandboxed JavaScript you can both inject and execute without causing an XSS attack. And what about these "attackers"? Can they be compromised trusted third parties, legitimate users of the system, or even clumsy business partners?
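To make the inject-versus-execute distinction concrete, here is a minimal sketch in Python. The function name, the page template, and the specific policy string are assumptions made up for illustration; they are not taken from any of the definitions discussed. The point is only that the server can emit an injected script element while a CSP-enforcing browser refuses to run it.

```python
import html

# Hypothetical policy for illustration: only same-origin script files may run,
# so a CSP-enforcing browser refuses to execute inline <script> blocks.
CSP_HEADER = ("Content-Security-Policy", "script-src 'self'")


def render_comment_page(comment: str, escape: bool) -> tuple[dict[str, str], str]:
    """Return (headers, body) for a toy page that echoes a user comment."""
    shown = html.escape(comment) if escape else comment
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        CSP_HEADER[0]: CSP_HEADER[1],
    }
    body = f"<!doctype html><title>Comments</title><p>{shown}</p>"
    return headers, body


payload = "<script>alert(1)</script>"

# Unescaped: the script element is injected into the markup. Whether it
# executes is decided by the browser's CSP enforcement, not by this code.
headers, injected_body = render_comment_page(payload, escape=False)

# Escaped: the payload ends up as visible text, so no script element is injected.
_, escaped_body = render_comment_page(payload, escape=True)
```

In the unescaped case the injection has happened by any reasonable reading, yet a definition that requires execution would not count it as XSS in a browser that enforces the header.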

OWASP says "Cross-Site Scripting attacks are a type of injection problem, in which malicious scripts are injected into the otherwise benign and trusted web sites. Cross-site scripting (XSS) attacks occur when an attacker uses a web application to send malicious code, generally in the form of a browser side script, to a different end user."

Again "web sites" seem to be a prerequisite, but are they? Here the injected scripts have to be "malicious", but do they? And does the target web site have to be "benign and trusted"? OWASP, just like Wikipedia, fails to state that the injected script has to be executed. Then OWASP changes its mind and says XSS happens when an attacker "uses a web application to send malicious code". Clearly, this widens the scope beyond JavaScript. But look at that sentence and imagine Alice using gmail.com to send an email to Bob containing a malicious code sample. By that wording, Alice has done XSS, since she used a web application to send malicious code.

I know I'm nit-picking here. Neither Wikipedia nor OWASP has proposed an academic definition of XSS. They're trying to be pedagogical and reach out to non-appsec people.

But we still need a (more) formal definition. To be clear, we need a definition of XSS that allows us to say if a certain vulnerability or attack is XSS or not. Without such a definition we cannot know if countermeasures such as CSP "solve XSS" or not.

Also, Dave Wichers brought up an interesting detail at this year's OWASP AppSec Research conference in Athens. We need to redefine reflected XSS, stored XSS, and DOM-based XSS as server-side XSS (reflected and stored) and client-side XSS (reflected and stored).

Current, insufficient categorization of XSS.

Proposed new categorization of XSS.
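As a rough sketch, the proposed categorization can be read as a two-by-two grid: where the untrusted data is handled unsafely (server or client) crossed with whether the payload is reflected or stored. The Python names below are hypothetical, and the mapping from the three traditional names onto the grid is my own reading rather than something stated in Dave's talk.

```python
from enum import Enum


class Sink(Enum):
    """Where the untrusted data is placed into the page unsafely."""
    SERVER = "server-side"
    CLIENT = "client-side"


class Persistence(Enum):
    """Whether the payload travels with the request or is stored first."""
    REFLECTED = "reflected"
    STORED = "stored"


def categorize(sink: Sink, persistence: Persistence) -> str:
    """Name one cell of the proposed two-by-two grid."""
    return f"{sink.value} {persistence.value} XSS"


# Assumed mapping of the traditional names onto the grid (illustrative only):
# the old "DOM-based" label covers both client-side cells.
LEGACY_TO_GRID = {
    "reflected XSS": [(Sink.SERVER, Persistence.REFLECTED)],
    "stored XSS": [(Sink.SERVER, Persistence.STORED)],
    "DOM-based XSS": [
        (Sink.CLIENT, Persistence.REFLECTED),
        (Sink.CLIENT, Persistence.STORED),
    ],
}

all_cells = [categorize(s, p) for s in Sink for p in Persistence]
```

Read this way, the old three-way split hides the fact that the client-side cases can also be either reflected or stored.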

A New Candidate Definition of XSS

To get the juices flowing at the castle we came up with a candidate definition of XSS that the rest of the participants could shoot down.

Candidate definition of XSS: An XSS attack occurs when a script from an untrusted source is executed in rendering a page.

It was shot down thoroughly, in part by yours truly :).

Terms more or less undefined in the candidate definition:

Script. JavaScript, any web-enabled script language, or any character sequence that sort of executes in the browser?

Untrusted. What does trusting and not trusting a script mean? Who expresses this trust or distrust?

Source. Is it a domain, a server, a legal entity such as Google, or the attacker multiple steps away in the request chain?

Executed. Relates to "Script" above. Does it mean running on the JavaScript engine, invoking a browser event, issuing an HTTP request, or what?

Rendering. Does rendering have to happen for an attack to be categorized as XSS?

Page. Is a page a prerequisite for XSS? Can XSS happen without a page existing?
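One way to see why these terms matter is to write the candidate definition down as a predicate and feed it a few cases. The sketch below is only an illustration under assumptions: the `Observation` fields and the way each case is labeled are one possible reading of the undefined terms, which is exactly the problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One possible encoding of the terms in the candidate definition."""
    is_script: bool
    source_trusted: bool
    executed: bool
    during_page_rendering: bool


def is_xss_by_candidate_definition(o: Observation) -> bool:
    """A script from an untrusted source is executed in rendering a page."""
    return (
        o.is_script
        and not o.source_trusted
        and o.executed
        and o.during_page_rendering
    )


# Injected inline script that runs while the page loads.
inline_script = Observation(True, False, True, True)

# Injected onclick handler: it runs only when the user clicks, which under
# this reading happens after rendering has finished.
click_handler = Observation(True, False, True, False)

# Injected script that the browser refuses to run, e.g. because of CSP.
blocked_script = Observation(True, False, False, True)

verdicts = {
    "inline_script": is_xss_by_candidate_definition(inline_script),
    "click_handler": is_xss_by_candidate_definition(click_handler),
    "blocked_script": is_xss_by_candidate_definition(blocked_script),
}
```

Under this reading the click handler falls outside the definition even though most people would call it XSS. Change the interpretation of "rendering" and the verdict flips.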

So Is XSS Solved?

Back to the original question. The feeling at Dagstuhl was that CSP is the mechanism we're all betting on to solve XSS. Not that it's done in version 1.0, not even 1.1. But it's a workhorse that we can use to beat XSS in the long run.

What we need right now is a satisfactory definition of XSS. That way we can find the gaps in current countermeasures (including CSP) and get to work on filling them. Don't be surprised if the gaps are fairly few and academic researchers start saying "XSS is solved" within a year. Hey, they need to work on application security problems of tomorrow, not the XSS plague in all the legacy web apps out there.

Please chip in by commenting below. If you can give a good definition of XSS, even better!

3 comments:

Good post! Interesting papers, need to reserve some time for reading them more thoroughly. Especially XSS-4, as described in Dave's paper (https://www.owasp.org/images/c/c5/Unraveling_some_Mysteries_around_DOM-based_XSS.pdf) requires more discussion.

How about something like this for the definition: An XSS attack occurs when a script originating from an attacker is executed in the victim's browser under the security context of a target site.

XSS should be, as the name suggests, about a script being run. Other similar kinds of "scriptless" injections probably require their own definitions.

Actually, in my opinion the term "security context" should be extended a bit. E.g. what if a script is launched within a sandboxed iframe, but a password prompt box appears hovering over the target site?

To be a bit more academic, I think DOM-based XSS could also be categorized as "instant" or "non-instant"/"event-driven" XSS, the latter whenever an event such as onmouseover or onclick is required to launch the attack. "Static" and "dynamic" are also worth considering.

What do you think, should XSS-4 also be split into various subcategories based on where it's stored, e.g.:
- HTML5 local storage
- persistent cookie
- browser plugin
- browser bookmark (???)

Actually, the URL that initiates a reflected XSS attack is also usually stored somewhere (e.g. email client, another site, the same site), unless of course the attacker was able to type it directly into the location bar. This may be confusing for an academic researcher.

Is it relevant to specify different XSS types based on where the attack originates and how it is launched? Or is that irrelevant from the definition point of view, so that they are just different attack vectors? I mean, e.g. reflected XSS that exploits a browser vulnerability is hard to solve, but it's still XSS by the definition.

How about this: should we call it an XSS attack if a site loads e.g. maliciously modified scripts from known ad/tracking third-party domains? This very common vulnerability is known as cross-site script include, but is the attack still XSS or something else like "malicious scripting by trusted 3rd party"?

And more generally, is it still XSS if scripts are specifically allowed to be included under the security context of a site without encoding, and the attacker makes the script do something malicious?

I agree that to get an answer to "Is XSS solved", XSS first has to be demystified. With stricter definitions we can say more specifically which kind of XSS a particular "XSS prevention solution" addresses.