We developed Auto-Escape specifically for general purpose template systems; that is, template systems that are for the most part unaware of the structure and programming language of the content on which they operate. These template systems typically provide minimal support for web applications, possibly limited to basic escaping functions that a developer can invoke to help escape unsafe content being returned in web responses. Our observation has been that web applications of substantial size and complexity using these template systems have an increased risk of introducing XSS flaws. To see why this is the case, consider the simplified template below in which double curly brackets {{ and }} enclose placeholders (variables) that are replaced with run-time content, presumed unsafe.

USER_NAME is inserted into regular HTML text and hence can be escaped safely by HTML-escape.

USER_ACCOUNT_URL is inserted into an HTML attribute that expects a URL and therefore in addition to HTML-escape, also requires validation that the URL scheme is safe. By allowing only a safe white-list of schemes, we can prevent (say) javascript: pseudo-URLs, which HTML-escape alone does not prevent.

USER_COLOR is inserted into a Cascading Style Sheets (CSS) context and therefore requires an escaping that also prevents scripting and other dangerous constructs in CSS such as those possible in expression() or url(). For more information on concerns with harmful content in CSS, refer to the CSS section of the Browser Security Handbook.

USER_ID is inserted into a Javascript variable that expects a number as it is not enclosed in quotes. As such, it requires an escaping that coerces it to a number (which a typical Javascript-escape function does not do), otherwise it can lead to arbitrary javascript execution. More variants may be developed to coerce content to other data types, including arrays and objects.

Each of these variable insertions requires a different escaping method or risks introducing XSS. To keep the example small, we excluded several contexts of interest, particularly style tags, HTML attributes that expect Javascript (such as onmouseover), and considerations of whether attribute values are enclosed within quotes or not (which also affects escaping).

Auto-Escape

The example above demonstrates the importance of understanding the precise context in which variables are being inserted and the need for escaping functions that are both safe and correct for each. For larger and complex web applications, we notice two related vectors for XSS:

A developer forgetting to apply escaping to a given variable.

A developer applying the wrong escaping for that variable for the context in which it is being inserted.

Considering the sheer number of templates in large web applications and the number of untrusted content they may operate on, the process of proper escaping becomes complicated and error prone. It is also difficult to efficiently audit from a security testing perspective. We developed Auto-Escape to take that complexity away from the developer and into the template system and therefore reduce the risks of XSS that would have ensued.

A Look at Implementation

Auto-Escape is a functionality designed to make the Template System web application context-aware and therefore able to apply automatically and properly the escaping required. This is achieved in three parts:

We determined all the different contexts in which untrusted content may be returned and provided proper escaping functions for each. This is part science and part practical. For example, we did not find the need to support variable insertion inside an HTML tag name itself (as opposed to HTML attributes) so we did not build support for it. Other factors come into play, including availability of existing escaping functions and backwards compatibility. As a result, part of that work is template system dependent.

We developed our own parser to parse HTML and Javascript templates. It provides methods which can be queried at a point of interest to obtain the context information necessary for proper escaping. The parser is designed with performance in mind, and it runs in a stream mode without look-ahead. It aims for simplicity while understanding that browsers may be more lenient than specifications, particularly in certain corner cases.

We added an extra step into the parsing that the template system already performs to locate variables, among other needs. This extra step activates our HTML/Javascript parser, queries it for the context of each variable then applies its escaping rules to compute the proper escaping functions to use for each variable. Depending on the template system, this step may be performed only the first time a template is used or for each web response in which case some limitations may be lifted.

A simple mechanism is provided for the developer to indicate that some variables are safe and should not be escaped. This is used for variables that are either escaped through other means in source code or contain trusted markup that should be emitted intact.

Current Status

Auto-Escape has been released with the C++ Google Ctemplate for a while now and it continues to develop there. You can read more about it in the Guide to using Auto-Escape. We also implemented Auto-Escape for the ClearSilver template system and expect it to be released in the near future. Lastly, we are in the process of integrating it into other template systems developed at Google for Java and Python and are interested in working with a few other open source template systems that may benefit from this logic. Our HTML/Javascript parser is already available with the Google Ctemplate distribution and is expected to be released as a stand-alone open source project very soon.

For some additional background on context sensitive escaping in HTML, check out OWASP's XSS Prevention Cheat Sheet at http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet. Are there differences with what is recommended there?

So you consider that on large applications, the approach of filtering data when it enters the program is not maintainable. Some dev will forget it, and there is no way to test if all the necessary filters are in place, so you put a big bold filter on the output.

It seems efficient at cleaning the output from xss attacks style, butisn't the remedy worst that the disease ?

If, when it was really necessary devs were forgetting to properly filters inputs, now that there is that filter-on-output system, won't they be even more lazy ?

It seems to me that this approach hardens protection against xss-attacks, and weaken protection against more old school attacks.

@jwilliams: Thanks for the link to the OWASP document, I hadn't seen it before. The different contexts (represented by the individual rules) in that document align well with the ones we targeted, which is great. There are small differences such as handling content in Javascript outside of string literals, CSS in style tags and perhaps more handling of non-quoted attributes but these are all reconcilable. Note that I didn't review the escaping functions themselves.

@olivvv: We didn't write about input validation here as the blog post was already lengthy. We certainly encourage consistent and strong input validation because it helps to prevent not only XSS bugs but also many other correctness and security problems too. There is no reason you can't have both at the same time (input validation and output escaping). That said, in many cases, input validation alone is not a sufficient defense against XSS.

It seems that the inferred filter may be less specific than the actual syntax for the value to be substituted. For instance, the CSS example expects a colour, but the inferred filter will only reject specific unsafe CSS constructs, by the sound of it.

Other than reading the source code, is there any documentation on precisely what each escaping filter does?