Protecting Legacy Web Applications with AntiXSS

Protecting against Cross-Site Scripting (XSS) attacks can be a daunting challenge, especially given some of the creative ways in which hackers bypass common developer defenses against XSS. Happily, Microsoft's Anti-Cross-Site-Scripting Library provides excellent protection from XSS in a way that's both flexible and easy to implement.

Older Websites and the Perils of HTML

I'm currently working on a rewrite of an older, and very successful, site that has collected a large number of members over the years. As the site is essentially social in nature, it has allowed members to create their own profile pages and customize them by adding their own HTML “flair.” Likewise, forums on the site have allowed unfettered use of HTML and markup. Amazingly (and despite the site's successes), it has never attracted hackers bent on launching XSS attacks. As such, the site's continued allowance of HTML from end-users has gone unchecked.

Until the rewrite, of course. There are now well over 20 million different entries on the site where users may have input HTML. Addressing that problem looked fairly daunting because it meant either finding something magical that could clean all of this markup on output, or going through all that data and converting it to a sub-set of approved markup (or some form of light-weight markup specifically designed for forums, comments, and the like). And my experience with most conversion routines is that they take the wrong approach to avoiding XSS: they try to scrub against certain known attack vectors.

On Injection and the Perils of Scrubbing HTML

A better term for scrubbing against known exploits is what security experts call black-listing: developers create lists of unacceptable input that they try to protect against. The problem with this approach, however, is that it's prone to developer error, and hackers are notorious for circumventing scrubbing routines by injecting things like null chars, Unicode exploits, and the like. Consequently, black-listing isn't a security best practice (whether the concern is XSS, SQL Injection, or what have you).
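To make that concrete, here's a minimal sketch (in Python, purely for illustration) of how a black-list scrubber gets circumvented. The naive_scrub function and its pattern are hypothetical and not taken from any real library; the point is only that deleting known-bad strings in a single pass can reassemble the very thing it was meant to remove.

```python
import re

# Hypothetical black-list: delete anything that looks like a script tag.
SCRIPT_TAG = re.compile(r"</?script>", re.IGNORECASE)


def naive_scrub(markup: str) -> str:
    """Strip <script> and </script> in one pass (deliberately flawed)."""
    return SCRIPT_TAG.sub("", markup)


# The obvious attack is caught: only the harmless text survives.
plain = naive_scrub("<script>alert(1)</script>")

# A nested payload is not: removing the inner tags splices the outer
# fragments back together into a working <script> element.
nested = naive_scrub("<scr<script>ipt>alert(1)</scr</script>ipt>")

print(plain)   # alert(1)
print(nested)  # <script>alert(1)</script>
```

Running the scrubber repeatedly until nothing changes would close this particular hole, but the next bypass (odd casing, attributes, encodings) is always one clever input away.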

Instead, the best approach to protecting applications and users from malicious input is to use white-listing approaches, where developers define allowed and acceptable forms of input and only allow input that matches those patterns. A good, non-development example of white-listing versus black-listing would be the case where a parent tries to set up Internet access for a teen or pre-teen. Using a black-listing approach, the parent would try to block sites whose names or content contained certain terms or words deemed inappropriate, which is an insanely daunting task. Conversely, white-listing would be like allowing access to just a handful of trusted sites and blocking access to anything that didn't come from those sites.
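In code, white-listing usually means describing what valid input looks like and rejecting everything else. Here's a small Python sketch, assuming a made-up rule that display names are 3 to 20 ASCII letters, digits, or underscores; the rule and the function name are hypothetical.

```python
import re

# Assumed rule for illustration: 3-20 ASCII letters, digits, or underscores.
DISPLAY_NAME = re.compile(r"[A-Za-z0-9_]{3,20}")


def is_allowed_display_name(value: str) -> bool:
    """Accept only input that matches the white-listed pattern in full."""
    return DISPLAY_NAME.fullmatch(value) is not None


for candidate in ["unicorn_fan42", "<script>", "ab"]:
    print(candidate, is_allowed_display_name(candidate))
```

Notice that the code never has to know what an attack looks like. Anything outside the pattern, including attacks nobody has invented yet, is rejected by default.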

In development circles, combining white-listing with the best practice of separating user input (i.e. data) from control-of-flow language (i.e. your application) goes a long way toward creating secure applications. In the case of XSS, user-supplied data is separated from your application's markup by HTML encoding it on output. Similarly, in the case of SQL injection, using parameters in the code that accesses the database separates user input from the control-of-flow language (SQL) and prevents injection attacks.
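Both kinds of separation can be sketched with nothing but Python's standard library. This is an illustration of the principle, not the code from the site I'm working on; the members table, render_comment, and find_member are made-up names.

```python
import html
import sqlite3


def render_comment(comment: str) -> str:
    """Embed user text in markup; html.escape turns <, >, &, and quotes into entities."""
    return '<p class="comment">' + html.escape(comment, quote=True) + "</p>"


def find_member(conn: sqlite3.Connection, name: str) -> list:
    """Look up a member; the ? placeholder keeps user input out of the SQL text."""
    cursor = conn.execute("SELECT id, name FROM members WHERE name = ?", (name,))
    return cursor.fetchall()


# Throwaway in-memory database with one hypothetical member.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO members (name) VALUES (?)", ("alice",))

print(render_comment("<b>hi</b>"))         # tags come out as inert text
print(find_member(conn, "alice"))          # the real member is found
print(find_member(conn, "' OR '1'='1"))    # the injection attempt is just an odd name
```

In both cases the user's input is only ever treated as data: the browser never parses it as markup, and the database never parses it as SQL.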

Enter Microsoft's AntiXSS Library

I first heard of Microsoft's Anti-Cross-Site-Scripting Library quite a while ago, and given that it's called a library, I assumed it was a rather heavy set of APIs, tools, and other routines that could be used to block XSS in a variety of fashions. Ultimately, that's still an accurate perception of what the AntiXSS Library provides. But what I missed was how insanely easy this library is to use if you're in a situation like mine: I just wanted an easy, and performant, way to white-list user-supplied HTML.

What led me to discover the simplicity of Microsoft's AntiXSS Library was a well-timed (for me) blog post from Phil Haack. He showcased how to replace the HTML encoding functionality within ASP.NET MVC 2.0 with the functionality provided by the AntiXSS Library. Phil's purpose in posting was to showcase just how flexible ASP.NET MVC can be when it comes to encoding, but his examples showed me just how insanely easy it is to take advantage of the AntiXSS Library.

Documentation for the AntiXSS Library includes a whole host of other options and features, IF you need them. In my case, I just needed to use the AntiXss.HtmlEncode() and AntiXss.HtmlAttributeEncode() methods and I was set.
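As I understand it, what sets the library's encoders apart is that they white-list characters: anything outside a known-safe set is emitted as an encoded entity. The Python sketch below illustrates that idea only. It is not the library's code, and both the SAFE_CHARS set and the whitelist_html_encode name are assumptions made up for this example (the real safe list is far larger).

```python
import string

# Assumed safe set for illustration; not the AntiXSS Library's actual list.
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + " .,-_")


def whitelist_html_encode(text: str) -> str:
    """Pass white-listed characters through; encode everything else as &#N;."""
    return "".join(ch if ch in SAFE_CHARS else f"&#{ord(ch)};" for ch in text)


print(whitelist_html_encode("safe text 123"))  # safe text 123
print(whitelist_html_encode("Hi <b>"))         # Hi &#60;b&#62;
```

Contrast this with an encoder that only replaces a short list of known-dangerous characters: here, a character has to be explicitly trusted before it is emitted as-is.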

Furthermore, I didn't want to replace MVC's default Html.Encode functionality, because I still want to fully encode some parts of the site I'm working on while allowing white-listed HTML input from users to be emitted on other parts. Consequently, I created an Html.SafeMarkupEncode() extension method for use within my Views, and I also created a simple static class that wraps calls to the AntiXss object itself, as a replacement/surrogate for the calls to HttpUtility.HtmlEncode (and friends) in my ViewModels.

Then, from there, sit back and watch your legacy data continue to emit safe HTML while blocking malicious content. It's really that easy. (I was expecting to spend a few hours or even days on getting a solution in place. In the end I was up and running with this solution in less than 30 minutes.)
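For readers who want to see the general idea of emitting approved markup while dropping everything else, here's a rough Python sketch built on the standard library's HTMLParser. To be clear, this is not how the AntiXSS Library is implemented, and it's nowhere near production-grade; ALLOWED_TAGS, AllowListSanitizer, and sanitize are hypothetical names chosen for this example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list: a handful of formatting tags, no attributes at all.
ALLOWED_TAGS = frozenset({"b", "i", "em", "strong", "p"})


class AllowListSanitizer(HTMLParser):
    """Re-emit allowed tags (minus attributes) and escaped text; drop other tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")  # attributes such as onclick are dropped

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data))  # text is always re-encoded


def sanitize(markup: str) -> str:
    parser = AllowListSanitizer()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(sanitize("<marquee><b>hi</b></marquee>"))
print(sanitize('<p onclick="steal()">Tom & Jerry</p>'))
```

The output is rebuilt from scratch out of pieces the code explicitly trusts, rather than produced by cutting bad pieces out of the original string. That's the same white-listing mindset applied to markup.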

Conclusion

Even though the AntiXSS Library provides great performance and does a fantastic job of white-listing markup, I'm still of the opinion that forcing end-users to use a sub-set of approved markup would be a better way to go on new sites. In the case of this site, the AntiXSS Library is a total win: it provides excellent protection and great performance, and it avoids the need to dump tons of hours into a complex, customized solution that would have been non-trivial to implement.

But, by the same token, letting end-users throw out full-blown <table> and <div> tags (or even mis-matched </div> tags) still leaves a lot to be desired when looking at a member profile page where there's a non-threatening (at least from an XSS perspective) purple marquee floating through an ugly yellow table with pictures of unicorns everywhere. In other words, sometimes allowing full customization of markup is just overkill, even when something like the AntiXSS Library makes it safe from attack vectors.