Cross-Site Scripting (XSS) attacks are amongst the most common types of attacks against web applications. XSS attacks all fall under the same category; however, a closer look at the techniques employed during XSS operations reveals a multitude of tactics that exploit a variety of attack vectors. A detailed look at XSS attacks can be found in the following article: Cross-Site Scripting attack.

This article guides you through the two most common and useful XSS prevention mechanisms: Filtering and Escaping.

Filtering for XSS

All XSS attacks infect your web site via some form of User Input. XSS attack code could come from a simple <FORM> submitted by your users, or could take a more complex route such as a JSON script, XML web service or even an exploited cookie. In all cases the web developer should be aware that the data is coming from an external source and therefore must not be trusted.

The simplest form of XSS protection is to pass all external data through a filter that removes dangerous keywords, such as the infamous <SCRIPT> tag, JavaScript commands, CSS styles and other dangerous HTML markup (such as tags that contain event handlers).

Many web developers choose to implement their own filtering mechanisms; they usually write server-side code (in PHP, ASP, or some other web-enabled development language) to search for keywords and replace them with empty strings. I have seen lots of code that uses regular expressions to do this filtering and replacing. The technique is not bad in itself. Unfortunately, attackers often have more experience with it than web developers do, and they regularly circumvent simple filters by using techniques such as hex encoding, Unicode character variations, line breaks and null characters in strings. All of these techniques must be catered for, which is why it is recommended to use a library that has been tried and tested by the community at large.
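
To illustrate why simple keyword filters fail, here is a minimal Python sketch. The naive_filter function and both payloads are hypothetical and written for this illustration only; they are not taken from any real library. The filter strips <script> tags with a single regular expression pass, and both payloads get through it.

```python
import re

def naive_filter(text: str) -> str:
    """Hypothetical home-grown filter: delete <script> and </script> in one pass."""
    return re.sub(r"</?script>", "", text, flags=re.IGNORECASE)

# Payload 1: a tag nested inside a tag. Deleting the inner tag
# reassembles the outer one.
payload_nested = "<scr<script>ipt>alert(1)</scr</script>ipt>"

# Payload 2: a line break before the closing bracket. The pattern expects
# ">" immediately after "script", so it never matches.
payload_newline = "<script\n>alert(1)</script\n>"

print(naive_filter(payload_nested))
print(repr(naive_filter(payload_newline)))
```

The first payload comes out of the filter as a complete script element, and the second passes through unchanged. Patching each case by hand quickly turns into an arms race, which is the argument for a maintained library.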

There are many libraries to choose from, and your choice will primarily depend on the backend technology that your web server uses. What is important is that you choose a library that is regularly maintained by a reliable source. XSS techniques keep changing and new ones emerge all the time, so your filters will need to be updated periodically to keep abreast of the changing attacks.

If you are using Java, then a good place to go is XSS Protect, a project hosted on Google Code. It claims to filter all “known” XSS attacks from HTML code. PHP boasts a more comprehensive library called HTML Purifier, which is licensed as Open Source and can be customised depending on your needs. HTML Purifier also boasts strict standards compliance and better features than other filters.

Another interesting library you can use is HTML Markdown, which converts text from your users into standard and clean XHTML. The advantage is that minimal HTML markup (such as bold, underline and colours) can still exist in your users’ input. HTML Markdown is a Perl library and does not explicitly advertise XSS prevention features, so it probably should not be your only line of defence.

A side effect of these filtering techniques is that legitimate text is often removed because it matches one or more of the forbidden keywords. For example, I would not be able to publish this article if the blogging software I used filtered out all my HTML tags. I would not be able to write things like <SCRIPT> and alert('you have been hacked'), as these would be filtered out and you would not see them. If you want to preserve the original data (and its formatting) as faithfully as possible, you need to relax your filters and employ HTML, script and CSS escaping techniques, all of which I explain in the next section.

Escaping from XSS

Escaping is the primary means of disabling an XSS attack. When you escape data, you are effectively telling the browser that the data you are sending should be treated as data and should not be interpreted in any other way. If an attacker manages to put a script on your page, the victim will not be affected, because the browser will not execute the script if it is properly escaped.

Escaping has been used to construct this article. I have managed to bring many scripts into your browser, but none of these scripts has executed! The technique used to do that is called escaping or, as the W3C calls it, “Character Escaping”.

In HTML you can escape a dangerous character by using the &# sequence followed by its character code and a terminating semicolon.

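An escaped < character looks like this: &#60; and an escaped > character looks like this: &#62; (note the terminating semicolon). Below is a list of common escape codes for HTML:

& is escaped as &#38;
< is escaped as &#60;
> is escaped as &#62;
" is escaped as &#34;
' is escaped as &#39;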
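
The numeric codes can also be generated programmatically. The following Python sketch is only an illustration: numeric_escape and the DANGEROUS character set are hypothetical names chosen here, and a real application should rely on an escaping library instead.

```python
# Hypothetical set of characters commonly treated as dangerous in HTML text.
DANGEROUS = "&<>\"'"

def numeric_escape(text: str) -> str:
    """Replace each dangerous character with its &#NN; character reference."""
    return "".join(f"&#{ord(ch)};" if ch in DANGEROUS else ch for ch in text)

# Print the escape code for every character in the set.
for ch in DANGEROUS:
    print(ch, "is escaped as", numeric_escape(ch))

print(numeric_escape("<SCRIPT>"))  # &#60;SCRIPT&#62;
```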

Escaping HTML is fairly easy; however, to properly protect yourself from all XSS attacks you also need to escape JavaScript, Cascading Style Sheets, and sometimes XML data. There are also many pitfalls if you try to do all the escaping by yourself. This is where an escaping library comes in useful.

The two most popular escaping libraries available are ESAPI, provided by OWASP, and AntiXSS, provided by Microsoft. ESAPI can plug into various technologies such as Java, .NET, PHP, Classic ASP, ColdFusion, Python, and Haskell. AntiXSS exclusively protects Microsoft technologies and is therefore better suited to an all-Microsoft environment. Both libraries are constantly updated to keep up with the latest hacker techniques and are maintained by industry experts who understand changing tactics and emerging technologies such as HTML5.

When to Escape

You cannot simply escape everything, or else your own scripts and HTML markup will not work, rendering your page useless.

There are several places on your web page which you need to ensure are properly escaped. You can use your own escaping functions (not recommended), or you can use the existing ESAPI and AntiXSS libraries.

Use HTML Escaping when…

Untrusted data is inserted in between HTML opening and closing tags. These are standard tags such as <BODY>, <DIV>, <TABLE>, etc.

For example:

<DIV> IF THIS DATA IS UNTRUSTED IT MUST BE HTML ESCAPED </DIV>
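
As a minimal sketch of this rule in server-side code, the Python standard library provides html.escape, which emits named references such as &lt; rather than numeric codes; the browser treats both forms the same way. The render_comment helper is a hypothetical name used only for this example.

```python
import html

def render_comment(untrusted_comment: str) -> str:
    """Hypothetical template helper: place user text between <div> tags."""
    # quote=True also escapes double and single quotes.
    return "<div>" + html.escape(untrusted_comment, quote=True) + "</div>"

print(render_comment("<script>alert('you have been hacked')</script>"))
```

The browser displays the script as text inside the DIV instead of executing it.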

Use JavaScript Escaping when…

Untrusted data is inserted inside one of your scripts, or in a place where JavaScript can be present. This includes certain attributes such as STYLE and all event handlers such as ONMOUSEOVER and ONLOAD.

For example:

<SCRIPT>alert('IF THIS DATA IS UNTRUSTED IT MUST BE JAVASCRIPT ESCAPED')</SCRIPT>

<BODY ONLOAD="IF THIS DATA IS UNTRUSTED IT MUST BE JAVASCRIPT ESCAPED">
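
One common approach to JavaScript escaping is to replace every character that is not an ASCII letter or digit with a \xHH or \uHHHH escape sequence. The Python sketch below assumes that approach; js_escape is a hypothetical helper and not the API of ESAPI or AntiXSS. It is only meant for data that ends up inside a quoted JavaScript string literal, as in the alert example above.

```python
def js_escape(text: str) -> str:
    """Escape text for use inside a quoted JavaScript string literal."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            # ASCII letters and digits are safe as they are.
            out.append(ch)
        elif ord(ch) < 256:
            # Everything else below 256 becomes a \xHH escape.
            out.append(f"\\x{ord(ch):02x}")
        else:
            # Other characters become one \uHHHH escape per UTF-16 code unit.
            units = ch.encode("utf-16-be", "surrogatepass")
            for i in range(0, len(units), 2):
                out.append(f"\\u{units[i]:02x}{units[i + 1]:02x}")
    return "".join(out)

untrusted = "');alert(1)//"
print("<SCRIPT>alert('" + js_escape(untrusted) + "')</SCRIPT>")
```

Because the escaped output contains only letters, digits and backslashes, the untrusted value cannot close the string or the surrounding SCRIPT element.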

Use CSS Escaping when…

Untrusted data is inserted inside your CSS styles. As you saw in the Attack Vectors examples, many CSS styles can be used to smuggle a script into your page.

For example:

<DIV STYLE="background-image: IF THIS DATA IS UNTRUSTED IT MUST BE CSS ESCAPED">
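
CSS has its own escape syntax: a backslash followed by up to six hexadecimal digits. The Python sketch below assumes the same "escape everything except ASCII letters and digits" approach; css_escape is a hypothetical helper, not a library function. Padding each code to six digits means no terminating space is needed after the escape.

```python
def css_escape(text: str) -> str:
    """Escape text for use inside a CSS property value."""
    return "".join(
        # Keep ASCII letters and digits; emit a six-digit hex escape otherwise.
        ch if ch.isascii() and ch.isalnum() else f"\\{ord(ch):06x}"
        for ch in text
    )

untrusted = "url(javascript:alert(1))"
print('<DIV STYLE="background-image: ' + css_escape(untrusted) + '">')
```

The parentheses and the colon are escaped, so the value can no longer be read as a url() function or end the property early. Even with escaping, untrusted data should only ever be placed in a property value, never in a selector or property name.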

Above is a diagram visually representing the internet boundary and where filtering and escaping must happen to ensure XSS protection.

XSS Attacks are a moving target

In this article I attempted to collect as many as possible of the recommendations and best practices used by security researchers worldwide. The recommendations set out in this article are by no means exhaustive; however, they should be a good starting point for your XSS defence endeavours.

Technology is changing and hacker attacks are getting more sophisticated, but by understanding the basics set out in this article you can be prepared to prevent the future attack techniques that will most definitely arise.

The first step in defending against XSS attacks is to code your web applications carefully and use the proper escaping mechanisms in the right places. After that, comprehensive testing should be performed, ideally using an automated XSS scanner. When updates are made to your web applications, you should scan the affected pages again to ensure that no new vulnerabilities have been exposed.