I've read many articles describing methods to prevent XSS attacks in user-submitted HTML content: functions such as htmlspecialchars, regexes, whitelisting/blacklisting, and HTML filtering scripts such as HTML Purifier, HTMLawed, and so on.

Unfortunately, none of these explain how a site like eBay is able to allow such a vast array of potentially malicious HTML tags (<link>, <script>, <object>), CSS styles, and HTML attributes such as background: url(). It seems as if they allow users to submit pretty much anything in their item descriptions; I've seen some of the most elaborate HTML, JavaScript, and Flash templates in item descriptions.

What is eBay doing differently? Is there another technique or layer that I am missing that allows them to block XSS attacks while still permitting pretty much anything in users' item descriptions?

If you use the htmlpurifier approach of whitelisting known good markup, and build out a really robust whitelist, you can do this safely.
– Frank Farmer, Jun 8 '11 at 20:12
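To make the allowlist idea concrete, here is a minimal sketch in Python using only the standard library. The tag and attribute allowlists here are toy assumptions for illustration; a real filter like HTML Purifier covers far more tags, attributes, and CSS properties, and this is not how HTML Purifier itself (a PHP library) works internally.

```python
from html.parser import HTMLParser
from html import escape

# Hypothetical minimal allowlist -- illustration only.
ALLOWED_TAGS = {"b", "i", "p", "br", "ul", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")

class AllowlistSanitizer(HTMLParser):
    """Keep only allowlisted tags/attributes; escape everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop unknown tags entirely
        kept = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS.get(tag, set()):
                # allow only http(s) URLs, never javascript: and friends
                if name == "href" and not (value or "").startswith(SAFE_SCHEMES):
                    continue
                kept.append(f' {name}="{escape(value or "")}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # all text content is escaped, so it renders inert
        self.out.append(escape(data))

def sanitize(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

One caveat of this sketch: the text inside a dropped `<script>` tag survives as escaped (harmless) plain text rather than being removed outright, which a production filter would handle more cleanly.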

heh. I was able to create a listing containing an alert, but when I tried alert(document.cookie), I received the following error: "Please provide the correct information in the highlighted fields. Description - Your listing cannot contain javascript (".cookie", "cookie(", "replace(", IFRAME, META, or includes), cookies or base href."
– Frank Farmer, Jun 8 '11 at 20:18
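For illustration, that error message suggests a simple substring blacklist. A hedged sketch of what such a check might look like — the token list is copied from the message above; everything else is guesswork, and eBay's real filter is certainly more involved:

```python
# Naive substring blacklist modeled on the tokens eBay's error
# message lists. Illustration only, not eBay's actual filter.
BLOCKED_TOKENS = (".cookie", "cookie(", "replace(", "<iframe", "<meta", "base href")

def passes_blacklist(description: str) -> bool:
    """Return True if none of the blocked tokens appear, case-insensitively."""
    lowered = description.lower()
    return not any(token in lowered for token in BLOCKED_TOKENS)
```

Note how easy this is to bypass: `alert(document['coo'+'kie'])` contains none of the tokens, which is exactly why blacklists alone are considered weak.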

I wonder how clever their filters are, e.g. against JavaScript that generates "alert(document.cookie)" at runtime.
– John Cartwright, Jun 8 '11 at 20:22

Thanks Frank, I've used HTML Purifier before, but only out of the box. The developer is friendly and knowledgeable, but he doesn't offer much in the way of preset whitelists or recommendations for this particular situation. Are there any third-party whitelists you can point me to?
– Mike, Jun 8 '11 at 20:23

I saw that. The layer 8 firewall comment was a joke. Consider it a geeky equivalent of "headlight fluid".
– Frank Farmer, Jun 8 '11 at 20:59

3 Answers

It's easy when you've got an army of programmers and a war chest full of money.

This isn't rocket science. They identify a vulnerability case and code around it, likely via regexes and JavaScript on the front end plus heavy back-end validation to ensure the data isn't compromised prior to insertion. It's the same thing we should all be doing, except that eBay's process is far more mature than what most of us work on, and FAR bigger.

If it's anything like the bank I used to work for, they have a PAS team that's dedicated to finding minute bugs in prod, opening tickets with engineers, and following the process through on a priority basis. Between developers, testers, quality management, and PAS, there's no reason a vulnerability should get out, but if one should happen to slip through, it gets reacted to quickly.

You should consider taking a "progressive enhancement" approach to this challenge if you plan to go this route. Start by blocking JavaScript flat out. Then enhance: allow some of it via a method you deem safe, and only allow what's safe as you continue. Keep iterating, allowing more and more while catching the edge cases as they come up in testing or production. Gradually, you'll migrate from allowing only what IS permitted to blocking what ISN'T. While this should be a no-brainer, even cutting-edge companies miss the boat on the basic concepts of lifecycle management and continuous improvement.
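As a sketch of that first stage: before any allowlist exists, the safest baseline is escaping everything so markup renders as inert text — the Python standard-library equivalent of PHP's htmlspecialchars. Later stages would relax this through an allowlist; the function name here is just for illustration.

```python
from html import escape

def stage_zero(description: str) -> str:
    # Stage 0 of the progressive approach: no markup survives.
    # Angle brackets, ampersands, and quotes all become entities,
    # so the browser treats the whole description as plain text.
    return escape(description, quote=True)
```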

That being said, when trying to sanitize input it's best to combine both front- and back-end validation. The front end provides more intuitive, rapid feedback to users, but as with any client-side code it can be bypassed by savvy users. Back-end validation is your firewall, ensuring anything that slips past the front end is dealt with appropriately. Your DB is your lifeline, so protect it at all costs!

Unless you have that army and a huge budget, trying to code for every edge case on something as broad as a CMS that allows near carte blanche input almost always ends up a losing financial venture.