Quote: Find valid HTML code that the parser will parse incorrectly, and/or code that executes (if I forgot to remove some vector).

I will be removing basic JS execution vectors. The CSS parser is not ready yet, so I'll disable CSS completely. Namespaces won't be allowed either (no XUL or SVG), and namespaced tags (<asdf:asdf>) will also be disabled for the time being.

Frames won't be allowed (nor embed/object/video/audio/etc.), and I think that's all :).

If you can execute JS let me know please :).

On IE I try to honor conditional comments, but I won't support them completely, since they are unsafe by default.

If you find some way to get HTML code where it shouldn't be in weird scenarios, you win!

I have to warn you that code like this:

<a href="asdf'><img src="http://www.google.com">hello</a>

Will be parsed as:

<a>hello</a>

That's because every time I find a " in an attribute's name, I delete all the attributes in the tag for security reasons.

So this other code (it's important to note there's no closing " quote):

<a href="asdf'><img src='http://www.google.com'>hello</a>

Will be parsed as:

<a><img src="http://www.google.com">hello</a>

Some may argue that's a vulnerability, but there's no safer way of treating unclosed quotes in attributes.
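
To make the two cases above concrete, here is a minimal sketch of that rule. It is not the actual parser: readStartTag, serializeTag and QUOTES are made-up names, and it only handles a single start tag. It assumes that a quote inside an attribute name drops every attribute of the tag, and that an unclosed quote drops the attributes and ends the tag at the next >.

```javascript
// Sketch only. readStartTag, serializeTag and QUOTES are hypothetical names,
// not taken from the parser discussed in this thread.
const QUOTES = ["'", '"']; // the backtick is deliberately NOT a quote here

function hasQuote(s) {
  return QUOTES.some((q) => s.includes(q));
}

// Reads one start tag beginning at html[start] === '<'.
// Returns { name, attrs, end } (end = index just past the tag), or null.
function readStartTag(html, start) {
  const m = /^<([a-zA-Z][a-zA-Z0-9]*)/.exec(html.slice(start));
  if (!m) return null;
  const name = m[1].toLowerCase();
  let i = start + m[0].length;
  const attrs = [];
  let dropAll = false;

  while (i < html.length) {
    if (/\s/.test(html[i])) { i++; continue; }
    if (html[i] === '>') { i++; break; }

    // Attribute name: everything up to whitespace, '=' or '>'.
    const nameStart = i;
    while (i < html.length && !/[\s=>]/.test(html[i])) i++;
    const attrName = html.slice(nameStart, i);

    let value = '';
    if (html[i] === '=') {
      i++;
      if (QUOTES.includes(html[i])) {
        const close = html.indexOf(html[i], i + 1);
        if (close === -1) {
          // Unclosed quote: give up on the attributes and end the tag at the next '>'.
          const gt = html.indexOf('>', i);
          return { name, attrs: [], end: gt === -1 ? html.length : gt + 1 };
        }
        value = html.slice(i + 1, close);
        i = close + 1;
      } else {
        const valueStart = i;
        while (i < html.length && !/[\s>]/.test(html[i])) i++;
        value = html.slice(valueStart, i);
      }
    }

    if (hasQuote(attrName)) dropAll = true; // a quote inside a name poisons the whole tag
    else if (attrName) attrs.push([attrName.toLowerCase(), value]);
  }
  return { name, attrs: dropAll ? [] : attrs, end: i };
}

// Re-emits the tag with every value wrapped in double quotes.
function serializeTag(tag) {
  const parts = tag.attrs.map(([k, v]) => ` ${k}="${v.replace(/"/g, '&quot;')}"`);
  return `<${tag.name}${parts.join('')}>`;
}
```

With the first input above, the href value swallows everything up to the second ", the leftover http://www.google.com" is read as an attribute name, and the tag comes out as <a>. With the second input the " is never closed, so the tag comes out as <a> and the following <img src='...'> is read as a normal tag.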

Another thing: I am only allowing ' and " as quotes (so ` won't work).

So, examples of bypasses I've found (now fixed):

<!--[if true]><img onerror=alert(1) src=-->

<form action=javascript:alert(1)><input type=submit>
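
For URL-valued attributes like the action above, one possible approach (a sketch under my own assumptions, not necessarily what the filter does) is to allow only known schemes instead of trying to recognise "javascript". isAllowedUrl, decodeNumericRefs and ALLOWED_SCHEMES are hypothetical names, and the list of schemes is just an example.

```javascript
// Sketch only. isAllowedUrl, decodeNumericRefs and ALLOWED_SCHEMES are
// hypothetical names; the allowlist contents are an assumption.
const ALLOWED_SCHEMES = new Set(['http', 'https', 'mailto']);

// Decodes decimal (&#58;) and hex (&#x3a;) character references,
// with or without the trailing semicolon.
function decodeNumericRefs(s) {
  return s.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/g, (whole, hex, dec) => {
    const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    return cp > 0x10ffff ? '\uFFFD' : String.fromCodePoint(cp);
  });
}

function isAllowedUrl(raw) {
  // Decode references, then drop ASCII whitespace and control characters
  // (only for the check; the original value is not modified).
  const url = decodeNumericRefs(raw).replace(/[\u0000-\u0020]/g, '');
  // Only the part before the first '/', '?' or '#' can hold a scheme.
  const head = url.split(/[\/?#]/)[0];
  // Conservative assumption: a leftover '&' could be a named reference
  // hiding a colon, so reject it.
  if (head.includes('&')) return false;
  const colon = head.indexOf(':');
  if (colon === -1) return true; // no scheme: treated as a relative URL
  return ALLOWED_SCHEMES.has(head.slice(0, colon).toLowerCase());
}
```

The point of the allowlist is that an odd character inside the scheme makes the value fail closed: anything before the colon that is not exactly an allowed scheme gets rejected, whatever a given browser would have done with it.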

Protections vary from browser to browser (I will only remove dangerous things on a browser if they are dangerous in that browser).
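
Roughly, the per-browser idea could be expressed as a lookup table like the sketch below. The table layout, the browser keys and the shouldStrip name are all hypothetical; the only entry is "background" (assumed here to be an attribute name), which mirrors the Opera/IE remark quoted further down.

```javascript
// Sketch only. The table layout and browser keys are hypothetical; the single
// "background" entry mirrors the Opera/IE remark quoted later in the thread.
const DANGEROUS_ON = {
  background: ['opera', 'ie'],
};

// True when the attribute should be removed for the given browser.
function shouldStrip(attrName, browser) {
  const browsers = DANGEROUS_ON[attrName.toLowerCase()];
  return Array.isArray(browsers) && browsers.includes(browser);
}
```

So shouldStrip('background', 'ie') is true, while the same attribute is left alone for a browser that is not listed.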

hm.. why should it work? haha. I use document.write(), which ignores content-type metas afaik (this is a bug I will address when events are working perfectly, and that will happen when I manage to finish the CSS parser).

Quote @sdc: Yes :) I was trying to say good work in my own words. the background gets stripped on only the browsers where it would be executed - Opera and IE.

oh!! thanks then haha :D

Quote: <img src=javascript&#4864:alert(1)>

very nice gareth!! any particular reason to choose that char? fuzzing? do you have fuzz results? haha

I was going to be the one applying the attributes to the elements, but I discovered that that's waaaaaaaaaay too slow, even with algorithmic-fu stuff haha. It's been the biggest algorithmic challenge I've had since the informatics olympiads/ACM, but it's impossible to optimize: after reviewing the WebKit/Mozilla implementations, their solutions have the same complexity or a higher one, and my code lives in JS so it's slower :( (they are also slow on the same rules I have problems with, like nth-child and the like).

Now I'll just filter the CSS (in a jsreg-alike approach), since the HTML parser for example was reconstructing the DOM and making it from scratch.

One is that our old IE6 supports little to none of DOM Level 3, and the other is that the bug appears when the code is transformed from DOM to string: if the user, instead of doing document.write(cleanDOM.innerHTML), does document.documentElement.appendChild(cleanDOM), this bug would not happen.