
Correct. The former, if escaped, will be perfectly fine because the escaped tags will not be executed.
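To illustrate the escaping point, here is a minimal sketch in Python (the thread itself shows no code, so the language is my choice; in PHP the equivalent call would be `htmlspecialchars`). The function name `render_comment` is hypothetical.

```python
import html


def render_comment(user_text: str) -> str:
    """Escape user text so the browser displays tags instead of running them."""
    # html.escape replaces &, < and >, plus both quote characters when quote=True.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


print(render_comment('<script>alert("hi")</script>'))
# prints: <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>
```

The browser renders the escaped output as visible text, so the script never runs.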

The latter is more of a problem. I'd recommend scanning every link and image for JavaScript in its source (for example a javascript: URL), and removing script tags and possibly style tags as well.
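As a rough sketch of that kind of scan, the Python below only detects the risky markup and leaves removal to you. It uses the standard library's `html.parser`. The names `RiskFinder` and `find_risky_markup` are hypothetical, and a simple scan like this is an assumption about what "scanning" could look like, not a complete XSS filter.

```python
from html.parser import HTMLParser

BLOCKED_TAGS = {"script", "style"}  # tags the post suggests removing
URL_ATTRS = {"href", "src"}         # where link and image sources live


class RiskFinder(HTMLParser):
    """Collects a note for each blocked tag or javascript: URL it sees."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in BLOCKED_TAGS:
            self.findings.append(f"<{tag}> tag")
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Drop whitespace and control characters, which browsers
                # may ignore inside a URL scheme, then compare in lowercase.
                cleaned = "".join(ch for ch in value if ch > " ").lower()
                if cleaned.startswith("javascript:"):
                    self.findings.append(f"javascript: URL in {tag} {name}")


def find_risky_markup(markup):
    """Return a list of human-readable findings for the given markup."""
    finder = RiskFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings


print(find_risky_markup('<a href="JavaScript:alert(1)">click</a>'))
# prints: ['javascript: URL in a href']
```

Event-handler attributes such as onclick are not covered here, which is one reason a maintained filtering library is usually the safer route.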

You could run that check when the page is displayed, but that adds load to every request. Another option is to run it when content is inserted or updated, but that leaves you more vulnerable if someone gains DB access. A third option might be a cron job that runs fairly often and searches for rows containing strings such as '<script' or 'href="javascript:'.
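The cron job idea could look something like the sketch below, which uses Python's built-in `sqlite3` with an in-memory database so it runs on its own. The `posts` table, its `body` column, and the function name are all hypothetical. I search for the broader substring `javascript:` rather than only the `href="` form, which is my own assumption.

```python
import sqlite3

# Substrings to search for; extend the list as needed.
SUSPICIOUS_PATTERNS = ["<script", "javascript:"]


def find_suspicious_rows(conn):
    """Return ids of rows in the hypothetical posts table matching any pattern."""
    clause = " OR ".join("LOWER(body) LIKE ?" for _ in SUSPICIOUS_PATTERNS)
    params = [f"%{pattern}%" for pattern in SUSPICIOUS_PATTERNS]
    rows = conn.execute(f"SELECT id FROM posts WHERE {clause} ORDER BY id", params)
    return [row[0] for row in rows]


# In-memory demo data standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts (body) VALUES (?)",
    [
        ("<p>Hello</p>",),
        ("<SCRIPT>alert(1)</SCRIPT>",),
        ('<a href="javascript:alert(1)">x</a>',),
    ],
)

print(find_suspicious_rows(conn))
# prints: [2, 3]
```

A substring search like this is easy to evade with encoded characters, so treat it as a safety net rather than the main defence.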

Jake Arkinstall
"Sometimes you don't need to reinvent the wheel;
Sometimes it's enough to make that wheel more rounded"-Molona

What about using some library like HTMLPurifier to filter the XHTML before saving it to the database?

I think there's some confusion between storing the XHTML in the database and outputting it. You don't have to purify the XHTML or do anything else to it when you put it into the database, because the database does not care and will not act on it. When you output it, the browser does care, because it will parse the XHTML. That means you can purify the XHTML either before or after you store it in the database, as long as it has been purified by the time it reaches the browser.

With proper security practices, you shouldn't need to take those precautions. If someone gains DB access, you probably have bigger problems on your hands.