Sanitizing HTMLText input with a whitelist

I have a form with multiple textarea fields, some of which use the HTMLText type. For those I give users a very stripped-down version of TinyMCE (buttons for bold, italic, and setting links only).

If someone turned JavaScript off, though, they could just type HTML into the field, and all of it would be accepted, including scripts.

The data from the forms is displayed on another page. So I guess there are two issues here: how to store it in the database, and how to display what is stored in the database.

I think I would probably be happy for people to write any kind of HTML, as long as everything in script tags is removed. So that would actually be more like a blacklist. I am wondering how best to achieve that.

I tend to avoid regexes I don't fully understand, so in this case I'd probably use one of the functions on http://php.net/manual/en/function.strip-tags.php that the Google search yielded. There is quite often a lot of gold in the comments on the PHP doc pages...
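For what it's worth, here is a minimal sketch of what the built-in strip_tags() does with an allowed-tags list. The function name stripToAllowedTags and the particular tag list are my own assumptions for illustration, not something from the doc page comments. Note two limitations: the text inside a removed script tag is left behind as plain text, and attributes on allowed tags (onclick, javascript: links) are not touched at all.

```php
<?php
// Hypothetical helper: keep only a handful of formatting tags.
// The allowed list is an assumption matching the TinyMCE buttons described above.
function stripToAllowedTags(string $html): string
{
    // Second argument is the set of tags strip_tags() should leave in place.
    return strip_tags($html, '<p><br><strong><em><a>');
}

// The <script> tags are removed, but their text content stays as plain text.
echo stripToAllowedTags('<p>Hello <strong>world</strong><script>alert(1)</script></p>'), "\n";

// Attributes on allowed tags pass through untouched, so this is NOT made safe.
echo stripToAllowedTags('<a href="#" onclick="evil()">click</a>'), "\n";
```

So strip_tags() on its own is not enough to stop script injection through attributes, which is a reason to look at a proper whitelist-based filter.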

In the end I did implement HTML Purifier, though. I found that Andrew Short had used it in the rssconnector module, which gave me an idea of how to implement it. I put the HTML Purifier code in a thirdparty directory under mysite/, then created a class with a function that creates the config and purifier objects. I can then call this whenever I have some input that I would like purified.
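Roughly, such a wrapper could look like the sketch below. This is not the exact code from my site or from rssconnector: the class name InputPurifier, the include path, and the allowed-tags list are all assumptions, so adjust them to wherever you unpacked the library and to whatever tags your editor produces.

```php
<?php
// Assumed location of the library under mysite/thirdparty/; adjust as needed.
require_once dirname(__FILE__) . '/../thirdparty/htmlpurifier/library/HTMLPurifier.auto.php';

// Hypothetical wrapper class; the name is illustrative only.
class InputPurifier
{
    /**
     * Build the config and purifier objects, then return the cleaned HTML.
     */
    public static function purify($dirtyHtml)
    {
        $config = HTMLPurifier_Config::createDefault();

        // Whitelist: only these elements (and the href attribute on links)
        // survive. The list is an assumption based on bold/italic/link buttons.
        $config->set('HTML.Allowed', 'p,br,strong,em,a[href]');

        $purifier = new HTMLPurifier($config);

        return $purifier->purify($dirtyHtml);
    }
}

// Usage: clean the submitted value before saving it to the database.
// $clean = InputPurifier::purify($data['Description']);
```

Creating the objects on every call is the simplest thing that works; if it turns out to be slow, the purifier instance could be kept in a static property and reused.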

It seems to work quite well. It might be a bit heavy, but I really like the idea that only specific tags are let through, rather than stripping out certain tags.