Notes

Warning

Because strip_tags() does not actually validate the
HTML, partial or broken tags can result in the removal of more
text/data than expected.

Warning

This function does not modify any attributes on the tags that you allow
using allowable_tags, including the
style and onmouseover attributes
that a mischievous user may abuse when posting text that will be shown
to other users.

Note:

Tag names within the input HTML that are greater than 1023 bytes in length
will be treated as though they are invalid, regardless of the
allowable_tags parameter.

Callback arguments:* &$result: contains text to be placed insted of original piece (e.g. empty string for forbidden tags), it can be changed;* $tag_or_text: original piece of text or a tag (see below);* $tag: false for text between tags, lowercase tag name for tags;* $is_allowable: boolean telling if a tag isn't allowed (to avoid double checking), always true for text between tagsCallback function isn't called for comments and broken tags.

Caution: the function doesn't fully validate tags (the more so HTML itself), it just force strips those obviously broken (in addition to stripping forbidden tags). If you want to get valid tags then use strip_attrs option, though it doesn't guarantee tags are balanced or used in the appropriate context. For complex logic consider using DOM parser.

"5.3.4 strip_tags() no longer strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags."

This is poorly worded.

The above seems to be saying that, since 5.3.4, if you don't specify "<br/>" in allowable_tags then "<br/>" will not be stripped... but that's not actually what they're trying to say.

What it means is, in versions prior to 5.3.4, it "strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags", and that since 5.3.4 this is no longer the case.

So what reads as "no longer strips self-closing tags (unless the self-closing XHTML tag is also given in allowable_tags)" is actually saying "no longer (strips self-closing tags unless the self-closing XHTML tag is also given in allowable_tags)".

Since strip_tags does not remove attributes and thus creates a potential XSS security hole, here is a small function I wrote to allow only specific tags with specific attributes and strip all other tags and attributes.

If you only allow formatting tags such as b, i, and p, and styling attributes such as class, id and style, this will strip all javascript including event triggers in formatting tags.

Note that allowing anchor tags or href attributes opens another potential security hole that this solution won't protect against. You'll need more comprehensive protection if you plan to allow links in your text.

With most web based user input of more than a line of text, it seems I get 90% 'paste from Word'. I've developed this fn over time to try to strip all of this cruft out. A few things I do here are application specific, but if it helps you - great, if you can improve on it or have a better way - please - post it...

As you probably know, the native function strip_tags don't work very well with malformed HTML when you use the allowed tags parameter.This is a very simple but effective function to remove html tags. It takes a list (array) of allowed tags as second parameter:

A word of caution. strip_tags() can actually be used for input validation as long as you remove ANY tag. As soon as you accept a single tag (2nd parameter), you are opening up a security hole such as this:

<acceptedTag onLoad="javascript:malicious()" />

Plus: regexing away attributes or code block is really not the right solution. For effective input validation when using strip_tags() with even a single tag accepted, http://htmlpurifier.org/ is the way to go.

Below was a note on "strip_tags" page that got removed off of PHP.net ... I found this note useful, and use the code in parsing before "stripping tags" ... I don't know why in the world you would delete this one, but keep others ... your review system is a bit disturbing ...

On your page you have a warning about how data may be lost, but you delete a user-contributed comment that helps prevent that?

======================

aleksey at favor dot com dot ua 24-Feb-2011 01:06

strip_tags destroys the whole HTML behind the tags with invalid attributes. Like <img src="/images/image.jpg""> (look, there is an odd quote before >.)

So I wrote function which fixes unsafe attributes and replaces odd " and ' quotes with &quot; and &#39;.