I'm afraid there is not only JavaScript among the answers anymore
–
Alexander Feb 15 '13 at 18:03

1

I believe you should not add jquery as a tag for this question if you're expecting a native JavaScript answer. People who search for a similar question (but for jQuery) will mistake this for an answer.
–
Gideon May 25 at 3:30

Just remember that this approach is rather inconsistent and will fail to strip certain characters in certain browsers. For example, in Prototype.js, we use this approach for performance, but work around some of the deficiencies - github.com/kangax/prototype/blob/…
–
kangax Sep 14 '09 at 16:08

7

Remember your whitespace will be messed about. I used to use this method, and then had problems as certain product codes contained double spaces, which ended up as single spaces after I got the innerText back from the DIV. Then the product codes did not match up later in the application.
–
Magnus Smith Sep 17 '09 at 15:03

9

@Magnus Smith: Yes, if whitespace is a concern - or really, if you have any need for this text that doesn't directly involve the specific HTML DOM you're working with - then you're better off using one of the other solutions given here. The primary advantages of this method are that it is 1) trivial, and 2) will reliably process tags, whitespace, entities, comments, etc. in the same way as the browser you're running in. That's frequently useful for web client code, but not necessarily appropriate for interacting with other systems where the rules are different.
–
Shog9♦ Sep 17 '09 at 21:05

113

Don't use this with HTML from an untrusted source. To see why, try running strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")
–
Mike Samuel Sep 22 '11 at 18:06

7

If the HTML contains images (img tags), the browser will request them. That's not good.
–
douyw Feb 13 '13 at 6:47

Great that it works in non-browser JS (like Node) as well.
–
Daniel Ribeiro Dec 8 '10 at 19:30

2

Doesn't work for <img src=http://www.google.com.kh/images/srpr/nav_logo27.png onload="alert(42)" if you're injecting via document.write or concatenating with a string that contains a > before injecting via innerHTML.
–
Mike Samuel Dec 24 '10 at 15:07

18

An easy fix is to change /<.*?>/g to /<[^>]*>?/g. If you agree, please edit your post so that broken security advice doesn't get copy/pasted by naïve users like Mr. Ribeiro.
–
Mike Samuel Dec 27 '10 at 3:16
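
To illustrate the difference between the two patterns discussed in the comment above, here is a minimal sketch. The function names `stripLazy` and `stripHardened` are made up for this example, and neither function should be treated as a sanitizer for untrusted HTML.

```javascript
// Hypothetical helper names, for illustration only.

// Original pattern: only matches a "<" that is eventually followed by ">".
function stripLazy(html) {
  return html.replace(/<.*?>/g, "");
}

// Suggested pattern: the closing ">" is optional, so an unterminated
// tag at the end of the input is removed as well.
function stripHardened(html) {
  return html.replace(/<[^>]*>?/g, "");
}

// An unterminated tag, similar to the one in the earlier comment.
const unterminated = 'hello <img src=bogus onload="alert(42)"';

console.log(JSON.stringify(stripLazy(unterminated)));     // input comes back unchanged
console.log(JSON.stringify(stripHardened(unterminated))); // "hello "
```

The first pattern leaves the unterminated tag in place, which matters if the result is later concatenated with a string containing `>`. The second pattern removes it, but that alone does not make regex stripping safe for security-critical code.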

10

@MikeSamuel Did we decide on this answer yet? Naive user here ready to copy-paste.
–
Ziggy May 7 '13 at 18:32

3

@AntonioMax, I've answered this question ad nauseam, but to the substance of your question: because security-critical code shouldn't be copied and pasted. You should download a library and keep it up to date and patched, so that you're secure against recently discovered vulnerabilities and against changes in browsers.
–
Mike Samuel Nov 27 '13 at 16:04

We always use jQuery for projects, since invariably our projects have a lot of JavaScript. Therefore we didn't add bulk; we took advantage of existing API code...
–
Mark Mar 14 '12 at 16:31

14

You use it, but the OP might not. The question was about JavaScript, NOT jQuery.
–
Dementic Mar 14 '12 at 16:55

62

It's still a useful answer for people who need to do the same thing as the OP (like me) and don't mind using jQuery (like me). Not to mention, it could have been useful to the OP if they were considering using jQuery. The point of the site is to share knowledge. Keep in mind the chilling effect you might have by chastising useful answers without good reason.
–
acjay Nov 29 '12 at 1:32

7

@Dementic shockingly, I find the threads with multiple answers to be the most useful, because often a secondary answer meets my exact needs, while the primary answer meets the general case.
–
Eric Goldberg Dec 14 '12 at 19:11

19

That will not work if some part of the string is not wrapped in an HTML tag, e.g. "<b>Error:</b> Please enter a valid email" will return only "Error:"
–
Aamir Afridi Feb 5 '13 at 11:10

The above function posted by hypoxide works fine, but I was after something that would convert HTML created in a web rich-text editor (for example FCKEditor), clearing out all the HTML but keeping the links, because I wanted both the HTML and the plain-text version in order to build the correct parts of an SMTP email (both HTML and plain text).

After a long time searching Google, my colleagues and I came up with this, using the regex engine in JavaScript:

this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>

and then after the code has run it looks like this:-

this string has html code i want to remove
Link Number 1 -> BBC (Link->http://www.bbc.co.uk) Link Number 1
Now back to normal text and stuff

As you can see, all the HTML has been removed and the link has been preserved, with the hyperlinked text still intact. I have also replaced the <p> and <br> tags with \n (newline char) so that some sort of visual formatting is retained.

To change the link format (e.g. BBC (Link->http://www.bbc.co.uk)), just edit the $2 (Link->$1), where $1 is the href URL/URI and $2 is the hyperlinked text. With the links directly in the body of the plain text, most SMTP mail clients convert them so that the user can click on them.
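
As a rough illustration of the approach described above, here is a minimal sketch. The function name `htmlToPlainText` and the exact regexes are assumptions made for this example, not necessarily the code from the original answer, and spacing around links may differ slightly from the sample output shown earlier.

```javascript
// Hypothetical sketch: regex-based HTML-to-plain-text conversion that keeps links.
// Intended for formatting trusted editor output, not for sanitizing untrusted HTML.
function htmlToPlainText(html) {
  return html
    // Turn <br>, <br/>, <p> and </p> into newlines to keep some visual structure.
    .replace(/<br\s*\/?>|<\/?p\b[^>]*>/gi, "\n")
    // Rewrite anchors: $1 is the href value, $2 is the hyperlinked text.
    .replace(/<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)<\/a>/gi, "$2 (Link->$1)")
    // Remove whatever tags are left.
    .replace(/<[^>]*>?/g, "")
    // Collapse runs of newlines produced by adjacent <br>/<p> tags.
    .replace(/\n{2,}/g, "\n")
    .trim();
}

const sample =
  'this string has <i>html</i> code i want to <b>remove</b><br>' +
  'Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br>' +
  '<p>Now back to normal text and stuff</p>';

console.log(htmlToPlainText(sample));
```

Changing the replacement string in the second `replace` call (here `"$2 (Link->$1)"`) changes how links appear in the plain-text output.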

I altered Jibberboy2000's answer to handle several <BR /> tag formats, remove everything inside <SCRIPT> and <STYLE> tags, tidy the resulting text by collapsing multiple line breaks and spaces, and convert some HTML-encoded entities into normal characters. After some testing, it appears that you can convert most full web pages into simple text in which the page title and content are retained.
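
The steps listed above could be sketched roughly as follows. The function name `pageToText`, the entity table, and the regexes are assumptions chosen for illustration, and only a handful of entities are decoded.

```javascript
// Hypothetical sketch of the steps described above.
// The result is plain text and must not be re-inserted into a page as HTML.
function pageToText(html) {
  // Small, deliberately incomplete entity table (an assumption for this sketch).
  const entities = {
    "&nbsp;": " ", "&amp;": "&", "&lt;": "<",
    "&gt;": ">", "&quot;": '"', "&#39;": "'"
  };
  return html
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Accept <br>, <br/>, <BR /> and closing </p> as line breaks.
    .replace(/<br\s*\/?>|<\/p\s*>/gi, "\n")
    // Remove all remaining tags.
    .replace(/<[^>]*>?/g, "")
    // Decode known entities in a single pass so "&amp;lt;" becomes "&lt;", not "<".
    .replace(/&(nbsp|amp|lt|gt|quot|#39);/g, (m) => entities[m])
    // Collapse repeated spaces/tabs, then repeated or padded line breaks.
    .replace(/[ \t]+/g, " ")
    .replace(/\s*\n\s*/g, "\n")
    .trim();
}

const page =
  "<title>Hi</title>\n<style>p{color:red}</style>\n" +
  "<p>a &amp;  b<BR />c</p><script>alert(1)</script>";

console.log(JSON.stringify(pageToText(page)));
```

Decoding entities after the tags are stripped means the output can contain literal `<` characters, which is fine for plain text but is one more reason not to treat the result as safe HTML.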

I like this solution because it handles HTML special characters... but still not nearly enough of them. The best answer for me would deal with all of them (which is probably what jQuery does).
–
Daniel Gerson Oct 17 '12 at 13:17

yikes. if you're going to create a DOM tree out of your string, then just use shog's way!
–
nickf May 4 '09 at 23:21

Yes, my solution wields a sledge-hammer where a regular hammer is more appropriate :-). And I agree that yours and Shog9's solutions are better, and basically said as much in the answer. I also failed to reflect in my response that the html is already contained in a string, rendering my answer essentially useless as regards the original question anyway. :-(
–
Bryan May 5 '09 at 0:08

1

To be fair, this has value - if you absolutely must preserve /all/ of the text, then this has at least a decent shot at capturing newlines, tabs, carriage returns, etc... Then again, nickf's solution should do the same, and do it much faster... eh.
–
Shog9♦ May 5 '09 at 4:58

Note that this is not safe in all cases, for example, when adding a title attribute in javascript: var maliciousStringContent="\' onerror=\'alert(1234);\' src=\'bogus"; $("<img title='"+strip(maliciousStringContent)+"'></img>").appendTo("body")
–
Janne Aukia Feb 12 '13 at 8:56

1

Well, the output of the function should go in a context where HTML tags are considered harmful. In this sense, I still think the code is completely safe. This is entirely different from the context in your example, and I think the question is about the former.
–
molnarg Feb 13 '13 at 22:09

2

Many programmers do not do what they 'should', so marking this script as safe, when it is potentially not, is just plain wrong.
–
RozzA Mar 10 '14 at 2:45

Don't do this if you care about security. If the user input is this: '<scr<script>ipt>alert(42);</scr</script>ipt>' then the stripped version will be this: '<script>alert(42);</script>'. So this is an XSS vulnerability.
–
molnarg Mar 6 '13 at 12:38

Note that it will return an empty string if the HTML markup isn't valid XML (i.e., tags must be closed and attributes must be quoted). This isn't ideal, but it does avoid the potential security exploit.

If you need to handle markup that is not valid XML, you could try using: