Do NOT use innerHTML for this (the jQuery .html() method uses innerHTML): in some browsers (I've only tested Chrome) it won't escape quotes, so if you put the resulting value into an attribute value, you end up with an XSS vulnerability.
– James Roper, Apr 29 '11 at 3:27

in what context is chalk and cheese ever used together 0_o
– d-_-b, Aug 3 '13 at 18:45

@d-_-b When comparing two items. For example: they are as different as chalk and cheese ;)
– Anurag, Jun 18 '14 at 10:31

24 Answers

EDIT: This answer was posted a long time ago, and the htmlDecode function introduced an XSS vulnerability. It has been modified, changing the temporary element from a div to a textarea, which reduces the XSS risk. But nowadays I would encourage you to use the DOMParser API, as suggested in another answer.
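For reference, a minimal sketch of what decoding with the DOMParser API might look like. It assumes a browser environment (DOMParser is not defined in plain Node.js), and the body is an illustration rather than the exact code from the other answer.

```javascript
// Decode HTML entities by letting the browser's HTML parser do the work.
// Assumption: runs in a browser, where DOMParser is a global.
function htmlDecode(input) {
  var doc = new DOMParser().parseFromString(String(input), 'text/html');
  // Documents created by DOMParser do not execute scripts;
  // textContent returns the decoded text with any markup dropped.
  return doc.documentElement.textContent;
}
```

Note that any real tags in the input are discarded, since only the text content is returned.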

This works for most scenarios, but this implementation of htmlDecode will eliminate any extra whitespace. So for some values of "input", input != htmlDecode(htmlEncode(input)). This was a problem for us in some scenarios. For example, if input = "<p>\t Hi \n There </p>", a roundtrip encode/decode will yield "<p> Hi There </p>". Most of the time this is okay, but sometimes it isn't. :)
– pettys, Mar 19 '10 at 16:25

Thanks for the solution! I solved the extra-whitespace issue by replacing newlines in the text value with a placeholder like %%NL%%, then calling .html() to get the HTML-encoded value, then replacing %%NL%% with <br /> tags... Not bulletproof, but it worked, and my users were not likely to type in %%NL%%.
– benno, Aug 4 '11 at 10:59

The jQuery trick doesn't encode quote marks and in IE it will strip your whitespace.

Based on the escape templatetag in Django, which I guess is heavily used/tested already, I made this function which does what's needed.

It's arguably simpler (and possibly faster) than any of the workarounds for the whitespace-stripping issue - and it encodes quote marks, which is essential if you're going to use the result inside an attribute value for example.
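A minimal sketch of such a function, assuming the same five characters that Django's escape filter handles. The names htmlEscape and htmlUnescape are illustrative.

```javascript
// Escape the five characters that matter in HTML text and attribute values.
// '&' is replaced first so the entities produced below are not escaped again.
function htmlEscape(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Reverse mapping; '&amp;' is decoded last so '&amp;lt;' becomes '&lt;', not '<'.
function htmlUnescape(str) {
  return String(str)
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}
```

Because this is plain string replacement, whitespace survives a round trip: htmlUnescape(htmlEscape(input)) gives back the original input.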

They add a couple of refinements - they appear to be handling an obscure Unicode issue as well as converting all non-alphanumeric characters to entities. I was under the impression the latter was not necessary as long as you have a UTF-8 charset specified for your document.

Update 2016-04-06:
You may also wish to escape the forward slash /. This is not required for correct HTML encoding; however, it is recommended by OWASP as an anti-XSS safety measure. (Thanks to @JNF for suggesting this in the comments.)
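A sketch of that variant, assuming the hexadecimal entity &#x2F; for the slash. The function name htmlEscapeStrict is made up for this example.

```javascript
// Same escaping as before, plus the forward slash as an extra safety measure.
// '&' is still handled first so the other entities are not escaped twice.
function htmlEscapeStrict(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\//g, '&#x2F;');
}
```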

Thanks, I never realized that &apos; is not a valid HTML entity.
– Ferruccio, Jan 3 '12 at 14:22

Without the /g, .replace() will only replace the first match.
– ThinkingStiff, Jun 15 '13 at 3:36
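A quick illustration of the difference the g flag makes (the variable names are just for this example):

```javascript
var input = 'a < b < c';

// Without the g flag only the first '<' is replaced.
var firstOnly = input.replace(/</, '&lt;');   // 'a &lt; b < c'

// With the g flag every '<' is replaced.
var allMatches = input.replace(/</g, '&lt;'); // 'a &lt; b &lt; c'
```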

@Tracker1 I don't agree: if the function receives invalid input, it should throw an error. If a specific use case needs invalid input handled that way, either check the value before calling the function or wrap the function call in a try/catch.
– Anentropic, Jul 14 '16 at 9:27

Here's a non-jQuery version that is considerably faster than both the jQuery .html() version and the .replace() version. This preserves all whitespace, but like the jQuery version, doesn't handle quotes.
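A sketch of what a plain-DOM version along these lines could look like. It assumes a browser with a global document and is not necessarily the exact code this answer used. The early return for empty input avoids creating an element when there is nothing to encode.

```javascript
// Encode by putting the value in a text node and reading the markup back.
// Assumption: runs in a browser, where `document` is available.
function htmlEncode(value) {
  // Nothing to encode: skip the DOM work entirely.
  if (value == null || value === '') return '';
  var el = document.createElement('div');
  el.appendChild(document.createTextNode(String(value)));
  // innerHTML serializes the text node, escaping &, < and > but not quotes.
  return el.innerHTML;
}
```

Since quotes are left alone, the result is not safe to drop into an attribute value.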

@roufamatic - Nice one-liner. But checking for a non-empty value with an if saves having to create a DIV on the fly and grab its value. This can be much more performant if htmlEncode is being called a lot AND if it's likely that value will be empty.
– leepowers, Sep 9 '11 at 19:49

FWIW, the encoding is not being lost. The encoding is used by the markup parser (browser) during the page load. Once the source is read and parsed and the browser has the DOM loaded into memory, the encoding has been parsed into what it represents. So by the time your JS is executed to read anything in memory, the char it gets is what the encoding represented.

I may be operating strictly on semantics here, but I wanted you to understand the purpose of encoding. The word "lost" makes it sound like something isn't working like it should.

After looking at Prototype's solution, this is all it's doing... .replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;'); Easy enough.
– Steve Wortham, Feb 3 '11 at 0:14

shouldn't it do something with quote marks too? that's not good
– Anentropic, Aug 19 '11 at 13:49

@Anentropic I don't see why it would need to do anything with quotes, as quotes don't need to be escaped unless they are inside an attribute value.
– Andy, Jun 28 '13 at 8:04

OK after some reflection I take that comment back - if you are building up a piece of HTML you would want to encode each part of it including the attribute values, so I agree with Anentropic and I don't think the Prototypejs function is sufficient in that case.
– Andy, Jun 28 '13 at 8:49

And it is still bogus more than a year later...
– Alexis Wilke, Oct 18 '14 at 4:44

While I really like this answer and think it is a good approach, I have a question: is the bitwise operator in if (value === null | value === undefined) return ''; a typo or intentional? If intentional, why use it rather than the usual ||? Thank you!!
– Alejandro Vales, Oct 17 '17 at 13:02

Well anyhow keep in mind that the | will lead to 0 or 1, so actually it did work ^^
– Alejandro Vales, Oct 20 '17 at 6:41

couldn't you just use == null? undefined is the only thing to have equivalence with null, so two triple-equals aren't necessary anyway
– Hashbrown, Oct 16 '18 at 6:32

that's not true at all. null and 0 are both falsy, yes, so you can't just do !value, but the whole point of == is to make certain things easier. 0 == null is false. undefined == null is true. you can just do value == null
– Hashbrown, Oct 18 '18 at 0:26
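To make the comparison in these comments concrete, here is a small sketch of both checks. The helper names isMissing and isMissingBitwise are made up for this example.

```javascript
// Loose equality with null is true only for null and undefined.
function isMissing(value) {
  return value == null;
}

// The bitwise | version from the answer: each === yields a boolean,
// and | coerces them to numbers, so the result is 0 or 1 (still truthy/falsy).
function isMissingBitwise(value) {
  return value === null | value === undefined;
}
```

Both treat 0, '' and false as present; the only practical difference is that the bitwise form returns a number instead of a boolean.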

JS doesn't go inserting raw HTML or anything; it just tells the DOM to set the value property (or attribute; not sure). Either way, the DOM handles any encoding issues for you. Unless you're doing something odd like using document.write or eval, HTML-encoding will be effectively transparent.

If you're talking about generating a new textbox to hold the result...it's still as easy. Just pass the static part of the HTML to jQuery, and then set the rest of the properties/attributes on the object it returns to you.

The problem with Prototype I believe is that it extends base objects in JavaScript and will be incompatible with any jQuery you may have used. Of course, if you are already using Prototype and not jQuery, it won't be a problem.

EDIT: Also there is this, which is a port of Prototype's string utilities for jQuery:

Using some of the other answers here, I made a version that replaces all the pertinent characters in one pass, irrespective of the number of distinct encoded characters (only one call to replace()), so it should be faster for larger strings.
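A minimal sketch of the single-pass idea, assuming the usual five characters. The names HTML_ESCAPES and htmlEncodeOnePass are illustrative, not the original answer's code.

```javascript
// Lookup table from each special character to its entity.
var HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

// One call to replace(): the character class matches any special character
// and the callback looks up its replacement.
function htmlEncodeOnePass(str) {
  return String(str).replace(/[&<>"']/g, function (ch) {
    return HTML_ESCAPES[ch];
  });
}
```

Because each character is visited once, there is no ordering concern about escaping & before the others.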
