I see, only the first occurrence is replaced. I didn't know that replace behaves like this. The preg_quote is important if he wants to highlight strings containing / or * or other regex characters.
– okoman, Nov 11 '08 at 14:36

1

There are two errors in the second code fragment: 1 - it needs the 'gi' RegExp modifier instead of 'i', 2 - it replaces data with search instead of highlighting substrings in data. The first code segment may or may not be a good escaper for JavaScript (I don't know), but calling it preg_quote is misleading: JS RegExp ≠ PCRE.
– user213154, Apr 23 '10 at 17:16

13

This works nicely, but it doesn't preserve the capitalisation of the matched text. If you were searching for "test" in the text "Hello Test World", it would return "Hello <b>test</b> World" instead of "Hello <b>Test</b> World". I fixed this by changing the highlight function to this: return data.replace( new RegExp( "(" + preg_quote( search ) + ")" , 'gi' ), "<b>$1</b>" );
– Joel, Dec 6 '10 at 16:06
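
For reference, here is a minimal, self-contained sketch of the fix described in the comment above. The helper name escapeRegExp is my own stand-in for the preg_quote helper from the answer (whose body is not shown here), and the set of escaped characters is an assumption based on common practice, not a guaranteed-complete list.

```javascript
// Hypothetical stand-in for the answer's preg_quote helper:
// backslash-escapes the characters that are commonly special in a JS regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wraps every case-insensitive match of `search` in <b>...</b>.
// The capture group plus "$1" re-inserts the text as it appeared in `data`,
// so the original capitalisation is preserved.
function highlight(data, search) {
  if (!search) return data; // an empty pattern would match at every position
  const pattern = new RegExp("(" + escapeRegExp(search) + ")", "gi");
  return data.replace(pattern, "<b>$1</b>");
}

console.log(highlight("Hello Test World", "test"));
```

The 'g' flag makes replace act on every match rather than only the first, and 'i' makes the match case-insensitive.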

1

There's indeed a "flags" parameter in String.replace, but it is non-standard and thus unreliable. The best approach would be to write a "polyfill" function that selects an appropriate option.
– YellowAfterlife, May 24 '14 at 13:18
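
To illustrate the behaviour discussed in these comments: with a plain string as the first argument, replace only touches the first occurrence. One way to avoid relying on any non-standard flags argument is a small helper built on split and join. This is only a sketch, and the name replaceAllPlain is hypothetical.

```javascript
// With a string pattern, only the first occurrence is replaced.
const firstOnly = "a-b-a".replace("a", "x");

// Hypothetical helper: replaces every occurrence of a plain string.
// No regex is involved, so nothing in `search` needs escaping, and
// `replacement` is inserted literally (no "$1"-style patterns).
function replaceAllPlain(data, search, replacement) {
  if (search === "") return data; // splitting on "" would cut the string into single characters
  return data.split(search).join(replacement);
}

const everywhere = replaceAllPlain("a-b-a", "a", "x");
console.log(firstOnly, everywhere);
```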

Of course, you need to be careful with what you are replacing in and what you are searching on as @bobince notes. The above will work well for plain text and most searches if you are careful to quote your regex characters...
– tvanfosson, Nov 11 '08 at 13:24

@Jerinaw In fact, you need to escape the question mark only once for the regex, so you end up with \? when you use a regex literal. But you need to escape the backslash itself for JS strings, so you end up with \\? when you build the regex from a string. And yes, in a character class the only character you really must escape is ].
– Tomalak, Apr 16 '13 at 5:26
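
A short sketch of the point about single versus double escaping, for anyone who wants to try it. The variable names are made up for the example.

```javascript
// In a regex literal, one backslash is enough to escape the question mark.
const fromLiteral = /\?/;

// In a string, the backslash itself must be escaped, so the regex
// engine still receives the two characters \ and ?.
const fromString = new RegExp("\\?");

// Both forms produce the same pattern source.
const sameSource = fromLiteral.source === fromString.source;

// With a single backslash the string is just "?", which is not a valid pattern.
let singleBackslashFails = false;
try {
  new RegExp("\?");
} catch (e) {
  singleBackslashFails = e instanceof SyntaxError;
}

// Inside a character class, ? is literal without escaping, but ] must be escaped.
const inClass = /[?]/;
const closingBracket = /[\]]/;

console.log(sameSource, singleBackslashFails);
```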

The difficulty arises if ‘keywords’ can have punctuation in, as punctuation tends to have special meaning in regexps. Unfortunately, unlike most other languages/libraries with regexp support, there is no standard function to escape punctuation for regexps in JavaScript.

And you can't be totally sure exactly what characters need escaping because not every browser's implementation of regexp is guaranteed to be exactly the same. (In particular, newer browsers may add new functionality.) And backslash-escaping characters that are not special is not guaranteed to still work, although in practice it does.

So about the best you can do is one of:

attempting to catch each special character in common browser use today [add: see Sebastian's recipe]

backslash-escape all non-alphanumerics. Take care: \W will also match non-ASCII Unicode characters, which you don't really want to escape.

just ensure that there are no non-alphanumerics in the keyword before searching
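
As a rough sketch of the second and third options, assuming that only ASCII punctuation should be escaped and that the regex is built without the 'u' flag (where escaping non-special characters is rejected). Both function names are hypothetical.

```javascript
// Option 2 (sketch): backslash-escape ASCII punctuation only.
// The four ranges cover !../  :..@  [..`  {..~ so letters, digits,
// whitespace and non-ASCII characters such as "é" are left untouched,
// which is the problem a blanket \W would cause.
function escapeAsciiPunctuation(keyword) {
  return keyword.replace(/[!-\/:-@\[-`{-~]/g, "\\$&");
}

// Option 3 (sketch): accept only keywords made purely of ASCII letters and digits.
function isPlainKeyword(keyword) {
  return /^[A-Za-z0-9]+$/.test(keyword);
}

console.log(escapeAsciiPunctuation("a.b*c"), isPlainKeyword("a.b*c"));
```

The second option escapes many characters that are not special at all, which relies on the in-practice behaviour mentioned above rather than on anything guaranteed.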

If you are using this to highlight words in HTML which already has markup in, though, you've got trouble. Your ‘word’ might appear in an element name or attribute value, in which case attempting to wrap a <b> around it will cause brokenness. In more complicated scenarios, it could possibly even turn an HTML injection into an XSS security hole. If you have to cope with markup you will need a more complicated approach, splitting out ‘<...>’ markup before attempting to process each stretch of text on its own.
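
A very rough sketch of that splitting approach, under the assumption that a tag is anything from ‘<’ to the next ‘>’. That assumption is naive: it does not cope with ‘>’ inside attribute values, with comments or script contents, or with a keyword that happens to match part of an entity such as &amp;, so a real HTML parser or DOM traversal would be more robust. The names escapeRegExp and highlightInHtml are hypothetical.

```javascript
// Hypothetical helper: escapes characters commonly special in a JS regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Splits the input into alternating text and tag pieces, then highlights
// matches in the text pieces only, leaving the markup untouched.
function highlightInHtml(html, search) {
  if (!search) return html;
  const pattern = new RegExp("(" + escapeRegExp(search) + ")", "gi");
  return html
    .split(/(<[^>]*>)/) // the capture group keeps the tags, at odd indexes
    .map((part, i) => (i % 2 === 1 ? part : part.replace(pattern, "<b>$1</b>")))
    .join("");
}

console.log(highlightInHtml('<a title="bold">Bold text</a>', "bold"));
```

Here the word inside the title attribute is left alone, while the visible text is wrapped.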