Using Regular Expressions In Javascript (A General Overview)

I love regular expressions; they are the cat's pajamas. I find that hardly a day goes by where I don't use them in some way to solve a problem. But, as much as I love them, there has always been something about using them in Javascript that has felt a bit shaky. I think it's simply that I didn't have a fleshed-out understanding of the full breadth of their use within the Javascript language. As such, I figured I would take a few minutes to give myself (and anyone else who is interested) a brief, but well-rounded, overview of what is available.

Javascript String Replace() - Static Replace

In Javascript, every string object has a replace() method. This replace() method takes a regular expression object (or a plain string, which is matched literally) as its first argument and can take either a static string or a function as its second argument. When you pass the replace() method a static string, that string is replaced into the source value wherever the regular expression is matched:

// -- String.replace() -------------------------------------- //

// Create a test string.
var value = "Hey Tricia, how's it going?";

// The replace() method can take a static replacement string as
// the second argument. Replace the name Tricia with the name
// Joanna.
var newValue = value.replace(
	new RegExp( "\\bTricia\\b", "gi" ),
	"Joanna"
	);

// Output the new string.
document.write( newValue );

Here, because the pattern uses the global flag, "g", we are finding all instances of the word "Tricia" and replacing each one with the static value "Joanna". When we run the above code, we get the following output:

Hey Joanna, how's it going?

As you can see, the single instance of Tricia was replaced with Joanna.
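The global flag is what makes replace() hit every occurrence. The sketch below compares a global pattern, a non-global pattern, and a plain string pattern on an input with two "Tricia"s; the input string is made up for illustration, and it uses console.log rather than document.write so it runs outside a browser.

```javascript
// Hypothetical input containing the name twice.
var text = "Tricia, meet Tricia.";

// With "g", every match is replaced.
var allReplaced = text.replace( /\bTricia\b/g, "Joanna" );

// Without "g", only the first match is replaced.
var firstReplaced = text.replace( /\bTricia\b/, "Joanna" );

// A string pattern is matched literally (no regex syntax) and,
// like a non-global pattern, replaces only the first occurrence.
var stringReplaced = text.replace( "Tricia", "Joanna" );

console.log( allReplaced );    // Joanna, meet Joanna.
console.log( firstReplaced );  // Joanna, meet Tricia.
console.log( stringReplaced ); // Joanna, meet Tricia.
```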

While the value of the replacement string is static, it can have a slightly dynamic nature; by referring to captured groups within the matched pattern, the replacement string can be composed of parts of the matched pattern. Each captured group in the regular expression pattern can be used within the replacement string using the $N notation, where "N" is the index of the captured group (counting from 1):

// -- String.replace() -------------------------------------- //

// Create a test string.
var value = "Hey Tricia, how's it going?";

// The replace() method can take a static replacement string as
// the second argument. This static string can contain references
// to captured groups in the pattern. Replace the name Tricia
// with the name hottest Tricia.
var newValue = value.replace(
	new RegExp( "\\b(Tricia)\\b", "gi" ),
	"hottest $1"
	);

// Output the new string.
document.write( newValue );

In this case, we are replacing the instance of "Tricia" with the value "hottest $1". Since $1 will refer to the first captured group in our regular expression pattern, when we run the above code, we get the following output:

Hey hottest Tricia, how's it going?

As you can see, "hottest $1" was replaced back into the source string as the value "hottest Tricia."
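$N is not the only special token a replacement string understands. A short sketch of three others: $& (the whole match, no group needed), $$ (a literal dollar sign), and $<name> for named groups. Named groups assume an ES2018-or-later engine; the "Price: 5" input is invented for the example.

```javascript
var value = "Hey Tricia, how's it going?";

// $& inserts the entire matched substring.
var withMatch = value.replace( /\bTricia\b/, "[$&]" );

// $$ inserts a literal "$"; here it is followed by $& (the digits).
var withDollar = "Price: 5".replace( /\d+/, "$$$&" );

// $<name> refers to a named capturing group (ES2018 and later).
var withNamed = value.replace( /\b(?<name>Tricia)\b/, "hottest $<name>" );

console.log( withMatch );  // Hey [Tricia], how's it going?
console.log( withDollar ); // Price: $5
console.log( withNamed );  // Hey hottest Tricia, how's it going?
```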

Javascript String Replace() - Function Replace

Using a static string as the replacement value is good and gets the job done most of the time; but, if we need more control over how the replacement takes place, we can pass a callback function to the replace() method. This is a technique that I happen to love and have blogged about numerous times. This callback function gets executed for each pattern match in the source string. When it is executed, the matched value and each captured group are passed to the callback function as individual arguments, followed by the offset of the match and the full source string. The actual replacement value that gets merged back into the source string is determined by the return value of the callback function:

// -- String.replace() -------------------------------------- //

// Create a test string.
var value = "Hey Tricia, how's it going?";

// The replace() method can take a method as the second argument.
// In this case, the return value of the method is what is
// replaced into the string. The first argument passed to this
// method is the matched pattern; the arguments that follow are
// the captured groups (then the match offset and source string).
var newValue = value.replace(
	new RegExp( "\\b(\\w+)'(s)\\b", "gi" ),

	// This method will handle the replacement. In this case,
	// the $1 is the \w+ and the $2 is the s.
	function( $0, $1, $2 ){

		// Replace in "is".
		return( $1 + " is" );

	}
	);

// Output the new string.
document.write( newValue );

As you can see here, rather than using a static replace value, we are using a callback function. This callback function takes any apostrophe-S words and replaces them with the non-condensed version (as determined by its return statement). When we run this code, we get the following output:

Hey Tricia, how is it going?

As you can see, the value, "how's" was replaced with the return value of the callback function, which evaluated to "how is".
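To see exactly what the callback receives, the sketch below records every argument for the single match. It assumes a pattern without named groups (with named groups, one extra groups object is passed at the end) and uses console.log in place of document.write.

```javascript
var value = "Hey Tricia, how's it going?";

// Collect the arguments of each callback invocation.
var calls = [];

var newValue = value.replace(
	/\b(\w+)'(s)\b/g,
	function( $0, $1, $2, offset, source ){
		// After the captured groups come the match offset and the
		// full source string.
		calls.push( [ $0, $1, $2, offset, source ] );
		return( $1 + " is" );
	}
	);

console.log( newValue ); // Hey Tricia, how is it going?

// calls[ 0 ] holds: "how's", "how", "s", 12, and the source string.
console.log( calls[ 0 ] );
```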

Javascript String Match() - Non-global

We can use regular expressions in Javascript to do more than replace values - we can also gather values. The Javascript string match() method allows us to collect values matched by regular expression patterns. The match() method is a bit funky in that it behaves differently depending on whether or not your regular expression contains the global flag, "g". When we call the match() method without a global regular expression, it returns an array that contains the full match as the first index and each captured group within a subsequent index:

// -- String.match() ---------------------------------------- //

// Create a test string.
var value = "aabbccddee";

// Get the matches in the string. When this is run WITHOUT the
// global flag, it returns the entire match, and then each
// captured group as a subsequent match index.
var matches = value.match(
	new RegExp(
		"(?:(\\w)(\\1))+",
		"i"
		)
	);

// Output the array of matches.
document.write( matches );

As you can see here, we are finding one or more instances of a double-character. Notice that while this pattern does not have a global flag, it will match the entire string due to the "+" operator. When we run the above code, we get the following output:

aabbccddee,e,e

Here, the matched pattern, "aabbccddee", is the first index. Then, the second and third indexes contain the two captured groups... or rather, the last values the two captured groups contained.

NOTE: When no pattern can be matched, the match() method returns null - not an empty array.
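Because of that null, it is worth guarding before reading indexes. A minimal sketch follows; the inputs are invented, and the "|| []" fallback is just one common idiom, not the only option. It also shows that a non-global match() result carries an index property giving the match position.

```javascript
// No doubled characters here, so match() returns null.
var noMatch = "abcde".match( /(\w)\1/ );
console.log( noMatch === null ); // true

// Fall back to an empty array when nothing matched.
var safeMatches = "abcde".match( /(\w)\1/ ) || [];
console.log( safeMatches.length ); // 0

// A successful non-global match also reports where it was found.
var found = "abccd".match( /(\w)\1/ );
console.log( found[ 0 ], found.index ); // cc 2
```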

Javascript String Match() - Global

When the string match() method is called and the given regular expression is running with the global flag, "g", the returned array contains only the fully matched values; it does not contain any captured groups:

// -- String.match() ---------------------------------------- //

// Create a test string.
var value = "aabbccddee";

// Get the matches in the string. When this is run WITH the
// global flag, it returns a single pattern match in each of the
// array indexes (regardless of capturing).
var matches = value.match(
	new RegExp(
		"(\\w)\\1",
		"gi"
		)
	);

// Output the array of matches.
document.write( matches );

As such, when we run the above code, we get the following output:

aa,bb,cc,dd,ee

As you can see, each double-character match is returned as an index of the match() array return value - there is no captured group data.
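If you need every match AND its captured groups, newer engines offer String matchAll(). The sketch assumes an ES2020-or-later runtime; matchAll() throws a TypeError if the pattern lacks the "g" flag.

```javascript
var value = "aabbccddee";

// matchAll() returns an iterator of full match arrays,
// captured groups and index included.
var results = [];

for (var match of value.matchAll( /(\w)\1/g )){
	// match[ 0 ] is the full match, match[ 1 ] the captured group.
	results.push( match[ 0 ] + ":" + match[ 1 ] + "@" + match.index );
}

console.log( results.join( " " ) ); // aa:a@0 bb:b@2 cc:c@4 dd:d@6 ee:e@8
```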

Javascript String Search()

The string search() method is just like the indexOf() method, but for use with regular expressions. This method simply takes a regular expression and returns the index of the first match, or -1 if no match could be found:

// -- String.search() --------------------------------------- //

// Create a test string.
var value = "Kate, you are looking very hot!";

// Search for a value in the string. Returns index or -1.
document.write(
	value.search( new RegExp( "are", "i" ) ) +
	"<br />"
	);

// Search for a value NOT in the string. Returns index or -1.
document.write(
	value.search( new RegExp( "fun", "i" ) ) +
	"<br />"
	);

Here, we are searching for a pattern that will match part of the source string as well as a pattern that will not match any part of it. When we run this code, we get the following output:

10
-1

As you can see, the pattern "are", was matched at index 10 while the pattern "fun", was not matched at all.
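Two differences from indexOf() are worth a quick sketch: search() always reports the first match (a "g" flag changes nothing, and there is no start-position argument), and a plain string passed to search() is converted into a regular expression, so characters like "." are not literal.

```javascript
var value = "Kate, you are looking very hot!";

// search() ignores "g" and lastIndex; both calls report the first "o".
var pattern = /o/g;
var firstSearch = value.search( pattern );  // 7
var secondSearch = value.search( pattern ); // 7 again

// indexOf() accepts a start position, but only matches plain text.
var nextO = value.indexOf( "o", 8 );        // 15

// A string argument to search() becomes a RegExp: "." matches "K".
var dotSearch = value.search( "." );        // 0
var dotIndexOf = value.indexOf( "." );      // -1 (no literal ".")

console.log( firstSearch, secondSearch, nextO, dotSearch, dotIndexOf );
```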

Javascript RegExp Test()

The RegExp test() method simply returns a true or false as to whether the source regular expression pattern can be matched in the given string:

// -- RegExp.test() ----------------------------------------- //

// Create a test string.
var value = "Kate, you are looking very hot!";

// Create a testing pattern.
var pattern = new RegExp( "\\bhot\\b", "i" );

// Test to see if the given string matches the given expression.
// Return TRUE or FALSE.
document.write(
	pattern.test( value )
	);

When we run this code, we get the following output:

true

As you can see, the given pattern was matched in the given string. Notice that the test() method does not require a full-string match. If you want to test whether the entire string matches the given regular expression pattern, you have to anchor the pattern with "^" and "$".
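A sketch of the anchoring point, plus one related gotcha that is easy to trip over: when the pattern has the "g" flag, test() advances the pattern's lastIndex, so calling it twice on the same string can return different answers. The "HOT" input is invented for illustration.

```javascript
var value = "Kate, you are looking very hot!";

// Unanchored: "hot" may appear anywhere in the string.
var partial = /\bhot\b/i.test( value );      // true

// Anchored with ^ and $: the ENTIRE string must be "hot".
var wholeValue = /^hot$/i.test( value );     // false
var wholeWord = /^hot$/i.test( "HOT" );      // true

// With "g", test() updates lastIndex and the next call resumes there.
var globalPattern = /hot/g;
var firstTry = globalPattern.test( value );  // true (lastIndex is now 30)
var secondTry = globalPattern.test( value ); // false (lastIndex resets to 0)
var thirdTry = globalPattern.test( value );  // true again

console.log( partial, wholeValue, wholeWord, firstTry, secondTry, thirdTry );
```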

Javascript RegExp Exec()

The RegExp exec() method allows us to collect regular expression pattern matches; but, unlike the match() method off of the String object, the exec() method affords us more insight into the context of each matched pattern - the exec() method allows us to iterate over the target string, matching one pattern instance at a time. Within each match, not only do we get the value that was matched, we also get the index of it within the given string as well as any captured group values:

// -- RegExp.exec() ----------------------------------------- //

// Create a test string.
var value = "aabbccddee";

// Create a pattern to find the doubled characters.
var pattern = new RegExp( "((\\w)\\2)", "gi" );

// Declare the match variable so the loop doesn't create an
// implicit global (which throws an error in strict mode).
var matches;

// Use exec() to keep looping over the string to find the
// matching patterns. NOTE: Since we are using a loop, you MUST
// use the global flag "g"; otherwise, exec() always restarts at
// the beginning of the string and the loop never ends.
while ((matches = pattern.exec( value )) !== null){
	document.write( matches + "<br />" );
}

As you can see above, we are matching each double-character instance; but, more than that, we are also capturing the double-character as well as each single character using capturing groups. When we run the above code, we get the following output:

aa,aa,a
bb,bb,b
cc,cc,c
dd,dd,d
ee,ee,e

As you can see, the RegExp exec() method gives us the collecting power of match() with the insight of the replace() method. The exec() method actually does a bit more than this; to see how it actually updates the RegExp instance, take a look at my previous post on the exec() method.
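The extra context exec() provides is easiest to see by logging it. This sketch prints each match's index alongside the pattern's lastIndex (where the next exec() call will resume), and shows that lastIndex resets to 0 once exec() finally returns null.

```javascript
var value = "aabbccddee";
var pattern = /((\w)\2)/g;
var report = [];
var match;

while ((match = pattern.exec( value )) !== null){
	// index: where this match starts.
	// lastIndex: where the next exec() call will start searching.
	report.push( match[ 1 ] + "@" + match.index + "->" + pattern.lastIndex );
}

console.log( report.join( " " ) ); // aa@0->2 bb@2->4 cc@4->6 dd@6->8 ee@8->10
console.log( pattern.lastIndex );  // 0 (reset by the final, failed call)
```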

This post was not meant to be an exhaustive exploration of each usage of regular expressions in Javascript; rather, this was meant only as a general overview. Like I said before, regular expressions are awesome and I just wanted to get a better handle on the full breadth of their usage in Javascript.

I just had to go over all this stuff myself last week for a project I was working on. This would have been helpful then, but it's still good to have.

One thing to note that I found easier to do with the implicit constructor was matching a parenthesis. I believe it has to do with the way Javascript handles "\(" in a string object before it is converted into a RegExp() object. Again, I believe that the input to RegExp() is processed as a String object before it becomes a RegExp() object.

Yeah, the nice thing about the implicit constructor is that you don't have to escape the backslash, since the pattern is not a string. Even so, I still prefer the RegExp() object. But, like I said to @Dave, it's just a personal preference.

Not so long ago, it took me like an hour to debug why "\(" wasn't working. I had forgotten to escape the backslash to make it "\\(". That said, once I got it there, it seemed to work as expected... at least as far as I can remember.

I am not saying I prefer one over the other, but in that case the more readable and easier one to deal with was the implicit constructor. It allowed me to look at the RegExp as just that, a regular expression, without worrying about escaping a backslash that is supposed to escape a parenthesis when the RegExp is used.

I.e., /([\w]+\(\))/i versus RegExp("([\\w]+\\(\\))", "i")

I'm not saying regular expressions are easy to understand in the first place, but in my personal opinion the \\ makes them a little bit harder to read.

Note: I believe those regular expressions should find something like "function()". I haven't tested them, but that was the first thing that came to mind that contains a "(" to look for.
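The escaping trouble discussed above comes from the string literal, not from RegExp() itself: in a Javascript string, "\(" and "\w" are unrecognized escapes, so the backslash is silently dropped before RegExp() ever sees the pattern. A sketch comparing the literal with the doubly escaped constructor (the sample text "call function() now" is invented for the test):

```javascript
// The string literal "\(" is just "(": the backslash is gone.
console.log( "\(" === "(" ); // true

// Literal vs. constructor with every backslash doubled: same pattern.
var literal = /([\w]+\(\))/i;
var constructed = new RegExp( "([\\w]+\\(\\))", "i" );
console.log( literal.source === constructed.source ); // true

// Forgetting to double the backslash in [\w] leaves just [w].
var broken = new RegExp( "([\w]+\\(\\))", "i" );
console.log( broken.source ); // ([w]+\(\))

// The working versions find "function()"; the broken one does not.
console.log( literal.exec( "call function() now" )[ 1 ] ); // function()
console.log( broken.test( "call function() now" ) );       // false
```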