Perl regex tutorial: non-greedy expressions

Have you ever built a complex Perl-style regular expression, only to find that it matches much more data than you anticipated? If you've ever found yourself pulling your hair out trying to build the perfect regular expression to match the least amount of data possible, then non-greedy Perl regular expressions are what you need.

By default, Perl-style regular expressions are "greedy". This means a quantifier such as "*" will match as much data as possible while still allowing the overall expression to match. Because the dot does not match a line break, a greedy ".*" will typically run as far along the line as it can: even if the conditions of the regular expression have already been met, the engine will keep going as long as a later position on the same line also satisfies the search criteria.

By using "non-greedy" Perl-style regular expressions, you can prevent this from occurring and stop the search as soon as the search criteria have been satisfied. Read on to find out how this feature of Perl-style regular expressions can save you time and frustration!

For more information on Perl-style regular expressions, visit our power tip on this subject.

Non-Greedy Perl Regular Expressions

Typically, standard Perl-style regular expression syntax will match as much data as possible. For example, suppose you want to search for an HTML hyperlink using the following Perl-style regular expression:

Everything from the first "<a href..." to the last "</a>" on the same line (as highlighted in red) will be matched by the regular expression. This is undesirable: the purpose of the regular expression is to match one hyperlink at a time, but it instead matches two hyperlinks and the normal text between them on the same line.
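The greedy behavior described above can be reproduced with a short sketch. It uses Python's re module as a stand-in for a Perl-style engine; the sample line and the pattern are assumptions made up for illustration, not the exact expression from this tip.

```python
import re

# Hypothetical sample line containing two hyperlinks (an assumption for illustration).
line = '<a href="one.html">First</a> some text <a href="two.html">Second</a>'

# Assumed greedy pattern: each ".*" takes as much of the line as it can.
greedy = re.compile(r'<a href=".*">.*</a>')

# The single match runs from the first '<a href' to the last '</a>' on the line.
greedy_text = greedy.search(line).group(0)
print(greedy_text)
```

With this sample, the one match covers both links and the text between them, which is the over-matching the tip warns about.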

This is where non-greedy regular expressions are useful. To make a Perl-style regular expression non-greedy, add a "?" (question mark) immediately after the quantifier, usually the one attached to the wildcard expression.

In our above example, the wildcard expression is ".*" (dot and star). The dot will match any single character except a null (hex 00) or new line. The star will match the preceding character or expression zero or more times. So a dot followed by a star in Perl-style syntax literally means "match any character zero or more times".
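The dot and star rules can be checked with a small sketch, again using Python's re module as a stand-in engine. The sample strings are assumptions. Note that Python's dot does match a null character, so only the new line behavior is illustrated here; the handling of null depends on the engine.

```python
import re

# "." does not match a new line by default, so ".*" stops at the end of the first line.
text = "first line\nsecond line"
dot_star = re.match(r".*", text).group(0)

# "*" allows zero repetitions, so ".*" also succeeds on an empty string.
empty = re.match(r".*", "").group(0)

# With zero characters between 'a' and 'b', "a.*b" still matches "ab".
zero_between = re.fullmatch(r"a.*b", "ab") is not None

print(repr(dot_star), repr(empty), zero_between)
```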

To add in the non-greedy operator, we simply need to add a "?" to the end of our wildcard operators. So, our new, non-greedy regular expression would look like this:

<a href=".*?

Our non-greedy "?" operator now tells the regular expression engine to match as little data as possible. As soon as all conditions of the regular expression have been met, the search will end. So now using our above example, only the highlighted text below would be matched:
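As a rough illustration of the non-greedy behavior, the sketch below runs a lazy pattern over a line with two hyperlinks. It uses Python's re module as a stand-in for a Perl-style engine; the sample line and the full pattern (including the closing part after the wildcard) are assumptions for illustration only.

```python
import re

# Hypothetical sample line containing two hyperlinks (an assumption for illustration).
line = '<a href="one.html">First</a> some text <a href="two.html">Second</a>'

# Assumed non-greedy pattern: each ".*?" stops at the earliest point
# where the rest of the expression can match.
lazy = re.compile(r'<a href=".*?">.*?</a>')

# Each hyperlink is now matched separately.
links = lazy.findall(line)
for link in links:
    print(link)
```

Each match ends at the first "</a>" that completes the pattern, so the two links come back as separate results instead of one long match.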

As you can see from our above example, using non-greedy Perl-style regular expressions can prevent much heartache when doing search-and-replace operations on HTML, XML, PHP, and virtually any other file where matched data must be limited.