Regular Expression Quantifiers

The seventh part of the Regular Expressions in .NET tutorial continues to look at the pattern-matching characters that can be used in regular expressions. This article describes quantifiers, which allow matching repeating items in the source text.

Matching a Specific Number of Repeated Items

When you need more control over the number of repetitions, you can specify exactly how many times a pattern should appear. To do so, follow the repeating part of the pattern with a number contained within braces. For example, to match an item exactly three times you would follow it with "{3}".

The sample code below looks for groups of four adjacent digits. There is only one such match in the input string.
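A minimal sketch of such a sample is shown here. The input string and variable names are assumptions chosen for illustration, and the code uses C# top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input (an assumption): only "4567" is a run of four digits.
string fourDigitInput = "123 4567 89";

// \d{4} matches exactly four adjacent digits.
MatchCollection fourDigitMatches = Regex.Matches(fourDigitInput, @"\d{4}");

foreach (Match m in fourDigitMatches)
{
    Console.WriteLine($"{m.Value} at index {m.Index}");
}

// Outputs:
// 4567 at index 4
```

Note that the pattern is not anchored, so a longer run such as "12345678" would produce two matches of four digits each.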

For a less restrictive pattern, you can use braces to specify only the minimum number of repetitions to match. To do so, add a comma after the number within the braces. For example, "{2,}" matches two or more items.
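As a rough illustration of a minimum-only quantifier, the sketch below matches runs of two or more digits. The input string and variable names are hypothetical, and the code again assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input (an assumption): digit runs of length 1, 2, 3 and 4.
string minInput = "1 22 333 4444";

// \d{2,} matches two or more adjacent digits, with no upper limit.
MatchCollection minMatches = Regex.Matches(minInput, @"\d{2,}");

foreach (Match m in minMatches)
{
    Console.WriteLine(m.Value);
}

// Outputs:
// 22
// 333
// 4444
```

The single digit "1" is skipped because it does not reach the minimum of two repetitions.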

Finally, you can specify both a minimum and a maximum number of repetitions within the braces. The minimum appears first and is separated from the maximum by a comma. For example, to find between one and three items you would use "{1,3}".

The following sample code finds items that contain between one and three capital letters followed by one to three digits:
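One possible version of such a sample is sketched here. The input string and variable names are assumptions made for illustration, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input (an assumption): three items fit the pattern,
// while the lower-case "defg" does not.
string rangeInput = "AB12 C345 defg DEF6";

// [A-Z]{1,3} matches one to three capital letters;
// \d{1,3} matches one to three digits immediately after them.
MatchCollection rangeMatches = Regex.Matches(rangeInput, @"[A-Z]{1,3}\d{1,3}");

foreach (Match m in rangeMatches)
{
    Console.WriteLine(m.Value);
}

// Outputs:
// AB12
// C345
// DEF6
```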

Greedy and Lazy Quantifiers

Quantifiers can be either greedy or lazy. All of the above examples use greedy quantifiers, which means that they match as many repetitions as possible. For example, matching "\d+" will find a numeric digit and match it and every following digit until a non-numeric character, or the end of the text, is encountered.

A lazy quantifier works differently. As soon as enough characters have been found to satisfy the pattern, a match is returned. This means that lazy quantifiers match as few repetitions as possible. Often the source text will yield more matches, but each will be shorter. To specify that a quantifier should be lazy, append a question mark (?) to it. For example, "\d+?" is the lazy form of "\d+".

Try running the following code. Here the same input text is matched against two patterns. The first looks for a '2', followed by one or more digits. As the quantifier is greedy, there is a single match that consumes the first '2' and every subsequent digit.

The second pattern is similar to the first but uses a lazy quantifier. As such, there are two matches, each only two characters in length. The matches occur at the index of each '2', matching that digit and just one further digit.
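The comparison described above can be sketched as follows. The input string and variable names are assumptions chosen so that the text contains exactly two '2' characters, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input (an assumption): '2' appears at index 1 and index 6.
string quantifierInput = "1234562789";

// Greedy: a '2' followed by as many digits as possible.
MatchCollection greedyMatches = Regex.Matches(quantifierInput, @"2\d+");

// Lazy: a '2' followed by as few digits as possible (at least one).
MatchCollection lazyMatches = Regex.Matches(quantifierInput, @"2\d+?");

Console.WriteLine("Greedy:");
foreach (Match m in greedyMatches)
{
    Console.WriteLine($"  {m.Value} at index {m.Index}");
}

Console.WriteLine("Lazy:");
foreach (Match m in lazyMatches)
{
    Console.WriteLine($"  {m.Value} at index {m.Index}");
}

// Outputs:
// Greedy:
//   234562789 at index 1
// Lazy:
//   23 at index 1
//   27 at index 6
```

The greedy match consumes the second '2' as an ordinary digit, which is why it yields only one result.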