Positive examples of positive and negative lookahead

Many tools that support regular expressions (regexes) support positive and negative lookahead. What good is lookahead? Why would you ever use it?

A Positive Example

Say I want to retrieve from a text document all the words that are immediately followed by a comma. We’ll use this example string:

What then, said I, shall I do? You shan't, he replied, do anything.

As a first attempt, I could use this regular expression to match one or more word characters (letters or apostrophes) followed by a comma:

[A-Za-z']+,

This yields four results over the string:

then,

I,

shan't,

replied,

Notice that this gets me the comma too, though, which I would then have to remove. Wouldn’t it be better if we could express that we want to match a word that is followed by a comma without also matching the comma?

We can do that by modifying our regex as follows:

[A-Za-z']+(?=,)

This matches runs of word characters that are followed by a comma, but because lookahead only checks what comes next without consuming it, the comma is not part of the matched text. The modified regex results in these matches:

then

I

shan't

replied
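
To try the two patterns side by side, here is a minimal sketch. It assumes Python and its built-in re module, which is my choice for illustration rather than anything the patterns require; the variable names are arbitrary.

```python
import re

text = "What then, said I, shall I do? You shan't, he replied, do anything."

# First attempt: the comma is consumed, so it appears in every match.
with_comma = re.findall(r"[A-Za-z']+,", text)

# Lookahead version: the comma is required but not consumed.
without_comma = re.findall(r"[A-Za-z']+(?=,)", text)

print(with_comma)     # ['then,', 'I,', "shan't,", 'replied,']
print(without_comma)  # ['then', 'I', "shan't", 'replied']
```

Because neither pattern contains a capturing group, re.findall returns the full text of each match.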

A Positive Negative Example

What if I wanted to match all the words not followed by a comma? I would use negative lookahead:

(?>[A-Za-z']+)(?!,)

(Okay, negative lookahead and atomic grouping)

…to get these matches:

What

said

shall

I

do

You

he

do

anything

Huh? Atomic Grouping?

Yep. Otherwise you’ll get the following, where the, shan', and replie are unintended matches:

What

the

said

shall

I

do

You

shan'

he

replie

do

anything

Without atomic grouping (the (?>…) in the regex), when a match-in-progress runs up against a disqualifying comma, the regex engine backtracks: the + in the regex is allowed to give back one character, and because the shortened match is now followed by a letter rather than a comma, the lookahead succeeds. Applying atomic grouping disallows this and says, don’t give up characters you’ve matched.
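
The backtracking is easy to see in a short experiment. The sketch below assumes Python's re module. Since that module only accepts (?>…) from version 3.11 onward, the sketch also uses a stand-in that is my own substitution, not the atomic regex above: a lookahead that rejects a following word character as well as a comma, so a shortened match still fails.

```python
import re
import sys

text = "What then, said I, shall I do? You shan't, he replied, do anything."

# Plain negative lookahead: the + can give back one character,
# which yields the partial matches 'the', "shan'", and 'replie'.
backtracking = re.findall(r"[A-Za-z']+(?!,)", text)

# Stand-in for atomic grouping (an assumption of this sketch):
# also forbid a following word character, so backing off never helps.
portable = re.findall(r"[A-Za-z']+(?![A-Za-z',])", text)

print(backtracking)
print(portable)

# On Python 3.11+, the atomic-group regex can be run directly for comparison.
if sys.version_info >= (3, 11):
    atomic = re.findall(r"(?>[A-Za-z']+)(?!,)", text)
    print(atomic == portable)
```

The stand-in works for this word pattern because the only way a shortened match can slip past (?!,) is by stopping in front of another word character.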

When Lookahead Does You No Good

Lookahead doesn’t really help if you only care whether or not there was a match (that is, you don’t care what text was matched). If all I care about is whether or not the string contains any words followed by a comma, I would dump lookahead and use the simpler regex:

[A-Za-z']+,
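
As a rough illustration of such a yes-or-no check, again assuming Python's re module, the sketch below only asks whether a match exists. The helper name has_word_before_comma is hypothetical.

```python
import re

pattern = re.compile(r"[A-Za-z']+,")

def has_word_before_comma(text):
    # search() returns a match object or None; bool() discards the matched text.
    return bool(pattern.search(text))

print(has_word_before_comma("What then, said I"))  # True
print(has_word_before_comma("No commas here."))    # False
```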

Acknowledgments

Thanks to Jeffrey Friedl for writing Mastering Regular Expressions, 3rd ed., before reading which I had not even heard of regex lookahead.

I want a regex that will validate a number between 1 and 9, except 7. That is, it should first check whether the given number is in the range 1 to 9; if so, a second check should test whether the number is 7. If it’s 7, then fail; otherwise pass. Does anyone have any idea? Your help will be highly appreciated.