1 Answer
1

Here's what happens when the regex engine tries to match .*(def) against abc defg:

First, the engine starts trying to match the regex at the beginning of the string.

The greedy subpattern .* initially tries to match as many times as it can, matching the entire string.

Since this causes the rest of the match to fail, the regex engine backtracks until it finds a way to match the (def), which happens when the .* matches only abc .

However, if we change the regex to .*(def)?, the following happens instead:

First, the regex engine again starts at the beginning of the string.

Next, it again tries to match .* as many times as possible, matching the entire string.

But at that point, since all the rest of the regex is optional, it has found a match for the entire regex! Since (def)? is greedy, the engine would prefer to match it if it could, but it's not going to backtrack earlier subpatterns just to see if it can. Instead, it just lets the .* gobble up the entire string, leaving nothing for (def)?.

Something similar happens with .*?(def) and .*?(def)?:

Again, the engine starts at the beginning of the string.

The ungreedy subpattern .*? tries to match as few times as it can, i.e. not at all.

At that point, (def) cannot match, but (def)? can. Thus, for (def) the regex engine has to go back and consider longer matches for .*? until it finds one that lets the full pattern match, whereas for (def)? it doesn't have to do that, and so it doesn't.

I am sorry that I did not accept your answer until now. I was new to the forum and was unsure about things. And thank you for your comprehensive explanation. I have actually written a regex engine before, so I should have been able to figure this out.
–
erjoalgoJan 23 '13 at 23:34