The Linux Administration group is for the discussion of technical issues that arise during the administration of Linux systems, including maintaining the operating system and supporting end-user applications.

Pattern Matching question.

Hi,
I've come across the following pattern matching expression in a Perl script and
would like to understand the logic behind it:

(.+?)

Now, the parentheses will commit the contents to a memory buffer for later use.
However, '.+' with the addition of '?' is of concern. I know '?' means 0 or one occurrence, but in this construct will it function differently?

Hi Romeo,
Many thanks for your response. What I find confusing is that '+' means one or more occurrences
and '?' means 0 or one occurrence. Am I to assume that together they behave in an entirely different way?
I've run the script with and without the '?', and without it the pattern match does not function correctly.
It is obviously required, but I'm having trouble understanding the logic of what it is actually doing.

Well .. actually, in Perl ".+?" doesn't mean one or more of anything
and then an optional extra one (the ? wouldn't get a chance to match
anything after the plus, would it?) ... the ? makes the + non-greedy.
In other words, in this specific case it's an unwieldy way of saying
"I want one of anything", the equivalent of (.).

? also has another special meaning in Perl regexps. Placed after a
quantifier, it tells that quantifier not to be "greedy". Normally a
quantifier grabs as much text as it can while still letting the overall
pattern match. The ? makes it grab as little as it can. So,
in your case:

$ echo abcdef | perl -ne 'if(m#(.+)#){print $1}'
abcdef

versus:

$ echo abcdef | perl -ne 'if(m#(.+?)#){print $1}'
a

With the ?, the pattern only matches a single character. On its own, outside
a larger pattern, using just . (dot) is no different than using .+? because
it will always match 1 character. It does make a difference in the context
of a longer regexp though. For example: