Could you provide more information about your task?
–
Oleg Pavliv, Feb 16 '11 at 15:45

It would be very useful to see some samples of what you would like the expression to match (and not to match).
–
matt b, Feb 16 '11 at 15:51

@matt b, My real problem is way too long for an SO question. A similar example would be: given a line with two identifiers, followed by a variable number of integers, followed by an identifier, I would like to parse out the first two identifiers, all the integers, and the last identifier. And I would prefer not to have to "do a separate regexp" for the group matching the sequence of integers.
–
aioobe, Feb 16 '11 at 16:07

5 Answers

The captured input associated with a
group is always the subsequence that
the group most recently matched. If a
group is evaluated a second time
because of quantification then its
previously-captured value, if any,
will be retained if the second
evaluation fails. Matching the string
"aba" against the expression (a(b)?)+,
for example, leaves group two set to
"b". All captured input is discarded
at the beginning of each match.
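
The quoted behavior is easy to check. The following is a minimal sketch; the class name `LastCaptureDemo` and the helper `groupTwo` are made up for illustration and are not part of any API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastCaptureDemo {
    // Matches the whole input against (a(b)?)+ and returns group 2.
    static String groupTwo(String input) {
        Matcher m = Pattern.compile("(a(b)?)+").matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        // On the second iteration (b)? matches nothing, so group 2 keeps
        // the value captured during the first iteration.
        return m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(groupTwo("aba")); // prints "b"
    }
}
```

Only the most recent capture survives; the earlier iterations of the outer group are not available through `Matcher`.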

Capturing groups seem to be created when the regex is parsed and filled when it matches the string. The expression (a)|(b)(c) has three capturing groups, although any given match can fill only one of them (group 1) or two of them (groups 2 and 3). (a)* has just one group; after matching, that group holds only the last match.
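
A small sketch of this, assuming nothing beyond `java.util.regex`; the class `GroupCountDemo` and its helpers are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // The number of groups is fixed by the pattern, not by the input.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns groups 1..n after a full match; groups that did not
    // participate in the match are null.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            out.add(m.group(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(a)|(b)(c)"));   // prints 3
        System.out.println(groups("(a)|(b)(c)", "a"));  // prints [a, null, null]
        System.out.println(groups("(a)|(b)(c)", "bc")); // prints [null, b, c]
        System.out.println(groups("(a)*", "aaa"));      // prints [a]
    }
}
```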

I would think that backtracking rules out this behavior. Consider what /([\S\s])/ would do, accumulating every capture, on something like the Bible. Even if it could be done, the output would be unusable, because the groups would lose their positional meaning. It's better to run a separate regex globally over the items of like kind and collect the matches in an array.
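
Applied to the example from the comments (two identifiers, any number of integers, one identifier), the two-step approach might look like the sketch below. The line format, the identifier syntax `[A-Za-z_]\w*`, and the names `TwoStepParse`, `ParsedLine` and `parse` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepParse {
    // Outer pattern: group 3 captures the whole run of integers as one string.
    private static final Pattern LINE = Pattern.compile(
        "([A-Za-z_]\\w*)\\s+([A-Za-z_]\\w*)\\s+((?:\\d+\\s+)*)([A-Za-z_]\\w*)");
    // Inner pattern: applied repeatedly to the text captured by group 3.
    private static final Pattern INTEGER = Pattern.compile("\\d+");

    static final class ParsedLine {
        final String first;
        final String second;
        final List<Integer> numbers;
        final String last;

        ParsedLine(String first, String second, List<Integer> numbers, String last) {
            this.first = first;
            this.second = second;
            this.numbers = numbers;
            this.last = last;
        }
    }

    static ParsedLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected line: " + line);
        }
        List<Integer> numbers = new ArrayList<>();
        Matcher n = INTEGER.matcher(m.group(3));
        while (n.find()) {
            // Integer.parseInt throws NumberFormatException if the value overflows int.
            numbers.add(Integer.parseInt(n.group()));
        }
        return new ParsedLine(m.group(1), m.group(2), numbers, m.group(4));
    }

    public static void main(String[] args) {
        ParsedLine p = parse("foo bar 1 2 3 baz");
        System.out.println(p.first + " " + p.second + " " + p.numbers + " " + p.last);
        // prints "foo bar [1, 2, 3] baz"
    }
}
```

The second regex only ever sees the substring the first one isolated, so each integer keeps its position in the resulting list.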