So I've organized a regex that uses alternation so that the longest match will come out first. That makes sense.

Code

regex: [aeiou]+

Say I have the word "conscious" and I want to match the "scious" portion. When matched with the regex, it returns "iou", which is correct. However, I would also like to get the smallest possible match, which is just "i". The problem is that I can't change the original regex, but I can append to it.

Is there a good way to go about finding both the largest and smallest match?

What you have is not alternation; it is a character class, and they are two different things. The smallest possible match is the first "o" in the string, not the first "i". The "+" quantifier means one or more, so the first "o" matches on its own, since it is a quantity of one. What I think you are looking for is two or more, {2,}, but in that case the smallest match is "io".
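These three cases can be checked with a small sketch. It assumes C++'s std::regex (default ECMAScript grammar), which treats character classes and greedy/lazy quantifiers the same way Perl does for these patterns; the helper name first_match is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first (leftmost) match of `pattern`
// in `text`, or an empty string if there is no match.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return m.str(0);
    }
    return "";
}

// first_match("conscious", "[aeiou]+")     returns "o"   (one or more, leftmost)
// first_match("conscious", "[aeiou]{2,}")  returns "iou" (two or more, greedy)
// first_match("conscious", "[aeiou]{2,}?") returns "io"  (two or more, lazy)
```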

Code

$foo = 'conscious';
$foo =~ /([aeiou]{2,}?)/;
print $1;

prints 'io' because the ? makes the quantifier non-greedy, so the match stops at the shortest string that satisfies it. If you want to skip single vowels (i.e., the 'o' in 'con'), you have to change your regexp. Maybe this will work:

Yeah, I can do that, but I need this to work in any case, for any regular expression, without knowing the contents of the regular expression. (It's hard to explain, but this is necessary because I'm basically feeding regular expressions into an engine.)

I think I figured something out, but it seems really inefficient. I'm appending a negative lookbehind, containing the last matched string, to the match expression. That's $matchStr .= "(?<!$lastMatch)". I repeat this until no more matches are possible.

... like I said, though, that's really inefficient. If there's a better way to do it, let me know.

Maybe you should explain more about what you're doing. I'm sure we can figure out an efficient way to do what you are trying to do.

For starters -- you're passing a regular expression into a subroutine. Can you show us the code for it?

Yeah, sorry about the terminology mix-up, heh. I had been focusing on alternation earlier, but then I realized I'd need to account for character classes as well. Also, I think my example may have been somewhat confusing. I'm going to post the function I've written, although be warned: it's not actually written in Perl. I'm working with Perl-style regular expressions in the Boost library for C++.

The program has a string "scious". It also has some regular expression that will attempt to match that string; the program can tell what the regular expression is, but it's not sophisticated enough to understand it. I'll call this regular expression R.

As before, let's say R = [aeiou]+. In this example, when R matches "scious", it will match on "iou". That's correct. However, I would also like to be able to have the smallest possible match. As far as I'm concerned here, since [aeiou]+ could match "i" or "io" or "iou", the smallest is "i" and the largest is "iou". Note that I know that R WILL match "iou", but I need to make the assumption that the match could end at any point.

I need to make this distinction because R is not necessarily just a character class or something that can be specified with a quantifier. Say R = (?:ie|i|ey). Now if R matches "thief", then "ie" is the largest match and "i" is the smallest match.

Similarly, if R = (?:eu|ew), then the smallest and largest matches are always the same size.

Thanks for the help, guys. I'm not convinced there's any other way to do this, but if there is, that'd be awesome, haha.
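One way to get every candidate length without touching R is to let the engine find where the match starts, then test each possible end point with a whole-range match. The following is a minimal sketch under stated assumptions, not a definitive implementation: it uses C++'s std::regex as a stand-in for Boost.Regex (which offers regex_search and regex_match calls of the same shape), and the helper name all_matches_at_first_hit is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library).
// Finds where `re` first matches in `text`, then returns every substring
// starting at that position that `re` matches in full, shortest first.
std::vector<std::string> all_matches_at_first_hit(const std::string& text,
                                                  const std::regex& re) {
    std::vector<std::string> results;
    std::smatch first;
    if (!std::regex_search(text, first, re)) {
        return results;  // R does not match anywhere
    }
    const auto start = text.cbegin() + first.position(0);
    // Try every end point from `start` to the end of the text.
    // regex_match succeeds only if the whole range [start, end) matches.
    for (auto end = start;; ++end) {
        if (std::regex_match(start, end, re)) {
            results.emplace_back(start, end);
        }
        if (end == text.cend()) {
            break;
        }
    }
    return results;
}
```

For "scious" and R = [aeiou]+ this yields "i", "io", "iou"; for "thief" and R = (?:ie|i|ey) it yields "i", "ie". The first and last elements are the smallest and largest matches. Two caveats: it costs one regex_match call per candidate end point, and assertions at the edges of R (^, $, \b, lookahead) see only the tested range, so results may differ from what R would do against the full string.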

He seems to want to use one regexp; otherwise I would also recommend using two. He also seems to want to match only if there is more than one consecutive vowel, whereas your regexps will match a single vowel.