It seems that the seemingly simple problems are the ones that get you. I'm still a newbie so this is probably peanuts for the gurus here.

Background: First, I am using ActivePerl 5.8 on a WinXP machine. I have been coding in Perl for about a week now so I'm still very green. I am trying to extract a pattern from a long string.

I have a long string with repeated patterns of e.g. junk [foo] fred 6,000 [/bar] junk [foo] wilma betty [/bar]

When the characters inside [foo][/bar] are all of one kind, I have no problem extracting that pattern. But when they include any numbers, the regex doesn't put just that block in the $1 backreference; it gives me more than I want. Here's an example:

The Problem: But now here's my conundrum. I want my pattern to recognize instances when the meat (characters) inside the [foo][/bar] delimiters is a mixture of numbers with commas and letters but NOT just letters. I want to be able to recognize and accept only things like:

Code

[foo] - fred 6,000 blah barney 69 >= [/bar]

But NOT:

Code

[foo] blah betty blah wilma [/bar]

My problem is that whenever the characters in-between [foo][/bar] aren't all of one homogeneous type, like strictly alphabetical (abc) or strictly numeric (123), Perl extracts more than it should.

Let me show you what I mean. SCENARIO 2 CODE:

Code

### Version 2 - extracting comma'ed number surrounded by weird characters
# this string has a comma'ed number surrounded by all sorts of crazy characters
$string = "xxx[foo]a>a1,300a=}[/bar]xxx[foo]bbbbbbb[/bar]xxx";

# trying to match the string that has a comma and anything around it, as long as it's within [foo][/bar] delimiter
# here i use the . pattern character because we have crazy characters surrounding the ,
while($string =~ m/(\[foo].*\,+.*\[\/bar])/sig) {
    print $1, "\n";
}

SCENARIO 2 RESULTS:

Code

# perl seems to have extracted more than my match pattern
# Why does it extract more than it should?! This appended string doesn't even have a , in it!
[foo]a>a1,300a=}[/bar]xxx[foo]bbbbbbb[/bar]

And then when I try to extract based on a number character it gives me the same results.

SCENARIO 3 CODE:

Code

$string = "xxx[foo]aa1,300aaa[/bar]xxx[foo]bbbbbbb[/bar]xxx";

while($string =~ m/(\[foo].*\d+.*\[\/bar])/sig) {
    print $1, "\n";
}

SCENARIO 3 RESULTS:

Code

# the same as Scenario 2 results
[foo]aa1,300aaa[/bar]xxx[foo]bbbbbbb[/bar]

Summary: So basically, whenever I try to match anything within the [foo][/bar] string I get more than I want. What complicates matters is that the meat in-between the [foo][/bar] wrappers can be any character, but I only want to extract the meat which contains numbers with commas and some words, NOT the ones with words only.

I know this is a very long post but this problem has overwhelmed my psyche. Ahhh! I think writing it out helped me really understand the problem. I'd appreciate any suggestions.
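
For reference, here is a minimal sketch of one way the requirement summarized above might be met. It is not a definitive answer. The `.*` in the patterns above is greedy, so it runs to the last [/bar] in the string; `.*?` stops at the first one. The sketch pulls out each [foo]...[/bar] block with a non-greedy match, then tests the meat separately. The helper name `extract_numeric_blocks` is made up for illustration, and treating "a number with commas" as digit, comma, digit (`\d,\d`) is an assumption.

```perl
use strict;
use warnings;

# Sample string from Scenario 2.
my $string = "xxx[foo]a>a1,300a=}[/bar]xxx[foo]bbbbbbb[/bar]xxx";

# extract_numeric_blocks is a hypothetical helper name, not something from Perl itself.
sub extract_numeric_blocks {
    my ($text) = @_;
    my @hits;
    # .*? is non-greedy: it stops at the first [/bar] rather than the last one.
    while ($text =~ m/(\[foo\](.*?)\[\/bar\])/sig) {
        my ($block, $meat) = ($1, $2);
        # Assumption: a "comma'ed number" means a digit, a comma, then a digit.
        push @hits, $block if $meat =~ m/\d,\d/;
    }
    return @hits;
}

# Prints: [foo]a>a1,300a=}[/bar]
print "$_\n" for extract_numeric_blocks($string);
```

Checking the meat in a second step, instead of putting `\d,\d` inside the first pattern, keeps a letters-only block from being glued onto a later block that does contain a number.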