Regexp problem in Perl 5.6.1 and 5.8.4

I have done some Perl programming in the past, but I am by no means an
expert. I am currently changing some code written some time ago by an
employee who is no longer with the company. The code currently runs
under Perl 5.005_02. I am making changes and adding some UCS-2 -> UTF-8
conversion, and I want to run the code under Perl 5.8.4 to take
advantage of Perl's internal Unicode support. At any rate, there is a
regular expression in the code that works fine under 5.005_02 but loops
forever under 5.6.1 and later. The following code illustrates the
problem:

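$orig_string = 'JKXXAF';

$regex = qr {\G
# Match as many characters as possible
# that can be passed thru as-is
([^\x00-\xFF]+)

# Then try to match $A1 and next two bytes
| (@..)

# Otherwise just get the next byte
| (.)
}sx;

print "regex = $regex\n";

while ($orig_string =~ /$regex/g) {
    print "\$1=$1\n";
    print "\$2=$2\n";
    print "\$3=$3\n";
}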

The problem seems to be with the use of the \G assertion. If I take it
out, the regular expression works the same in all versions of Perl.
However, since I did not write the code and the programmer who did was
considerably more experienced with Perl than I am, I am hesitant to
simply remove it. I have been looking at this for several days without
success, and my Perl expert suggested I post it to this forum. Any help
would be greatly appreciated.
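While comparing Perl versions, it can help to make the loop fail loudly instead of hanging. The following is a minimal sketch, not part of the original code: `safe_tokens` is a hypothetical helper that runs the same `//g` loop but dies if `pos()` ever fails to move forward. It is shown with the alternatives minus `\G`, the variant reported to behave the same everywhere.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): run the //g loop and
# die instead of spinning forever if pos() ever fails to move forward.
sub safe_tokens {
    my ($string, $regex) = @_;
    my @tokens;
    my $last_pos = 0;
    while ($string =~ /$regex/g) {
        my $pos = pos($string);
        die "regex did not advance past position $pos\n" if $pos <= $last_pos;
        $last_pos = $pos;
        # Keep all three captures; at most one is defined per match.
        push @tokens, [ $1, $2, $3 ];
    }
    return @tokens;
}

# The original alternatives without \G; the at-sign is escaped so it
# cannot be mistaken for an array interpolation.
my $regex  = qr{ ([^\x00-\xFF]+) | (\@..) | (.) }sx;
my @tokens = safe_tokens('JKXXAF', $regex);
print scalar(@tokens), " tokens\n";
```

Passing the original `\G` pattern in place of `$regex` should make an affected perl die rather than spin: with non-empty matches on a finite string, a loop that never ends must at some point fail to advance `pos()`.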


Thomas Stauffer wrote in comp.lang.perl.misc:
> I have done some Perl programming in the past, but I am by no means an
> expert. I am currently changing some code written some time ago by an
> employee who is no longer with the company. The code currently runs
> under Perl 5.005_02. I am making changes and adding some UCS-2 -> UTF-8
> conversion, and I want to run the code under Perl 5.8.4 to take
> advantage of Perl's internal Unicode support. At any rate, there is a
> regular expression in the code that works fine under 5.005_02 but loops
> forever under 5.6.1 and later. The following code illustrates the
> problem:
>
> $orig_string = 'JKXXAF';
>
> $regex = qr {\G
> # Match as many characters as possible
> # that can be passed thru as-is
> ([^\x00-\xFF]+)
>
> # Then try to match $A1 and next two bytes
> | (@..)
>
> # Otherwise just get the next byte
> | (.)
> }sx;
>
> print "regex = $regex\n";
>
> while ($orig_string =~ /$regex/g) {
> print "\$1=$1\n";
> print "\$2=$2\n";
> print "\$3=$3\n";
> }
>
> The problem seems to be with the use of the \G assertion. If I take it
> out, the regular expression works the same in all versions of Perl.
> However, since I did not write the code and the programmer who did was
> considerably more experienced with Perl than I am, I am hesitant to
> simply remove it. I have been looking at this for several days without
> success, and my Perl expert suggested I post it to this forum. Any help
> would be greatly appreciated.

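The \G is not really needed for the loop to work. In scalar context,
//g starts each match attempt at pos(), where the previous match ended,
and since the last alternative, (.), matches any character under /s,
every match begins right there anyway.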
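A quick way to check this, as a sketch: run the same alternatives without `\G` and record where each `//g` match starts and ends. `match_spans` is a hypothetical helper; `$-[0]` is Perl's start offset of the last successful match, and `pos()` is where it ended.

```perl
use strict;
use warnings;

# The original alternatives with the leading \G removed.
my $no_g = qr{ ([^\x00-\xFF]+) | (\@..) | (.) }sx;

# Hypothetical helper: record [start, end] offsets of each //g match.
sub match_spans {
    my ($string, $regex) = @_;
    my @spans;
    while ($string =~ /$regex/g) {
        # $-[0] is where the match began; pos() is where it ended.
        push @spans, [ $-[0], pos($string) ];
    }
    return @spans;
}

for my $span (match_spans('JKXXAF', $no_g)) {
    print "match $span->[0]..$span->[1]\n";
}
```

Each span should start where the previous one ended (0..1, 1..2, and so on up to 5..6), which is exactly what an anchor at `\G` would have enforced.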

Note that adding \G explicitly anchors only the first alternative;
the second and third are free to match anywhere. One could argue
that scalar //g should still anchor the whole match, in which case
the current behavior would be a bug. In any case, the behavior in
the presence of both \G and //g appears to have changed.
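The precedence point can be seen on a tiny made-up input. In `\G x | (y)` the anchor belongs only to the first branch, so with `pos()` at 0 the `y` branch can still match two characters further on; wrapping the branches in `(?: ... )` anchors both. The string and patterns here are illustrative and not from the original code.

```perl
use strict;
use warnings;

my $string = 'aay';

# \G binds only to the first branch: this reads as (\G x) | (y),
# so the second branch may match anywhere at or after pos().
my $loose = qr{ \G x | (y) }x;

# Grouping puts every branch behind \G.
my $anchored = qr{ \G (?: x | (y) ) }x;

pos($string) = 0;
my $loose_hit = $string =~ /$loose/g ? pos($string) : 'no match';

pos($string) = 0;
my $anchored_hit = $string =~ /$anchored/g ? pos($string) : 'no match';

print "loose: $loose_hit\n";       # the unanchored branch finds the 'y'
print "anchored: $anchored_hit\n"; # \G at 0 sees 'a', so nothing matches
```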

Adding non-capturing parentheses around the alternatives, so that \G
applies to all three of them, fixes the behavior.
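A minimal sketch of that fix applied to the original loop, assuming the group wraps all three original alternatives right after `\G`. Only the defined capture is collected; `@bytes` is an illustrative name.

```perl
use strict;
use warnings;

# The non-capturing group (?: ... ) puts all three alternatives behind \G.
# Comments inside qr{} avoid $ and @, because variables in an /x comment
# are still interpolated before the regex is compiled.
my $regex = qr{
    \G
    (?:
        ([^\x00-\xFF]+)   # run of characters that pass through as-is
      | (\@..)            # an at-sign plus the next two bytes
      | (.)               # otherwise just the next byte
    )
}sx;

my $orig_string = 'JKXXAF';
my @bytes;
while ($orig_string =~ /$regex/g) {
    # Exactly one of the three groups is defined on each iteration.
    push @bytes, defined $1 ? $1 : defined $2 ? $2 : $3;
}
print join(',', @bytes), "\n";
```

The ternary chain is used instead of the `//` operator, which only arrived in Perl 5.10, so the sketch also runs on 5.8.4.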
