making the above text a little bit more readable to the user. I started
off with a program which finds the different key/value pairs and,
based on the values, appends to or creates a string with appropriate words,
like

pseudocode only

parse the line,
load a hashmap with the key, value pairs
if(hash{action}=='commit') <---this is a mandatory field
string.="Commits"
if(defined hash{user})
string.="by hash{user}"
if(defined hash{date})
string.="on date hash{date}"
...................................
...................................
if(hash{action}=='checkout') <---this is a mandatory field
string.="Commits"
if(defined hash{user})
string.="by hash{user}"
if(defined hash{date})
string.="on date hash{date}"
..............................................
.............................................
I was thinking of this sort of logic, but I am a little apprehensive about
how elastic it can be, as I would be addressing so many actions with
separate if blocks for all of them. Any suggestions or ideas on how to
better achieve what I want to do above?
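
One way to avoid a separate if block per action is a lookup table. The following is only a sketch: the verbs in %verb_for, the field order in @optional, the sub name describe and the sample values are all assumptions made for illustration, and the parsing of the input line is left out.

```perl
use strict;
use warnings;

# Hypothetical mapping from the mandatory "action" value to a verb.
my %verb_for = (
    commit   => 'Commits',
    checkout => 'Checks out',
);

# Optional fields in output order, each with the word that introduces it.
my @optional = (
    [ user => 'by' ],
    [ date => 'on date' ],
);

# Build the readable string from already-parsed key/value pairs.
sub describe {
    my (%field) = @_;

    my $action = $field{action};
    return unless defined $action;        # action is mandatory

    my $verb = $verb_for{$action};
    return unless defined $verb;          # unknown action

    my @words = ($verb);
    for my $opt (@optional) {
        my ($key, $prefix) = @$opt;
        push @words, "$prefix $field{$key}" if defined $field{$key};
    }
    return join ' ', @words;
}

print describe(action => 'commit', user => 'alice', date => '2005-03-01'), "\n";
```

Supporting a new action then means adding one entry to %verb_for rather than another if block.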

I thought this is only the case with "use English;", at least that's
how I understood the "Bugs" section in perlvar (of Perl 5.8.6, that is):

<quote>
Due to an unfortunate accident of Perl's implementation, "use English"
imposes a considerable performance penalty on all regular expression
matches in a program, regardless of whether they occur in the scope of
"use English".
</quote>

I attributed the penalty to "use English" rather than to the regex
implementation. I stand corrected.

Nonetheless, one question may be allowed here: The OP's task was not
very complicated. Let the quantity of his data be 10,000 lines, on
anything faster than a x386 processor the performance penalty in this
simple regex will be unnoticeable, or not?

I repeated the runs for a number of times; the deviations between each
run were in the order of 1/100 of a second.

I then tried "use English;" and replaced $& with $MATCH, but the results
were only insignificantly slower than in the (.)/$<digit>-version.

Is there anything where I have a fundamental misunderstanding, or has the
severe performance penalty of which perlvar warns been weeded out in the
perl code while never being purged from the documentation? Or is my example
just a trivial exception?

There's a global variable in the perl source, called sawampersand.
It gets set to true the moment the parser sees one of $`, $', and $&.
It can never be set to false again. Trying to set it to false breaks
the handling of $`, $&, and $' completely.

If the global variable sawampersand is set to true, all subsequent
RE operations will be accompanied by massive in-memory copying,
because nothing in the perl source can predict when the (necessary)
copy for the ampersand family will be needed. So all subsequent REs
are considerably slower than necessary.

There are at least three impacts for developers:

* never use $& and friends in a library.
* Don't "use English" in a library, because it contains the
three bad fellows.

..... by virtue of the 2nd sentence following your quote above.

> Nonetheless, one question may be allowed here: The OP's task was not
> very complicated. Let the quantity of his data be 10,000 lines, on
> anything faster than a x386 processor the performance penalty in this
> simple regex will be unnoticeable, or not?

Even the primary docs for $& can dispatch that:

The use of this variable anywhere in a program imposes a considerable
performance penalty on all regular expression matches.
                       ^^^
                       ^^^

Assuming that this is part of a significant program, then there are
lots of pattern matchings going on, and *every one* of them (not
just this 1 regex that actually makes use of it) gets slower.

If you mention any of the 3 match variables anywhere in your program,
*all* of your pattern matches get slower (because perl cannot safely
apply the optimization of not maintaining the 3 of them).

Quoth <-berlin.de>:
> Tad McClellan <> wrote:
> : Yes, but cycles are a terrible thing to waste.
>
> : (See $& in perlvar.pod and elsewhere.)
>
> I thought this is only the case with "use English;", at least that's
> how I understood the "Bugs" section in perlvar (of Perl 5.8.6, that is):
>
<snip>
> I attributed the penalty to "use English" rather than to the regex
> implementation. I stand corrected.

See perlre, the paragraph beginning

WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in
the program, it has to provide them for every pattern match. This may
substantially slow your program.

English.pm used to cause a general Rx slowdown as it made use of $&
(to alias it to $MATCH). As this is not generally useful, current
versions don't do that if you ask them not to (with -no_match_vars).
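
A minimal sketch of that import, for illustration only: the helper parse_pair and the "key=value" token format are assumptions, not something taken from the original poster's data.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # asks English.pm not to alias the match variables

# Hypothetical helper: split one "key=value" token using capture groups
# rather than $& or $MATCH.
sub parse_pair {
    my ($token) = @_;
    return $token =~ /\A(\w+)=(\w+)\z/ ? ($1, $2) : ();
}

my ($key, $value) = parse_pair('action=commit');
print "$key -> $value\n";
print "script: $PROGRAM_NAME\n";  # the other long names are still available
```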

[side issue: my version of perldoc (Pod::Perldoc v3.14), in my locale
(en_GB.UTF-8), transforms the above quote variables to "\$\x{2018}" and
"\$\x{2019}". In text marked (explicitly or implicitly by perldoc) with
C<>, this is less than useful. Is it worth filing a bug?]

> Nonetheless, one question may be allowed here: The OP's task was not
> very complicated. Let the quantity of his data be 10,000 lines, on
> anything faster than a x386 processor the performance penalty in this
> simple regex will be unnoticeable, or not?

The point is not that it slows that regex down (indeed, s/(.)/\u$1/ has
the same penalty) but that it slows down *every other regex in the
program*. This can be significant, so using $& is a bad habit to get
into, except for one-liners where it can really simplify some things.

Ben

--
I've seen things you people wouldn't believe: attack ships on fire off
the shoulder of Orion; I watched C-beams glitter in the dark near the
Tannhauser Gate. All these moments will be lost, in time, like tears in rain.
Time to die.

Quoth <-berlin.de>:
> Taking you and Tad's hint to perlvar with regard to performance
> penalties I kludged a small script and ran it on my Mac mini:
>
> use strict;
> use warnings;
> # use English;
> for (my $i=1; $i<1000000; $i++) {
> $_='undecided';
> s/./\U$&/;
> # s/(.)/\U$1/;
> }
>
> which I ran with time, getting the following result:

I would suggest Benchmark.pm for benchmarking. It is easier and more
flexible than using time(1).
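
Something along these lines, as a rough sketch: the sub names are made up for illustration, and the two substitutions are the ones from the quoted script. Both subs live in one program, so any program-wide effect of mentioning $& applies to both. The table only compares the two forms against each other.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Each sub performs one of the substitutions from the quoted script.
sub with_match_var {
    my $s = 'undecided';
    $s =~ s/./\U$&/;       # whole-match variable
    return $s;
}

sub with_capture {
    my $s = 'undecided';
    $s =~ s/(.)/\U$1/;     # capture group
    return $s;
}

# A negative count means "run each sub for at least that many CPU seconds";
# cmpthese then prints a table of rates and relative differences.
cmpthese(-1, {
    match_var => \&with_match_var,
    capture   => \&with_capture,
});
```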

<results snipped>
> Then I modified the script:
>
> use strict;
> use warnings;
> # use English;
> for (my $i=1; $i<1000000; $i++) {
> $_='undecided';
> # s/./\U$&/;
> s/(.)/\U$1/;
> }
> I repeated the runs for a number of times; the deviations between each
> run were in the order of 1/100 of a second.
>
> I then tried "use English;" and replaced $& with $MATCH, but the results
> were only insignificantly slower than in the (.)/$<digit>-version.
>
> Is there anything where I have a fundamental misunderstanding, or has the
> severe performance penalty of which perlvar warns been weeded out in the
> perl code while never being purged from the documentation? Or is my example
> just a trivial exception?

Any match which uses capturing parens has the same penalty as using $&.
It's the ones which *don't* which suffer if you use $&. See my post
cross-thread, and perlre.

: I would suggest Benchmark.pm for benchmarking. It is easier and more
: flexible than using time(1).

Next time I'll do it. Using time(1) is just a die-hard habit of mine, born
in the days when there was no Benchmark.pm module.

: > I then tried "use English;" and replaced $& with $MATCH, but the results
: > were only insignificantly slower than in the (.)/$<digit>-version.
: >
: Any match which uses capturing parens has the same penalty as using $&.
: It's the ones which *don't* which suffer if you use $&. See my post
: cross-thread, and perlre.

Now I understand. It is not $& vs. $<digit>, but $& et collegae vs. rest
of the world. Thank you!

[...]
> > pseudocode only
>
>
> Why?
>
> It takes only a tiny bit of effort to bypass the confusion
> caused by the pseudoness.

Unfortunately, the label "pseudocode" is often used as a license to
write anything that comes to mind and let the reader figure out how
the parts fit together.

Unless you are acquainted with a specific pseudo-language you use, writing
decent pseudocode is *harder*, not easier, than using an existing language.
You'll find yourself inventing the language as you go along. Language
design is serious business, pseudo or not. You won't come up with anything
consistent that way.

Pseudocode is for books, not for casual communication.

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.
