I've got a new regex trick up my sleeve, and although I'm proposing a patch for it, I'm not sure it'll be accepted. It's a new pseudo-anchor, \K, which tells the engine to pretend it just started matching. That's not a very good explanation, so let me use an example:

This is sort of like Prolog's "cut", isn't it? Match
a prefix, then never backtrack back past the \K anchor to
satisfy the regex. Can you have multiple \Ks in a single
regex? I can see Abigail-II having lots of fun with
this.

Yes, you can have multiple \K's in a regex, although only the last one really means anything, should the regex succeed. I'm actually curious why you would need more than one in the same branch, but it's perfectly legal.
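To make the "only the last one means anything" point concrete, here is a minimal sketch. It assumes a perl that understands \K, and the sample string is made up for illustration:

```perl
use strict;
use warnings;

my $str = "foobarbaz";

# Two \K's in one branch: both are legal, but the second one
# overrides the first, so $& starts at "baz".
my $match = $str =~ /foo\Kbar\Kbaz/ ? $& : undef;

# The same holds in a substitution: only "baz" is replaced.
(my $copy = $str) =~ s/foo\Kbar\Kbaz/QUX/;

print "$match\n";   # baz
print "$copy\n";    # foobarQUX
```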

But no, it's not cut. Cut is (?>pat). This is just telling the regex engine to pretend it just started matching, that $& is defined starting HERE.
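A small sketch of the difference, again assuming a perl that understands \K. The strings and variable names are illustrative, not taken from the patch. The first half shows that $& starts at the \K while the prefix is still matched (and backtracked into) as usual; the second half shows that (?>pat) really does refuse to backtrack, which \K does not:

```perl
use strict;
use warnings;

my $str = "abc.def.ghi.jkl";

# Without \K: capture the prefix and put it back by hand.
(my $old = $str) =~ s/(.*)\..*/$1/;

# With \K: .* still grabs everything and backtracks to the last dot,
# but the prefix is not part of $&, so only ".jkl" gets replaced.
(my $new = $str) =~ s/.*\K\..*//;

# $& begins at the \K, not at the start of the overall match.
my $kept = $str =~ /.*\K\..*/ ? $& : undef;

# \K is not a cut: a* gives one "a" back so that "ab" can match.
my $with_keep = "aaab" =~ /a*\Kab/   ? 1 : 0;
# (?>a*) is a cut: it never gives an "a" back, so the match fails.
my $with_cut  = "aaab" =~ /(?>a*)ab/ ? 1 : 0;

print "$old\n";                    # abc.def.ghi
print "$new\n";                    # abc.def.ghi
print "$kept\n";                   # .jkl
print "$with_keep $with_cut\n";    # 1 0
```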

I understand (I think) why this is fast. What I don't understand is how it works. I'll admit that I am not much of a regex hacker, but how does the variable-length look-behind know how far to look in the case of quantifiers, especially greedy ones? It seems that the goal of the anchor is to prevent backtracking (hence the speed gains), but how does it know when to stop? If the \K anchor tells the regex engine to watch for the oncoming regex and stop matching there, what's the difference between \K and a minimal match, or a negated character class? If the engine doesn't stop at the first instance of, say, '.' (as in your example), how do you keep it from backtracking (which is where I'm presuming the speed gains come from)? Or is \K an optimization using sexegers?

Now, japhy, I have absolute faith in your pattern-foo, so I take it on faith that this works. What I'm curious about is how. Am I missing something in thinking that the performance gains here are related to backtracking?

Cheers,
Erik

Light a man a fire, he's warm for a day. Catch a man on fire, and he's warm for the rest of his life. - Terry Pratchett

Okay, I know it's because I'm dumb, but I still don't get it. Please don't yell at me, but why does the \K anchor keep .* from matching .jkl? And if it backtracks like normal, then where does the speed come from? I think that may be the essence of my confusion - why is this faster?

If I don't get it this time, I'll give up and just trust it ;-).

Cheers,
Erik

If this patch doesn't go through (and I can see reasons why it wouldn't, but I'd like it to go through), the same functionality can be achieved by my new module, Regexp::Keep. It's not on CPAN yet, but it's an XS module that does two things:

filters regexes for the \K escape sequence, and turns them into (?{Regexp::Keep::KEEP})...

and defines an XS function, KEEP(), that does what \K is supposed to do
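The filtering step can be sketched in plain Perl. This is only a guess at the shape of it, not the module's actual code: rewrite_keep is a hypothetical name, the XS KEEP() function is not shown, and the sketch naively ignores special cases such as character classes. It walks the pattern one backslash escape at a time so that an escaped backslash followed by a K is left alone:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every \K escape in a pattern string
# into the code block that would call the XS KEEP() function.
sub rewrite_keep {
    my ($pattern) = @_;
    my $hook = '(?{Regexp::Keep::KEEP})';

    # Visit each backslash pair left to right. "\\" is consumed as a
    # pair of its own, so "\\K" (escaped backslash, then K) is untouched.
    $pattern =~ s/(\\.)/$1 eq '\\K' ? $hook : $1/ges;

    return $pattern;
}

my $rewritten = rewrite_keep('.*\K\..*');
my $untouched = rewrite_keep('a\\\\Kb');   # the string a\\Kb

print "$rewritten\n";   # .*(?{Regexp::Keep::KEEP})\..*
print "$untouched\n";   # a\\Kb
```

In the module this rewrite would presumably be hooked in through constant-overloading of regex literals, which is the part that makes a core patch the nicer option.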

Of course, constant-overloading is nasty, so I hope my patch is received well.

If you're looking for input, then I'd have to say that that's a wonderful idea. It's odd, because I was just wishing there was something like this over the past week -- for one, I think it makes the substitutions look a little bit cleaner as to their true intent (but maybe that's just me).
