
*Spoiler*: let's slowly deprecate the "g" option in std.regex over a few
years, or with any luck a bit faster. A better replacement is proposed
below.
For better or worse, the current API has retained a high level of
compatibility with the old API. That means I missed the chance to fix
it when I could, and here is the main (and hardest) problem I have with it:
foreach(m; match("bleh-blah", "bl[ea]h"))
{
    writeln(m.hit);
}
The "quiz" is: how many lines will this print?
The current answer is 1. The right way to get all matches is:
foreach(m; match("bleh-blah", regex("bl[ea]h","g")))
{
    writeln(m.hit);
}
This not only looks unsightly but also conflates an option of the
operation (find _all_ vs find _first_) with a property of the pattern
(as case-insensitivity is). And if the regex pattern is defined
elsewhere, it could easily introduce a bug (albeit one that's "usually"
easy to track down).
To underline the point: std.regex.splitter doesn't take the "g" flag
into account at all (it makes no sense there).
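For anyone who wants to check the quiz, here is a minimal self-contained
sketch of both loops. It assumes a Phobos version in which
std.regex.match honors the "g" flag as described above; the helper name
countHits is made up for illustration and is not part of std.regex.

```d
import std.regex : match, regex;
import std.stdio : writeln;

// Hypothetical helper (not part of std.regex): counts how many matches
// the range returned by match() yields for the given regex flags.
size_t countHits(string input, string pattern, string flags)
{
    size_t n = 0;
    foreach (m; match(input, regex(pattern, flags)))
        ++n;
    return n;
}

void main()
{
    // No "g": the range ends after the first match.
    writeln(countHits("bleh-blah", "bl[ea]h", ""));
    // With "g": the range walks over every match.
    writeln(countHits("bleh-blah", "bl[ea]h", "g"));
}
```

The only difference between the two calls is the flag string, which is
exactly what makes the mistake easy to overlook.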
I've pondered a couple of solutions in a bug report by bearophile:
http://d.puremagic.com/issues/show_bug.cgi?id=7260
After all of these ideas were born and discarded, here is what I believe
is the way forward out of this mess:
Make "g" indicate only the intended _default_ search mode of the
pattern (global vs first match).
The user is free to override this default explicitly, and is in fact
encouraged to do so. The idea of a default search mode attached to the
regex pattern is marked as discouraged.
The overrides have to be convenient and backwards compatible.
Thus I propose the following:
match and replace become structs (types, oh my!) with the following
"interface":
struct match //ditto for replace
{
    //current behavior
    static auto opCall(.....);
    //get the first match / replace only first occurrence
    static auto first();
    // force to find all matches (still lazy range) and
    static auto all();
}
OT: C++ folks call this a namespace, but they don't have static opCall -
suckers ;) And I actually proposed (twice) to kill static opCall. Sweet
irony.
Then the motivating example would be:
foreach(m; match.all("bleh-blah", "bl[ea]h"))
{
    writeln(m.hit);
}
and:
//prints all submatches of the first match:
foreach(m; match.first("bleh-blah", "bl[ea]h"))
{
    // doesn't compile: we iterate over the first match itself,
    // so there is no .hit on m
    // that should make it harder to confuse
    // "first match" with "all matches"
    //writeln(m.hit);
    writeln(m);
}
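To make the shape of that interface concrete, here is a rough sketch of
such a struct. It is only an illustration under assumptions: it forwards
to matchFirst and matchAll, free functions found in later std.regex
releases, as stand-ins for the proposed first/all, and the struct itself
is not part of Phobos.

```d
static import std.regex;
import std.stdio : writeln;

// Sketch only: this struct is NOT part of std.regex.
struct match
{
    // current behavior: whether all matches are found depends on "g"
    static auto opCall(R, RegEx)(R input, RegEx re)
    {
        return std.regex.match(input, re);
    }

    // first match only; the result is a range over its submatches
    static auto first(R, RegEx)(R input, RegEx re)
    {
        return std.regex.matchFirst(input, re);
    }

    // lazy range over all matches, regardless of "g"
    static auto all(R, RegEx)(R input, RegEx re)
    {
        return std.regex.matchAll(input, re);
    }
}

void main()
{
    auto re = std.regex.regex("bl[ea]h"); // note: no "g" flag

    foreach (m; match.all("bleh-blah", re))     // every match
        writeln(m.hit);

    foreach (sub; match.first("bleh-blah", re)) // submatches of the first
        writeln(sub);

    foreach (m; match("bleh-blah", re))         // old behavior via opCall
        writeln(m.hit);
}
```

All three spellings are ordinary calls on the type name, so existing
code of the form match(input, re) would keep compiling in this sketch.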
We can go further and introduce an enhancement I have long dreamed of:
//'any' or 'test' are also the names to choose from
if(match.anywhere(input, "[0-9]+"))
{
    //there is at least 1 match (no need for other info)
    ...
}
The reason I want this "shorthand" is that the regex engine can cut a
bunch of corners and serve this "is there a match somewhere?" request
much, MUCH faster than "where is the first match and all of its
submatches?". And many use cases only need this yes/no answer anyway.
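As a sketch of the intended semantics only, the yes/no query can be
emulated today with a small helper. The free function anywhere below is
hypothetical, and it relies on matchFirst from later std.regex releases,
so it still computes the full first match and gains none of the speed
discussed above.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper (not part of std.regex): true if the pattern
// matches anywhere in the input. It shows the semantics only; the
// engine still does the full work of locating the first match.
bool anywhere(string input, string pattern)
{
    return !matchFirst(input, regex(pattern)).empty;
}

void main()
{
    writeln(anywhere("abc 123", "[0-9]+")); // input contains digits
    writeln(anywhere("abc", "[0-9]+"));     // input contains no digits
}
```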
... that got a bit lengthy. Any thoughts, criticism, opinions?
--
Dmitry Olshansky

12-Mar-2013 17:12, Dmitry Olshansky wrote:
> 12-Mar-2013 14:36, Andrej Mitrovic wrote:
>> On 3/12/13, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:
>>> struct match //ditto for replace
>>> {
>>> //current behavior
>>> static auto opCall(.....);
>>> }
>>
>> For a second I was worried this would break UFCS, but actually it
>> still works. Pretty kewl.
>>
>
> Actually it does... but only partially:
>
[snip]
So my initial idea is **cked up by UFCS kicking in on 'b' when resolving
an 'a.b.c' chain.
The problem is not new, BTW, as there is no way to do a fully qualified
call with UFCS: something.std.ascii.isWhite won't work either.
Darn UFCS. Maybe we can discover some sane rule to cover both corner cases?
> There's got to be some way out of it that doesn't involve alias this and
> proxies...
>
And w/o proxies I can get at least match(...).all and match(...).first.
But not replace, as it used to return a naked array and thus would
need to be a proxy... and proxies have another problem.
Again, the problem is not new, but I believe it is a flaw in any proxy
design in D: auto type inference on initialization sees proxies for
what they are:
auto x = replace(...); // now typeof(x) is some ugly proxy junk
auto y = replace(...).all; //fine - typeof(y) is array
auto z = replace(...).first; //fine - typeof(z) is array
Would it make sense to somehow tweak the language to allow proxies to
decay to some 'default' type (of their choice) on initialization?
It seems to me that any container (or whatever) that builds on proxies
is going to hit this wall.
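A small sketch of the inference problem, with everything in it made up
for illustration: ReplaceProxy and fakeReplace are hypothetical names,
and the returned strings are hard-coded rather than produced by any
real replace.

```d
import std.stdio : writeln;

// Hypothetical proxy type, made up for illustration; it is NOT the
// actual return type of std.regex.replace.
struct ReplaceProxy
{
    string onlyFirst;   // result with just the first occurrence replaced
    string everything;  // result with every occurrence replaced

    // lets the proxy convert implicitly to the "default" result
    alias everything this;

    @property string first() { return onlyFirst; }
    @property string all() { return everything; }
}

// Stand-in for a proxy-returning replace(); the values are hard-coded.
ReplaceProxy fakeReplace()
{
    return ReplaceProxy("X-blah", "X-X");
}

void main()
{
    auto x = fakeReplace();      // inference keeps the proxy type
    static assert(is(typeof(x) == ReplaceProxy));

    string s = fakeReplace();    // an explicit type triggers alias this
    static assert(is(typeof(s) == string));

    auto y = fakeReplace().all;  // member access yields a plain string
    static assert(is(typeof(y) == string));

    writeln(s, " ", y, " ", fakeReplace().first);
}
```

The alias this conversion only happens when a target type asks for it,
which is why plain auto ends up holding the proxy.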
--
Dmitry Olshansky

12-Mar-2013 19:08, Nick Sabalausky wrote:
> On Tue, 12 Mar 2013 11:06:56 -0400
> Nick Sabalausky <SeeWebsiteToContactMe@semitwist.com> wrote:
>>
>> matchFirst
>> matchAll
>> matchTest
>>
>> ?
>>
>
> s/matchTest/isMatch/
>
or rather 'hasMatch'.
But that's too obvious :)
The problem is that I wanted to avoid creating a bunch of new names,
especially as they are tweaks/options on the original behavior.
I'd go with direct enum flags but it's a bit too verbose:
match("blah-bleh", "bl[ae]h", Match.all); //Match.first
That's why I wanted to find a way to get either
match.all(...) or match(...).all working. And that is possible, but not
with replace.
--
Dmitry Olshansky

On Tuesday, 12 March 2013 at 09:41:08 UTC, Dmitry Olshansky wrote:
> *Spoiler*: let's slowly deprecate "g" option in std.regex in a
> few years or with any luck a bit faster. The better replacement
> is proposed.
>
> For better or worse the current API has retained a (high) level
> of compatibility with the old API. That means I've missed the
> chance to fix it when I could, and here is the prime problem
> (the hardest) I have with it:
>
> foreach(m; match("bleh-blah", "bl[ea]h"))
> {
> writeln(m.hit);
> }
>
> The "quiz" is - how many lines will this print?
>
> The current answer is 1. And that the right solution for all
> matches is:
>
> foreach(m; match("bleh-blah", regex("bl[ea]h","g"))
> {
> writeln(m.hit);
> }
>
> Which is not only looks unsightly but also confuses operation
> option (find _all_ vs find _first_) with property of a pattern
> (like case-insensitivity is). And if regex pattern is defined
> elsewhere it could easily introduce a bug (albeit one that's
> easy to track, "usually").
>
> To underline the point: std.regex.splitter doesn't take "g"
> flag into account at all (it makes no sense there).
>
> I've pondered a couple of solutions in a bug report by
> bearophile:
> http://d.puremagic.com/issues/show_bug.cgi?id=7260
>
> After all of these ideas born and discarded, here is what I
> believe is the way forward out of this mess:
>
> Make "g" indicates only the intended _default_ search mode of
> this pattern (global - first match).
>
> User is free to override this default explicitly and in fact
> encouraged to do so. The idea of default search mode attached
> to the regex pattern is marked as discouraged.
>
I nearly always forget to include "g", so I welcome any changes
that make "g" go away. match.first/match.all/etc. is easy
to read and the intent is right up front, which I prefer over
tacking a flag argument on the end. matchFirst/matchAll/etc. is
fine too but not nearly as cool :).

On Tuesday, 12 March 2013 at 09:41:08 UTC, Dmitry Olshansky wrote:
> ... that got a bit lengthy - any thoughts, criticism, opinions ?
I like it. Maybe Nick is right in just having separate functions
so UFCS is still working. Or since opCall works we can just say
the new types are only callable without UFCS, and maybe the
future will hold an improvement for it.