A difference of that size probably comes from expecting something like
regex("Blah") not to create a new regex struct on every call, i.e. to be
cached, which I'm guessing Ruby does (as do other standard libraries,
like .NET's).

Yea, I agree that's what it sounds like. I tried to post a response, but I'm
just getting this result (and yes, this is with JS enabled):
--------------------------------------------
Asirra validation failed!
ticket = start ASIRRAVALIDATION ir=cd ir data=
start RESULT ir=1cd ir 1 data=Failend Resource id #62cd ir 0 data=
start DEBUG ir=cd ir data=exceptions.Exception: invalid ticket formatend
Resource id #62cd ir 0 data=
end Resource id #62XML:
Fail
exceptions.Exception: invalid ticket format
--------------------------------------------
If it's working for anyone else, maybe you could post it for me?:
--------------------------------------------
A few things on the D version:
- Make sure you're using a recent version of DMD. The regex engine was
overhauled fairly recently (I forget exactly which version, but the latest,
2.058, definitely has it, along with some bugfixes).
- Make sure you're using "std.regex", not the deprecated "std.regexp".
- It sounds like this may be your main problem: Make sure you're not
re-creating the same regex multiple times:
    // Bad: compiles the regex again on every iteration
    foreach(str; strings)
    {
        auto result = match(str, regex("abc.*def"));
    }
    // Good: compiles the regex once, outside the loop
    auto myRegex = regex("abc.*def");
    foreach(str; strings)
    {
        auto result = match(str, myRegex);
    }
Some regex engines cache the regex, but D's doesn't ATM. I think that'll
likely get fixed though.
- Even better yet, if your regex string is a literal (or otherwise known or
computable at compile-time) as above, use the compile-time version instead:
    auto myRegex = ctRegex!"abc.*def";
    // [...same 'foreach' loop as before...]
--------------------------------------------
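
For anyone who wants to try the advice above as a complete program, here is a minimal sketch. The function names and the sample strings are made up for illustration, and it assumes a Phobos whose std.regex provides regex, ctRegex and match as described in the post:

```d
import std.regex : ctRegex, match, regex;

// Hypothetical helper: counts the lines that match, compiling the
// pattern once at run time, outside the loop.
size_t countMatchesRuntime(string[] lines)
{
    auto re = regex("abc.*def"); // compiled once, reused for every line
    size_t hits;
    foreach (line; lines)
    {
        if (!match(line, re).empty)
            ++hits;
    }
    return hits;
}

// Same loop, but the matcher is generated at compile time.
// This only works because the pattern is a literal.
size_t countMatchesCompileTime(string[] lines)
{
    auto re = ctRegex!"abc.*def";
    size_t hits;
    foreach (line; lines)
    {
        if (!match(line, re).empty)
            ++hits;
    }
    return hits;
}

void main()
{
    import std.stdio : writeln;

    auto lines = ["abc123def", "nothing here", "xxabcdefxx"];
    writeln("run-time regex hits:     ", countMatchesRuntime(lines));
    writeln("compile-time regex hits: ", countMatchesCompileTime(lines));
}
```

Both functions should report the same count; the only difference is when the pattern gets compiled.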

Hello everyone,
I'm the author of the blog post.
First of all, thanks so much for the interest in my problem. I
had no idea that the D community was so active (a fact that
pleases me greatly).
A quick update. I've written a small benchmark based on my real
code and I'm now getting *significantly* better performance from
my D code.
I'm currently trying to figure out what I'm doing differently in
my original program. At this point I am assuming that I have an
error in my code which causes the D program to do much more work
than its Ruby counterpart (although I am currently unable to find
it).
When I know more I will let you know.
James Blewitt

That was the same type of thing I was seeing with very simple
regular expressions. The regex was on the order of 30 times slower
than hand-written code for finding words in strings. The ctRegex is
on the order of 13x slower than hand-written code. The times below
are from parallel processing of 100MB of text files, just finding
the word boundaries. I uploaded the tests to
https://github.com/jnorwood/wc_test
I believe in all these cases the files are being cached by the
OS, since I was able to see the same measurements from a ramdisk
done with imdisk. So in these cases the file reads are about
30ms of the result. The rest is CPU time, finding the words.
This is with default 7 threads:
    finished wcp_wcPointer! time: 98 ms
    finished wcp_wcCtRegex! time: 1300 ms
    finished wcp_wcRegex! time: 2946 ms
    finished wcp_wcRegex2! time: 2687 ms
    finished wcp_wcSlices! time: 157 ms
    finished wcp_wcStdAscii! time: 225 ms
This is processing the same data with 1 thread:
    finished wcp_wcPointer! time: 188 ms
    finished wcp_wcCtRegex! time: 2219 ms
    finished wcp_wcRegex! time: 5951 ms
    finished wcp_wcRegex2! time: 5502 ms
    finished wcp_wcSlices! time: 318 ms
    finished wcp_wcStdAscii! time: 446 ms
And this is processing the same data with 13 threads:
    finished wcp_wcPointer! time: 93 ms
    finished wcp_wcCtRegex! time: 1110 ms
    finished wcp_wcRegex! time: 2531 ms
    finished wcp_wcRegex2! time: 2321 ms
    finished wcp_wcSlices! time: 136 ms
    finished wcp_wcStdAscii! time: 200 ms
The only change to the uploaded program is adding the suggested
    defaultPoolThreads(13);
at the start of main to change the default thread count of the
std.parallelism task pool.
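
To make the comparison concrete, here is a rough sketch of the two styles being timed. This is not the code from the wc_test repository; the function names are hypothetical, and it assumes a "word" means a run of ASCII letters, digits or underscores, so that the hand-written loop and the `\w+` pattern agree on ASCII input:

```d
import std.ascii : isAlphaNum;
import std.regex : match, regex;

// Hand-written scan: one pass over the bytes, counting the
// transitions from "not in a word" to "in a word".
size_t countWordsByHand(string text)
{
    size_t words;
    bool inWord = false;
    foreach (char c; text)
    {
        immutable isWordChar = c == '_' || isAlphaNum(c);
        if (isWordChar && !inWord)
            ++words;
        inWord = isWordChar;
    }
    return words;
}

// Regex version: the "g" flag makes match() iterate over every match.
// The pattern is compiled once per call, not once per word.
size_t countWordsRegex(string text)
{
    auto wordRe = regex(`\w+`, "g");
    size_t words;
    foreach (m; match(text, wordRe))
        ++words;
    return words;
}

void main()
{
    import std.stdio : writeln;

    auto text = "the quick  brown fox_1 jumps";
    writeln("by hand: ", countWordsByHand(text));
    writeln("regex:   ", countWordsRegex(text));
}
```

The hand-written loop only tracks one boolean per byte, while the regex version builds a match object for every word, which is one plausible source of the gap in the timings above.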

That was the same type of thing I was seeing with very simple regex
expressions. The regex was on the order of 30 times slower than hand
code for finding words in strings.

This is a sad fact of life: a general tool can't beat highly
specialized code. Ideally it can be on par, though. Even in the best
case ctRegex has to do a lot of things a simple == '\n' doesn't do, like
storing the boundaries of the match. That's something to keep in mind.
By the way, regex does a fine job on (semi-)fixed strings of length >=
3-4, often easily beating plain find/indexOf. I haven't tested the
Boyer-Moore version of find; that should be faster than regex for sure.
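
As an illustration of the "boundaries of the match" point, here is a small sketch that looks for a fixed string both ways. The helper names are invented for this example, and no timing claim is attached to it; it only shows what each call gives back:

```d
import std.regex : Regex, match, regex;
import std.string : indexOf;

// Plain search: returns just an offset, or -1 when not found.
ptrdiff_t findPlain(string haystack, string needle)
{
    return haystack.indexOf(needle);
}

// Regex search: the match object also records what came before and
// after the hit, so the offset is the length of the `pre` slice.
// The compiled regex is passed in so callers can build it once.
ptrdiff_t findRegex(string haystack, Regex!char re)
{
    auto m = match(haystack, re);
    if (m.empty)
        return -1;
    return m.pre.length;
}

void main()
{
    import std.stdio : writeln;

    auto re = regex("world");
    writeln(findPlain("hello world", "world"));
    writeln(findRegex("hello world", re));
}
```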

Hello everybody,
Thanks once again for the interest in my problem. I have posted
the details and source code that recreates (at least for me) the
poor performance.
I didn't know how to post the code to the forum, so I posted it
to my blog instead (see post update):
http://jblewitt.com/blog/?p=462
Again, if I'm doing something stupid in my code (which is
possible) then I apologise in advance.
I'll take a look at the ctRegex as soon as I can.
Regards,
James

No need to apologize, but you are using 2.054, which is unfashionable :)
More importantly, 2.054 contains the old and rusty version of std.regex;
the new version was included in 2.057+.
BTW, the current release is 2.058.

Dmitry did impressive work over those few versions of Phobos/DMD. The
performance is even more impressive when you consider that std.regex
supports things like named matching and lookbehind that often slow
down a regex (and also kinda remove the "regular" from the name regular
expression, technically).
--
James Miller
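
For readers who haven't used those two features, here is a small sketch of what they look like in std.regex. The function name, the pattern and the sample input are invented for this example; it assumes the `(?P<name>...)` named-group syntax and the `(?<=...)` lookbehind syntax from the std.regex documentation:

```d
import std.regex : match, regex;

// Hypothetical helper: pulls a "major.minor" version number out of a
// string, but only when it directly follows the text "DMD ".
string extractVersion(string text)
{
    // (?<=DMD )      lookbehind: must be preceded by "DMD "
    // (?P<major>...) named group, later read as captures["major"]
    auto re = regex(`(?<=DMD )(?P<major>\d+)\.(?P<minor>\d+)`);
    auto m = match(text, re);
    if (m.empty)
        return "";
    return m.captures["major"] ~ "." ~ m.captures["minor"];
}

void main()
{
    import std.stdio : writeln;

    writeln(extractVersion("tested with DMD 2.058 on Linux"));
}
```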