I'm writing some code that does some very simplistic parsing, and I'm
just totally geeking out on how awesome D is for writing such code:
import std.conv;
import std.regex;
import std.stdio;
import std.string; // for format()

struct Data {
    string name;
    string phone;
    int age;
    // ... a whole bunch of other stuff
}

void main() {
    Data d;
    foreach (line; stdin.byLine()) {
        // Raw (backtick) string, so \w and \s reach the regex engine intact
        auto m = match(line, `(\w+)\s+(\w+)`);
        if (!m) continue;
        auto key = m.captures[1];
        auto value = m.captures[2];
        alias void delegate(string key, string value) attrDg;
        attrDg[string] dgs = [
            "name": delegate(string key, string value) {
                d.name = value;
            },
            "phone": delegate(string key, string value) {
                d.phone = value;
            },
            "age": delegate(string key, string value) {
                d.age = to!int(value);
            },
            // ... whole bunch of other stuff to
            // parse different attributes
        ];
        attrDg errordg = delegate(string key, string value) {
            throw new Exception("Invalid attribute '%s'"
                                .format(key));
        };
        // This is pure awesomeness:
        dgs.get(key.idup, errordg)(key.idup, value.idup);
    }
    // ... do something with Data
}
Basically, I use std.regex to extract keywords from the input, then use
an AA to map keywords to the code that implements each keyword. That AA of
delegates is just pure awesomeness. AA.get's default value parameter
lets you process keywords and handle errors with a single AA lookup. I
mean, this is even better than Perl for this kind of text-processing
code!
The only complaint is that I couldn't write auto[string] dgs and have
the compiler auto-infer the delegate type. :-) Additionally, I wasn't
sure if I could omit the "delegate(string,string)" after each keyword;
if that's actually allowed, then this would make D totally pwn Perl!!
(I left out some stuff that makes this code even more of a joy to write:
using nested try/catch blocks, I can throw exceptions from deep-down
parsing code and have the loop that loops over input lines automatically
prefix error messages with the filename/line number where the error
occurred. This way, even errors thrown by to!int() will be formatted
nicely. With Perl, this gets extremely messy due to its pathological use
of $. for line numbers which can get overwritten in unexpected places if
you're processing more than one file at a time.)
Did I mention I'm totally in love with D?? Seriously. It can handle
system-level code and "high-level" text-processing code with total
impunity. What's there not to like?!
T
--
Without geometry, life would be pointless. -- VS
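
The nested try/catch arrangement described in the parenthetical above is not shown in the thread. What follows is only a minimal sketch of the idea, assuming hypothetical helpers `parseAge` and `parseAges` (neither name comes from the original code): the loop over input lines catches whatever was thrown deeper down and rethrows it with a filename/line prefix, so errors from to!int() get located as well.

```d
import std.conv : to;
import std.format : format;

// Hypothetical low-level parser: may throw (e.g. a ConvException from to!int).
int parseAge(string text)
{
    return to!int(text);
}

// Hypothetical driver: any exception raised while handling a line is
// rethrown with a "file:line: " prefix, wherever it originally came from.
int[] parseAges(string filename, string[] lines)
{
    int[] ages;
    foreach (i, line; lines)
    {
        try
        {
            ages ~= parseAge(line);
        }
        catch (Exception e)
        {
            // Line numbers are 1-based; the original exception stays chained.
            throw new Exception(
                format("%s:%d: %s", filename, i + 1, e.msg), e);
        }
    }
    return ages;
}

void main()
{
    import std.stdio : writeln;
    try
    {
        parseAges("input.txt", ["42", "abc"]);
    }
    catch (Exception e)
    {
        writeln(e.msg); // the message begins with "input.txt:2: "
    }
}
```

The line counter here is an ordinary loop variable, so processing several files at once cannot clobber it.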

Yea, I've done the same trick :) Fantastic stuff. But the one issue I have
with it is that you can't do this:
void delegate()[string] dgs = [
    "name": delegate() {
        // do stuff
    },
    "phone": delegate() {
        // do stuff
        dgs["name"](); // ERR! (Shit!)
        // do stuff
    }
];
That limitation is kind of annoying sometimes. I think I filed a ticket for
it...
http://d.puremagic.com/issues/show_bug.cgi?id=3995
Ahh, shit, it's been marked invalid :(
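
One way around this limitation, offered as a sketch rather than as anything the ticket itself proposes: declare the AA first and fill it in afterwards, so the name is already in scope when the delegates refer to it. The function name `runPhone` and the `log` array are made up for illustration.

```d
// Returns the order in which the handlers ran.
string[] runPhone()
{
    string[] log;

    // Declared first, filled in afterwards, so `dgs` is already visible
    // inside the delegate bodies below.
    void delegate()[string] dgs;

    dgs["name"] = () { log ~= "name"; };
    dgs["phone"] = () {
        log ~= "phone";
        dgs["name"](); // fine here: dgs was declared above
    };

    dgs["phone"]();
    return log;
}

void main()
{
    import std.stdio : writeln;
    writeln(runPhone());
}
```

The cost is losing the single-literal initializer; the handlers are added one statement at a time instead.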

It's better not to create a regex every iteration. Use e.g.
---
auto regEx = regex(`(\w+)\s+(\w+)`);
---
before foreach. Of course, you are not claiming this as a
high-performance program, but creating a regex every iteration is too
common a mistake to show such code to newbies.
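
A minimal sketch of that advice, with the pattern compiled once before the loop. The `KeyValue` struct and `extractPairs` function are hypothetical names, and the example uses `matchFirst` (available in newer std.regex releases) instead of the `match` call from the original post.

```d
import std.regex : matchFirst, regex;

// Hypothetical container for one "key value" line.
struct KeyValue
{
    string key;
    string value;
}

// Collects the key/value pairs found in `lines`, skipping non-matching lines.
KeyValue[] extractPairs(string[] lines)
{
    // Compiled a single time, outside the loop.
    auto pairRe = regex(`(\w+)\s+(\w+)`);

    KeyValue[] pairs;
    foreach (line; lines)
    {
        auto m = matchFirst(line, pairRe);
        if (m.empty) continue;
        pairs ~= KeyValue(m[1], m[2]);
    }
    return pairs;
}

void main()
{
    import std.stdio : writeln;
    foreach (kv; extractPairs(["name Alice", "", "age 42"]))
        writeln(kv.key, " = ", kv.value);
}
```

With stdin.byLine() the captures are slices of a reused buffer, so they would still need an .idup before being stored.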

if (!m) continue;
auto key = m.captures[1];

One `.idup` here will be better. (sorry, just like to nitpick)

[...]
Good idea, I really need to work on my delegate syntax. I must admit I
still have to look it up each time, 'cos I just can't remember the right
syntax and all its shorthands.
T
--
Let's not fight disease by killing the patient. -- Sean 'Shaleh' Perry

Heh, I can remember the shorthands: It's the full syntax (and syntax for
referring to the type itself) that I can never remember. One puts "delegate"
right next to the opening paren, the other puts something else there, meh, I
can never keep that straight. But the shortcuts I always remember :)

It's better not to create a regex every iteration. Use e.g.
---
auto regEx = regex(`(\w+)\s+(\w+)`);
---
before foreach. Of course, you are not claiming this as a
high-performance program, but creating a regex every iteration is too
common a mistake to show such code to newbies.

And that's why I plugged this hole: it happens too often. At least up to
mm... 16 regexes are cached.
--
Dmitry Olshansky

The only complaint is that I couldn't write auto[string] dgs and have
the compiler auto-infer the delegate type. :-)

Does this not work?
auto dgs = ...
Also, it doesn't look like that needs to be in the inner loop. Each time
you specify an AA literal, it allocates a new one. So you are allocating
another AA literal per line.
-Steve
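
A sketch of the second point, with the handler table built once before the loop. It assumes the key/value pairs have already been extracted (the `KeyValue` struct and `applyPairs` function are hypothetical), and it spells out the delegate type with an alias instead of relying on `auto` inference, which is not tested here.

```d
import std.conv : to;

struct Data
{
    string name;
    string phone;
    int age;
}

// Hypothetical container for one already-extracted "key value" line.
struct KeyValue
{
    string key;
    string value;
}

alias AttrDg = void delegate(string key, string value);

Data applyPairs(KeyValue[] pairs)
{
    Data d;

    // Built once, before the loop, rather than once per input line.
    AttrDg[string] dgs;
    dgs["name"]  = (string key, string value) { d.name = value; };
    dgs["phone"] = (string key, string value) { d.phone = value; };
    dgs["age"]   = (string key, string value) { d.age = to!int(value); };

    AttrDg errordg = delegate void(string key, string value) {
        throw new Exception("Invalid attribute '" ~ key ~ "'");
    };

    foreach (kv; pairs)
        dgs.get(kv.key, errordg)(kv.key, kv.value); // one lookup per line

    return d;
}

void main()
{
    import std.stdio : writeln;
    auto d = applyPairs([KeyValue("name", "Alice"), KeyValue("age", "42")]);
    writeln(d.name, " ", d.age);
}
```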

The => syntax replaces:
- parentheses around the parameter if there is only one parameter
- curly brackets
- the return keyword
- the semicolon at the end of the return statement
http://dlang.org/expression.html#Lambda
So => is most suitable when there is a single return statement.
Ali
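
To make the list above concrete, here is a small sketch comparing the long function-literal form with the => form. The function name `demo` and the variable names are made up for illustration.

```d
// Returns the results of calling each literal, to show they agree.
int[] demo(int x)
{
    // Long form: explicit types, curly brackets, `return`, and a semicolon.
    int function(int) doubleLong = function int(int n) { return n * 2; };

    // Lambda form: one parameter, so no parentheses; no curly brackets,
    // no `return`, and no semicolon after the returned expression.
    int function(int) doubleShort = n => n * 2;

    // With more than one parameter the parentheses stay.
    int function(int, int) add = (a, b) => a + b;

    return [doubleLong(x), doubleShort(x), add(x, x)];
}

void main()
{
    import std.stdio : writeln;
    writeln(demo(21));
}
```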

I can't compile it. I get "Out of memory". Is it the regex.d module
again? :(
This one really needs to be fixed ASAP, as the older working

Ah-ha-ha. OK, come on, use it then; the sources are out there in the open :)
It didn't even handle * properly.

regexp is deprecated.

Just stop using ctRegex for now... it's experimental.
Or more to the point, the problem is this. I've seen this one on bugzilla:
    version(CtRgx) {
        enum Re = ctRegex!re; // auto is OK here BTW
    } else { // that's the problem. It's _parsed_ at compile-time
        static Re = regex(re); // switch static to auto
    }
}
And there is little I can do until CTFE stops bleeding RAM.
--
Dmitry Olshansky

[...]
Hmph. I should've checked dmd memory usage when I wrote that. :-(
But anyway, even on my souped up AMD hexacore system, the ctRegex
version takes significantly longer to compile than the non-ctRegex
version. Perhaps I should just avoid ctRegex for now (though it *is* an
ultracool feature of std.regex).
I'm getting confused about the use of 'static' in this context. What I
wanted was to make the regex module-global, but apparently 'static' has
an overloaded meaning here, and also makes it compile-time evaluated?
How do I make it module-global without being compile-time evaluated??
T
--
"You know, maybe we don't *need* enemies." "Yeah, best friends are about all I
can take." -- Calvin & Hobbes

It just so happens that 'static' function variables in D
require compile-time available initializers. To initialize with a
runtime value on first execution, you have to implement that
manually:
void foo(int a)
{
    static int first;
    static first_initialized = false;
    if (!first_initialized)
    {
        first = a;
        first_initialized = true;
    }
    // ...
}

Well, the big problem is, even if I fall back to runtime regex, I
can't compile anymore on a Windows box with 2 GB of RAM. It's hard
to swallow...

It's my fault. I really should be using module globals for those
regexes, and a module ctor (static this) for initializing them.
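
A minimal sketch of that arrangement, assuming a hypothetical module-level variable `pairRe` and helper `looksLikePair`: the regex is declared without an initializer, so nothing is evaluated at compile time, and the module constructor builds it at startup.

```d
import std.regex : Regex, matchFirst, regex;

// Module-level variable (thread-local by default in D). No initializer,
// so no compile-time evaluation of regex() is requested.
Regex!char pairRe;

// Module constructor: runs at startup, before main.
static this()
{
    pairRe = regex(`(\w+)\s+(\w+)`);
}

// Hypothetical helper that uses the module-level regex.
bool looksLikePair(string line)
{
    return !matchFirst(line, pairRe).empty;
}

void main()
{
    import std.stdio : writeln;
    writeln(looksLikePair("age 42"));
    writeln(looksLikePair("nospace"));
}
```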
It would be nice if the CTFE implementation was improved, though. CTFE
is one of the major features of D that I really liked.
But speaking of which, are you using the latest version of dmd? 'cos I
think recently there were some CTFE efficiency issues that got fixed.
T
--
"How are you doing?" "Doing what?"

In most cases, I've come to prefer lazy initialization (via a module-level
property) over module ctors because:
1. If it never actually gets used, you avoid adding extra processing at
startup merely because you imported some module.
2. It decreases the risk of hitting the dreaded cyclic-module-dependency
error (major PITA).
And if you do want to throw away #1 and force initialization upon startup,
you can still do that by simply accessing it at the beginning of main.
I've always felt module ctors were a great idea, but after hitting the
cyclic dependency issue enough times (or even just the first time), I've
come to think they're, unfortunately, best avoided.
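
A sketch of what such a lazily initialized module-level property might look like for the regex case discussed in this thread. All names here (`pairRe`, `pairReStorage`, `pairReReady`, `looksLikePair`) are hypothetical.

```d
import std.regex : Regex, matchFirst, regex;

// Backing storage and a "ready" flag; both are thread-local by default,
// so each thread initializes its own copy on first use.
private Regex!char pairReStorage;
private bool pairReReady = false;

// Module-level property: compiles the regex on first access only.
@property ref Regex!char pairRe()
{
    if (!pairReReady)
    {
        pairReStorage = regex(`(\w+)\s+(\w+)`);
        pairReReady = true;
    }
    return pairReStorage;
}

// Hypothetical helper that goes through the property.
bool looksLikePair(string line)
{
    return !matchFirst(line, pairRe).empty;
}

void main()
{
    import std.stdio : writeln;
    // Nothing has been compiled until this first call touches pairRe.
    writeln(looksLikePair("age 42"));
}
```

No module constructor is involved, so importing the module costs nothing at startup and cannot contribute to a cyclic-dependency error.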

Firstly, the current version of dmd doesn't do any special translation
of a switch statement on a string variable, so it just becomes a
sequence of if-statements with string comparisons (slow). Using an
associative array makes the lookup O(1), which is fast(er) when you have
a lot of keys.
Second, using an AA allowed me to factor out the code that does the
parsing (see the parseBlock function in the second version of my code).
There's no way to do this with a switch statement.
In retrospect, the second point is probably more important, because the
use of delegates probably delays the point at which AA performance
starts to overtake a switch statement.
T
--
Caffeine underflow. Brain dumped.
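
The parseBlock function mentioned above is not included in this thread. The sketch below only illustrates the general point about factoring, using a hypothetical `applyAttrs` routine and `KeyValue` struct: because the keyword table is a value, one loop can serve several different sets of attributes, which a hard-coded switch cannot do.

```d
// Hypothetical container for one already-extracted "key value" line.
struct KeyValue
{
    string key;
    string value;
}

alias AttrDg = void delegate(string value);

// Shared routine: the keyword table is a parameter, not baked into the code.
void applyAttrs(AttrDg[string] handlers, KeyValue[] pairs)
{
    foreach (kv; pairs)
    {
        auto handler = kv.key in handlers; // pointer to the delegate, or null
        if (handler is null)
            throw new Exception("Invalid attribute '" ~ kv.key ~ "'");
        (*handler)(kv.value);
    }
}

void main()
{
    import std.conv : to;
    import std.stdio : writeln;

    string name;
    int width, height;

    // Two unrelated tables reuse the same routine.
    AttrDg[string] personAttrs;
    personAttrs["name"] = (string v) { name = v; };

    AttrDg[string] boxAttrs;
    boxAttrs["width"]  = (string v) { width = to!int(v); };
    boxAttrs["height"] = (string v) { height = to!int(v); };

    applyAttrs(personAttrs, [KeyValue("name", "Alice")]);
    applyAttrs(boxAttrs, [KeyValue("width", "3"), KeyValue("height", "4")]);
    writeln(name, " ", width, "x", height);
}
```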