What is the matter with these people? How, after all the experience we've had with XSLT, Ant, WSDL, etc., etc., could they create YET ANOTHER XML language? Are they dolts? Are they idiots?

But when Peter posted a comment asking "how quickly can you write a parser...," I revisited the post from Bob, and dug into it a little.

I'm going to go out on a limb here and use something I learned in school (this doesn't happen often, at least not with the "theoretical" stuff). There are concise ways to describe languages and grammars, so one would think there exists a tool that can take such a description and automatically parse some text for you. It sounds reasonable, anyway. In fact, checking up on it, that's what tools like ANTLR and YACC seem to do.

As for rolling your own parser: if your language is very simple, you can easily write one using string.split(pattern) that would do the job. It's only when the language gets more complex that the parsing becomes difficult. In this case, Robert Martin mentioned that you should "write a little YACC grammar that is nice, and small, and translates into that hideous XML." Since I couldn't find a download for YACC, I decided to get ANTLR and give it a whirl.

I'll show a very simple dependency injection DSL that follows this basic rule: make bean: id, class, constructor-arg {name=value, name2=value2,...}. Obviously, when writing a real one you'd want to take some time to make it simpler for the user, which would lead to a more complex grammar than this. In any case, code written in this DSL might look like:
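As a hypothetical illustration of the "simple language, simple parser" point, here is one statement that follows the rule above, along with a minimal split-based parser for it. This is only a sketch in Python: the bean id, class name, and argument names are made up for the example, and `parse_bean` is my own helper name, not part of any library.

```python
def parse_bean(line):
    """Parse one 'make bean: id, class, injection {k=v, ...}' statement."""
    keyword, _, rest = line.partition(":")
    if keyword.strip() != "make bean":
        raise ValueError(f"expected 'make bean:', got {keyword!r}")

    # head holds "id, class, injection-type"; arg_text holds "k=v, ...}"
    head, _, arg_text = rest.partition("{")
    bean_id, class_name, injection = [part.strip() for part in head.split(",")]

    args = {}
    arg_text = arg_text.strip().rstrip("}")
    for pair in arg_text.split(","):
        if pair.strip():  # tolerate an empty argument list
            name, value = pair.split("=", 1)
            args[name.strip()] = value.strip()

    return {"id": bean_id, "class": class_name,
            "injection": injection, "args": args}


# Hypothetical statement in the DSL (all names are invented for illustration).
bean = parse_bean(
    "make bean: userService, com.example.UserService, "
    "constructor-arg {dao=userDao, timeout=30}"
)
print(bean)
```

This works only because the format is rigid. As soon as values may contain commas, braces, or nested structures, splitting on fixed characters breaks down, which is where a real grammar starts to pay off.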

First, let's define the tokens for the lexer. In ANTLR, these start with a capital letter, so we have:
MakeBean, BeanID, Class, TypeOfInjection, ArgName, and ArgValue. (I'll put it all together in legal ANTLR statements below)
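To give a feel for what a lexer does with tokens like these, here is a rough hand-rolled sketch in Python using the standard `re` module. It is not ANTLR and not the grammar from this post. One assumption worth flagging loudly: BeanID, Class, TypeOfInjection, ArgName, and ArgValue all look alike character by character, so this sketch lumps them into a single hypothetical `NAME` kind and would leave it to the parser to decide which role each one plays. The kind names `MAKE_BEAN`, `NAME`, `PUNCT`, `NEWLINE`, and `SKIP` are my own.

```python
import re

# (kind, pattern) pairs; earlier entries win when two patterns could match.
TOKEN_SPEC = [
    ("MAKE_BEAN", r"make\s+bean\b"),
    ("NAME",      r"[A-Za-z0-9_][A-Za-z0-9_.\-]*"),  # ids, classes, values
    ("PUNCT",     r"[:,{}=]"),
    ("NEWLINE",   r"\r?\n"),
    ("SKIP",      r"[ \t]+"),                         # whitespace is dropped
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Turn DSL text into a list of (kind, text) pairs."""
    pos = 0
    tokens = []
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# Hypothetical input; the names are invented for illustration.
tokens = tokenize(
    "make bean: userService, com.example.UserService, "
    "constructor-arg {dao=userDao}\n"
)
for kind, text in tokens:
    print(kind, repr(text))
```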

Then, we'll want to define the rules for our parser. These start with lowercase letters. For this, we have statements, expressions, args and prog, our program. Statements consist of expressions followed by CRLF which may lead to another expression or the end. I added args in, which could have easily been put right into the statement if I had wanted.
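For comparison, this is roughly what those rules look like when written by hand as a recursive-descent parser in Python, with one method per rule. It is a sketch of my reading of the rules, not generated ANTLR output. It assumes the input has already been split into (kind, text) pairs, and the kinds used here (`MAKE_BEAN`, `NAME`, `PUNCT`, `NEWLINE`) are hypothetical names, as is the sample bean.

```python
class Parser:
    """One method per grammar rule: prog, statement, expression, args."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def expect(self, kind, text=None):
        tok_kind, tok_text = self.peek()
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_text!r}")
        self.pos += 1
        return tok_text

    def prog(self):
        # prog: statement+
        beans = [self.statement()]
        while self.peek()[0] != "EOF":
            beans.append(self.statement())
        return beans

    def statement(self):
        # statement: expression NEWLINE
        bean = self.expression()
        self.expect("NEWLINE")
        return bean

    def expression(self):
        # expression: MakeBean ':' BeanID ',' Class ',' TypeOfInjection args
        self.expect("MAKE_BEAN")
        self.expect("PUNCT", ":")
        bean_id = self.expect("NAME")
        self.expect("PUNCT", ",")
        class_name = self.expect("NAME")
        self.expect("PUNCT", ",")
        injection = self.expect("NAME")
        return {"id": bean_id, "class": class_name,
                "injection": injection, "args": self.args()}

    def args(self):
        # args: '{' (ArgName '=' ArgValue (',' ArgName '=' ArgValue)*)? '}'
        self.expect("PUNCT", "{")
        pairs = {}
        while self.peek() != ("PUNCT", "}"):
            if pairs:
                self.expect("PUNCT", ",")
            name = self.expect("NAME")
            self.expect("PUNCT", "=")
            pairs[name] = self.expect("NAME")
        self.expect("PUNCT", "}")
        return pairs


# Hand-written token list for one hypothetical statement.
tokens = [
    ("MAKE_BEAN", "make bean"), ("PUNCT", ":"), ("NAME", "userService"),
    ("PUNCT", ","), ("NAME", "com.example.UserService"), ("PUNCT", ","),
    ("NAME", "constructor-arg"), ("PUNCT", "{"), ("NAME", "dao"),
    ("PUNCT", "="), ("NAME", "userDao"), ("PUNCT", "}"), ("NEWLINE", "\n"),
]
beans = Parser(tokens).prog()
print(beans)
```

Each method mirrors one rule, which is the bookkeeping a parser generator takes off your hands once you give it the grammar.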

Here's the code you'd use in ANTLR. So far, I see that it draws state machines for me, but I don't yet know how to feed it input and get output (however, I imagine that wouldn't be too difficult). I've tried to add comments to explain what I understand to be going on.

I don't claim that this design is the optimal one (or even close to it). This is the first time I've done something like this outside of an academic setting, where the goal was to explain the kinds of strings a grammar like this might generate (or, given some strings, to construct a grammar that can generate them). In fact, if you've got a better design (with reasons, or some heuristics we can follow), I'd especially love to hear from you in the comments. Also, feel free to ask questions and I'll answer them to the best of my ability.

In all, it is hard to measure how long this took me. I had tons of different distractions going on while doing this, so discounting those I'd estimate about an hour or two to get this VERY minor grasp of ANTLR, but I also have the benefit of already being exposed to the grammar description language, so your mileage may vary. In case you're interested, here are some ANTLR tutorials.

The main drawback to ANTLR is that it has only a few target languages (at the moment): Java, C#, Objective-C, C, Python, and Ruby. On the other hand, Perl, C++, and Oberon are being worked on, and you are able to add support for others. (Does anyone want to make one of these for ColdFusion?)

ANTLR is crazy cool, I was toying with writing an ActionScript parser once...but then I realized, this was a bit over my head...as for a CF parser, you should talk to Mark Mandel, I believe he's been writing one for the CFEclipse project.

It's how I built TQL for Transfer (tho if I built it now I would have built it differently, but that's the way of all things), and integrated the Java code with Transfer using JavaLoader.

The main power I find with ANTLR is that it is just SO extensible, not only within the grammar, but also in terms of the code you can add to things. You don't like the CommonToken... make it use a different one.. you don't like the Tree implementation, use a different one there too, if you want to do something tricky, you can write inline code into your grammar to do fancy rewrites with island parsers, catch exceptions for better error handling.. the list is endless.

ANTLR is an amazing tool, and the LL(*) parsing technology is crazy smart.

Sam: if you ever want to talk ANTLR or CF/ANTLR integration, drop me a line.

The ANTLR mailing list is an *awesome* resource, I usually have replies back to my questions within the hour, and at the maximum 24 hours (Terence is very active on the mailing list), but I'm always ready to chat ANTLR via IM or otherwise any time ;)

So yeah, I don't mind.

@Peter: I finally got the book too... after using it for over 6 months, I figured it was about time... ;)
