Welcome to the Lounge

No matter how many times I go back to rewrite the thing, the LALR(1) parser I wrote falls down under the weight of its own contradictions as soon as I try to add error recovery and I have to go back and rewrite it again. I've lost count of how many times I've done this. At least three.

Broken As Designed. FML

When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.

I have a tool to switch between my alpha releases, though, because it makes it easier for me to resolve bugs that crop up when I'm regression testing. I don't have automated tests for much of this because parsing is complicated as hell to test extensively. (It's easy at the high level, but testing things like FOLLOW set generation or LR(0) generation is damned tricky.)
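
One way to get a foothold on testing that layer is to pin the expected sets for a grammar small enough to work out by hand. The sketch below is not Newt's code: the toy grammar, the names `compute_first` and `compute_follow`, and the dictionary layout are all assumptions made for illustration, and it uses the usual textbook fixed-point iteration.

```python
# Hypothetical toy grammar: E -> T E' ; E' -> + T E' | (empty) ; T -> id
EPS = ""   # marker for the empty string; never used as a terminal
END = "$"  # end-of-input marker
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["id"]],
}

def first_of_sequence(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets known so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the sequence can vanish
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for nt, rules in GRAMMAR.items():
            for rhs in rules:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(start, first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, rules in GRAMMAR.items():
            for rhs in rules:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue  # terminals have no FOLLOW set
                    tail = first_of_sequence(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # the tail can vanish, so inherit FOLLOW(lhs)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow("E", FIRST)
```

For this grammar the hand-derived answer is FOLLOW(E) = FOLLOW(E') = {$} and FOLLOW(T) = {+, $}, so a regression test can compare against exactly that. It doesn't make the big grammars any easier to verify, but it does catch the generator breaking on the small ones.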

This is the third version of the LALR(1) parser. The first was the initial version of Newt, which I never released. The second I never released either, but it worked - at least as well as this one does, which is to say fine until the input has errors.

The trouble is that LALR(1) is incredibly convoluted. The algorithm is ridiculous. If you saw it, the first thing you'd think is "oh my gosh there has to be an easier way, anything is better than this" - and normally you'd be right. The trouble is, mathematically you'd be wrong in this case. LR parsing is just *complicated* like that. The table generation - let's just say even an RDBMS would find it really tough.

The parser, for all that, is shockingly simple, but unintelligible because the parse table is unintelligible by the time it's crunched. Traversing it is easy because there's a formula to it, but it still doesn't make sense, if that makes sense. =)
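
To show what "a formula to it" looks like, here is a minimal sketch of a generic table-driven LR driver loop. It is not the Newt parser: the tables were built by hand for a made-up two-rule grammar, and the names `ACTION`, `GOTO`, `PRODUCTIONS` and `parse` are assumptions for illustration only.

```python
# Hypothetical grammar:  1: S -> ( S )    2: S -> x
# Rule number -> (left-hand side, number of symbols on the right-hand side)
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}

# (state, terminal) -> action. Hand-built; the numbers mean nothing by themselves.
ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}

# (state, nonterminal) -> state to enter after a reduction
GOTO = {(0, "S"): 1, (2, "S"): 4}

def parse(text):
    """Return the list of rule numbers applied, in order, or raise SyntaxError."""
    tokens = list(text) + ["$"]  # one character per token, plus end marker
    stack = [0]                  # stack of state numbers
    pos = 0
    applied = []
    while True:
        tok = tokens[pos]
        action = ACTION.get((stack[-1], tok))
        if action is None:
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)  # enter the new state and consume the token
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]         # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])    # then follow the goto entry
            applied.append(arg)
        else:  # accept
            return applied
```

With these tables, `parse("((x))")` returns `[2, 1, 1]`. The loop is about a dozen lines and never changes; everything that matters lives in the table entries, which are just state numbers with no obvious relation to the grammar.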

All this means is that as long as everything works as expected, it's great. But when you start getting errors in your input, all of a sudden you have to take over from the normal parsing algorithm, preserve its state, potentially modify that state to account for the error(s), interrupt the parse to report the errors, and then continue, all while managing to keep the parse table synced with the input. This is convoluted in an LL(1) parser, let alone an LALR(1) parser.
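
For a sense of what "taking over" involves, below is a rough sketch of one simple strategy, a panic-mode style recovery: record the error, then either unwind the state stack to a state that can act on the current token or throw input away until one can. This is a swapped-in textbook-style technique, not what Newt does, and the grammar, tables and the names `find_resume_depth` and `parse_with_recovery` are all hypothetical.

```python
# Hypothetical grammar:  1: S -> ( S )    2: S -> x   (tables built by hand)
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}
ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def find_resume_depth(stack, tok):
    """Index of the topmost state on the stack that has an action for tok."""
    for depth in range(len(stack) - 1, -1, -1):
        if (stack[depth], tok) in ACTION:
            return depth
    return None

def parse_with_recovery(text):
    """Return (completed, errors); errors is a list of (position, token)."""
    tokens = list(text) + ["$"]
    stack = [0]
    pos = 0
    errors = []
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:
            errors.append((pos, tokens[pos]))  # report, then try to resync
            while True:
                depth = find_resume_depth(stack, tokens[pos])
                if depth is not None:
                    del stack[depth + 1:]      # unwind to a state that can go on
                    break
                if tokens[pos] == "$":
                    return False, errors       # ran out of input, give up
                pos += 1                       # discard the offending token
            continue
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:  # accept
            return True, errors
```

With these tables, `parse_with_recovery("(xx)")` returns `(True, [(2, 'x')])`: the stray `x` is reported, the state for the first `x` is dropped from the stack, and the parse carries on. Even in this toy the recovery code is about as long as the parser itself, and a real grammar has far more ways to get the stack and the input out of step.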

There is a saying in Germany that you build your first house for your enemy, the second for your friend, and the third for yourself.
This must apply not only to house architecture, but also to code architecture.