Rewriting the pattern matching engine — part 1

For the last two weeks, I’ve trying to write Piggy patterns to construct a symbol table from a Java AST. Patch after patch, I’d change the pattern matching code to “fix” something that wasn’t working. Unfortunately, I finally wrote a pattern that broke the camel’s back, “< classBodyDeclaration < modifier >* < memberDeclaration < methodDeclaration > > >”, which looks for methods within a class.

So, I decided to rewrite the engine the way it should have been done: using an NFA. It’s so far taken a week or so, but it turns that the pattern matcher is much cleaner, and likely faster. In addition, the output engine–which executes the code blocks in the pattern–is also much cleaner. I’ll try to see if I can combine the patterns together in one automaton.

I really should have known better than to approach the tree regular expression matching problem following what other people did, using a top-down recursive recognizer. Live and learn. Always follow a clean, clear theory instead of just hacking.

1/2: Adding to #Antlrvsix an analysis tool of #Antlr grammars. This is how it works with cycle detection and useless lexer rules. There are some issues in the responsiveness of the MS LSP client for VS2019.

Adding to #Antlrvsix the refactoring to remove useless parentheses in an #Antlr grammar. This is how it works with the extra parentheses in the arrayAccess_lf_primary rule in Java9.g4 that nobody knew were there. Only yet starting to scratch the surface of grammar optimizations.

Implementing #Antlr grammar fold refactorings in #Antlrvsix. Two types: extract a selected sequence of symbols and make a rule (shown first); replace all occurrences in the grammar with a folded rule (shown second). Spacing and comments do not matter.