Parsing Bison grammars with Antlr

I’ve started writing an Antlr grammar for Bison, with the goal of automatically converting Bison grammars to Antlr, or to another parser generator for that matter. As it turns out, the “central dogma” of parsing (i.e., “you cannot use an LR grammar in an LL parser, and vice versa”) does not hold for the unlimited-lookahead parsers that are available nowadays. The major issue will be handling static semantics. Many Bison grammars embed tree construction and static semantic checks directly in the grammar. All of this needs to be ripped out and done in a clean manner.
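To make the "ripping out" step concrete, here is a minimal sketch in Python of stripping brace-delimited semantic actions from the rules section of a Bison grammar. It is not the converter itself: the function name `strip_actions`, the sample rule, and the `mk` helper inside the action are made up for illustration. It assumes balanced braces, and it does not handle C comments inside actions, the `%{ ... %}` prologue, or `%union` blocks.

```python
def strip_actions(rules: str) -> str:
    """Remove { ... } semantic actions from Bison rule text (sketch only)."""
    out = []
    depth = 0  # current nesting depth of action braces
    i, n = 0, len(rules)
    while i < n:
        ch = rules[i]
        if ch in "'\"":
            # Scan a quoted literal as one unit so that braces inside it
            # (for example '{') are never counted as action delimiters.
            j = i + 1
            while j < n and rules[j] != ch:
                j += 2 if rules[j] == "\\" else 1
            if depth == 0:
                out.append(rules[i:j + 1])  # keep literals outside actions
            i = j + 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)  # tolerate a stray closing brace
        elif depth == 0:
            out.append(ch)  # ordinary grammar text outside any action
        i += 1
    return "".join(out)


rule = "expr : expr '+' term { $$ = mk('+', $1, $3); } | term ;"
print(" ".join(strip_actions(rule).split()))
# prints: expr : expr '+' term | term ;
```

The left-recursive `expr` rule that remains is one Antlr 4 accepts as written, since it supports direct left recursion.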

I have a grammar for Bison that works pretty well on eleven different grammars of varying complexity. I am now looking into a GitHub scraper that searches for and collects all public Bison/Yacc grammars so I can place them in a test suite.

–Ken

Update Feb 24 2020: I have a GitHub crawler that is now downloading 32K Yacc grammars. I plan to test each of them with this Bison parser. Here is the script (more or less; the chunking issues are not yet resolved).

Adding an analysis tool for #Antlr grammars to #Antlrvsix. This is how it works with cycle detection and useless lexer rules. There are some issues with the responsiveness of the MS LSP client for VS2019.
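As a rough illustration of the two checks, here is a Python sketch over a grammar reduced to a rule-reference graph. This is not the Antlrvsix implementation. It assumes a "useless" lexer rule is one that no other rule references, and it treats a "cycle" as a rule that can reach itself through references, which may be broader than what the tool reports. The function names and the toy grammar are invented for the example.

```python
def useless_lexer_rules(refs):
    """Lexer rules (capitalized names) that no other rule references."""
    used = {sym for name, body in refs.items() for sym in body if sym != name}
    return sorted(r for r in refs if r[0].isupper() and r not in used)


def recursive_rules(refs):
    """Rules that can reach themselves by following rule references."""
    def reaches_itself(start):
        stack, seen = list(refs.get(start, ())), set()
        while stack:
            cur = stack.pop()
            if cur == start:
                return True
            if cur not in seen:
                seen.add(cur)
                stack.extend(refs.get(cur, ()))
        return False

    return sorted(r for r in refs if reaches_itself(r))


# Toy grammar: rule name -> symbols referenced on its right-hand side.
grammar = {
    "expr": ["expr", "PLUS", "term", "term"],
    "term": ["INT"],
    "INT": [],
    "PLUS": [],
    "UNUSED": [],
}
print(useless_lexer_rules(grammar))  # prints: ['UNUSED']
print(recursive_rules(grammar))      # prints: ['expr']
```

Lexer rules that are meant to be unreferenced, such as a whitespace rule with `-> skip`, would have to be excluded before running the first check.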

Adding a refactoring to #Antlrvsix that removes useless parentheses in an #Antlr grammar. This is how it works with the extra parentheses in the arrayAccess_lf_primary rule in Java9.g4 that nobody knew were there. I am only just starting to scratch the surface of grammar optimizations.
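One way to decide when a pair of parentheses is useless can be sketched in Python on the token stream of a single right-hand side. This is a conservative approximation, not the Antlrvsix refactoring: a group is dropped only if it wraps a single symbol, or if it contains no top-level `|` and is not followed by `*`, `+`, or `?`. The function names are hypothetical, the output is re-joined with single spaces, and embedded actions in `{ ... }` are assumed to be absent.

```python
import re

# Tokens: quoted literals, lexer character sets, names, '+=', any other symbol.
TOKEN = re.compile(r"'(?:\\.|[^'\\])*'|\[(?:\\.|[^\]\\])*\]|\w+|\+=|\S")
SUFFIX = {"*", "+", "?"}


def has_top_level_alt(tokens):
    """True if '|' occurs outside any nested parentheses."""
    depth = 0
    for t in tokens:
        if t == "(":
            depth += 1
        elif t == ")":
            depth -= 1
        elif t == "|" and depth == 0:
            return True
    return False


def remove_useless_parens(rhs: str) -> str:
    toks = TOKEN.findall(rhs)
    changed = True
    while changed:  # rescan after every removal, since indices shift
        changed = False
        stack = []
        for i, t in enumerate(toks):
            if t == "(":
                stack.append(i)
            elif t == ")" and stack:
                start = stack.pop()
                inner = toks[start + 1:i]
                before = toks[start - 1] if start > 0 else ""
                after = toks[i + 1] if i + 1 < len(toks) else ""
                if before in ("~", "=", "+="):
                    continue  # leave negated sets and labeled groups alone
                single = len(inner) == 1
                plain_seq = bool(inner) and not has_top_level_alt(inner)
                if single or (plain_seq and after not in SUFFIX):
                    toks = toks[:start] + inner + toks[i + 1:]
                    changed = True
                    break
    return " ".join(toks)


print(remove_useless_parens("(a) (b c) (d | e)* (f g)+"))
# prints: a b c ( d | e ) * ( f g ) +
```

Groups that contain alternatives are always kept here, even where they are in fact redundant (for example, when the group is the entire rule body), so the sketch errs on the side of leaving the grammar unchanged.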

Implementing #Antlr grammar fold refactorings in #Antlrvsix. Two types: extract a selected sequence of symbols and make a rule (shown first); replace all occurrences in the grammar with a folded rule (shown second). Spacing and comments do not matter.
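The second kind of fold can be sketched in Python as a token-level search and replace, which is also why spacing and comments do not matter: both disappear during tokenization. This is an illustration under assumptions, not the Antlrvsix code. The names `tokenize` and `fold` are made up, the folded rule body is assumed to be a plain sequence with no `|`, and comment markers inside string literals are not handled.

```python
import re

COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.S)
TOKEN = re.compile(r"'(?:\\.|[^'\\])*'|\w+|\S")
SUFFIX = {"*", "+", "?"}


def tokenize(text):
    """Split grammar text into tokens, ignoring whitespace and comments."""
    return TOKEN.findall(COMMENT.sub(" ", text))


def fold(rhs: str, rule_body: str, rule_name: str) -> str:
    """Replace each occurrence of rule_body in rhs with rule_name."""
    toks, pat = tokenize(rhs), tokenize(rule_body)
    out, i, n = [], 0, len(pat)
    while i < len(toks):
        nxt = toks[i + n] if i + n < len(toks) else ""
        # Skip a match followed by a suffix operator: in "a b*" the "*"
        # binds to "b" alone, so folding "a b" there would change meaning.
        if n and toks[i:i + n] == pat and nxt not in SUFFIX:
            out.append(rule_name)
            i += n
        else:
            out.append(toks[i])
            i += 1
    return " ".join(out)


rhs = "ID '=' expr ';' | 'let' ID /* name */ '='   expr ';'"
print(fold(rhs, "ID '=' expr", "assign"))
# prints: assign ';' | 'let' assign ';'
```

Matching on tokens rather than on raw text is the design choice that makes the refactoring insensitive to layout. A production version would presumably match on parse trees instead, so that a fold can never cross an alternative or group boundary.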