Creating DSL With Antlr4 and Scala

Oct 19th, 2017 2:18 am

Domain-specific languages, when done right, help a lot in improving developer productivity. The first thing you need when creating a DSL is a parser that takes a piece of text and transforms it into a structured format (like an abstract syntax tree) so that your program can understand it and do something useful with it. DSLs tend to stay around for years, so when choosing a tool for building the parser for your DSL, you need to make sure that the language stays easy to maintain and evolve. For parsing a simple DSL, you can just use regular expressions or Scala’s parser combinators, but for even a slightly complex DSL, both of these become performance and maintenance nightmares.

ANTLR4

ANTLR can generate lexers, parsers, and combined lexer-parsers. In ANTLR4, the generated parser automatically builds a parse tree (referred to as the AST in the rest of this post), which can be further processed with generated listeners or visitors. ANTLR provides a single consistent notation for specifying lexers and parsers. This is in contrast with other parser/lexer generators and adds greatly to the tool’s ease of use. It supports:

Tree construction

Tree walking

Error recovery

Error handling

Translation

ANTLR supports a large number of target languages, so the same grammar can be used for both backend parsing and frontend validation. Supported targets include Java, C#, Python, JavaScript, Go, C++, and Swift.

How ANTLR works

Use generated sources to convert some raw input into structured form (AST)

Do something with this structured data

We will understand it with an example. Let’s say we want to create a DSL that allows arithmetic operations. A valid input (expression) would be:

3 + (4 * 5)

As humans, if we want to evaluate this expression, here’s what we will do:

Split this expression into different components.

In the expression above, each character belongs to one of these groups:

Operands (3, 4, 5)

Operation (+ - * /)

Whitespaces

This part is called lexical analysis, where you convert raw text (a stream of characters) into tokens.

Create relationship between tokens

To evaluate it efficiently, we can create a tree-like structure that defines the relationships between the different parts of the expression. For 3 + (4 * 5), the root is ‘+’, its left child is 3, and its right child is ‘*’, which in turn has 4 and 5 as its children.

This is called an AST (abstract syntax tree), and it is produced by applying the rules you define in your grammar to the input text. Once you have the AST, to evaluate the expression we need to traverse or ‘walk’ it in a depth-first manner. We start at the root ‘+’ and go as deep into the tree as we can along each child, then evaluate the operations as we come back out of the tree.
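The two steps above can be sketched in plain Java without any ANTLR involved. This is only an illustration of what a lexer, a parser, and a tree walk do, not the code ANTLR generates: the class name ArithmeticSketch, the Node type, and the hand-written recursive-descent parser are all assumptions made up for this example, and the tokenizer only handles whole numbers.

```java
import java.util.ArrayList;
import java.util.List;

public class ArithmeticSketch {
    // Step 1: lexical analysis. Turn the character stream into tokens,
    // dropping whitespace along the way.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else if ("+-*/()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Step 2: a tree node is either a number (leaf) or an operator with two children.
    public static final class Node {
        public final String value;
        public final Node left, right;
        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
        boolean isLeaf() { return left == null; }
    }

    // Hypothetical hand-written parser; each method mirrors one grammar rule.
    static final class Parser {
        private final List<String> tokens;
        private int pos = 0;
        Parser(List<String> tokens) { this.tokens = tokens; }

        private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

        // expr : term (('+' | '-') term)*
        Node expr() {
            Node left = term();
            while (peek().equals("+") || peek().equals("-")) {
                String op = tokens.get(pos++);
                left = new Node(op, left, term());
            }
            return left;
        }

        // term : atom (('*' | '/') atom)*
        Node term() {
            Node left = atom();
            while (peek().equals("*") || peek().equals("/")) {
                String op = tokens.get(pos++);
                left = new Node(op, left, atom());
            }
            return left;
        }

        // atom : NUMBER | '(' expr ')'
        Node atom() {
            if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
            String t = tokens.get(pos++);
            if (t.equals("(")) {
                Node inner = expr();
                if (!peek().equals(")")) throw new IllegalArgumentException("Expected ')'");
                pos++;
                return inner;
            }
            if (!Character.isDigit(t.charAt(0))) throw new IllegalArgumentException("Unexpected token: " + t);
            return new Node(t, null, null);
        }
    }

    public static Node parse(String input) {
        return new Parser(tokenize(input)).expr();
    }

    // Step 3: depth-first walk. Evaluate both children, then apply the operator.
    public static double evaluate(Node n) {
        if (n.isLeaf()) return Double.parseDouble(n.value);
        double l = evaluate(n.left);
        double r = evaluate(n.right);
        switch (n.value) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            case "/": return l / r;
            default: throw new IllegalStateException("Unknown operator: " + n.value);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3 + (4 * 5)"));        // [3, +, (, 4, *, 5, )]
        System.out.println(parse("3 + (4 * 5)").value);     // +
        System.out.println(evaluate(parse("3 + (4 * 5)"))); // 23.0
    }
}
```

Writing this by hand is manageable for four operators, but every new language feature means touching the tokenizer, the parser, and the walker. That is the work a grammar plus ANTLR takes over.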

We will now set up the tools and try creating a simple grammar.

Setup

IDE

ANTLR provides a GUI-based IDE for developing grammars. You can download it from http://www.antlr3.org/works/. It combines an excellent grammar-aware editor with an interpreter for rapid prototyping and a language-agnostic debugger for isolating grammar errors.

Add this to ~/.bashrc to be able to call the antlr4 and grun commands directly from anywhere.

Creating grammar

A grammar will consist of 2 parts

Lexer

Parser

Both of these can be defined in the same file, but for the sake of maintainability it’s better to define them in separate files. Let’s create a lexer and a parser for a DSL which will allow basic arithmetic operations on 2 numbers. Some valid inputs will be:

127.1 + 2717
2674 - 4735
47 * 74.1
271 / 281
10 + 2
10+2
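For a DSL this small, the regular-expression route mentioned at the start of the post still works, and it is useful to see it once for comparison. The sketch below is not part of the ANTLR workflow; the class name TwoNumberRegex and the number format (digits with an optional fractional part, inferred from the sample inputs) are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoNumberRegex {
    // Assumed number format: digits with an optional fractional part.
    private static final String NUMBER = "\\d+(?:\\.\\d+)?";

    // NUMBER, optional whitespace, one operator, optional whitespace, NUMBER.
    private static final Pattern EXPR = Pattern.compile(
        "\\s*(" + NUMBER + ")\\s*([-+*/])\\s*(" + NUMBER + ")\\s*");

    public static double evaluate(String input) {
        Matcher m = EXPR.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid expression: " + input);
        }
        double left = Double.parseDouble(m.group(1));
        double right = Double.parseDouble(m.group(3));
        switch (m.group(2).charAt(0)) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            default:  return left / right; // the pattern only admits + - * /
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("10 + 2"));
        System.out.println(evaluate("10+2"));
        System.out.println(evaluate("2674 - 4735"));
    }
}
```

The moment the language needs parentheses, a third operand, or operator precedence, a single pattern like this stops being enough, which is where a real grammar pays off.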

Let’s first define the lexer rules in a file named ArithmeticLexer.g4 to extract tokens from the input:

ANTLR4 provides 2 ways to walk the AST - Listener and Visitor. ANTLR doesn’t generate sources for the visitor by default. Since we will be using the visitor pattern from Scala to avoid mutability, let’s generate the visitor sources too. This can be done by passing the -visitor flag to the antlr4 command.

Using generated sources in code

We will now see how to extend the generated interfaces and use them from within code. As mentioned above, ANTLR4 provides 2 ways to walk the AST - Visitor and Listener. We will first see how to use the listener pattern. Although the listener approach is commonly used by Java developers, Scala folks will not like it because its methods can only return Unit, so you need intermediate variables, which leads to side effects. Refer to this post for a comparison between the two patterns.

For every rule that we defined in ArithmeticParser.g4, ANTLR created an enter and an exit method. Since we had 2 rules, expr and operation, it created 4 methods. As the names imply, these get triggered every time the walker enters or exits a matched rule. For now, let’s focus on the enter method of our starting rule, expr.

@Override public void enterExpr(ArithmeticParser.ExprContext ctx) {}

Notice that every rule has a context which carries all the meta information as well as the matched input. Also note that all methods return void, which means you need mutable variables to store computed values if they need to be shared among different rules or with the main caller. This problem can be solved by using a visitor instead of a listener, as discussed in this post.
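To make the point about void-returning callbacks concrete, here is a small self-contained sketch that does not use the ANTLR runtime at all. Node, EvalListener, walk and visit are hypothetical stand-ins invented for this example, not ANTLR’s generated classes, and the listener only uses exit-style callbacks since that is when child results are available.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ListenerVsVisitor {
    // Hypothetical stand-in for a parse-tree node: a number leaf or a binary operation.
    public static final class Node {
        final char op;        // '+', '-', '*', '/' or 'n' for a number leaf
        final double number;
        final Node left, right;
        public Node(double number) {
            this.op = 'n'; this.number = number; this.left = null; this.right = null;
        }
        public Node(char op, Node left, Node right) {
            this.op = op; this.number = 0; this.left = left; this.right = right;
        }
    }

    static double apply(char op, double l, double r) {
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    // Listener style: callbacks return void, so intermediate results
    // have to live in mutable state (here, a stack) owned by the listener.
    public static final class EvalListener {
        private final Deque<Double> stack = new ArrayDeque<>();
        void exitNumber(Node n) { stack.push(n.number); }
        void exitOperation(Node n) {
            double r = stack.pop();
            double l = stack.pop();
            stack.push(apply(n.op, l, r));
        }
        public double result() { return stack.peek(); }
    }

    // A tiny walker that fires the callbacks after visiting the children.
    public static void walk(Node n, EvalListener listener) {
        if (n.op == 'n') {
            listener.exitNumber(n);
            return;
        }
        walk(n.left, listener);
        walk(n.right, listener);
        listener.exitOperation(n);
    }

    // Visitor style: every visit returns a value, so no shared state is needed.
    public static double visit(Node n) {
        if (n.op == 'n') return n.number;
        return apply(n.op, visit(n.left), visit(n.right));
    }

    public static void main(String[] args) {
        // The tree for 3 + (4 * 5)
        Node tree = new Node('+', new Node(3), new Node('*', new Node(4), new Node(5)));
        EvalListener listener = new EvalListener();
        walk(tree, listener);
        System.out.println(listener.result()); // 23.0
        System.out.println(visit(tree));       // 23.0
    }
}
```

Both versions compute the same value. The difference is where the intermediate results live: in a mutable stack for the listener, in return values for the visitor.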

So now we create our own class by extending ArithmeticParserBaseListener and overriding the enterExpr method.