In this post we will start working on a very simple expression language. We will build it in our language sandbox and therefore we will call the language Sandy.

I think that tool support is vital for a language: for this reason we will start with an extremely simple language but we will build rich tool support for it. To benefit from a language we need a parser, interpreters and compilers, editors and more. It seems to me that there is a lot of material on building simple parsers but very little material on building the rest of the infrastructure needed to make using a language practical and effective.

I would like to focus on exactly these aspects, making a language small but fully useful. Then you will be able to grow your language organically.

Once the project is set up, we can run ./gradlew generateGrammarSource to generate the ANTLR lexer and parser.

Implementing the lexer

We will build the lexer and the parser in two separate files. This is the lexer:


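lexer grammar SandyLexer;

// Whitespace ('\r' alone covers old Mac-style line endings)
NEWLINE  : '\r\n' | '\r' | '\n' ;
WS       : [ \t]+ ;

// Keywords
VAR      : 'var' ;

// Literals
INTLIT   : '0' | [1-9][0-9]* ;
// The parentheses make the fractional part apply to both alternatives, so "0.5" is a DECLIT
DECLIT   : ('0' | [1-9][0-9]*) '.' [0-9]+ ;

// Operators
PLUS     : '+' ;
MINUS    : '-' ;
ASTERISK : '*' ;
DIVISION : '/' ;
ASSIGN   : '=' ;
LPAREN   : '(' ;
RPAREN   : ')' ;

// Identifiers (defined after VAR, so the keyword wins when both rules match "var")
ID       : [_]*[a-z][A-Za-z0-9_]* ;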

Now we can simply run ./gradlew generateGrammarSource and the lexer will be generated for us from the previous definition.

Testing the lexer

Testing is always important, but while building languages it is absolutely critical: if the tools supporting your language are not correct, this could affect every program you will ever write in it. So let’s start testing the lexer: we will just verify that the sequence of tokens the lexer produces is the one we expect.
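To make the idea concrete, here is a minimal sketch in Java of what such a check can look like. It does not use the generated SandyLexer: the class SandyLexerSketch and its tokenNames helper are hypothetical stand-ins, written with java.util.regex so the example runs without ANTLR on the classpath. The patterns are my reading of the intended rules (WS matching spaces and tabs, NEWLINE matching a lone carriage return, DECLIT allowing a leading 0 before the fraction), and skipping WS tokens in the result is an assumption made to keep the expected lists short. The sketch mimics two ANTLR lexer behaviours: the longest match wins, and when two rules match the same length the rule defined first wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated SandyLexer, so the sketch runs without ANTLR.
class SandyLexerSketch {

    // Rules in grammar order: on equal-length matches the earlier rule wins.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("NEWLINE", Pattern.compile("\\r\\n|\\r|\\n"));
        RULES.put("WS", Pattern.compile("[ \\t]+"));
        RULES.put("VAR", Pattern.compile("var"));
        RULES.put("INTLIT", Pattern.compile("0|[1-9][0-9]*"));
        RULES.put("DECLIT", Pattern.compile("(0|[1-9][0-9]*)\\.[0-9]+"));
        RULES.put("PLUS", Pattern.compile("\\+"));
        RULES.put("MINUS", Pattern.compile("-"));
        RULES.put("ASTERISK", Pattern.compile("\\*"));
        RULES.put("DIVISION", Pattern.compile("/"));
        RULES.put("ASSIGN", Pattern.compile("="));
        RULES.put("LPAREN", Pattern.compile("\\("));
        RULES.put("RPAREN", Pattern.compile("\\)"));
        RULES.put("ID", Pattern.compile("_*[a-z][A-Za-z0-9_]*"));
    }

    // Returns the names of the tokens found in the code, without WS, ending with "EOF".
    static List<String> tokenNames(String code) {
        List<String> names = new ArrayList<>();
        int pos = 0;
        while (pos < code.length()) {
            String bestName = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(code);
                m.region(pos, code.length());
                // Strictly longer only: ties keep the rule that was defined first.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestName = rule.getKey();
                    bestLength = m.end() - pos;
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            if (!bestName.equals("WS")) {
                names.add(bestName);
            }
            pos += bestLength;
        }
        names.add("EOF");
        return names;
    }

    public static void main(String[] args) {
        // The check itself: compare the produced token names with the expected ones.
        List<String> expected = List.of("VAR", "ID", "ASSIGN", "INTLIT", "EOF");
        List<String> actual = tokenNames("var a = 1");
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        System.out.println(actual);
    }
}
```

In the real project the same comparison would be run against the tokens produced by the generated lexer instead of this stand-in; the useful part is the shape of the test, a short input on one side and a readable list of token names on the other.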

Conclusions and next steps

We started with the first small step: we set up the project and built the lexer.

There is a long way to go before the language is usable in practice, but we have started. Next we will work on the parser with the same approach: building something simple that we can test and compile from the command line.


Regexp-based lexers tend to be very slow. Your compiler may spend 90% of its time in the lexer. If you write the lexer by hand you will get a dramatic speed-up in your compiler, and typical programming languages do not need radically different lexers anyway.

Hi Peter, thank you for your comment. In this case my goal was to build a language quickly, so I did not pay much attention to performance. It is true that if the language we build gets some traction and we start to use it at a scale that makes performance relevant, it could make sense to take another look at the lexer implementation. That said, ANTLR generates lexers which are decently fast for most goals I have. I have written a lexer manually a few times, but it was mainly for particular tasks (e.g., extracting all the comments from a Java file). I am not sure about the maintainability of hand-written lexers for large languages. Do you have any experience with that?

[…] our language. The syntax highlighting feature will be based on the ANTLR lexer we have built in the first post. The code will be in Kotlin, however it should be easily convertible to Java. The editor will be […]