Integrating ANTLR without learning Ant

This tutorial explains how to integrate the ANTLR parser-generator into a NetBeans Java SE project. It requires only a small, easily understood change to the Ant configuration and one supplementary file. Neither needs customisation when used in standard projects: you can just copy them. This tutorial was developed with NetBeans 6.9.1, and has been tested under NetBeans 7.0.

Introduction

ANTLR is a tool for constructing recognisers, interpreters, compilers, and translators from grammatical descriptions, and it can generate code in a variety of target languages, including Java. To build an application that uses components defined in a grammar, the ANTLR compiler must first be invoked to generate Java code, which is then compiled along with your other classes.

The NetBeans IDE gives each Java SE project a sophisticated Ant build script that is invoked by actions in the IDE. Thanks to this standard build configuration, you can develop quite complex Java projects without knowing how to specify them in Ant. Guidance on the ANTLR site shows how to include ANTLR in an Ant build, but it requires a complex addition to the Ant configuration file (build.xml) and the installation of optional "antlibs". However, the NetBeans build files already support a code-generation stage.

This tutorial shows you how to get ANTLR included in your project, using a plain installation of NetBeans Ant, and without the burden of maintaining a project-specific Ant configuration file. The approach recommended here lets Ant find your grammar files in a standard location and allows ANTLR to take care of the dependency order. The changes offered here can be pasted into your build.xml without modification, and the rest of the script is encapsulated in a file you need not modify.

A related article by Nick Krasilnikov introduces the Antlr Editor Support Plugin. The plug-in may be used in conjunction with this tutorial, or not, as you choose. It gives you syntax colouring and code completion when editing your grammar (.g) files. (Make sure you find a version that is correct for your version of NetBeans.) At the time of writing (May 2011) Nick's tutorial is presented for NetBeans 6.5 and uses a project-specific build configuration. Also at this time, there is a version of the editor plug-in for NetBeans 6.9.1, but not for NetBeans 7.0.

Worked Example

This tutorial demonstrates a complex grammar involving four source files, adapted from an example on the ANTLR site. The example has been modified so that the grammar source files specify several different Java packages. In the original example, all files generated code into the default package. It is not necessary to understand the grammar to follow the tutorial.

You can open the downloaded example project in NetBeans and poke around, or follow the steps below, which will take you through creating your own project with the same source.

Summary of Steps

The tutorial will guide you through these steps:

Create a Java Application project.

Create your grammars in a subdirectory.

Tell the IDE about the grammar subdirectory.

Tell the IDE about the ANTLR jar.

Add the supporting Ant script.

Add three lines to your build.xml.

Clean and Build.

Run.

Even if you create your own project rather than working in the downloaded one, you will need several files from the download, so keep the directory of the downloaded project open in a file explorer.

Create a Java Application project

Create a Java Application project, that is, a new Java SE application with the IDE-generated Ant build script. Name the project "Poly Diff", and call the Main Class "mypackage.poly.diff.Main".

Open the Main.java file that NetBeans generated and replace its contents with the contents of the corresponding file Main.java in the downloaded project. You will find that a great many errors are highlighted in the source window. This is because the Main class refers to entities that are not yet defined. Either they are components of the ANTLR runtime, which we have not told the IDE about yet, or they are classes that will be generated from the grammar.

Create your grammars in a subdirectory

In the Files tab of the IDE, or otherwise, create a subdirectory of your new project called "grammar". (It has to be that name exactly.) Copy the grammar sources from the corresponding locations in the downloaded project into the grammar subdirectory of your new project, making subdirectories as you go, so that you have the following file structure:

grammar/mypackage/poly/Poly.g

grammar/mypackage/poly/PolyPrinter.g

grammar/mypackage/poly/diff/PolyDifferentiator.g

grammar/mypackage/poly/simple/Simplifier.g

You can accomplish this whole step in one go by copying the grammar directory from the downloaded project to the new project root in the Files tab.

Tell the IDE about the grammar subdirectory

In the Sources category of the Project Properties dialog, add the "grammar" folder to the "Source Package Folders" list.

Tell the IDE about the ANTLR jar

In the Libraries category of the Project Properties dialog, on the Compile tab, use the Add JAR/Folder button to add the ANTLR "complete" JAR to the compile-time libraries. You do this by navigating to the location where you have installed it. (Mine is at C:\antlr\antlr-3.3-complete.jar.)

If you look at Main.java in the IDE editor now, you should find that some of the red ink has gone. The remaining undefined classes are those that will be generated from the grammars.
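If you would like to confirm from Ant that the last two steps took effect, a small diagnostic target can help. The following is only a sketch: the target name check-antlr-setup and the two "check." properties are hypothetical, and it assumes that naming the "grammar" folder as a source root defines the property src.grammar.dir and that the ANTLR JAR ends up on javac.classpath (both properties are used by the supporting script shown later). It could be pasted temporarily into build.xml, after the import of nbproject/build-impl.xml.

```xml
<!-- Hypothetical diagnostic target; not part of the tutorial's files. -->
<target name="check-antlr-setup" depends="init">
    <!-- Show where the IDE thinks the grammar source root is. -->
    <echo>src.grammar.dir = ${src.grammar.dir}</echo>
    <!-- Set a property only if that directory really exists. -->
    <available file="${src.grammar.dir}" type="dir"
               property="check.grammar.dir.present"/>
    <fail unless="check.grammar.dir.present">The grammar folder was not found. Has it been added to Source Package Folders?</fail>
    <!-- Set a property only if the ANTLR tool class is on the compile classpath. -->
    <available classname="org.antlr.Tool" classpath="${javac.classpath}"
               property="check.antlr.tool.present"/>
    <fail unless="check.antlr.tool.present">org.antlr.Tool is not on javac.classpath. Has the ANTLR JAR been added on the Compile tab?</fail>
    <echo>ANTLR setup looks complete.</echo>
</target>
```

Running the target from the Navigator panel should either stop with one of the two messages or finish with the final echo. It can be deleted again once the build works.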

Optional step: add source and/or Javadoc

If you have downloaded the source and/or Javadoc for the ANTLR support classes, you can tell the IDE about them in this dialog. Select the JAR file in the Libraries dialog and press Edit. You can then navigate to the source and Javadoc folders. This will enable the "Navigate to source" feature of the IDE editor to take you to relevant sources. (Make the source files read-only so you don't accidentally edit them.)

Add the supporting Ant script

In the Files tab of the IDE, or otherwise, navigate to the "nbproject" subdirectory. (Amongst other files, there will be a file there called build-impl.xml that contains the logic of the compile and build processes invoked by the IDE.) Add to this subdirectory the file build-antlr-impl.xml from the corresponding location in the downloaded project. This new file provides the extra logic for builds that include ANTLR. You don't need to edit any of these files.

Add three lines to your build.xml

In the Files tab of the IDE, navigate to the project directory and open the file build.xml. This file controls the build, but by default it delegates every action to nbproject/build-impl.xml. The file begins like this:
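The original listing is not reproduced here. As a rough reconstruction only, the opening of build.xml and the three added lines probably look something like the sketch below. It assumes that NetBeans named the Ant project "Poly_Diff", that the IDE-generated comments have been left out, and that the added lines hook the targets -antlr-pre-init and -do-antlr (both defined in build-antlr-impl.xml) into the standard NetBeans extension targets -pre-init and -pre-compile. The build.xml in the downloaded project remains the authoritative version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project name="Poly_Diff" default="default" basedir=".">
    <description>Builds, tests, and runs the project Poly Diff.</description>
    <import file="nbproject/build-impl.xml"/>

    <!-- Assumed form of the three added lines. -->
    <!-- 1. Make the ANTLR targets available to this build. -->
    <import file="nbproject/build-antlr-impl.xml"/>
    <!-- 2. Set ANTLR-related properties before the IDE's properties files are read. -->
    <target name="-pre-init" depends="-antlr-pre-init"/>
    <!-- 3. Generate Java from the grammars before javac runs. -->
    <target name="-pre-compile" depends="-do-antlr"/>
</project>
```

In this sketch, -pre-init and -pre-compile are the empty hook targets that build-impl.xml provides for customisation, so overriding them in build.xml leaves the generated build logic untouched.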

The three lines are short enough to type, but for the sake of accuracy, copy them from this page or from the build.xml file in the downloaded project. In the downloaded project the addition runs to more than three lines because of the explanatory comments.

Clean and Build

You could simply clean and build the application now, but it is instructive to see what the first couple of steps in the build process achieve. Give build.xml focus in the editor. The Navigator panel should show a series of Ant targets. Right-click the "clean" target and select Run Target. This just deletes any build directories for a clean start.

Right-click the "antlr" target and select Run Target. This runs the ANTLR compiler to generate the Java code for the Lexer, Parser, and other modules. Two important changes will have occurred:

A new source root has appeared in the Projects tab under Poly Diff, called "Generated Sources (antlr-output)". Expand this root and its packages to find the Java source files that ANTLR generated.

In the editor pane for Main.java, the red ink should be entirely gone. The Parser and Lexer classes that Main references are now known to be defined. If you were to try some simple modifications, you would find that the code completion mechanisms know what methods and fields these classes have.

From the IDE menus, select Run>>Build Main Project (Poly Diff). In the Output window, you will see the build proceed. The ANTLR compiler is called again, but recognises that it has already generated the sources (assuming you have not actually edited any of them).

The Java compiler will probably warn you about the unchecked use of container types. This is expected behaviour stemming from the generated code, which does not use generics in a manifestly type-safe way. There seems not to be a way to suppress the warnings in their particular locations. For this reason, the IDE may mark certain source packages as containing errors, even though it cannot tell you in what file the problem is. (Observed in 6.9.1, but seems fixed in 7.0.)

Run

You are now almost ready to run the application. You simply need to arrange some input for it to work on. Create a text file in the project root directory called "input", with the following content:

x^3 + 6x^2 + 12x + 8

In the Run category of the Project Properties dialog, check that the Main Class is "mypackage.poly.diff.Main". In the Arguments field type "input" (which is the filename). Now, run the application from the IDE. You may have to dismiss several warnings that say you are running code that compiled with errors. The result of processing the polynomial should then appear in the Output window.

With this build configuration in place:

The Javadoc you generate includes the classes generated from the grammars. Of course, there is no explanatory text unless you can include it somehow via the grammar files.

Code completion understands the generated source (after building target "antlr" and until the next "clean").

The ANTLR library JAR file is included in the distributable output (the IDE build dist subdirectory).

Under the Covers

The main point of the approach described in this tutorial is that you don't have to understand Ant to apply it. However, perhaps you will need to change options on the call to ANTLR, or need to understand how it works. If you don't want to download the example project, the listing below can serve as the source for your own project's supplementary build file.

The supplementary Ant script

<?xml version="1.0" encoding="UTF-8"?>
<!--
====================================================
Extension of the NetBeans Java SE Ant build to ANTLR
====================================================
-->
<project name="build-antlr-impl">

    <!--Target to call when just the ANTLR output is needed.-->
    <target name="antlr" depends="init,-do-antlr"
            description="Process the grammar files with ANTLR." />

    <!--Property definitions here will precede reading various properties
        files, and therefore take precedence.
    -->
    <target name="-antlr-pre-init">
        <!--Cause tools (javac, javadoc) to include generated sources.
            do.depend comes from file nbproject/private/private.properties.
            Maybe it is owned by a property sheet somewhere in the IDE,
            and we ought not to override it, but where?
        -->
        <property name="do.depend" value="true"/>
    </target>

    <!--Execute the ANTLR processing of the grammar directories. This results
        in generated code in "${build.generated.sources.dir}/antlr-output".
        Token files are written to that exact directory. Java files are
        written to package folders below that root, according to the
        location of the .g file below "${src.grammar.dir}". This location
        comes from a property set by the IDE. The arrangement of .g files
        must correspond to the Java package statements they contain.
    -->
    <target name="-do-antlr">
        <!--Where the grammar files actually reside. (This can be the root of
            a tree structured according to *destination* Java packages.)
            src.grammar.dir is set in the IDE's project properties file as a
            result of naming the grammar sub-directory as a source.
        -->
        <property name="antlr.src.dir" location="${src.grammar.dir}"/>

        <!--Destination for generated Java files.-->
        <property name="antlr.generated.dir"
                  location="${build.generated.sources.dir}/antlr-output"/>
        <mkdir dir="${antlr.generated.dir}"/>

        <!--Compose file list to pass to ANTLR.-->
        <!--Method here to deal with paths that contain spaces. Credit to
            stackoverflow.com question 2148390-->
        <pathconvert property="antlr.src.list.0" pathsep="' '">
            <!--Make a list of all the .g grammar files in the tree.-->
            <fileset dir="${antlr.src.dir}">
                <include name="**/*.g" />
            </fileset>
            <!--Trim the names to specifications relative to the grammar base
                directory.-->
            <mapper type="glob"
                    from="${antlr.src.dir}${file.separator}*.g"
                    to="*.g" />
        </pathconvert>
        <!--Last bit of dealing with paths that contain spaces-->
        <property name="antlr.src.list" value="'${antlr.src.list.0}'"/>

        <echo>ANTLR will translate ${antlr.src.list}</echo>
        <echo>working relative to ${antlr.src.dir}</echo>
        <echo>and generate files in ${antlr.generated.dir}</echo>
        <!--
        <echoproperties prefix="build" />
        <echoproperties prefix="javac" />
        -->

        <!--Implementation using the ANTLR3 task does not accept multiple
            source files. So use the java task. When grammar files are
            identified by relative paths, the ANTLR Tool produces corresponding
            package-structured output. In this call, ANTLR runs with the
            grammar base directory as the current directory.
        -->
        <java classname="org.antlr.Tool" fork="true" dir="${antlr.src.dir}"
              failonerror="true">
            <arg value="-verbose"/>
            <arg value="-report"/>
            <arg value="-make"/>
            <arg value="-o"/>
            <arg path="${antlr.generated.dir}"/>
            <arg value="-lib"/>
            <arg path="${antlr.generated.dir}"/>
            <arg line="${antlr.src.list}"/>
            <classpath>
                <!--IDE will have included ANTLR jar here-->
                <pathelement path="${javac.classpath}"/>
            </classpath>
            <jvmarg value="-Xmx512M"/>
        </java>
    </target>
</project>
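
If you do need to change the options passed to ANTLR, the java element near the end of the script is the place to do it. As one possible variation (a sketch, not part of the downloaded project), the fragment below drops the -verbose and -report arguments for quieter output and makes the heap size adjustable. The property name antlr.max.heap is hypothetical. Because Ant properties keep the first value they are given, a value set earlier, for example in nbproject/project.properties, would override the default shown here.

```xml
<!-- Sketch of an alternative to the <java> call inside the -do-antlr target. -->
<!-- Default heap size; "antlr.max.heap" is a hypothetical property name. -->
<property name="antlr.max.heap" value="512M"/>
<java classname="org.antlr.Tool" fork="true" dir="${antlr.src.dir}"
      failonerror="true">
    <!-- -verbose and -report are omitted here to reduce console output. -->
    <arg value="-make"/>
    <arg value="-o"/>
    <arg path="${antlr.generated.dir}"/>
    <arg value="-lib"/>
    <arg path="${antlr.generated.dir}"/>
    <arg line="${antlr.src.list}"/>
    <classpath>
        <pathelement path="${javac.classpath}"/>
    </classpath>
    <jvmarg value="-Xmx${antlr.max.heap}"/>
</java>
```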

Early versions of the script encountered problems with file paths containing space characters. The current version of the script overcomes that. It makes use of Ant features that insulate you from differences between platforms (principally the path and file separator characters). However, it has only been tested on a Windows 7 system.
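
To see what the space-handling part of the script produces without running ANTLR at all, the same pathconvert technique can be tried in isolation. The stand-alone build file below is a sketch for experimentation only: the file name list-grammars.xml, the project and target names, and the "demo." properties are all hypothetical, and it assumes it is saved in the project root next to the grammar directory.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical stand-alone file, e.g. list-grammars.xml in the project root.
     Run with: ant -f list-grammars.xml -->
<project name="list-grammars" default="list" basedir=".">
    <target name="list">
        <!-- Absolute location of the grammar tree (assumed to be ./grammar). -->
        <property name="demo.src.dir" location="grammar"/>
        <!-- Join all .g files with "' '" so that each name can be quoted. -->
        <pathconvert property="demo.list.0" pathsep="' '">
            <fileset dir="${demo.src.dir}">
                <include name="**/*.g"/>
            </fileset>
            <!-- Strip the absolute prefix, leaving paths relative to the tree. -->
            <mapper type="glob"
                    from="${demo.src.dir}${file.separator}*.g"
                    to="*.g"/>
        </pathconvert>
        <!-- Add the opening and closing quotes around the whole list. -->
        <property name="demo.list" value="'${demo.list.0}'"/>
        <echo>${demo.list}</echo>
    </target>
</project>
```

The echoed value should be a space-separated list of relative grammar paths, each wrapped in single quotes, which is the form the main script hands to ANTLR through the arg element's line attribute.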

Dependency Resolution

The grammars in this example refer to each other, so there is a dependency order that, in other examples available on the web, has to be specified explicitly in the Ant build. In this implementation, all the grammar source files are submitted together and ANTLR seems to be able to work out the dependency order. Submitting multiple files at once is the main reason why the Ant antlr task (see the ANTLR site) is not used: it takes only one file at a time.
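
For contrast, the explicit ordering that this approach avoids might look roughly like the fragment below, with one ANTLR invocation per stage inside the -do-antlr target. This is an illustration only, and it rests on an assumption about the example: that Poly.g defines the token vocabulary that the other three grammars use, so it must be processed first. The property names are those of the script above.

```xml
<!-- Illustration only: explicit two-stage ordering instead of one combined call. -->

<!-- Stage 1: the grammar assumed to define the shared token vocabulary. -->
<java classname="org.antlr.Tool" fork="true" dir="${antlr.src.dir}"
      failonerror="true">
    <arg value="-o"/>
    <arg path="${antlr.generated.dir}"/>
    <arg value="-lib"/>
    <arg path="${antlr.generated.dir}"/>
    <arg value="mypackage/poly/Poly.g"/>
    <classpath>
        <pathelement path="${javac.classpath}"/>
    </classpath>
</java>

<!-- Stage 2: the grammars assumed to depend on the tokens from stage 1. -->
<java classname="org.antlr.Tool" fork="true" dir="${antlr.src.dir}"
      failonerror="true">
    <arg value="-o"/>
    <arg path="${antlr.generated.dir}"/>
    <arg value="-lib"/>
    <arg path="${antlr.generated.dir}"/>
    <arg value="mypackage/poly/PolyPrinter.g"/>
    <arg value="mypackage/poly/diff/PolyDifferentiator.g"/>
    <arg value="mypackage/poly/simple/Simplifier.g"/>
    <classpath>
        <pathelement path="${javac.classpath}"/>
    </classpath>
</java>
```

Every new grammar would have to be added to the right stage by hand, which is exactly the project-specific maintenance that submitting all files in one call with -make is meant to remove.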