Oracle Blog


Generating Java byte code by building AST trees

There are several ways to generate byte code that runs on the JVM. You can
write a .java file and compile it with javac, write assembly-like code
directly and assemble it with tools like JASML, or, with tools like
BCEL, even generate your own classes at runtime. Aside from these, a rather interesting approach is to construct AST nodes representing the structure of the Java code and then generate byte code from them. In fact, this is essentially what javac itself does.

When javac compiles .java files into .class files, it goes through a two-step process.
First is parsing: javac reads in the source code, parses it, and builds a tree structure representing it.
Second is code generation: the back end takes the tree, works on it, and produces the .class file. (Strictly speaking, semantic analysis such as attribution and flow checking sits between the two, but for our purposes it belongs to the back end.)
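To see the first step in isolation, the sketch below uses javac's public Compiler Tree API (the com.sun.source packages, which need a JDK rather than a plain JRE; Java 11 or later is assumed for List.of and String.repeat). It runs only the parse phase on an in-memory class that prints "Hello!" and lists the kind of every node in the resulting tree, indented by depth. StringSource and ParseDemo are hypothetical names; the tree objects returned are javac's own internal nodes seen through read-only public interfaces.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// In-memory source file, so no Test.java is needed on disk (hypothetical helper).
class StringSource extends SimpleJavaFileObject {
    private final String code;

    StringSource(String className, String code) {
        super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
        this.code = code;
    }

    @Override
    public CharSequence getCharContent(boolean ignoreEncodingErrors) {
        return code;
    }
}

class ParseDemo {
    static final String SOURCE =
        "public class Test {\n"
        + "    public static void main(String[] args) {\n"
        + "        System.out.println(\"Hello!\");\n"
        + "    }\n"
        + "}\n";

    // Runs only javac's parse phase and records each node kind, indented by depth.
    static List<String> treeShape(String source) {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler().getTask(
            null, null, null, null, null, List.of(new StringSource("Test", source)));
        List<String> lines = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Integer>() {
                    @Override
                    public Void scan(Tree node, Integer depth) {
                        if (node == null) return null;
                        lines.add("  ".repeat(depth) + node.getKind());
                        return super.scan(node, depth + 1); // children get depth + 1
                    }
                }.scan(unit, 0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        treeShape(SOURCE).forEach(System.out::println);
    }
}
```

The listing starts with COMPILATION_UNIT and contains nested kinds such as CLASS, METHOD and METHOD_INVOCATION, which is the kind of tree the code generator later consumes.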

These two steps are quite independent of each other, which makes it
possible to replace either of them without affecting the other.
So, to achieve our goal, we can create a tree ourselves and hand it to the code generator.
This is actually not a difficult task: javac is very decently
implemented, with a clear separation between the two steps.
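The same separation is visible from the outside. With javac's public JavacTask API (a JDK with Java 11 or later is assumed), parse() runs only the front end, and generate() then analyzes the already-parsed trees (it calls analyze() itself if needed) and writes the class files. The sketch below calls them as two separate steps on an in-memory source; StringSource and StagedCompile are hypothetical names. This public API offers no supported way to swap in a tree of your own between the two calls.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// In-memory source file (hypothetical helper).
class StringSource extends SimpleJavaFileObject {
    private final String code;

    StringSource(String className, String code) {
        super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
        this.code = code;
    }

    @Override
    public CharSequence getCharContent(boolean ignoreEncodingErrors) {
        return code;
    }
}

class StagedCompile {
    static final String SOURCE =
        "public class Test { public static void main(String[] a) { System.out.println(\"Hello!\"); } }";

    // Runs the front end and the back end as two separate calls on one task.
    static List<String> compileInStages(String source, Path outDir) {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler().getTask(
            null, null, null, List.of("-d", outDir.toString()), null,
            List.of(new StringSource("Test", source)));
        try {
            task.parse();                                   // step 1: text -> trees
            List<String> written = new ArrayList<>();
            for (JavaFileObject out : task.generate()) {    // step 2: trees -> .class files
                written.add(out.toUri().getPath());
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("staged");
        compileInStages(SOURCE, out).forEach(System.out::println);
    }
}
```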

The javac source is located in the OpenJDK langtools repository, which
hosts a series of tools such as javadoc, javah, etc. Go to this
link for more details about
langtools. If you have Mercurial installed, check out the code from
http://hg.openjdk.java.net/jdk7/jdk7/langtools; if not, you can
download an archived copy from this link as well. After you get the
source, try to build and run it to make sure it works properly. Refer
to this link for
how to do this.

Once you have the code, let's try to build a very simple javac tree for this file, Test.java.
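Based on the result described further down ("Hello!" is printed when the generated class runs), Test.java is presumably a class along these lines. This is a reconstruction under that assumption, not the original listing.

```java
// Assumed contents of Test.java: a single class whose main method prints "Hello!".
class Test {
    public static void main(String[] args) {
        System.out.println("Hello!");
    }
}
```

Compiled normally with javac Test.java and run with java Test, it prints Hello!, which is the behavior the hand-built tree has to reproduce.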

The parameter names are fairly self-explanatory, and most of
them are not really needed here, such as the third one, the type
parameters. We are not using generics, so just leave it as an empty
list.

Two things are worth noting here.

First, the List here is not java.util.List; it is an instance of com.sun.tools.javac.util.List.

Second, the name parameter is of a special class,
com.sun.tools.javac.util.Name, which the parser uses to store identifier
strings. Refer to the attached source file for how to construct it.
For now, just think of it as a String holding the name of the class.
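Both utility types can be pictured with small stand-ins. In the JDK 7 sources, com.sun.tools.javac.util.List does implement the java.util.List interface, but it is an immutable, singly linked list (a head element plus a tail, ending in a shared empty list) built with factories such as List.nil(), List.of(...) and prepend(...). A Name comes from a per-compilation table (in JDK 7, the Names class, e.g. names.fromString("Test")) that hands back the same object for the same string, which is why javac code can compare names with ==. The classes ConsList, NameTable and UtilSketch below are hypothetical sketches of those two ideas, not javac's implementations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for the shape of javac's List: immutable and singly linked.
final class ConsList<A> {
    private static final ConsList<?> NIL = new ConsList<>(null, null);

    final A head;           // first element (unused in the empty list)
    final ConsList<A> tail; // remaining elements; null marks the empty list

    private ConsList(A head, ConsList<A> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    static <A> ConsList<A> nil() { return (ConsList<A>) NIL; }

    @SafeVarargs
    static <A> ConsList<A> of(A... xs) {
        ConsList<A> result = nil();
        for (int i = xs.length - 1; i >= 0; i--) result = result.prepend(xs[i]);
        return result;
    }

    boolean isEmpty() { return tail == null; }

    // O(1): the new list shares 'this' as its tail; 'this' itself is unchanged.
    ConsList<A> prepend(A x) { return new ConsList<>(x, this); }

    @Override
    public String toString() {
        StringJoiner out = new StringJoiner(", ", "[", "]");
        for (ConsList<A> l = this; !l.isEmpty(); l = l.tail) out.add(String.valueOf(l.head));
        return out.toString();
    }
}

// Hypothetical interning table: equal strings map to one shared Name object.
final class NameTable {
    static final class Name {
        private final String text;
        private Name(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    private final Map<String, Name> names = new HashMap<>();

    Name fromString(String s) { return names.computeIfAbsent(s, Name::new); }
}

class UtilSketch {
    public static void main(String[] args) {
        ConsList<String> defs = ConsList.of("field", "method");
        ConsList<String> more = defs.prepend("constructor");
        System.out.println(defs + " " + more); // defs itself is not modified

        NameTable table = new NameTable();
        System.out.println(table.fromString("Test") == table.fromString("Test"));
    }
}
```

Prepending is cheap and the result can safely share structure with the original, which suits code that builds many small lists of tree nodes; in a structure like this, appending has to copy the list instead.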

So, as you can see, it is pretty easy to build a tree for our simple
class. I won't dwell on how to make the rest of the tree nodes; refer
to DummyTreeMaker.java, which creates the tree matching Test.java, for details.
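To give a feel for the bottom-up style of building such a tree, here is a self-contained miniature. The record types are hypothetical and only named after some of TreeMaker's factory methods (ClassDef, MethodDef, Exec, Apply, Select, Ident, Literal); unlike javac's nodes they carry no positions, symbols or types, and modifiers and parameters are plain strings. A render method prints the tree back as source so its shape can be checked. Java 16 or later is assumed for records.

```java
import java.util.List;

// Hypothetical, heavily simplified AST node.
interface Node {
    String render(String indent);
}

record Ident(String name) implements Node {
    public String render(String indent) { return name; }
}

record Select(Node selected, String name) implements Node {
    public String render(String indent) { return selected.render(indent) + "." + name; }
}

record Literal(String value) implements Node {
    // String literals only, without escaping; enough for "Hello!".
    public String render(String indent) { return "\"" + value + "\""; }
}

record Apply(Node method, List<Node> args) implements Node {
    public String render(String indent) {
        StringBuilder sb = new StringBuilder(method.render(indent)).append('(');
        for (int i = 0; i < args.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(args.get(i).render(indent));
        }
        return sb.append(')').toString();
    }
}

record Exec(Node expr) implements Node {
    public String render(String indent) { return indent + expr.render(indent) + ";\n"; }
}

record MethodDef(String mods, String resType, String name, String params, List<Node> body)
        implements Node {
    public String render(String indent) {
        StringBuilder sb = new StringBuilder(
            indent + mods + " " + resType + " " + name + "(" + params + ") {\n");
        for (Node stat : body) sb.append(stat.render(indent + "    "));
        return sb.append(indent).append("}\n").toString();
    }
}

record ClassDef(String mods, String name, List<Node> defs) implements Node {
    public String render(String indent) {
        StringBuilder sb = new StringBuilder(indent + mods + " class " + name + " {\n");
        for (Node def : defs) sb.append(def.render(indent + "    "));
        return sb.append(indent).append("}\n").toString();
    }
}

class MiniTreeMaker {
    // Builds the tree for Test.java bottom-up: innermost expression first, class last.
    static ClassDef buildTest() {
        Node println = new Select(new Select(new Ident("System"), "out"), "println");
        Node call = new Apply(println, List.of(new Literal("Hello!")));
        MethodDef main = new MethodDef("public static", "void", "main", "String[] args",
                                       List.of(new Exec(call)));
        return new ClassDef("public", "Test", List.of(main));
    }

    public static void main(String[] args) {
        System.out.print(buildTest().render(""));
    }
}
```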

After building the tree, the next step is to get the code generator to generate code from it.

However, javac is not designed to let you do this, and there is no easy way to achieve it without some inelegant hacking.

What we want is to add an -XDxtest=true option so that when you invoke javac on any file, it still checks that the file exists but does not try to parse
it. Instead, it takes the tree we built, treats it as the output of the
parser, hands it to the code generator, and then writes the class file to
disk.
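javac's hidden -XD option keeps this change small: as far as I know, -XDkey=value simply stores key mapped to value in javac's internal options table (a bare -XDkey stores the key as its own value), so no new command-line option has to be registered. The sketch below models the two pieces of the hack in plain Java: collecting -XD entries, and a parse step that still insists the source file exists but returns a prebuilt tree instead of parsing when xtest is true. XTestSwitch, its methods and the generic tree type T are hypothetical; the real change is in JavaCompiler.java and works on javac's own tree classes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical model of the -XDxtest=true hack; not javac's code.
class XTestSwitch {
    // Collects "-XDkey=value" as key -> value and a bare "-XDkey" as key -> key.
    static Map<String, String> hiddenOptions(List<String> args) {
        Map<String, String> options = new HashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("-XD")) continue;
            String s = arg.substring(3);
            int eq = s.indexOf('=');
            options.put(eq < 0 ? s : s.substring(0, eq), eq < 0 ? s : s.substring(eq + 1));
        }
        return options;
    }

    // Stand-in for the parse step: the file must exist, but with xtest=true its
    // contents are ignored and the prebuilt tree is handed on instead.
    static <T> T treeFor(Path source, Map<String, String> options,
                         Function<String, T> parser, Supplier<T> prebuilt) {
        try {
            if (!Files.exists(source)) throw new NoSuchFileException(source.toString());
            if ("true".equals(options.get("xtest"))) return prebuilt.get();
            return parser.apply(Files.readString(source));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path empty = Files.createTempFile("Test", ".java");
        Map<String, String> options = hiddenOptions(List.of("-XDxtest=true", empty.toString()));
        System.out.println(treeFor(empty, options, src -> "tree parsed from source", () -> "prebuilt tree"));
    }
}
```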

Then build the workspace, create an empty file called Test.java, and compile it with the javac you just built:

javac -XDxtest=true Test.java

Run the generated class (java Test) and you'll see "Hello!" printed on the screen.
So javac takes an empty Java file but uses our tree to generate the code.

This is a very simple example, but it is essentially what javac's parser does with your source file. The real process is a little more involved, because the parser also has to record line and position information, process javadoc comments, report errors, and so on.

Now imagine we have a grammar, such as the one in the Java Language Specification, and we embed Java code in it that calls the appropriate TreeMaker methods to build the corresponding AST nodes as the grammar recognizes each construct. We then have an automatically generated parser doing the same job as javac's hand-written one. This is how the Compiler Grammar project works: with the help of ANTLR, a parser generated from the grammar builds the same kind of AST as javac does. Refer to my previous post on how to build and run the Compiler Grammar project. More interestingly, once we have this grammar, it can be used for more than building trees: code formatting, code translation and so on all become possible, with a little imagination :)
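The idea of embedding tree-building calls in grammar rules can be shown without ANTLR. The sketch below is a hand-written recursive-descent parser for a tiny, hypothetical arithmetic grammar (given in the comments), not ANTLR output and not the Compiler Grammar project. Each rule's action calls a small factory, Maker, standing in for TreeMaker, to build nodes as constructs are recognized; format is a second consumer of the same tree that re-prints it with explicit parentheses, a toy version of the code-formatting use. Java 16 or later is assumed for records and pattern matching.

```java
// Hypothetical node types for the toy grammar.
interface Expr {}
record Num(int value) implements Expr {}
record Binary(char op, Expr left, Expr right) implements Expr {}

// Plays the role TreeMaker plays for javac: one factory method per node kind.
final class Maker {
    Expr literal(int value) { return new Num(value); }
    Expr binary(char op, Expr left, Expr right) { return new Binary(op, left, right); }
}

// One method per grammar rule:
//   expr := term (('+' | '-') term)*
//   term := atom (('*' | '/') atom)*
//   atom := NUMBER | '(' expr ')'
final class ExprParser {
    private final String src;
    private final Maker make = new Maker();
    private int pos;

    ExprParser(String text) { this.src = text.replace(" ", ""); }

    Expr parse() {
        Expr e = expr();
        if (pos != src.length()) throw new IllegalArgumentException("unexpected input at " + pos);
        return e;
    }

    private Expr expr() {
        Expr e = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            e = make.binary(op, e, term()); // rule action: build a node, left-associative
        }
        return e;
    }

    private Expr term() {
        Expr e = atom();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            e = make.binary(op, e, atom());
        }
        return e;
    }

    private Expr atom() {
        if (peek() == '(') {
            pos++;
            Expr e = expr();
            if (peek() != ')') throw new IllegalArgumentException("')' expected at " + pos);
            pos++;
            return e;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + start);
        return make.literal(Integer.parseInt(src.substring(start, pos)));
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // A second use of the same tree: re-print it with every operation parenthesized.
    static String format(Expr e) {
        if (e instanceof Num n) return Integer.toString(n.value());
        Binary b = (Binary) e;
        return "(" + format(b.left()) + " " + b.op() + " " + format(b.right()) + ")";
    }

    public static void main(String[] args) {
        System.out.println(format(new ExprParser("1 + 2 * 3").parse())); // prints (1 + (2 * 3))
    }
}
```

Because precedence is encoded in which rule calls which, the tree comes out correctly nested without any extra bookkeeping; a generated parser does the same thing with the actions written inline in the grammar.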

Download the source files:

DummyTreeMaker.java: this needs to go under src/share/classes/com/sun/tools/javac/parser/DummyTreeMaker.java

JavaCompiler.java: replace the file of the same name located under src/com/sun/tools/javac/main with this one. This file may stop compiling in the future, as the langtools source code changes very often. If so, just locate the file and make the changes described above.

Nice to see an example of javac's data structures in DummyTreeMaker.java. Unfortunately, the type com.sun.tools.javac.util.Names is not available in the Java 6 tools.jar, and ParserFactory causes problems too. It would be nice if you updated your example. Best regards, Chris