Note: CinC developer Nicola Mometto pointed out that the analyzer written by Ambrose and CinC are indeed different projects, which I should’ve noticed myself, since
the analyzer by Ambrose uses the analyzer from the original Clojure compiler, which is exposed as a function. Part of my mistake surely derived from the fact that one is called tools.analyzer.jvm and
the other one is called jvm.tools.analyzer.

Compilation process

One of the supposed advantages of Lisp-like languages is that the concrete syntax is already the abstract syntax. If you’ve read some of fogus’s writings about Clojure compilation, though, he has some opinions on that statement:

This is junk. Actual ASTs are adorned with a boatload of additional information like local binding information, accessible bindings, arity information, and many other useful tidbits.

Bytecode has no information about macros whatsoever; the emitted bytecode corresponds to what you see with macroexpand calls.
Since macros are expanded before analyzing, you shouldn’t expect to find anything about your macro in the compiled bytecode: nada, niet, gone.

Meaning, we shouldn’t expect to be able to properly decompile macro’ed stuff either.
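A quick way to see what the analyzer actually receives is to expand a macro by hand. This is only a small sketch using when from clojure.core; the symbols ready? and launch! and the name expanded are made up for illustration and are never resolved or called.

```clojure
;; `when` is a macro. Only its expansion reaches the analyzer,
;; so the expansion is all the bytecode can reflect.
(def expanded (macroexpand '(when ready? (launch!))))

(println expanded)
;; prints (if ready? (do (launch!)))
```

Nothing in the expanded form says it came from when, which is why a decompiler has nothing to recover the macro call from.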

Compile vs. Eval

As said in the first post, the class file doesn’t need to be on disk, and that’s better understood if we think about eval.

When you type a command in the REPL, it needs to be properly translated to bytecode before the JVM is able to execute it, but that doesn’t mean the compiler will save a class file, then load it, and only then execute it.

It will be done on the fly.
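The on-the-fly part can be observed from the REPL itself. The sketch below evals a small function and asks where its class came from; the name f is just for illustration, and the class name printed will differ from session to session.

```clojure
;; eval takes an already-read form, compiles it and runs it.
(def f (eval '(fn [x] (* x 2))))

(println (f 21))                       ; prints 42

;; The fn is backed by a real JVM class, defined in memory by
;; Clojure's own class loader rather than read from a .class file.
(println (class f))
(println (instance? clojure.lang.DynamicClassLoader
                    (.getClassLoader (class f))))
```

So there is a class, it just never touches the disk.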

We will consider three entry points for the compiler: compile, load and eval.
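To get a feel for how load and eval differ from the outside, here is a minimal sketch using load-string, one of the clojure.core functions that, as far as I can tell, ends up in the load entry point. The names from-eval and from-load are only for illustration.

```clojure
;; eval: one form, already read into data.
(def from-eval (eval (read-string "(* 2 21)")))

;; load: a stream of text. Every form is read, compiled and run in
;; order, and the value of the last one is returned.
(def from-load (load-string "(+ 1 2) (* 2 21)"))

(println from-eval from-load)          ; prints 42 42
```

Neither call leaves a class file behind; that is what sets compile apart.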

The LispReader is responsible for reading forms from an input stream.

Compile Entry Point

compile is a static method of the Compiler class, found in the Compiler.java file, and it does generate a class file on disk for each function in the compiled namespace.

For instance, it will get called if you do the following in your REPL:

(compile 'clojure.core.reducers)

Clojure function just wraps over the Java function doing the actual work with the signature

The reader

Languages with more complicated syntaxes separate the Lexer and the Parser into two different pieces. Like most Lisps, Clojure combines these two into just a Reader.

The reader is pretty much self-contained in LispReader.java, and its main responsibility is, given a stream, to return the properly tokenized s-expressions.

The reader dispatches reading to specialized functions and classes when a particular token is found. For instance, ( dispatches to the ListReader class, digits dispatch to the readNumber function, and so on.

Many of the collection reading classes (VectorReader, MapReader, ListReader, etc.) rely on the more generic readDelimitedList function, which receives the particular closing delimiter as a parameter.
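The effect of that dispatch is easy to see from Clojure, without touching LispReader.java. The following is just a sketch with read-string; the name forms is only for illustration.

```clojure
;; The first character of each form decides which reader handles it,
;; and therefore what kind of data structure comes back.
(def forms
  (map read-string ["(a b c)" "[a b c]" "{:a 1}" "42" "\"hi\""]))

(doseq [form forms]
  (println (pr-str form) "->" (.getSimpleName (class form))))
```

The three collection forms differ only in their delimiters, which fits with them sharing one generic delimited-list routine.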

The reader matters for debugging because it is responsible for reading line and column number information, and for establishing a relationship between the tokens read and their locations in the file.

One of the main drawbacks of the reader used by the compiler is that much of the line and column number information is lost. That’s one of the reasons we saw in our earlier post that, for a 7-line function, only one line was properly mapped: interestingly, the line corresponding to the outer s-expression.
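What location data the reader keeps can be checked by reading from a line-numbering stream and looking at the metadata of the result. This is a sketch under the assumption of a reasonably recent Clojure (column metadata only exists in newer versions); the names source and form are made up for illustration.

```clojure
(import '(clojure.lang LineNumberingPushbackReader)
        '(java.io StringReader))

(def source "(defn f [x]\n  (inc x))")

;; read only records locations when the stream tracks line numbers.
(def form (read (LineNumberingPushbackReader. (StringReader. source))))

(println (meta form))          ; location of the outer list
(println (meta (nth form 3)))  ; location of the nested (inc x) list
(println (meta (nth form 1)))  ; the symbol f carries no location
(println (meta (nth form 2)))  ; neither does the vector [x]
```

Lists come back tagged with where they start, but symbols and vectors do not, so a lot of positions are already missing before the analyzer sees anything.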

We will have to modify this reader if we want proper debugging information for our debugger.

The analyzer

The analyzer is the part of the compiler that translates your s-expressions into proper things to be emitted.

We’re already familiar with the REPL. In the eval function, analyze and emit are combined in a single step, but internally there’s a two-step process.

First, our parsed but meaningless code needs to be translated into meaningful expressions.
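To make "meaningful expressions" a bit more concrete, here is a toy analyzer. It is not the compiler's code and covers almost nothing; toy-analyze and the :op keywords are invented for this sketch. It only shows the idea of turning raw s-expressions into tagged nodes that a later emit step could walk.

```clojure
(defn toy-analyze
  "Turns a raw form into a nested map describing what it means."
  [form]
  (cond
    (symbol? form)
    {:op :symbol :name form}

    ;; a special form gets its own node type
    (and (seq? form) (= 'if (first form)))
    (let [[_ test then else] form]
      {:op   :if
       :test (toy-analyze test)
       :then (toy-analyze then)
       :else (toy-analyze else)})

    ;; any other list is treated as a function call
    (seq? form)
    {:op   :invoke
     :fn   (toy-analyze (first form))
     :args (mapv toy-analyze (rest form))}

    :else
    {:op :constant :value form}))

(println (toy-analyze '(if ready? (launch! 3) :wait)))
```

The real analyzer does the same kind of classification, only with many more cases and with each node carrying the extra information fogus mentions.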

In the case of the Clojure compiler all expressions implement the Expr interface:

Just notice that, whether the code is eval’ed or compiled, JVM bytecode will be generated.

What’s next

One of the reasons I ended up here when I started working on the debugger was to see if, by any means, I could add better line number references to the
current Clojure compiler.

As said before and as we saw here, the Java Clojure Compiler is not exactly built for extensibility.

The option I had left was to modify the line numbers and other debugging information at runtime, and that’s what I will show you in the next post.

I will properly synchronize Clojure source code with JVM bytecode, meaning I will synchronize code trees. That way I will not only add proper line references, but I will also know
which bytecode corresponds to which s-expression in your source.