Mathematics, Microcode, Manufacturing, and Musings

An Incremental Approach to Compiler Construction

The first ‘real’ language I tried learning was C. I had written extensively in BASIC throughout middle school, but never felt satisfied. It was too far removed from the hardware, too distant from the ‘true’ nature of the computer. C, on the other hand, seemed like a backdoor left open by some inattentive operator, with direct access to the control room.

I purchased a copy of “C – The Complete Reference” at a local bookstore, my first technical manual.

I believed the tome to be an ancient wizard’s codex containing all the knowledge needed to master C programming, and by extension the computer. I was unprepared for the depth of technical background and vocabulary required to put the book to use. The arcane, almost alien, language consisted of symbols jumbled in random order, indirect references to objects of unknown identity lost in translation to the English language.

Simply put, “C – The Complete Reference” was not a good introductory programming guide for middle school students.

I turned instead to Pascal, the ‘native’ language for my new computer, a Macintosh 512K, aka the “Fat Mac.” Apple’s Toolbox API, which lived in the ROM itself, used Pascal calling conventions. It would be some years before I returned to C, this time with little doubt I could master it.

Compiler Construction

As a first stab at compiler construction, let’s try An Incremental Approach to Compiler Construction by Abdulaziz Ghuloum. He describes the “back-to-front” technique of compiler construction. I was introduced to this technique by my coworker Jonathan Sobel, also from IU. Jonathan says it’s a valuable technique because you “always have a working compiler,” so if you make a mistake you can just back up a step.

Roughly, the paper suggests writing small programs and using an existing compiler’s back-end to generate code for them. By examining the assembly it emits and abstracting over the parts that vary, you can write a generator that produces the same code yourself. Repeating this process for each of the semantic concepts in your language allows you to build up a collection of code-generation templates.

Let’s see if we can extend this technique to dynamic compilation!

Fixnum Compilation in C

Total functional languages are semantically very close to pure functional Haskell, and I’m partial (heh!) to its syntax and runtime, so I’ll be using Haskell as the implementation language. I’m writing this on a Macintosh with an x86-64 CPU, using clang instead of gcc, but after reading the paper I don’t foresee any roadblocks. I created a file “fixnum.c” containing the “return 42” example and compiled it with clang to see the assembly it generates.

In the Haskell driver, we put some test expressions in a list and then map the compile function over the list. The unlines function just glues the resulting strings together with newlines. I prepend the label “_main” to the assembly so clang will accept it.
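The original listing is not reproduced here, so the following is only a minimal sketch of what such a driver might look like. The Expr type, the names testExprs and program, the test values, and the .globl directive are all assumptions made for illustration, not the post's actual code.

```haskell
module Main where

import Data.List (intercalate)

-- Hypothetical source language: so far it contains only fixnum literals.
newtype Expr = Fixnum Int

-- Code-generation template for a fixnum: load the constant into %eax
-- (where a C function leaves its int result on x86-64), then return.
compile :: Expr -> String
compile (Fixnum n) =
  intercalate "\n"
    [ "    movl $" ++ show n ++ ", %eax"
    , "    ret"
    ]

-- Illustrative test expressions (assumed, not taken from the post).
testExprs :: [Expr]
testExprs = [Fixnum 42, Fixnum 7]

-- Prepend the "_main" label, then glue the compiled fragments together
-- with newlines. The .globl line is an assumption: it exports the label
-- so the linker can find the entry point.
program :: [Expr] -> String
program es = "    .globl _main\n_main:\n" ++ unlines (map compile es)

main :: IO ()
main = putStr (program testExprs)
```

Assuming the file is saved as Fixnum.hs (a hypothetical name), something like "runghc Fixnum.hs > out.s" followed by "clang out.s -o out" should produce an executable on an Intel Mac. Because every fragment in this sketch ends in ret, only the first expression's code actually runs, so the exit status should reflect the first test value.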

Next Steps

We now have the framework for a native code generating compiler written in Haskell. In the next post I’ll go through some initial dynamic-compilation concepts, then we’ll get back to the paper to add more language constructs.