A Java to x86 compiler written in F#

A little personal information about me, I’m currently working on a masters degree at the University of Washington. And a few weeks ago I completed taking a course inCompiler Construction, where the course project was to build a simple Java compiler that generated x86 executables.

While most students chose to write their compilers in Java or C#, my partner and I wrote ours in F#. Though I won’t be publishing the source code - since it would be too tempting for other students using Appel’s Modern Compiler Implementation in Java – I would like to point out where F# makes the task of writing a compiler much easier than say, C#. F#’s superiority in this domain can largely be attributed to discriminated unions, as you will soon find out.

Abstract Syntax Tree, Parsers, and Lexer

The first phase of the compiler was perhaps the easiest to implement in F#. Rather than representing some complex object hierarchy and use polymorphism, the AST can be represented as a series of discriminated unions.

Once we were able to successfully lex and parse source code, we moved onto type checking and generating a symbol table. Like most compilers, we simply used the Visitor design pattern. Which if implemented in C# requires some funky notion of double dispatch. That is, the object being visited has a method called ‘visit’ or ‘accept’ which is given a parameter of type Visitor, then, the object being visited calls ‘visit’ on the visitor passing ‘this’ as parameter.

The problem with this is that it adds a lot of lame boiler plate code. For example, from the Wikipedia article each ‘part’ needs to implement the ‘accept’ method, which in turn calls the ‘visit’ method on the visitor. Via method overloading, the right visitor method will be called.

class Engine {

publicvoid accept(Visitor visitor){

visitor.visit(this);

}

}

class Body {

publicvoid accept(Visitor visitor){

visitor.visit(this);

}

}

In F# with discriminated unions we can write a single ‘walker’ for the discriminated union types, and keep all the visitation code in one place. (Rather than decorating each node.) This keeps the code much simpler.

Once the list of instructions has been generated, we then needed to build executables. To link them you could use ml.exe and cl.exe, Microsoft’s C/C++ compiler and linker. But since requiring a second step is lame, we embedded the files into .RESX files and exported them at runtime. While the F# compiler supports this easily, we actually did this part in C# so that we could take advantage of Visual Studio’s Managed Resource Designer.

///<summary>

/// Takes the given assembly file and compiles it using the hard-coded boot loader code.

///</summary>

publicstaticstring LinkAsmFile(string asmFile)

{

try

{

// First, dump ML and boot.obj to disk.

string tempPath = Path.GetTempPath();

string mlExeFile = Path.Combine(tempPath, "ml.exe"); ;

// Extract the binary resources embedded in this DLL and write to file.

Sounds good, but notice that to run our F# compiler you need F#MiniJava.exe (the program), MASMResources.dll (C/C++ linker), and FSharp.Core (F# libraries). That all sounds lame. Fortunately F# has two awesome command line flags that will essentially embed any required assembly references into your executable:

--standalone:

Statically link the F# library and all transitively referenced DLLs

that depend on the F# library into the assembly being generated.

Disables the generation of the F# interface and optimization

resources and renames all types beginning with Microsoft.FSharp,

making them private to the target assembly. There must be no

duplicate type names across any statically linked

assemblies, no code may use reflection on any of the

types in those assemblies, and no code may export F# library types

as part of an API.

--static-link <string>:

Statically the given assembly and all transitively referenced DLLs

that depend on this assembly.NOTE: use an assembly name e.g. mylib

, not a DLL name, e.g. mylib.dll.

(Note: many bad things can happen if you do this, from security vulnerabilities to subtle codegen errors. These flags may be removed in a future release of F# and use at your own risk.)

Having taken an undergraduate course in compilers I found the course material to be challenging, but when it came to the project implementation F# made it a breeze.

The –standalone flag is percisely for just that, allowing you to simply deploy an executable and have people be oblivious to the fact it was written in F#.

You may recall Visual J# required a seperate runtime component, which was a pain in the neck for a lot of people. Ideally, the F# core libraries (FSharp.Core.dll, FSharp.Compiler.dll) would be included in the next release of the .Net framework so everything ‘just works’.

If you do apply the –standalone flag however, be aware that you are giving up the ability to service your application. For example, if there is a security hole in System.dll, you can simply GAC the new one and all your applicaions are now patched. However, by baking everything into your app you ‘carry around’ those security holes with you.

But for simple apps you are sharing with your colleagues it does simplify things.

Excellent post. I’ve been playing with F# for a month now. Currently I’m implementing a compiler too in order to get the feel of functional programming. I was wondering how did you managed annotate the AST with semantic information, since DUs do not support seamless extension,in other words adding a new union element breaks existing code. Although it may be possible to compute these kind of information (e.g. symbol table for the current scope) on demand they don’t persist.

Maybe the decoration of the AST nodes is not a good idea at all, I’m just stuck with the common OO ideas.

The AST did not contain any additional information, for example line/column spans for the code where the element was declared.

In the implementation I went with I just had a seperate SymbolTable to lookup additional information. While I agree associating information with AST nodes directly is helpful, it erodes the simplicity that discriminated unions offer.