“Wait, why are you dancing around the F-word?”

I have clients rather than a boss and, as you probably know, offending a boss is different from offending a potential client.

I’m already offending with substance and don’t want to press my luck.

Because of one of my favorite movies of all time.

Do you see what happens?

Let’s describe this fabulous computational device. Start with a Harvard Architecture. Add an array of cells for the memory. And we’ll throw in a pointer to the current cell. For the instruction memory, let’s say we have eight instructions. Those instructions:

+ increments the value in the current cell.

- decrements the value in the current cell.

> increments the memory pointer by one cell.

< decrements the memory pointer by one cell.

. outputs the value of the current cell as a character.

, reads one byte of input and stores it in the current cell.

[ jumps to the matching ] if the value in the current cell is zero.

] jumps to the matching [ if the value in the current cell is not zero.

Do you see what happens, Larry?

Okay, we have a very simple language with a very simple instruction set and semantics that are easy enough for any programmer to understand. It’s no Nock, but it’ll do. Let’s take a first pass at implementing it in Limbo.

This is what happens.

A straightforward implementation would use eight instructions, so let’s start there. As a very minor optimization, we’ll add an EXIT instruction to stop the program. We’ll need to keep track of some state (the cells, the instruction pointer, and the memory pointer). Also, we need to…nope, that’s it. It’s a really simple language!

The little execute() function takes an array of integers representing our instructions, and an array of bytes representing the program’s memory. pc is the instruction pointer (“program counter”) and p is the memory pointer, if you couldn’t guess by the names. Then a big loop with a switch in it, and one case for each of our instructions, plus an EXIT instruction. Very nearly as simple as it gets; it’s almost embarrassing to do any explanation. One item of note is that we keep the jump addresses inline with the instruction stream. (It does have the single dispatch problem, where we frustrate any attempt at branch prediction, but I’m certain we can deal with a somewhat less performant implementation of +++++++++++[->++++++<]>.++++. for now.) How do we get there from ++++++++++[->++++++++++>+++++++++++<<]>++++.---.>++++.<., though?
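For readers who want something to poke at, here is a rough Python sketch of the loop described above. It is not the Limbo code from this post: the opcode numbering, the inp parameter, the returned output buffer, and the choice to store 0 on end of input are all assumptions made for illustration.

```python
# Opcode numbering is an assumption made for this sketch.
INC, DEC, INCP, DECP, PUT, GET, JZ, JNZ, EXIT = range(9)

def execute(code, mem, inp=b""):
    """Run a flat list of ints. JZ and JNZ are each followed inline by a target address."""
    out = bytearray()
    pc = 0  # instruction pointer ("program counter")
    p = 0   # memory pointer
    i = 0   # read position in inp (hypothetical stand-in for standard input)
    while True:
        op = code[pc]
        pc += 1
        if op == INC:
            mem[p] = (mem[p] + 1) & 0xFF   # cells are bytes, so wrap at 256
        elif op == DEC:
            mem[p] = (mem[p] - 1) & 0xFF
        elif op == INCP:
            p += 1
        elif op == DECP:
            p -= 1
        elif op == PUT:
            out.append(mem[p])
        elif op == GET:
            # Assumption: end of input stores 0 in the current cell.
            mem[p] = inp[i] if i < len(inp) else 0
            i += 1
        elif op == JZ:
            # pc now points at the inline address; take it or step over it.
            pc = code[pc] if mem[p] == 0 else pc + 1
        elif op == JNZ:
            pc = code[pc] if mem[p] != 0 else pc + 1
        elif op == EXIT:
            return bytes(out)

# Hand-assembled equivalent of +++[>++<-]>. which leaves 3 * 2 = 6 in the second cell.
demo = [INC, INC, INC, JZ, 12, INCP, INC, INC, DECP, DEC, JNZ, 5, INCP, PUT, EXIT]
result = execute(demo, bytearray(16))   # result == b"\x06"
```

The if/elif chain plays the role of the switch, and it has the same single dispatch point mentioned above.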

This is what happens, Larry.

Parsing a program is, in general, a non-trivial task. Know what’s nice about a language like ++++++++++[>++++++++++>>>+<<<<-]>[-<+>>+>+<<]>>>[<+<->>-]<[<<+>>-]<<<--.>++++.<-.++++++++.>----.[>>+>+<<<-]<---.>>+.<<---.++.>>>.+++++.>+.<-.<<<.-.>>++.? There’s nothing to parse. No tokenizer, no lexer, no AST. (Well, no real tokenizer or lexer.) A simple loop over the program that emits the appropriate instruction will do:

compile() takes a program as its argument and returns an array of instructions. As noted before, we store the addresses for the jumps inline. We also keep a stack of addresses (marks, borrowed from Forth). When we hit a [ (JZ), we leave the address field blank and store the address’s address on the stack to be resolved when we hit the matching ] (JNZ). (If you’re reading carefully, you’ll spot a couple of ridiculous bits in the branching, but we’ll fix them shortly.)
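The mark-stack trick can be sketched in Python as follows. Again, this is not the post's Limbo code: compile_bf, SIMPLE, and the opcode numbers are names made up for the sketch, and it stores final jump targets directly rather than reproducing whatever quirks the parenthetical above refers to.

```python
# Opcode numbering is an assumption made for this sketch.
INC, DEC, INCP, DECP, PUT, GET, JZ, JNZ, EXIT = range(9)

# The six commands that map one-to-one onto an instruction.
SIMPLE = {"+": INC, "-": DEC, ">": INCP, "<": DECP, ".": PUT, ",": GET}

def compile_bf(src):
    """Translate program text into a flat list of ints with inline jump addresses."""
    code = []
    marks = []  # stack of addresses of JZ address fields still waiting to be filled in
    for ch in src:
        if ch in SIMPLE:
            code.append(SIMPLE[ch])
        elif ch == "[":
            code.append(JZ)
            marks.append(len(code))  # the address of the (blank) address field
            code.append(0)           # placeholder, resolved at the matching ]
        elif ch == "]":
            if not marks:
                raise ValueError("unmatched ]")
            m = marks.pop()
            code.append(JNZ)
            code.append(m + 1)       # jump back to the first instruction of the loop body
            code[m] = len(code)      # resolve the JZ: first instruction after the loop
        # Any other character is treated as a comment and skipped.
    if marks:
        raise ValueError("unmatched [")
    code.append(EXIT)
    return code

cleared = compile_bf("+[-]")   # [INC, JZ, 6, DEC, JNZ, 3, EXIT]
```

In the example, address 6 is the EXIT after the loop and address 3 is the DEC inside it, so neither jump needs any arithmetic at run time.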

So, there’s a VM and a compiler for it, and we’ve hit exactly 70 lines of code, with no black art. It’s a trivial runtime for a trivial language, though. (But really, we can read standard input and write standard output and compute in the interim, so you shouldn’t need anything else, right? Right?) Now what happens?

This is what happens when you find a stranger in the Alps.

Let’s talk a little about AWIB. It’s quite an achievement: it’s a polyglot program in Tcl, C, bash, and BF. On top of that, it’s a compiler for BF. On top of that, it emits optimized code. On top of that, it has five backends for code generation: Linux/386 machine code (a statically linked ELF with no symbols), Tcl, Ruby, Go, and C.

So, given that it’s written in the language that it executes (among other languages), you can use it to compile itself to machine code. Pretty cool, right? But it doesn’t have a frontend that works under Inferno, unfortunately. (Sort of; there’s an implementation of Tcl, but it doesn’t run AWIB as-is.) But if we use our little VM to run AWIB, then we can use it to compile itself. That’s a bit less than satisfying, though.

We’ve got a small set of instructions, and the code to execute them in Limbo is trivial. Let’s revise the code a bit, and then just go crazy with it. First, we’ll replace INC, DEC, INCP, and DECP with two other instructions: ADD and ADDP, and we’ll have them take integer arguments the same way that JZ and JNZ do, then we’ll adjust compile() and execute() accordingly:

So, we do one character of “look ahead”, and generate somewhat more compact code (depending; if you want to complicate things, you could re-add the four old instructions and have it emit, e.g., INCP when ADDP 1 would be emitted by this code).

We’ve also fixed up JZ and JNZ somewhat so that we don’t need to do math on the address we load into pc when we jump.
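Here is a self-contained Python sketch of the revised scheme, for anyone who wants to run it. It is an approximation rather than the Limbo code: run_length and compile_bf are hypothetical names, the opcode numbers are arbitrary, comments are filtered out before compiling, and end of input is assumed to store 0.

```python
# Opcode numbering is an assumption made for this sketch.
ADD, ADDP, PUT, GET, JZ, JNZ, EXIT = range(7)

def run_length(src, i, plus, minus):
    """Sum a run of plus/minus characters starting at src[i], peeking one character ahead."""
    n = 0
    while True:
        n += 1 if src[i] == plus else -1
        if i + 1 < len(src) and src[i + 1] in (plus, minus):
            i += 1
        else:
            return n, i  # i is the index of the last character consumed

def compile_bf(text):
    src = [c for c in text if c in "+-<>.,[]"]  # drop comments so they cannot split a run
    code, marks = [], []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch in "+-":
            n, i = run_length(src, i, "+", "-")
            code += [ADD, n]
        elif ch in "><":
            n, i = run_length(src, i, ">", "<")
            code += [ADDP, n]
        elif ch == ".":
            code.append(PUT)
        elif ch == ",":
            code.append(GET)
        elif ch == "[":
            code += [JZ, 0]
            marks.append(len(code) - 1)  # address of the blank address field
        else:  # "]"
            if not marks:
                raise ValueError("unmatched ]")
            m = marks.pop()
            code += [JNZ, m + 1]  # back to the first instruction of the loop body
            code[m] = len(code)   # forward to the first instruction after the loop
        i += 1
    if marks:
        raise ValueError("unmatched [")
    code.append(EXIT)
    return code

def execute(code, mem, inp=b""):
    out = bytearray()
    pc = p = i = 0
    while True:
        op = code[pc]
        if op == ADD:
            mem[p] = (mem[p] + code[pc + 1]) & 0xFF
            pc += 2
        elif op == ADDP:
            p += code[pc + 1]
            pc += 2
        elif op == PUT:
            out.append(mem[p])
            pc += 1
        elif op == GET:
            mem[p] = inp[i] if i < len(inp) else 0  # assumption: end of input stores 0
            i += 1
            pc += 1
        elif op == JZ:
            pc = code[pc + 1] if mem[p] == 0 else pc + 2  # target is loaded as-is
        elif op == JNZ:
            pc = code[pc + 1] if mem[p] != 0 else pc + 2
        else:  # EXIT
            return bytes(out)

# Eleven increments, times six, gives 66 ("B"); four more gives 70 ("F").
program = "+" * 11 + "[->" + "+" * 6 + "<]>.++++."
output = execute(compile_bf(program), bytearray(8))   # output == b"BF"
```

A run such as +++>>- compiles to ADD 3, ADDP 2, ADD -1, which is where the more compact code comes from.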

Then, after copying that to an x86 or x86-64 Linux machine (by some means) and chmod’ing it 755 (or 700 if you are embarrassed), you can verify that we have indeed written a Limbo program that we used to create another Limbo program from a BF program and then used that program to produce a valid executable:

$ ./program
BF

Oddly enough, I couldn’t convince GNU’s objdump to disassemble it, despite trying some options that seemed like they ought to work and then a few dozen other permutations of objdump’s options. ndisasm had no trouble and only required a glance at the man page, so if you’d like to disassemble it and you have nasm, ndisasm -e 84 -a ought to work.