a blog about doing what it says in the title

Hacker School, Weeks 4, 5 and 6

2013 July 14

by Richard Harrington

tl;dr It’s been a bit of a whirlwind these past few weeks, but I’m back on track with my Robotwar project.

Before the end of the 4th week of Hacker School, I went to Ottawa, Ontario for five days to perform in a comedy I wrote with my friend Chris Kauffman, at a theater festival there (see harringtonkauffman.com). It was a lot of fun and I was able to reconnect with some old friends from the Canadian fringe theater circuit, a part of my life that seems very far away this summer. But I had to spend more time preparing for it than I expected, since I hadn’t done the show in two years.

I returned on Monday, July 1, and the 4th of July was the following Thursday, so my 5th week of Hacker School ended up being only two days long.

Because of all this, I got a little off my stride in terms of the big project I was theoretically going to work on over the course of the Hacker School batch: the reverse engineering of the Robotwar game. I started to feel a bit adrift again, as I had in the beginning of June. I wasn’t sure I was appropriately equipped to thrive in the unstructured environment of Hacker School, and I was nervous that I wouldn’t accomplish a tenth of what I’d hoped to accomplish. By the end of the 5th week, I had only barely gotten started on Robotwar.

But late in the 6th week (which just ended), I got my second wind with the help of a long pairing session with one of the facilitators, Zach Allaun, who helped me tease apart the various steps in the compilation process for the Robotwar language, making each one more sensible.

The end result of the compilation step (which itself is only the first of many things I have to do to make a functioning game) will be to turn this:

And that begins with lexing. For those of you (like myself before this summer) who have never heard the term, "lexing" is short for "lexical analysis." It's the first step in translating human-written source code into something a machine can run, and it involves separating the source text into words and other meaningful chunks, called tokens. It can also be a tedious step to get right, because a lot of syntactic forms that look obvious to us are baffling to a computer.

This Robotwar language is actually pretty simple in that respect -- the only tricky part is determining whether a "-" character is a binary subtraction operator or a unary minus sign.

So here's the version of the lexer I had before I worked with Zach. It separates strings into tokens and also adds an index property to each token, which will be useful information later when we add error reporting to the program. Note the nasty minus sign logic, which checks to see whether the previous token is a numeric value (either a number literal or a register name).
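Stripped of everything else, the minus sign rule looks roughly like this. This is a Python sketch rather than the project's Clojure, and the names `REGISTERS`, `is_numeric_value` and `classify_minuses` are made up for illustration. The register set is only an assumed subset, and number literals are assumed to be plain integers.

```python
# Illustrative subset of register names (an assumption; the real list is longer).
REGISTERS = {"A", "B", "X", "Y", "AIM", "SHOT", "RADAR", "SPEEDX", "SPEEDY"}


def is_numeric_value(token):
    """True for a number literal (assumed all digits) or a register name."""
    return token.isdigit() or token in REGISTERS


def classify_minuses(tokens):
    """Return 'binary' or 'unary' for each "-" in a list of token strings."""
    kinds = []
    for i, token in enumerate(tokens):
        if token != "-":
            continue
        # Look at the token just before the "-", if there is one.
        prev = tokens[i - 1] if i > 0 else None
        if prev is not None and is_numeric_value(prev):
            kinds.append("binary")  # there is a value on the left to subtract from
        else:
            kinds.append("unary")   # nothing numeric on the left, so it negates
    return kinds


print(classify_minuses(["A", "-", "1", "TO", "A"]))  # ['binary']
print(classify_minuses(["-", "5", "TO", "AIM"]))     # ['unary']
print(classify_minuses(["A", "-", "-", "1"]))        # ['binary', 'unary']
```

The rule only needs the previous token, which is why it doesn't have to live inside the character-by-character lexing loop.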

Also, note the use of the core.match library to do pattern matching. Since it's not binding any variables in the matches, one could argue core.match is overkill here, but I feel it saves me from writing a series of suspiciously similar "if" expressions for each of the cases -- that is, if we're in the middle of parsing a token do one thing, otherwise do something else.

My original plan was a single parsing step after the lexing step in the above code, followed by a compiling step down to the Robotwar virtual machine code. But after this week I realized that I should have as many steps as I need to make it easy, and I also realized that there's no reason for this first step to have any logic in it at all. I ended up separating a couple of things out into their own functions: stripping out the comments (which happens before the lexing step now) and dealing with the ambiguity of the "-" character (which happens after the lexing step). The minus sign issue turns out to be pretty easy to resolve in the next step, the parsing step, once we've isolated all the operators and words in the lexing step.
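To give a feel for how small a separated-out step can be, here is a sketch of comment stripping in Python (again, not the project's Clojure). It assumes a comment runs from a ";" marker to the end of the line; the function name `strip_comments`, its `marker` parameter, and the sample program are hypothetical.

```python
def strip_comments(source, marker=";"):
    """Remove comments from every line of a program.

    Assumption: a comment starts at `marker` and runs to the end of the line.
    """
    stripped = []
    for line in source.splitlines():
        # Keep only the part before the first marker, then trim trailing spaces.
        code = line.split(marker, 1)[0]
        stripped.append(code.rstrip())
    return "\n".join(stripped)


program = "250 TO A ; set range\nA - 1 TO A"
print(strip_comments(program))
# 250 TO A
# A - 1 TO A
```

Because this only trims the tail of each line and keeps the line count the same, the positions of the tokens that remain don't move, so a later lexing step can still record accurate indexes for error reporting.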

Last but not least, I jettisoned core.match for the time being, since I realized that by getting rid of that minus sign nonsense, I was down to a level of simplicity where I could use regular expressions (it's possible I could have used them before, but I suspect not, and I wasn't quite enough of a regexp master to attempt that one). Here's the new code for the lexer, which is drastically shorter while only sacrificing a tiny bit of functionality that is better off elsewhere in the program anyway:
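As a rough illustration of how little is left once the lexer only has to split text, here is a regular-expression sketch in Python (not the project's Clojure). The token pattern is an assumption: integer literals, alphabetic words, and a handful of single-character operators. `TOKEN_RE` and `lex` are names invented for this example.

```python
import re

# Assumed token shapes: integers, alphabetic words, single-character operators.
# Characters that match none of these (such as spaces) are skipped.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|[-+*/=<>#]")


def lex(line):
    """Split one comment-free line into tokens, recording where each one starts."""
    return [
        {"text": match.group(), "index": match.start()}
        for match in TOKEN_RE.finditer(line)
    ]


tokens = lex("A-1 TO AIM")
print([t["text"] for t in tokens])   # ['A', '-', '1', 'TO', 'AIM']
print([t["index"] for t in tokens])  # [0, 1, 2, 4, 7]
```

Every "-" comes out as a plain token here, with no attempt to decide whether it is binary or unary. One limitation of this sketch is that `finditer` silently skips characters it doesn't recognize, so a real version would need to detect those in order to report errors.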

All that remains now is parsing, compiling to the virtual machine object code, writing an interpreter for it, and making a graphics engine to show robots fighting!

Now that I feel a little more grounded and solidly positioned to make some headway this coming week, I can look back over the last few weeks and see that I actually learned a great deal: helping others on their projects, learning Clojure (particularly by doing problems on 4clojure.com), implementing a "trie" data structure in JavaScript, and having a thousand fascinating conversations about programming. I've also been slowly coalescing ideas for another semi-major project I might work on if I have the time: abstracting out the generic parts of my tic-tac-toe-playing AI, so that you could pass in a game description and have it play any game -- tic-tac-toe, Connect Four, checkers, chess, whatever. (More on that in a later blog post.)

Finally, the above code can be found on GitHub in the following locations: