Eric Lippert's blog

Main menu

Post navigation

North of house

You are facing the north side of a white house. There is no door here, and all the windows are boarded up. A narrow path winds north through the trees.

> go north

Let’s start with some housekeeping details before we get into coding.

I’m going to start each of these episodes – and there will be a LOT of them – with a spoiler from Zork. So: spoiler alert! If you want to solve all the puzzles in Zork, now would be a good time to do so. You’ve had over 30 years, get on it!

There’s a tradition amongst Z-machine implementers to name the project after funny stuff found in the Great Underground Empire — the Frotz and Nitfol interpreters, the Quetzal save file format, the Blorb resource format all come to mind — and I will be no different. I hereby name this project “Flathead”, after Lord Dimwit Flathead, the ruler of the Great Underground Empire.

We are going to start very, very small in this series. How do we twiddle a bit in OCaml? Because we’re going to need to! The Z-machine specification is chock full of text like this:

If the top two bits of the opcode are 11 the form is variable; if 10, the form is short. […] In short form, bits 4 and 5 of the opcode byte give an operand type as above. If this is 11 then the operand count is 0OP; otherwise, 1OP. In either case the opcode number is given in the bottom 4 bits.

I do not like bit twiddling code. The code is frequently inelegant, the mechanisms overwhelm the meaning, and it is rife with the possibility for bugs. So it was with some trepidation that I started writing bit twiddling code in OCaml.

But perhaps I am even getting ahead of myself here, as most of my readers will not be familiar with OCaml.

OCaml is primarily a functional language in the spirit of Lisp and Scheme: it encourages immutable data structures and passing around functions as data. It has an awesome type inference system; everything is typed and there are very few type annotations necessary. There are no “statements” per se; most of the code you write is expressions. The language permits both object-oriented programming (the O in OCaml) and mutable data structures, but I am going to use neither in this project.

OCaml has both a “REPL” (read-eval-print-loop) interactive mode and a more traditional compile-a-bunch-of-sources-and-execute-the-result mode. I’ll be using the latter throughout, but I encourage you to play around in the interactive mode.

When I embarked upon this project I had written no code in any ML variation since the early 1990s. I am very out of practice, and learning as I go here. If you are an OCaml expert reading this, I strongly encourage leaving constructive comments! I want to learn how to write high-quality code here, not code that looks like a C# developer who had no clue about functional languages wrote it.

So let’s start small. I have an integer. The Z-machine refers to two-byte integers as “words”, so I’ll call my first integer “word”:

let word = 0xBEEF

Pretty straightforward. A declaration like this may end in two semicolons, and if you are using OCaml in its interactive mode, the two semis tell the REPL that you’re ready to have this evaluated. In non-interactive mode it is legal to include the semis but apparently this is not considered cool by experienced OCaml programmers, so I will omit them in listings in this series.

This declares a variable called word, though “variable” is a bit of a misnomer in OCaml. Variables do not (typically) vary. (There are ways to make variables that vary – they are somewhat akin to “ref” parameters in C# – but I’m not going to use them in this project.) Variables are actually a name associated with a value.

Later on we’ll see another way to “change” a variable in OCaml, but let’s learn to crawl before we try to walk.

Note that the standard C-style convention for hex literals works as you’d expect in OCaml.

Back to the problem at hand. How would we extract, say, the top four bits? I will follow the convention of the Z-machine specification that the least significant bit is bit 0, so we are looking for bits 15 through 12. In C-like languages we would use bit shifts and bitwise masks. OCaml is no different, though the names of the operators are a little odd:

let () =
Printf.printf "%0x\n" ((word lsr 12) land (lnot (-1 lsl 4)))

Yuck.

Let’s unpack that mess.

The let-expression with the () to the left of the equals is a convention for “this is the main routine”. The contents of that routine are a unit-returning – the OCaml jargon for void returning – expression which ought to look surprisingly familiar to C programmers.

Yes, OCaml has printf, bizarrely enough. The capitalized Printf names a module — we’ll talk more about modules later — and the function is printf. The format string is just like format strings in C — percent indicates a field to be replaced, backslash-n is a line break, and so on.

Argument lists follow directly after the function, and arguments are separated by spaces. No parenthesizing or commas are required or permitted. We want to give two arguments to printf: a format string and a number value, so the number expression is in parentheses; otherwise all those elements could be treated as individual operands.

lsr, land, lnot and lsl are the logical shift right, and, not and shift left operators.

I don’t think I could live with myself if I had to write a whole lot of code that looked like this illegible mess. What I really want to say is “I want four bits, starting from bit 15 and going down, from this word.”

So let’s write a helper method to encapsulate that logic. I’ll break the computation into two parts while I’m at it: compute the mask, and then shift and apply the mask:

Principle #2: Embrace small code.
Abstraction encourages clarity.
No computation is too small to be put into a helper function.
No expression is too simple to be given a name.
Small code is more easily seen to be obviously correct.

Let’s again look at this in some detail.

Here we see how to define a function in OCaml: just the same as you define a variable. What distinguishes a function? It has a parameter list following the name of the value. The body of the function is a single expression.

Really? A single expression? It looks like two statements: compute the mask, and the evaluate the value. But notice the in at the end of the let line. This means “the variable binding I just made is local to the following expression”, and the whole thing is one expression.

I dearly wish this feature was in C#; there have been proposals to add it a number of times throughout the years.

A couple things to notice here. First, there’s a semicolon separating the two unit-returning expressions. This syntax is used in OCaml mostly for expressions that are useful for their side effects, like printing. The semicolon means basically what the comma operator means in C: evaluate the first expression, discard the result, evaluate the second expression, use the result. It’s the sequential composition operator. (One can think of the semicolon as logically the sequential composition operator on statements in C#, but in OCaml it really is an operator on expressions.)

We call our new function the same as any other function: state the name, and then give argument expressions separated by spaces.

Note that the function we’ve defined here is completely typed. OCaml deduces that all three arguments must be integers because integer operators are used on each. It also deduces that the return is an int. An attempt to pass in a string would result in a compile-time error. OCaml is statically typed, but not explicitly typed. There is a way to add explicit type annotations but I’m not going to use it in this project.

Have we gone far enough in building this abstraction for messing around with bits? No, we have not. We can do better.

Right. But Eric’s program is still valid F#, despite all that. Simply use the file extention .ml and f# will compile in ML compatibility mode.

In ML Compatibility mode, those keywords are released, so they can be implemented as functions. They actually are functions in OCaml too, and are only keywords there in the sense that the compiler is permitted to assume (without checking) that they are defined in the normal fashion.

Of course you also need definitions for these pervasive functions and any other used parts OCaml Standard library that are not implemented in Fsharp.Core. Those are implemented in a library named FSharp.Compatibility.OCaml. To use them while maintaining source compatibility with OCaml, you need to add the following conditional comment to the top of the file:

(*F# open FSharp.Compatibility.OCaml F#*)

With just a single line additional, the use of of the .ml file suffix, and referencing a single assembly available on NuGet, Eric’s code so-far works fine with F#, without needing to change a bunch of operators. I expect that to continue to be true, moving forwards, given the language subset Eric will be using. Also the only reason that one line of code is needed is because compatibility library chose to use namespaces. In theory the library could have been written with its modules in the global namespace, in which case the conditional comment would be unnecessary, and Eric’s code would work completely unmodified.

using VS2013, I did not need to use .ml file extensions, though I did need to add a –mlcompatibility option to the build, besides adding the nuget compatibility library and referencing it as you described.

Well, there is no argument validation whatsoever so things can go horribly wrong in that sense; length and high can both be equal or smaller than zero which doesn’t make much sense, and length can be greater than high + 1 which also seems nonsensical.