I wrote this helper method and a few others like it and everything was great, right up until I spent I kid you not a solid hour trying to figure out why a computation was wrong:

blah blah blah (fetch_bits branch 5 6) blah blah blah

Now I’ve made it pretty clear where the mistake is here, but I assure you it was by no means clear what was wrong when the program started giving me completely crazy results.

The problem of course is that we have a function that takes three integers and I passed them in the wrong order. This gets 5 bits of the number 6, starting at bit “branch”, when I intended to say 6 bits of number “branch” starting at bit 5.

This is of course an easy bug in most languages. In C#, if I had a similar helper method, I could easily have written FetchBits(b, 5, 6) just as easily, and been just as wrong.

There are a number of things I could have done to prevent the bug. OCaml, like C#, allows arguments to be annotated with the name of the corresponding parameter, which certainly would have been a big help. But I am not in the habit of using this feature, and I was thinking: surely in a language with excellent type inference, there ought to be a way to have the compiler itself detect the error.

Writing this bug really got me thinking about how powerful a tool the type checker is in a statically typed language, and what limited use we make of it. A statically typed language is a tool for producing a partial proof that the program is correct, or at least not broken in some obvious, detectable way.

I also started thinking about how many different things the Z-machine uses integers for. We’re only a couple episodes in and we’ve seen bit numbers, bits and words. The Z-machine uses integers to identify addresses, variables, branch modes, opcodes, characters, screen widths, and dozens upon dozens more things, none of which make any sense to be assigned to each other. So:

Principle #3: Use “naked” primitive types like integers and bools as little as possible. Use the type system to both express meaning and provide evidence of correctness.

As we’ll see, by using the type system wisely we can also make the code easier to understand and to emphasize its meaning, rather than its mechanisms.

A nice idea. How do we do it?

OCaml both allows and encourages creation of super-lightweight “wrapper” types around any other type; I think of them as “tagging” the underlying type with a meaning, much the same way that an enumerated type in C# is essentially just a meaningful wrapper around an integral type. (They are not exactly enumerated types; later on we’ll use a similar technique to make enumerated types in OCaml.) The type declaration could not be simpler:

type bit_number = Bit_number of int
type bit_size = Bit_size of int

The left of the equality gives a name to the type. The right of the equality gives a constructor: to construct an instance of bit_number, we say what constructor we want to use: Bit_number. Let’s try it out:

let b = fetch_bits (Bit_number 5) (Bit_size 6) 0xBEEF

Hmm. That gives us a type checking error:

Error: This expression has type bit_number but an expression
was expected of type int

We will need to do bit arithmetic on the wrapped number, which means that there needs to be a way to unwrap it. Let’s rewrite fetch_bits as follows:

And now our call succeeds. OCaml’s type checker says “I need something in the first argument that matches the pattern an integer wrapped in a bit number, and I wish to name that integer high“.

Now, on the one hand we now have solved the problem; an attempt to pass a naked integer for either the bit number or the bit size will fail, and similarly so will any attempt to swap the number and the size. The type system now guarantees that I cannot write my bug! But at what cost?

let result = fetch_bits (Bit_number 5) (Bit_size 6) word

Ugly! But hey, if I am going to be passing constants to those things then why don’t I just define some constants?

I can easily define all the constants I need. And now my code looks like this:

let result = fetch_bits bit5 size6 word

The code is not only completely type-safe, preventing me from making stupid transposition bugs, but it is also far more self-documenting than any of my earlier attempts. The code does not prevent stupid bugs like asking for ten bits starting at bit two, but I think I am unlikely to write that bug now; if I felt that I was then I could put additional checks into the helper method.

If you take a look at the code for this episode on github you’ll notice that I’ve put the type declarations and utility functions and main method all in separate files. Obviously that is a bit silly for such a small example, but we’ll be adding a lot more code soon and it will be convenient to have logical areas separated into different modules. More on that later.

Share this:

Like this:

Related

22 thoughts on “Forest path”

I wish there was a simple way to do this in C#. I’ve created wrapper structs around primitive types several times in the past to avoid this kind of bug and to leverage the type inference benefits. But it is quite a bit of boiler-plate code to express something simple. And in most cases I’d prefer that once compiled, it performs & behaves identically to the underlying primitive type.

I cannot disagree more. The number of things to implement for a simple wrapper is quite a mouthful to say the least. There’s the overrides on System.Object, then interfaces like IComparable and IEquatable and IFormattable, then there’s the operator overloads, etc. A simple language construct to wrap an existing type would be very useful indeed.

using separate source files in VS2013, took me a while to realize that F# needs the correct compilation order (no forward reference allowed in other words). Right-click files and choose move up or move down to get them into the right order. (using “module X” and “open X” lines to declare and reference modules, respectively.)

dmo has already beaten me to it, so let me instead to support the point made.

In a recently previous life, I wrote financial software and we had templates to lightly wrap decimal so that we could manage Notional Value, Market Value, PreHaircut, PostHaircut, and so on with type-safe conversions.

It would have been so nice to have a succinct way to tell the compiler “The new type NotionalValue should be treated exactly like a decimal except that it should not be implicitly convertible to any other type”

It looks like you hit a nerve, I also picked up following along your posts…in F#. Which is great, because there is a good chance that I don’t have to change much of the code 😉 Anyway, for anyone interested – http://realfiction.net/tag/ionz/

Well, this being mainly a c# blog, I’ve decided to start a small project implementing the FlatHead ZMachine in…..yup thats right, c#! Surprise, surprise. Probably not half as interesting as other translations already up and running (Haskell, F#, etc.) but I figured out it would be a fun excercise. If anyone is remotely interested you can find it at CSharpZMachine.
I wont have time to blog about the experience but I will update the project’s readme file explaining what I’m doing as I go along.

Explicit types for adjacent parameters is exactly the sort of thing I would have done 3 years ago, but today feels a bit heavy to me. Perhaps my coworkers beat me out of pulling this sort trick in more verbose languages and I’m improperly transferring those lessons back.

Some alternatives:

* In a curly brace language, use builders to name parameters

`bits.skip(5).take(6)`

* In currying languages, use some infix operator

bits |> drop 5 |> take 6

* Demand that all bit twiddling be wrapped in semantically named functions which you unit test thoroughly

(Relying on unit tests is probably what I’d do. Only constants will be fed into fetch_bits. If the caller of fetch_bits made an error, that error will be a mistranscribed value. Miscomposed functions will not be the problem; type checking won’t save you. The *shape* of your call graph will be correct; only the *value* of the leaves will be incorrect. You *need* to use unit tests to catch this sort of bug in most cases.)

* Use language-specific interval idioms that you’ve read so often that they’re burnt into your brain

python_list[-4:-2]

* Write a program to interpret types of opcodes from some sort of opcode spec. Question begging, but thinking about how I would construct an interpreter might yield a 1 in 4 shot of coming up with a really elegant solution (but a 1 in 8 chance of falling prey to a useless time sink).

I’d be more amenable to the explicit type creation technique you used here if the constants “bit1” and “size5” had more semantic names. But my guess is that either (1) names wouldn’t map nicely to the standard language, or (2) the concepts mentioned in the standard, like “opcode_header_length”, would be used as both sizes and offsets.

IDPIMLPAHFAHISY (I don’t program in ML professionally and haven’t for a hobby in several years).