Things that amuse me

Friday, November 09, 2007

Some lambda calculus examples

Syntax

In a previous blog entry I described a simple evaluator and type checker for the lambda cube, i.e., various forms of lambda calculus.

Here I'm going to show some examples of code in pure typed λ-calculus. All the examples are typable in Fω; the full lambda cube is not necessary.

Before doing any examples we'd better have some syntax that is not too painful, because writing λ-expressions in raw Haskell is tedious. The syntax for variables, the * kind, and application is easy: I'll just use Haskell syntax. For λ-expressions the Haskell syntax doesn't allow explicit type annotations, but various Haskell compilers implement that extension, so I'll just pick that. So a λ will be written "\ (var::type) -> expr". And as in Haskell I'll allow multiple variables between the "\" and "->"; it's just shorthand for multiple lambdas.

So what about the dependent function type? The syntax (x:t)→u suggests (x::t)->u, so I'll use that. And when the variable doesn't occur we'll write t->u as usual.
For type variables Haskell (well, not Haskell 98, but extensions) uses forall (a::*) . t, so I'll allow that too.

An example, the identity function:

\ (a::*) (x::a) -> x

with type

forall (a::*) . a->a

And using it

id Int 5

Writing a pretty printer and parser for this is pretty straightforward, so I'll skip that and just point you to the code. BTW, instead of using Parsec for the parser like everyone else, I used ReadP. The ReadP library is very nice, partly because the alternative operator is actually commutative (unlike Parsec's). But the error messages suck.

Enter let

Now if we want to use, say, the identity function more than once we need to name it. There is a mechanism for that, namely λ. But it looks awkward. Look:
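For instance, naming id with a plain λ might look something like this (a sketch in the syntax above; the body applies id to its own type and to itself, just to use it twice):

```
(\ (id :: forall (a::*) . a -> a) ->
    id (forall (b::*) . b -> b) id)
  (\ (a::*) (x::a) -> x)
```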

What makes it awkward is that the name, id, is far away from the body, \ (a::*) (x::a) -> x. From Haskell we are more used to let and where expressions.
So let's add a let. Instead of what we have above we'll write
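Something along these lines (a sketch; the concrete let syntax mirrors the top-level definition syntax used later in this post):

```
let id :: forall (a::*) . a -> a;
    id = \ (a::*) (x::a) -> x;
in  id (forall (b::*) . b -> b) id
```

Now the name sits right next to its signature and body.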

The let construct could just be syntactic sugar for a lambda and an application, but I've decided to add it as a constructor to the expression type instead.

| Let Sym Type Expr Expr

Adding a new constructor means that we have to modify all the functions operating on Expr, and it's extra work because Let is a variable-binding construct.
For substitution we just cheat a little and use the expansion into an application and a λ.
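In other words, for substitution purposes the binding is treated as the application it abbreviates:

```
let x :: t; x = e; in b    ~    (\ (x::t) -> b) e
```

so the existing substitution machinery for application and λ does all the work.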

Bool

We could start by throwing in all kinds of primitive types into our little language, but who knows what might happen then. All the nice properties that have been shown about the language might not hold anymore. So instead we'll code all the types we need with what we already have.

The Bool type has two values: False and True. So we need to find a type that has exactly two different values. Fortunately that's easy: a->a->a (I'll be a bit sloppy and leave off the top-level quantifier sometimes, just like Haskell; that should be forall (a::*) . a->a->a).

Why does that have two possible values? Well, we have a function type that must return an a for any possible a that it's given, so it can't conjure up the return value out of thin air. It has to return one of its arguments. And there are two arguments to choose from, so there are two different values in that type.
Here's Bool, False, True:
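A reconstruction of what the definitions look like (with the argument order chosen so that True picks the first branch):

```
Bool :: *;
Bool = forall (a::*) . a -> a -> a;

True :: Bool;
True = \ (a::*) (t::a) (f::a) -> t;

False :: Bool;
False = \ (a::*) (t::a) (f::a) -> f;
```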

It would be easy to add more sugar and allow "type", which would just mean you're defining something of type *. And you could also make an omitted type in a forall default to *. But I've not added these little extras.

Defining the if function is trivial; it's just a matter of permuting the arguments a little, because the boolean values come with the "if" built in.

if :: forall (a::*) . Bool -> a -> a -> a;
if a b t f = b a t f;

Note how the type signature is exactly the same as you'd find in Haskell. A difference is that we have explicit type abstraction and type application. For instance, if takes a first argument that is the type of the branches. So when using if we must pass in a type.
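For instance, a first attempt at using it might go something like this (a hypothetical session transcript; the complaint is the interesting part):

```
> if Bool True False True
Expected Bool, got forall (a :: *) . a->a->a
```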

What is it whining about? Expected Bool, got forall (a :: *) . a->a->a. But it says right there in the code that Bool is exactly that. What's going on?
Well, the type checker knows the type of everything, but not the value of anything. So the type checker knows that the type of Bool is *, but it doesn't know what Bool is equal to.

The problem is that when we have a let binding we know the value of the defined variable and to be able to do dependent type checking the type checker needs to know it too. We need to change the type checking of let. Here's a simple solution:

tCheck r (Let s t a e) = do
    tCheck r t
    ta <- tCheck r a
    -- (here ta should also be checked against the declared type t,
    -- the same way as for other binders)
    tCheck r (subst s a e)

Note how we substitute the value of the definition (a) in the body before type checking it. This is a sledgehammer approach. A better one would be to change the environment in the type checker to carry values when they are known. These values could then be used when computing the normal form of an expression. But let's keep it simple for now.

A simple top level

To make experimentation easier I've added a simple top level where you can evaluate expressions, make definitions, load files, etc. The files are just a bunch of definitions the way they would go inside a let.

By staring carefully at these you can convince yourself that it's right.
What's making it a little hard to read are all those leading lambdas (and the corresponding function types). But there's an option to suppress the printing of them.

Characters

So we can define booleans. What about something like characters? You can define those too. What is the ASCII type? It's a type with 128 different values. We saw how we can make a type with two values (Bool); we can do the same for 128. Since that takes a lot of room, I'll do it for just four values.
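A four-value "character" type follows the Bool recipe exactly (the names Char4 and C0, C1 are made up for this sketch):

```
Char4 :: *;
Char4 = forall (a::*) . a -> a -> a -> a -> a;

C0 :: Char4;
C0 = \ (a::*) (c0::a) (c1::a) (c2::a) (c3::a) -> c0;

C1 :: Char4;
C1 = \ (a::*) (c0::a) (c1::a) (c2::a) (c3::a) -> c1;
```

and similarly for the other two values; the 128-value ASCII type is the same idea with 128 arguments.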

This is a very handy notation during program development, btw. Inside a let you can first just write the type of something and then use it freely. This something will then be lambda abstracted until such time as you provide the definition.
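For example, eqChar can be given a type but no definition yet, and still be used (a sketch of the idea):

```
let eqChar :: Char -> Char -> Bool;
in  \ (x::Char) (y::Char) -> eqChar x y
```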

Now eqChar no longer gets evaluated, naturally, because it has no definition; it's λ bound.

Now we know how the Char type could be implemented (or supplied from the outside). It would then be all right to add it as a primitive (with the same behavior), because it will not ruin any properties. (Speaking of ruining properties, the seq function in Haskell is an example of a function that cannot be defined in the λ-calculus. And sure enough, adding it to Haskell ruined the validity of η-reduction.)

It's not difficult to extend the Expr type with some Prim constructor for primitive functions, and adding some special cases in the syntax and type checking for them. But since this is not intended to be a usable language I'll resist the temptation (for now).

Pairs

OK, booleans and characters were pretty easy. Let's do pairs next. The Pair type is parameterized over two types: the types of the first and second components. As with booleans, the representation of pairs comes with the case analysis "built in". (Case analysis on pairs is flip uncurry in Haskell.) If we can correctly implement building pairs, fst, and snd we are done.
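Here is what the encoding might look like (a reconstruction; a pair is represented by its own case-analysis function):

```
Pair :: * -> * -> *;
Pair = \ (a::*) (b::*) -> forall (c::*) . (a -> b -> c) -> c;

pair :: forall (a::*) (b::*) . a -> b -> Pair a b;
pair = \ (a::*) (b::*) (x::a) (y::b) (c::*) (f::a -> b -> c) -> f x y;

fst :: forall (a::*) (b::*) . Pair a b -> a;
fst = \ (a::*) (b::*) (p::Pair a b) -> p a (\ (x::a) (y::b) -> x);

snd :: forall (a::*) (b::*) . Pair a b -> b;
snd = \ (a::*) (b::*) (p::Pair a b) -> p b (\ (x::a) (y::b) -> y);
```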

The :let command is used to make bindings without having to load them from a file. The first :let just introduces some values to play with.

Again, the big difference compared to Haskell is that all types are explicit. (But users of templates in C++ or generics in Java/C# might appreciate it?)

Maybe

We can follow the same pattern and define Haskell's Maybe type. In fact any non-recursive Haskell data type has a mechanical and easy translation into lambda calculus. We've seen booleans, characters, and pairs above.
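The Maybe encoding might read as follows (a reconstruction; as with the types above, the values carry the case analysis with them):

```
Maybe :: * -> *;
Maybe = \ (a::*) -> forall (b::*) . b -> (a -> b) -> b;

Nothing :: forall (a::*) . Maybe a;
Nothing = \ (a::*) (b::*) (n::b) (j::a -> b) -> n;

Just :: forall (a::*) . a -> Maybe a;
Just = \ (a::*) (x::a) (b::*) (n::b) (j::a -> b) -> j x;
```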

The natprim function corresponds to foldr for lists and is our means of recursion. This type for natural numbers comes with primitive recursion built in, just as the non-recursive data types earlier came with case analysis built in.
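A reconstruction of the natural numbers (Church numerals; natprim just hands the step function and base case to the number itself):

```
Nat :: *;
Nat = forall (a::*) . (a -> a) -> a -> a;

Zero :: Nat;
Zero = \ (a::*) (s::a -> a) (z::a) -> z;

Succ :: Nat -> Nat;
Succ = \ (n::Nat) (a::*) (s::a -> a) (z::a) -> s (n a s z);

natprim :: forall (a::*) . (a -> a) -> a -> Nat -> a;
natprim = \ (a::*) (s::a -> a) (z::a) (n::Nat) -> n a s z;
```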

I'll skip defining subtraction & co.; they are possible, but a little tedious (and very inefficient).

Lists

Lists are like a hybrid of natural numbers and the Maybe type. They follow the same pattern as before.
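A sketch of the encoding (a list is its own foldr, so Nil and Cons mirror Zero and Succ):

```
List :: * -> *;
List = \ (a::*) -> forall (b::*) . (a -> b -> b) -> b -> b;

Nil :: forall (a::*) . List a;
Nil = \ (a::*) (b::*) (c::a -> b -> b) (n::b) -> n;

Cons :: forall (a::*) . a -> List a -> List a;
Cons = \ (a::*) (x::a) (xs::List a) (b::*) (c::a -> b -> b) (n::b) -> c x (xs b c n);
```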

Tuesday, November 06, 2007

Benchmarking ray tracing, Haskell vs. OCaml

On his web site about OCaml Jon Harrop has a benchmark for a simple ray tracing program written in a number of languages. When I saw it I wondered why Haskell was doing so badly, especially since this benchmark was taken as some kind of proof that Haskell performs poorly in "real life".
So I have rerun the benchmarks. I've also rewritten the Haskell versions of the programs. The Haskell versions on Jon's web site were OK, but they were too far from the OCaml versions for my taste. I prefer to keep the programs very similar in a situation like this.
My rewrite of the benchmarks from OCaml to Haskell was done with a minimum of intelligence. Here are the only things I did that needed creative thought:

Use the type Vector to represent vectors instead of a tuple. This allows the components to be strict.

Use the type Scene instead of a tuple to represent a scene. The tuple used in the OCaml code uses the dubious feature of equi-recursive types (even Xavier thinks it's strange enough to have a flag to enable it).

Rewrite the loop that computes a pixel's value using an accumulating updatable variable into a list comprehension that sums the list.

Finally, the compiler flags needed a bit of tweaking to get good performance, although "-O3 -funbox-strict-fields -fexcess-precision -optc-ffast-math" was a pretty obvious start.

In addition to this I made the code look a little more Haskellish, e.g., using overloading to allow + and - on vectors. This is really just minor syntactic changes, but makes the code more readable.
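A sketch of what such a Vector might look like (hypothetical; the actual benchmark code differs in details):

```haskell
-- Strict components, so -funbox-strict-fields can unbox the Doubles.
data Vector = V !Double !Double !Double
  deriving (Eq, Show)

-- Overloading + and - via Num; the remaining methods are defined
-- componentwise just to keep the instance total.
instance Num Vector where
  V x y z + V a b c = V (x + a) (y + b) (z + c)
  V x y z - V a b c = V (x - a) (y - b) (z - c)
  V x y z * V a b c = V (x * a) (y * b) (z * c)
  abs (V x y z)     = V (abs x) (abs y) (abs z)
  signum (V x y z)  = V (signum x) (signum y) (signum z)
  fromInteger n     = let d = fromInteger n in V d d d

-- Dot product as an ordinary function.
dot :: Vector -> Vector -> Double
dot (V x y z) (V a b c) = x*a + y*b + z*c
```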

To make the program size comparison fair I removed some dead code from the OCaml code.

I then reran the benchmarks using Haskell, OCaml, and C++.

The benchmarks are five programs that start from very simple ray tracing and specialize the program more and more to speed it up.

The numbers are in the tables below. The time is execution time in seconds, the characters are the number of non-whitespace characters in the file, and the lines are the number of lines in the file. To ease comparison I also include the relative numbers compared to OCaml (smaller numbers are better).

Interestingly, and unlike Jon's benchmark, the Haskell code is always smaller than the OCaml code. Furthermore, the Haskell code ranges from much faster to slightly faster than the OCaml code. Again, this is very unlike Jon's benchmark. I find the unoptimized version of the benchmark especially interesting since Haskell is 5 times(!) faster than OCaml. I've not investigated why, but I suspect laziness.

Results

The programs, ray1-ray5, are variations on the ray tracer as given on Jon's web site. I've used the same size metrics as Jon does.

Target architecture is x86 (even though the processor is x86_64 capable).

Some observations

Haskell should really have the definitions of infinity and epsilon_float in a library. They are quite useful. Also, having them in a library would have made the Haskell code somewhat shorter and faster.

Converting these programs from OCaml to Haskell was very mechanical; it could almost be done with just sed.

I'm glad version 5 of the benchmark didn't show much improvement, because it's a really ugly rewrite. :)

Note that the code is all Haskell98, no strange extensions (even though -funbox-strict-fields deviates subtly from H98).

Conclusion

Benchmarking is tricky. I'm not sure why my and Jon's numbers are so different. Different hardware, slightly different programs, different software.

Haskell is doing just fine against OCaml on this benchmark; the Haskell programs are always smaller and faster.

Edit: Updated tables with more numbers.

PS: Phil Armstrong wrote the Haskell code on Jon's web site and I took some code from his original.