Saturday, February 02, 2008

There is a certain truth to Greenspun's tenth law of programming. A Python project I was developing at work has slowly mutated into a compiler for a programming language without me planning it that way. Usually (I assume) compilers parse their input and construct an AST which is passed to the compiler proper. My code didn't have an AST, just a bunch of lambdas. I realised that I'd actually come across a real world example of what Wadler was talking about in Recursive Types for Free!.

In Haskell, the foldr function reduces a list using a binary function and an initial value. Suppose the function is called a and the initial value is b. Take a list, for example [1,2,3]. Now write it without using list notation, directly in terms of its constructors, i.e. 1:(2:(3:[])). foldr replaces (:) by a and [] by b, so this becomes a(1,a(2,a(3,b))). The best-known example is a = (+) and b = 0, which gives 1+2+3+0 and hence the sum of the values in the list. Here is how we'd use foldr in Haskell:

> x = foldr (+) 0 [1,2,3]

The interesting thing is that anything you might want to know about a (finite) list can be extracted using foldr. There is a sense in which it is the universal function on lists, and all other functions on lists can be factored through it. For example, we can implement head and tail using foldr.
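Here is a minimal sketch of head and tail written as right folds, in Python rather than Haskell. Python's reduce folds from the left, so the helper foldr below (a name I'm introducing for illustration) walks the list backwards. head simply keeps the first argument it is given; tail uses a common trick of folding into a pair (tail of the list so far, the list so far). This is one way to do it, not necessarily the only or best one.

```python
from functools import reduce

def foldr(a, b, xs):
    """Right fold: replace each (:) in xs with a and the final [] with b.

    reduce folds from the left, so walk the list backwards and feed
    each element in as the first argument of a.
    """
    return reduce(lambda acc, x: a(x, acc), reversed(xs), b)

def head(xs):
    if not xs:
        raise ValueError("head of empty list")
    # Keep the element at the front and ignore the folded rest.
    return foldr(lambda h, rest: h, None, xs)

def tail(xs):
    if not xs:
        raise ValueError("tail of empty list")
    # Fold into a pair (tail of the list so far, the list so far).
    # At each cons cell the old "list so far" becomes the new tail.
    # Note this rebuilds the whole list, so it is far from O(1).
    pair = foldr(lambda h, p: (p[1], [h] + p[1]), (None, []), xs)
    return pair[0]
```

For example, foldr(lambda x, acc: x + acc, 0, [1, 2, 3]) mirrors foldr (+) 0 [1,2,3] and gives 6, head([1, 2, 3]) gives 1, and tail([1, 2, 3]) gives [2, 3].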

So if x is a list, \a b -> foldr a b x tells you everything you could want to know about the list. In other words, you can completely replace the list itself with functions like this. In fact, we can replace the list constructors with functions that build such functions:

> nil a b = b
> cons h t a b = a h (t a b)

We can use nil and cons just like [] and (:). In fact, given an element defined by

> y = cons 1 (cons 2 (cons 3 nil))

We can convert it to a conventional list via

> z = y (:) []

So foldr embeds a list as a function.

We can write the same thing in Python. (Note that Python already has a variation of foldr, called reduce, although it folds from the left, so it is closer to Haskell's foldl.)
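A minimal Python sketch of nil and cons might look like this. It assumes Haskell's curried arguments are split into two calls, so cons h t a b becomes cons(h, t)(a, b); the variable total is my own addition for illustration.

```python
# Python stand-ins for the Haskell definitions
#   nil a b = b
#   cons h t a b = a h (t a b)
# The curried arguments become a two-step call: cons(h, t)(a, b).

def nil(a, b):
    return b

def cons(h, t):
    return lambda a, b: a(h, t(a, b))

y = cons(1, cons(2, cons(3, nil)))

# Like  z = y (:) []  : rebuild an ordinary Python list.
z = y(lambda h, rest: [h] + rest, [])

# Like  foldr (+) 0  : the encoded list already is its own fold.
total = y(lambda h, rest: h + rest, 0)
```

Here z is [1, 2, 3] and total is 6. The encoded list never stores its elements in a data structure; it is just a function waiting for replacements for (:) and [].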

Folds can be generalised to any recursive type, not just lists. (Strictly speaking, I mean recursive rather than corecursive types. Folds aren't appropriate for infinite structures.) Note how, for lists, foldr takes two arguments besides the list: a two-argument function and a zero-argument function (the initial value). Applying a fold simply replaces the list constructors (:) and [] with these functions. Generalised folds do something similar: each constructor gives rise to an argument to the fold, and when the fold is evaluated, each constructor is replaced with the corresponding function.
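As an illustration, here is a Python sketch of a generalised fold for a simple binary tree. The Tree type, with constructors Leaf and Node, and the fold name tfold are hypothetical examples of my own, not taken from Wadler's paper. Each constructor gets one argument of the fold: leaf takes the Int stored in a Leaf, and node takes the already-folded results of the two subtrees.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar, Union

R = TypeVar("R")

# Python stand-in for the (hypothetical) Haskell type
#   data Tree = Leaf Int | Node Tree Tree
@dataclass
class Leaf:
    value: int

@dataclass
class Node:
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def tfold(leaf: Callable[[int], R], node: Callable[[R, R], R], t: Tree) -> R:
    """Replace every Leaf with leaf and every Node with node, recursively."""
    if isinstance(t, Leaf):
        return leaf(t.value)
    return node(tfold(leaf, node, t.left), tfold(leaf, node, t.right))

t = Node(Leaf(1), Node(Leaf(2), Leaf(3)))
total = tfold(lambda v: v, lambda a, b: a + b, t)           # sum of the leaves
leaves = tfold(lambda v: 1, lambda a, b: a + b, t)          # number of leaves
depth = tfold(lambda v: 0, lambda a, b: 1 + max(a, b), t)   # height of the tree
```

For this tree, total is 6, leaves is 3 and depth is 2. Only the two replacement functions change between the three computations; the recursion lives entirely in tfold.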

Now consider a simple expression type in Haskell:

> data Expr = X | Const Int | Binop (Int -> Int -> Int) Expr Expr

This is a recursive type, so it has a generalised fold associated with it, which I'll call efold. It takes three arguments, one for each of X, Const and Binop, and each one takes the same number of arguments as the corresponding constructor.
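Here is a sketch of efold in Python, assuming Expr is modelled with one small class per constructor and that the arguments come in the order X, Const, Binop (my choice). Because X takes no arguments, its replacement x is a plain value, just as the initial value b is for foldr.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class X:
    """The variable X (a constructor with no arguments)."""

@dataclass
class Const:
    value: int

@dataclass
class Binop:
    op: Callable[[int, int], int]
    left: "Expr"
    right: "Expr"

Expr = Union[X, Const, Binop]

def efold(x, const, binop, e):
    """Generalised fold for Expr.

    x replaces X (a plain value, since X takes no arguments),
    const replaces Const (one argument), and
    binop replaces Binop (the operator plus the two folded subexpressions).
    """
    if isinstance(e, X):
        return x
    if isinstance(e, Const):
        return const(e.value)
    return binop(e.op,
                 efold(x, const, binop, e.left),
                 efold(x, const, binop, e.right))

# Count the nodes in X + 2.
size = efold(1, lambda n: 1, lambda f, lhs, rhs: 1 + lhs + rhs,
             Binop(operator.add, X(), Const(2)))
```

Here size is 3. Passing the constructors themselves, as in efold(X(), Const, Binop, e), rebuilds e unchanged, which is the analogue of y (:) [] for lists.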

efold simply replaces each constructor with an application of the matching function recursively through the entire Expr.

Anything you might want to do to an Expr can be done using efold, and many things you might naturally want to do with an Expr are particularly easy to write using it. For example, it is easy to write functions that (1) evaluate the expression with X set to some Int, and (2) determine whether or not an expression is free of references to X.
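A Python sketch of both functions follows. To keep the block self-contained it repeats the Expr classes and efold; the names evaluate and free_of_x are my own. Each function is nothing more than a choice of three replacements for X, Const and Binop.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Union

# Python stand-ins for the Haskell constructors X, Const and Binop.
@dataclass
class X:
    pass

@dataclass
class Const:
    value: int

@dataclass
class Binop:
    op: Callable[[int, int], int]
    left: "Expr"
    right: "Expr"

Expr = Union[X, Const, Binop]

def efold(x, const, binop, e):
    # Replace each constructor with the matching argument, recursively.
    if isinstance(e, X):
        return x
    if isinstance(e, Const):
        return const(e.value)
    return binop(e.op, efold(x, const, binop, e.left),
                 efold(x, const, binop, e.right))

def evaluate(e, x_value):
    # X becomes the given int, Const n becomes n, Binop f applies f.
    return efold(x_value, lambda n: n, lambda f, lhs, rhs: f(lhs, rhs), e)

def free_of_x(e):
    # X is not free of X, a constant is, and a Binop is free of X
    # only if both of its subexpressions are.
    return efold(False, lambda n: True, lambda f, lhs, rhs: lhs and rhs, e)

# (X + 1) * 3
example = Binop(operator.mul, Binop(operator.add, X(), Const(1)), Const(3))
```

With this, evaluate(example, 4) gives 15 and free_of_x(example) gives False.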

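This can sometimes be an inefficient style of programming, especially in a strict language. Look again at tail for the cons/nil lists: it has to be computed by a fold over the whole list, rebuilding the list as it goes, so in a strict language it costs time proportional to the length of the list rather than constant time. But many uses are quite efficient, and folds capture a very common design pattern.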
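To make the cost concrete, here is a Python sketch of tail for the nil/cons (Church-encoded) lists, again using the trick of folding into a pair. The names church_tail and to_list are hypothetical, and returning nil for the tail of an empty list is a simplification of mine (Haskell's tail would raise an error).

```python
def nil(a, b):
    return b

def cons(h, t):
    return lambda a, b: a(h, t(a, b))

def church_tail(xs):
    # Fold into a pair (tail, whole): at each cons cell the new tail is the
    # previous "whole", and the new "whole" re-conses h onto it.
    # The fold visits every element, so this is O(n) rather than O(1).
    # Simplification: the tail of nil is taken to be nil.
    pair = xs(lambda h, p: (p[1], cons(h, p[1])), (nil, nil))
    return pair[0]

def to_list(xs):
    # Convert back to an ordinary Python list for inspection.
    return xs(lambda h, rest: [h] + rest, [])

xs = cons(1, cons(2, cons(3, nil)))
```

Here to_list(church_tail(xs)) gives [2, 3], but only after the fold has walked all three elements; taking the tail repeatedly walks the list again each time.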

When I wrote this post a while back, I left out what the main point of the paper was. This post fixes that.

Wadler's paper also describes a dual version of this for codata such as streams. But as far as I understand it, it's not very interesting.

It's interesting that theory about static types has something to say about programming in a dynamically typed programming language.

Just so you know, my work project doesn't look anything like the code above.

Oh...and I guess you could say this was a form of the visitor pattern. Ugh. It's hideously complicated in C++.

Now I -really- want to see (or write) a discussion of recursive data types and generalized folds in the language of operads! What you said a generalized fold is looks to me very much like the construction of a free operad on a set of operations (and/or possibly representations of such).

You say that "Wadler's paper also describes a dual version of this for codata such as streams. But as far as I understand it's not very interesting." I disagree; I think the dual is very interesting! Whereas the inductive version captures algebraic datatypes as purely functional recursive types, the coinductive version captures abstract datatypes. I exploit this in my paper Unfolding Abstract Datatypes. (I'd explain here, but the margin is too narrow.)

Aaaugh, scary but interesting post! However, the problem discussed there isn't with lambdas per se; it's with a closure (lambda or def) referring to loop variables in the enclosing function, so doing the closure with def doesn't change that situation. Anyway, it doesn't come up in your situation (referring to the enclosing function's parameters).
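A small Python sketch of the pitfall described in this comment (all names are my own): closures look up a loop variable when they are called, not when they are created, whether they are written with lambda or def. Closing over an enclosing function's parameter avoids the problem because each call gets a fresh binding.

```python
# Closures capture the variable i, not its value when the closure is made.
late = [lambda: i for i in range(3)]

# Same problem with def: f looks up i when it is called.
with_def = []
for i in range(3):
    def f():
        return i
    with_def.append(f)

# A common workaround: bind the current value as a default argument.
early = [lambda i=i: i for i in range(3)]

# Closing over an enclosing function's parameter is fine,
# because each call to make_adder gets its own n.
def make_adder(n):
    return lambda x: x + n

adders = [make_adder(n) for n in range(3)]
```

Calling the closures in late or with_def returns 2 three times, while early returns 0, 1, 2 and the adders add 0, 1 and 2 respectively.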

My sense even after that post is that Python's closures are about as pure or impure as Scheme's (unless you think of the tail-call memory leak as impurity).