Saturday, February 23, 2008

Of all the branches of mathematics I have studied, Topology was the one with the biggest mismatch between what I expected, and how it actually was. Eventually I reached Algebraic Topology, and that is all about the things you expect: Mobius strips, Klein bottles, pairs of pants, knots and handlebodies. But the subject starts with the machinery of Point Set Topology, and in particular with a set of axioms that can seem highly unmotivated. So I’m going to describe Topology in a way that is completely different from what you’ll find in any Topology textbook I know, but which does draw on published ideas in Computer Science. Nonetheless, I am describing standard off-the-shelf point set topology, just dressing it with different intuitions.

Suppose I have some set X and a (possibly infinite) collection of machines. To each machine M is associated a subset of X, U. The idea is that given an element x of X, you can use the machine M to test to see if x is an element of U. The catch is that the machine can only reply “yes”, and that you don’t know how long to wait for the answer, only that if the answer is “yes”, the machine will eventually, after a finite time, give you an answer. If x isn’t in U, the machine sits there forever, neither saying “yes” nor “no”. We’ll call a set V “observable” if for every x in V, we can use a machine, or some combination of machines, to show (in a finite time of course) that x is in V. So all of the U’s associated to machines M are observable.

Now suppose we have two machines, M and N, associated with sets U and V respectively. We input x to both machines. If x is in U intersection V then eventually both machines will reply “yes”. We don’t know how long to wait, but we know it’ll be a finite wait. Similarly, if x is in U union V we known that one machine or the other will eventually say “yes”. So U intersection V and U union V are both observable.

Now suppose we have a (possibly infinite) collection of observable sets Ui. Let U be the union of all the Ui. Given any element x of U, it must be a member of one of the Ui. But every Ui is observable, so there is some combination of machines that can prove that x is in Ui. And hence we can show that x is in U. So unions of infinite collections of observable sets are observable.

(Note that in the previous paragraph I didn’t say you had to have an algorithm for saying which Ui you should use, given x. I’m just saying that some Ui must exist. This means I’m not talking about semidecidability. But observability is similar to semidecidability.)

So what are the machines that I’m talking about? Let’s leave that undecided. We can use the above observations to make a definition:

Given a set X, a system of observables is a set of subsets of X, called observable sets, with the properties

X and the empty set are observable

arbitrary unions of observable sets are observable

finite intersections of observable sets are observable

So let me fill in with an example. Suppose we have some devices for measuring signed lengths (or, say, x-coordinates in some frame of reference). These lengths take value in the real line, R. If the device measures a length within its observable set it eventually says “yes”, otherwise it hangs until you press reset. Suppose x is in a device’s observable set. Then some mechanism eventually leads to a “yes” response. But if the device is based on reasonable physical principles, it must be possible to tweak x by a tiny amount and still trigger the same mechanism, otherwise the device would, in some sense, be infinitely discriminating around x. So if x is in an observable set U, then the interval [x-e,x+e] must also be in U for some, possibly very small, e. So let’s define a system of observables on the real line by saying U is observable if for every x in it, there is some e such that [x-e,x+e] is in the set. Intuitively, these are sets that are “fuzzy round the edges”. That’s a reasonable property of a real world measuring device.

Suppose we can augment our devices by applying transformations to a value x before applying the machine. For example, if the machine works optically you could imagine applying a magnifying lens to it to improve its accuracy. For a 10x lens, say, we’d apply the function x -> 10x to our point, and then apply a machine M to see if 10x lies in U. More generally, if our transforming function is f, then applying that to x converts our machine into a machine that tests whether x is an element of the set

f-1(U) = {y | f(y) in U}

If f is the product of a physical process, it’s reasonable to expect f to be continuous at all of its arguments. So what does f-1(U) look like for continuous f? Well, by definition, f is continuous at x if for all d there is an e such that |x-x’|<e means |f(x)-f(x’)|<d. So if x is in f-1(U), then f(x) has a small interval around it in U (because by stipulation, U is observable) and so x has a small interval around it in f-1(U) (by continuity of f). In other words, the continuous functions are precisely those functions for which f-1(U) is observable for all observable U, ie. the functions that don't let us observe the unobservable.

So...I’ve talked about observability, and machines, and ways to pre-process the input to these machines. What does any of this have to do with topology? Well, it simply *is* topology. Not a generalisation, or a special case. What I have described is precisely the subject matter of topology. The only difference is that a “system of observables” is normally called a topology and an “observable” set is normally called an open set. And given a topology on a set A, and a topology on a set B, topologists define continuity by saying that f:A->B is continuous if f-1 maps open sets to open sets.

What I’ve tried to do in a few paragraphs is motivate the usual axioms of topology. Usually they are presented as a fait accompli which is justified after the fact by geometrical intuitions. I hope I’ve given a way to look at these axioms that makes them seem natural a priori and that spans both geometrical and computational ideas.

Friday, February 08, 2008

I was recently looking at some of Paul Taylor's writings on what he calls Abstract Stone Duality. It's partly another approach to 'topologising' computer science, independently of Escardo's work. I'm not yet in a position to say much about what he does, except that it gives a neat new language to talk about computable continuous functions without having to build things up from Set Theory. But just for fun I want to look, out of context, at one teeny weeny little thing that he mentions, the Sierpinski space. If you've not met general topology before, it can be a tough topic to grasp. So what I want to do is look in detail at how we might think about a really simple seeming problem from a topological perspective.

Haskell defines a type called (). There is one value of type (), confusingly called (). Most of this post is about functions of the type () -> (). You'd imagine there couldn't be much to say. But this topic is much more complicated than some people might imagine.

How many functions of type () -> () are there?

(Before proceeding, the character ⊥ should look like _|_. Apologies if your font/browser makes it looks like something else.)

Let's start with the obvious function of this type.

> f1 x = x

It seems like this is the only possible function. The argument can only be () and the result can only be (). What other choices could we make?

Well here's another implementation:

> f2 _ = ()

It certainly looks different, but it still just maps () to (). So it appears that f1 and f2 are the same. But there is a way to tell them apart. Make the following definition:

> loop = loop

Attempting to evaluate loop sends Haskell into an infinite loop. Either you'll get an error message, or your computer will loop forever, never giving a result. Similarly, evaluating f1 loop will also fail to terminate with a sensible result. But f2 loop terminates fine and returns (). Because Haskell is a lazy language, f2 doesn't need to evaluate its argument before giving its result. f1, on the other hand, returns its argument, so looking at the result causes non-termination. So amazingly, f1 and f2 are different. We have at least two distinct functions of type () -> (). Are there any more?

We can treat a non-terminating function as if it returns a special value called ⊥ (pronounced bottom). So we can summarise the above in a kind of truth table:That immediately suggests two more functions, f3 and f4, which are also illustrated.

So it now looks like there are four functions of type () -> (). Here's a possible implementation of f3:

f3 _ = loop

f3 is simply a function that deliberately sabotages itself. But f4 is the interesting one. In order to have f4 ⊥ = (), f4 must ignore its argument so that it doesn't get caught in a quagmire of non-termination. But if it ignores its argument, then it has to ignore its argument when f4 () is evaluated. So f4 () must also equal (). In other words, f4 cannot be implemented. We have only three functions of type () -> ().

So how can we characterise these functions mathematically? Haskell functions don't correspond to mathematical functions from the set {()} to {()}. If we think of () as a set with two elements, {(),⊥}, then Haskell functions still don't correspond to functions from this set to itself.

Let's give the set {⊥,()} a name, S. One approach to formalising Haskell functions is to impose an ordering on S. Here's one:

⊥<=⊥⊥<=()()<=()

(Now you can see why ⊥ is called "bottom". If you think of <= as a ranking, ⊥ is at the bottom of the heap.) A monotone function is defined to be a function f such that if x<=y then f(x)<=f(y). If we plot a 'graph' of f with ⊥ and () along the axes, monotone functions are the ones that don't descend as you go from left to right:The possible Haskell functions correspond precisely to the monotone functions. (Mathematicians would probably call these functions monotonically increasing but the computer science literature I have just seems to call them monotone).

Another useful definition is the notion of a strict function. This is one that maps ⊥ to ⊥ and so goes through the 'origin' on the graph above. In a strict language you can't implement f2.

But maybe you can think of another possible implementation of a function of type () -> ():

> f5 = loop :: () -> ()

When working with fully lazy evaluation this is indistinguishable from f3 so we can ignore it. But in a strict language we can distinguish f3 and f5. In Haskell, for example, we can use the function seq to force evaluation of its first argument before moving onto its second. f3 `seq` () and f5 `seq` () are distinguishable because the first evaluates to () but the second gives ⊥.

So the answer to the original question is (I think):* 1 in a total language or "in mathematics"* 3 in a lazy language like Haskell when working completely lazily* 4 in Haskell when using seq to enforce strictness* 3 in a strict language like Ocaml

You'd think I'd have exhausted everything that there is to say by now. But there's a whole lot more. But before I get to it I need to talk a bit about topology in my next post.

And I'm completely ignoring the philosophical issue of what ⊥ means when you can't tell the difference between an answer of ⊥ and a slow computer taking longer than you expect to compute ().

Update: This problem was trickier than I originally anticipated. The above text incorporates a couple of changes based on comments I received.

Tuesday, February 05, 2008

Suppose Fnxy is a program, written in language x, that takes as input n string arguments as input, G1,...,Gn and outputs a program in language y that is the application of the function whose source code is G2 to the strings G2,G3,...,Gn,G1. Then F3xy(F3xy,F3yz,F3zx) will be a program in language x that that outputs a program in y that computes F3yz(F3yz,F3zx,F3xy).

Saturday, February 02, 2008

There is a certain truth to Greenspun's tenth law of programming. A Python project I was developing at work has slowly mutated into a compiler for a programming language without me planning it that way. Usually (I assume) compilers parse their input and construct an AST which is passed to the compiler proper. My code didn't have an AST, just a bunch of lambdas. I realised that I'd actually come across a real world example of what Wadler was talking about in Recursive Types for Free!.

In Haskell, the foldr function reduces a list using a binary function and some initial value. Suppose the function is called a and the initial value is b. Take a list, for example [1,2,3]. Now write it without using list notation, directly in terms of its constructors. Ie. 1:(2:(3:[])). foldr replaces (:) by a and [] by b. So this becomes a(1,a(2,a(3,b))). The best known example is a=(+) and b = 0 so we get 1+2+3+0 and hence the sum of the values in the list. Here is how we'd use foldr in Haskell:

> x = foldr (+) 0 [1,2,3]

The interesting thing is that anything you might want to know about a (finite) list can be extracted using foldr. There is a sense in which it the universal function on lists and all other functions can be factored through it. For example, we can implement head and tail as follows

So if x is a list, \a b -> foldr a b x tells you everything you could want to know about the list. In other words, you can completely replace the list itself with functions like this. In fact, we can replace the list constructors with functions that build such functions:

> nil a b = b> cons h t a b = a h (t a b)

We can use nil and cons just like [] and (:). In fact, given an element defined by

> y = cons 1 (cons 2 (cons 3 nil))

We can convert it to a conventional list via

> z = y (:) []

So foldr embeds a list as a function.

We can write the same thing in Python. (Note that Python already has a variation of foldr, called reduce.)

Folds can be generalised to any recursive type, not just lists. (Stricly speaking I mean recursive rather than corecursive types. Folds aren't appropriate for infinite structures.) Note how for lists, foldr takes two arguments besides the list: a two argument function and a zero argument function. Applying a fold simply replaces the list constructors (:) and [] with these functions. Generalised folds do something similar: each constructor gives rise to an argument to the fold and when the fold is evaluated, each constructor is replaced with the appropriate function. Here's an example:

Now consider a simple expression type in Haskell:

> data Expr = X | Const Int | Binop (Int -> Int -> Int) Expr Expr

This is a recursive type so it has a generalised fold associated with it. This fold will take three arguments, one for each of X, Const and Binop, and each one will take the same number of arguments as the constructor. Here it is:

efold simply replaces each constructor with an application of the matching function recursively through the entire Expr.

Anything you might want to do to an Expr can be done using efold, and many things you might naturally want to do with an Expr are particularly easy to write using it. Here the functions to (1) evaluate the expression for X equal to some Int, and (2) to determine whether or not an expression is free of references to X:

This can sometimes be an inefficient style of programming, especially so in a strict language. Look again at tail for the cons/nil lists. But many uses are quite efficient, and folds capture a very common design pattern.

When I wrote this post a while back I left out mention of what the main point of the paper was. This post fixes that.

Wadler's paper also describes a dual version of this for codata such as streams. But as far as I understand it's not very interesting.

It's interesting that theory about static types has something to say about programming in a dynamically typed programming language.

Just so you know, my work project doesn't look anything like the code above.

Oh...and I guess you could say this was a form of the visitor pattern. Ugh. It's hideously complicated in C++."""