Saturday, July 03, 2010

Unprovable PropositionsAmong other things, Godel's first incompleteness theorem allows us to construct a statement in the language of Peano arithmetic that can't be proved using the axioms of Peano arithmetic. Unfortunately, this statement is a highly contrived proposition whose sole purpose is to be unprovable. People who learn of Godel's theorems often ask if there are other more natural and uncontrived mathematical statements that can't be proved from the Peano axioms.

My goal in this post will be to describe one of these propositions. Not just uncontrived, but actually very useful. I only intend to tell half of the story here because I feel like there are many good treatments already out there that tell the rest. I'm just going to get to the point where I can state the unprovable proposition, and then sketch how it can be proved if you allow yourself a little Set Theory.

TerminationSuppose we implement a function to compute the Fibonacci numbers like so:

> fib 0 = 0> fib 1 = 1> fib n = fib (n-2) + fib (n-1)

How do we know that fib terminates for all natural number arguments? One approach is this: if we pass in the argument n it clearly never recurses more than n levels. Each time it recurses it calls itself at most twice. So it must terminate in O(2n) steps (assuming that the primitive operations such as addition take constant time). We can think of this code in a kind of imperative way. It's a bit like n nested loops, each loop going round up to two times.

Suppose instead that we have some kind of recursive function g that goes n levels deep but for which the number of calls of g to itself is no longer two. In fact, suppose the number of self-calls is very large. Even worse, suppose that each time g is called, it calls itself many more times than it did previously, maybe keeping track of this ever growing number through a global variable. Or instead of a global variable, maybe an evil demon decides how many times g calls itself at each stage. Can you still be sure of termination?

A Simple MachineIn order to look at this question, we'll strip a computer right down to the bare minimum. It will have an input (that the evil demon could use) for natural numbers and will output only one symbol. Here's a design for such a machine:

> data Machine = Done | Output Machine | Input (Natural -> Machine)

A value of type Machine represents the state of the machine. Done means it has finished running. Output s means output a symbol and continue in state s. Input f means stop to input a number from the demon (or elsewhere), call it i, and then continue from state f i. This is very much in the style discussed by apfelmus and I in recentblog posts.

For any n we can easily build a machine to output n stars. This is such a natural machine to want to build it seems only right to give it the name n. If we want to do this then we need to make Machine an instance of Num and define fromInteger for it:

Now given two of these machines there is a natural notion of adding them. a + b is the machine that does everything b does followed by everything a does. (Remember, that's b then a.) To do this we need to dig into b and replace every occurrence of Done in it with a. That way, instead of finishing like b, it leads directly into a. In the case of a + Input f, for each number i we need to dig into f i replacing each Done with a:

There's a natural way to multiply these machines too. The idea is that in a * b we run machine b. But each time the Output command is run, instead of printing a star it executes a. You can think of this as a control structure. If n is a natural number then a * n means running machine an times. In the case of a * Input f, instead of multiplying by a fixed natural number, we get an input from the user and multiply by f i instead:

We can make a machine to input a number and then output that many stars. Here it is:

> w = Input fromInteger

Try running run1 w.

Can you guess what the machine w * w does? Your first guess might be that it inputs two numbers and outputs as many stars as the product of the two numbers. Try it. What actually happens is that we're computing w * Input fromInteger. Immediately from the definition of * we get Input (\i -> w*i). In other words, the first input gives us an input i, and then w is run i times. So if we initially input i, we are then asked for i more inputs and after each input, the corresponding number of stars is output. Although the original expression contains just two occurrences of w, we are required to enter i+1 numbers.

Given the definitions of + and * it seems natural to define the power operation too:

The power operation corresponds to the nesting of loops. So, for example, w ^ n can be thought of loops nested n deep.

Try working out what w ^ w does when executed with run1.

Consider the set M of all machines built using just a finite number of applications of the three operators +, * and ^ to w and the non-zero naturals. (The non-zero condition means we exclude machines like 0*w that accept an input and do nothing with it.) Any such expression can be written as f w, where the definition of f makes no mention of w.

Suppose we use run1 and we always enter the same natural n. Then each occurrence of w acts like n. So if we start with some expression in w, say f w, then always inputting n results in f n stars. We could test this with run1 (w^w^w^w), always entering 2, but it would require a lot of typing. Instead we can write another intepreter that consumes its inputs from a list rather from the user (or demon). And instead of printing stars it simply prints out the total number of stars at the end:

Now you can try run2 (w^w^w^w) [2,2..] and see that we (eventually) get 2222.

Termination AgainIf we run a machine in M there's a pattern that occurs again and again. We input a number, and then as a result we go into a loop requesting more numbers. These inputs may in turn request more inputs. Like the mythological hydra, every input we give may spawn many more requests for inputs. As the number of inputs required may depend on our previous inputs, and we may input numbers as large as we like, these machines may run for a long time. Suppose our machine terminates after requesting n inputs. Then there must be some highest number that we entered. Call it m. Then if the original machine was f w (with f defined in terms of the 3 operators and non-zero naturals), the machine must have terminated outputting no more than f m stars. So if our machine terminates, we can bound how many steps it took.

But do our machines always terminate? The input we give to the machine might not be bounded. If we run run2 (w^w) [4,5..], say, the inputs grow and grow. If these inputs grow faster than we can chop off the heads of our hydra, we might never reach termination.

Consider a program to input n and then output fib n. It accepts an input, recurses to a depth of at most n, and calls itself at most twice in each recursion. Compare with the machine 2 ^ w. This accepts an input n, recurses to a depth n, calling itself exactly twice each time. So if 2 ^ w terminates, so does fib. The more complex example above where I introduced the evil demon will terminate if w ^ w does, as long as the demon doesn't stop inputting numbers. So if we can show in one proof that every machine of type Machine terminates, then there are many programs whose termination we could easily prove.

Let's consider an example like run1 (w ^ w) with inputs 2, 3, 4, ...

We start with w ^ w. Examining the definition of the operator ^ we see that this proceeds by requesting an input. The first input is 2. Now we're left with w ^ 2. This is w * w. Again it accepts an input. This time 3. Now we go to state w * 3. This is w*2 + w. Again we accept an input. This time 4. We are led to w*2 + 4. This now outputs 4 stars and we are left with w * 2 which is w + w. We accept an input 5, output 5 stars and are left with w. After a further input of 6, it outputs 6 stars and terminates. Or we could just run run2 (w ^ w) [2,3..] and get 15(=4+5+6) as output.

Now for some Set Theory. Rewrite the above sequence using the transfinite ordinal ω instead of w. The sequence becomes a sequence of ordinals. Any time we accept an input, the rightmost ω becomes a finite ordinal. So we have a descending sequence of ordinals. This is true whatever ordinal we start with. The execution of either Input or Output always strictly decreases our ordinal, and any descending sequence of ordinals must eventually terminate. Therefore every machine in M eventually terminates.

But here's the important fact: to show termination we used the ordinal ω, and this required the axiom of infinity and some Set Theory. Instead we could encode the termination question, via Godel numbering, as a proposition of Peano arithmetic. If we do this, then we hit against an amazing fact. It can't be proved using the axioms of Peano arithmetic. So we have here a useful fact, not a contrived self-referential one, that can't be proved with Peano arithmetic.

Why can't it be proved using just the Peano axioms?A few years back, Jim Apple made a post about constructing (some) countable ordinals in Haskell. His construction nicely reflects the definitions a set theorist might make, but the code doesn't actually do anything. Later I learned from Hyland and Power how you can interpret algebraic structures as computational effects. apfelmus illustrates nicely how an abstract datatype can be made to do things with the help of an interpreter. Roughly speaking, doing this is what is known as operational semantics. So I thought, why not apply this approach to the algebraic rules for defining and combining ordinals. The result are the interpreters run1 and run2 above.

run1 gives an example of a Hydra game. In fact, its precisely the hydra game described in this paper because it always chops off the rightmost head. The Kirby-Paris theorem tells us we can't prove this game terminates using just the Peano axioms. A web search on Goodstein's theorem will reveal many great articles with the details.

A well-ordered quantity that you can keep decreasing as a program runs, and that can be used to prove termination, is an example of a loop variant. Loop variants are often natural numbers but the above shows that transfinite ordinals make fine loop variants. But in the interest of being fair and balanced, here's a dissenting view. The author has a point. If you are forced to use transfinite ordinals to show your program terminates, the age of the universe will probably be but the briefest flicker compared to your program's execution. On the other hand, if you don't want an actual bound on the execution time, ordinals can provide very short proofs of termination for useful programs.

ExerciseAlgebraic structures give rise to monads. Can you see how to generalise the definition of Machine to make it a monad? If you pick the right definition then the substitution of a for Done in the definition of + should give you a particularly simple definition of ordinal addition. (See this for a hint on how substitution works in monads.)

10 comments:

Another termination 'proof' is simply that all your code, aside from the IO in run1, is (nearly) valid Agda, or easily translatable to Coq, passing their termination checkers. And it pretty clearly doesn't contain any weird fiddling designed to exploit logical inconsistencies you've discovered in their type systems. :)

You can even go beyond this simple ordinal definition. If the definition here is O_1, then you can create an O_2 with the same definition, except that the limit constructor takes an O_1 -> O_2. And then you can create an O_3, with O_2 -> O_3. I once found a short paper by someone who'd tied all this up in an inductive-recursive definition in Agda, which roughly contained all of O_N similar to the above for every inductive-recursive ordinal N he could construct, which is pretty mind boggling. Unfortunately, his stuff seems to be inaccessible now.

On an unrelated note, is the second incompleteness theorem usually considered contrived (the statement it uses is self-referential in a way)? When viewed through Curry-Howard, it becomes a rather practical concern for computer scientists, at least. Many have interest in languages that are guaranteed to be terminating, but the ability to write a metacircular interpreter is also viewed as desirable. The second incompleteness theorem roughly translates into saying that languages (satisfying its premises) cannot do both.

Dan, whereas you are perfectly correct, I'm sure you are aware that the termination of the Agda and Coq systems themselves is proved by appeal to an ordinal construction to justify the correctness of the recursive calls. It is interesting to note that in Coq we traditionally appeal to the First Uncountable Ordinal in what may be the most blatant case of mathematical overkill since Graham's Number!

Of course the presence of circular arguments should not be surprising in a discussion about foundations :)

Thanks for this article, Dan Piponi, the interpretation of ordinals in terms of recursion is fascinating. I finally get around to ponder this, but I still don't quite understand what's going on.

One thing that bugs me a lot is that Machine is not an "injective" representation of ordinal numbers! There are many machines with different operational behavior that represent the same ordinal number. For instance, we have 3 + w = w or 2*w = w as ordinal numbers, but not as machines.

The culprit is the limit constructor Input. In terms of ordinal numbers, it represents the supremum of a sequence (:: Natural -> Machine) of ordinal numbers. But many operationally different sequences have the same supremum.

Worse, since we are now underestimating the output size for some machines, this might affect the proof that every machine in M terminates. It's probably fine, but still, there's something going on I don't fully understand.

@Dan Doel, thecod

Wait a moment, it's not clear at all whether this will pass Agda's or Coq's termination checker: the second argument to the run functions is an infinite list, a coinductive thing! Mixing recursion and corecursion recklessly is known to result in nontermination.

@Heinrich The equalities you point out aren't significant from an operational perspective. For example, consider 2w vs. w. The first is like second except that each time round the first loop you perform 2 inner loops. Inputting N to the first machine takes the same time as inputting 2N to the second machine. So if we're asking questions like "does this terminate for all input N?" the answer is the same.

Well, they are significantly different from an operational perspective: after all, they behave differently for different input. (The argument that one machine can simulate the other breaks down when considering for example 2*w vs 1+2*w: one machine always prints even while the other always prints odd numbers.)

The difference does seem to cease to matter when we're not interested in the full operational behavior, only in whether the machine terminates or not. This corresponds to mapping different machines to the same ordinal number. (Read the other way round, this means ordinal numbers don't necessarily have a unique, canonical machine associated to them, which is a bit at odds with the "promise" of your post). It's this additional step that I'm confused about.

Actually the use of coinduction here is perfectly safe, as we are just destructing a coinductive element and recursing on the element of machine (no corecursive element is being built). Anywho, it would be easy to replace the infinite list by a function f:Nat->Nat, and giving an additional Nat parameter to run2.