Saturday, July 14, 2007

In Gödel, Escher, Bach, Hofstadter introduces the programming language Bloop. Bloop is a little like BASIC except that it forces programmers to specify in advance how many times each loop will iterate. As a result, Bloop programs have the handy property that they are guaranteed to terminate. Unfortunately this property also makes it impossible to write something like an operating system in Bloop. To write an OS we need open-ended loops that keep running until the user explicitly chooses to shut the OS down. One solution to this is to write code in Floop. Floop allows us to write unbounded loops, the equivalent of C's while (1) { ... }. The problem with that, however, is that we can write runaway infinite loops that never terminate and never give us any output. Is there some language that lies between Bloop and Floop that can give us unbounded looping when we need it, but which never allows us to hoist ourselves by our petards by writing runaway loops?

Wishing no disrepsect to OS writers, at first blush it might seem that the distinction between a runaway loop and an idle OS is too fine - if we can write an infinite loop that does something useful, then surely we can write a useless one too. But it turns out that there is a very elegant and well-principled way to distinguish between these kinds of loops, and this allows us to write open-ended interactive software in a programming language that nonetheless always produces a well-defined output, no matter what the input. In order to do this we need to distinguish between two kinds of data: data and codata. By ensuring that a function expecting codata never receives data, and vice versa, we can ensure that even programs with open-ended loops always produce a well defined output.

The concepts I want to talk about are very general and can apply to whatever programming language you use. I'm going to use some simple Haskell examples but most of these will translate to other languages. So consider something like

sum [] = 0sum (a:as) = a + sum as

This sums the elements of a list. Note how it's well behaved as long as we give it a finite list as input. On the other hand, consider

sum' [] = 0sum' a = sum' (1:a) - 1

This isn't well behaved at all. Except when you input empty lists, it never gives a result. From a mathematical perspective it's not a good definition either, there are many functions that satisfy these two properties. Is there some general principle at work here that allows us to tell immediately that one of these terminates and the other doesn't? We know from Turing that there is no procedure that guarantees we can always tell such programs apart, but in this case there is something that we can easily point to. In the first program, the right hand side of the second line uses the sum function recursively but we only apply it to a strict subpart of the input, its tail in fact. In the second example we apply sum' to something that contains the argument. The former function is using what is known as structural recursion, and it's not hard to convince yourself that structural recursion, applied to finite datastructures, always terminates.

So if we limit ourselves to structural recursion we can guarantee our programs will always terminate. But what about a definition like:

fact 0 = 1fact n = n * fact (n-1)

That doesn't appear to use structural recursion. But we can view it as such like this. Define the natural numbers as follows:

data Nat = Zero | S Nat

0 is represented as Zero, 1 is represented as S Zero and so on. We can represent every natural this way. Here's the important thing: if n>0 then n-1 is simply a subpart of n. So we can view this kind of recursion as a type of structural recursion.

(In fact, by a curious quirk of the Haskell 98 standard we can rewrite our definition to look more like a structural recursion:

fact' 0 = 1fact' (n+1) = (n+1) * fact' n

I'm guessing this feature is in Haskell precisely so that people can 'pretend' they are using structural recursion with the + in n+1 serving a role as a kind of pseudo-constructor.)

So we have a nice rule for ensuring our code terminates. But sum fails when applied to infinite lists. Should we simply rule out infinite datastructures? That seems a bit drastic. The example that convinced me to look into Haskell was

fib = 1 : 1 : zipWith (+) fib (tail fib)

We really don't want to rule out such a succinct definition of the Fibonacci numbers. But how can we allow such structures when we have functions like sum sitting around. Applying sum to fibs will obviously cause non-termination.

Let's consider another example:

sumSoFar x [] = [x]sumSoFar x (y:ys) = x : sumSoFar (x+y) ys

Like sum, this fails to terminate for an infinite input. But unlike sum, it's possible to make sense of it. If the inputs were 0 and the infinite list [1,1,1,1,...] then the result would be [0,1,2,3,...]. The program might not terminate, but from a mathematical perspective this is a completely well defined function. What's more, suppose the input list represented a stream of data being input at a keyboard, and that the output was displayed on a screen one element at a time, then we'd have a simple cash register. This program might not terminate, but it's completely well behaved. Note that this could only work in a lazy language. A strict language would want to evaluate the entire list before outputting anything. But in a lazy language we can start outputting the beginning of the list before the rest of it is computed.

From the point of view of using infinite lists, it's sum that's badly behaved, and sumSoFar that's well behaved. Again, can we point to a simple distinction between these two programs that explains this? Turns out we can, and in some sense it's dual to what we said above. sumSoFar is well behaved because when we recursively call sumSoFar on the right hand side we do so from inside a list constructor. (Remember that : is the list constructor.) This is known as guarded recursion and it guarantees that even though our programs don't terminate, they still define a unique mathematical function and result in a well behaved program. In the case of sumSoFar, each time we look at another element of the result, we trigger another lazy evaluation that terminates. But the right hand side won't just run off and compute endlessly until we do that triggering because the recursion is 'guarded' within a constructor. (That, by the way, was the main point of this article, so you can probably relax now.)

Note the duality: in structural recursion we 'deconstruct' the argument and then we're allowed to recurse. In guarded recursion we recurse first, and then we're allowed to use the constructor.

So we've almost achieved our goal of describing rules to allow open-ended loops because we've managed to give a rule for writing functions that are guaranteed to be well-behaved even though they manipulate infinite data structures. But we're not quite home yet - we still can't have functions like sum coexist with infinite lists. How can we ensure that an infinite list is never handed to sum?

Consider the definition of Haskell lists. It's something like this:

data [a] = [] | a : [a]

I.e. a list of a's is either the empty list [] or it's made from an a and a list of a's.You can think of this as an equation in [a]. In Haskell we take this as uniquely defining what [a] is, but in reality there is more than one solution to this equation. Consider the type consisting of only finite lists. That satisfies this equation. A finite list is either an empty list, or an element followed by a finite list. Similarly a possibly infinite list is either an empty list, or an element followed by a possibly infinite list. There is an ambiguity. Finite lists form, what is in some sense, the smallest possible solution to this equation. The possibly infinite lists form the largest possible solution. Haskell takes the largest possible solution.

But suppose we were to distinguish between these two different solutions. We could use the keyword data to mean the smallest solution and codata to mean the largest solution. The former represents data, and it's always finite. The latter represents what we call codata, and it's possibly infinite. And now we can refine our rules for well-behavedness. Consider data and codata to be distinct types. In a strongly typed language this immediately allows us to restrict sum to data, not codata. The rule is: you're only allowed to use structural recursion with data and guarded recursion with codata. With that rule, we're guaranteed that our recursions will always be safe, and yet that we can still have open-ended loops in our code. Sometimes these are called recursion and corecursion respectively.

And now we can go a little crazy with language. When we want to prove that a structurally recursive program terminates we use induction. This doesn't work straightforwardly for corecursion, so instead we use a principle called coinduction. Recursive programs typically terminate. Corecursive programs don't necessarily terminate, but they're still well-behaved as long as they keep on going whenever we give them input. We can call this cotermination. And so on... I'm not going to say what coinduction is because I'd have to talk about bisimulation, and this post would get way too long.

So now a mathematical aside. The reason for all the "co"s is that data and codata have categorical descriptions and they turn out to be dual to each other. But you don't hear mathematicians talking about corecursion and coinduction much. Why not? Well one of the axioms of set theory is the Axiom of Foundation. One way of interpreting this is the statement that there is no infinite sequence

...a3∈a2∈a1∈a0.

So even though we can construct infinite lists in mathematics, we can't construct 'infinitely deep' lists. This means that in mathematics we can use a form of structural recursion. In fact, the familiar principle of induction follows from this. So for many of the things that mathematicians do, induction works fine. But if we decide to use a non-standard variation of set theory where the axiom of foundation doesn't hold we can no longer use recursion. For example the Axiom of Extension says that two sets are equal if their elements are equal. This is a recursive definition, but it's useless in the presence of a set a such that a∈a. At this point mathematicians need a principle of coinduction. And for more on that, I refer you to Vicious Circles.

Oh...time for a quick rant. Over the years I've seen a few people argue that there's something fundamentally wrong with the notion of the algorithm because it doesn't apply to the kind of open-ended loop we see in operating systems and interactive applications. Some have even gone further to suggest that somehow mathematics and computer science are fundamentally different because mathematics can't seek to describe these kinds of open-ended phenomena. As I've tried to show above, not only are there nice ways to look at open-ended computations, but from a mathematical perspective they are precisely dual, in the category theoretical sense, to terminating computations. It may be true that mathematicians sometimes spend more time with things, and computer scientists with cothings. But this really isn't such a big difference and the same langiage can be used to talk about both.

I learnt all this from a variety of sources including Turner's paper on Total Functional Programming. My original motivation for trying to understand this stuff came from this post on Ars Mathematica. Curiously, that post had a significant effect on me. It made me realise there was this entire co-universe out there that I knew nothing about, and I was forced to make a real effort with learning Haskell because Haskell was the programming language closest to the mathematical notation used to talk about this co-universe. (Thanks Walt, but sorry to everyone who wants to see less Haskell here!)

I didn't really get why anyone would want to distinguish between data and codata until I saw some clues in some slides by Altenkirch and some comments by Conor McBride. I'm still not sure anyone says it quite as clearly as this: distinguishing between data and codata means we can allow the coexistence of infinite lists, structural recursion and open-ended loops, without risk of causing bad behaviour.

Is Haskell's type system powerful enough that (with a rewritten Prelude etc, but without changes to the core syntax) it could be be made to distinguish data and codata, and force coders to use them in correct ways?

Having a compiler distinguish between data and codata is straightforward. I think it's probably easy to have compilers recognise structural and guarded recursion - but I'm not sure as I don't know what a full set of rules would look like. For example I don't know what happens when you start trying to "doubly recurse" over both data and codata. There's probably a paper on this somewhere.

There is something that is not yet clear to me : in the wikibook about the denotational semantic of Haskell it is written that the infinite list is a least upper bound. So, it is not the largest solution.

So, it looks like that due to lazy evaluation there is something special in the category of Haskell program and that the coalgebras are also algebras. But, I am not sure of it.

Essentially structural and guarded recursion are the same as cata- and ana-mophisms which are the same as folds and unfolds in F-algebras and F-coalgebras. But there may be some more details needed to make this precise.

My understanding is probably wrong, but I may not be the only one, so I'll just ask: the LFP and GFP of Haskell's initial algebra data types is (I thought) different from what you're describing:

I.e. a list of a's is either the empty list [] or it's made from an a and a list of a's.You can think of this as an equation in [a]. In Haskell we take this as uniquely defining what [a] is, but in reality there is more than one solution to this equation. Consider the type consisting of only finite lists. That satisfies this equation. A finite list is either an empty list, or an element followed by a finite list. Similarly a possibly infinite list is either an empty list, or an element followed by a possibly infinite list. There is an ambiguity. Finite lists form, what is in some sense, the smallest possible solution to this equation [i.e. least fixed point solution? --DDW]. The possibly infinite lists form the largest possible solution [i.e. greatest fixed point solution? --DDW]. Haskell takes the largest possible solution [really? --DDW].

I thought the LFP (returned by fix or induced by Fix) was not a finite list but a possibly finite list, and that the GFP was not a possibly infinite list but a truly infinite list (aka stream).

According to my (apparently wrong) understanding, the type PFL (short for Possibly FiniteList)

> data PFL a = Nil | Cons a (PFL a)

(as the least fixed point solution that Haskell generates) is inhabited by both

> x, y :: PFL Int> x = Cons 3 Nil> y = Cons 3 y

and that you'd have to ditch the Nil constructor to force the GFP solution using the implicit LFP meaning that Haskell imposes on data.

If I've been thinking about this wrong all this time, I'm going to kick myself.

Anyway, I definitely mean finite, and possibly infinite for the least and greatest fixed point respectively. As Haskell has no means of checking for infinite datatypes, and no way to stop you building them, it always builds codata.

In your example, PFL could equally be a data or codata types. Both x and y could be codata. But only x could also be data.

I'm not sure exactly what point you're stuck on so do ask further questions if needed.

OK, I think I see where I went wrong. According to Meijer et al, "Functional Programming with Bananas, Lenses, Envelopes, and Barbed Wire", p. 2:

"Working in the category SET...means that finite data types (defined as initial algebras) and infinite data types (defined as final co-algebras) constitute two different worlds....Working in CPO has the advantage that the carriers of initial algebras and final co-algebras coincide, thus there is a single data type that comprises both finite and infinite elements. The price to be paid however is that partiality of both functions and values becomes unavoidable."

So Haskell does not distinguish between data and codata. I guess a summary of your post is that it could have, the design choice of types as objects of CPO instead of SET not being a predestined design choice, and that there is use for a language with types in SET.

I guess where I went wrong was that everywhere you read about the initial F-algebra that Haskell types use, but (almost) nowhere do they come out and say that these are simultaneously final F-coalgebras. The role of "undefined" in Haskell (that it is not necessary, but does enable finite and infinite to cohabit) seems pretty important to be kept such a secret.

Is this very topic why you named your blog sigfpe, because embracing the undefined allows you access to the "Neighborhood of Infinity"?

I read some of the banana paper about a 18 months ago. Most of it made perfect sense but I hadn't the faintest clue what it was talking about in that sentence you quoted. It's funny returning to that paragraph now, after having made sense of the data/codata distinction, and seeing that it makes sense after all.

"there is use for a language with types in SET"

That would be a total functional language, ie. one where a 'function' has its proper set-theoretical mathematical meaning and has a sensible value for every element of its domain. But it wouldn't be Turing complete so that'd be a big design decision.

As for the name sigfpe. Years ago I was jealous of people who owned domains whose names were the names of Unix signals. So I did some searching and found that sigfpe.com and grabbed it. Having said that, I can make up all kinds of post hoc rationalisations about why it's appropriate, including some along the lines you suggest. And one very down-to-earth explanation is simply that when I was working at Cinesite I was the guy who quadrupled the speed of large amounts of code simply by noticing that under Irix, the CPU was spending vastly more time servicing SIGFPE exceptions than anything else. It seemed to fit in a bunch of ways, so I kept it.

Wow!!! This article (and reading the article about Total FP) gave me another idea:

That even the data and codata are exactly the same thing, not just duals of each other, because we can infer equally on both. The truth is that no inference of one can be proven valid without making some inference on the other.

As a consequence any axiomatic in a datasystem cannot be said valid aslong as we have not defined their associated coaxioms, and a system (like any programming language, should it be functional or imperative) cannot be complete as long as this is not true.

If the Turing machine is complete and can be defined using finite states and finite sets of inference rules, it can only exist because it is associated with a comachine. If you look closely at the Turing machine, it is complete not because of these finite states or finite inference rules, but because of the existence of the infinite band on which it operates and retreives/stores its data and instructions. The fact that the Turing complete system uses a simple infinite band (i.e. an infinite list) as its internal comachine is the reason why it works, but even this comachine operates with finite rules, using the infinite processing capabilities of the machine operating it (despite this machine has finite states and finite rules).

This consideration extends to the type system itself: this makes me think that type instances and types are no different, or equivalently, that elements and ur-elements in a set theory are also not different: neither of them can exist without the existence of the other.

Take for example the ur-element 'x' in a set theory, its dual is the infinite set of sets that can contain it. The concept of inclusion of sets cannot live without the dual concept of membership for ur-elements.

Why not unifying all this as a required condition for validity? Suppose that an algorithm can't be proven valid, or a predicate can't be proven to be true or false in a finite-time using a Turing machine, why dowe need to say they are undecidable?

For me, we don't need this, and what we really want is a complete system where inference is still possible. We can just build the complete system by stating that "undecidable" maps to "true" (and also its dual system where undecidable maps to "false"). So the twosystems coexist, work within the same complete universe where both "true" and "false" live simultaneously. We don"t need to artificially split the universe into a real universe made of finite elements and its dual couniverse made of infinite elements, because neither of these two can exist isolately.

Let's think about the type-system as a whole, a simple finite element like 0 (or 2), termed "ur-element" in set theories, does not exist without its coelement: the list of sets where it is a member is infinite, each of these sets being in fact a "type".

What we are infering with typesystems with notations like <e,t> is too restrictive, because the type "t" has been restricted into a first-order language.We'd like to be able to replace *freely* either e (the ur-element) or t (its type) by assigning them any other type u.

For example: allowing inferences on <e,<t,u>> instead of just <e,t>, for any "meta-type" u, and then allowing u as a possible member of our universe. And for the system to be complete, all "meta-"levels (or higher order levels) are part of their effective co-type.

<e,t> is then just a restriction of the type of e,and if you want it to be complete (per the Turing analogy) you must also admit that e is just one finite abstraction of an infinite set of <e,t> which is also just a particular realization of <e,<t,u>>, as well as <e,<t,<u,v>>> and so on...To make the bracketed notation easier to handle, why do we need to restrict to the pair only form, when we could use the concept of possibibly infinite tuples?

In this case, the ur-element 'e' just needs to become a productive dual synonym to the infinite set of elements:{ <e>, <e,t>, <e,t,u>, <e,t,u,v>... }where each member of this sets just belongs to a n-order language, where n is the size of the tuple.

In the complete system, e exists only because of this dual set that is also a member of the same complete system. Let's not differenciate them, because when viewed from the Universe, they are of the same "type" (i.e. any inference on them can be made equally and proven at every level of language, the n-order language being just a projection, i.e. a restriction of the universe, exactly like the ur-elements that are part of this universe). In this case, the functions operating on elements in the universe are also members of this complete Universe.

Currying is also a partial view of what functions are really describing, and thinking about them with just this model is necessarily incomplete (meaning that prooving their validity becomes impossible with this restriction)

So what do we need? just an infinite set of valid recursion rules, where all languages and meta-languages can be defined with simple finite sets of inference rules.

This is possible! Because the infinite set of inference rules cannot exist without its dual set of finite rules defining the language in which the inference rules are specified for each n-order language! We don't need to prove that these rules are valid (in fact we can't because we would return to the dual reasoning system). So let's choose it ARBITRARILY in a way that works for our common logic.

So let's just build our complete system by a finite set of rules governing the inference rules used in each n-level machine. And then let's say that a assertion in level 'n' is prooven if it can be proven that its dual assertion in the type system at level (n+1) is infinite and undecidable. To prove that the dual assertion at level (n+1) is undecidable, we then just need to prove that it is decidable at the level (n+2). This will not prove anything for the level (n) but it will make it *productive*.

In other words, we won't be able to prove every theorem in a complete system if we just think in terms of data and codata (or ur-elements and their types which are only particular sets in which these ur-elements are members).

Let's just introduce cocodata (reasoning at level n+2) and say that to prove something at level n (starting at level 0 for ur-elements like simple constants) we can arbitrarily work at level 2,or 4...

It immediately appears the even/odd structure of this model, which is equivalent to the left/right arbitrary distinction in the <e,t> notation used in strong-type systems.

As a consequence, any strong type system limited to only one level of abstration of types will fail to be complete, so that it will be impossible to prove anything, but if we admit the existence of aninfinite number of meta-languages, then what is a type (or codata) at level n is just an ur-element (or data) at level (n+1) where it also has its own type.

So any replacement of <e,t> by <e,<t,u>>, rewritten <e,t,u> where u is an arbitrary type becomes valid. Then if we can safely ignore t in the triplet <e,t,u> and just infer on <e,,u> we can prove many things. Let's make a language where 'e' (ur-elements or data) and 'u' (meta-types) are the sameand can be part of the same valid sets.

The only problem for us is to define a minimum set of rules for allowing this such that it does not violates our initial axioms at level 0.

To do that, we must be able to define more strictly what are the axioms of our level 0, i.e. what are its primitives, for the system to be productive. To be productive, means that it canbe implemented in a finite-state Turing machine (without needing to represent what its I/O band works which is another blackbox just working based on contracts on what each of its storage cell can do with our limited Turing engine).

The basic Turing machine operates on a band that just has black or white cells: it jut uses this bit of information from the band and a finite number of bits in its internal state to determine a limited numner of actions that will change zero of more of its internal bit states, or modify the bit of information in the I/O band or move the band. This can be modelized by the simple concept of lists (for the band), and two ur-elements, each one being the codata of the other.

So the only thing that we need is a set of inferences between an empty list (nil) and non-empty lists, given distinct types. nil is of type Nil, non-emty lists means the type Colist.

If 0 is given type Bit, and 1 is dual to 0, then 1 is also of the same nature as types like Bit, a complete Bit type needs to include it, when excluding 0 as its meaning. This can be made by partitioning arbitrarily like this:

and then saying that the "bit" is complete only through this dualityThen in dual formulas we'll have pairs like:<0,Bit> or <1,Bit>whiche can be freely replaced by inferences of:<0,Bit,list of Bit> or <1,Bit,list of Bit>and then the medial elements in those tuples removed or changed arbitrarily like:<0,Nat,non-empty infinite list of Bit> or <1,Nat,non-empty infinite list of Bit>

(the third term of these pairs are blackboxes, like the Turing I/O band, it just models a band whose current position is reading as 0 or 1 depending on which of the triplet we speak about.)

The interesting thing is that "Nat" in the triplet is freely replacable ccording to the properties of the band, but not according to what it actually contains. This is analogous to the infinite numberof ways to represent a single bit on the band as a list of bits of arbitrary values or lengths (even if the length on the I/O band is finite, what is stored ater it does not matter and we can as well extend the band indefinitely by padding the same arbitrary default bit on it, so that we can make inference without knowing what is actually wrriten on those places of the Turing I/O band.

Now we need a language where we can assign static properties to the relations that link co-codata to the associated data, so that they respect the axioms on data and their associated inference rules. To check that these relations are valid, we can use normal inference because every thing is in a finite state, there's a finite number of rules and a finite number of axioms for the root level 0. If we can prove it, then we have proven that the inference rules used at level 1 are productive, even if they are undecidable, but the system will remain productive and usable, despite of the quite arbitrary choice of rules for thinking at level 2.

What do you think about this idea? Why not thinking about a language that permits such arbitrary definition based on productivity rather than decidability? Shouldn't it have the same power and completeness as a Turing machine?

For now Haskell just works on data and codata in a limited way, through its strong type engine, using currying as its only inference rule for handling the distinction between data and codata, but not permitting the enforcement of type rules between the the first parameter and the third in a function. Is we add such rule enforcement, then all we need to prove something with it, is to be able to rewrite our algorithms with such constraint.

So we need a subset of Haskell where no provable function has a type signature with less than 4 types, the 1st one being constrained by the 3rd, and the second one being constrained by the 4th, and a production rule that allows transforming 2 parameter functions into 4-parameter functions (more powerful than just currying which gives no constraints).

Example:the type signature would become verifiable:a->b->c->d provided that there exists an fixed inference/production rule between a and (a->b), and between c and (c->d), and this 4-parameter function, if it is provable computationally, would become a proof that the same function operating on a->c (totally curried as it takes one parameter and returns 1) is equally valid and provenby demonstating it in a constrained typesystem where all parameters are in the form a->b where b is the dual of a, except the last one, the function returning always an odd number of inputs and returning an odd number of values, the even parameters being always in the dual (necessarily infinite) space of the preceding parameter.

In addition, it would deprecate the separate specification of types and input/output variables: they would become part of the same expression.

which can be proven by currying it, or by replacing types by infinite lists and remapping for example the inference rules such as "recursion on infinite data tails" in classic programming, by "type-inference on finite type heads" under the new representation (but still having the possibility to recurse on tail)

The additional inference rules (in the stricter type-system of the language) enforced during the generalization is the type compatibility of all variables at even position, and the type compatibility of all variables at odd positions, so that it does not break the dualities needed for completeness, plus normal inference rules for proving the currified subfunctions with odd numberof parameters) by reducing them to axioms of the generalized but stricter type system.

Interesting (re)reading. Once I attempted (kind of succeeded) to implement ZFC in Java; this kind of reasoning would make my attempt more succinct, I believe.

Are you intentionally hiding the notions of algebra and coalgebra under more innocent notions of data and codata? This way it must be more digestable by the public, but maybe it's time (okay, it's 2011 now) to get out of closed with our functors, natural transformations, adjoints, monads etc?

I don't see program termination as all that important. In an operating system, you want an event-based system that handles interface signals, updates internal data, and reflects the changes externally. The same goes for many programs that run in the OS. The only time something should halt is if a program is designed to perform exactly one calculation and end, returning the result, or if an event-driven program is sent a signal to stop.