Monday, March 21, 2016

Agda is not a purely functional language

One of the more surprising things I've learned as a lecturer is the
importance of telling lies to students. Basically, the real story
regarding nearly everything in computer science is diabolically
complicated, and if we try to tell the whole truth up front,
our students will get too confused to learn anything.

So what we have to do is to start by telling a simple story which is
true in broad outline, but which may be false in some
particulars. Then, when they understand the simple story, we can point
out the places where the story breaks down, and then use that to tell
a slightly less-false story, and so on and so on, until they have a
collection of models, which range from the simple-but-inaccurate to
the complex-but-accurate. Then our students can use the simplest model
they can get away with to solve their problem.

The same thing happens when teaching physics, where we start with
Newtonian mechanics, and then refine it with special relativity and
quantum mechanics, and then refine it further (in two incompatible
ways!) with general relativity and quantum field theories. The art of
problem solving is to pick the least complicated model that will solve
the problem.

But despite how essential lies are, it's always a risk that our
students discover that we are telling lies too early -- because
their understanding is fragile, they don't have a sense of the zone of
applicability of the story, and so an early failure can erode the
confidence they need to try tackling problems. Happily, though, one
advantage that computer science has over physics is that computers are
more programmable than reality: we can build artificial worlds which
actually do exactly obey the rules we lay out.

This is why my colleague Martín
Escardó says that when teaching beginners, he prefers teaching
Agda to teaching Haskell -- he has to tell fewer lies. The simple
story that we want to start with when teaching functional programming
is that data types are sets, and computer functions are mathematical
functions.

In the presence of general recursion, though, this is a lie. Martín
has done a lot of work on more accurate stories, such
as data types
are topological spaces and functions are continuous maps, but this
is not what we want to tell someone who has never written a
computer program before (unless they happen to have a PhD in
mathematics)!

Agda's termination checking means that every definable function is
total, and so in Agda it's much closer to true that types are sets
and functions are functions. But it's not quite true!

Here's a little example, as an Agda program. First, let's get the mandatory
bureaucracy out of the way -- we're defining the program name, and importing
the boolean and identity type libraries.

So now we have two definitions of the identity function f,
and g, so they must be equal, right? Let's see if we can write a proof this fact.

wat:f≡gwat=refl

Oops! We see that refl is in red -- Agda doesn't think
it's a proof. In other words, two functions which are identical except for their name are not
considered equal -- α-equivalence is not a program equivalence as far as Agda is concerned!

This is somewhat baffling, but does have an explanation.
Basically, an idea that was positively useful in
non-dependent functional languages has turned toxic in Agda.
Namely, type definitions in ML and Haskell are generative
-- two declarations of the same type will create two fresh names
which cannot be substituted for each other. For example, in Ocaml
the following two type definitions

typefoo=A|Btypebar=A|B

are not interchangeable -- Ocaml will complain if you try to use
a foo where a bar is expected:

In other words, type definitions have an effect -- they create fresh type names. Haskell alludes to this fact with the newtype keyword, and in fact that side-effect is essential to its design: the type class mechanism permits overloading by type, and the generativity of type declarations is the key feature that lets users define custom overloads for their application-specific types.

Basically, Agda inherited this semantics of definitions from Haskell and ML. Unfortunately, this creates problems for Agda, by complicating its equality story. So we can't quite say that Agda programs are sets and functions. We could say that they are sets and functions internal to the Schanuel topos, but (a) that's something we may not want to hit students with right off the bat, and (b) Agda doesn't actually include the axioms that would let you talk about this fact internally. (In fact, Andy Pitts tells me that the right formulation of syntax for nominal dependent types is still an open research question.)

11 comments:

Fascinating post! I was going to complain that whilst Agda signatures/developments can definitely be seen as formulated internally to the Schanuel topos, Agda *programs* need not be seen in this light, but then I remembered that it is possible to define a fresh datatype inside a where-clause!

For what it's worth, in the Red JonPRL proof assistant we have implemented a nominal variant of Computational Type Theory, which can be seen as taking Stuart Allen's meaning explanation for CTT and running it internally inside the Schanuel topos. The correct syntax for this type theory is quite straightforward—we essentially use the abstract binding trees from PFPL, which have nominal characteristics built-in; I've written up the categorical semantics for this, since the precise semantics for the framework were not spelled out in PFPL. As far as the type theory is concerned, do not yet exploit the nominal character of our framework in any significant way, but it will be interesting to see what can be done in the future.

One other remark I meant to say is, I am not sure I would agree that this aspect of Agda makes it "impure". Impurity, as far as I (and Longley, and most people) am concerned is when you can define an operation that does not respect equality; Agda may fail to recognize extensional equalities in many cases, but I do not believe it is possible to actually *subvert* extensional equality (i.e. observe that it fails).

I would not consider datatype generativity to be a real "effect" in the usual sense. In my first work on ML modules (POPL'13), we called it a "static effect" in the sense that it somehow only takes effect in a kind of static prepass over the program, prior to execution. In particular, you do not want to view datatype generativity as causing a dynamic, computational effect. Zhong Shao did effectively (ho, ho) just this in his ICFP'99 paper, and as a result it meant that in his type system any datatype definitions in the body of a functor forced it to be generative rather than applicative, even though there's no good reason for that. My first real contribution to science, as a matter of fact, was pointing out to my advisors that Shao's treatment of datatype generativity, which matched their own intuitions about generativity as a computational effect, did not in fact match the semantics of OCaml (or Moscow ML). For example, in OCaml, you can have a "generative" definition of datatype T in the body of a functor F, but each application of F to the same argument X will give you the same type F(X).T. As we argue in the F-ing modules paper, this is exactly as it should be because signature ascription (sealing) by itself is not a true effect. A functor should only be treated as generative if the body has true computational effects (whatever those are).

Sorry, that was kind of rambling...I'm not sure what is the takeaway for teaching about datatype generativity. Generally, I feel this is a kind of tricky point to explain. There are good "engineering" reasons in practice for datatypes to be opaque rather than transparent, and there are times where this bites you, but it is not so easy to explain the matter directly in terms of effects.

I tried to address this issue (that equality is not structural) in the definition of PiSigma which is a partial language with dependent types (hence it is not pure because of non-termination). See http://www.cs.nott.ac.uk/~psztxa/publ/pisigma-new.pdf

I think this is a laudable but still probably too high standard for what pure functional programming should mean. You seem to be implying that a language is not a pure functional programming language unless all its programs can be treated, using the features of the language, exactly as mathematical functions. So while computationally, programs in Agda behave as pure (total) mathematical functions (their outputs are always defined and always completely determined by their inputs), the reasoning facilities of Agda don't let you equate extensionally equal functions. But I think this is not a fair criticism. You can just as well define a notion of extensional equality, and then functions will be identified if they are input-output equivalent. The fact that the standard propositional equality is not extensional just means that it is not intended for this kind of reasoning. I feel your criticism would be like objecting to the use of propositional equality for reasoning about co-recursively defined values. "Hey, equality is broken for these!" No, we all know you are supposed to use bisimulation for reasoning about streams and such. So I have to say I don't accept your argument that Agda is not a pure functional language. I do appreciate that you raise the point in an interesting way.

What about saying to students that Agda is incomplete—f = g is true in at least a model, just not provable. Indeed, postulating extensional equality is known to be consistent (though it does break canonicity of course), hence Jon Sterling (I guess) is right when claiming you can't subvert extensional equality:

> Agda may fail to recognize extensional equalities in many cases, but I do not believe it is possible to actually *subvert* extensional equality (i.e. observe that it fails).

How is declaring a new data type an "effect"? At best, it's a hidden tag on the type. A piece of data that has no effect on the structure other than distinguishing it from other types which are not tagged the same way.

The issue with f not being reflexively equal to g seems like an implementation quirk to me. I wonder if they would be equal if expressed via explicit case expressions. It's really more of an artifact that case-wise pattern match definitions are not done in any principled way in Agda. Which I suppose is a fair criticism.

Instead, proving equal functions f and g from the blog would require an eta rule on datatypes. Such a rule is false on datatypes with indexes. And it's not enough on structurally recursive functions. Even just eta on sum types is pretty tricky to implement.

> The issue with f not being reflexively equal to g seems like an implementation quirk to me.

Not quite.

> I wonder if they would be equal if expressed via explicit case expressions.

You can prove f and g are equal on all inputs (see fx≡gx below), by case analysis on the input, but you can't prove f and g are equal unless you *postulate* extensional equality for functions. This is a consistent postulate, but you can't derive it in Agda or in intensional Martin-Löf type theory.

fx≡gx : ∀ x → f x ≡ g xfx≡gx true = reflfx≡gx false = refl

> It's really more of an artifact that case-wise pattern match definitions are not done in any principled way in Agda.

That seems too harsh. The view from the left explains how you could reduce pattern matching to eliminators, even though that's not implemented.

I see that if you reduced to eliminators this particular problem would disappear.Still, the category of sets and functions has extensional equality and Agda and Coq don't—two different stable sorts are extensionally equivalent functions, but not equal.