Monday, March 21, 2016

One of the more surprising things I've learned as a lecturer is the
importance of telling lies to students. Basically, the real story
regarding nearly everything in computer science is diabolically
complicated, and if we try to tell the whole truth up front,
our students will get too confused to learn anything.

So what we have to do is to start by telling a simple story which is
true in broad outline, but which may be false in some
particulars. Then, when they understand the simple story, we can point
out the places where the story breaks down, and then use that to tell
a slightly less-false story, and so on and so on, until they have a
collection of models, which range from the simple-but-inaccurate to
the complex-but-accurate. Then our students can use the simplest model
they can get away with to solve their problem.

The same thing happens when teaching physics, where we start with
Newtonian mechanics, and then refine it with special relativity and
quantum mechanics, and then refine it further (in two incompatible
ways!) with general relativity and quantum field theories. The art of
problem solving is to pick the least complicated model that will solve
the problem.

But despite how essential lies are, it's always a risk that our
students discover that we are telling lies too early -- because
their understanding is fragile, they don't have a sense of the zone of
applicability of the story, and so an early failure can erode the
confidence they need to try tackling problems. Happily, though, one
advantage that computer science has over physics is that computers are
more programmable than reality: we can build artificial worlds which
actually do exactly obey the rules we lay out.

This is why my colleague Martín
Escardó says that when teaching beginners, he prefers teaching
Agda to teaching Haskell -- he has to tell fewer lies. The simple
story that we want to start with when teaching functional programming
is that data types are sets, and computer functions are mathematical
functions.

In the presence of general recursion, though, this is a lie. Martín
has done a lot of work on more accurate stories, such
as data types
are topological spaces and functions are continuous maps, but this
is not what we want to tell someone who has never written a
computer program before (unless they happen to have a PhD in
mathematics)!

Agda's termination checking means that every definable function is
total, and so in Agda it's much closer to true that types are sets
and functions are functions. But it's not quite true!

Here's a little example, as an Agda program. First, let's get the mandatory
bureaucracy out of the way -- we're defining the program name, and importing
the boolean and identity type libraries.

So now we have two definitions of the identity function f,
and g, so they must be equal, right? Let's see if we can write a proof this fact.

wat:f≡gwat=refl

Oops! We see that refl is in red -- Agda doesn't think
it's a proof. In other words, two functions which are identical except for their name are not
considered equal -- α-equivalence is not a program equivalence as far as Agda is concerned!

This is somewhat baffling, but does have an explanation.
Basically, an idea that was positively useful in
non-dependent functional languages has turned toxic in Agda.
Namely, type definitions in ML and Haskell are generative
-- two declarations of the same type will create two fresh names
which cannot be substituted for each other. For example, in Ocaml
the following two type definitions

typefoo=A|Btypebar=A|B

are not interchangeable -- Ocaml will complain if you try to use
a foo where a bar is expected:

In other words, type definitions have an effect -- they create fresh type names. Haskell alludes to this fact with the newtype keyword, and in fact that side-effect is essential to its design: the type class mechanism permits overloading by type, and the generativity of type declarations is the key feature that lets users define custom overloads for their application-specific types.

Basically, Agda inherited this semantics of definitions from Haskell and ML. Unfortunately, this creates problems for Agda, by complicating its equality story. So we can't quite say that Agda programs are sets and functions. We could say that they are sets and functions internal to the Schanuel topos, but (a) that's something we may not want to hit students with right off the bat, and (b) Agda doesn't actually include the axioms that would let you talk about this fact internally. (In fact, Andy Pitts tells me that the right formulation of syntax for nominal dependent types is still an open research question.)

Thursday, March 17, 2016

Datalog may be considered either an unusually powerful query language or a carefully limited logic programming language. It has been applied successfully in a wide variety of problem domains thanks to its "sweet spot" combination of expressivity, optimizability, and declarativeness. However, most use-cases require extending Datalog in an application-specific manner. In this paper we define Datafun, an analogue of Datalog supporting higher-order functional programming. The key idea is to track monotonicity via types.

I've always liked domain specific languages, but have never perpetrated one before. Thanks to Michael, now I have! Even better, it's a higher-order version of Datalog, which is the language behind some of my favorite applications, such as John Whaley and Monica Lam's BDDBDDB tool for writing source code analyses.

You can find the Github repo here, as well. Michael decided to implement it in Racket, which I had not looked at closely in several years. It's quite nice how little code it took to implement everything!