Solving a Logic Problem with Coq

I’ve been playing around with the Coq proof assistant tool over the weekend. Here’s an interesting problem and its solution proven with Coq. I am by no means an expert in this. I’ve used Coq for less than 6 hours at this point. There are a lot of nicer tutorials out there, including the one on the Coq web site. But this is an interesting motivating example, and I’m writing to share my experiences.

The puzzle I’ll solve here comes from John Pratt’s page. A lot of supposed “logic puzzle” sites aren’t really logic puzzles at all, but just plays on words; but that one is okay. Here’s the puzzle:

When asked her 3 children’s ages, Mrs. Muddled said that Alice is the youngest unless Bill is, and that if Carl isn’t the youngest then Alice is the oldest. Who is the oldest and who is the youngest?

The first step is to start up Coq. In my case I’ve generally used CoqIde, which is a simple apt-get in Ubuntu. However, for this blog post I’ll use the command-line coq-interface in hopes of more portability. Let’s tell Coq a few of the basics of the problem, such as that there’s a domain of discourse, and it has three children in it named Alice, Bill, and Carl.
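
A minimal sketch of those declarations, assuming a section named Puzzle and a domain named Child. Both names are assumptions; only Alice, Bill, Carl, Youngest, and Oldest come from the text.

```coq
(* Sketch only: the section name Puzzle and the type name Child are assumed. *)
(* When compiling a file with coqc, close the section at the end with End Puzzle. *)
Section Puzzle.

(* The domain of discourse and its three named members. *)
Variable Child : Set.
Variables Alice Bill Carl : Child.

(* Unary predicates on the domain. *)
Variable Youngest : Child -> Prop.
Variable Oldest : Child -> Prop.
```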

The last two lines declare unary predicates for the domain of discourse, indicating whether a child is the youngest or the oldest. Unfortunately, Coq doesn’t know some things that we’re assuming here. One such fact is that the three children are different from each other. So let’s tell it that.
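
A sketch of how these facts, together with the premises used in the rest of the proof, might be stated. Diff2, Superlative1, Given1, and Given2 are the names used in the steps that follow, and their statements are inferred from how they are applied there. Child, Diff1, Diff3, Superlative2, and OldestNotYoungest are assumed names, and the declarations are repeated so the block stands on its own.

```coq
Section Puzzle.

Variable Child : Set.
Variables Alice Bill Carl : Child.
Variable Youngest : Child -> Prop.
Variable Oldest : Child -> Prop.

(* The three children are pairwise different. *)
Hypothesis Diff1 : Alice <> Bill.
Hypothesis Diff2 : Bill <> Carl.
Hypothesis Diff3 : Alice <> Carl.

(* At most one child is the youngest, and at most one is the oldest. *)
Hypothesis Superlative1 :
  forall x y : Child, x <> y -> Youngest x -> ~ Youngest y.
Hypothesis Superlative2 :
  forall x y : Child, x <> y -> Oldest x -> ~ Oldest y.

(* Nobody is both the oldest and the youngest (assumed name). *)
Hypothesis OldestNotYoungest :
  forall x : Child, Oldest x -> ~ Youngest x.

(* Alice is the youngest unless Bill is. *)
Hypothesis Given1 : ~ Youngest Alice -> Youngest Bill.

(* If Carl is not the youngest, then Alice is the oldest. *)
Hypothesis Given2 : ~ Youngest Carl -> Oldest Alice.

End Puzzle.
```

In an interactive session the section would stay open while the goal is proved; it is closed here only so that the block is complete by itself.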

Now that Coq understands the premises, we need to tell it what we are proving. That requires solving the puzzle. Simple enough: Alice is the youngest unless Bill is, which means Carl can’t be the youngest. But if Carl is not the youngest, then Alice is the oldest, which leaves Bill as the youngest. So the youngest is Bill, the middle child is Carl, and the oldest is Alice. Let’s tell Coq that.

Goal Oldest Alice /\ Youngest Bill.

And Coq responds by listing the goal, and all the hypotheses we may use to prove it:

The goal is a conjunction, so we first use the split tactic to break it into two subgoals. We’re now working on proving Oldest Alice, and once we finish that, we’ll need to prove Youngest Bill. Well, there’s only one good way to conclude Oldest Alice, and that’s to show that Carl is not the youngest. So let’s prove the current subgoal by applying Given2.

apply Given2.

I’ll stop quoting the entirety of Coq’s responses, but here’s the important part:

============================
~ Youngest Carl

So how do we know that Carl is not the youngest? Well, it’s because Bill is the youngest. That’s rule Superlative1, but we’ll also have to tell Coq to use Bill as the other child.

apply Superlative1 with Bill.

And Coq responds:

============================
Bill <> Carl
subgoal 2 is:
Youngest Bill

So we need to prove that Bill is not Carl, and then that Bill is the youngest. (That Bill is the youngest is also subgoal 3 left over from splitting the conjunction, a fact that we’ll think about shortly.) For now, it’s easy to prove that Bill is not Carl, because that was one of our hypotheses.

apply Diff2.

And Coq responds:

============================
Youngest Bill

Now it remains only to prove that Bill is the youngest. The way to go about that is to apply the first given.

apply Given1.

And Coq responds:

============================
~ Youngest Alice

This is the tricky part. The reason that Alice is not the youngest is that if she were the youngest, then Carl would not be the youngest, but then the second given statement would lead us to conclude that Alice is the oldest, which is a contradiction. Proof by contradiction, though, is generally a technique valid only in classical logic, not in the constructive logic that Coq typically uses. It turns out we won’t need to resort to classical logic for this theorem, but just to make things easier in our first pass we’ll go ahead and use it. This is one statement in Coq.

Require Import Classical.

So now having classical logic at our disposal, we can go about proving that Alice is not the youngest by applying Peirce’s law.

apply Peirce.
intro H.

Note that the new hypothesis is the double-negation of “Alice is the youngest.” In classical logic, which we’re now using, that’s the same as Alice being the youngest, so we’ve essentially used Peirce’s law to start the proof by contradiction. Now we’ll go backward through the contradiction argument given above. First, Alice can’t be the youngest because she’s the oldest. She’s the oldest because Carl is not the youngest. Carl is not the youngest because we’ve assumed (to obtain the contradiction) that Alice is the youngest.

The remaining goal, Youngest Alice, follows from H by double-negation elimination, a rule from the Classical library that goes by the name NNPP. Once we apply it, the goal becomes ~ ~ Youngest Alice, which looks slightly different from the hypothesis H but means the same thing (~A is just shorthand for A -> False).

apply NNPP.
exact H.

Coq says:

============================
Youngest Bill

So we’ve proven the left half of the conjunction: that Alice is the oldest. We now need to prove the right half. But wait! We already did that, as the second step to proving the left half. For now, we’ll just copy and paste the last part of the earlier proof.
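
Assembled into one script, the whole classical proof might look like the following sketch. The names Child, Diff3, and OldestNotYoungest are assumptions, as is the exact wording of each hypothesis. The tactic steps follow the walkthrough above, and the final group of tactics repeats the proof of Youngest Bill.

```coq
Require Import Classical.

Section Puzzle.

Variable Child : Set.
Variables Alice Bill Carl : Child.
Variable Youngest : Child -> Prop.
Variable Oldest : Child -> Prop.

(* Only the hypotheses the proof uses; statements are reconstructed. *)
Hypothesis Diff2 : Bill <> Carl.
Hypothesis Diff3 : Alice <> Carl.
Hypothesis Superlative1 :
  forall x y : Child, x <> y -> Youngest x -> ~ Youngest y.
Hypothesis OldestNotYoungest :
  forall x : Child, Oldest x -> ~ Youngest x.
Hypothesis Given1 : ~ Youngest Alice -> Youngest Bill.
Hypothesis Given2 : ~ Youngest Carl -> Oldest Alice.

Goal Oldest Alice /\ Youngest Bill.
Proof.
split.
(* Left half: Oldest Alice. *)
apply Given2.
apply Superlative1 with Bill.
apply Diff2.
(* Youngest Bill, needed as a premise of Superlative1. *)
apply Given1.
apply Peirce.
intro H.
apply OldestNotYoungest.
apply Given2.
apply Superlative1 with Alice.
apply Diff3.
apply NNPP.
exact H.
(* Right half: Youngest Bill again, by the same steps. *)
apply Given1.
apply Peirce.
intro H.
apply OldestNotYoungest.
apply Given2.
apply Superlative1 with Alice.
apply Diff3.
apply NNPP.
exact H.
Qed.

End Puzzle.
```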

Ah, we’re done. But it turns out we can improve the proof a little by using a few more Coq features: specifically, by proving some lemmas (or lemmata, if you prefer), and by eliminating the use of classical logic (since this happens to be a constructively valid statement; not all classically true statements will be, though).

First, although Peirce’s law ((~P -> P) -> P) is not provable in constructive logic, we can prove (P -> ~P) -> ~P. Let’s do so, by introducing a lemma, and then by using the automatic prover.

Lemma NPeirce : forall P : Prop, (P -> ~P) -> ~P.
auto.
Qed.

In this case, the automated theorem prover worked fine. We could have proven the theorem by hand, as well (by treating the ~P as P -> False, using the intro tactic, applying the modus ponens rule, and then easily satisfying all of the hypotheses). Using this, we can simplify our proof of Youngest Bill. The process is the same, but instead of applying Peirce’s law at the beginning and the double-negation at the end, we apply our new NPeirce lemma at the beginning, and we don’t need double-negation at the end.
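
A sketch of the constructive version, with NPeirce proved by hand and the proof of Youngest Bill factored into a lemma so that it is written only once. The lemma name BillIsYoungest and the names Child, Diff3, and OldestNotYoungest are assumptions, and the hypotheses are a reconstruction.

```coq
(* Constructive counterpart of the Peirce law, proved without auto. *)
Lemma NPeirce : forall P : Prop, (P -> ~ P) -> ~ P.
Proof.
  intros P H.
  unfold not.        (* the goal ~ P becomes P -> False *)
  intro p.
  exact (H p p).     (* H p : ~ P; applying it to p again yields False *)
Qed.

Section Puzzle.

Variable Child : Set.
Variables Alice Bill Carl : Child.
Variable Youngest : Child -> Prop.
Variable Oldest : Child -> Prop.

(* Only the hypotheses the proof uses; statements are reconstructed. *)
Hypothesis Diff2 : Bill <> Carl.
Hypothesis Diff3 : Alice <> Carl.
Hypothesis Superlative1 :
  forall x y : Child, x <> y -> Youngest x -> ~ Youngest y.
Hypothesis OldestNotYoungest :
  forall x : Child, Oldest x -> ~ Youngest x.
Hypothesis Given1 : ~ Youngest Alice -> Youngest Bill.
Hypothesis Given2 : ~ Youngest Carl -> Oldest Alice.

(* Hypothetical helper lemma: the part that was copied and pasted before. *)
Lemma BillIsYoungest : Youngest Bill.
Proof.
  apply Given1.
  apply NPeirce.     (* no classical axiom involved *)
  intro H.           (* H : Youngest Alice *)
  apply OldestNotYoungest.
  apply Given2.
  apply Superlative1 with Alice.
  apply Diff3.
  exact H.           (* no double negation to remove this time *)
Qed.

Goal Oldest Alice /\ Youngest Bill.
Proof.
  split.
  apply Given2.
  apply Superlative1 with Bill.
  apply Diff2.
  exact BillIsYoungest.
  exact BillIsYoungest.
Qed.

End Puzzle.
```

Nothing in this script imports Classical, which is the point of the exercise.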

Sure, but this misses the point: it tells us a lot about the simplicity of the example rather than anything particularly compelling about either Prolog or Coq. First of all, the point of Coq is not to fully automate proof, although the firstorder tactic does everything that Prolog’s resolution does, and more. Secondly, precisely because the Calculus of Inductive Constructions is more expressive than the Horn-clause fragment of first-order logic that Prolog treats, cdsmith had to make explicit certain things that are implicit in Prolog, e.g. that the children are members of a set, that these members are distinct, that if “youngest” or “oldest” is true of one then it is not true of another, that “youngest” and “oldest” are opposites, and so on. But once these non-problem-specific things are encoded, the actual problem encoding (Given1 and Given2) comes directly from the problem description, and firstorder does indeed completely automate the proof.

But again, you should ignore most of that, because all it shows is that Coq does at least as much as Prolog. What Coq is really for is helping people prove things that can’t be automatically proven at all, e.g. the four-color map theorem, or that a compiler for a programming language correctly compiles all possible source programs into their semantically-identical respective target programs.

Sorry, yeah, I should have explained the transcript better. I’ll try to be brief. Basically, Coq is a proof assistant using the natural deduction style. Given a set of definitions such as the Variables and Hypotheses in this problem, when you say what it is that you want to prove, Coq shows you the current context above the “=========” line, and the current (sub)goal below that line. Given the context, the Coq standard libraries, and possibly third-party libraries if necessary, you apply what are called “tactics” in an effort to prove, or at least simplify, the current subgoal, and you repeat until you don’t have any subgoals left. What this process is literally doing is constructing a term in Coq’s logic, the Calculus of Inductive Constructions, piecemeal. This term is the actual proof. The stuff you, the user, are typing in is more properly called a “proof script.” So one thing cdsmith and I have shown is that there may be more than one proof script that will successfully construct a proof for a goal, and some of these scripts will be shorter than others. :-)

When cdsmith said “Goal Oldest Alice /\ Youngest Bill.” s/he was literally saying “Here’s a theorem I want to prove.” “Goal” essentially introduces anonymous theorems, so that’s where “Unnamed_thm” came from. S/he could also have said “Theorem OldestAliceAndYoungestBill : Oldest Alice /\ Youngest Bill.” and the prompt would have become “OldestAliceAndYoungestBill < ” instead. When you say you want to prove something to Coq, it goes into proof-editing mode and the prompt changes to reflect what you’re proving. This is the context in which you ask Coq to apply tactics in an attempt to simplify (away) the current (sub)goal. Coq has a lot of built-in tactics, and many, many more are available in libraries, but again, this is an extremely simple puzzle, so the built-in ones are enough. cdsmith is a newcomer to Coq, so s/he used the most basic tactics that all of the introductory materials teach you about (intro, split, apply, exact…) and some understanding of both classical and intuitionistic logic variants of Peirce’s law. What I did was to pull out one of Coq’s sledgehammer tactics, firstorder, which implements a very powerful first-order proof-search procedure, the basis of which is documented in Pierre Corbineau’s paper, First-order reasoning in the Calculus of Inductive Constructions. So the firstorder tactic does automatically what cdsmith did somewhat more manually, although it’s worth emphasizing that both cdsmith’s proof script and mine generate a term in the Calculus of Inductive Constructions that is quite a bit larger than either script. :-) You can see the term by asking Coq to “Print Unnamed_thm.” after it is defined.
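
To make the naming and the proof term concrete, here is a sketch that states the result as a named theorem and then asks Coq to print the term it stored. The hypotheses are a reconstruction (Child, Diff3, and OldestNotYoungest are assumed names). The proof is written with explicit terms rather than firstorder, so that it does not depend on how the search tactic behaves on this particular reconstruction.

```coq
Section Puzzle.

Variable Child : Set.
Variables Alice Bill Carl : Child.
Variable Youngest : Child -> Prop.
Variable Oldest : Child -> Prop.

(* Only the hypotheses the proof uses; statements are reconstructed. *)
Hypothesis Diff2 : Bill <> Carl.
Hypothesis Diff3 : Alice <> Carl.
Hypothesis Superlative1 :
  forall x y : Child, x <> y -> Youngest x -> ~ Youngest y.
Hypothesis OldestNotYoungest :
  forall x : Child, Oldest x -> ~ Youngest x.
Hypothesis Given1 : ~ Youngest Alice -> Youngest Bill.
Hypothesis Given2 : ~ Youngest Carl -> Oldest Alice.

(* A named theorem: the prompt in coqtop shows this name during the proof. *)
Theorem OldestAliceAndYoungestBill : Oldest Alice /\ Youngest Bill.
Proof.
  (* Alice is not the youngest: assuming she is leads to False. *)
  assert (NotYA : ~ Youngest Alice).
  { intro H.
    exact (OldestNotYoungest Alice
             (Given2 (Superlative1 Alice Carl Diff3 H)) H). }
  (* Hence Bill is the youngest, by the first given. *)
  assert (YB : Youngest Bill) by exact (Given1 NotYA).
  split.
  - exact (Given2 (Superlative1 Bill Carl Diff2 YB)).
  - exact YB.
Qed.

(* Show the proof term that the script above constructed. *)
Print OldestAliceAndYoungestBill.

End Puzzle.
```

Each exact step hands Coq a piece of the proof term directly, which is the same kind of object that the tactic-based scripts build behind the scenes. Replacing everything between Proof and Qed with the single tactic firstorder is the variant the comment describes; whether it succeeds on this reconstructed set of hypotheses is an assumption.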

This, by the way, is another difference between Prolog and Coq: typically in Prolog, you just want the result(s) of the resolution, and in fact one of the nice things is you get to ask “Who is oldest and who is youngest?” In Coq, you have to say what you want to prove (“I intend to prove that Alice is oldest and Bill is youngest”), but you don’t just get True/False, you also get the proof. When you’re just doing logic, this probably doesn’t matter much, but one of Coq’s most interesting applications is that you can develop software with it that is proven correct, and then literally extract the code from the proof, in any of Scheme, Haskell, or OCaml. This code is “correct by construction” with respect to its specification in Coq, and in Coq, specifications can be extremely strong, again thanks to the expressive power of the Calculus of Inductive Constructions. A nice example of this is this implementation of Finger Trees in Coq.

By the way, I hope you don’t believe I’m attempting to downplay the power of Prolog here; my point really is more about the simplicity of the puzzle. My copy of “PROLOG Programming for Artificial Intelligence,” 3rd edition, is about 6″ away from my right elbow, and I recently got ODBC working on my Mac OS X box, which means that I can play with FLORA-2’s Persistent Module package. FLORA-2, in turn, relies on the XSB Prolog system. I’m a huge fan.