[This is the first part of a two-post series. The second part is here.]

In previousposts on this blog, I have talked about how research on programming languages and tools can contribute to K-16 education. In this post, I will share with you some of my experiences while trying to use a PL tool — specifically, a semi-automated program verifier — in an undergraduate algorithms class. The experience was a success in many ways. A program verifier gives students a hands-on understanding of how algorithms are mathematical objects, and verification tools are now mature enough for productive classroom use. I think every CS program would benefit students by introducing them to these tools at some point. At the same time, the experience exposed me to some serious limitations of current-day program verification, and opened up a number of directions for new research in this area.

Teaching proofs with Dafny

The undergrad course that I teach at Rice is called “Reasoning about algorithms”. In many ways, the class is a standard-issue algorithms course: it teaches algorithm design techniques like dynamic programming, greedy optimization and randomization, and elementary complexity theory. But a distinctive feature of the class is its heavy emphasis on proof techniques. I want my students to develop an intuition for rigorous thinking, and learn different ways to formally reason about the correctness and efficiency of algorithms.

One thing that I have wanted to do, ever since I took on this course, is to teach proofs through appropriate PL tools. After all, reasoning about programs is one of the main preoccupations of PL research. The tools that best fit my class are systems for semi-automated program verification. These systems allow programmers to write propositions and proofs just like they would write programs. Proofs are checked for correctness just like programs would be checked for type safety.

Specifically, my class from the past spring used the Dafny program verifier (here’s a tutorial). At one level, Dafny is a programming language with imperative and first-order functional constructs. However, it differs from mainstream languages in allowing logical annotations such as preconditions and postconditions, ranking functions, and invariants for loops and recursive calls. In effect, through these annotations, the programmer embeds a purported proof of correctness within their code. Dafny verifies that these annotations form a real proof.

For example, a Dafny function to compute the n-th Fibonacci number looks like this:

Here, the “requires” clause defines a precondition on the inputs of the function. The “ensures” clause is a postcondition that is asserted when the function terminates. Dafny checks automatically that in any given level of recursion, the precondition implies the postcondition. Inductively, the annotations imply that the postconditions hold on all terminating executions.

To prove that the function always terminates, Dafny uses the “decreases” clause in the program. This annotation defines a ranking function whose value is supposed to be bounded from below and also to decrease at every recursive call. If the ranking function satisfies these conditions (Dafny checks automatically that it does), then the program must terminate eventually on all inputs.

You can also write an imperative definition of Fibonacci in Dafny, along with annotations that prove that it is equivalent to the functional version above:

Here, the “ensures” clause states that the return value x of the procedure equals Fibonacci(n). The while-loop in the program is annotated with a set of loop invariants that form an inductive proof of this claim. These invariants hold at the point when the loop is entered (the base case of the induction), are preserved by every loop iteration (the inductive case), and imply the postcondition upon exit from the loop.

Why program verifiers belong to undergrad CS

I am a big fan of Dafny, and I think every CS program should consider exposing their students to Dafny or a system like it at some point. Students who are new to proofs often struggle to understand what exactly it is that makes a proof “formal”. A program verifier offers an immediate answer to this question. Every computer science student has written code since at least her first semester in college, and is familiar with the precision that coding demands. By using a program verifier, the student gets to extend this sense of precision to proofs.

Program verifiers illustrate one of the most basic ideas in theoretical CS: that an algorithm is a formal, mathematical artifact. In particular, systems like Dafny are excellent tools for illuminating the usefulness of induction in computing. Take those problems in discrete math where you use induction to prove that a series sums up to a certain expression. A system like Dafny allows a student to code up the series as a simple program and to experiment with various induction hypotheses within the same programming environment. This experience obliterates the distinction between mathematical objects (for example a series) and programs, and that between proofs in discrete math and proofs of algorithms. Now the student can go on to understand, once again in an experiential way, that the same principles of induction lie behind proofs of more complex algorithms that manipulate recursive data structures or have complex loops or recursive calls.

There are reasons why Dafny fits my needs better than other verification tools or proof assistants. First and foremost, Dafny checks proofs fully automatically. Following the laws of Hoare logic, Dafny compiles a proof task into a verification condition, or a logical formula that is valid if and only if the user-written annotations constitute a proof. It checks the validity of this formula using an SMT solver. This means that the student can avoid tedious, low-level formula manipulation while doing proofs.

Second, Dafny’s programming language closely matches languages with which all CS students are already familiar. This significantly lowers the barrier to entry.

Third, Dafny is widely available. In particular, Dafny is available through a web interface that undergrad students are likely to find very useful. Go check it out if you haven’t already!

Limitations

As much as I am a Dafny enthusiast, I only use it in a small part of my course that concerns induction and formal proofs of program correctness. I would like to do much more. I would like to use Dafny or its cousins for teaching a wide variety of proof techniques, from complexity analysis to exchange arguments in correctness proofs of greedy algorithms to analysis of randomized algorithms using Chernoff bounds. Unfortunately, the state of the art of program verification doesn’t permit this. For instance, something as simple and mechanical as big-Oh complexity analysis of divide-and-conquer algorithms is beyond the scope of all publicly available verification tools. Also, automated program verifiers do not do well with programs that manipulate unbounded, mutable data structures (for example graphs or even arrays). In fact, in my class, I only use Dafny for programs that operate on numbers and functional data structures.

Some of these limitations can be easily fixed: for example, Rustan Leino and I have talked about extending Dafny with some forms of reasoning about complexity. A more fundamental issue is the giant abstraction gap between the level at which humans think about proofs, and their formal, machine-checkable versions. For example, a correctness proof for Dijkstra’s shortest-paths algorithm that deserves full credit in an undergrad class would fit in half a page. A machine-checked proof of Dijkstra’s algorithm is a research paper.

This is not a cause for despair; if anything, these limitations are opportunities for exciting new research. In my next post, I will explore this topic in more detail, so stay tuned.

5 Responses to Program verification in the undergraduate CS curriculum

It is good to see students getting exposed to program verification at an undergraduate level. At NUS last semester I taught 1 lecture in a course on “Logic and Formal Methods” which was based on automated verification. We used the HIP/SLEEK verifier, a web interface along with the lesson plans and exercises is available at http://loris-7.ddns.comp.nus.edu.sg/~project/TeachHIP/

This sounds like a really cool class. At Northeastern, a variety of pople have created a quite different class also focused on formal verification for undergraduates, using ACL2. This class was originally created by Carl Eastlund, Dale Vallaincourt, and Matthias Felleisen using Carl’s Dracula tool embedding ACL2 in DrRacket, and then taught for real to every first-year CS major at Northeastern by Pete Manolios and Thomas Wahl.

The course focuses on both learning first-order logic and verifying recursive programs written in the Program By Design style, which works very nicely with the heuristics for automatic verification built in to ACL2.

Sam, CS 2800 looks like an excellent way to introduce undergrads to proofs/logic. One challenge, however, is that at many schools, there isn’t enough room in the required curriculum for a course on logic or formal methods. So one question for me is how to integrate these ideas into other classes.

I am trying to do this with algorithms/discrete math, but I wonder if anyone has tried to teach an undergraduate-level compiler course where the compiler is provably correct. Another candidate is a PL course. (Here, Penn’s Software Foundations is probably the gold standard.) Security is another candidate too.

This is very good! Such verification puts the science back into computer science. I urge you to have a look at SPARK (a subset of Ada) for teaching formal verification to undergraduates. SPARK has been used to create and formally verify large real world systems such as the British air traffic control system and the Paris underground driverless trains.

I learned about Dafny from Rustan Leino at a HILT (High Integrity Language Technology) conference. SPARK has everything that Dafny has. But I find SPARK a lot more student friendly than Dafny.

Preconditions, Postconditions, loop invariants, etc are now part of Ada 2012 and can be executed at runtime (as in Eiffel). You use the SPARK proof tools to prove them without executing the code. I find the ability to combine testing of assertions with proof of assertions a real plus for teaching verification to undergraduate students. This combination is also good for verifying real world programs.

There is a gnu implementation of the latest version, SPARK 2014, coming in May. By the beginning of summer, the undergraduate level textbook “Building High Integrity Applications with SPARK” written by Peter Chapin and myself will be available from Cambridge University Press. Peter’s students wrote the software for their CubeSat in SPARK and proved it free of runtime errors. Their satellite is the only one of the group of 13 student satellites launched together that did not die from a software error. They also used SPARK to discover problems in the navigation code given to them by NASA. As a reward, they are building the first CubeSat to go to the moon.