Ideas about infrastructure are most convincing when they are deployed. In 2014, my group’s ideas about code generation were deployed in a new code generator for the Glasgow Haskell Compiler (GHC). Deployment required several years’ effort from João Dias, from Simon Marlow and Simon Peyton Jones of Microsoft Research, and from me. The new code generator’s most interesting component is a reusable, higher-order optimization library, Hoopl (Haskell 2010), which uses generalized algebraic data types to guarantee at compiler-compile time that no matter what Haskell program it is given, GHC never builds an ill-formed control-flow graph.
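The trick can be sketched in a few lines (a minimal sketch with illustrative names, not Hoopl’s actual interface): phantom types record whether each node is open or closed on entry and on exit, and only sequences whose shapes line up can be built.

    {-# LANGUAGE GADTs, EmptyDataDecls #-}
    -- A sketch of shape-indexed nodes in the style of Hoopl; the
    -- names are illustrative, not Hoopl's real API.
    data O   -- "open":   control may fall through this end
    data C   -- "closed": control enters by label or leaves by jump

    type Lbl  = String
    type Var  = String
    data Expr = Lit Int | Ref Var

    -- A node's type records its shape on entry (e) and exit (x).
    data Node e x where
      Label  :: Lbl         -> Node C O   -- begins a block
      Assign :: Var -> Expr -> Node O O   -- ordinary instruction
      Branch :: Lbl         -> Node O C   -- ends a block

    -- Concatenation demands that shapes line up.
    data Block e x where
      Single :: Node e x -> Block e x
      Cat    :: Block e O -> Block O x -> Block e x

A basic block is then a Block C C, and an ill-formed graph, say one that puts an instruction after a branch, simply does not typecheck.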

Languages and learning

People who want to learn to use programming languages effectively can try starting with an industrial language, but the sheer size of a typical industrial language and its libraries makes it hard to discover and apply the ideas that make the language worth learning. People can also consult books, but existing books primarily talk about programming languages, or about how programming languages are implemented, or they steer people to industrial languages. To make it possible for people not just to learn about great ideas in programming languages, but to build software that applies these ideas effectively, I have designed and implemented a collection of tiny programming languages for language learners. Using my languages, learners can build programs that apply some of the greatest ideas in programming languages: functions, types, and objects. The languages form the skeleton of a new book, Programming Languages: Build, Prove, and Compare, which will be published by Cambridge University Press.

While implementing the languages for the book, I encountered an interesting problem: there is a big engineering gap between simple definitional interpreters, which we write to illustrate precisely what a language is supposed to mean, and industrial-strength interpreters and compilers, which we write to run programs efficiently. To narrow this gap, I’ve investigated ways of engineering definitional interpreters to be more efficient, without making them much harder to write (PPDP 2013).
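To make the gap concrete, here is the kind of simple definitional interpreter such an investigation starts from (a minimal sketch of the genre, with illustrative names, not code from the paper):

    import qualified Data.Map as M

    data Exp   = Lit Int | Var String | Lam String Exp | App Exp Exp
    data Value = Num Int | Closure String Exp Env
    type Env   = M.Map String Value

    -- A definitional interpreter says exactly what each form means,
    -- at the cost of, e.g., a log-time lookup on every variable.
    eval :: Env -> Exp -> Value
    eval _   (Lit n)   = Num n
    eval env (Var x)   = maybe (error ("unbound " ++ x)) id (M.lookup x env)
    eval env (Lam x e) = Closure x e env
    eval env (App f a) =
      case eval env f of
        Closure x body env' -> eval (M.insert x (eval env a) env') body
        Num _               -> error "applied a non-function"

The engineering question is how much of a bytecode interpreter’s speed a program like this can recover while staying nearly this easy to write.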

Because programming languages are the medium in which software is written, they are also the medium in which beginners learn to build software. The programming language and technique taught in introductory courses are widely believed to affect learning. I have recently piloted a first course based on How to Design Programs by Felleisen et al., which uses functional programming languages and techniques. My analysis, refinements, and recommendations have just been published at the International Conference on Functional Programming (ICFP 2014).

Language applications, design, and semantics

Languages and techniques are refined, evaluated, and improved by using them on real problems. I’ve applied functional-programming techniques to computational biology: working with graduate students in the field, I showed ways in which functional programming helps solve their problems (ICFP 2012). We also identified obstacles that prevented functional programming from working as well as it could have.

Another application area, machine learning, builds on Bayesian reasoning about probabilities. But in current practice, a programmer’s Bayesian ideas are usually hand-translated into a general-purpose programming language like Matlab or C++. Such programmers might be much more productive if given a probabilistic programming language, in which Bayesian reasoning can be expressed directly. With support from DARPA, I'm working with colleagues from BAE Systems and Northeastern University on the design and semantics of probabilistic programming languages.
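A discrete probability monad gives the flavor of expressing Bayesian reasoning directly; this toy version is in the spirit of my 2002 POPL paper with Avi Pfeffer (listed below), not the paper’s code:

    newtype Dist a = Dist { runDist :: [(a, Double)] }

    instance Functor Dist where
      fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

    instance Applicative Dist where
      pure x = Dist [(x, 1)]
      Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

    instance Monad Dist where
      Dist xs >>= k = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x) ]

    uniform :: [a] -> Dist a
    uniform xs = Dist [ (x, 1 / fromIntegral (length xs)) | x <- xs ]

    -- The distribution of the sum of two dice, written directly.
    twoDice :: Dist Int
    twoDice = do d1 <- uniform [1..6]
                 d2 <- uniform [1..6]
                 return (d1 + d2)

Here twoDice is itself a distribution that downstream code can query or transform, rather than a hand-translated simulation.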

Curricular development

Over the past seven years, I have revamped a significant portion of Tufts's required undergraduate curriculum in computer science. Our students are required to take four courses that have programming assignments. As a result of my work, these courses now ask much more of our students than they did formerly, and they also offer more. In particular, we now offer many more challenging, rewarding problems of a sort that students would not think to tackle on their own.

In addition to the pilot introductory course mentioned above, I have completely redesigned the third and fourth courses in our programming sequence.

Our third course, COMP 40 (Machine Structures and Assembly-Language Programming), is most similar to a course in machine organization or systems programming. I redesigned the course to focus on two sets of skills: applied data abstraction and machine-level programming. The redesigned course, which has now been taught by three other instructors, is viewed by some as the most valuable course in our department. It is highly praised on surveys of graduating seniors, who are given the opportunity to identify just one course that exemplifies "what a truly excellent college course should be."

I replaced an older "paradigms" course with a new required course in programming languages, COMP 105, which demands that students learn to use key programming-language ideas in actual programming, and which also demands some mathematical content (e.g., operational semantics, equational proofs). The course, which uses my draft book Programming Languages: Build, Prove, and Compare, effectively serves as our fourth programming course. COMP 105 is also highly regarded by students and mentioned on senior surveys, although not quite as much as COMP 40.

Selected papers

This page shows my most significant and most recent papers.
Links are to abstracts so you can check out the topic without
downloading a monster. For a complete view, including older work, see
my publications list.

Stochastic Lambda Calculus and
Monads of Probability Distributions (with Avi Pfeffer).
Proceedings of the 29th ACM Symposium on Principles of Programming
Languages, in SIGPLAN Notices, 37(1):154–165, January 2002.
This paper explores the design of probabilistic languages in a
foundational, principled way. Its special contribution is to analyze
important implementation techniques in a way that is completely formal
and rigorously connected to the theory of probability.

Automatically Generating
Instruction Selectors Using Declarative Machine Descriptions (with João Dias).
In Proceedings of the 37th ACM Symposium on Principles of Programming
Languages, pages 403–416, January 2010.
The most beautiful results from João Dias's doctoral
dissertation: (a) if all you know is the semantics of the intermediate
code and the target instruction set, generating a code generator is
undecidable; and (b) by using a clever new heuristic search based on
algebraic laws, João can generate code generators for real machines
quickly. The core of the algorithm combines Hoare logic and unification to
find sequences of machine instructions that implement intermediate code. Will
appeal especially to those who like inference rules with their compilers.
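The matching half of the idea admits a toy rendering (this sketch is
illustrative only; the paper's algorithm also searches with algebraic
laws and reasons in Hoare logic): first-order unification matches an
intermediate-code tree against an instruction's semantic template.

    import Control.Monad (foldM)

    -- Register-transfer-like terms: variables and applied operators.
    data Term  = Var String | App String [Term] deriving (Eq, Show)
    type Subst = [(String, Term)]

    walk :: Subst -> Term -> Term
    walk s (Var x) = maybe (Var x) (walk s) (lookup x s)
    walk _ t       = t

    -- Standard first-order unification with an occurs check.
    unify :: Term -> Term -> Subst -> Maybe Subst
    unify t1 t2 s =
      case (walk s t1, walk s t2) of
        (Var x, t) -> extend x t
        (t, Var x) -> extend x t
        (App f as, App g bs)
          | f == g && length as == length bs ->
              foldM (\s' (a, b) -> unify a b s') s (zip as bs)
        _ -> Nothing
      where extend x t
              | t == Var x = Just s
              | occurs x t = Nothing
              | otherwise  = Just ((x, t) : s)
            occurs x (Var y)    = x == y
            occurs x (App _ ts) = any (occurs x) ts

Unifying a template such as App "set" [Var "d", App "add" [Var "n",
Var "m"]] against an intermediate-code tree binds the template's
variables; the hard part, which the paper addresses with heuristic
search, is finding whole sequences of such instructions.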

An Algebraic Approach to File
Synchronization (with Elöd Csirmaz).
In Proceedings of the 8th European Software Engineering Conference (ESEC)
and 9th ACM SIGSOFT Symposium on the Foundations of Software Engineering
(FSE-9), pages 175–185, September 2001.
This paper discusses how to maintain consistency among multiple
replicas of files that may be modified concurrently. We propose
reasoning about this problem by examining the algebraic structure of a
sequence of modifications.
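The starting point admits a toy rendering (illustrative, not the
paper's formalism): updates are commands over paths, and
reconciliation turns on which commands commute.

    import Data.List (isPrefixOf)

    type Path = [String]         -- e.g., ["home", "nr", "paper.tex"]
    data Cmd  = Create Path | Edit Path | Remove Path

    path :: Cmd -> Path
    path (Create p) = p
    path (Edit p)   = p
    path (Remove p) = p

    -- In this toy algebra, two commands commute when neither path is
    -- an ancestor of the other, so a reconciler may replay them on
    -- the other replica in either order.
    commutes :: Cmd -> Cmd -> Bool
    commutes c d = not (p `isPrefixOf` q) && not (q `isPrefixOf` p)
      where p = path c
            q = path d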

Recent papers

These are some recent papers not listed above. They are in
chronological order, so the most recent work is at the bottom.

Hoopl: A Modular, Reusable
Library for Dataflow Analysis and Transformation (with João Dias and Simon L.
Peyton Jones).
In Proceedings of the 3rd ACM SIGPLAN Symposium on Haskell
(Haskell 2010), September 2010.
We implement dataflow analysis and transformation in modular fashion
using advanced features of Haskell. Unlike our 2009 technical report, which
emphasizes the advantages of using Hoopl, this paper focuses on the
implementation, which has been carefully modularized to separate each of the
tricky bits from all of the others. Of greatest interest to hard-core
functional programmers and compiler writers.

Experience Report: Haskell in
Computational Biology (with Noah M. Daniels and Andrew Gallant).
In Proceedings of the Seventeenth ACM SIGPLAN International Conference on
Functional Programming (ICFP'12), pages 227–234, September
2012.
Selected as one of the best papers of ICFP 2012.
Encouragement for computational biologists to try Haskell. If you are a
functional programmer, you will see an interesting variation on lazy search.
You might also get some insight into why beginners may find it difficult to
use QuickCheck.

Engineering Definitional
Interpreters (with Jan Midtgaard and Bradford Larsen).
In Proceedings of the 15th International Symposium on Principles and
Practice of Declarative Programming (PPDP'13), pages 121–132,
September 2013.
Definitional interpreters can be simple but often don't perform very
well. Mature bytecode interpreters can perform very well but aren't simple or
easy to build. This paper recounts our extensive search for the "best simple"
variation on a definitional interpreter written in a functional language.

MRFy: Remote Homology Detection for Beta-Structural Proteins Using Markov
Random Fields and Stochastic Search (with Noah M. Daniels, Andrew
Gallant, and Lenore J. Cowen).
IEEE/ACM Transactions on Computational Biology and Bioinformatics,
12(1):4–16, 2015.
A preliminary version of this paper appeared in the Fourth ACM
International Conference on Bioinformatics, Computational Biology, and
Biomedical Informatics (ACM BCB 2013), where it was recognized as a Best
Student Paper.
Results of our work on stochastic search for protein homologies. The
search tool was written in the functional language Haskell.

On Teaching How to Design
Programs: Observations from a Newcomer.
In Proceedings of the Nineteenth ACM SIGPLAN International Conference on
Functional Programming (ICFP'14), pages 153–166, September
2014.
In Fall 2013 I taught introductory programming using the interesting
textbook How to Design Programs. If you might teach from this book,
this paper contains lots of useful information about how to prepare and teach
such a course.

Symbolic Bayesian Inference
by Lazy Partial Evaluation (with Chung-Chieh Shan).
July 2015.
To draw a probabilistic inference from an observation of measure zero,
disintegrate a continuous space into an infinite family of posterior
distributions. Then use the observation to select one member of the family.
Chung-Chieh Shan's work is brilliant and I'm proud to be associated with it.
(If you liked my work on the probability monad, this paper is a good next
step.)

The documents contained in these pages are included
to ensure timely
dissemination of scholarly and technical work on a
non-commercial basis. Copyright and all rights therein
are maintained by the authors or by other copyright
holders, notwithstanding that they have offered their
works here electronically. It is understood that all
persons copying this information will adhere to the terms
and constraints invoked by each author's copyright.
These works may not be reposted without the explicit
permission of the copyright holder.

Office hours for Fall 2018 are by appointment only until further notice.
(I am dealing with a serious family medical issue.)
To make an appointment, please post to Piazza, or send email, with
three times that work for you.

I am always interested in discussing
projects with students, and I
often have several undergraduate or master's students working with me part-time.

I have twice served ACM as a member of the SIGPLAN Executive Committee.
I served as program chair for ICFP '07.

I signed the email
charter; if you won't, I won't.
After years of relying on the kindness of strangers, I finally
started carrying a cell phone.
I no longer maintain a
hot list; this is more of a random list.
An interest in personal productivity and pointers from
Benjamin Pierce and Phil Wadler got me to Inbox Zero
on Wed 21 Feb 2007 at 6:00 PM.
After serious lossage caused by various alarums and excursions,
I recovered Zero at 6:30 PM on Mon 31 Dec 2007.
It was a pleasure to start the New Year with an empty inbox!
After starting at Tufts, I got a little behind; at the end of my
first year, my email debt was over 600 messages.
At the end of my third year, at 9:12 PM on Monday 23 May 2011,
I recovered Zero once again, but I cheated—I put
600 messages from 2009 and 2010 into an email demilitarized zone.
The next time I got to Zero was at 5:03 PM on Wednesday
29 August 2012,
again at 2:54 PM on 30 May 2014,
again at 6:34 PM on 12 November 2014,
and
again at 12:00 PM on 7 June 2017.
For a time I almost thought I detected a pattern there.