Tuesday October 16 is an excellent day for a lot of truly excellent
reasons. The one that concerns us here is that it was chosen as Ada
Lovelace Day, whose purpose is to celebrate important women in one's
scientific field.

(A recent ESOP submission has left me enthusiastic about exotic time
zones. I'm aiming to publish this article on October 16 before 23:59
GMT-11, which might look like the following day to some readers,
including those living in Samoa,
which changed time zones in December 2011. Oh well.)

As far as I have witnessed, the Gallium-related fields in computer
science research (programming language design, compilation, type
systems...) share the bane of most other computer science domains (or
software development communities) in having a very low ratio of women
at all levels.

Despite this imbalance, we actually have quite a few women who made
very important advances. In the process of writing this post,
I gathered a rather long list of names of people I would like to
write about. Those are mostly researchers from the second half of the
20th century, but not only: Emmy
Noether was suggested to
me by Xavier, as the well-founded induction techniques she developed
are quite useful when writing formal developments in a proof assistant
such as Coq -- and of course her leading role in the movement of
formalization of mathematics through abstract algebra had wide-ranging
consequences that can definitely be felt, indirectly, in the
development of programming languages and systems.

I have finally decided to avoid enumerations and rather attempt to
write something interesting on just a few women. This also has the
advantage of leaving ample material for the next years, even if there
will be new names to add to the list by then.

Barbara Liskov

Abstract data types

The idea of abstract data
types and
encapsulation seems so natural now that it is difficult to imagine
that it wasn't around before the beginning of the seventies. A lot of
people were doing similar things at that time, but Barbara Liskov
proposed a lucid notion of abstract datatypes, whose internal
implementation choices were hidden from outside user modules. She then
integrated it in 1974 into a new language design,
CLU,
which proved very influential for many other aspects as well: garbage
collection, iterators, multiple value return, exception handling... If
you want to learn more about the genesis of abstract types and CLU at
large, you should read A History of
CLU
(PDF, 31 pages) that she wrote later, in 1992.

Abstract types were named "clusters", and this is where the CLU name
comes from. Let's show the Wikipedia example:
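
It goes roughly as follows (operation bodies elided, as in the
Wikipedia article):

```
complex_number = cluster is add, subtract, multiply

    rep = record [ real_part: real, imag_part: real ]

    add = proc ... end add;
    subtract = proc ... end subtract;
    multiply = proc ... end multiply;

end complex_number;
```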

This cluster exports the type complex_number along with operations
add, subtract and multiply. This is all the user of the cluster
ever sees. Internally, the type is given a representation (a type
definition rep = ...) and the operations are implemented in terms of
this representation, but this is encapsulated and doesn't leak to the
outside.

Barbara Liskov also played an influential role in the development of
object-oriented techniques (and languages for distributed computing),
but CLU was not object-oriented. Some aspects of the design were
influenced by Simula (1967), but the "History of CLU" paper gives
a careful and detailed explanation of the difference between the
"abstract data-type" and the "object-oriented" (in the spirit of
Simula and Smalltalk) design styles, which would later be expanded
upon and contrasted -- see William Cook's On Understanding Data
Abstraction,
Revisited.

Abstract types are a very important concept in ML programming
languages. In 1988, Mitchell and Plotkin proposed to understand such
type abstraction as a use of existential types (Abstract Types Have
Existential
Type),
but the ML module system actually takes a relatively different
approach to type abstraction. Several authors have tried to bridge the
gap, including in particular our very own Benoît Montagu in his 2009
article with Didier Rémy, Modeling Abstract Types in Modules with
Open Existential
Types.
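
The same flavour of encapsulation is idiomatic in ML today. As a small
illustrative sketch (the Counter module and its operations are
invented for the example), a signature ascription hides the
representation of a type:

```ocaml
(* The signature leaves the type t abstract: clients see the
   operations, but not the fact that t is represented as an int. *)
module Counter : sig
  type t
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end = struct
  type t = int          (* the hidden representation *)
  let zero = 0
  let incr n = n + 1
  let to_int n = n
end

let () = print_int (Counter.to_int (Counter.incr Counter.zero))
(* prints 1 *)
```

Outside the module, `Counter.t` can only be manipulated through the
exported operations, exactly as with a CLU cluster.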

Liskov Substitution Principle

The Liskov Substitution Principle
(LSP)
stems from a 1994 paper that Barbara Liskov wrote with Jeannette
Wing, A Behavioral
Notion of
Subtyping. It
gives the right definition of what subtyping means: a type S is
a subtype of a type T if all values of type S behave like values
of type T. From this vision you can derive most properties of
subtyping relations (in particular contravariance in
input parameters), but those were already well-understood in 1994;
Liskov and Wing made the notion of "behave like" formal, by saying
that anything that could be proved about values of type T should
also be provable on values of type S.

The rest of the paper is interesting (they provide limited extension
techniques that guarantee that this strong semantic requirement
is met), but what made history was how this principle got a name, LSP,
and was widely diffused to practitioners of object-oriented
programming. It is now part of all methodology advice on programming.
Mainstream object-oriented languages went for richer notions of
extension than those that are clearly correct, and systems mixing
programs and proofs to statically ensure substitutability are still at
the research stage, but everyone should know by now that the derived
types that programmers implement must respect the LSP -- it could be
argued that the preferred extension method of object-oriented
languages, implementation inheritance, makes it particularly easy to
violate this invariant by mistake, so talking about it out loud is
extremely useful.

Christine Paulin-Mohring

I had the luck to have Christine Paulin-Mohring as a teacher at the
MPRI (if you are
looking for a research-oriented graduate curriculum in computer
science, consider coming to Paris for the MPRI). Christine played an
instrumental role in the development of the
Coq proof assistant, including
leading its development. Among her most visible contributions are her
work on program extraction out of Coq and on inductive definitions.

If you already know a bit of Coq, you should consider reading the
Credits
page
of the Coq documentation, which is essentially a changelog of
important Coq versions since Coq 5.10 in 1995 -- it really is surprisingly
interesting. Christine Paulin-Mohring is directly cited in most of
those release notes, and wrote several of them.

Program extraction allows one to take a Coq file and output a program
in a functional programming language such as OCaml, Haskell or
Scheme. A Coq file actually is a mix of programs (or proofs) written
in a style that resembles a strongly typed functional programming
language, and tactics that are evil little metaprogramming scripts
that build proofs (or programs) and are composed interactively by
the user, with direct feedback on the desired type of the terms to
build and the available environment. The job of extraction is to take
this happy mix and produce a Haskell, OCaml or Scheme program that can
be compiled down to something efficient -- in particular, it strives
to remove computations that are not really needed during
the program execution but correspond to static correctness proofs
about the program.
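
As a small sketch of the workflow (assuming a recent Coq, where the
extraction plugin is loaded with Require Import Extraction):

```coq
Require Import Extraction.

(* An ordinary Coq function on Peano naturals. *)
Definition double (n : nat) : nat := n + n.

Extraction Language OCaml.

(* Prints the extracted OCaml code for double,
   together with the nat type it depends on. *)
Recursive Extraction double.
```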

In Coq, this separation between what's needed at runtime and what can
be "erased" is done by a strict typing discipline: some types live in
Prop, the universe of logical propositions (this means that they
themselves have type Prop), and their values can always be erased,
and the rest, outside Prop, is kept. Note that this does not depend
on how the terms are written (programs or tactics); if you're
adventurous, you may use tactics as a meta-programming tool to write
honest programs executed at runtime, or write proofs by directly
writing the corresponding proof terms as ML-like programs.

Prop has specific typing rules (it is called impredicative)
designed to make it convenient to use for mathematical proofs, and the
type-checking rules have been designed to allow extraction. In
particular, if an inductive type in Prop has non-erasable parameters
or at least two constructors, you may not pattern-match on it in an
expression returning a non-erasable value. The following example
fails to type-check with a rather self-descriptive error:
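
Here is such an example, reconstructed to match the error message
below (the constructor names Left and Right are chosen for
illustration):

```coq
Inductive Choice : Prop :=
  | Left : Choice
  | Right : Choice.

(* Rejected: this matches on a proposition with two constructors
   to compute a runtime value of type nat. *)
Definition zero_or_one (choice : Choice) : nat :=
  match choice with
  | Left => 0
  | Right => 1
  end.
```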

Error:
Incorrect elimination of "choice" in the inductive type "Choice":
the return type has sort "Set" while it should be "Prop".
Elimination of an inductive object of sort Prop
is not allowed on a predicate in sort Set
because proofs can be eliminated only to build proofs.

Indeed, the parameter choice, having type Choice, will be erased
during extraction. Yet the return type of zero_or_one is a concrete
natural number; it is impossible to compute this result if the
parameter choice is not passed at runtime. One can fix this by
moving the type of Choice out of Prop, or the result type inside
Prop. When writing certified programs these choices are usually
rather simple to make, as it is clear from the whole program
architecture which parts are "program" and which are "proofs about the
program": the program definition should not depend on its own
correctness proof. Of course in some cases you would wish for a more
flexible distinction, but that's for another discussion.
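
For instance, moving Choice into Set (keeping the illustrative
constructor names Left and Right) makes the definition go through:

```coq
Inductive Choice : Set :=
  | Left : Choice
  | Right : Choice.

(* Accepted: Choice is no longer an erasable proposition. *)
Definition zero_or_one (choice : Choice) : nat :=
  match choice with
  | Left => 0
  | Right => 1
  end.
```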

While this whole description is a bit fuzzy on the details and makes
some simplifications, I took care to point out that the limitation on
pattern-matching of propositions to return runtime values held for
inductive types of at least two constructors (or using
non-erasable parameters). Indeed, if a type has only one constructor,
whose parameters are all themselves erasable, matching on it does
not actually influence the runtime execution of the program (such
a match is called "non-informative") and may be used for its type-checking
virtues. The reason I develop this example here is that it interleaves
both topics of inductive definitions and program extraction.

The one-constructor case is extremely useful. That is how, for
example, equalities between terms can be used to type-check
a non-erasable program: they have only one constructor,
reflexivity. This is also the key to the use of accessibility
predicates (a constructive remnant of noetherian, or
well-founded, induction) to write programs with subtle termination
proofs. Given a relation R on a type A, here is the type that
claims that R is well-founded:
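
This is the Acc ("accessible") predicate of the Coq standard library;
slightly reformatted, its definition reads:

```coq
(* An element x is accessible for R when all its R-predecessors are. *)
Inductive Acc {A : Type} (R : A -> A -> Prop) (x : A) : Prop :=
  | Acc_intro : (forall y : A, R y x -> Acc R y) -> Acc R x.

(* R is well-founded when every element is accessible. *)
Definition well_founded {A : Type} (R : A -> A -> Prop) : Prop :=
  forall a : A, Acc R a.
```

Note that Acc has exactly one constructor, whose argument is itself
a proof, hence erasable: matching on an accessibility predicate is
non-informative.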

When making a recursive call whose parameters are not obviously
decreasing, one may instead use such an accessibility predicate to
show only that the recursive arguments are related to (roughly:
smaller than) the arguments of the current call. This allows the user
to choose the termination argument (encoded in the relation R) in
a very general way.

The other major use of this rule is the case of an inductive type
having no constructor at all. This is the proposition False, whose
definition can be asked of the Coq system (here using the coqtop
interactive toplevel):

Coq < Print False.
Inductive False : Prop :=

That's it: False has no constructor. This means that when you have
a term of type False (a proof of contradiction), you may say "I'm
done, this case never happens" through an empty pattern matching:
match contradiction with end. This pattern-matching works at any
target type A (the outer function you're defining tries to produce
an A), as all cases of the match are well-typed at type A. And it
works inside proofs, but also to build terms, despite False being
a proposition, as it has fewer than two constructors.
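
A minimal sketch of such a definition (the name ex_falso is chosen for
the example):

```coq
(* From a proof of False we can build a value of any type,
   using a pattern matching with no branches. *)
Definition ex_falso (A : Type) (contradiction : False) : A :=
  match contradiction with end.
```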

Christine Paulin-Mohring worked on all of this, and later taught it to
me and numerous other students.