Brian
Friel’s play Translations, set
in 19th-century Ireland, is about using languages to communicate
and control. As you might expect, the relation between Irish and
English figures prominently in the play’s plot. It is only fitting
that Oleg and I are
writing this post in Dublin, at the European Summer School in
Logic, Language and Information, because what we want to show
you is a nice perspective we learned here last week that relates
multiple ways to interpret natural and programming languages. In
honor of multilingualism, you can download
this post as both a literate Haskell program and an OCaml
program. To make sense of this post, you only need to know either
Haskell or OCaml.

{-# OPTIONS -fglasgow-exts -fno-monomorphism-restriction #-}

A programming-language researcher and a (natural-language)
linguist both aim to relate form to meaning. Usually, there are an
infinite number of forms and meanings, so it is impossible to
exhaust them in a brute-force list. Instead, it is natural to
assume a finite set of building blocks for expressions, then
describe systematically the form and meaning of all expressions by
specifying the form and meaning of each building block and how they
compose. Formally, we define an abstract data type of expressions:
the constructors are the building blocks and the observers are a
function from expressions to forms and a function from expressions
to meanings.

Form ← Expression → Meaning

Linguistics often labels this picture as

Sound ← Syntax → Meaning

or as

Phenogrammatics ← Tectogrammatics → Semantics

whereas computer science usually labels this picture as

Concrete syntax ← Abstract syntax → Semantics

so the left arrow is a pretty printer (whose inverse is a
parser) and the right arrow is an evaluator or compiler (whose
inverse is a decompiler).

Of course, we want to restrict ourselves to well-formed
expressions only, using what some linguists call categories and
most computer scientists call types. We also want to restrict the
two arrows to produce well-formed forms and well-formed meanings
only. In other words, the building blocks should preserve
well-formedness.

One way to implement this picture formally is to represent an
expression as a data structure, perhaps a tree of building blocks.
But then, to enforce well-formedness, it would seem that we need a
family of tree types that call for advanced type-system features
such as generalized algebraic
data types or dependent types. Instead, we will represent an
expression as a program that invokes constructors of
abstract data types. The latter data types are abstract in
that they may be either form or meaning. In other words, we will
represent an expression as a program that is parametrically
polymorphic in types that are either form or meaning.

A linguistic example

In natural language, two fundamental types are e,
for entities, and t, for propositions. Our abstract
syntax includes two entities john and
mary, two quantifiers everyone and
someone, and a transitive verb saw. In
Haskell, we define a type class that relates two types
e and t. We call this type class
Symantics because its interface defines syntax and its
instances define semantics.

In OCaml (or more generally ML), we define a module signature
Symantics that includes two types e and
t. For example, the function saw maps an
object entity and a subject entity (in that order) to a
proposition.

Without specifying either form or meaning, we can already
construct expressions in the abstract. In Haskell, an abstract
expression has an overloaded type. The following definitions each
have the type Symantics e t => t because they
denote propositions, not entities. For example, test1
is the proposition that John saw Mary.

Analogously
in OCaml, an abstract expression is a functor that takes a
Symantics module, say S, as input, and
produces (a module containing) values whose types are defined in
S. The following functor produces values of type
S.t because they denote propositions, not
entities.

With these test cases in place, it is now time to implement
Symantics concretely. The two arrows in the pictures
above amount to two implementations. For simplicity, we take the
meaning of a proposition to be whether it is true in a model with
just two people, John and Mary, who saw each other but not
themselves. In Haskell, we provide an instance of the
Symantics class.

We can then resolve the overloading in test1,
test2, and test3 above to compute their
meaning. Indeed, test2 is false because no single
person saw everyone—John didn’t see John, and Mary didn’t see
Mary—but test3 is true because each person was seen
by a (different) person.

In programming-language terms, an instance of the
Symantics class in Haskell or a module for the
Symantics signature in OCaml is an
interpreter for the object language whose abstract syntax
is defined by the class or signature. In particular, the meaning
arrow is an interpreter that evaluates an expression. On the other
hand, the form arrow is also an interpreter—one that prints (or
pronounces) an expression. We can define it as another Haskell
instance

to reveal the well-known scope ambiguity in the English sentence
“someone saw everyone”. To wit, test2 and
test3 have the same form but we have already seen that
they have different meanings.

In these definitions of someone and
everyone, linguists will recognize quantifier
lowering or quantifier raising, computer scientists
will recognize continuation-passing style (CPS), and hey,
logicians might even recognize a representation theorem or
two.

Exercise: extend the language to include sentences such as
“someone’s mother saw everyone’s father” and “Mary knows who John
saw”.

Much related work

There is a ton of work in both computer science and linguistics
along these lines: to extend the language of abstract expressions,
to refine the form and meaning arrows, to simplify the metalanguage
in which the interpreters are implemented, to build more
interpreters that perform additional processing such as logical
deduction or partial evaluation, to compose interpreters, to
optimize self-interpretation…