Substitution is pullback

I am sitting on a tutorial on categorical semantics of dependent type theory given by Peter Lumsdaine. He is talking about categories with attributes and other variants of categories that come up in the semantics of dependent type theory. He is amazingly good at fielding questions about definitional equality from the type theorists. And it looks like some people are puzzling over pullbacks showing up, which Peter is about to explain using syntactic categories. Here is a pedestrian explanation of a very important fact:

Substitution is pullback.

I will stick to “ordinary math”. No dependent types, no contexts, no fancy logic. Suppose we have an expression $t(x)$ which describes a function $A \to B$ mapping $x$ to $t(x)$, and a predicate on $B$, i.e., an expression $P(y)$ which tells us whether any given $y \in B$ has a given property. Then we can define a new predicate $Q(x)$ on $A$ by
$$Q(x) \equiv P(t(x)).$$
We substituted the term $t(x)$ for $y$ in the expression $P(y)$. This is completely obvious and it gets used all the time. For example, whenever we say something like “$2 x + 1$ is even”, we have substituted the expression $2 x + 1$ for $y$ in the predicate $\exists z . y = 2 z$, where all variables range over integers.

Let us think in terms of sets. The expresion $t(x)$ corresponds to a function $t : A \to B$. The predicate $P(y)$ on $B$ corresponds to a subset $P \subseteq B$ of those elements which satisfy it. Then $Q(x) \equiv P(t(x))$ corresponds to a subset $Q \subseteq A$, namely
$$Q = \lbrace x \in A \mid t(x) \in P \rbrace.$$
But this $Q$ is the pullback of the inclusion $P \hookrightarrow B$ along $t$. And if you draw the relevant pullback diagram for yourself, and figure out why we have a pullback, you will have understood why substitution is pullback.

Isn’t substitution composition?

Sometimes we substitute terms into terms. Suppose we have an expression $t(x)$ which maps $x \in A$ to $t(x) \in B$, and an expression $s(y)$ which maps $y \in B$ to $s(y) \in C$. Then we can substitute $t(x)$ for $y$ in $s(y)$ to obtain the expression $s(t(x))$ which maps $x \in A$ to $s(t(x)) \in C$. In terms of functions this is composition. So the full story is that

Substitution of a term into a predicate is pullback, but substitution of a term into a term is composition.

Or if you are a type theorist:

Substitution of a term into a dependent type is pullback, but substitution of a term into a term is composition.

This result is folklore, which is a technical term for a method of publication in category theory. It means that someone sketched it on the back of an envelope, mimeographed it (whatever that means) and showed it to three people in a seminar in Chicago in 1973, except that the only evidence that we have of these events is a comment that was overheard in another seminar at Columbia in 1976. Nevertheless, if some younger person is so presumptuous as to write out a proper proof and attempt to publish it, they will get shot down in flames.

Undoubtedly people have known this as a fact for a very long time.

However, I couldn’t find any published papers from the 1970s that formulated this result. It seems unlikely that it was done then because the categorists of that day had come from traditional pure mathematics and were not familiar with formal syntax.

I would like to know who did write this down in any formal way first.

Without asserting my own priority, let me cite my book Practical Foundations, in particular Section 4.3 and Chapter VIII, as a place where it is done.

I describe a construction there of the classifying category or category of contexts and substitution that works for any type theory with all of the structural rules (ie not linear logic) and possibly dependent types. This derives a sketch (essentially, generators and relations for the category) more or less directly from the syntax. Then the category is obtained easily from the sketch. This separates the “chalk” of recursive definitions of syntax from the “cheese” of associative composition in a category.

Note to the old-timers: substitution is not given by forming the pullback. We construct the denotations of the original and substituted term or predicate first and then observe that they obey the universal property of a pullback. Not all pullbacks need exist in the syntactic category.

Now I have a confession to make. I cannot see the obvious reason why the interpretation of the language in another category with the appropriate structure defines a functor from the classifying category. Certainly there is a function on objects and morphisms and the universal properties of products and exponentials correspond to the type theoretic rules. What I cannot see is why the function preserves composition.

The Subsititution Lemma in type theory says how substitutions commute: $$ [a/x]^* ([b/y]^* t) = [a/x]^* ([[a/x]^*b/y]^*t),$$ where $[a/x]^*t$ is the result of substituting $a$ for $x$ in $t$. The superscript star comes from inverse images and pullbacks in category theory.

All of this is an action on $t$, which we can omit. This leaves composition of abstract morphisms, in fact in the classifying category. So, as Andrej says, substitution of terms is composition.

However, this commutative square of morphisms is a pullback.

The morphism $[b/y]$ that arises from the term $\Gamma,x:X\vdash b:Y$ splits the product projection or display map $\hat y:\Gamma\times X\times Y\to\Gamma\times X$. Just as the action of the term is by substitution, that of this display map is weakening by the variable $y$.

The pullback of the composition $[b/y];\hat y=id$ along $[a/x]$ is shown on page 449 of my book:

It is an example of the easy “pullback lemma” of which Andrej for some reason recently gave a proof.

Let me try to give a bit more explanation of the construction of a category from syntax that is described at length in my book and also answer my own question about why the interpretation in another category defines a functor.

The antecedents of this construction include clones in universal algebra and classifying toposes in geometric model theory, but I called it the category of contexts and substitutions in line with the tradition of referring to the category of widgets and homomorphisms.

People originally took individual types as the objects, but the relationship with syntax is made much more fluently if we use contexts instead, ie lists of variables together with their (possibly dependent) types. The lazy way of describing the morphisms is as strings of terms.

The difficulty with formalising this, particularly for dependent types, is that we need to mix up recursion for the type theory with associativity for categories. Anyone who has tried to write programs for associative operations will know that this is a mess.

To solve this problem I used an elementary sketch. Traditionally, sketches (esquisses in French) were used to describe categories of models of theories that involve limits and colimits. The journal Diagrammes includes a lot of work about them. Pierre Ageron extended these ideas to exponentials in a way that could probably deal with most of the ideas in type theory in a purely categorical way, but unfortunately never developed his work.

However, my idea only uses sketches in their simplest form, without limits, colimits, exponentials or anything else that invokes new objects. Every object (context) of the intended category is given as a node of the sketch. Then there are two classes of arrows and five equations (commutative diagrams), giving generators and relations for the morphisms.

In detail, the sketch has

— a node for each context of the language,
— an arrow $\hat x:[\Gamma,x:X]\to\Gamma$ called a display map for each type-in-context $\Gamma\vdash X$,
— an arrow $[a/x]:\Gamma\to[\Gamma,x:X]$ for each term $\Gamma\vdash a:X$, and
— five families of equations, the most complicated of which is the substitution lemma above.

The substitution lemma is the square on the left in the diagram above and the other square says that terms commute with displays; these are both pullbacks. Displays commute with each other and form the pullback or fibre product of the types over the context.

However, the proof that these squares are pullbacks in the dependent case is delicate, making repeated use of the pullback lemma in a particular sequence of ways.

All of this structure is exploited in the final chapter of the book to provide a fluent translation between diagrams and syntax, so that the universal properties of the former correspond exactly to the proof rules for the latter.

However, since I was still rather biassed towards diagrams and against syntax when I wrote the book, I did not actually spell out what it is to be an interpretation of a type theoretic language in a category. I simply took this to be a structure-preserving functor from the category of contexts and substitutions.

We can rectify this, but as with substitution as pullback we have to do the construction in a specific order.

We have to define a functor from the category generated by a sketch, for which we need to give its effect on nodes and arrows in such a way that the equations hold.

For dependent types at the algebraic level, ie without $\Pi$, $\Sigma$ etc, the semantic structure that is required is a category with display maps, ie a class of maps that is closed under pullback against arbitrary maps. This provides the object part of the interpretation, for types and contexts, and the display maps that link them.

Now we need to fill in the interpretations of the terms, as sections of the display maps that interpret their types-in-context. The operation-symbols (which we understand as having variables as their arguments) have given meanings as morphisms of the semantic category.

Other terms are obtained by substitution of sub-terms for variables in the outermost operation-symbol.

The result of such a substitution is the top left map in the diagram. This map is the mediator to the right-hand pullback such that the composite along the top is the identity. This makes the left-hand square commute, indeed it is a pullback.

The pullback square that captures the substitution lemma is therefore the definition of substitution and its image in the semantic category is the definition of the interpretation of a expression that consists of an operation-symbol applied to sub-terms.

In particular, this square commutes, as required. We have also made the other squares, capturing the simpler equations, commute too.

Here is a programming problem that arises from the above description of the category: what is the appropriate datastructure for morphisms?

The lazy definition of a morphism is that it is an assignment of terms to variables, ie what is known as an environment in compiler design.

Traditionally, hash tables were used to do this, but this is a monolithic solution based on assuming that there is only one environment to be considered. It works in practice because it can be updated in place and previous versions of the environment will not be needed again.

In a category there are morphisms every which way. We would also like to think of this problem in a functional way. There need to be multiple versions of the environment.

One of the operations that needs to be done on this datastructures is composition, whilst clearly the generating maps $\hat x$ and $[a/x]$ need to be represented, so in the first instance we use this representation.

From this we obtain the term that is to be assigned to a variable $x$ by reading the string of generators backwards from target to source. (Substitution is a contravariant action.)

— If we encounter a generator $[a/x]$ then $a$ is the required term, except that it contains its own free variables, which may each need to be substituted using assignments that are further back in the string.

— If instead we encounter $\hat x$ then there is an error, because this means that $x$ is excluded from the target context.

This is a linear search, which is expensive if the morphism is the composite of a very long string. This situation arises in the compilation of an ordinary program during reading of the hundreds of definitions that might be present in the library headers and leads to the use of hashes.

There are other datastructures that can be used for dictionary-like data in a functional way. There is a book by Chris Okasaki about them. One idea that might be adapted to this problem is that of red-black trees.

This problem is more complex than dictionary search because we need to know that neither $[a/x]$ nor $\hat x$ occurs in the right-hand sub-tree that arises from composition (in diagrammatic order) before we select an occurrence from the other sub-tree. So the items in the dictionary have both an “alphabetical” order that we use to search for them quickly and also the order in which they occur in the composite, from which we require the right-most.

Intuitively, composition is also a pullback. $f;g$ is the “pullback” of a morphism $g : B \to C$ along $f : A \to B$ to obtain a morphism of type $A \to C$.

One way of formalizing the intuition is to work in subsumptive reflexive graph categories (Dunphy & Reddy, Parametric Limits, LICS 2004), where we have “logical relations” in addition to morphisms. Perhaps there are other ways as well.

On the pullback diagram, are the objets sets?, predicates? sets formed by predicates?
When you say: “Q is the pullback of the inclusion P \hookrightarrow B along t”, do you say it in the same why as “p2 is a pullback or fiber product of the pair. We also say that p2 is the pullback of f along g” from [http://www.math.mcgill.ca/triples/Barr-Wells-ctcs.pdf page 273]; if yes, then I must conclude that Q is an arrow, t(x) is an arrow, the inclusion from P to B is an arrow and the objects are the sets A, P & B; but then I’m missing the arrow from A to P and from A to B.
My attempt: https://postimg.org/image/dvw8j0moh/

Use the fact that a predicate $P$ on a set $A$ can be viewed as a subset inclusion $P \subseteq A$. That is, you should switch between “$P$ is a predicate on $A$”, “$P$ is a subset of $A$” and “subset inclusion $i_P : P \to A$”, as the occasion requires.

For instance, if someone says “substutution of a term into a predicate is pullback”, then we clearly need a pullback square, so four arrows. The two arrows we start with are $t : B \to A$ (the substituted term) and the subset inclusion $i_P : P \to A$ (and you can figure this out because of the three forms of understanding “$P$ is a predicate on $A$” only this one is an arrow). The other two arrows are a subset inclusion $i_Q : Q \to B$ where $Q = \lbrace y \in B \mid t(y) \in P\rbrace$ and the arrow $Q \to P$ which is the restriction of $t$ to $Q$, i.e., it takes $y \in Q$ to $t(y) \in P$. Does that help?