In denotational semantics and functional programming, the terms
monad morphism, monad layering, monad
constructor, and monad transformer have by now
accumulated 20 years of twisted history. The exchange between
Eric Kidd and
sigfpe about the probability
monad prompted me to investigate this history.

Chapter 4 of Eugenio Moggi’s “An
Abstract View of Programming Languages” (Edinburgh LFCS
technical report ECS-LFCS-90-113, 1990) defines a monad
morphism as you’d expect in the category of monads: a natural
transformation between two monads as functors that commutes with
the two monads’ unit (return) and bind
(>>=) operations. The two monads don’t even need
to be over the same category of types, but Haskell has only one
category of types (pace type
classes), so Moggi focuses on the simplified case,
where the functor U is an identity. Further, Moggi cares
only about strong monads, which Haskell programmers like
Wadler call a monad for short. Roughly, a strong monad respects the
Cartesian structure (in particular of environments) of the category
of types. Moggi thus defines a simplified strong monad
morphism, which relates two strong monads over the same
category of types. For short, Haskell programmers (like me
henceforth) call it a monad morphism, of type forall a. m a
-> n a where m and n are the
two (strong) monads being related.

Moggi continues with “some examples of monad
constructions” (emphasis mine). He defines “several
endofunctors on Obj(Mon(C))”, where Obj(Mon(C)) is a category with
monads for objects but only trivial (identity) morphisms. In other
words, he defines several maps from monads to monads, including
what Haskell calls ErrorT, StateT,
ReaderT, WriterT, and
ContT.

Endofunctors on Obj(Mon(C)) have very little structure. One of
them exchanges the Identity monad with the
StateInt monad while leaving all other
monads alone. You can’t even define this exchange as a Haskell type
constructor of kind (*->*)->(*->*). Moggi
thus considers several ways to add structure to a given endofunctor
F on Obj(Mon(C)).

“Can F be extended to an endofunctor F over
Mon(C)?” That is, does every monad morphism M → N
yield a monad morphism FM → FN such that blah
commutes? For the Haskell monad maps listed above, the answer is
yes except for ContT.

“Is there a natural transformation from IdMon(C) to
F?” That is, does every monad M yield a monad
morphism M → FM such that blah commutes? For the
Haskell monad maps listed above, the answer is all yes.

Can we lift the operations (such as changing the state
or producing output) of each monad M into FM?
That is, as F adds new operations to a monad, can it
preserve the old operations of the monad?

The last question is crucial for Moggi’s goal, to make
denotational semantics modular. Steele, Espinosa,
and Liang, Hudak,
and Jones turn Moggi’s goal into a puzzle in functional
programming, that of building a modular interpreter. Liang et al.
define a monad transformer as a monad-to-monad map that
satisfies item 2 above, in other words, a natural transformation
from IdMon(C) to an endofunctor on Obj(Mon(C)). The
monad morphism M → FM in their definition is
known as lift in Haskell’s MonadTrans
class.

Of items 1 and 3 above, the latter has received by far more
attention in the literature. After all, monads are useful only to
the extent that they provide useful operations. Filinski
defines a layering of a monad N over another
monad M as a function family M(NA) →
NA (for each type A) such that blah commutes. In
Haskell notation, a layering defines a polymorphic function

layer :: forall a. m (n a) -> n a

where m and n are the two monads.
Filinski explains that monad morphisms and monad layerings are
equivalent:

lift = layer . liftM return
layer = join . lift

Filinski then encodes layered monads in terms of delimited
control. Lüth
and Ghani propose another approach to combining monads and
their operations using coproducts.

So, that’s what I know of the story of monad morphisms,
layerings, constructors, and transformers in the literature.

Three asides. First, Cartwright
and Felleisen pursue modular denotational semantics with side
effects but not (directly) monads. Second, monads generalize to
endofunctor-enriched categories or parameterized
monads, a more general notion of computation.

Finally, although the probability monad factors into the list
monad and the writer monad transformer (the latter for the monoid
of probabilities with multiplication as the binary operation), we
need to distinguish the list monad from the set monad: the
distribution “True 0.2, True 0.2, False 0.6” is equivalent not to
“True 0.2, False 0.6” but to “True 0.4, False 0.6”. This
equivalence arises only from the probability monad as a whole. As
long as we only
apply the monad to types with equality, we can reduce
distributions using such equivalences to compute more efficiently.
Then the monad generalizes from probabilities to any semiring (with a
multiplicative identity), while remaining an instance of
MonadWriter and (assuming the semiring also has an
additive identity) MonadPlus. This generalization is
used in speech
recognition and other
applications in natural-language processing.