Implementing the Giry Monad

In my last post I went over the categorical and measure-theoretic
foundations of the Giry monad, the ‘canonical’ probability monad that operates
on the level of probability measures.

In this post I’ll pick up from where I left off and talk about a neat and
faithful (if impractical) implementation of the Giry monad that one can put
together in Haskell.

Measure, Integral, and Continuation

So. For a quick review, we’ve established the Giry monad as a triple
, where is an endofunctor on the
category of measurable spaces , is a marginalizing
integration operation defined by:

and is a monoidal identity, defined by the Dirac measure at a point:

How do we actually implement this beast? If we’re looking to be suitably
general then it is unlikely that we’re going to be able to easily represent
something like a -algebra over some space of measures on a computer,
so that route is sort of a non-starter.

But it can be done. The key to implementing a general-purpose Giry monad is to
notice that the fundamental operation involved in it is integration, and that
we can avoid working with -algebras and measurable spaces directly if
we focus on dealing with measurable functions instead of measurable sets.

Consider the integration map on measurable functions that we’ve been
using this whole time. For some measurable function , takes a
measure on some measurable space and uses it to
integrate over . In other words:

A measure in has type , so
has corresponding type .

This might look familiar to you; it’s very similar to the type signature for a
continuation:

newtypeContar=Cont((a->r)->r)

Indeed, if we restrict the carrier type of ‘Cont’ to the reals, we can be
really faithful to the type:

newtypeIntegrala=Integral((a->Double)->Double)

Now, let’s overload notation and call the integration map itself a
measure. That is, is a mapping , so
we’ll just interpret the notation to mean the same thing -
. This is convenient because we can dispense with
and just pretend measures can be applied directly to measurable functions.
There’s no way we can get confused here; measures operate on sets, not
functions, so notation like is not currently in use. We just set
and that’s that. Let’s rename the ‘Integral’ type
to match:

newtypeMeasurea=Measure((a->Double)->Double)

We can extract a very nice shallowly-embedded language for integration here,
the core of which is a single term:

integrate::(a->Double)->Measurea->Doubleintegratef(Measurenu)=nuf

Note that this is the same way we’d express integration mathematically; we
specify that we want to integrate a measurable function with respect to
some measure :

The only subtle difference here is that we don’t specify the space we’re
integrating over in the integral expression - instead, we’ll bake that into the
definition of the measures we create themselves. Details in a bit.

What’s interesting here is that the Giry monad is the continuation monad with
the carrier type restricted to the reals. This isn’t surprising when you think
about what’s going on here - we’re representing measures as integration
procedures, that is, programs that take a measurable function as input and
then compute its integral in some particular way. A measure, as we’ve
implemented it here, is just a ‘program with a missing piece’. And this is
exactly the essence of the continuation monad in Haskell.

Typeclass Instances

We can fill out the functor, applicative, and monad instances mechanically by
reference to the a standard continuation monad implementation, and each
instance gives us some familiar conceptual structure or operation on
probability measures. Let’s take a look.

The functor instance lets us transform the support of a measurable space
while keeping its density structure invariant. If we have:

then mapping a measurable function over the measure corresponds to:

The functor structure allows us to precisely express a pushforward measure or
distribution of under . It lets us ‘adapt’ a measure to other
measurable spaces, just like a good functor should.

In Haskell, the functor instance corresponds exactly to the math:

instanceFunctorMeasurewherefmapgnu=Measure$\f->integrate(f.g)nu

The monad instance is exactly the Giry monad structure that we developed
previously, and it allows us to sequence probability measures together by
marginalizing one into another. We’ll write it in terms of bind, of course,
which went like:

Finally there’s the Applicative instance, which as I mentioned in the last
post is sort of conceptually weird here. So in the spirit of that
comment, I’m going to dodge any formal justification for now and just use the
following instance which works in practice:

where denotes Lebesgue measure. I could expand this further or simplify
things a little more (the beta and binomial are conjugates) but you get
the point, which is that we have a way to evaluate the integral.

What is really required here then is to be able to encode into the
definitions of measures like and the method of integration
to use when evaluating them. For measures absolutely continuous w/respect to
Lebesgue measure, we can use the Riemann integral over the reals. For measures
absolutely continuous w/respect to counting measure, we can use a sum of
products. In both cases, we’ll also need to supply the density or mass
function by which the integral should be evaluated.

Creating Measures

Recall that we are representing measures as integration procedures. So to
create one is to define a program by which we’ll perform integration.

Let’s start with the conceptually simpler case of a probability measure that’s
absolutely continuous with respect to counting measure. We need to provide
a support (the region for which probability is greater than 0) and a
probability mass function (so that we can weight every point appropriately).
Then we just want to integrate a function by evaluating it at every point in
the support, multiplying the result by that point’s probability mass, and
summing everything up. In code, this translates to:

The second example involves measures over the real line that are absolutely
continuous with respect to Lebesgue measure. In this case we want to evaluate
a Riemann integral over the entire real line, which is going to necessitate
approximation on our part. There are a bunch of methods out there for
approximating integrals, but a simple one for one-dimensional problems like
this is quadrature, an implementation for which Ed Kmett has handily
packaged up in his integration package:

Here we’re using quadrature to approximate the integral, but otherwise it has
a similar form as ‘fromMassFunction’. The difference here is that we’re
integrating over the entire real line, and so don’t have to supply a support
explicitly.

We can use this to create a beta measure (where logBeta again comes from
Numeric.SpecFunctions):

Note that since we’re going to be integrating over the entire real line and the
beta distribution has support only over , we need to implicitly
define the support here by specifying which regions of the domain will lead to
a density of 0.

In any case, now that we’ve constructed those things we can just use
a monadic bind to create the beta-binomial measure we described before. It
masks a lot of under-the-hood complexity.

There are a couple of other useful ways to create measures, but the most
notable is to use a sample in order to create an empirical measure.
This is equivalent to passing in a specific support for which the mass function
assigns equal probability to every element; I’ll use Gabriel Gonzalez’s
foldl package here as it’s pretty elegant:

Though I won’t demonstrate it here, you can use this approach to also create
measures from sampling functions or random variables that use a source of
randomness - just draw a sample from the function and pipe the result into
‘fromSample’.

Querying Measures

To query a measure is to simply get some result out of it, and we do that by
integrating some measurable function against it. The easiest thing to do is to
just take a straightforward expectation by integrating the identity function;
for example, here’s the expected value of a beta(10, 10) measure:

> integrate id (beta 10 10)
0.49999999999501316

The expected value of a beta(, ) distribution is , so we can verify analytically that the result should be
0.5. We observe a bit of numerical imprecision here because, if you’ll recall,
we’re just approximating the integral via quadrature. For measures created
via ‘fromMassFunction’ we don’t need to use quadrature, so we won’t observe the
same kind of approximation error. Here’s the expected value of a binomial(10,
0.5) measure, for example:

> integrate fromIntegral (binomial 10 0.5)
5.0

Note here that we’re integrating the ‘fromIntegral’ function against the
binomial measure. This is because the binomial measure is defined over the
integers, rather than the reals, and we always need to evaluate to a real
when we integrate. That’s part of the definition of a measure!

Let’s calculate the expectation of the beta-binomial distribution with , , and :

> integrate fromIntegral (betaBinomial 10 1 8)
1.108635884924813

Neato. And since we can integrate like this, we can really compute any of the
moments of a measure. The first raw moment is what we’ve been doing
here, and is called the expectation:

expectation::MeasureDouble->Doubleexpectation=integrateid

The second (central) moment is the variance. Here I mean variance in the
moment-based sense, rather than as the possibly better-known sample variance:

Ahhh, but remember: the binomial measure is defined over the integers, so we
can’t integrate it directly. No matter - the functorial structure allows us to
adapt it to any other measurable space via a measurable function:

> variance (fmap fromIntegral (binomial 10 0.5))
2.5

Expectation and variance (and other moments) are pretty well-known, but you can
do more exotic things as well. You can calculate the moment generating
function for a measure, for example:

And probably the least interesting query of all is the simple ‘volume’, which
calculates the total measure of a space. For any probability measure this must
obviously be one, so it can at least be used as a quick sanity check:

volume::MeasureDouble->Doublevolume=integrate(const1)

Convolution and Friends

I mentioned in the last post that applicativeness corresponds to
independence in some sense, and that independent measures over the same
measurable space can be convolved together, à la:

for measures and on . In Haskell-land it’s well-known
that any applicative instance gives you a free ‘Num’ instance, and the story is
no different here:

First, consider a chi-squared measure with degrees of freedom.
We could create this directly using a density function, but instead we can
represent it by summing up independent squared Gaussian measures:

To sanity check the result, we can compute the mean and variance of a
measure, which should be and respectively for :

> expectation (chisq 2)
2.0000000000000004
> variance (chisq 2)
4.0

As a second example, consider a product of independent Gaussian measures. This
is a trickier distribution to deal with analytically (see here), but we
can use some well-known identities for general independent measures in order to
verify our results. For any independent measures and , we have:

and

for the expectation and variance of their product. So for a product of
independent Gaussians w/parameters (1, 2) and (2, 3) respectively, we expect to
see 2 for its expectation and 61 for its variance:

Wrapping Up

And there you have it, a continuation-based implementation of the Giry monad.
You can find a bunch of code with similar functionality to this packaged up in
my old measurable library on GitHub if you’d like to play around with
the concepts.

That library has accumulated a few stars since I first pushed it up in 2013. I
think a lot of people are curious about these weird measure things, and this
framework at least gives you the ability to play around with a representation
for measures directly. I found it particularly useful for really grokking,
say, that integrating some function against a probability measure
is identical to integrating the identity function against the probability
measure . And there are a few similar concepts
there that I find really pop out when you play with measures directly, rather
than when one just works with them on paper.

But let me now tell you why the Giry monad sucks in practice.

Take a look at this integral expression, which is brought about due to a
monadic bind:

For simplicitly, let’s assume that is discrete and has cardinality
. This means that the integral reduces to

for and the appropriate Radon-Nikodym derivatives. You
can see that the total number of operations involved in the integral is
, and indeed, for monadic binds the computational complexity
involved in evaluating all the integrals involved is exponential, on the order
of . It was no coincidence that I demonstrated a variance calculation
for a distribution instead of for a .

This isn’t really much of a surprise - the cottage industry of approximating
integrals exists because integration is hard in practice, and integration is
surely best avoided whenever one can get away with doing so. Vikash
Mansinghka’s quote on this topic is fitting: “don’t calculate probabilities -
sample good guesses.” I’ll also add: relegate the measures to measure theory,
where they seem to belong.

The Giry monad is a lovely abstract construction for formalizing the monadic
structure of probability, and as canonical probabilistic objects, measures and
integrals are tremendously useful when working theoretically. But they’re a
complete non-starter when it comes to getting anything nontrivial done in
practice. For that, there are far more useful representations for probability
distributions in Haskell - notably, the sampling function or random variable
representation found in things like
mwc-probability/mwc-random-monad and random-fu, or even
better, the structural representation based on free or operational monads like
I’ve written about before, or that you can find in something like
monad-bayes.

The intuitions gleaned from playing with the Giry monad carry over precisely to
other representations for the probability monad. In all cases, ‘return’ will
correspond, semantically, to constructing a Dirac distribution at a point,
while ‘bind’ will correspond to a marginalizing operator. The same is true for
the underlying (applicative) functor structure: ‘fmap’ always corresponds to a
density-preserving transformation of the support, while applicativeness
corresponds to independence (yielding convolution, etc.). And you have to
admit, the connection to continuations is pretty cool.

There is clearly some connection to the codensity monad as well, but I
think I’ll let someone else figure out the specifics of that one. Something
something right-Kan extension..