Saturday, April 07, 2012

Introduction
The entropy of a probability distribution can be seen as a measure of its uncertainty or a measure of the diversity of samples taken from it. Over the years I've talked lots about how probability theory gives rise to a monad. This suggests the possibility that maybe the notion of entropy can be generalised to monads other than probability. So here goes...

Shannon entropy
I've talked in the past about how there is some trickiness with defining the probability monad in Haskell because a good implementation requires use of the Eq typeclass, and hence restricted monads. Restricted monads are possible through a bunch of methods, but this time I don't want them.

It's common to represent probability distributions on finite sets as lists of pairs where each pair (p, x) means x has a probability p. But I'm going to allow lists without the restriction that each x appears once and make my code work with these generalised distributions. When I compute the entropy, say, it will only be the usual entropy in the case that each x in the list is unique.
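To make the lists-of-pairs idea concrete, here's a minimal sketch of such a type with an ordinary, unrestricted Monad instance. The names P, unP, coin and twice are assumptions made up for illustration and may not match the definitions used elsewhere in this post. Because >>= never compares values, repeated outcomes are simply left unmerged.

```haskell
-- A generalised distribution: a list of (probability, value) pairs.
-- The same value is allowed to appear more than once.
newtype P a = P { unP :: [(Double, a)] } deriving Show

instance Functor P where
  fmap f (P xs) = P [(p, f x) | (p, x) <- xs]

instance Applicative P where
  pure x = P [(1, x)]
  P fs <*> P xs = P [(p * q, f x) | (p, f) <- fs, (q, x) <- xs]

instance Monad P where
  -- Multiply probabilities along each path; no Eq constraint is needed.
  P xs >>= f = P [(p * q, y) | (p, x) <- xs, (q, y) <- unP (f x)]

-- A fair coin (hypothetical example).
coin :: P Bool
coin = P [(0.5, True), (0.5, False)]

-- Both branches return (), so () is listed twice rather than once.
twice :: P ()
twice = coin >>= \_ -> return ()
```

Here unP twice is a two-element list in which () appears with probability 0.5 twice, which is exactly the kind of generalised distribution described above.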

An important property of entropy is known as the grouping property which can be illustrated through an example tree like this:

The entropy for the probability distribution of the final leaves is the sum of two components: (1) the entropy of the branch at the root of the tree and (2) the expected entropy of the subtrees. Here's some corresponding code. First, simple Bernoulli trials:
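The grouping property can be checked numerically with a small sketch like the following. It uses plain lists of (probability, value) pairs and entropy in bits; the names Dist, entropy, flatten, bernoulli and the example tree are all assumptions for illustration, not necessarily the code this post originally used.

```haskell
type Dist a = [(Double, a)]

-- Shannon entropy in bits; it only looks at the probabilities.
entropy :: Dist a -> Double
entropy d = negate (sum [p * logBase 2 p | (p, _) <- d, p > 0])

-- Collapse a two-level tree into a distribution on its leaves.
flatten :: Dist (Dist a) -> Dist a
flatten dd = [(p * q, x) | (p, d) <- dd, (q, x) <- d]

-- A Bernoulli trial: a with probability p, otherwise b.
bernoulli :: Double -> a -> a -> Dist a
bernoulli p a b = [(p, a), (1 - p, b)]

-- A hypothetical two-level tree with four leaves.
tree :: Dist (Dist Char)
tree = bernoulli 0.5 (bernoulli 0.5 'a' 'b') (bernoulli 0.25 'c' 'd')

-- Entropy of the leaves ...
lhs :: Double
lhs = entropy (flatten tree)

-- ... versus entropy of the root branch plus expected entropy of the subtrees.
rhs :: Double
rhs = entropy tree + sum [p * entropy d | (p, d) <- tree]
```

Up to floating-point rounding, lhs and rhs agree for this tree.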

In English: the expectation of certainty is just the certain value, and the expectation of an expectation is just the expectation. But these rules are precisely the conditions that define an m-algebra, where m is a monad.

So let's define a type class:

> class Algebra m a | m -> a where
>     expectation :: m a -> a

We'll assume that when m is a monad, any instance satisfies the two laws above. Here's the instance for probability:
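As a rough sketch of what a probability instance might look like, here is one possibility where the expectation is the probability-weighted sum. The type P and the helpers certain, flatten, mapP and nested are assumptions introduced for this example. The class is repeated so the block stands alone.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

class Algebra m a | m -> a where
  expectation :: m a -> a

-- Lists of (probability, value) pairs.
newtype P a = P [(Double, a)]

instance Algebra P Double where
  expectation (P xs) = sum [p * x | (p, x) <- xs]

-- Stand-ins for return, join and fmap on P (hypothetical helpers).
certain :: a -> P a
certain x = P [(1, x)]

flatten :: P (P a) -> P a
flatten (P dd) = P [(p * q, x) | (p, P d) <- dd, (q, x) <- d]

mapP :: (a -> b) -> P a -> P b
mapP f (P xs) = P [(p, f x) | (p, x) <- xs]

-- A made-up distribution over distributions for checking the laws.
nested :: P (P Double)
nested = P [(0.5, P [(0.5, 1), (0.5, 3)]), (0.5, P [(0.25, 0), (0.75, 4)])]

-- Law 1: the expectation of certainty is the certain value.
law1 :: Bool
law1 = expectation (certain 3) == (3 :: Double)

-- Law 2: the expectation of an expectation is the expectation.
law2 :: Bool
law2 = abs (expectation (flatten nested)
            - expectation (mapP expectation nested)) < 1e-12
```

Both law1 and law2 hold for this example, which is only a spot check and not a proof.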

Binary trees
It's not hard to find other structures that satisfy these laws if we cheat and use alternative structures to represent probabilities. For example, we can make Tree an instance by assuming Fork represents a 50/50 chance of going one way or another:
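A sketch of how such a Tree instance could look, with every Fork treated as a fair coin flip. The constructor name Leaf and the example tree are assumptions; only Fork is named in the text. The class is repeated so the block stands alone.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

class Algebra m a | m -> a where
  expectation :: m a -> a

data Tree a = Leaf a | Fork (Tree a) (Tree a)

instance Algebra Tree Double where
  expectation (Leaf a)   = a
  -- Each Fork is a 50/50 choice between its two subtrees.
  expectation (Fork l r) = 0.5 * expectation l + 0.5 * expectation r

-- A hypothetical tree: 0 with probability 1/2, 4 and 8 with probability 1/4 each.
example :: Tree Double
example = Fork (Leaf 0) (Fork (Leaf 4) (Leaf 8))
```

For this tree, expectation example evaluates to 3.0.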

Lists
We could make non-empty lists into an instance by assuming a uniform distribution on the list. But another way to measure the diversity is simply to count the elements. We subtract one so that [x] corresponds to diversity zero. This subtraction gives us a non-trivial instance:
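Here's a small sketch of the counting measure and a check of the grouping law for it. One assumption goes beyond the text: the role of expectation is played by sum, which is what makes the arithmetic work out, since the total length minus one equals (number of sublists minus one) plus the sum of (each sublist's length minus one). The names diversity and groupingHolds are made up for this example.

```haskell
-- Diversity of a list: the number of elements minus one,
-- so that a singleton [x] has diversity zero.
diversity :: [a] -> Int
diversity xs = length xs - 1

-- Grouping law for lists of lists, with sum standing in for expectation
-- (an assumption of this sketch) and concat as the monadic join.
groupingHolds :: [[a]] -> Bool
groupingHolds xss =
  diversity (concat xss) == diversity xss + sum (map diversity xss)
```

For instance, groupingHolds ["ab", "c", "def"] is True: the left side is 6 - 1 = 5 and the right side is 2 + (1 + 0 + 2) = 5.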

Tsallis entropy
There are measures of diversity for probability distributions that are distinct from Shannon entropy. An example is Tsallis entropy. At this point I'd like a family of types parametrised by reals but Haskell doesn't support dependent types. So I'll just fix a real number q and we can define:
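As a sketch, Tsallis entropy with parameter q is (1 - sum of p^q) / (q - 1), and it satisfies a grouping law in which the subtree entropies are weighted by p^q instead of p. The value q = 2 and the names Dist, tsallis, flatten and tree are assumptions chosen for illustration.

```haskell
type Dist a = [(Double, a)]

-- The fixed real parameter; 2 is an arbitrary choice for this sketch.
q :: Double
q = 2

-- Tsallis entropy of order q (requires q /= 1).
tsallis :: Dist a -> Double
tsallis d = (1 - sum [p ** q | (p, _) <- d]) / (q - 1)

-- Collapse a two-level tree into a distribution on its leaves.
flatten :: Dist (Dist a) -> Dist a
flatten dd = [(p * r, x) | (p, d) <- dd, (r, x) <- d]

-- A hypothetical two-level tree with four leaves.
tree :: Dist (Dist Char)
tree = [ (0.5, [(0.5, 'a'), (0.5, 'b')])
       , (0.5, [(0.25, 'c'), (0.75, 'd')]) ]

-- Entropy of the leaves ...
lhs :: Double
lhs = tsallis (flatten tree)

-- ... versus root entropy plus the p^q-weighted entropies of the subtrees.
rhs :: Double
rhs = tsallis tree + sum [p ** q * tsallis d | (p, d) <- tree]
```

Up to rounding, lhs and rhs agree, so the "expectation" that goes with Tsallis entropy weights each outcome by p^q.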

Operads
This is all derived from Tom Leinster's post last year at the n-category cafe. As I talked about here there's a close relationship between monads and operads. Operads are a bit like container monads where the containers don't contain anything, but just have holes where contents could be placed. This makes operads a better place to work because you don't have the awkward issue I started with: having to disallow lists of value/probability pairs where the same value can appear more than once. Nonetheless, in (unrestricted) Haskell monads you don't have Eq available so you can't actually have definitions of return or >>= that can notice the equality of two elements. If such definitions were possible, the grouping law would no longer work as stated above.

Crossed homomorphisms
The generalised grouping law even makes sense for very different monads. For the Reader monad the law gives the definition of a crossed homomorphism. It's pretty weird seeing a notion from group cohomology emerge like this and I recommend skipping to the final section unless you care about this sort of thing. But if you do, this is related to research I did a long time ago. This is to test that the Schwarzian derivative really does give rise to a crossed homomorphism.
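For reference, here are the standard statements involved, written out as formulas. The symbols G, M and δ are notation assumed for this note, and conventions for left versus right actions vary between authors, so treat this as a sketch of the shape of the identities and not as the exact form used in the code.

```latex
% Crossed homomorphism \delta : G \to M for a group G acting on a module M
\delta(gh) = \delta(g) + g \cdot \delta(h)

% Schwarzian derivative and its rule for compositions
S(f) = \frac{f'''}{f'} - \frac{3}{2}\left(\frac{f''}{f'}\right)^{2},
\qquad
S(f \circ g) = \bigl(S(f) \circ g\bigr)\,(g')^{2} + S(g)
```

The factor (g')^2 in the composition rule is the same factor by which a quadratic differential transforms under a change of chart, which is why the Schwarzian is naturally valued in quadratic differentials.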

We can give Q a a geometrical interpretation. The underlying type is a pair (a, C4). If we think of elements of C4 as charts on a piece of Riemann surface then, for any n, an element of (a, C4) represents a local piece of a section of the nth tensor power of the canonical bundle. I.e. we can think of it as representing f(z) dz^n. I'll concentrate on the case n = 2, which gives quadratic differentials. We can think of an element of ((a, C4), C4) as forms where we're composing two charts. We can collapse down to an ordinary chart by using the chain rule. Here's the code: