MaxEnt: the Second Law of Thermodynamics as a modeling principle in biology

Most people who’ve taken introductory physics probably remember the famous “Second Law of Thermodynamics”: in an isolated system, entropy never decreases. Roughly speaking, we can think of entropy as a quantity that thermodynamics maximizes subject to some set of constraints (e.g., fixed average energy). (I’m being sloppy here with lots of details, like the differences between microcanonical and canonical ensembles and between entropy and free energy, but those are beyond the scope of this little blog post.)
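To make "maximizes entropy given constraints" a bit more concrete, here is the standard textbook calculation for the fixed-average-energy case. The notation (state probabilities p_i, energies E_i, Lagrange multipliers α and β) is chosen for this sketch and is not tied to any particular reference.

```latex
\begin{align}
\mathcal{L} &= -\sum_i p_i \ln p_i
  - \alpha\Big(\sum_i p_i - 1\Big)
  - \beta\Big(\sum_i p_i E_i - \langle E \rangle\Big) \\
\frac{\partial \mathcal{L}}{\partial p_i} &= -\ln p_i - 1 - \alpha - \beta E_i = 0 \\
p_i &= \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_j e^{-\beta E_j}
\end{align}
```

The first line is the entropy plus the two constraints (normalization and fixed mean energy); setting its derivative to zero gives the familiar Boltzmann form, with β fixed by the requirement that the mean energy comes out right.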

In thermodynamics we normally use this principle in the “forward” sense: given a model (e.g., an ideal gas with known temperature, particle mass, etc.), we calculate the probability distribution over states, from which we can calculate the total entropy and any observable quantities, such as means and correlations among degrees of freedom. However, one can also imagine the reverse of this procedure: given a set of observables, can we infer a model that reproduces these data but whose probability distribution otherwise has maximum entropy? This so-called inverse problem is more relevant to most areas of biology, where the underlying model is usually not known (unlike in many physics problems) and must be inferred from data. The idea is to construct a model that accounts for the data while incorporating as little extra information as possible (since a probability distribution with greater entropy encodes less information than one with lower entropy).
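A toy version of the forward and inverse directions can be sketched in a few lines of Python. Everything here is an illustrative assumption: a made-up four-level system, a single observable (the mean energy), and plain bisection to solve for the multiplier β. Real inverse problems have many observables and need more careful numerics.

```python
import math

def boltzmann(energies, beta):
    """Forward direction: model parameters -> probability distribution."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, beta):
    p = boltzmann(energies, beta)
    return sum(pi * e for pi, e in zip(p, energies))

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def infer_beta(energies, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Inverse direction: observed mean energy -> beta, by bisection.

    Relies on the mean energy decreasing monotonically as beta grows.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target_mean:
            lo = mid  # model mean too high, so beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical four-level system (values invented for illustration).
energies = [0.0, 1.0, 2.0, 3.0]
beta_true = 0.7

# Forward: compute an "observation" from the known model.
observed_mean = mean_energy(energies, beta_true)

# Inverse: recover the model parameter from the observation alone.
beta_hat = infer_beta(energies, observed_mean)
print(beta_true, beta_hat, entropy(boltzmann(energies, beta_hat)))
```

The inverse step only ever sees the observed mean, yet it recovers the full distribution, because the maximum-entropy form leaves exactly one free parameter per constraint.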

This “principle of maximum entropy” (or “MaxEnt” as it is sometimes stylized) has received a lot of attention in recent years among statistical physicists, especially those interested in biology, thanks to its wide applicability as a modeling principle given limited data. It has been applied to a very broad array of systems, including neurons, protein sequences, and flocks of birds. Consider neurons as a typical example. Each neuron is observed to be firing or not firing over a series of time points. There is usually enough data to robustly determine the mean state of each neuron and the pairwise correlations among neurons, but not enough to determine the joint probability distribution for all neurons. Thus, we can use maximum entropy to infer a model for the whole distribution of firing patterns: one that reproduces the observed means and correlations but otherwise has maximum entropy.
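For a handful of neurons the neuron example can be sketched directly by enumerating every firing pattern. The code below is a minimal illustration under loud assumptions: three neurons, invented pattern counts (not real recordings), and plain gradient ascent on the multipliers, which is just one simple way to fit such a model and would not scale to large populations.

```python
import itertools
import numpy as np

def feature_matrix(n):
    """Rows: all 2^n firing patterns. Columns: s_i, then s_i*s_j for i<j."""
    states = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    pairs = list(itertools.combinations(range(n), 2))
    pair_feats = np.array([[s[i] * s[j] for i, j in pairs] for s in states])
    return states, np.hstack([states, pair_feats])

def maxent_distribution(features, theta):
    """p(s) proportional to exp(theta . features(s))."""
    log_w = features @ theta
    log_w -= log_w.max()  # subtract the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

def fit_pairwise_maxent(features, target_moments, lr=0.5, steps=20000):
    """Adjust multipliers until model moments match the target moments."""
    theta = np.zeros(features.shape[1])
    for _ in range(steps):
        p = maxent_distribution(features, theta)
        theta += lr * (target_moments - p @ features)
    return theta

def entropy(p):
    return -np.sum(p * np.log(p))

# Invented counts for patterns 000, 001, 010, 011, 100, 101, 110, 111.
counts = np.array([30, 10, 12, 8, 15, 5, 14, 6], dtype=float)
p_data = counts / counts.sum()

states, F = feature_matrix(3)
target = p_data @ F  # observed means <s_i> and pairwise moments <s_i s_j>

theta = fit_pairwise_maxent(F, target)
p_model = maxent_distribution(F, theta)

print("moment mismatch:", np.abs(p_model @ F - target).max())
print("entropy (data, model):", entropy(p_data), entropy(p_model))
```

The fitted model matches the means and pairwise moments of the data, and its entropy is at least as large as that of the empirical distribution, since it ignores any structure beyond those moments.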

Rather than considering the probability distribution over states of the system, some recent work has considered the probability distribution over trajectories of the system — e.g., trajectories of a protein stochastically transitioning between structural states or a population evolving through genotypes. The goal is then to infer the transition rates that maximize entropy of the trajectory probability distribution. This is the basis of the following new preprint:

Here they show how to use observed steady-state probability distributions and simple dynamical averages (e.g., mean number of transitions per unit time) to infer transition rates that maximize trajectory entropy. I think this is a very interesting approach, since there are a great many important stochastic processes where some limited observations can be made, but the underlying transition rates cannot be easily measured or derived. From a theoretical standpoint, I think the most provocative result of this work is that the transition rates that maximize entropy are qualitatively different from the widely used Metropolis transition rates. Specifically, the inferred transition rates depend on the square root of the ratio of the stationary probabilities (Eq. 8), rather than on the ratio itself as in Metropolis rates. The authors point out this key difference but do not discuss what it might mean, and indeed its significance is not at all clear to me.
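One thing that is easy to check numerically is how the two rate forms differ while sharing the same stationary distribution. The sketch below is not the preprint's Eq. 8: it assumes a stripped-down square-root rule, k(a→b) = sqrt(p_b/p_a), with any prefactors set to 1, on a made-up fully connected four-state system, and compares it with the usual Metropolis rule min(1, p_b/p_a).

```python
import numpy as np

def rate_matrix(p, rule):
    """Continuous-time rate matrix Q with Q[a, b] = rule(p_a, p_b) for a != b."""
    n = len(p)
    Q = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a != b:
                Q[a, b] = rule(p[a], p[b])
        Q[a, a] = -Q[a].sum()  # each row sums to zero
    return Q

def metropolis(pa, pb):
    # Moves toward a more probable state always have rate 1.
    return min(1.0, pb / pa)

def sqrt_ratio(pa, pb):
    # Assumed simplified form: square root of the probability ratio.
    return np.sqrt(pb / pa)

# Hypothetical stationary distribution over four states.
p = np.array([0.1, 0.2, 0.3, 0.4])

Q_met = rate_matrix(p, metropolis)
Q_sqrt = rate_matrix(p, sqrt_ratio)

# Both rules satisfy detailed balance, so p is stationary under each: p @ Q = 0.
print("stationarity residuals:", np.abs(p @ Q_met).max(), np.abs(p @ Q_sqrt).max())

# Rates between the least and most probable states under each rule.
print("Metropolis 0->3, 3->0:", Q_met[0, 3], Q_met[3, 0])
print("sqrt rule  0->3, 3->0:", Q_sqrt[0, 3], Q_sqrt[3, 0])
```

Under these assumptions both rules give the same steady state, but they split the probability ratio differently: Metropolis caps moves toward more probable states at a fixed rate and puts the whole ratio into the reverse move, while the square-root form shares it symmetrically between the two directions. This says nothing by itself about which is the better model of a given system.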

My more general skepticism about this approach is the same as for other MaxEnt methods: I’m not convinced that maximum entropy is a reasonable approximation in all the systems to which it is applied. In thermodynamics, maximum entropy is essentially a consequence of a more fundamental principle (the “fundamental postulate of statistical mechanics”: all microstates with the same energy are equally likely), but most systems studied with MaxEnt lack such a deeper principle. So as a modeling tool it is extremely powerful and flexible, but I believe that same generality warrants caution when interpreting its results.