A cumulant is defined via the cumulant generating function
$$ g(t)\stackrel{\tiny def}{=} \sum_{n=1}^\infty \kappa_n \frac{t^n}{n!},$$
where
$$
g(t)\stackrel{\tiny def}{=} \log E(e^{tX}).
$$
Cumulants have some nice properties, including additivity: for statistically independent variables $X$ and $Y$ we have
$$
g_{X+Y}(t)=g_X(t)+g_Y(t)
$$
Additionally, in a multivariate setting, cumulants vanish when the variables are statistically independent, and so they generalize correlation in some sense. They are related to moments by Möbius inversion over the partition lattice. They are a standard feature in undergraduate probability courses because they feature in a simple proof of the Central Limit Theorem (see for example here).
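To make the moment–cumulant relation concrete, here is a minimal sketch (the function name `cumulants_from_moments` is my own) with the inversion formulas written out by hand up to order four; for a Poisson variable every cumulant equals the mean, which gives a quick sanity check.

```python
def cumulants_from_moments(m):
    """Convert raw moments [m1, m2, m3, m4] to cumulants [k1, k2, k3, k4].

    These are the standard inversion formulas, written out explicitly
    for orders up to four.
    """
    m1, m2, m3, m4 = m
    k1 = m1
    k2 = m2 - m1**2
    k3 = m3 - 3*m1*m2 + 2*m1**3
    k4 = m4 - 4*m1*m3 - 3*m2**2 + 12*m1**2*m2 - 6*m1**4
    return [k1, k2, k3, k4]

# Raw moments of Poisson(2) are 2, 6, 22, 94; every cumulant equals 2.
print(cumulants_from_moments([2, 6, 22, 94]))  # → [2, 2, 2, 2]
```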

So the cumulants are given by a formula and have a list of good properties. A cumulant is clearly a fundamental concept, but I'm having difficulty figuring out what it is actually measuring, and how it is more than just a computational convenience.

Question: What are cumulants actually measuring? What is their conceptual meaning? Are they measuring connectivity or cohesion of something?

I apologise that this question is surely completely elementary. I'm in low dimensional topology, and I'm having difficulty wrapping my head around this elementary concept in probability; Google did not help much. I'm vaguely imagining that perhaps they are some kind of measure of "cohesion" of the probability distribution in some sense, but I have no idea how.

$\begingroup$A great introduction to both classical and free cumulants is "Three lectures on free probability" by Novak and LaCroix arxiv.org/abs/1205.2097.$\endgroup$
– Tom Copeland, Feb 20 '15 at 22:13

5 Answers

Cumulants have many other names depending on the context (statistics, quantum field theory, statistical mechanics,...): seminvariants, truncated correlation functions, connected correlation functions, Ursell functions...
I would say that the $n$-th cumulant $\langle X_1,\ldots,X_n\rangle^{T}$
of random variables $X_1,\ldots,X_n$ measures the interaction of the variables
which is genuinely of $n$-body type.
By interaction I mean the opposite of independence. Denoting the expectation by $\langle\cdot\rangle$ as in statistical mechanics, independence implies the factorization
$$
\langle X_1\cdots X_n\rangle=\langle X_1\rangle\cdots\langle X_n\rangle\ .
$$
If the variables are Gaussian and centered then for instance
$$
\langle X_1 X_2 X_3 X_4\rangle=\langle X_1 X_2\rangle\langle X_3 X_4\rangle
+\langle X_1 X_3\rangle\langle X_2 X_4\rangle
+\langle X_1 X_4\rangle\langle X_2 X_3\rangle
$$
so the lack of factorization is due to $2$-body interactions: namely the absence of factorization
for $\langle X_i X_j\rangle$.
The $4$-th cumulant for variables with vanishing moments of odd order would be
the difference $\mathrm{LHS}-\mathrm{RHS}$ for the previous equation. Thus it would measure
the "interaction" between the four variables which is due to their conspiring all together
instead of being a consequence of conspiring in groups of two at a time.
For higher cumulants, the idea is the same.
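For a single centered variable with vanishing odd moments, this difference collapses to the familiar formula $\kappa_4=\mu_4-3\mu_2^2$ (the numerator of excess kurtosis). A minimal numerical sketch (the function name is my own):

```python
def kappa4_centered(mu2, mu4):
    # Fourth cumulant of a centered variable with vanishing odd moments:
    # the moment mu4 minus the three Gaussian-style pair factorizations mu2*mu2.
    return mu4 - 3 * mu2**2

# Standard Gaussian: mu2 = 1, mu4 = 3, so kappa_4 = 0 (no genuine 4-body part).
print(kappa4_centered(1.0, 3.0))   # → 0.0
# Uniform on [-1, 1]: mu2 = 1/3, mu4 = 1/5, so kappa_4 = -2/15.
print(kappa4_centered(1/3, 1/5))
```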

Cumulants are definitely related to connectedness. For instance, for variables whose joint probability density is a multiple of a Gaussian by a factor $\exp(-V)$ where $V$
is quartic, one can at least formally write moments as a sum of Feynman diagrams.
Cumulants are given by similar sums with the additional requirement that these diagrams or graphs must be connected.

A nice question, with probably many possible answers. I'll give it a shot. I think three phenomena should be noted.

i) The moment generating function $t \mapsto \mathbb E[e^{tX}]$ is essentially the Laplace transform of the probability distribution, and the cumulant function is its logarithm. Uniqueness of Laplace transforms then tells you that the cumulant function fully characterizes your probability distribution (and in particular its properties, like its connectivity or cohesion, whatever these might be). Since a probability distribution is essentially a measure, and it is often more convenient to work with functions, the Laplace transform is useful. As an example, all moments may be computed from the cumulant function, and probability distributions whose moments coincide are the same (under some extra conditions). The idea of transforming a probability distribution into a function is also exemplified by the Fourier transform of a probability distribution, i.e. the characteristic function $u \mapsto \mathbb E[e^{i u X}]$, with $u \in \mathbb R$. For this transform there is the well-known result that pointwise convergence of characteristic functions is equivalent to weak convergence (narrow convergence, from the analysis point of view) of the corresponding probability measures. See [Williams, Probability with Martingales].

ii) Sums of independent random variables. Their probability distributions are given by convolutions, and are thus hard to work with. In the Laplace/Fourier domain, convolution becomes multiplication and this difficulty disappears.
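For instance, the cumulant function of a Poisson variable is $g(t)=\lambda(e^t-1)$, and additivity under convolution is visible directly: summing independent Poissons just adds the $\lambda$'s. A small sketch (function name my own):

```python
import math

def poisson_cgf(lam, t):
    # Cumulant generating function of Poisson(lam): log E[e^{tX}] = lam * (e^t - 1)
    return lam * (math.exp(t) - 1.0)

# Additivity: Poisson(2) + Poisson(3) (independent) has the cgf of Poisson(5).
t = 0.7
print(poisson_cgf(2.0, t) + poisson_cgf(3.0, t))  # equals poisson_cgf(5.0, t)
```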

iii) The soft-max principle. This idea plays a key role in large deviations theory. Note that $\frac 1 t \log \mathbb E[e^{t X}] \rightarrow \mathrm{ess} \sup X$ as $t \rightarrow \infty$. Related terminology is the 'Laplace approximation of an integral' in physics (see here). Extensions of this idea, combined with a little convex optimization theory (in particular Legendre–Fenchel transforms), allow one to deduce estimates on the distribution of sums of (not necessarily independent) random variables. Consult e.g. the Gärtner–Ellis theorem in any textbook on large deviations theory (recommended are [Varadhan], [den Hollander] or [Dembo and Zeitouni]), or here. Again, this explains mostly why the cumulant is useful, but not really what it is.
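The soft-max limit is easy to see numerically for a variable with finitely many values. In this sketch (function name my own) $X$ is uniform on $\{0,1,2,3\}$, so the scaled cumulant function should creep up toward $\operatorname{ess\,sup} X = 3$:

```python
import math

def scaled_cgf(values, t):
    # (1/t) * log E[e^{tX}] for X uniform on the finite set `values`
    mean_exp = sum(math.exp(t * x) for x in values) / len(values)
    return math.log(mean_exp) / t

vals = [0, 1, 2, 3]
for t in (1, 10, 100):
    print(t, scaled_cgf(vals, t))  # increases toward ess sup X = 3
```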

The somewhat disappointing summary is that it seems from the above observations that the (log) cumulant function is mostly a technical device. But a very useful one.

Hopefully somebody else has a suggestion on how the cumulant function may be given a more intuitive meaning, perhaps even related to your suggestion of the cumulant function measuring cohesion of probability measures. I would certainly be interested in such an explanation.

It might help to take a broader perspective: in some contexts (notably quantum optics) the emphasis is not on cumulants but on factorial cumulants, with generating function $h(t)=\log E(t^X)$. While cumulants tell you how close a distribution is to a normal distribution, factorial cumulants tell you how close it is to a Poisson distribution (since factorial cumulants of order two and higher vanish for a Poisson distribution).
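The Poisson statement can be checked directly: for $X\sim\mathrm{Poisson}(\lambda)$ one has $E(t^X)=e^{\lambda(t-1)}$, so $h(t)=\lambda(t-1)$ is linear in $t$ and every factorial cumulant beyond the first vanishes. A numerical sketch (function names my own; the series is truncated at `nmax` terms):

```python
import math

def poisson_pgf(lam, t, nmax=200):
    # E[t^X] for X ~ Poisson(lam), summing the series term by term
    # (terms are built incrementally to avoid huge factorials).
    term = math.exp(-lam)  # P(X = 0) * t^0
    total = term
    for n in range(1, nmax + 1):
        term *= lam * t / n
        total += term
    return total

def poisson_factorial_cgf(lam, t):
    # h(t) = log E[t^X]; for Poisson(lam) this should equal lam * (t - 1).
    return math.log(poisson_pgf(lam, t))

print(poisson_factorial_cgf(2.0, 1.5))  # ≈ 2 * (1.5 - 1) = 1.0
```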

So I would think that any privileged role of cumulants is linked to the prevalence of normal distributions.


The cumulants beyond the second are all zero for a normal distribution. So intuitively they measure deviations from normality. In more detail, V. V. Petrov, Sums of Independent Random Variables (Springer-Verlag, 1975) has estimates of the approach to normality in central (actually local) limit theorems, and they involve cumulants.

In statistical physics there are often sums of almost independent variables, for example displacements in a diffusion process. In this context they are called Burnett coefficients, and correspond to a diffusion equation with higher derivative terms. See for example H. van Beijeren, Rev. Mod. Phys. 54, 195-234 (1982); R. Steinigeweg and T. Prosen, Phys. Rev. E 87, 050103(R) (2013).

$\begingroup$Thank you for this. In what sense are they a "natural" measure of deviation from normality, more so than the moment generating function itself? What does "deviation from normality" mean, precisely?$\endgroup$
– Daniel Moskovich, Oct 14 '13 at 12:23


$\begingroup$@Daniel Clearly there is a 1:1 correspondence between moments and cumulants - they convey the same information. The naturalness is what you have already mentioned, the additivity, and the fact they are zero exactly at an interesting limit, the normal distribution.$\endgroup$
– user25199, Oct 14 '13 at 12:47

$\begingroup$Yes- I made a silly comment without thinking things through. Now I understand your answer better- thanks! I'll also have a look at the book by Petrov which you recommended.$\endgroup$
– Daniel Moskovich, Oct 14 '13 at 13:25

Suppose you have $N$ billiard balls on a pool table. If $N$ is not too large, the collisions (hence correlations) will mostly involve two balls at a time. However, if you add more balls to the table, you will start seeing new types of collisions where three, four and more balls hit each other at the same time (one ball hitting the second and then the third still falls under the category of two-ball collisions).

The $n$-th cumulant quantifies the probability of $n$ balls bumping into each other at the same time at the same point. The fourth cumulant can be shown with a diagram in which all four balls are joined at a single vertex.

Cumulants are also called "connected" correlation functions, and this is what such a diagram depicts.

The $n$-th moment quantifies the probability of $n$ balls bumping into each other at the same time at the same point, as well as of some balls hitting only some other ones. For example, the fourth moment includes four-ball correlations (as in the connected diagram) as well as three-, two- and one-ball correlations (the last one is the "mean"). The fourth moment will include, but is not limited to, three disconnected four-ball diagrams, each constructed from two connected two-ball diagrams.

These diagrams are "disconnected;" they contribute to the fourth moment, but they do not contribute to the fourth cumulant.

So, moments quantify correlations in general (both connected and disconnected), whereas cumulants quantify only the direct, simultaneous correlations (connected). For example, consider three balls. One ball hitting the second but not the third will contribute to the third moment, but not to the third cumulant.
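This connected/disconnected bookkeeping can be made literal: the $n$-th moment is a sum over all set partitions of $\{1,\dots,n\}$ of products of cumulants, one factor per block. A sketch (function names my own); with only the second cumulant nonzero (a centered Gaussian), the surviving partitions are exactly the pairings, recovering $m_4=3$ and $m_6=15$.

```python
import math

def set_partitions(s):
    # Generate all partitions of the list s into nonempty blocks.
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):          # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part              # or give `first` its own block

def moment_from_cumulants(n, kappa):
    # m_n = sum over set partitions of {1..n} of the product over blocks
    # of kappa[|block|]; disconnected pieces appear as multi-block partitions.
    return sum(
        math.prod(kappa[len(block)] for block in p)
        for p in set_partitions(list(range(n)))
    )

# Centered Gaussian: only kappa_2 = 1 survives, so only pair partitions contribute.
gauss = {1: 0, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0}
print(moment_from_cumulants(4, gauss))  # → 3   (the three pairings)
print(moment_from_cumulants(6, gauss))  # → 15
```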

(Warning: This explanation includes oversimplifications for pedagogical purposes. Here I used simultaneity for colliding billiard balls (contact interaction); however, simultaneity is not a necessary condition in general.)

$\begingroup$Although I am not a mathematician, I found this explanation to be quite intuitive. Incidentally, it is the main reason (or explains) why cumulants are used in SOFI imaging instead of other correlation measures. I ended up here trying to figure out the significance of cumulants in SOFI :)$\endgroup$
– Kris, Sep 14 '15 at 16:17

$\begingroup$Actually, is there some kind of resource that somehow explains why this is so? One of the previous answers also goes into the difference between the "simultaneous" correlation and pairwise ones...$\endgroup$
– Kris, Sep 14 '15 at 16:27