Information Entropy and Experiments

December 24, 2007

There’s a new paper out (arXiv:0712.3572) which aims to provide a “figure of merit” for proposed experimental programs. It revolves around the concept of information entropy – an old concept from communication/information theory developed by Claude Shannon.

The basics of entropy: a communicated “symbol” – a letter or a word of a text, for example – carries information content that increases as it becomes less likely (more surprising). Intuitively this makes sense: if you know exactly what you’re going to hear (say, an airline safety announcement), you tune out because there’s no information transfer, while you pay the most attention when you can’t anticipate what’s next. Mathematically, the information content of a received symbol with a probability $p$ of occurring is $-\log p$. Note that this sweeps the meaning of “information” into $p$; a string of digits may seem completely random (and thus each one has information $-\log(1/10) = \log 10$), but if you know it happens to be the expansion of a known constant starting from the 170th decimal place, suddenly you can predict all the digits and the information content is essentially zero.
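As a minimal sketch of the information content of a single symbol: the base-2 logarithm (so the answer comes out in bits) and the function name `surprisal_bits` are my own choices for illustration, not anything taken from the paper.

```python
import math

def surprisal_bits(p: float) -> float:
    """Information content log2(1/p), in bits, of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.log2(1.0 / p)

# A fully anticipated symbol (the safety announcement) versus
# one of ten equally likely decimal digits.
certain = surprisal_bits(1.0)
random_digit = surprisal_bits(0.1)
print(f"certain symbol: {certain:.2f} bits, random digit: {random_digit:.2f} bits")
```

The anticipated symbol carries zero bits, while a genuinely unpredictable decimal digit carries about 3.3 bits. If you learn that the digits come from a known sequence, the relevant probability becomes 1 and the content drops to zero.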

What we would like to get is an expectation value (average) of the transmitted information: you’d like to transmit the maximum content per symbol. The expectation value – entropy – is

$$H = -\sum_i p_i \log p_i$$

The logarithm factor means that transmitting an occasional highly unlikely symbol is less useful than symbols which appear at roughly equal rates – for two symbols, you get more entropy out of both appearing with a 50% probability than one at 99% and the other at 1%.
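The two-symbol comparison can be checked numerically. This is only a sketch: it assumes the standard Shannon form $H = -\sum_i p_i \log_2 p_i$ with a base-2 logarithm, and `entropy_bits` is a name I made up for the example.

```python
import math

def entropy_bits(probs):
    """Shannon entropy -sum(p * log2(p)) in bits; zero-probability terms contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

balanced = entropy_bits([0.5, 0.5])    # both symbols equally likely
lopsided = entropy_bits([0.99, 0.01])  # one dominant symbol, one rare one
print(f"50/50: {balanced:.3f} bits, 99/1: {lopsided:.3f} bits")
```

The balanced case gives exactly 1 bit per symbol. The 99/1 case gives roughly 0.08 bits, because the rare symbol's large information content is weighted by its small probability.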

How does this relate to physics experiments? The author suggests that the proper figure of merit for an experiment (or analysis) is the expected information gain from it – or, perhaps, the information per dollar. The symbols are replaced by outcomes, like “observation/nonobservation of the Standard Model Higgs boson.” The function $p$ is obtained from our a priori theoretical biases, so for example “confirmation of Standard Model” or “discovery of low-scale supersymmetry” carry relatively high probabilities.

This leads to results he considers at odds with conventional wisdom – for example, the search for single top production, a well-predicted Standard Model process that everyone expects to be there, has low entropy (since there’s one large and one small probability), while a low-energy muon decay experiment which has good sensitivity to supersymmetry has high entropy (people think SUSY has a reasonable chance of being realized).

There’s an additional wrinkle that in general you get more entropy by having more symbols/results (in this case the log factor helps you); so the more possible outcomes an experiment has, the more information content you expect. In particular this means that global analyses of the author’s VISTA/SLEUTH type, where you try to test as many channels as possible for departures from the Standard Model, get a boost over dedicated searches for one particular channel.
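To see how the number of outcomes enters, here is a small sketch for the idealized case where all $N$ outcomes are equally likely, which is the most favorable case for a given $N$. The helper `entropy_bits` and the choice of base-2 logarithms are assumptions of the example, not the paper's notation.

```python
import math

def entropy_bits(probs):
    """Shannon entropy -sum(p * log2(p)) in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Entropy of N equally likely outcomes for a few values of N.
uniform_entropy = {n: entropy_bits([1.0 / n] * n) for n in (2, 4, 8, 16)}
for n, h in uniform_entropy.items():
    print(f"{n:2d} equally likely outcomes: {h:.1f} bits")
```

With equal probabilities the entropy is $\log_2 N$, so it grows by one bit each time the number of distinguishable outcomes doubles. A real analysis with many channels will not have equal probabilities, so this is an upper bound rather than an estimate.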

It’s an interesting and thought-provoking paper, although I have a few concerns. The main one is that the probabilities are shockingly Bayesian: they are entirely driven by current prejudice (unlike the usual case in communication theory, where things are frequentist).

Recall that there’s not much entropy in experiments which have one dominantly probable outcome. On the other hand, should an extremely unlikely outcome be found, the information content of that result is large. (The author determines the most significant experimental discoveries in particle physics since the start of the 70s to be those of the τ and J/ψ. I think this implies that Mark I was the most important experiment of the last four decades.) We are thus in the paradoxical situation that the experiments that produced the most scientific content, by this criterion, are also the ones with the least a priori entropy. The J/ψ was discovered at experiments that weren’t designed specifically to search for it!
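The paradox is easy to put in numbers. In the sketch below the 99%/1% prior is a made-up illustration, not a figure from the paper; the point is only the gap between the expected information before running the experiment and the information actually delivered when the long shot comes in.

```python
import math

# Hypothetical prior: 99% "nothing unexpected", 1% "surprising discovery".
p_expected, p_surprise = 0.99, 0.01

# A priori entropy: the average information you expect to gain.
prior_entropy = -(p_expected * math.log2(p_expected)
                  + p_surprise * math.log2(p_surprise))

# Information actually gained if the unlikely outcome occurs.
realized_if_surprise = math.log2(1.0 / p_surprise)

print(f"a priori entropy: {prior_entropy:.2f} bits")
print(f"information if the surprise happens: {realized_if_surprise:.2f} bits")
```

Under these assumed numbers the experiment is worth about 0.08 bits in advance, but delivers about 6.6 bits if the unlikely result shows up.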

How does one compare merit between experiments? We hope the LHC can provide more than a binary yes/no on supersymmetry, for example; if it exists, we would try to measure various parameters, and this would be much more powerful than rare decay experiments that would essentially have access to one or two branching fractions. The partitioning of the space of experimental outcomes has to be correctly chosen for the entropy to be computed, and the spaces for two different experiments may be totally incommensurable. (It’s a bit simpler if you look at everything through “beyond the Standard Model” goggles; with those on, your experiment either finds new physics, or it doesn’t.)

My last major complaint is that the (practical) scientific merit of certain results may be misstated by this procedure (though this is a gut feeling). The proposed metric may not really account for how an experiment’s results fit into the larger picture. Certain unlikely results – the discovery of light Higgs bosons in Υ decays, electroweak-scale quantum gravity, or something similar – would radically change our theoretical biases, and hence expectations for other experiments. This is a version of the digit problem above; external information can alter your function $p$ in unanticipated ways. It’s unclear to me whether this can be handled in a practical manner, though I can’t claim to be an expert in this statistical realm.
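A toy version of this concern: take two experiments A and B whose outcomes are correlated in our prior, and compare B's entropy before and after A reports an unlikely signal. The joint probabilities, the outcome labels, and both helper functions are invented for this sketch; the paper does not supply them.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical joint prior over (outcome of experiment A, outcome of experiment B).
joint = {
    ("A_null", "B_null"): 0.90,
    ("A_null", "B_signal"): 0.05,
    ("A_signal", "B_null"): 0.01,
    ("A_signal", "B_signal"): 0.04,
}

def b_distribution(joint, given_a=None):
    """Distribution of B's outcomes, optionally conditioned on A's outcome."""
    weights = {}
    for (a, b), p in joint.items():
        if given_a is None or a == given_a:
            weights[b] = weights.get(b, 0.0) + p
    total = sum(weights.values())
    return [w / total for w in weights.values()]

before = entropy_bits(b_distribution(joint))
after = entropy_bits(b_distribution(joint, given_a="A_signal"))
print(f"entropy of B before A reports: {before:.2f} bits")
print(f"entropy of B after A sees a signal: {after:.2f} bits")
```

With these invented numbers, B's entropy moves from about 0.44 bits to about 0.72 bits once A's surprising result is known, so a figure of merit computed for B in isolation would have been out of date.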

In short: interesting idea, but I would be wary of suggesting that funding agencies use it quite yet.

This seems sort of like “risk analysis” to me: taking a bunch of information and smushing it into a number in a somewhat arbitrary way, when one could probably draw better conclusions by just looking at all the information in unsmushed form.

Anyway, how in the world do you assign likelihoods to different varieties of new physics? Take a poll? Of course I’m sure that the powers that be do consider rough likelihoods when designing and funding experiments, but it seems like they should be careful about turning these guesses into hard numbers.

Of course, the author of the paper has probably sat on a lot of review committees, and I have not.

I’d agree that priors can be hard to estimate (I’ve heard some pretty funny jokes in the safety announcements on small commuter flights). There is also a bit of “expect the unexpected” about it, as your Mark I paradox shows.

On your final point, I think the “larger picture” problem is severe. The paper depends on the validity of the claim that how much is learned from a result is equivalent to how surprising the particular result is. I don’t think that holds in general, and the results of the paper are only valid in the domain where that equivalence does hold, leading to the bias towards discovery experiments and the problems with finding a FoM for precision measurements that have been noted by Michael Schmitt over on Collider Blog.