An Entropy Challenge

If you like computer calculations, here’s a little challenge for you. Oscar Dahlsten may have solved it, but we’d love for you to check his work. It’s pretty important for the foundations of thermodynamics, but you don’t need to know any physics or even anything beyond a little algebra to tackle it! First I’ll explain it in really simple terms, then I’ll remind you a bit of why it matters.

We’re looking for two lists of nonnegative numbers, of the same length, listed in decreasing order:

$$p_1 \ge p_2 \ge \cdots \ge p_n, \qquad q_1 \ge q_2 \ge \cdots \ge q_n$$

that sum to 1:

$$p_1 + \cdots + p_n = 1, \qquad q_1 + \cdots + q_n = 1$$

and that obey this inequality:

$$\frac{1}{1-\beta} \ln \left( p_1^\beta + \cdots + p_n^\beta \right) \;<\; \frac{1}{1-\beta} \ln \left( q_1^\beta + \cdots + q_n^\beta \right)$$

for all $\beta \ge 0$ (ignoring $\beta = 1$), yet do not obey these inequalities:

$$p_1 + \cdots + p_k \;\ge\; q_1 + \cdots + q_k$$

for all $1 \le k \le n$.

Oscar’s proposed solution is this:

Can you see if this works? Is there a simpler example, like one with lists of just 3 numbers?

This question came up near the end of my post More Second Laws of Thermodynamics. There I phrased the question with a bit more jargon, and said a lot more about its significance. Suppose we have two probability distributions on a finite set, say $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$. We say $p$ majorizes $q$ if

$$p_1 + \cdots + p_k \;\ge\; q_1 + \cdots + q_k$$

for all $1 \le k \le n$, when we write both lists of numbers in decreasing order. This means $p$ is ‘less flat’ than $q$, so it should have less entropy. And indeed it does: not just for ordinary entropy, but also for Rényi entropy! The Rényi entropy of $p$ is defined by

$$H_\beta(p) = \frac{1}{1-\beta} \ln \sum_{i=1}^n p_i^\beta$$

where $0 \le \beta < 1$ or $1 < \beta < \infty$. We can also define Rényi entropy for $\beta = 1$ and $\beta = \infty$ by taking limits, and at $\beta = 1$ we get the ordinary entropy

$$H(p) = -\sum_{i=1}^n p_i \ln p_i$$

The question is whether majorization is more powerful than Rényi entropy as a tool to tell when one probability distribution is less flat than another. I know that if $p$ majorizes $q$, its Rényi entropy is less than that of $q$ for all $\beta$. Your mission, should you choose to accept it, is to show the converse is not true.
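For readers who want to experiment, the two notions above translate directly into code. Here is a minimal Python sketch (the function names and the floating-point tolerance are my own choices, not from the post):

```python
import math

def renyi_entropy(p, beta):
    """Rényi entropy H_beta(p) = ln(sum_i p_i^beta) / (1 - beta), for beta != 1.

    Zero entries are dropped: for beta > 0 they contribute nothing,
    and at beta = 0 this gives the max-entropy, ln(size of support).
    """
    assert beta != 1
    return math.log(sum(x ** beta for x in p if x > 0)) / (1 - beta)

def majorizes(p, q):
    """True if p majorizes q: every partial sum of p (sorted in
    decreasing order) is at least the corresponding partial sum of q."""
    p, q = sorted(p, reverse=True), sorted(q, reverse=True)
    cp = cq = 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        if cp < cq - 1e-12:  # small tolerance for floating point
            return False
    return True
```

For instance, $(1, 0)$ majorizes the flat distribution $(\tfrac12, \tfrac12)$, and correspondingly has smaller Rényi entropy at every $\beta$.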

That looks convincing—thanks a lot, David! For those who haven’t thought about this at all: we want the function graphed here to be negative for $\beta < 1$ and positive for $\beta > 1$.

By the way, there are simple formulas for the Rényi entropy at $\beta = 0$ and $\beta = \infty$, which allow us to understand exactly what’s going on at these points. Oscar used them to construct his example. At $\beta = \infty$ we get the min-entropy $-\ln \max_i p_i$, and at $\beta = 0$ we get the max-entropy, which is the logarithm of the number of $i$ such that $p_i \ne 0$.

Since both of Oscar’s probability distributions have the same max-entropy, the curve you graphed approaches zero as $\beta \to 0$, just as it looks.

It approaches a certain negative value as $\beta \to \infty$, though the eye could be fooled into thinking it approaches zero.

Oscar was asking if this behavior—all Rényi entropies of $p$ smaller than those of $q$, but $p$ not majorizing $q$—is unusual. Maybe without much extra work you could figure out roughly what ‘percentage’ of pairs $(p,q)$ displays this behavior, say for probability distributions on a 3-element set?

If you’ve already written code to decide if a given pair of probability distributions displays this behavior, then you just need to

1) decide what measure you’ll use to define the ‘percentage’ of pairs of probability distributions that display a given behavior,

2) figure out how to estimate that percentage, either using Monte Carlo estimation or some other procedure.
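Putting the two steps together, here is a rough Monte Carlo sketch in Python. It assumes the uniform (Lebesgue) measure on the triangle, and uses a finite grid of $\beta$ values as a stand-in for ‘all $\beta$’; the grid, seed, and trial count are arbitrary choices of mine.

```python
import math
import random

def renyi(p, beta):
    # Rényi entropy for beta != 1, assuming all entries of p are positive
    return math.log(sum(x ** beta for x in p)) / (1 - beta)

def majorizes(p, q):
    # True if every partial sum of p (sorted decreasing) dominates q's
    p, q = sorted(p, reverse=True), sorted(q, reverse=True)
    cp = cq = 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        if cp < cq - 1e-12:
            return False
    return True

def random_distribution():
    # Uniform measure on the triangle: two uniform cuts of [0,1]
    # yield three gap lengths summing to 1
    x, y = sorted((random.random(), random.random()))
    return (x, y - x, 1 - y)

random.seed(0)
betas = [b / 20 for b in range(1, 101) if b != 20]  # grid in (0, 5], skipping beta = 1
trials = 20000
renyi_less = counterexamples = 0
for _ in range(trials):
    p, q = random_distribution(), random_distribution()
    if all(renyi(p, b) < renyi(q, b) for b in betas):
        renyi_less += 1
        if not majorizes(p, q):
            counterexamples += 1
print(renyi_less, counterexamples)
```

The ratio `counterexamples / renyi_less` then estimates how often the Rényi inequalities hold without majorization.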

You can think of a probability distribution on a 3-element set as a triple

$$p = (p_1, p_2, p_3)$$

such that

$$p_1, p_2, p_3 \ge 0$$

and

$$p_1 + p_2 + p_3 = 1$$

The space of these is an equilateral triangle. Since the problem involves majorization and (implicitly) convex-linear maps, the symmetries are just the obvious symmetries of the triangle. So, it makes some sense to just use the obvious measure on this triangle, which is proportional to Lebesgue measure $dp_1 \, dp_2$.

So for Monte Carlo, you can just randomly pick $x \in [0,1]$ in a uniformly distributed way, then randomly pick $y \in [0,1]$ in a uniformly distributed way, then let $(p_1, p_2, p_3)$ be the lengths of the three pieces into which $x$ and $y$ cut the unit interval.

As you know, there’s also another measure on this equilateral triangle, coming from the Fisher information metric. That would give a different answer!

Here are some simple computations. For 10,000,000 randomly selected pairs of points in the simplex (n = 3), 826,289 pairs had both p and q decreasing; of these, 360,420 satisfied the Rényi inequality, and of those, 325,013 had p majorizing q. So the desired proportion is ~10%.

Here is one such pair if anyone wants to verify (the Rényi inequality holds for all $\beta$, and $p$ majorizes $q$):
(0.702035380032679, 0.25269600971504746, 0.04526861025227358)
(0.6012357521664494, 0.32989515938159686, 0.06886908845195372)
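A quick numerical check of this pair is easy to run; here is a sketch of mine, with an arbitrary grid of $\beta$ values and a small floating-point tolerance:

```python
import math

p = (0.702035380032679, 0.25269600971504746, 0.04526861025227358)
q = (0.6012357521664494, 0.32989515938159686, 0.06886908845195372)

def renyi(dist, beta):
    # H_beta = ln(sum_i x_i^beta) / (1 - beta), valid for beta != 1
    return math.log(sum(x ** beta for x in dist)) / (1 - beta)

# Check H_beta(p) < H_beta(q) on a grid of beta values in (0, 5], skipping beta = 1
betas = [k / 100 for k in range(1, 501) if k != 100]
renyi_holds = all(renyi(p, b) < renyi(q, b) for b in betas)

# Check that p majorizes q: partial sums of p dominate those of q
# (both triples are already listed in decreasing order)
partial_p = [sum(p[:k]) for k in range(1, 4)]
partial_q = [sum(q[:k]) for k in range(1, 4)]
p_majorizes_q = all(a >= b - 1e-12 for a, b in zip(partial_p, partial_q))

print(renyi_holds, p_majorizes_q)
```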

To me, it meant that there must be at least one $k$ for which this inequality does not hold.

In words, the constraint is that the cumulative sum of $p$ must be strictly smaller than that of $q$. And actually, this cannot be true for all $k$, because clearly, when $k = n$, then $p_1 + \cdots + p_n = q_1 + \cdots + q_n = 1$ (though this is a bit of a degenerate case).

Yes, as soon as the inequality does not hold for a given $k$, we cannot say that $p$ majorizes $q$.

So, we have here two distributions $p$ and $q$. The Rényi entropy of $p$ is always smaller than that of $q$, yet $p$ does not majorize $q$, so John’s conjecture is true: “majorizes” is not equivalent to “has smaller Rényi entropy” but is in fact a strictly stronger statement.

To me, it meant that there must be at least one k for which this inequality does not hold.

Right. I was asking for examples where these inequalities fail for at least one $k$. ‘Majorization’ occurs when these inequalities hold for all $k$. I was looking for a probability distribution that has all its Rényi entropies less than another’s, yet fails to majorize it.

John, I suspect this could well turn into a memorable programming languages-and-implementations competition. The core of it would remain your exact problem statement. Maybe one or more of the early competitors could be seduced into implementing the infrastructure site? I guess you were unconsciously jealous of the attention London got for the Olympic Games :)

If you get a collection of code solving this problem, this site would appreciate the donation if possible: http://rosettacode.org/wiki/Rosetta_Code
“Rosetta Code is a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different”

Taking $\beta < 0$ would make the entropy undefined for probability distributions that vanish at some points, and it would probably have many other evil features as well. For example, low-probability events would contribute enormously to the entropy.

In a certain analogy to thermodynamics, which I used to relate Rényi entropy to free energy, $\beta < 0$ would correspond to temperatures below absolute zero. At negative temperatures, a system is more likely to be found in states of high energy than states of low energy. Negative temperatures are not completely absurd, but they only make sense for systems with an upper bound on their energy—just as positive temperatures only make sense for systems with a lower bound on their energy.

• G.P. Patil and C. Taillie, Diversity as a concept and its measurement, Journal of the American Statistical Association 77 (1982), 548-561.

For instance, Definition 5.1 defines what it means for one community to be “intrinsically more diverse” than another, in terms of transferring abundance from a more common species to a rarer one. That seems very close to majorization (which appears in Theorem 5.1).

To my knowledge, this is the first ever paper where the definition of “Tsallis” entropy is written down (some years before Tsallis, needless to say). That’s at the end of Section 3.2. It’s not merely written down, but developed. They notice that these entropies are related to the Hill numbers and Rényi entropies by increasing transformations (Section 3.3).

I don’t know whether they address the exact question that you’re asking here, but it seems quite plausible.
