
December 14, 2017

Entropy Modulo a Prime

Posted by Tom Leinster

In 1995, the German geometer Friedrich Hirzebruch retired, and a private booklet was put together to mark the occasion. That booklet included a short note by Maxim Kontsevich entitled “The 1½-logarithm”.

Kontsevich’s note didn’t become publicly available until five years later, when it was included as an appendix to a paper on polylogarithms by Philippe Elbaz-Vincent and Herbert Gangl. Towards the end, it contains the following provocative words:

Conclusion: If we have a random variable ξ which takes finitely many values with all probabilities in ℚ then we can define not only the transcendental number H(ξ) but also its “residues modulo p” for almost all primes p!

Kontsevich’s note was very short and omitted many details. I’ll put some flesh on those bones, showing how to make sense of the sentence above, and much more.

The “H” that Kontsevich uses here is the symbol for entropy — or
more exactly, Shannon entropy. So, I’ll begin by recalling what that is.
That will pave the way for what I really want to talk about, which is
a kind of entropy for probability distributions where the “probabilities”
are not real numbers, but elements of the field ℤ/pℤ of integers modulo a
prime p.

Let π = (π_1, …, π_n) be a finite probability distribution.
(It would be more usual to write a probability distribution as p, but I
want to reserve that letter for prime numbers.) The
entropy of π is

H_ℝ(π) = −∑_{i : π_i ≠ 0} π_i log π_i.

Usually this is just written as H, but I want to emphasize the role of
the real numbers here: both the probabilities π_i and the entropy
H_ℝ(π) belong to ℝ.
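In code, the definition is nearly a one-liner. Here is a minimal Python sketch (the function name is mine, and I’ve used the natural logarithm; any fixed base would only change the answer by an overall constant):

```python
from math import log

def shannon_entropy(pi):
    """Shannon entropy H_R(pi) = -sum_i pi_i * log(pi_i), in nats.

    Terms with pi_i = 0 are omitted, by the convention 0 log 0 = 0.
    """
    return -sum(x * log(x) for x in pi if x > 0)

# The uniform distribution on n elements has entropy log(n):
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ≈ 1.3863

# A certain outcome carries no information:
assert shannon_entropy([1.0]) == 0.0
```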

There are applications of entropy in dozens of branches of science… but
none will be relevant here! This is purely a mathematical story, though if
anyone can think of any possible application or interpretation of entropy
modulo a prime, I’d love to hear it.

The challenge now is to find the correct analogue of entropy when the
field ℝ is replaced by the field ℤ/pℤ of
integers mod p, for any prime p. So, we want to define a kind of
entropy

H_p(π_1, …, π_n) ∈ ℤ/pℤ

when π_i ∈ ℤ/pℤ.

We immediately run into an obstacle. Over ℝ, probabilities are
required to be nonnegative. Indeed, the logarithm in the definition of
entropy doesn’t make sense otherwise. But in ℤ/pℤ,
there is no notion of positive or negative. So, what are we even going to
define the entropy of?

Let’s set that question aside for now, and try to write down the analogous formula over
ℤ/pℤ.

The immediate question is: what should play the role of the logarithm mod p?

The crucial property of the ordinary logarithm is that it converts
multiplication into addition. Specifically, we’re concerned here with
logarithms of nonzero probabilities, and log
defines a homomorphism from the multiplicative monoid (0, 1] of nonzero
probabilities to the additive group ℝ.

Mod p, then, we want a homomorphism from the multiplicative group
(ℤ/pℤ)^× of nonzero probabilities to the additive
group ℤ/pℤ. And here we hit another obstacle: a simple argument using Lagrange’s
theorem shows that apart from the zero map, no such homomorphism exists.
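(The Lagrange argument in one line: every x in (ℤ/pℤ)^× satisfies x^{p−1} = 1, so any homomorphism f has (p − 1)·f(x) = 0 in ℤ/pℤ; since p − 1 is invertible mod p, f(x) = 0.) For a small prime this can also be confirmed by brute force; here is a Python sketch, with p = 5 as my choice of example:

```python
from itertools import product

p = 5
units = [1, 2, 3, 4]  # the multiplicative group (Z/5Z)^x

# Enumerate all functions f: units -> Z/5Z and keep the homomorphisms,
# i.e. those with f(xy mod p) = f(x) + f(y) mod p for all x, y.
homs = []
for values in product(range(p), repeat=len(units)):
    f = dict(zip(units, values))
    if all(f[(x * y) % p] == (f[x] + f[y]) % p for x in units for y in units):
        homs.append(f)

print(homs)  # only the zero map survives
```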

So, we seem to be stuck. Actually, we’re stuck in a way that often happens when
you try to construct something new, working by analogy with something old:
slavishly imitating the old situation, symbol for symbol, often doesn’t
work. In the most interesting analogies, there are wrinkles.

To make some progress, instead of looking at the formula for entropy,
let’s look at the properties of entropy.

The most important property is a kind of recursivity. In the language
spoken by many patrons of the Café, finite probability distributions form
an operad. Explicitly, this means the following.

Suppose I flip a coin. If it’s heads, I roll a die, and if it’s tails, I
draw from a pack of cards. This is a two-stage process with 58 possible
final outcomes: either the face of a die or a playing card. Assuming that
the coin toss, die roll and card draw are all fair, the probability
distribution on the 58 outcomes is

(1/12, …, 1/12, 1/104, …, 1/104),

with 6 copies of 1/12 and 52 copies of 1/104.

In general, given a distribution γ = (γ_1, …, γ_n) and distributions π^1, …, π^n, the
composite γ ∘ (π^1, …, π^n) is the distribution of the two-stage process that first
chooses i according to γ and then chooses an outcome according to π^i; its probabilities
are the products γ_i π^i_j. For example, take the coin-die-card process above. Writing u_n for the
uniform distribution on n elements, the final distribution on 58
elements is u_2 ∘ (u_6, u_52), which I wrote out explicitly above.
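Spelled out in code, that composite looks like this (a Python sketch; the helper names are mine):

```python
from fractions import Fraction

def uniform(n):
    return [Fraction(1, n)] * n

def compose(gamma, pis):
    """Operadic composite gamma o (pi^1, ..., pi^n): distribute each
    probability gamma_i over the outcomes of pi^i."""
    return [g * q for g, pi in zip(gamma, pis) for q in pi]

# Coin, then die or cards: u_2 o (u_6, u_52)
dist = compose(uniform(2), [uniform(6), uniform(52)])
print(len(dist))         # 58 outcomes
print(dist[0], dist[6])  # 1/12 for each die face, 1/104 for each card
print(sum(dist))         # 1: still a probability distribution
```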

The important recursivity property of entropy is called the
chain rule, and it states that

H_ℝ(γ ∘ (π^1, …, π^n)) = H_ℝ(γ) + ∑_{i=1}^n γ_i H_ℝ(π^i)

for any distribution γ = (γ_1, …, γ_n) and distributions π^1, …, π^n.
It’s easy to check that this is true. (It’s also nice to understand it
in terms of information… but if I follow every tempting explanatory byway, I’ll run
out of energy too soon.) And in fact, it characterizes entropy almost
uniquely:

Theorem  Let I be a function assigning a real number I(π)
to each finite probability distribution π.
The following are equivalent:

I is continuous in π and satisfies the chain rule;

I = c H_ℝ for some constant c ∈ ℝ.

The theorem as stated is due to Faddeev, and I blogged about it earlier
this
year.
In fact, you can weaken “continuous” to “measurable” (a theorem of Lee),
but that refinement won’t be important here.
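As a quick numerical sanity check, here is the chain rule verified in Python for the coin-die-card example above (the helper names are mine):

```python
from math import log

def H(pi):
    """Shannon entropy, omitting zero probabilities."""
    return -sum(x * log(x) for x in pi if x > 0)

def compose(gamma, pis):
    """Operadic composite: distribute each gamma_i over pi^i."""
    return [g * q for g, pi in zip(gamma, pis) for q in pi]

def u(n):
    return [1.0 / n] * n

gamma, pis = u(2), [u(6), u(52)]

# Chain rule: H(gamma o (pi^1, ..., pi^n)) = H(gamma) + sum_i gamma_i H(pi^i)
lhs = H(compose(gamma, pis))
rhs = H(gamma) + sum(g * H(pi) for g, pi in zip(gamma, pis))
print(abs(lhs - rhs) < 1e-12)  # True
```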

What is important is this. In our quest to imitate real entropy in
ℤ/pℤ, we now have something to aim for. Namely: writing Π_n for the
set of probability distributions on n elements, we want a sequence of
functions H_p : Π_n → ℤ/pℤ
satisfying the obvious analogue of the chain rule. And if we’re really
lucky, there will be essentially only one such sequence.

We’ll discover that this is indeed the case. Once we’ve found the right
definition of H_p and proved this, we can very legitimately baptize H_p
as “entropy mod p” — no matter what weird and wonderful formula
might be used to define it — because it has the same characteristic
properties as entropy over ℝ.

I think I’ll leave you on that cliff-hanger. If you’d like to guess what
the definition of entropy mod p is, go ahead! Otherwise, I’ll tell you
next time.

Posted at December 14, 2017 11:00 PM UTC



Re: Entropy Modulo a Prime

As a number theorist, it seems wrong to me that the analogues of ℝ are the finite fields rather than the p-adic fields or p-adic integers.

An obvious advantage of working in the p-adics is that the logarithm is well defined automatically. Moreover, if we restrict our probabilities to take values in the p-adic integers, then the entropy is also automatically a p-adic integer and reduction makes sense.

I suppose the reduction of the p-adic entropy equals whatever you have in mind over the finite fields?

Re: Entropy Modulo a Prime

As a number theorist, it seems wrong to me that the analogues of ℝ are the finite fields rather than the p-adic fields or p-adic integers.

Well, we can dream of replacing ℝ with all sorts of things! Kontsevich’s note was specifically about entropy over the field with p elements. One can speculate about other finite fields (and maybe someone’s already figured it out). And I agree that ℚ_p and ℤ_p are also natural contexts.

I’m not sure I know what p-adic entropy is. A few times, I’ve browsed a 2006 paper of Deninger, and there’s a definition of p-adic entropy at the top of page 2. But that’s a definition of the p-adic entropy of a group action (an important special case being ℤ-actions, i.e. automorphisms), whereas here we’re talking about entropy of probability distributions. What’s the connection between the two?

Or to make it specific, how would you calculate (say) H_3(1, 2, 2, 2) using the 3-adic approach that you outline?

Re: Entropy Modulo a Prime

An obvious advantage of working in the p-adics is that the logarithm is well defined automatically.

The usual logarithm formula converges for π ∈ ℚ_p iff |1 − π| < 1, i.e. π ∈ ℤ_p^×. Because ℤ_p^× is a group under multiplication, ℤ_p^×-valued measures on finite spaces are closed under the operadic composition Tom describes. And the same calculations which show the chain rule holds in the usual case should show it also holds here.

I’m not sure what to think of the condition of being a probability measure in this context.

If we define a probability measure π to be one with ∑_k π(k) = 1, then there are no ℤ_p^×-valued probability measures on an n-element set unless n ≡ 1 mod p. This seems like a strange restriction.

Another option might be to define a probability measure π to be one such that |∑_k π(k)| = 1. With this definition, there are no ℤ_p^×-valued probability measures on n unless n is a unit mod p, and moreover in this case every ℤ_p^×-valued measure is a probability measure. That feels a little more natural.

Unfortunately for my feelings of “naturalness”, it appears that the first option, but not the second, is closed under operadic composition. One might go with a hybrid option where one simply defines a probability measure on a finite set to be a ℤ_p^×-valued measure on a finite set whose cardinality is 1 mod p.

Any of these options also starts to resemble the conditions for π to be an ℝ-valued probability measure: first, to be in the radius of convergence of the logarithm formula, the values π(k) should be positive [and less than 2], while to be a probability measure the π(k) should also add up to 1 (in absolute value, for the second or third option).

Re: Entropy Modulo a Prime

Just a few thoughts:

1. The p-adic logarithm is usually extended to all of ℚ_p by defining log p = 0. I don’t know if that is the right thing to do in this case.

2. The interval [0, 1] is not all that natural from a number-theoretic/adelic perspective. Maybe instead of working with a probability π, we can work with the odds π/(1 − π), which have range (0, ∞). That is a more natural thing to consider, since it is a connected subgroup of ℝ ∖ {0}.

3. At least in class field theory, the right analogues of this connected component are the p-adic numbers congruent to 1 modulo a fixed power of p. Once again, I don’t know if this is the right thing to look at in this context.

Re: Entropy Modulo a Prime

Note that one needs to specify the value of the extension of the logarithm not just on the uniformiser, but on Teichmüller lifts. At least according to Wikipedia, the Iwasawa logarithm simply ignores the Teichmüller lift, which means (it seems to me) that we don’t understand anything better about the finite-field logarithm by using this lift.

Re: Entropy Modulo a Prime

You say that the p-adic logarithm log(x) is defined if and only if |1 − x| < 1, with which I agree; but then you say that the domain of the p-adic logarithm is ℤ_p^×, and I think that you must mean 1 + pℤ_p. (If it were really defined on ℤ_p^×, then, assuming it behaved nicely in a sense to be obvious from the rest of this sentence, we’d have a reasonable candidate 𝔽_p^× → ℤ_p^× → ℤ_p → 𝔽_p for the finite-field logarithm.) (Sorry about those enormous multiplication signs; I can’t figure out how to get them reasonably sized.)

Re: Entropy Modulo a Prime

Re: Entropy Modulo a Prime

Huh; I could have sworn I tried exactly that (except that I usually omit braces around my single-letter arguments), and the post just stopped entirely immediately before the first backslash. Maybe I should have been using a filter other than “Markdown with itex to MathML”?

As a test: ℤ_p^×. Huh, looks like it’s the braces that make all the difference. Here it is (or, rather, isn’t) without:

Re: Entropy Modulo a Prime

… the observation was that the power series log(x) = −∑_n (1 − x)^n / n only converges to define a function for |1 − x| < 1; this is just as true in ℝ or ℂ as it is in ℤ_(p). One then makes reasonable choices to analytically extend the function so defined…

Re: Entropy Modulo a Prime

OK, so where have we got to? As I understand it, the current status is as follows:

I asked the question: how should we define the entropy of a “probability distribution mod p”, i.e., a finite tuple of elements of ℤ/pℤ summing to 1? (That entropy should also be an element of ℤ/pℤ.) I’ll answer the question in the next post, but I’m also interested in learning different approaches or viewpoints.
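To make the object in question concrete: here is a small Python model (mine, not from the post) of a probability distribution mod p as a tuple of residues summing to 1, together with a check that such tuples are closed under the operadic composition described in the post:

```python
p = 3

def is_distribution_mod_p(pi):
    """A probability distribution mod p: residues summing to 1 in Z/pZ."""
    return all(0 <= x < p for x in pi) and sum(pi) % p == 1

def compose(gamma, pis):
    """Operadic composite, with products taken mod p."""
    return [(g * q) % p for g, pi in zip(gamma, pis) for q in pi]

pi = [1, 2, 2, 2]  # sums to 7 = 1 mod 3: a distribution mod 3
gamma = [2, 2]     # sums to 4 = 1 mod 3

print(is_distribution_mod_p(pi))                        # True
print(is_distribution_mod_p(compose(gamma, [pi, pi])))  # True: closed under composition
```

Closure holds because the composite’s total is ∑_i γ_i · (∑_j π^i_j) = ∑_i γ_i · 1 = 1 mod p.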

Asvin mentioned something called “p-adic entropy”. I’d met that concept before, but only the concept of the p-adic entropy of a group action, not of anything like a probability distribution of p-adic numbers. I haven’t yet seen any literature on p-adic entropy of that kind (and would be glad to be pointed to some).

Asvin also pointed out that there’s such a thing as the p-adic logarithm. So, given π = (π_1, …, π_n) ∈ ℚ_p^n summing to 1, we could attempt to define the entropy of π by the usual formula

H(π) = −∑_{i : π_i ≠ 0} π_i log π_i,

but now interpreting the logarithm as a p-adic logarithm. However…

Tim pointed out that the power series defining the p-adic logarithm only converges in the disc of radius 1 about 1, and deduced that in this sense, the formula above is only well-defined if the cardinality of the support of π is congruent to 1 mod p. So, it’s usually not well-defined. But…

Asvin then observed that the p-adic logarithm can be extended to ℚ_p^× (or even ℂ_p^×) in such a way that the usual functional equation

log(zw) = log(z) + log(w)

holds everywhere and log(p) = 0. According to Wikipedia, this is called the Iwasawa logarithm.

So, we can use the Iwasawa logarithm to define the entropy H(π) of any finite tuple π = (π_1, …, π_n) ∈ ℂ_p^n summing to 1. Simply take Shannon’s formula (above) and interpret log as the Iwasawa logarithm. Then H(π) is also an element of ℂ_p.

What I’m not getting is how you could use this definition of p-adic entropy to obtain a definition of the entropy of a probability distribution in ℤ/pℤ. Asvin, could you explain? E.g. can we try the example of p = 3 and π = (1, 2, 2, 2)?