In statistical mechanics, the Boltzmann distribution gives the probability of a system being in state $i$ as

$$\displaystyle \frac{e^{- \beta E_i}}{\sum_j e^{-\beta E_j}}$$

where $E_i$ is the energy of state $i$. I have generally seen this demonstrated, starting with some reasonable physical assumptions, via a heat bath argument (as exposited e.g. by Terence Tao) involving interactions between the system and a larger external system. For me, an unsatisfying aspect of the heat bath argument is that it doesn't give me a strong reason to expect that a fundamental function like the exponential should appear at the end.

Here is what I think could be an argument which accomplishes that. By inspection, the Boltzmann distribution only depends on the relative energies of the different states. Under some mild assumptions this actually characterizes the Boltzmann distribution. Let us suppose there is a non-negative function $f(E)$ such that WLOG $f(0) = 1$ and such that the probability of a system being in state $i$ is

$$\displaystyle \frac{f(E_i)}{\sum_j f(E_j)}.$$

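Let us suppose that the system has two states, with energies $E_1$ and $E_2$. The statement that the distribution only depends on the relative energies says that the ratio $f(E_1)/f(E_2)$ is unchanged under $E_i \mapsto E_i + c$; taking $E_2 = 0$ and using $f(0) = 1$, this turns out to be equivalent to the functional equation $f(x + y) = f(x) f(y)$, which under any kind of continuity assumption whatsoever gives $f(x) = e^{ax}$ for some constant $a$ (the Boltzmann distribution is the case $a = -\beta$).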
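A quick numerical sanity check of the two-state argument, as a sketch: the weight functions below are hypothetical examples chosen for illustration. With an exponential weight the probabilities do not change when every energy is shifted by the same constant, while a non-exponential weight with $f(0)=1$ (here $1/(1+E^2)$, an arbitrary choice) fails the test.

```python
import math

def probabilities(weight, energies):
    """p_i = f(E_i) / sum_j f(E_j) for a weight function f."""
    w = [weight(e) for e in energies]
    total = sum(w)
    return [x / total for x in w]

def is_shift_invariant(weight, energies, shift, tol=1e-12):
    """True if adding `shift` to every energy leaves all probabilities unchanged."""
    p = probabilities(weight, energies)
    q = probabilities(weight, [e + shift for e in energies])
    return all(abs(a - b) < tol for a, b in zip(p, q))

beta = 0.7  # hypothetical inverse temperature

def exponential_weight(e):
    return math.exp(-beta * e)  # f(E) = e^{aE} with a = -beta

def lorentzian_weight(e):
    return 1.0 / (1.0 + e * e)  # hypothetical non-exponential weight with f(0) = 1

two_states = [0.0, 1.0]
print(is_shift_invariant(exponential_weight, two_states, 3.0))  # True
print(is_shift_invariant(lorentzian_weight, two_states, 3.0))   # False
```

A finite check like this cannot replace the functional-equation argument; it only exhibits the property that the argument isolates.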

Question 1: How can this argument be fleshed out? In particular, what physical principle would suggest that the Boltzmann distribution only depends on the relative energies of the states? (I seem to recall from my high-school physics lessons that energies are only well-defined up to an additive constant, but I would really appreciate some clarification on this issue.)

Question 2: How does this argument relate to the heat bath argument or the combinatorial argument given, for example, at Wikipedia?

(Motivation: some important functions in mathematics, like the Jones polynomial and various zeta functions, can be interpreted as partition functions of certain statistical-mechanical systems, and I am trying to sharpen my physical intuition about these constructions.)

Hi Qiaochu, have you read about the variational method before? I liked it. I wrote a post about this question because a friend of mine asked about it. Unfortunately it is written in Portuguese. If you want to take a look at this approach, the link is: leandromat.wordpress.com/2010/07/04/… It is a very basic text, written with the help of these books: Thermodynamic Formalism - David Ruelle; Entropy, Large Deviations, and Statistical Mechanics - Richard Ellis; Equilibrium States in Ergodic Theory - Gerhard Keller.
–
Leandro Jul 11 '10 at 21:40

5 Answers

Like Andreas, I find a maximum entropy argument to be intellectually appealing. However,
he says the solution can be found by Lagrange multipliers and I don't know the justification for using Lagrange multipliers. That is, in the space of all probability distributions on the particles, how do you know the maximum entropy solution is really accessible to variational methods?

Thanks! That paper was very helpful. Is it correct to say that the dependence on relative energies comes from the mean-energy constraint?
–
Qiaochu Yuan Jul 11 '10 at 22:24

The question is in part about the non-finite case, but even in the finite case how do you know in advance that the max. entropy distr. does not lie on the boundary of the convex space of prob. distr., where one of the particle probabilities is 0? You need to rule out the answer being located there to know that the answer by variational methods is max. over all the possibilities. (In fact the max. entropy distr. is on the boundary for a finite state space if you want the avg. energy to be min or max of the $E_i$'s, so there is something to show in the other "non-degenerate" situations.)
–
KConrad Jul 11 '10 at 22:32

Qiaochu, yes if you read Theorem 4.9 you will see there is a mean-energy constraint $\sum q_jE_j = \langle E\rangle$. That is the only condition imposed, along with the necessary condition that your choice of $\langle E\rangle$ has to lie in the closed interval between the min and max of the $E_i$'s. (If you want $\langle E\rangle$ to be outside that range then of course there's no answer.)
–
KConrad Jul 11 '10 at 22:36
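To make the role of the mean-energy constraint concrete, here is a minimal sketch (not taken from the paper KConrad cites) that finds the $\beta$ for which the Boltzmann distribution has a prescribed mean energy $\langle E\rangle$, by bisection. It relies on the standard fact that $\langle E\rangle_\beta$ is strictly decreasing in $\beta$ (its derivative is $-\mathrm{Var}_\beta(E)$), and it assumes $\langle E\rangle$ lies strictly inside the interval; at the endpoints the maximizer lies on the boundary of the simplex, as KConrad notes, and is only reached as $\beta\to\pm\infty$. The bracket $[-50, 50]$ and the energy levels are arbitrary assumptions.

```python
import math

def boltzmann(energies, beta):
    """e^{-beta E_i} / Z, computed stably by subtracting the largest exponent.

    Subtracting a constant from every exponent is harmless precisely because
    the distribution depends only on relative energies.
    """
    a = [-beta * e for e in energies]
    m = max(a)
    w = [math.exp(x - m) for x in a]
    z = sum(w)
    return [x / z for x in w]

def mean_energy(energies, beta):
    p = boltzmann(energies, beta)
    return sum(pi * e for pi, e in zip(p, energies))

def solve_beta(energies, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection for beta with <E>_beta = target.

    <E>_beta decreases strictly in beta, so the root is unique.
    Assumes min(E) < target < max(E) and that the root lies in [lo, hi].
    """
    if not min(energies) < target < max(energies):
        raise ValueError("target must lie strictly between min(E) and max(E)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid  # mean energy too high: increase beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

energies = [0.0, 1.0, 2.0]  # hypothetical energy levels
beta = solve_beta(energies, 0.8)
print(beta, mean_energy(energies, beta))
```

Shifting all energies and the target by the same constant returns the same $\beta$, consistent with the dependence on relative energies only.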

Since I asked a question in my answer about why the variational method is justifiable even though the space of prob. distributions has a boundary, I should add that the variational method does have the virtue of telling us what form the answer ought to be! A downside to the nonvariational proof in the link I give is that it doesn't explain where the family of Boltzmann distr. comes from. I see two parts: (1) variational methods tell us what kind of answer to expect and (2) we then need a proof taking the whole space, incl. the boundary, into account. I don't know how to do (2) variationally.
–
KConrad Jul 11 '10 at 22:49

Thanks, Steve. I don't think I have a clear understanding of why energy is only defined up to an additive constant. Do you know anywhere this issue is clarified?
–
Qiaochu Yuan Jul 12 '10 at 1:02

The equations of motion are always invariant under the transformation $U \mapsto U + \text{const}$ of any potential, since only the force $-\nabla U$ enters them. This is a fancy way of talking about the work-energy theorem.
–
Steve Huntsman Jul 12 '10 at 1:22


BTW, I always thought it was funny that Feynman (not to mention anyone else) never did this, especially given his observation about this invariance in his statistical physics lectures. See the footnote on page 3: books.google.com/…
–
Steve Huntsman Jul 12 '10 at 1:44


I wondered the same thing when I read that footnote, actually. (It's mildly annoying that Feynman didn't state a continuity hypothesis - I guess he didn't know about pathological solutions to the Cauchy functional equation.)
–
Qiaochu Yuan Jul 12 '10 at 1:56


Even if Feynman had known about them, he probably wouldn't have mentioned them—not really his style (even compared to other physicists) to let mathematical pathologies derail physical reasoning.
–
Steve Huntsman Jul 12 '10 at 2:34

For me the clearest derivation of the Boltzmann distribution is by maximizing the entropy $-\sum n_i \ln n_i$ under the constraints of constant total energy $\sum n_i E_i = \text{const.}$ and constant total particle number $N = \sum n_i = \text{const.}$ The Lagrange multiplier for the first constraint gives $\beta$. You can immediately see that a shift of the energies does not change the distribution: since $\sum n_i (E_i + c) = \sum n_i E_i + cN$ with $N$ fixed, the shifted energy constraint cuts out the same set of admissible $n_i$, so the maximizer is unchanged.
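The maximum-entropy claim can be probed numerically. The sketch below works with normalized probabilities $p_i = n_i/N$ instead of counts (an assumption that does not change the maximizer), takes a hypothetical $\beta$ and hypothetical energy levels, and checks that random perturbations preserving both constraints strictly lower the entropy. Because a perturbation $d$ with $\sum d_i = 0$ satisfies $\sum d_i(E_i + c) = \sum d_i E_i$, the same directions remain admissible after any energy shift. This is only a local check, not a proof.

```python
import math
import random

def entropy(p):
    """-sum p_i ln p_i, with 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def feasible_direction(energies, rng):
    """Random d with sum(d) = 0 and sum(d_i * E_i) = 0, via Gram-Schmidt projection."""
    n = len(energies)
    u1 = [1.0 / math.sqrt(n)] * n
    e_bar = sum(energies) / n
    v = [e - e_bar for e in energies]          # already orthogonal to u1
    norm_v = math.sqrt(sum(x * x for x in v))  # assumes the energies are not all equal
    u2 = [x / norm_v for x in v]
    d = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for u in (u1, u2):
        dot = sum(a * b for a, b in zip(d, u))
        d = [a - dot * b for a, b in zip(d, u)]
    return d

def max_entropy_check(energies, beta, trials=1000, seed=0):
    """True if every sampled constraint-preserving perturbation lowers the entropy."""
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)
    p = [x / z for x in w]
    s0 = entropy(p)
    rng = random.Random(seed)
    for _ in range(trials):
        d = feasible_direction(energies, rng)
        t = 0.5 * min(p) / max(abs(x) for x in d)  # keeps every q_i > 0
        q = [a + t * b for a, b in zip(p, d)]
        if entropy(q) >= s0:
            return False
    return True

energies = [0.0, 1.0, 2.0, 3.5]  # hypothetical levels (at least 3 for a nonzero d)
print(max_entropy_check(energies, beta=0.9))                      # True
print(max_entropy_check([e + 10.0 for e in energies], beta=0.9))  # True
```

For any $q$ with the same normalization and mean energy, $S(p) - S(q) = \sum_i q_i \ln(q_i/p_i) \ge 0$, which is why the check succeeds.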

That shifts the source of my confusion to what the rationale behind the definition of entropy is!
–
Qiaochu Yuan Jul 11 '10 at 21:34


Qiaochu, see Theorem 5.1 of the link I put in my answer for a justification of the formula for entropy (on finite sample spaces). Section 6 may also be interesting to you in terms of the relation between maximum entropy and invariance.
–
KConrad Jul 11 '10 at 21:43

This answer is just an expanded version of KConrad's answer. I am posting it here because this argument supports the variational method for a finite state space and also addresses KConrad's observation about a technical issue with boundary values in the variational approach.
Proposition: Suppose $\Omega$ is a nonempty finite set, $U:\Omega\to\mathbb R$ is a function, and $\mathcal M$ is the set of probability measures on $\Omega$. Let $h(\mu)=-\sum_{\omega\in\Omega}\mu(\{\omega\})\log\mu(\{\omega\})$ be the entropy of $\mu$ (with $0\log 0=0$) and $Z=\sum_{\omega\in\Omega}e^{-U(\omega)}$. Then
$$
\sup_{\mu\in\mathcal M} \left[ h(\mu)-\int_{\Omega} U d\mu \right]=\log Z
$$
moreover, the supremum is attained for the measure $\mu$ given by
$$
\mu(\{\omega\})=\frac{1}{Z}e^{-U(\omega)}.
$$
Proof:
Let $n$ be the cardinality of $\Omega$. Define the function $f:\mathbb R_+^n\to\mathbb R$ (with the convention $0\log 0=0$) by
$$
f(x_1,\ldots,x_n)=-\sum_{i=1}^n \Big[x_i\log x_i +K_ix_i\Big],
$$
where $K_i\in\mathbb R$ for all $i\in\{1,\ldots,n\}$. Consider the function $g:\mathbb R_+^n\to\mathbb R$ given by
$$
g(x_1,\ldots,x_n)=\sum_{i=1}^n x_i.
$$
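We fix an enumeration $\omega_1,\ldots,\omega_n$ of $\Omega$ and let $K_i=U(\omega_i)$. Identifying $\mu\in\mathcal M$ with the vector $(x_1,\ldots,x_n)$, where $x_i=\mu(\{\omega_i\})$, the optimization problem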
$$
\sup_{\mu\in\mathcal M} \left[ h(\mu)-\int_{\Omega} U d\mu \right]
$$
can be solved by finding a maximum of $f$ restricted to $g^{-1}(1)\cap\mathbb R^n_+$. Note that for any critical point $(x_1,\ldots,x_n)$ of $f$ restricted to $(0,\infty)^n\cap g^{-1}(1)$, it follows from the Lagrange multiplier theorem that
$$
\nabla f(x_1,\ldots,x_n)=\lambda \nabla g(x_1,\ldots,x_n)
$$
for some $\lambda\in\mathbb R$, i.e.,
$$
-(\log x_i +1+K_i)=\lambda, \ \ \ \text{for all}\ i=1,\ldots,n.
$$
So for any pair of indices $i,j\in\{1,\ldots,n\}$, we have
$$
\log x_i +K_i=\log x_j+K_j
$$
and taking exponentials gives
$$
x_ie^{K_i}=x_je^{K_j}.
$$
Using $\sum_{i=1}^nx_i=1$, we have
$$x_ie^{-K_i}=\left[1-\sum_{j\in \{1,\ldots,n\}\backslash\{i\}}x_j\right]e^{-K_i}$$
$$=e^{-K_i}-\sum_{j\in \{1,\ldots,n\}\backslash\{i\}}x_je^{-K_i}.$$
Multiplying the identity $x_ie^{K_i}=x_je^{K_j}$ by $e^{-K_i-K_j}$ gives $x_je^{-K_i}=x_ie^{-K_j}$, so
$$
x_ie^{-K_i}=e^{-K_i}-x_i\sum_{j\in \{1,\ldots,n\}\backslash\{i\}}e^{-K_j},
$$
that is, $x_i\sum_{j=1}^ne^{-K_j}=e^{-K_i}$. Solving for $x_i$ shows that $f$ has exactly one critical point in $(0,\infty)^n\cap g^{-1}(1)$, namely
$$
x_i=\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}.
$$
The value of $f$ at this point is (using $\log x_i+K_i=-\log\sum_{j=1}^ne^{-K_j}$ and $\sum_{i=1}^nx_i=1$)
$$
-\sum_{i=1}^n \left[\left(\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}\right)\log
\left(\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}\right)
+K_i\left(\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}\right)\right]
=
\log\left(\sum_{j=1}^ne^{-K_j}\right)
$$
To see that $(x_1,\ldots,x_n)$ is a local maximum, note that the Hessian of $f$ is the diagonal matrix $\operatorname{diag}(-1/x_1,\ldots,-1/x_n)$, which is negative definite at every point of $(0,\infty)^n$.
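The critical-point computation and the claims that follow it can be checked numerically. This is a minimal sketch with hypothetical values $K_i$; it confirms that $-(\log x_i + 1 + K_i)$ takes the same value $\lambda = \log Z - 1$ for every $i$ (here $Z=\sum_j e^{-K_j}$), that $f$ equals $\log Z$ at the critical point, and that the Hessian diagonal $-1/x_i$ is negative there.

```python
import math

def f(x, K):
    """f(x) = -sum_i [x_i log x_i + K_i x_i], using 0 log 0 = 0."""
    return -sum((xi * math.log(xi) if xi > 0 else 0.0) + ki * xi for xi, ki in zip(x, K))

def critical_point(K):
    """x_i = e^{-K_i} / sum_j e^{-K_j}, together with Z = sum_j e^{-K_j}."""
    z = sum(math.exp(-k) for k in K)
    return [math.exp(-k) / z for k in K], z

K = [0.3, -1.2, 2.0, 0.0]  # hypothetical values K_i = U(omega_i)
x, Z = critical_point(K)

lambdas = [-(math.log(xi) + 1 + ki) for xi, ki in zip(x, K)]
print(max(lambdas) - min(lambdas))     # essentially 0: one multiplier for all i
print(lambdas[0], math.log(Z) - 1)     # lambda = log Z - 1
print(f(x, K), math.log(Z))            # f at the critical point equals log Z
print(all(-1.0 / xi < 0 for xi in x))  # Hessian diag(-1/x_i) is negative definite
```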

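To show that this point is a global maximum point, we compare the value of $f$ at this point with the values of $f$ on the set
$\partial (0,\infty)^n\cap g^{-1}(1)$. (A global maximum exists, since $f$ is continuous on the compact set $g^{-1}(1)\cap\mathbb R^n_+$, so it is attained either at the critical point above or on this boundary set.)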
The restriction of $f$ to this set is given by
$$
f(x_1,\ldots,x_n)=-\sum_{i\in\{1,\ldots,n\}\backslash I}\Big[x_i\log x_i +K_ix_i\Big]
$$
where $I\subsetneq \{1,\ldots,n\}$ is an index set such that $|I|\geq 1$ and $x_i=0$ for all $i\in I$.
Denote this restriction by $f_I$; it is a function of the $n-|I|$ variables $x_i$, $i\notin I$. Its maximum can be determined in the same way, and therefore the maximum of $f_I$ is
$$
\log\left(\sum_{j\in\{1,\ldots,n\}\backslash I}e^{-K_j}\right)
$$
which is strictly less than
$$
\log\left(\sum_{j=1}^ne^{-K_j}\right).
$$
The inequality is strict because every omitted term $e^{-K_j}$, $j\in I$, is positive. Repeating this argument at most $n$ times, we conclude that the maximum of $f$ restricted to $g^{-1}(1)\cap \mathbb R^n_+$ is not attained on the boundary.
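The boundary argument can also be illustrated numerically. In this sketch (hypothetical $K_i$ again, with $f$ redefined so the block stands alone), each proper face of the simplex, where the coordinates outside a set `keep` vanish, has maximum $\log\sum_{j\in\text{keep}}e^{-K_j}$, and these values are compared with $\log Z$; random points of the simplex, some with zero coordinates, are also checked against $\log Z$. Random sampling only illustrates the proposition; it does not prove it.

```python
import itertools
import math
import random

def f(x, K):
    """f(x) = -sum_i [x_i log x_i + K_i x_i], using 0 log 0 = 0."""
    return -sum((xi * math.log(xi) if xi > 0 else 0.0) + ki * xi for xi, ki in zip(x, K))

def face_max(K, keep):
    """Maximum of f on the face where x_i = 0 for every i outside `keep`."""
    return math.log(sum(math.exp(-K[j]) for j in keep))

def boundary_check(K, samples=10000, seed=0):
    n = len(K)
    log_z = face_max(K, range(n))
    # every proper face (at least one zero coordinate) has a strictly smaller maximum
    faces_ok = all(face_max(K, keep) < log_z
                   for r in range(1, n)
                   for keep in itertools.combinations(range(n), r))
    # random points of the simplex, some on the boundary, never exceed log Z
    rng = random.Random(seed)
    samples_ok = True
    for _ in range(samples):
        x = [0.0 if rng.random() < 0.3 else rng.expovariate(1.0) for _ in range(n)]
        s = sum(x)
        if s == 0.0:
            continue
        x = [xi / s for xi in x]
        samples_ok = samples_ok and f(x, K) <= log_z + 1e-12
    return faces_ok, samples_ok

print(boundary_check([0.3, -1.2, 2.0, 0.0]))  # (True, True)
```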

Thanks for posting this with the discussion of the boundary case. If the set $\Omega$ is countably infinite, is this method still complete? I don't know about justifications of Lagrange multipliers in that situation.
–
KConrad Jul 12 '10 at 1:36

@KConrad, you are welcome. About your question, unfortunately I do not know the answer.
–
Leandro Jul 12 '10 at 3:26

I absolutely love the derivation given by Landau in volume 5 (Statistical Physics), chapter 1. The basic idea is that since the log of the probability distribution function (whose average, up to sign, is the entropy) is an additive constant of the motion, it can be expressed as a linear combination of the 7 fundamental additive constants of the motion, namely the three components of momentum, the three components of angular momentum, and the energy, plus a constant. But since the momentum and angular momentum components can be reduced to zero with an appropriate frame of reference, the log of the distribution function depends only on the energy, as an affine function whose coefficient turns out to be $-1/T$ (in units with $k_B = 1$). We obtain the partition function naturally by normalizing the probability distribution.

I think this answers your question 1 from a physics point of view.

EDIT:

In view of the comments below, I should point out that the probability distribution I am referring to gives the probability of finding a system of $N$ particles which obey the laws of classical mechanics in the state in which the $n$th particle is at position $\mathbf r_n$ and moving with velocity $\mathbf v_n$.
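One consequence of $\log\rho$ being additive can be seen in a toy discrete model (an assumption: Landau works with classical phase-space distributions, not discrete levels). For two independent subsystems at the same temperature, the Boltzmann distribution of the composite system with energy $E_A + E_B$ factorizes, so $\log\rho_{AB} = \log\rho_A + \log\rho_B$. The sketch checks only this direction; Landau's point is the converse, that additivity forces the linear dependence on the additive integrals of motion. The energy levels and $\beta$ are hypothetical.

```python
import itertools
import math

def boltzmann(energies, beta):
    z = sum(math.exp(-beta * e) for e in energies)
    return [math.exp(-beta * e) / z for e in energies]

def factorization_error(ea, eb, beta):
    """Max |log rho_AB - log rho_A - log rho_B| over composite states (i, j)."""
    pa = boltzmann(ea, beta)
    pb = boltzmann(eb, beta)
    pairs = list(itertools.product(range(len(ea)), range(len(eb))))
    pab = boltzmann([ea[i] + eb[j] for i, j in pairs], beta)  # additive energy
    return max(abs(math.log(pab[k]) - math.log(pa[i]) - math.log(pb[j]))
               for k, (i, j) in enumerate(pairs))

beta = 1.3  # hypothetical 1/T, in units with k_B = 1
print(factorization_error([0.0, 0.4, 1.1], [0.2, 0.9], beta))
```

The printed value should be at the level of floating-point rounding, reflecting $Z_{AB} = Z_A Z_B$.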

I think this argument uses too many properties specific to systems of particles. Keith Conrad's answer shows that the underlying principles here are information-theoretic in nature and don't depend on the specific details of the physical system.
–
Qiaochu Yuan Jul 12 '10 at 2:07

To add to Qiaochu's comment, the physicist Edwin Jaynes (sorry, I don't know how well-known he is in physics, so maybe this looks as dumb as speaking of "the mathematician Frobenius"?) promoted the information-theoretic approach to explaining the Boltzmann distribution. See bayes.wustl.edu/etj/articles/theory.1.pdf.
–
KConrad Jul 12 '10 at 3:57

If you will forgive me for objecting to your comment, Qiaochu, I would say that I would not know how to state a "physical principle" to show "that the Boltzmann distribution only depends on the relative energies of the states" without reference to the energy of particles. The information-theoretic approach is useful for quantum mechanics, but, in my opinion, if we want a clear picture in our head of why the Boltzmann distribution is related to the relative energy of states, we must resort to an analogy with the classical mechanics of systems of extremely large numbers of particles.
–
Matt Jul 12 '10 at 5:51

@Matt—see my answer for a derivation that does not rely on any of that stuff.
–
Steve Huntsman Jul 12 '10 at 6:14