For many standard, well-understood theorems the proofs have been streamlined to the point where you just need to understand the proof once and you remember the general idea forever. At this point I have learned three different proofs of the Birkhoff ergodic theorem on three separate occasions and yet I still could probably not explain any of them to a friend or even sit down and recover all of the details. The problem seems to be that they all depend crucially on some frustrating little combinatorial trick, each of which was apparently invented just to service this one result. Has anyone seen a more natural approach that I might actually be able to remember? Note that I'm not necessarily looking for a short proof (those are often the worst offenders) - I'm looking for an argument that will make me feel like I could have invented it if I were given enough time.

4 Answers

I don't know whether this helps or whether you've already seen this before, but this made a lot more intuitive sense to me than the combinatorial approach in Halmos's book.

The key point in the proof is to prove the maximal ergodic theorem. This states that if $M_T$ is the maximal operator $M_T f= \sup_{n \geq 0} \frac{1}{n+1} (f + Tf + \dots + T^n f)$, then $\int_{M_T f>0} f \geq 0$. Here $T$ is the associated map on functions coming from the measure-preserving transformation.

This is a weak-type inequality, and the fact that one considers the maximal operator isn't terribly surprising given how maximal operators arise elsewhere: a) In the proof of the Lebesgue differentiation theorem (namely, via the Hardy-Littlewood maximal operator $Mf(x) = \sup_{t >0} \frac{1}{2t} \int_{x-t}^{x+t} |f(r)| dr$). b) In the theory of singular integrals, one can define a maximal operator in the same way and prove that it is $L^p$-bounded for $1 < p < \infty$ and weak-$L^1$ bounded in suitably nice homogeneous cases (e.g. the Hilbert transform). One of the consequences of this is, for instance, that the Hilbert transform can be computed a.e. via the Cauchy principal value of the usual integral. c) I'm pretty sure the boundedness of the maximal operator of the partial sums of Fourier series is used in the proof of the Carleson-Hunt theorem. So using maximal operators (and, in particular, weak bounds on them) to establish convergence is fairly standard. Once the maximal inequality has been established, it isn't usually very hard to get the pointwise convergence result, and the ergodic theorem is no exception.

The maximal ergodic theorem actually generalizes to the case where $T$ is an operator of $L^1$-norm at most 1, and thinking of it in a more general sense might meet the criteria of your question. In particular, let $T$ be as just mentioned, and consider $M_T$ described in the analogous way. Or rather, consider $M_T'f = \max\left(0, \sup_{n \geq 0} \sum_{i=0}^n T^if\right)$. Clearly $M_T'f >0$ iff $M_Tf >0$. Moreover, $M_T'$ has the crucial property that $T M_T' f + f = M_T' f$ whenever $M_Tf>0$.

Therefore,
$\int_{M_T'f>0} f = \int_{M_T'f>0} M_T'f - \int_{M_T'f>0} TM_T'f.$ The first term is in fact $\|M_T'f\|_1$ because the modified maximal operator is always nonnegative. The second term is at most $\|T M_T'f\|_1 \leq \|M_T'f\|_1$ by the norm condition. Hence the difference is nonnegative.

Perhaps this will be useful: let $M$ be an operator (not necessarily linear) sending functions to nonnegative functions such that $(I-T)Mf = f$ wherever $Mf>0$, for $T$ an operator of $L^1$-norm at most 1. Then $\int_{Mf>0} f \geq 0$. The proof is the same.
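
For anyone who wants to see the inequality in action, here is a small numerical sanity check. It is only a sketch under assumptions that are not part of the answer: $X$ is a finite set with counting measure, the transformation is a random permutation `perm` (hence measure-preserving), $T$ acts by composing with `perm`, and the supremum is truncated after `n_max` partial sums. The function name `integral_over_maximal_set` is made up for illustration, and integer values are used so that the check is exact.

```python
import random


def integral_over_maximal_set(f, perm, n_max):
    """Sum f over E = {x : max_{1 <= k <= n_max} S_k f(x) > 0},
    where S_k f(x) adds up f along the first k points of the orbit of x."""
    total = 0
    for x in range(len(f)):
        partial, best, y = 0, None, x
        for _ in range(n_max):
            partial += f[y]          # partial is now S_k f(x)
            best = partial if best is None else max(best, partial)
            y = perm[y]              # move one step along the orbit
        if best > 0:                 # x belongs to the maximal set E
            total += f[x]
    return total


random.seed(0)
for _ in range(200):
    size = random.randint(1, 12)
    perm = list(range(size))
    random.shuffle(perm)             # a random measure-preserving bijection
    f = [random.randint(-5, 5) for _ in range(size)]
    n_max = random.randint(1, 20)
    # Maximal ergodic inequality for the truncated maximal function.
    assert integral_over_maximal_set(f, perm, n_max) >= 0
```

The truncation is deliberate: the inequality holds for each fixed `n_max`, and the statement for the full supremum follows by letting `n_max` grow.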

This answer is extremely helpful - I was sort of hoping that one of the proofs that I had already encountered could be made to seem more natural in a broader context, and you accomplished exactly that. Thanks!
– Paul Siegel, Jul 1 '10 at 9:59

If $f\equiv 1$ and $T=I$ we have $M_T'\equiv+\infty$, so the recurrence formula is $+\infty + f=+\infty$, which is useless. Am I missing something?
– Marcos Cossarini, Sep 12 '14 at 19:30

The last one seems the easiest for me to remember. This may look like a combinatorial trick, but with some thought, it appears quite natural, and I know several results that use some similar idea.

So let $\epsilon>0$ and $x \in X$. We can find an $n(x)$ depending on $x$ such that
$$ \limsup_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x)) \leq \frac{1}{n(x)} \sum_{k=0}^{n(x)-1} f(T^k(x)) + \epsilon$$
Note that $n(x)$ is finite everywhere, hence bounded on a set $R$ whose complement has arbitrarily small measure. Then cut the Birkhoff sum according to the sequence $n_{i+1}(x)=n_i(x)+n(T^{n_i(x)}(x))$ if $T^{n_i(x)}(x)$ is in $R$, and $n_{i+1}(x)=n_i(x)+1$ otherwise. A picture should make clear what is going on. The rest of the proof is a routine check.
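
In place of a picture, the cutting rule can be sketched in code. This is only an illustration: the names `cut_times`, `n_of`, `in_R` and `horizon` are hypothetical, the sequence is assumed to start at $n_0(x)=0$, and the toy map and set at the bottom are chosen arbitrarily rather than taken from the proof.

```python
def cut_times(x, T, n_of, in_R, horizon):
    """Return the cut times n_0 = 0 < n_1 < ... of the orbit of x,
    stopping at the first cut time that reaches `horizon`."""
    times = [0]
    point = x                        # always equals T^{n_i}(x)
    while times[-1] < horizon:
        # A "good" block of length n(point) if the point lies in R,
        # otherwise a "bad" block of length 1.
        step = n_of(point) if in_R(point) else 1
        for _ in range(step):
            point = T(point)
        times.append(times[-1] + step)
    return times


# Toy example (arbitrary choices): rotation by 3 on Z/10.
T = lambda x: (x + 3) % 10
n_of = lambda x: 1 + x % 4           # stand-in for n(x), bounded by 4
in_R = lambda x: x != 7              # everything except one "bad" point
print(cut_times(0, T, n_of, in_R, horizon=10))   # [0, 1, 5, 7, 9, 10]
```

On each good block the Birkhoff average is within $\epsilon$ of the limsup at the block's starting point, and the bad blocks are rare because the complement of $R$ is small, which is what the routine check exploits.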

Of course if you are in the business of non-standard analysis, Kamae's proof is both short and enlightening but then you need some work to get the standard statement.

Applying the lemma to $f+\epsilon$ we obtain that there is a positive measure set on which $\liminf_n \frac{S_n}{n} \ge -\epsilon$.

Letting $\epsilon$ go to zero and using ergodicity (i.e. the liminf is constant) you obtain $\liminf_n \frac{S_n}{n} \ge 0$ almost everywhere.

Do the same with $-f$ and obtain $\lim_{n}\frac{S_n}{n} = 0$ almost everywhere. $\Box$

One thing I like about this lemma is that it extends to the subadditive case. In subadditive ergodic theory the maximal inequality gives upper bounds, but not lower. But in fact the above lemma can be obtained independently and gives lower bounds. As far as I can tell it was first introduced in this context by Anders Karlsson and Gregory Margulis in "A multiplicative ergodic theorem and nonpositively curved spaces" (Communications in Mathematical Physics, 1999). Their proof of the lemma is based on a nice observation about sequences of real numbers, due to Riesz (called the Leader Lemma).

I mean the "weak $L^1$ norm" (which is not actually a norm at all) defined by
$\|g\|_{1,\infty}=\sup_a a \cdot m( \{ x\colon |g|(x)\ge a \}) $
(the area of the biggest rectangle that fits under $|g|$).
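
To make the "biggest rectangle" picture concrete, here is a minimal sketch for a function on a finite set in which every point has the same measure. The function name `weak_l1` and the `mass` parameter are made up for illustration.

```python
def weak_l1(values, mass=1.0):
    """sup_a a * m({|g| >= a}) for g given by `values` on a finite set
    whose points each have measure `mass`."""
    magnitudes = sorted((abs(v) for v in values), reverse=True)
    # The sup is attained when a is one of the values of |g|. Taking a to be
    # the k-th largest value, at least k points satisfy |g| >= a.
    return max((k * a * mass for k, a in enumerate(magnitudes, start=1)),
               default=0.0)


g = [4, 1, 2]
print(weak_l1(g))                 # 4.0 (rectangle of height 4, width 1)
print(sum(abs(v) for v in g))     # 7, the ordinary L^1 norm
```

The rectangle always fits under $|g|$, so this quantity never exceeds the $L^1$ norm.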

This has a beautiful (and simple and intuitive) proof if you are working on the integers and $T$ is the map $T(n)=n+1$. There is then a technique called transference that allows you to take the proof in your favourite system and transfer it to any other system.

You then show that this implies that when you have a maximal inequality, the set of functions for which you get almost everywhere convergence is closed in $L^1$. (Suppose the ergodic theorem works for $f_1,f_2,\ldots$ and $\|f_n-f\|_1\to 0$. Use the maximal inequality to deduce that the ergodic theorem holds for $f$).

At this stage you know that the set of $f$'s for which the ergodic theorem holds is a closed set, so you're done if you can find a dense set of $f$'s for which this works. Since you're looking for a dense set you're OK to work with $L^2$. The result is obvious for coboundaries and the orthogonal complement of the space of coboundaries is the set of invariant functions (for which the result is even more obvious). QED.
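
To spell out why coboundaries are the easy case, here is the telescoping computation for a coboundary $f = g - g\circ T$ with $g$ bounded (the symbol $g$ and the boundedness assumption are introduced here only for illustration, to keep the limit trivial):
$$\frac{1}{n}\sum_{k=0}^{n-1} f(T^k x) = \frac{1}{n}\sum_{k=0}^{n-1}\bigl(g(T^k x) - g(T^{k+1} x)\bigr) = \frac{g(x) - g(T^n x)}{n},$$
which is at most $2\|g\|_\infty/n$ in absolute value and so tends to $0$ for every $x$.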