The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for -actions on probability spaces . For instance, for the spherical averaging operators

(where denotes the length of the reduced word that forms ), they showed that converged pointwise almost everywhere provided that was in for some . (The need to restrict to spheres of even radius can be seen by considering the action of on the two-element set in which both generators of act by interchanging the elements, in which case is determined by the parity of .) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition to the weaker condition .

The question remained open as to whether the pointwise ergodic theorem for -actions held if one only assumed that was in . Nevo and Stein were able to establish this for the Cesáro averages , but not for itself. About six years ago, Assaf Naor and I tried our hand at this problem, and was able to show an associated maximal inequality on , but due to the non-amenability of , this inequality did not transfer to and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an ergodic theorem for iterates of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the setting, thus settling the problem in the negative:

Theorem 1 (Failure of pointwise ergodic theorem) There exists a measure-preserving -action on a probability space and a non-negative function such that for almost every .

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator on a probability space and a non-negative such that for almost every . By some standard manipulations, it suffices to show that for any given and , there exists a self-adjoint Markov operator on a probability space and a non-negative with , such that on a set of measure at least . Actually, it will be convenient to replace the Markov chain with an ancient Markov chain – that is to say, a sequence of non-negative functions for both positive and negative , such that for all . The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for sake of discussion, as it turns out that the version of the argument can be run using infinitely ancient chains.)

For any , let denote the claim that for any , there exists an ancient Markov chain with such that on a set of measure at least . Clearly holds since we can just take for all . Our objective is to show that holds for arbitrarily small . The heart of Ornstein’s argument is then the implication

for any , which upon iteration quickly gives the desired claim.

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain on some probability space of total mass , such that attains the value of or greater almost everywhere. Assuming that the Markov process is irreducible, the will eventually converge as to the constant value of , in particular its final state will essentially stay above (up to small errors).

Now suppose we duplicate the Markov process by replacing with a double copy (giving the uniform probability measure), and using the disjoint sum of the Markov operators on and as the propagator, so that there is no interaction between the two components of this new system. Then the functions form an ancient Markov chain of mass at most that lives solely in the first half of this copy, and attains the value of or greater on almost all of the first half , but is zero on the second half. The final state of will be to stay above in the first half , but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves , of the system (I mentally think of and as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain in the previous example gets replaced by a slightly different ancient Markov chain which is more or less identical with for negative times , or for bounded positive times , but for very large values of the final state is now constant across the entire state space , and will stay above on this space.

Finally, we consider an ancient Markov chain which is basically of the form

for some large parameter and for all (the approximation becomes increasingly inaccurate for much larger than , but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces , but with the second copy delayed by a large time delay , and also attenuated in amplitude by a factor of . The total mass of this process is now . Because of the component of , we see that basically attains the value of or greater on the first half . On the second half , we work with times close to . If is large enough, would have averaged out to about at such times, but the component can get as large as here. Summing (and continuing to ignore various epsilon losses), we see that can get as large as on almost all of the second half of . This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages for a free group action can be lifted up to become powers of a Markov operator, basically by randomly assigning a “velocity vector” to one’s base point and then applying the Markov process that moves along that velocity vector (and then randomly changing the velocity vector at each time step to the “reduced word” condition that the velocity never flips from to ). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with -measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case ) comes from basically considering the action of on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time , and only reaches the boundary of this ball at the time .

In a multiplicative group , the commutator of two group elements is defined as (other conventions are also in use, though they are largely equivalent for the purposes of this discussion). A group is said to be nilpotent of step (or more precisely, step ), if all iterated commutators of order or higher necessarily vanish. For instance, a group is nilpotent of order if and only if it is abelian, and it is nilpotent of order if and only if for all (i.e. all commutator elements are central), and so forth. A good example of an -step nilpotent group is the group of upper-triangular unipotent matrices (i.e. matrices with s on the diagonal and zero below the diagonal), and taking values in some ring (e.g. reals, integers, complex numbers, etc.).

Another important example of nilpotent groups arise from operations on polynomials. For instance, if is the vector space of real polynomials of one variable of degree at most , then there are two natural affine actions on . Firstly, every polynomial in gives rise to an “vertical” shift . Secondly, every gives rise to a “horizontal” shift . The group generated by these two shifts is a nilpotent group of step ; this reflects the well-known fact that a polynomial of degree vanishes once one differentiates more than times. Because of this link between nilpotentcy and polynomials, one can view nilpotent algebra as a generalisation of polynomial algebra.

Suppose one has a finite number of generators. Using abstract algebra, one can then construct the free nilpotent group of step , defined as the group generated by the subject to the relations that all commutators of order involving the generators are trivial. This is the universal object in the category of nilpotent groups of step with marked elements . In other words, given any other -step nilpotent group with marked elements , there is a unique homomorphism from the free nilpotent group to that maps each to for . In particular, the free nilpotent group is well-defined up to isomorphism in this category.

In many applications, one wants to have a more concrete description of the free nilpotent group, so that one can perform computations more easily (and in particular, be able to tell when two words in the group are equal or not). This is easy for small values of . For instance, when , is simply the free abelian group generated by , and so every element of can be described uniquely as

for some integers , with the obvious group law. Indeed, to obtain existence of this representation, one starts with any representation of in terms of the generators , and then uses the abelian property to push the factors to the far left, followed by the factors, and so forth. To show uniqueness, we observe that the group of formal abelian products is already a -step nilpotent group with marked elements , and so there must be a homomorphism from the free group to . Since distinguishes all the products from each other, the free group must also.

It is only slightly more tricky to describe the free nilpotent group of step . Using the identities

(where is the conjugate of by ) we see that whenever , one can push a positive or negative power of past a positive or negative power of , at the cost of creating a positive or negative power of , or one of its conjugates. Meanwhile, in a -step nilpotent group, all the commutators are central, and one can pull all the commutators out of a word and collect them as in the abelian case. Doing all this, we see that every element of has a representation of the form

for some integers for and for . Note that we don’t need to consider commutators for , since

and

It is possible to show also that this representation is unique, by repeating the previous argument, i.e. by showing that the set of formal products

forms a -step nilpotent group, after using the above rules to define the group operations. This can be done, but verifying the group axioms (particularly the associative law) for is unpleasantly tedious.

Once one sees this, one rapidly loses an appetite for trying to obtain a similar explicit description for free nilpotent groups for higher step, especially once one starts seeing that higher commutators obey some non-obvious identities such as the Hall-Witt identity

(a nonlinear version of the Jacobi identity in the theory of Lie algebras), which make one less certain as to the existence or uniqueness of various proposed generalisations of the representations (1) or (2). For instance, in the free -step nilpotent group, it turns out that for representations of the form

one has uniqueness but not existence (e.g. even in the simplest case , there is no place in this representation for, say, or ), but if one tries to insert more triple commutators into the representation to make up for this, one has to be careful not to lose uniqueness due to identities such as (3). One can paste these in by ad hoc means in the case, but the case looks more fearsome still, especially now that the quadruple commutators split into several distinct-looking species such as and which are nevertheless still related to each other by identities such as (3). While one can eventually disentangle this mess for any fixed and by a finite amount of combinatorial computation, it is not immediately obvious how to give an explicit description of uniformly in and .

Nevertheless, it turns out that one can give a reasonably tractable description of this group if one takes a polycyclic perspective rather than a nilpotent one – i.e. one views the free nilpotent group as a tower of group extensions of the trivial group by the cyclic group . This seems to be a fairly standard observation in group theory – I found it in this book of Magnus, Karrass, and Solitar, via this paper of Leibman – but seems not to be so widely known outside of that field, so I wanted to record it here.

Notational convention: In this post only, I will colour a statement red if it assumes the axiom of choice. (For the rest of the course, the axiom of choice will be implicitly assumed throughout.)

The famous Banach-Tarski paradox asserts that one can take the unit ball in three dimensions, divide it up into finitely many pieces, and then translate and rotate each piece so that their union is now two disjoint unit balls. As a consequence of this paradox, it is not possible to create a finitely additive measure on that is both translation and rotation invariant, which can measure every subset of , and which gives the unit ball a non-zero measure. This paradox helps explain why Lebesgue measure (which is countably additive and both translation and rotation invariant, and gives the unit ball a non-zero measure) cannot measure every set, instead being restricted to measuring sets that are Lebesgue measurable.

On the other hand, it is not possible to replicate the Banach-Tarski paradox in one or two dimensions; the unit interval in or unit disk in cannot be rearranged into two unit intervals or two unit disks using only finitely many pieces, translations, and rotations, and indeed there do exist non-trivial finitely additive measures on these spaces. However, it is possible to obtain a Banach-Tarski type paradox in one or two dimensions using countably many such pieces; this rules out the possibility of extending Lebesgue measure to a countably additive translation invariant measure on all subsets of (or any higher-dimensional space).

In these notes I would like to establish all of the above results, and tie them in with some important concepts and tools in modern group theory, most notably amenability and the ping-pong lemma. This material is not required for the rest of the course, but nevertheless has some independent interest.

For commenters

To enter in LaTeX in comments, use $latex <Your LaTeX code>$ (without the < and > signs, of course; in fact, these signs should be avoided as they can cause formatting errors). See the about page for details and for other commenting policy.