Two weeks ago I was at Oberwolfach, for the Arbeitsgemeinschaft in Ergodic Theory and Combinatorial Number Theory that I was one of the organisers for. At this workshop, I learned the details of a very nice recent convergence result of Miguel Walsh (who, incidentally, is an informal grandstudent of mine, as his advisor, Roman Sasyk, was my informal student), which considerably strengthens and generalises a number of previous convergence results in ergodic theory (including one of my own), with a remarkably simple proof. Walsh’s argument is phrased in a finitary language (somewhat similar, in fact, to the approach used in my paper mentioned previously), and (among other things) relies on the concept of metastability of sequences, a variant of the notion of convergence which is useful in situations in which one does not expect a uniform convergence rate; see this previous blog post for some discussion of metastability. When interpreted in a finitary setting, this concept requires a fair amount of “epsilon management” to manipulate; also, Walsh’s argument uses some other epsilon-intensive finitary arguments, such as a decomposition lemma of Gowers based on the Hahn-Banach theorem. As such, I was tempted to try to rewrite Walsh’s argument in the language of nonstandard analysis to see the extent to which these sorts of issues could be managed. As it turns out, the argument gets cleaned up rather nicely, with the notion of metastability being replaced with the simpler notion of external Cauchy convergence (which we will define below the fold).

Let’s first state Walsh’s theorem. This theorem is a norm convergence theorem in ergodic theory, and can be viewed as a substantial generalisation of one of the most fundamental theorems of this type, namely the mean ergodic theorem:

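Theorem 1 (Mean ergodic theorem) Let $(X, \mu, T)$ be a measure-preserving system (a probability space $(X,\mu)$ with an invertible measure-preserving transformation $T$). Then for any $f \in L^2(X,\mu)$, the averages $\frac{1}{N} \sum_{n=1}^N T^n f$ converge in $L^2(X,\mu)$ norm as $N \to \infty$, where $T^n f(x) := f(T^{-n} x)$.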

In this post, all functions in $L^2(X,\mu)$ and similar spaces will be taken to be real-valued instead of complex-valued for simplicity, though the extension to the complex setting is routine.

Actually, we have a precise description of the limit of these averages, namely the orthogonal projection of $f$ to the $T$-invariant factor. (See for instance my lecture notes on this theorem.) While this theorem ostensibly involves measure theory, it can be abstracted to the more general setting of unitary operators on a Hilbert space:

Theorem 2 (von Neumann mean ergodic theorem) Let $H$ be a Hilbert space, and let $U: H \to H$ be a unitary operator on $H$. Then for any $f \in H$, the averages $\frac{1}{N} \sum_{n=1}^N U^n f$ converge strongly in $H$ as $N \to \infty$.

Again, see my lecture notes (or just about any text in ergodic theory) for a proof.
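As a toy illustration of Theorem 1 and of the description of its limit (a standard finite example, not something used later in the argument), take $X$ to be a finite set with the uniform probability measure and $T$ a permutation of $X$. The $T$-invariant functions are then exactly the functions that are constant on each orbit, so the orthogonal projection onto them replaces $f$ by its average over each orbit. The following sketch checks this with exact rational arithmetic; the helper names `koopman`, `ergodic_average` and `orbit_average` are our own.

```python
from fractions import Fraction

def koopman(perm, f):
    """Return T f, where (T f)(x) := f(T^{-1} x) and perm[x] = T(x)."""
    g = [None] * len(f)
    for x, y in enumerate(perm):
        g[y] = f[x]  # (T f)(T x) = f(x)
    return g

def ergodic_average(perm, f, N):
    """Exact value of (1/N) * sum_{n=1}^N T^n f."""
    total = [Fraction(0)] * len(f)
    g = list(f)
    for _ in range(N):
        g = koopman(perm, g)
        total = [t + v for t, v in zip(total, g)]
    return [t / N for t in total]

def orbit_average(perm, f):
    """Projection onto T-invariant functions: average f over each T-orbit."""
    out = [None] * len(f)
    seen = set()
    for start in range(len(f)):
        if start in seen:
            continue
        orbit, x = [], start
        while x not in seen:  # follow the cycle of the permutation back to start
            seen.add(x)
            orbit.append(x)
            x = perm[x]
        mean = Fraction(sum(f[y] for y in orbit), len(orbit))
        for y in orbit:
            out[y] = mean
    return out

# A 3-cycle on {0, 1, 2} and a 2-cycle on {3, 4}.
perm = [1, 2, 0, 4, 3]
f = [3, 0, 0, 5, 1]
print(orbit_average(perm, f))        # the limit of the averages
print(ergodic_average(perm, f, 6))   # N = 6 is a multiple of both cycle lengths
```

When $N$ is a multiple of every cycle length the two outputs agree exactly. For other $N$ they differ by $O(1/N)$, which is the convergence asserted by Theorem 1 in this finite setting.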

Now we turn to Walsh’s theorem.

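Theorem 3 (Walsh’s convergence theorem) Let $(X,\mu)$ be a probability space with a measure-preserving action of a nilpotent group $G$. Let $g_1,\dots,g_k: {\bf Z} \to G$ be polynomial sequences in $G$ (i.e. each $g_i$ takes the form $g_i(n) = a_{i,1}^{p_{i,1}(n)} \dots a_{i,r}^{p_{i,r}(n)}$ for some $a_{i,1},\dots,a_{i,r} \in G$ and polynomials $p_{i,1},\dots,p_{i,r}: {\bf Z} \to {\bf Z}$). Then for any $f_1,\dots,f_k \in L^\infty(X,\mu)$, the averages $\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \dots (g_k(n) f_k)$ converge in $L^2(X,\mu)$ norm as $N \to \infty$, where $g(n) f(x) := f(g(n)^{-1} x)$.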

It turns out that this theorem can also be abstracted to some extent. However, because of the multiplication in the summand $(g_1(n) f_1) \dots (g_k(n) f_k)$, one cannot work purely with Hilbert spaces as in the von Neumann mean ergodic theorem, but must also work with something like the Banach algebra $L^\infty(X,\mu)$. There are a number of ways to formulate this abstraction (which will be of some minor convenience to us, as it will allow us to reduce the need to invoke the nonstandard measure theory of Loeb, discussed for instance in this blog post); we will use the notion of a (real) commutative probability space $({\mathcal A}, \tau)$, which for us will be a commutative unital algebra ${\mathcal A}$ over the reals together with a linear functional $\tau: {\mathcal A} \to {\bf R}$ which maps $1$ to $1$ and obeys the non-negativity axiom $\tau(f^2) \geq 0$ for all $f \in {\mathcal A}$. The key example to keep in mind here is the algebra ${\mathcal A} = L^\infty(X,\mu)$ of essentially bounded real-valued measurable functions with the supremum norm, and with the trace $\tau(f) := \int_X f\ d\mu$. We will also assume in our definition of commutative probability spaces that all elements $f$ of ${\mathcal A}$ are bounded in the sense that the spectral radius $\rho(f) := \limsup_{k \to \infty} |\tau(f^k)|^{1/k}$ is finite. (In the concrete case of $L^\infty(X,\mu)$, the spectral radius is just the $L^\infty$ norm.)

Given a commutative probability space, we can form an inner product $\langle \cdot, \cdot \rangle_{L^2(\tau)}$ on it by the formula

$$ \langle f, g \rangle_{L^2(\tau)} := \tau(fg).$$

This is a positive semi-definite form, and gives a (possibly degenerate) inner product structure on ${\mathcal A}$. We could complete this structure into a Hilbert space $L^2(\tau)$ (after quotienting out the elements of zero norm), but we will not do so here, instead just viewing $L^2(\tau)$ as providing a semi-metric on ${\mathcal A}$. For future reference we record the inequalities

$$ \rho(f+g) \leq \rho(f) + \rho(g), \qquad \rho(fg) \leq \rho(f) \rho(g), \qquad \|fg\|_{L^2(\tau)} \leq \|f\|_{L^2(\tau)} \rho(g)$$

for any $f, g \in {\mathcal A}$, which we will use in the sequel without further comment; see e.g. these previous blog notes for proofs. (Actually, for the purposes of proving Theorem 3, one can specialise to the case ${\mathcal A} = L^\infty(X,\mu)$ (and ultraproducts thereof), in which case these inequalities are just the triangle and Hölder inequalities.)
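For a concrete instance of a commutative probability space, one can take ${\mathcal A}$ to be all real functions on a finite set with point masses $w(x) \geq 0$ summing to $1$, and $\tau(f) := \sum_x w(x) f(x)$. This is $L^\infty$ of a finite probability space. Here the spectral radius is the maximum of $|f|$ over points of positive mass, and the even moments $\tau(f^{2k})^{1/2k}$ (which are $L^{2k}$ norms) increase to it. The sketch below checks this limit and the three recorded inequalities numerically for one example. It only illustrates the definitions and proves nothing; all function names are our own.

```python
import math

def tau(w, f):
    """Trace tau(f) = sum_x w(x) f(x) on a finite probability space with weights w."""
    return sum(wx * fx for wx, fx in zip(w, f))

def l2_norm(w, f):
    """The (semi-)norm ||f||_{L^2(tau)} = tau(f^2)^{1/2}."""
    return math.sqrt(tau(w, [fx * fx for fx in f]))

def spectral_radius(w, f):
    """Essential supremum of |f|: points of zero mass are ignored."""
    return max(abs(fx) for wx, fx in zip(w, f) if wx > 0)

def even_moment_radius(w, f, k):
    """tau(f^{2k})^{1/(2k)}, which increases to spectral_radius(w, f) as k grows."""
    return tau(w, [fx ** (2 * k) for fx in f]) ** (1 / (2 * k))

def mul(f, g):
    return [a * b for a, b in zip(f, g)]

def add(f, g):
    return [a + b for a, b in zip(f, g)]

w = [0.5, 0.25, 0.25, 0.0]    # the last point has measure zero
f = [1.0, -2.0, 0.5, 100.0]   # so the value 100.0 is invisible to tau
g = [0.5, 3.0, -1.0, -7.0]

print(spectral_radius(w, f), even_moment_radius(w, f, 50))
print(l2_norm(w, mul(f, g)), l2_norm(w, f) * spectral_radius(w, g))
```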

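Theorem 4 (Walsh’s theorem, abstract version) Let $({\mathcal A}, \tau)$ be a commutative probability space, and let $G$ be a nilpotent group acting on ${\mathcal A}$ by isomorphisms (preserving the algebra, conjugation, and trace structure, and thus also preserving the spectral radius and $L^2(\tau)$ norm). Let $g_1,\dots,g_k: {\bf Z} \to G$ be polynomial sequences. Then for any $f_1,\dots,f_k \in {\mathcal A}$, the averages $\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \dots (g_k(n) f_k)$ form a Cauchy sequence in $L^2(\tau)$ (semi-)norm as $N \to \infty$.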

It is easy to see that this theorem generalises Theorem 3. Conversely, one can use the commutative Gelfand-Naimark theorem to deduce Theorem 4 from Theorem 3, although we will not need this implication. Note how we are abandoning all attempts to discern what the limit of the sequence actually is, instead contenting ourselves with demonstrating that it is merely a Cauchy sequence. With this phrasing, it is tempting to ask whether there is any analogue of Walsh’s theorem for noncommutative probability spaces, but unfortunately the answer to that question is negative for all but the simplest of averages, as was worked out in this paper of Austin, Eisner, and myself.

Our proof of Theorem 4 will proceed as follows. Firstly, in order to avoid the epsilon management alluded to earlier, we will take an ultraproduct to rephrase the theorem in the language of nonstandard analysis; for reasons that will be clearer later, we will also convert the convergence problem to a problem of obtaining metastability (external Cauchy convergence). Then, writing $A_N := \frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \dots (g_k(n) f_k)$, we observe that (the nonstandard counterpart of) the expression $\|A_N - A_M\|_{L^2(\tau)}^2$ can be viewed as the inner product of (say) $f_k$ with a certain type of expression, which we call a dual function. By performing an orthogonal projection to the span of the dual functions, we can split $f_k$ into the sum of an expression orthogonal to all dual functions (the “pseudorandom” component), and a function that can be well approximated by finite linear combinations of dual functions (the “structured” component). The contribution of the pseudorandom component is asymptotically negligible, so we can reduce to consideration of the structured component. But by a little bit of rearrangement, this can be viewed as an average of expressions similar to the initial average $A_N$, except with the polynomials $g_1,\dots,g_k$ replaced by a “lower complexity” set of such polynomials, which can be greater in number, but which have slightly lower degrees in some sense. One can iterate this (using “PET induction”) until all the polynomials become trivial, at which point the claim follows.

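As is common practice in nonstandard analysis, we will need to select a non-principal ultrafilter $\alpha \in \beta {\bf N} \backslash {\bf N}$. Using this ultrafilter, we can now form the ultraproduct $\prod_{{\bf n} \to \alpha} X_{\bf n}$ of any sequence $X_{\bf n}$ of (standard) spaces, defined as the space of all ultralimits $\lim_{{\bf n} \to \alpha} x_{\bf n}$ of sequences $x_{\bf n} \in X_{\bf n}$ defined for ${\bf n}$ sufficiently close to $\alpha$, with two sequences considered to have the same ultralimit iff they agree for ${\bf n}$ sufficiently close to $\alpha$. Any operation or relation on the standard spaces $X_{\bf n}$ can then be defined on the nonstandard space $\prod_{{\bf n} \to \alpha} X_{\bf n}$ in a natural fashion. For instance, given a sequence of standard functions $f_{\bf n}: X_{\bf n} \to Y_{\bf n}$, one can form their ultralimit $\lim_{{\bf n} \to \alpha} f_{\bf n}$ from the nonstandard space $\prod_{{\bf n} \to \alpha} X_{\bf n}$ to the nonstandard space $\prod_{{\bf n} \to \alpha} Y_{\bf n}$ by the formula

$$ \Big(\lim_{{\bf n} \to \alpha} f_{\bf n}\Big)\Big(\lim_{{\bf n} \to \alpha} x_{\bf n}\Big) := \lim_{{\bf n} \to \alpha} f_{\bf n}(x_{\bf n}).$$

As usual, we call a nonstandard real $x$ bounded if we have $|x| \leq C$ for some standard $C$, and infinitesimal if we have $|x| \leq \varepsilon$ for every standard $\varepsilon > 0$, and in the latter case we also write $x = o(1)$. Every bounded nonstandard real $x$ is infinitesimally close to a unique standard real, called the standard part $\operatorname{st}(x)$ of $x$.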

We will need the following fundamental properties about nonstandard analysis:

(Transfer / Los’s theorem) If $(x_{\bf n})$ is a sequence of mathematical objects, spaces, or functions, with ultralimit or ultraproduct $x$, then for any first-order predicate $P$ involving mathematical objects of the appropriate type, the claim $P(x)$ is true if and only if $P(x_{\bf n})$ is true for all ${\bf n}$ sufficiently close to $\alpha$.

(Overspill) If an internal set $A$ (an ultraproduct of standard sets, also known as a nonstandard set) of nonstandard natural numbers contains all unbounded natural numbers, then there exists a standard natural number $N$ such that $A$ contains all nonstandard natural numbers larger than $N$.

(Loeb measure, hyperfinite case) If $A$ is a non-empty nonstandard finite set (i.e. the ultraproduct of standard finite sets, also known as a hyperfinite set), and the Loeb $\sigma$-algebra ${\mathcal L}_A$ is defined as the $\sigma$-algebra generated by the internal subsets of $A$, then there exists a unique countably additive probability measure $\mu_A$ on the Loeb $\sigma$-algebra ${\mathcal L}_A$, called Loeb measure, such that for any internal subset $E$ of $A$, one has $\mu_A(E) = \operatorname{st}(|E|/|A|)$. See e.g. this previous blog post for the details of the construction.

To motivate the discussion that follows, let us recall some equivalent formulations of a Cauchy sequence in a pseudometric space (i.e. a generalisation of a metric space in which some distances are allowed to vanish).

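Proposition 5 Let $(x_n)_{n \in {\bf N}}$ be a sequence in a pseudometric space $(X,d)$ (not necessarily complete). Let $(x_n)_{n \in {}^*{\bf N}}$ be the nonstandard extension of $(x_n)_{n \in {\bf N}}$, taking values in the nonstandard pseudometric space ${}^* X$. Then the following are equivalent:

1. (standard Cauchy sequence) For every standard $\varepsilon > 0$, there exists a standard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all standard $n, m \geq N$.

2. (nonstandard Cauchy sequence) For every nonstandard $\varepsilon > 0$, there exists a nonstandard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all nonstandard $n, m \geq N$.

3. (standard metastability) For every standard function $F: {\bf N} \to {\bf N}$ and standard $\varepsilon > 0$, there exists a standard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all standard $N \leq n, m \leq F(N)$.

4. (nonstandard metastability) For every nonstandard function $F: {}^*{\bf N} \to {}^*{\bf N}$ and nonstandard $\varepsilon > 0$, there exists a nonstandard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all nonstandard $N \leq n, m \leq F(N)$.

5. (asymptotic stability) One has $d(x_n, x_m) = o(1)$ for all unbounded $n, m$.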

Proof: The equivalence of 1 and 2 follows from the transfer principle (or Los’s theorem), as does the equivalence of 3 and 4. The implication of 3 from 1 is also clear. Finally, suppose that 1 failed; then there is a standard $\varepsilon > 0$ such that for every standard $N$ we can find a larger number $F(N) > N$ such that $d(x_n, x_m) > \varepsilon$ for some $N \leq n, m \leq F(N)$. With this choice of $F$, we see that 3 fails also.

If 1 holds, then from transfer we see that for any unbounded $n, m$, one has $d(x_n, x_m) \leq \varepsilon$ for every standard $\varepsilon > 0$, giving 5. Conversely, if 1 fails, then letting $\varepsilon, F$ be as before, we see from transfer that $\sup_{N \leq n, m \leq F(N)} d(x_n, x_m) > \varepsilon$ for every nonstandard $N$, contradicting 5.
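Condition 3 is the one with the most direct finitary content. The brute-force search below is a sketch for real-valued sequences (the function name and the cutoff `max_N` are our own choices). It returns the least $N$ with $|x_n - x_m| \leq \varepsilon$ for all $N \leq n, m \leq F(N)$. For a Cauchy sequence such an $N$ always exists, but its size depends on $F$ as well as on $\varepsilon$, which is why metastability bounds are stated in terms of $F$.

```python
def metastability_witness(x, F, eps, max_N=10**5):
    """Least N in [1, max_N] with |x(n) - x(m)| <= eps for all N <= n, m <= F(N).

    For real numbers the worst pair in a window is (max, min), so it suffices to
    compare the largest and smallest values of x on [N, F(N)]. Returns None if no
    such N is found; a failed finite search proves nothing about convergence.
    """
    for N in range(1, max_N + 1):
        window = [x(n) for n in range(N, F(N) + 1)]
        if max(window) - min(window) <= eps:
            return N
    return None

# x_n = 1/n converges; on [N, 2N] its oscillation is 1/N - 1/(2N) = 1/(2N).
print(metastability_witness(lambda n: 1 / n, lambda N: 2 * N, 0.0101))  # 50
# x_n = (-1)^n never settles, even on the short windows [N, N + 1].
print(metastability_witness(lambda n: (-1) ** n, lambda N: N + 1, 0.5, max_N=100))  # None
```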

Now we consider more general sequences, in which the above notions of convergence begin to diverge:

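Definition 6 Let $X = \prod_{{\bf n} \to \alpha} X_{\bf n}$ be a nonstandard pseudometric space (i.e. the ultraproduct of standard pseudometric spaces $(X_{\bf n}, d_{\bf n})$; in particular, the pseudometric $d$ takes values in ${}^*{\bf R}$ rather than ${\bf R}$), and let $(x_n)_{n \in {}^*{\bf N}}$ be a nonstandard sequence (or internal sequence) in $X$, that is to say a nonstandard map (or internal map) from ${}^*{\bf N}$ to $X$ (and thus an ultralimit of maps from ${\bf N}$ to $X_{\bf n}$).

We say that the sequence is internally Cauchy if for every nonstandard $\varepsilon > 0$, there exists a nonstandard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all nonstandard $n, m \geq N$.

We say that the sequence is externally Cauchy or metastable if for every standard $\varepsilon > 0$, there exists a standard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all standard $n, m \geq N$.

We say that the sequence is asymptotically stable if $d(x_n, x_m) = o(1)$ whenever $n, m$ are unbounded.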

These three notions are now distinct, even for a simple nonstandard metric space such as the ultrapower ${}^*[0,1]$ of the unit interval with the usual metric, as the following examples demonstrate:

If is an unbounded natural number, then the nonstandard sequence is internally Cauchy, but not externally Cauchy or asymptotically stable.

If is an unbounded natural number, then the nonstandard sequence is internally and externally Cauchy, but not asymptotically stable.

If is an unbounded natural number, then the nonstandard sequence is externally Cauchy, but not internally Cauchy or asymptotically stable.

If is an unbounded natural number, then the nonstandard sequence is asymptotically stable and externally Cauchy, but not internally Cauchy.

Any monotone bounded nonstandard sequence of nonstandard reals is automatically both externally Cauchy and internally Cauchy, but is not necessarily asymptotically stable, as the example above shows.

The property of being externally Cauchy is only dependent on an initial segment of the sequence: if is externally Cauchy, and one modifies arbitrarily for and some fixed unbounded , then the modified sequence will still be externally Cauchy. The same claim is certainly not true for the notions of internally Cauchy or asymptotically stable, as can be seen by considering examples such as , and .

The property of being externally Cauchy is closed under (external) uniform limits: if $(x_n)$ is a nonstandard sequence such that for every standard $\varepsilon > 0$ one can find an externally Cauchy sequence $(y_n)$ with $d(x_n, y_n) \leq \varepsilon$ for all $n$, then $(x_n)$ is itself externally Cauchy. The same claim holds as well for asymptotic stability, but not for the internal Cauchy property (unless one allows $\varepsilon$ to be nonstandard).

One can equate these three nonstandard notions of convergence with standard notions as follows:

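Proposition 7 Let $X = \prod_{{\bf n} \to \alpha} X_{\bf n}$ be a nonstandard pseudometric space (the ultraproduct of standard pseudometric spaces $(X_{\bf n}, d_{\bf n})$), and let the nonstandard sequence $(x_n)_{n \in {}^*{\bf N}}$ be the ultralimit of standard sequences $(x_{n,{\bf n}})_{n \in {\bf N}}$ in $X_{\bf n}$.

1. The nonstandard sequence $(x_n)$ is internally Cauchy if and only if the standard sequences $(x_{n,{\bf n}})_{n \in {\bf N}}$ are Cauchy for all ${\bf n}$ sufficiently close to $\alpha$.

2. The nonstandard sequence $(x_n)$ is externally Cauchy if and only if for every standard $\varepsilon > 0$ and standard $F: {\bf N} \to {\bf N}$, there exists a standard $N$ such that $d_{\bf n}(x_{n,{\bf n}}, x_{m,{\bf n}}) \leq \varepsilon$ for all $N \leq n, m \leq F(N)$ and all ${\bf n}$ sufficiently close to $\alpha$.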

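3. The nonstandard sequence $(x_n)$ is asymptotically stable if and only if for every standard $\varepsilon > 0$, there exists a standard $N$ such that one has $d_{\bf n}(x_{n,{\bf n}}, x_{m,{\bf n}}) \leq \varepsilon$ for all $n, m \geq N$ and all ${\bf n}$ sufficiently close to $\alpha$.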

4. The nonstandard sequence $(x_n)$ is externally Cauchy if and only if there exists an unbounded $N$ such that $(x_n)$ is asymptotically stable up to $N$, in the sense that $d(x_n, x_m) = o(1)$ for all unbounded $n, m \leq N$.

Informally: internally Cauchy sequences are ultralimits of sequences that are Cauchy; externally Cauchy sequences are ultralimits of sequences that are uniformly metastable for an asymptotically infinite period of time; and asymptotically stable sequences are ultralimits of sequences that converge at a uniform rate for an asymptotically infinite period of time.

Proof: The claim 1 follows directly from the transfer principle. Claim 2 follows from the equivalence of parts 1 and 3 of Proposition 5 applied to the standard portion $(x_n)_{n \in {\bf N}}$ of the sequence (replacing the nonstandard metric $d$ by its standard part $\operatorname{st} d$). Finally, we verify claim 3. If $(x_n)$ is asymptotically stable, then for every standard $\varepsilon > 0$, we have $d(x_n, x_m) \leq \varepsilon$ for all unbounded $n, m$, and so by the overspill principle, there is a standard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all $n, m \geq N$, which by transfer gives the “only if” portion of Claim 3. Reversing these steps gives the “if” direction.

To show Claim 4, observe that if $(x_n)$ is externally Cauchy, then for every standard $\varepsilon > 0$, one has $d(x_n, x_m) \leq \varepsilon$ for all sufficiently large standard $n, m$, and thus by overspill there is an unbounded $N_\varepsilon$ such that $d(x_n, x_m) \leq \varepsilon$ for all unbounded $n, m \leq N_\varepsilon$. By overspill (or countable saturation) one can find an unbounded $N$ such that $N \leq N_{1/k}$ for every standard $k \geq 1$, giving the “only if” direction. The “if” implication follows by reversing the steps.

From these equivalences one sees that asymptotic stability implies external Cauchy convergence, but as the above counterexamples show, there are no other implications between the three concepts.

Of the three notions of convergence for nonstandard sequences, we will focus almost exclusively on the notion of external Cauchy convergence, which at the finitary level corresponds to uniform metastability bounds (as opposed to qualitative convergence, or convergence at a uniform rate). In particular, we will deduce Walsh’s theorem from the following nonstandard version:

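Theorem 8 (Walsh’s theorem, nonstandard version) Let $({\mathcal A}, \tau)$ be a nonstandard commutative probability space (i.e. the ultraproduct of standard commutative probability spaces), and let $G$ be a nilpotent nonstandard group acting on ${\mathcal A}$ by isomorphisms. Let $g_1,\dots,g_k: {}^*{\bf Z} \to G$ be polynomial nonstandard functions (i.e. each $g_i$ takes the form $g_i(n) = a_{i,1}^{p_{i,1}(n)} \dots a_{i,r}^{p_{i,r}(n)}$ for some standard $r$, some $a_{i,1},\dots,a_{i,r} \in G$ and some standard polynomials $p_{i,1},\dots,p_{i,r}$). Then for any elements $f_1,\dots,f_k \in {\mathcal A}$ which are bounded (in the sense that $\rho(f_1),\dots,\rho(f_k)$ are bounded), the averages $\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \dots (g_k(n) f_k)$ form an externally Cauchy sequence with respect to the (nonstandard) $L^2(\tau)$ pseudometric.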

From Proposition 5, Theorem 8 implies Theorem 4 and thus Theorem 3. But it is actually somewhat stronger, in that it gives a uniform metastability on the averages occurring in those latter two theorems. (This uniform metastability was already derived in Walsh’s original paper, and I did something similar in the special case of linear commuting averages.) This uniformity ultimately comes from the fact that in the above theorem, the polynomial sequences are allowed to have nonstandard coefficients, rather than just standard ones (and the space is a general nonstandard space, rather than an ultrapower).

Remark 1 Since the original appearance of this post, it was essentially observed in this preprint of Avigad and Iovino that the original result in Theorem 4 implies the special case of Theorem 8 in the case when the polynomials have standard coefficients, as one can use the standard part construction to project the nonstandard commutative probability space to a standard commutative probability space. As such, Theorem 4 automatically implies a metastable version of itself, in which the metastability bound is allowed to depend on the coefficients of the polynomials as well as the degree. The result in Theorem 8 is then apparently stronger because the metastability bound obtained in the finitary setting is also uniform in the choice of coefficients of the polynomials; however, by lifting the group to a higher rank nilpotent group, one can replace the action of any finite number of polynomials with unbounded coefficients with lifted polynomials with bounded coefficients (this trick dates back to the book of Furstenberg) and so it turns out that Theorem 8 is ultimately equivalent to Theorem 4 (and also Theorem 3, using the structural theory of commutative probability spaces).

From the definition of external Cauchy convergence, it is clear that if $(x_n)$, $(y_n)$ are two externally Cauchy convergent sequences of (nonstandard) reals, then their sum $(x_n + y_n)$ is also externally Cauchy convergent, and more generally any (standard) finite linear combination (with standard real coefficients) of externally Cauchy convergent sequences of nonstandard reals is also externally Cauchy convergent. A key property for us is that external Cauchy convergence is also preserved by hyperfinite averages involving a nonstandardly finite number of sequences:

Proposition 9 (Metastable dominated convergence theorem) Let $A$ be a non-empty nonstandard finite set (i.e. the ultraproduct of standard finite sets), and let $(x_{n,a})_{n \in {}^*{\bf N}}$, $a \in A$, be an internal family of internal sequences of bounded elements of a nonstandard normed vector space. If the sequences $(x_{n,a})_{n \in {}^*{\bf N}}$ are externally Cauchy convergent for each $a \in A$, then the (nonstandardly) averaged sequence $\big(\frac{1}{|A|}\sum_{a \in A} x_{n,a}\big)_{n \in {}^*{\bf N}}$ is also externally Cauchy convergent.

This is an infinitary version of the finitary metastable dominated convergence theorem that first appeared in this paper of mine, which roughly speaking claims that the average of uniformly metastable bounded sequences is again metastable. The proof was infinitary (deducing it from the Lebesgue dominated convergence theorem), and we will take a similar approach here. The argument was eventually finitised (and strengthened) in this paper of Avigad, Dean, and Rute, but the finitary argument is surprisingly non-trivial.

Proof: As each $x_{n,a}$ is individually bounded (i.e. smaller in norm than any unbounded natural number), and depends internally on $n$ and $a$, we see from overspill that there is a uniform bound $\|x_{n,a}\| \leq C$ for all $n$ and $a$, for some standard natural number $C$.

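For each standard $\varepsilon > 0$ and standard natural number $N$, let $E_{\varepsilon,N}$ denote the subset of $A$ given by the formula

$$ E_{\varepsilon,N} := \{ a \in A: \|x_{n,a} - x_{m,a}\| \leq \varepsilon \hbox{ for all standard } n, m \geq N \}.$$

These sets are not internal subsets of $A$, but are instead countable intersections of internal sets. In particular, they are still Loeb measurable subsets of $A$ and thus have a well-defined Loeb measure $\mu_A(E_{\varepsilon,N})$.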

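By hypothesis, we see that for any fixed $\varepsilon$, the $E_{\varepsilon,N}$ increase to $A$ in the sense that $E_{\varepsilon,1} \subset E_{\varepsilon,2} \subset \dots$ and $\bigcup_{N} E_{\varepsilon,N} = A$. By monotone convergence, we conclude that there exists a standard $N$ such that $\mu_A(E_{\varepsilon,N}) \geq 1 - \varepsilon$. For standard $n, m \geq N$, the internal set $E'_{n,m} := \{ a \in A: \|x_{n,a} - x_{m,a}\| \leq \varepsilon\}$ contains $E_{\varepsilon,N}$, so that $\operatorname{st}(|E'_{n,m}|/|A|) = \mu_A(E'_{n,m}) \geq 1 - \varepsilon$. We then have for all standard $n, m \geq N$ that

$$ \Big\| \frac{1}{|A|}\sum_{a \in A} x_{n,a} - \frac{1}{|A|}\sum_{a \in A} x_{m,a} \Big\| \leq \frac{1}{|A|}\sum_{a \in A} \|x_{n,a} - x_{m,a}\| \leq \varepsilon + 2C\Big(1 - \frac{|E'_{n,m}|}{|A|}\Big) \leq (2C+1)\varepsilon + o(1).$$

As $\varepsilon$ can be arbitrarily small, this gives the external Cauchy convergence of $\frac{1}{|A|}\sum_{a \in A} x_{n,a}$ as desired.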
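The final estimate in this proof is elementary. If every $x_{n,a}$ has norm at most $C$, and $\|x_{n,a} - x_{m,a}\| \leq \varepsilon$ for all $a$ outside a set of proportion $\delta$, then the averages at times $n$ and $m$ differ by at most $\varepsilon + 2C\delta$. The following finitary sketch (scalar sequences, a finite index set, and a helper name of our own) evaluates both sides for a family of step sequences that each converge, but not uniformly in $a$. It does not model the Loeb measure step, which is what supplies a single standard $N$ making $\delta$ small.

```python
def average_gap_and_bound(family, n, m, eps, C):
    """For a finite family of scalar sequences x_a with |x_a(k)| <= C, return
    (gap, bound) where gap = |avg_a x_a(n) - avg_a x_a(m)| and
    bound = eps + 2*C*(proportion of a with |x_a(n) - x_a(m)| > eps)."""
    size = len(family)
    gap = abs(sum(x(n) for x in family) - sum(x(m) for x in family)) / size
    bad = sum(1 for x in family if abs(x(n) - x(m)) > eps)
    return gap, eps + 2 * C * bad / size

# x_a(k) = 1 if k >= a else 0: each sequence is eventually constant,
# but the time at which it settles grows with a.
family = [lambda k, a=a: 1.0 if k >= a else 0.0 for a in range(1, 101)]
for n, m in [(5, 10), (50, 200), (150, 300)]:
    gap, bound = average_gap_and_bound(family, n, m, eps=0.1, C=1.0)
    print(n, m, gap, bound)
```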

Remark 2 This proof is significantly shorter than the finitary proof of Avigad, Dean, and Rute, but the complexity has been concealed in the construction of Loeb measure and the monotone convergence theorem. This is typically how nonstandard analysis arguments work; they are unable to magically make the “hard” component of an argument disappear entirely, but they are often able to efficiently conceal such components in fundamental building blocks which are of independent interest, and which can be usefully applied as a black box to a wide spectrum of problems. (In contrast, the hard component of a finitary argument often needs to be reworked each time one wishes to apply it to a new problem.)

— 2. A simple case: the von Neumann ergodic theorem —

Before we prove Theorem 8, let us first warm up by establishing an easy case, namely the nonstandard version of the von Neumann ergodic theorem (Theorem 2):

Theorem 10 (Nonstandard von Neumann mean ergodic theorem) Let $V$ be a nonstandard inner product space (i.e. the ultraproduct of standard inner product spaces), and let $U: V \to V$ be a nonstandard unitary operator on $V$. Then for any bounded $f \in V$, the averages $\frac{1}{N}\sum_{n=1}^N U^n f$ are externally Cauchy in $V$.

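We first observe that if one takes the bounded elements of $V$ and forms the Hilbert space completion using the standard norm

$$ \|f\|_H := \operatorname{st} \|f\|_V,$$

one obtains a Hilbert space $H$. Thanks to the bound

$$ \Big\| \frac{1}{N}\sum_{n=1}^N U^n f \Big\|_V \leq \|f\|_V$$

for all bounded $f$ in $V$, we see that these ergodic averages $A_N f := \frac{1}{N}\sum_{n=1}^N U^n f$ can be defined in $H$, and so it suffices to show that the averages $A_N f$ are externally Cauchy in $H$ for all $f \in H$.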

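Let us first investigate a condition that would force $A_N f$ to be asymptotically stable in $H$. We expand out

$$ \|A_N f - A_M f\|_H^2 = \langle A_N f - A_M f, A_N f - A_M f \rangle_H$$

(where all expressions have been extended to nonstandard values of $N$ or $M$ in the usual fashion, and operations are extended from the bounded elements of $V$ to $H$ by continuity). With a little bit of rearrangement, this expression can be rewritten as $\langle f, {\mathcal D}_N(A_N f - A_M f) - {\mathcal D}_M(A_N f - A_M f) \rangle_H$, where the dual function ${\mathcal D}_N g$ for any $g \in H$ is defined by the formula

$$ {\mathcal D}_N g := \frac{1}{N}\sum_{n=1}^N U^{-n} g.$$

(The terminology of dual functions originates from this paper of Ben Green and myself.) Thus, if we let $W$ be the linear span in $H$ of all functions of the form ${\mathcal D}_N g$ with $N$ unbounded and $g \in H$, and $f$ is orthogonal to $W$, then $A_N f - A_M f$ vanishes in $H$ for any unbounded $N, M$. This makes $A_N f$ asymptotically stable, and thus externally Cauchy, in $H$ norm.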

In view of this fact, the existence of an orthogonal projection to the closure of $W$, and the linearity of $A_N f$ in $f$, it suffices to show that for any $f$ in the closure of $W$ in $H$, the expression $A_N f$ is externally Cauchy in $H$.
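The rearrangement in this step is the observation that ${\mathcal D}_N$ is the adjoint of the averaging operator $f \mapsto \frac{1}{N}\sum_{n=1}^N U^n f$, so that $\|A_N f - A_M f\|^2 = \langle f, {\mathcal D}_N h - {\mathcal D}_M h \rangle$ with $h := A_N f - A_M f$. As a floating-point sanity check in a purely standard, finite-dimensional real setting (our own toy choice, not part of the argument), one can let $U$ be a rotation of ${\bf R}^3$ about the $z$-axis. The helper names are hypothetical.

```python
import math

def rotate(v, theta):
    """Apply U^t, viewed as rotation by angle theta in the (x, y)-plane, fixing z."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def power_average(v, theta, N, sign=1):
    """(1/N) sum_{n=1}^N U^{sign*n} v: sign=+1 gives A_N v, sign=-1 gives D_N v."""
    total = (0.0, 0.0, 0.0)
    for n in range(1, N + 1):
        w = rotate(v, sign * n * theta)
        total = tuple(t + c for t, c in zip(total, w))
    return tuple(t / N for t in total)

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

theta = math.sqrt(2)  # rotation angle, an irrational multiple of a full turn
f = (1.0, -0.5, 2.0)
N, M = 40, 7
h = tuple(a - b for a, b in zip(power_average(f, theta, N), power_average(f, theta, M)))
lhs = inner(h, h)  # ||A_N f - A_M f||^2
rhs = inner(f, tuple(a - b for a, b in zip(power_average(h, theta, N, -1),
                                           power_average(h, theta, M, -1))))
print(lhs, rhs)  # equal up to rounding error
```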

Remark 3 This step used the existence of orthogonal projections from a Hilbert space to a closed subspace, and is closely related to an analogous use of such projections in the textbook proof of Theorem 2 (see e.g. the proof of Theorem 2 in these blog notes). In the finitary argument of Walsh, one uses instead a decomposition established by Gowers using the Hahn-Banach theorem as a substitute for orthogonal projections. See also the “Hilbert space finite convergence principle” from this blog post for a closely related link between orthogonal projections and quantitative decompositions.

Having eliminated the “pseudorandom case” when $f$ is orthogonal to $W$, we have now reduced to the “structured case” when $f$ lies in the closure of $W$. By linearity and an approximation argument, we may reduce to the case when $f$ is just a single dual function ${\mathcal D}_N g$ for some $g \in H$ and unbounded $N$, and by a further density and approximation argument we can assume that $g$ is a bounded element of $V$, rather than a general element of $H$. (Incidentally, the non-standard analysis formalism is painlessly skipping over a certain amount of epsilon management here which is much more visible in the finitary version of the argument.) Thus, our task is now to show that the sequence $M \mapsto A_M {\mathcal D}_N g$ is externally Cauchy in $H$.

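We expand

$$ A_M {\mathcal D}_N g = \frac{1}{M}\sum_{m=1}^M \frac{1}{N}\sum_{n=1}^N U^{m-n} g.$$

Fixing $N$, we now restrict to the regime $M = o(N)$, thus $M \leq \varepsilon N$ for all standard $\varepsilon > 0$. Then $m = o(N)$ as well for all $1 \leq m \leq M$. We can then shift the $n$ range by $m$ by making a substitution $n' := n - m$. If we then return to the range $1 \leq n' \leq N$, this creates an error of norm $O(m \|g\|_V / N) = o(1)$, and so

$$ A_M {\mathcal D}_N g = \frac{1}{M}\sum_{m=1}^M \frac{1}{N}\sum_{n'=1}^N U^{-n'} g = {\mathcal D}_N g$$

in $H$ for all $M = o(N)$. In particular, $A_M {\mathcal D}_N g$ is asymptotically stable up to any unbounded $N' = o(N)$ (e.g. $N' := \lfloor \sqrt{N} \rfloor$), and thus externally Cauchy as required.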
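The error estimate in this step can be made explicit. For each $m$, shifting the $n$ range by $m$ changes at most $2m$ terms, each of norm $\|g\|/N$, so averaging over $1 \leq m \leq M$ gives $\|A_M {\mathcal D}_N g - {\mathcal D}_N g\| \leq (M+1)\|g\|/N$, where $A_M v := \frac{1}{M}\sum_{m=1}^M U^m v$ and ${\mathcal D}_N g := \frac{1}{N}\sum_{n=1}^N U^{-n} g$. The sketch below checks this bound numerically in a standard finite-dimensional example, with a rotation of ${\bf R}^3$ standing in for $U$ (our own toy choice; the names are hypothetical).

```python
import math

def rotate(v, theta):
    """U^t as rotation by angle theta in the (x, y)-plane, fixing the z-axis."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def dual(g, theta, N):
    """D_N g = (1/N) sum_{n=1}^N U^{-n} g."""
    total = [0.0, 0.0, 0.0]
    for n in range(1, N + 1):
        total = [t + c for t, c in zip(total, rotate(g, -n * theta))]
    return tuple(t / N for t in total)

def ergodic_average(v, theta, M):
    """A_M v = (1/M) sum_{m=1}^M U^m v."""
    total = [0.0, 0.0, 0.0]
    for m in range(1, M + 1):
        total = [t + c for t, c in zip(total, rotate(v, m * theta))]
    return tuple(t / M for t in total)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

theta, g, N = 1.0, (0.3, 1.2, -0.7), 2000
DNg = dual(g, theta, N)
for M in (1, 10, 100):
    err = norm([a - b for a, b in zip(ergodic_average(DNg, theta, M), DNg)])
    print(M, err, (M + 1) * norm(g) / N)  # observed error vs. the bound (M + 1)||g||/N
```

The error stays small while $M$ is small compared with $N$, which is the finitary shadow of the regime $M = o(N)$.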

— 3. Descent —

Theorem 8 is proven by an induction on the “complexity” of the $g_i$. Fix $({\mathcal A}, \tau)$ and the action of the nilpotent nonstandard group $G$. Given any finite tuple $\vec g = (g_1,\dots,g_k)$ of internal functions from ${}^*{\bf Z}$ to $G$ (not necessarily polynomials), let us say that $\vec g$ is good if the conclusion of Theorem 8 holds, thus the averages $\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \dots (g_k(n) f_k)$ form an external Cauchy sequence in $L^2(\tau)$ for all bounded $f_1,\dots,f_k \in {\mathcal A}$. Trivially, any permutation of a good tuple is good, and any tuple that consists entirely of copies of the constant function $1$ mapping ${}^*{\bf Z}$ to the group identity $1$ of $G$ is good. Furthermore, if $\vec g$ is a finite tuple, and $\vec g'$ is obtained from $\vec g$ by removing duplicate functions (e.g. converting $(g_1, g_1, g_2)$ into $(g_1, g_2)$) and also removing all copies of $1$, then $\vec g$ is good if and only if $\vec g'$ is good.

If the tuple $\vec g = (g_1,\dots,g_k)$ is non-empty (i.e. $k \geq 1$), then for any nonstandard integer $h$, we define the $h$-reduction $\vec g^*_h$ of $\vec g$ to be the tuple consisting of the functions $g_1,\dots,g_{k-1}$, together with the function

$$ n \mapsto g_k(n) g_k(n+h)^{-1}$$

and the functions

$$ n \mapsto g_k(n) g_k(n+h)^{-1} g_i(n+h)$$

for $1 \leq i \leq k-1$. (We will see why these particular functions arise in the argument shortly.) The key step in proving Theorem 8 is then the following result, reminiscent of the van der Corput lemma in ergodic theory (see e.g. this blog post).

Proposition 11 (Descent) Let $({\mathcal A}, \tau)$ be a nonstandard commutative probability space, and let $G$ be a nonstandard group acting on ${\mathcal A}$ by isomorphisms. Let $\vec g = (g_1,\dots,g_k)$ be a non-empty finite tuple of internal functions from ${}^*{\bf Z}$ to $G$. If $\vec g^*_h$ is good for every nonstandard integer $h$, then $\vec g$ is good.

Once one has this proposition, Theorem 8 will be an immediate consequence of the following combinatorial claim (and the remarks made at the beginning of this section).

Proposition 12 (PET induction) Let $G$ be a nilpotent nonstandard group. Then there exists a well-ordered set ${\mathcal W}$ and a way to assign to each finite tuple $\vec g$ of polynomial nonstandard functions from ${}^*{\bf Z}$ to $G$ a tuple $\vec g'$ which is a permutation of $\vec g$ after all duplicates and copies of $1$ have been removed, and a weight $w(\vec g)$ in ${\mathcal W}$, with the following property: if $\vec g'$ is non-empty, and $h$ is a nonstandard integer, then there is a permutation of the $h$-reduction $(\vec g')^*_h$ which has a strictly smaller weight than that of $\vec g$.

Indeed, once one has this proposition and Proposition 11, Theorem 8 follows by strong induction on the weight $w(\vec g)$.

We prove Proposition 11 in this section, and Proposition 12 in the next section.
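As a concrete check on the bookkeeping, here is a sketch of the $h$-reduction in the simplest abelian case, which is our own illustrative assumption: $G = {\bf Z}$ written additively, with each $g_i$ an integer polynomial stored as a coefficient list `[c_0, c_1, ...]`. On our reading of the definition, written additively, the reduction keeps $g_1,\dots,g_{k-1}$ and adds $n \mapsto g_k(n) - g_k(n+h)$ and $n \mapsto g_k(n) - g_k(n+h) + g_i(n+h)$. The function built from $g_k$ alone has strictly lower degree than $g_k$ when $h \neq 0$, and each of the other new functions differs from $g_i$ by the discrete derivative $\partial_h(g_i - g_k)$, which has lower degree than $g_i - g_k$. All helper names are hypothetical.

```python
from math import comb

def trim(p):
    """Drop trailing zero coefficients, so the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def degree(p):
    return len(trim(p)) - 1  # -1 stands in for the degree of the zero polynomial

def add(p, q):
    size = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(size)])

def neg(p):
    return [-c for c in p]

def shift(p, h):
    """Coefficients of n -> p(n + h), via (n + h)^j = sum_i C(j, i) h^(j - i) n^i."""
    out = [0] * len(p)
    for j, c in enumerate(p):
        for i in range(j + 1):
            out[i] += c * comb(j, i) * h ** (j - i)
    return trim(out)

def h_reduction(gs, h):
    """Additive-notation h-reduction of a non-empty tuple (g_1, ..., g_k)."""
    *rest, gk = gs
    diff = add(gk, neg(shift(gk, h)))  # n -> g_k(n) - g_k(n + h)
    return [trim(g) for g in rest] + [diff] + [add(diff, shift(g, h)) for g in rest]

# The tuple (n, n^2) with h = 1 becomes (n, -2n - 1, -n).
print(h_reduction([[0, 1], [0, 0, 1]], 1))  # [[0, 1], [-1, -2], [0, -1]]
```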

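The proof of Proposition 11 closely mimics the proof of Theorem 10. Fix $({\mathcal A}, \tau)$, $G$, the tuple $\vec g = (g_1,\dots,g_k)$, and bounded elements $f_1,\dots,f_k \in {\mathcal A}$, and assume that $\vec g^*_h$ is good for all nonstandard integers $h$. We consider the averages

$$ A_N := \frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \dots (g_k(n) f_k) \qquad (1)$$

and our task is to show that the $A_N$ form an external Cauchy sequence in $L^2(\tau)$.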

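We fix bounded elements $f_1,\dots,f_{k-1}$, and largely work with manipulation of $f_k$. The map $f_k \mapsto A_N$ is then an operator from ${\mathcal A}$ to ${\mathcal A}$. We have the easily verified bound

$$ \|A_N\|_{L^2(\tau)} \leq \|f_k\|_{L^2(\tau)} \prod_{i=1}^{k-1} \rho(f_i).$$

Because of this, the linear operator $f_k \mapsto A_N$ can be uniquely continuously extended to a linear operator from $H$ to $H$, where $H$ is defined as the Hilbert space completion of the bounded elements of ${\mathcal A}$ under the norm

$$ \|f\|_H := \operatorname{st} \|f\|_{L^2(\tau)}.$$

In particular $H$ quotients out all the elements of ${\mathcal A}$ of infinitesimal $L^2(\tau)$ norm. In this Hilbertian formalism, the problem can now be viewed as one of establishing a weighted variant of Theorem 10.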

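Let us first investigate a condition that would force $A_N$ to be asymptotically stable in $H$. We expand out

$$ \|A_N - A_M\|_H^2 = \langle A_N - A_M, A_N - A_M \rangle_H$$

(where all expressions have been extended to nonstandard values of $N$ or $M$ in the usual fashion, and operations are extended from the bounded elements of ${\mathcal A}$ to $H$ by continuity). With a little bit of rearrangement (using the invariance of $\tau$ under $G$), this expression can be rewritten as $\langle f_k, {\mathcal D}_N(A_N - A_M) - {\mathcal D}_M(A_N - A_M) \rangle_H$, where the dual function ${\mathcal D}_N F$ for any $F \in H$ is defined by the formula

$$ {\mathcal D}_N F := \frac{1}{N}\sum_{n=1}^N g_k(n)^{-1} \Big( F \prod_{i=1}^{k-1} g_i(n) f_i \Big).$$

Thus, if we let $W$ be the linear span in $H$ of all functions of the form ${\mathcal D}_N F$ with $N$ unbounded and $F \in H$, and $f_k$ is orthogonal to $W$, then $A_N - A_M$ vanishes in $H$ for any unbounded $N, M$. This makes $A_N$ asymptotically stable, and thus externally Cauchy, in $L^2(\tau)$ norm.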

In view of this fact, the existence of an orthogonal projection to the closure of $W$, and the linearity of $A_N$ in $f_k$, it suffices to show that for any $f_k$ in the closure of $W$ in $H$, the expression $A_N$ is externally Cauchy in $H$.
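The rearrangement behind the dual function is the identity $\langle A_N, F\rangle_{L^2(\tau)} = \langle f_k, {\mathcal D}_N F\rangle_{L^2(\tau)}$, which only uses that each $g(n)$ acts by trace-preserving algebra automorphisms. A small standard sanity check follows. The setting is our own toy model, not taken from the argument: ${\mathcal A}$ is the real functions on ${\bf Z}/q$ with $\tau$ the mean, ${\bf Z}$ acts by translations, $g_i(n)$ is translation by $p_i(n)$ for integer polynomials $p_i$, and ${\mathcal D}_N F := \frac{1}{N}\sum_{n=1}^N g_k(n)^{-1}\big(F \prod_{i<k} g_i(n) f_i\big)$. Function names are hypothetical.

```python
import random

def translate(f, a):
    """(T^a f)(x) := f(x - a) on Z/q; this action preserves products and the mean."""
    q = len(f)
    return [f[(x - a) % q] for x in range(q)]

def tau(f):
    return sum(f) / len(f)

def multiply(*fs):
    out = [1.0] * len(fs[0])
    for f in fs:
        out = [u * v for u, v in zip(out, f)]
    return out

def average(polys, fs, N):
    """A_N = (1/N) sum_{n=1}^N prod_i T^{p_i(n)} f_i."""
    total = [0.0] * len(fs[0])
    for n in range(1, N + 1):
        term = multiply(*[translate(f, p(n)) for p, f in zip(polys, fs)])
        total = [t + v for t, v in zip(total, term)]
    return [t / N for t in total]

def dual(polys, fs, F, N):
    """D_N F = (1/N) sum_n T^{-p_k(n)} (F * prod_{i<k} T^{p_i(n)} f_i); fs[-1] is unused."""
    *rest, pk = polys
    total = [0.0] * len(F)
    for n in range(1, N + 1):
        inner = multiply(F, *[translate(f, p(n)) for p, f in zip(rest, fs[:-1])])
        total = [t + v for t, v in zip(total, translate(inner, -pk(n)))]
    return [t / N for t in total]

random.seed(0)
q, N = 7, 12
polys = [lambda n: n, lambda n: n * n]
fs = [[random.uniform(-1, 1) for _ in range(q)] for _ in polys]
F = [random.uniform(-1, 1) for _ in range(q)]
lhs = tau(multiply(average(polys, fs, N), F))       # <A_N, F>
rhs = tau(multiply(fs[-1], dual(polys, fs, F, N)))  # <f_k, D_N F>
print(lhs, rhs)  # equal up to rounding error
```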

Having eliminated the “pseudorandom case” when $f_k$ is orthogonal to $W$, we have now reduced to the “structured case” when $f_k$ lies in the closure of $W$. By linearity and an approximation argument, we may reduce to the case when $f_k$ is just a single dual function ${\mathcal D}_N F$ for some $F \in H$ and unbounded $N$, and by a further density and approximation argument we can assume that $F$ is a bounded element of ${\mathcal A}$, rather than a general element of $H$.

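Inspecting the definition (1) of $A_M$, we see that we need to understand the shifts $g_k(m) {\mathcal D}_N F$ for $1 \leq m \leq M$. It is here that we perform the “van der Corput” or “Weyl differencing” calculation that is pervasive in multiple recurrence theory. Namely, we expand

$$ g_k(m) {\mathcal D}_N F = \frac{1}{N}\sum_{n=1}^N \big(g_k(m) g_k(n)^{-1} F\big) \prod_{i=1}^{k-1} \big(g_k(m) g_k(n)^{-1} g_i(n) f_i\big).$$

Fixing $N$, we now restrict to the regime $M = o(N)$, thus $M \leq \varepsilon N$ for all standard $\varepsilon > 0$. Then $m = o(N)$ as well for all $1 \leq m \leq M$. We can then shift the $n$ range by $m$ by making a substitution $n = m + h$. If we then return to the range $1 \leq h \leq N$, this creates an error of norm $O(m/N) = o(1)$, and so

$$ g_k(m) {\mathcal D}_N F = \frac{1}{N}\sum_{h=1}^N \big(g_k(m) g_k(m+h)^{-1} F\big) \prod_{i=1}^{k-1} \big(g_k(m) g_k(m+h)^{-1} g_i(m+h) f_i\big)$$

(with both sides being interpreted in $H$). (In the language of Walsh’s paper, this identity asserts that dual functions are reducible.) Substituting this into (1), and recalling the definition of the tuples $\vec g^*_h$, we thus obtain the “Weyl differencing identity”

$$ \frac{1}{M}\sum_{m=1}^M \Big(\prod_{i=1}^{k-1} g_i(m) f_i\Big) g_k(m) {\mathcal D}_N F = \frac{1}{N}\sum_{h=1}^N \frac{1}{M}\sum_{m=1}^M \Big(\prod_{i=1}^{k-1} g_i(m) f_i\Big) \big(g_k(m) g_k(m+h)^{-1} F\big) \prod_{i=1}^{k-1} \big(g_k(m) g_k(m+h)^{-1} g_i(m+h) f_i\big) \qquad (2)$$

in $H$ whenever $M = o(N)$. In particular, since the property of being externally Cauchy is unaffected by truncation to $M \leq N'$ for any unbounded $N'$ (and in particular to an unbounded $N' = o(N)$), we see that the left-hand side of (2) is externally Cauchy in $H$ if and only if the right-hand side is. But by the induction hypothesis (the goodness of each $\vec g^*_h$), each of the sequences $M \mapsto \frac{1}{M}\sum_{m=1}^M (\dots)$ appearing on the right-hand side of (2) is externally Cauchy in $L^2(\tau)$, and from Proposition 9 (applied to the hyperfinite set $\{1,\dots,N\}$ of values of $h$) we see that the right-hand side of (2) is externally Cauchy in $L^2(\tau)$, and Proposition 11 follows.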

— 4. PET induction —

Now we prove Proposition 12, which will follow the general PET induction method first introduced by Bergelson. We prove the claim first for abelian groups (where there is an obvious notion of the “degree” of a polynomial sequence), and indicate at the end of the section how to modify the argument to handle nilpotent groups.

Henceforth the nonstandard abelian group $G$ is fixed. In the abelian case, we can take ${\mathcal W}$ to be the well-ordered set of tuples $(d_i)$ of standard non-negative integers with only finitely many of the $d_i$ non-zero, with the reverse lexicographical ordering, thus $(d_i) < (d'_i)$ if there exists $j$ such that $d_j < d'_j$ and $d_i = d'_i$ for all $i > j$.

It is easy to see that any polynomial nonstandard function can be uniquely expressed in the discrete Taylor expansion form

for some finite number of group elements with non-trivial (or with if is trivial). We call the degree of ; in the case that is the nonstandard integers with the additive group operation, this corresponds to the usual notion of the degree of a polynomial.

We observe the ultratriangle inequality

with the inequality being equality if have different degree; we also have the symmetry property

Also, we observe the key fact that if $g$ is a non-trivial polynomial sequence, then for any $h$, the derivative $\partial_h g$ defined by $\partial_h g(n) := g(n+h) g(n)^{-1}$ is a polynomial sequence of strictly smaller degree.

We can now place an ultrametric on , with the distance between two polynomials defined as

with the convention that . One easily verifies that the ultrametric axioms are obeyed.

Example 1 If , and we consider the four polynomials , , , , then is separated from by a distance of , is separated from by a distance of , and are separated from each other by a distance of .

Now let be a finite tuple of polynomials in for some . Selecting a reference polynomial (not necessarily in the tuple), we say that two polynomials are equivalent relative to if . From the ultrametric property we see that this is an equivalence relation, and each equivalence class is a constant distance from . We can then define the weight function of the tuple relative to to equal , where is the number of equivalence classes that have distance exactly from .

Example 2 Let be as in Example 1. Relative to , are all equivalent and at distance from , so the weight function here is . Relative instead to , none of the are equivalent, and are at distances from , so the weight function here is . For the tuple , the weight function relative to any one of these three polynomials is .

Now let be a non-empty tuple of nonstandard polynomials. We form by removing all duplicates and copies of from (starting from the left and moving right), and if does not already have maximal degree amongst all the , permute the tuple in some arbitrary fashion to make this the case. We define the weight of to be the weight of the augmented tuple relative to the final element of the tuple:

Suppose , thus there are equivalence classes intersecting that are at a distance exactly from . We set .

Now let be a nonstandard integer, and consider the -reduction

where . We first consider the weight of the augmented tuple

relative to . Observe that for any , one has

thus, relative to , and are in the same equivalence class. As such, we see that the weight of the tuple (6) relative to is equal to , thus there are still equivalence classes intersecting the tuple (6) that are distance exactly from . We remark for future reference that the abelian nature of was not directly used in the above calculation.

Now let be the element of the tuple (6) which has the minimal distance to , and has maximal degree. The two requirements are compatible, as any element of the tuple that has degree less than that of (which has the maximal degree by construction) necessarily has the maximal distance to . The weight of (6) relative to is then strictly smaller than the weight of (6) relative to , because the weight function at is decreased by one, while the weight function at all values strictly greater than is unchanged. (The weight function at values less than can increase dramatically, but with the reverse lexicographical ordering this does not change the validity of the previous assertion.) Because of this, if we then permute to place at the end, then we see that (note that removing duplicates and copies of from only serves to decrease the weight vector, not to increase it), and the claim follows.

Example 3 Suppose we start with the tuple , whose weight vector is . Performing an -reduction, we obtain

with a weight vector now reduced to . Note that already has maximal degree and has minimal distance to , so no additional permutation is needed at this stage. Performing another -reduction, we obtain

but now we need to permute to move (which has maximal degree and minimal distance to ) to the end, giving

with a weight vector now of . Performing another -reduction, we obtain

which after eliminating duplicates and moving (which has maximal degree and minimal distance to ) to the end, gives

with a weight vector of . Another -reduction then gives

(note the elimination of all quadratic terms) which after eliminating duplicates becomes

with a weight vector of . Performing yet another -reduction gives

with a weight vector of . Continuing this process, we will see that the linear terms will eventually all be eliminated, leaving only the constant terms, which can then be eliminated one at a time using further reduction until only the empty tuple remains. See also Walsh’s paper for several further examples of this reduction process, as well as some commentary on how the process can be speeded up somewhat if one observes that one can eliminate not only duplicate polynomials, but also polynomials which differ from an existing polynomial by a constant.

Finally, we address the case of a nilpotent group $G$, which will be a modification of the previous argument. The main issue is how to define degree properly. If $g$ is a polynomial nonstandard sequence, then by many applications of (discrete analogues of) the Baker-Campbell-Hausdorff formula, we can (as before) place $g$ uniquely in the Taylor expansion form

$$g(n) = g_0\, g_1^{\binom{n}{1}} \cdots g_d^{\binom{n}{d}} \tag{7}$$

for some (standard) finite number of group elements $g_0, \ldots, g_d$ of $G$; see e.g. Exercise 11 of this previous blog post. In the abelian case, we used the largest $i$ for which $g_i$ was non-trivial as the degree of $g$. This turns out not to be a good choice in the nilpotent case, because the crucial ultratriangle property (3) does not hold for this concept of degree. For instance, if $G$ is a two-step nilpotent group, and $a, b$ are non-commuting elements of $G$, then the sequences $n \mapsto a^n$ and $n \mapsto b^n$ would ostensibly have degree $1$ with this definition, but the product

$$a^n b^n = (ab)^n\, [a,b]^{\binom{n}{2}},$$

where $[a,b] := a^{-1} b^{-1} a b$ is the commutator of $a$ and $b$ (which is central, as $G$ is two-step nilpotent, and non-trivial, as $a$ and $b$ do not commute), would then have degree $2$, thus contradicting (3). (The symmetry property (4) can also be shown to break down.)
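To see the failure of the naive notion of degree concretely, one can take $G$ to be the integer Heisenberg group of $3 \times 3$ upper unitriangular matrices. This is a standard example of a two-step nilpotent group. The choice of group, the coordinates below, and the commutator convention $[a,b] = a^{-1} b^{-1} a b$ are assumptions made for illustration. The Python sketch below checks the identity $a^n b^n = (ab)^n [a,b]^{\binom{n}{2}}$ for the two standard generators and small $n$. Since $[a,b]$ is non-trivial, the product of the two degree-one sequences $a^n$ and $b^n$ has a non-trivial coefficient at $\binom{n}{2}$.

```python
from math import comb

# (x, y, z) encodes the matrix [[1, x, z], [0, 1, y], [0, 0, 1]].


def mul(g, h):
    """Matrix product in these coordinates."""
    x, y, z = g
    u, v, w = h
    return (x + u, y + v, z + w + x * v)


def inv(g):
    """Matrix inverse in these coordinates."""
    x, y, z = g
    return (-x, -y, -z + x * y)


def power(g, n):
    """g**n for a non-negative integer n, by repeated multiplication."""
    result = (0, 0, 0)
    for _ in range(n):
        result = mul(result, g)
    return result


def commutator(g, h):
    """[g, h] = g^{-1} h^{-1} g h."""
    return mul(mul(mul(inv(g), inv(h)), g), h)


a, b = (1, 0, 0), (0, 1, 0)
c = commutator(a, b)
assert c == (0, 0, 1)  # non-trivial, so a and b do not commute
for n in range(12):
    lhs = mul(power(a, n), power(b, n))
    rhs = mul(power(mul(a, b), n), power(c, comb(n, 2)))
    assert lhs == rhs
```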

Fortunately, the theory of polynomial sequences in nilpotent groups has been understood since the work of Leibman. The trick is not to view the coefficients appearing above as roaming unrestrictedly in the whole -step nilpotent group , but to restrict some or all of these coefficients to subgroups in the lower central series, defined by setting and for all . Given natural numbers , we then say that a sequence has filtered degree at most if, when using the Taylor expansion (7), we have whenever . Thus, for instance, if , and , then the sequence has filtered degree at most . A fundamental result of Leibman (proven for instance in this previous post) asserts that if the sequence is superadditive in the sense that whenever , then the collection of polynomial sequences of filtered degree at most form a group. A related fact is that if a sequence has filtered degree at most for some superadditive , then any derivative of has filtered degree at most (which is still superadditive).
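The superadditivity condition is easy to test mechanically. In the Python sketch below we read it as $d_{i+j} \geq d_i + d_j$ whenever $i + j \leq s$ for a degree sequence $(d_1, \ldots, d_s)$; this precise indexing is an assumption. The sketch checks a few small cases. It then confirms, by exhaustive search over short sequences, that lowering every entry by one preserves superadditivity, as happens when one takes a derivative. Algebraically this is just $d_{i+j} - 1 \geq (d_i - 1) + (d_j - 1)$. The function name is hypothetical.

```python
from itertools import product


def is_superadditive(d):
    """Check d_{i+j} >= d_i + d_j whenever i + j <= s, for d = (d_1, ..., d_s).

    The tuple d is 0-indexed in Python, so d_i is d[i - 1].
    """
    s = len(d)
    return all(
        d[i + j - 1] >= d[i - 1] + d[j - 1]
        for i in range(1, s + 1)
        for j in range(1, s + 1)
        if i + j <= s
    )


assert is_superadditive((1, 2))
assert not is_superadditive((1, 1))
assert is_superadditive((2, 4, 6))

# lowering every entry by one preserves superadditivity
for d in product(range(6), repeat=3):
    if is_superadditive(d):
        assert is_superadditive(tuple(x - 1 for x in d))
```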

If we let be the set of all superadditive degree sequences, we can order such sequences lexicographically by declaring if there is an with and for all . This makes a well-ordered set, and then we can define the filtered degree of a polynomial sequence to be the minimal in for which has filtered degree at most . Thus, for instance, in an -step nilpotent group, a sequence with non-trivial would have filtered degree . From Leibman’s results we then have the key properties (3), (4), and also that any derivative of has strictly smaller filtered degree.

Unfortunately, as filtered degrees are not numbers, we cannot define a real-valued ultrametric using the formula (5), but this is not a real difficulty; we simply declare an “ultrametric” taking values in the set of filtered degrees (together with an additional element $-\infty$) instead of in $[0,+\infty)$, by declaring $d(g, g') := \deg(g (g')^{-1})$ if $g, g'$ are distinct, and $d(g, g') := -\infty$ otherwise. If we view $-\infty$ as being smaller than any filtered degree, we see that the ultrametric axioms are still obeyed, and one can still run the argument more or less exactly as given above; we leave the details to the interested reader.

Jason Rute and I recently noticed that in the case of the mean ergodic theorem, one has something even stronger than a uniform metastability result. Saying that a sequence is Cauchy is equivalent to saying that, for every , there are at most finitely many “fluctuations” (or “jumps”) by more than . In the Hilbert space setting, a very elegant variational inequality due to Jones, Ostrovskii, and Rosenblatt implies that a sequence of ergodic averages has at most many -fluctuations. The JOR inequality and this corollary provide a remarkably clean and uniform quantitative formulation of the MET.
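For readers who want to experiment with fluctuation counts, here is a Python sketch that counts the $\varepsilon$-fluctuations of a finite stretch of a real sequence. It assumes the convention that $k$ fluctuations means indices $i_1 < j_1 \le i_2 < j_2 \le \cdots \le i_k < j_k$ with $|a_{j_m} - a_{i_m}| > \varepsilon$, reading “by more than” as a strict inequality; this convention is an assumption. Maximising $k$ is then an interval scheduling problem, so greedily closing each fluctuation as early as possible is optimal. The function name is hypothetical.

```python
def count_fluctuations(a, eps):
    """Maximal k with indices i_1 < j_1 <= i_2 < j_2 <= ... <= i_k < j_k such that
    |a[j_m] - a[i_m]| > eps for every m.

    Greedy rule: end each fluctuation at the earliest possible index j, then let
    the next one start at j (consecutive fluctuations may share an endpoint).
    """
    count, start = 0, 0
    for j in range(len(a)):
        # does some i in [start, j) form a fluctuation ending at j?
        if any(abs(a[j] - a[i]) > eps for i in range(start, j)):
            count += 1
            start = j
    return count
```

For example, the oscillating sequence $0, 1, 0, 1, 0$ has four fluctuations by more than $1/2$. A convergent sequence has only finitely many fluctuations by more than any fixed $\varepsilon > 0$, however long a prefix one examines.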

Before learning of the JOR result, Jason and I had discovered a uniform bound on the number of -fluctuations that works in the more general setting of a uniformly convex Banach space. Our result is not sharp when specialized to the case of a Hilbert space, however, and we were unable to strengthen it to anything like the JOR inequality. This state of affairs is described in a paper, “Oscillation and the mean ergodic theorem,” that we recently posted to arXiv.

We are curious to know how far this quantitative uniformity extends. In particular, in your norm convergence result and Walsh’s more recent result, is there a uniform bound on the number of -fluctuations? Does anything like the JOR square function inequality carry over?

Incidentally, we took the term “-fluctuations” from a 1996 paper by Kachurovskii (which we cite). In an appendix he also considers nonstandard formulations of such uniformities. So maybe your nonstandard argument can be adapted to yield the stronger result?

Hmm. There are variational inequalities for some pointwise ergodic theorems (such as Bourgain’s pointwise ergodic theorem from 1988 for averages along polynomials or primes), but I don’t think they have been established yet for norm convergence for multiple averages. The techniques used to prove these estimates are rather different (based on harmonic analysis rather than on regularity/metastability arguments, or on characteristic factors). I don’t think the metastability arguments (or the nonstandard variant given in this post) can easily give variational estimates, because one would then need to control fluctuations for very large unbounded times, and the arguments here are optimised instead to only control all times up to a relatively small unbounded time, which is all that one needs for metastability. But it is certainly a good question…

Update: Christoph Thiele pointed me towards this recent paper of Do, Oberlin, and Palsson which establishes such a variational result for a dyadic version of a double average such as . In practice, these dyadic harmonic analysis arguments can often be adapted to the non-dyadic case, though the details can get messier in the process. The arguments are quite different from those in the ergodic theory literature, though, relying on the same sort of time-frequency analysis used to control operators such as the bilinear Hilbert transform or Carleson’s maximal operator. It may be that one actually needs such “hard analysis” tools to get such a quantitative result, and that the softer tools used to prove, say, Walsh’s theorem, might not be suitable for variational estimates.
