Lemma 1 (Regularity lemma, standard version) Let be a graph on vertices, and let and . Then there exists a partition of the vertices , with bounded below by and above by a quantity depending only on , obeying the following properties:

(Equitable partition) For any , the cardinalities of and differ by at most .

(Regularity) For all but at most pairs , the portion of the graph between and is -regular in the sense that one has

for any and with , where is the density of edges between and .

This lemma becomes useful in the regime when is very large compared to or , because all the conclusions of the lemma are uniform in . Very roughly speaking, it says that “up to errors of size “, a large graph can be more or less described completely by a bounded number of quantities . This can be interpreted as saying that the space of all graphs is totally bounded (and hence precompact) in a suitable metric space, thus allowing one to take formal limits of sequences (or subsequences) of graphs; see for instance this paper of Lovasz and Szegedy for a discussion.

For various technical reasons it is easier to work with a slightly weaker version of the lemma, which allows for the cells to have unequal sizes:

Lemma 2 (Regularity lemma, weighted version) Let be a graph on vertices, and let . Then there exists a partition of the vertices , with bounded above by a quantity depending only on , obeying the following properties:

(Regularity) One has

where the sum is over all pairs for which is not -regular between and .

While Lemma 2 is, strictly speaking, weaker than Lemma 1 in that it does not enforce the equitable size property between the atoms, in practice it seems that the two lemmas are roughly of equal utility; most of the combinatorial consequences of Lemma 1 can also be proven using Lemma 2. The point is that one always has to remember to weight each cell by its density , rather than by giving each cell an equal weight as in Lemma 1. Lemma 2 also has the advantage that one can easily generalise the result from finite vertex sets to other probability spaces (for instance, one could weight with something other than the uniform distribution). For applications to hypergraph regularity, it turns out to be slightly more convenient to have two partitions (coarse and fine) rather than just one; see for instance my own paper on this topic. In any event the arguments below that we give to prove Lemma 2 can be modified to give a proof of Lemma 1 also. The proof of the regularity lemma is usually conducted by a greedy algorithm. Very roughly speaking, one starts with the trivial partition of . If this partition already regularises the graph, we are done; if not, this means that there are some sets and in which there is a significant density fluctuation beyond what has already been detected by the original partition. One then adds these sets to the partition and iterates the argument. Every time a new density fluctuation is incorporated into the partition that models the original graph, this increases a certain “index” or “energy” of the partition. On the other hand, this energy remains bounded no matter how complex the partition, so eventually one must reach a long “energy plateau” in which no further refinement is possible, at which point one can find the regular partition.

One disadvantage of the greedy algorithm is that it is not efficient in the limit , as it requires one to search over all pairs of subsets of a given pair of cells, which is an exponentially long search. There are more algorithmically efficient ways to regularise, for instance a polynomial time algorithm was given by Alon, Duke, Lefmann, Rödl, and Yuster. However, one can do even better, if one is willing to (a) allow cells of unequal size, (b) allow a small probability of failure, (c) have the ability to sample vertices from at random, and (d) allow for the cells to be defined “implicitly” (via their relationships with a fixed set of reference vertices) rather than “explicitly” (as a list of vertices). In that case, one can regularise a graph in a number of operations bounded in . Indeed, one has

Lemma 3 (Regularity lemma via random neighbourhoods) Let . Then there exists integers with the following property: whenever be a graph on finitely many vertices, if one selects one of the integers at random from , then selects vertices uniformly from at random, then the vertex cells (some of which can be empty) generated by the vertex neighbourhoods for , will obey the conclusions of Lemma 2 with probability at least .

Thus, roughly speaking, one can regularise a graph simply by taking a large number of random vertex neighbourhoods, and using the partition (or Venn diagram) generated by these neighbourhoods as the partition. The intuition is that if there is any non-uniformity in the graph (e.g. if the graph exhibits bipartite behaviour), this will bias the random neighbourhoods to seek out the partitions that would regularise that non-uniformity (e.g. vertex neighbourhoods would begin to fill out the two vertex cells associated to the bipartite property); if one takes sufficiently many such random neighbourhoods, the probability that all detectable non-uniformity is captured by the partition should converge to . (It is more complicated than this, because the finer one makes the partition, the finer the types of non-uniformity one can begin to detect, but this is the basic idea.)

This fact seems to be reasonably well-known folklore, discovered independently by many authors; it is for instance quite close to the graph property testing results of Alon and Shapira, and also appears implicitly in a paper of Ishigami, as well as a paper of Austin (and perhaps even more implicitly in a paper of myself). However, in none of these papers is the above lemma stated explicitly. I was asked about this lemma recently, so I decided to provide a proof here.

For commenters

To enter in LaTeX in comments, use $latex <Your LaTeX code>$ (without the < and > signs, of course; in fact, these signs should be avoided as they can cause formatting errors). See the about page for details and for other commenting policy.