§11.6: The Invariance Principle

Let’s summarize the Variant Berry–Esseen Theorem and proof from the preceding section, using slightly different notation. (Specifically, we’ll rewrite $\boldsymbol{X}_i = a_i {\boldsymbol{x}}_i$ where $\mathop{\bf Var}[{\boldsymbol{x}}_i] = 1$, so $a_i = \pm \sigma_i$.)
We showed that if ${\boldsymbol{x}}_1, \dots, {\boldsymbol{x}}_n, \boldsymbol{y}_1, \dots, \boldsymbol{y}_n$ are independent mean-$0$, variance-$1$ random variables, reasonable in the sense of having third absolute moment at most $B$, and if $a_1, \dots, a_n$ are real constants assumed for normalization to satisfy $\sum_i a_i^2 = 1$, then \begin{gather*} \label{eqn:be-generalizes} a_1 {\boldsymbol{x}}_1 + \cdots + a_n {\boldsymbol{x}}_n \approx a_1 \boldsymbol{y}_1 + \cdots + a_n \boldsymbol{y}_n, \\ \text{with error bound proportional to } B \max\{|a_i|\}. \end{gather*} We think of this as saying that the linear form $a_1 x_1 + \cdots + a_n x_n$ is (roughly) invariant to what independent mean-$0$, variance-$1$, reasonable random variables are substituted for the $x_i$’s, so long as all $|a_i|$’s are “small” (compared to the overall variance). In this section we generalize this statement to degree-$k$ multilinear polynomial forms, $\sum_{|S| \leq k} a_S\,x^S$. The appropriate generalization of the condition that “all $|a_i|$’s are small” is the condition that all “influences” $\sum_{S \ni i} a_S^2$ are small. We refer to these nonlinear generalizations of Berry–Esseen as Invariance Principles.
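To make the linear case concrete, here is a minimal sketch that compares bits with Gaussians under one particular smooth test function, $\psi(t) = \cos t$. This choice of $\psi$ is our own assumption, picked only because both expectations then have closed forms: $\prod_i \cos(a_i)$ for uniform $\pm 1$ bits and $\exp(-\tfrac12 \sum_i a_i^2)$ for standard Gaussians. The function names are hypothetical.

```python
import math

def bits_expectation(a):
    """E[cos(a_1 x_1 + ... + a_n x_n)] for independent uniform +-1 bits x_i.

    E[exp(i * sum a_i x_i)] factors as prod cos(a_i), which is already real.
    """
    return math.prod(math.cos(ai) for ai in a)

def gaussian_expectation(a):
    """E[cos(a_1 g_1 + ... + a_n g_n)] for independent standard Gaussians g_i.

    The sum is N(0, sum a_i^2), whose characteristic function at 1 is
    exp(-sum a_i^2 / 2).
    """
    return math.exp(-sum(ai * ai for ai in a) / 2)

def gap(a):
    """Absolute difference between the bit and Gaussian expectations."""
    return abs(bits_expectation(a) - gaussian_expectation(a))

n = 100
spread = [1 / math.sqrt(n)] * n        # all |a_i| = 0.1, sum of squares = 1
dictator = [1.0] + [0.0] * (n - 1)     # max |a_i| = 1, sum of squares = 1

print("spread coefficients:  ", gap(spread))
print("dictator coefficients:", gap(dictator))
```

Both coefficient vectors satisfy $\sum_i a_i^2 = 1$, but only the first has small $\max\{|a_i|\}$, and only there is the gap small.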

In this section we’ll develop the most basic Invariance Principle, which involves replacing bits by Gaussians for a single Boolean function $f$. We’ll show that this doesn’t change the distribution of $f$ much provided $f$ has small influences and provided that $f$ is of “constant degree” — or at least, provided $f$ is uniformly noise-stable so that it’s “close to having constant degree”. Invariance Principles in much more general settings are possible — for example the exercises describe variants which handle several functions applied to correlated inputs, and functions on general product spaces. Here we’ll just focus on the simplest possible Invariance Principle, which is already sufficient for the proof of the Majority Is Stablest Theorem in Section 7.

As in the Berry–Esseen Theorem, to get good error bounds we’ll need our random variables $\boldsymbol{z}_i$ to be “reasonable”. Sacrificing generality for simplicity in this section, we’ll use the bounded $4$th-moment notion from Definition 9.1, which will allow us to apply the basic Bonami Lemma (more precisely, Corollary 9.6):

The main examples we have in mind are that each $\boldsymbol{z}_i$ is either a uniform $\pm 1$ random bit or a standard Gaussian. (There are other possibilities, though; e.g., $\boldsymbol{z}_i$ could be uniform on the interval $[-\sqrt{3}, \sqrt{3}]$.)
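As a quick sanity check (a worked computation using only standard moment formulas), all three examples have mean $0$ and variance $1$, their third moments vanish by symmetry, and their fourth moments are small constants: \[ \mathop{\bf E}[\boldsymbol{z}^4] = 1 \ \text{(uniform $\pm 1$ bit)}, \qquad \mathop{\bf E}[\boldsymbol{z}^4] = 3 \ \text{(standard Gaussian)}, \qquad \mathop{\bf E}[\boldsymbol{z}^4] = \frac{1}{2\sqrt{3}} \int_{-\sqrt{3}}^{\sqrt{3}} t^4 \, dt = \frac{(\sqrt{3})^4}{5} = \frac{9}{5} \ \text{(uniform on $[-\sqrt{3},\sqrt{3}]$)}. \] For the uniform case the variance is $\frac{1}{2\sqrt{3}} \int_{-\sqrt{3}}^{\sqrt{3}} t^2 \, dt = \frac{(\sqrt{3})^2}{3} = 1$, which is why the interval has half-width $\sqrt{3}$.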

We can now prove the most basic Invariance Principle, for low-degree multilinear polynomials of random variables:

Remark 65 The proof will be very similar to the one we used for Berry–Esseen except that we’ll take a $3$rd-order Taylor expansion rather than a $2$nd-order one (so that we can use the easy Bonami Lemma). As you are asked to show in the exercises, had we only required that $\psi$ be $\mathcal{C}^3$ and that the ${\boldsymbol{x}}_i$’s and $\boldsymbol{y}_i$’s be $(2,3,\rho)$-hypercontractive with $2$nd moment equal to $1$, then we could obtain \[ \left|\mathop{\bf E}[\psi(F({\boldsymbol{x}}))] - \mathop{\bf E}[\psi(F(\boldsymbol{y}))]\right| \leq \tfrac{\|\psi^{\prime\prime\prime}\|_\infty}{3}\cdot (1/\rho)^{3k} \cdot \sum_{t=1}^n \mathbf{Inf}_t[F]^{3/2}. \]

At first it may seem peculiar to substitute arbitrary real numbers into the Fourier expansion of a Boolean function. Actually, if all the numbers being substituted are in the range $[-1,1]$ then there’s a natural interpretation: as you were asked to show in Exercise 1.5, if $\mu \in [-1,1]^n$, then $f(\mu) = \mathop{\bf E}[f(\boldsymbol{y})]$ where $\boldsymbol{y} \sim \{-1,1\}^n$ is drawn from the product distribution in which $\mathop{\bf E}[\boldsymbol{y}_i] = \mu_i$. On the other hand, there doesn’t seem to be any obvious meaning when real numbers outside the range $[-1,1]$ are substituted into $f$’s Fourier expansion, as may certainly occur when we consider $f(\boldsymbol{g})$.
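The interpretation for $\mu \in [-1,1]^n$ can be checked exactly on a small case. The sketch below uses $\mathrm{Maj}_3$, whose Fourier expansion is $\tfrac12(x_1 + x_2 + x_3) - \tfrac12 x_1 x_2 x_3$; the function names and the particular $\mu$ are our own illustrative choices. It evaluates the expansion at $\mu$ and compares the result with $\mathop{\bf E}[f(\boldsymbol{y})]$, computed by enumerating $\{-1,1\}^3$ under the product distribution with $\mathop{\bf E}[\boldsymbol{y}_i] = \mu_i$.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def maj3_multilinear(z):
    """Fourier expansion of Maj_3, evaluated at an arbitrary point z."""
    z1, z2, z3 = z
    return HALF * (z1 + z2 + z3) - HALF * z1 * z2 * z3

def expected_value(f, mu):
    """E[f(y)] for y in {-1,1}^n with independent coordinates, E[y_i] = mu_i."""
    total = Fraction(0)
    for y in product((-1, 1), repeat=len(mu)):
        prob = Fraction(1)
        for yi, mi in zip(y, mu):
            prob *= (1 + yi * mi) / 2   # P[y_i = +1] = (1 + mu_i) / 2
        total += prob * f(y)
    return total

mu = (Fraction(1, 2), Fraction(-1, 3), Fraction(1, 5))
print(maj3_multilinear(mu), expected_value(maj3_multilinear, mu))

# Outside [-1,1]^3 the expansion still evaluates, but not to anything
# resembling a majority vote: all three inputs are positive here.
print(maj3_multilinear((2, 2, 2)))
```

The last line illustrates the second half of the paragraph: at $(2,2,2)$ the cubic term dominates and the expansion evaluates to a negative number.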

Nevertheless, \eqref{eqn:cor-simple-inv} says that when $f$ is a low-degree, small-influence function, the distribution of the random variable $f(\boldsymbol{g})$ will be close to that of $f({\boldsymbol{x}})$. Now suppose $f : \{-1,1\}^n \to \{-1,1\}$ is Boolean-valued and unbiased. Then \eqref{eqn:cor-simple-inv} might seem impossible; how could the continuous random variable $f(\boldsymbol{g})$ essentially be $-1$ with probability $1/2$ and $+1$ with probability $1/2$? The solution to this mystery is that there are no low-degree, small-influence, unbiased Boolean-valued functions. This is a consequence of the OSSS Inequality — more precisely, Exercise 40(b) — which shows that in this setting we will always have $\epsilon \geq 1/k^3$ in \eqref{eqn:cor-simple-inv}, rendering the bound very weak. If the Aaronson–Ambainis Conjecture holds (see the notes for Chapter 8), a similar statement is true even for functions with range $[-1,1]$.

The reason \eqref{eqn:cor-simple-inv} is still useful is that we can apply it to small-influence, low-degree functions which are almost $\{-1,1\}$-valued, or $[-1,1]$-valued. Such functions can arise from truncating a very noise-stable Boolean-valued function to a large but constant degree. For example, we might profitably apply \eqref{eqn:cor-simple-inv} to $f = \mathrm{Maj}_n^{\leq k}$ and then deduce some consequences for $\mathrm{Maj}_n({\boldsymbol{x}})$ using the fact that $\mathop{\bf E}[(\mathrm{Maj}_n^{\leq k}({\boldsymbol{x}}) - \mathrm{Maj}_n({\boldsymbol{x}}))^2] = \mathbf{W}^{> k}[\mathrm{Maj}_n] \leq O(1/\sqrt{k})$ (Corollary 5.20). Let’s consider this sort of idea more generally:
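To get a feel for how much is lost by truncating, the following sketch computes the Fourier weight of $\mathrm{Maj}_n$ at each level by brute-force enumeration for a small odd $n$, and then the tail weight $\mathbf{W}^{>k}[\mathrm{Maj}_n]$. It is only an illustration for tiny $n$, not a check of the asymptotic bound. The function names are hypothetical, and the enumeration costs about $n 2^n$ steps.

```python
from fractions import Fraction
from itertools import product
from math import comb

def majority_level_weights(n):
    """Return [W^0, ..., W^n] for Maj_n (n odd), by brute-force enumeration."""
    sums = [0] * (n + 1)   # sums[j] = sum over x of Maj_n(x) * x_1 * ... * x_j
    for x in product((-1, 1), repeat=n):
        value = 1 if sum(x) > 0 else -1
        char = 1
        sums[0] += value
        for j in range(1, n + 1):
            char *= x[j - 1]
            sums[j] += value * char
    # Maj_n is symmetric, so every S with |S| = j has the same coefficient,
    # namely sums[j] / 2^n; there are comb(n, j) such sets.
    return [comb(n, j) * Fraction(s, 2 ** n) ** 2 for j, s in enumerate(sums)]

def tail_weight(weights, k):
    """W^{>k}: total Fourier weight on levels strictly above k."""
    return sum(weights[k + 1:], Fraction(0))

weights = majority_level_weights(11)
for k in (1, 3, 5, 7):
    print(k, float(tail_weight(weights, k)))
```

By the identity quoted above, each printed tail weight equals $\mathop{\bf E}[(\mathrm{Maj}_n^{\leq k}({\boldsymbol{x}}) - \mathrm{Maj}_n({\boldsymbol{x}}))^2]$ for the corresponding $k$.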

Proof: For the first statement we simply decompose $f = f^{\leq k} + f^{> k}$. Then the left-hand side of \eqref{eqn:simple-inv-trunc1} can be written as \begin{multline*} \left|\mathop{\bf E}[\psi(f^{\leq k}({\boldsymbol{x}}) + f^{> k}({\boldsymbol{x}}))] - \mathop{\bf E}[\psi(f^{\leq k}(\boldsymbol{g}) + f^{> k}(\boldsymbol{g}))]\right| \\ \leq \left|\mathop{\bf E}[\psi(f^{\leq k}({\boldsymbol{x}}))] - \mathop{\bf E}[\psi(f^{\leq k}(\boldsymbol{g}))]\right| + c\mathop{\bf E}[|f^{>k}({\boldsymbol{x}})|] + c\mathop{\bf E}[|f^{>k}(\boldsymbol{g})|], \end{multline*} using the fact that $\psi$ is $c$-Lipschitz. The first quantity is at most $O(c) \cdot 2^{k} \epsilon^{1/4}$, by Corollary 67 (even if $k$ is not an integer). As for the other two quantities, Cauchy–Schwarz implies \[ \mathop{\bf E}[|f^{>k}({\boldsymbol{x}})|] \leq \sqrt{\mathop{\bf E}[f^{>k}({\boldsymbol{x}})^2]} = \sqrt{\sum_{|S| > k} \widehat{f}(S)^2} = \|f^{> k}\|_2, \] and the same bound also holds for $\mathop{\bf E}[|f^{>k}(\boldsymbol{g})|]$; this uses the fact that $\mathop{\bf E}[f^{>k}(\boldsymbol{g})^2] = \sum_{|S| > k} \widehat{f}(S)^2$ just as in Remark 64. This completes the proof of \eqref{eqn:simple-inv-trunc1}.

Finally, if we think of the Basic Invariance Principle as the nonlinear analogue of our Variant Berry–Esseen Theorem, it’s natural to ask for the nonlinear analogue of the Berry–Esseen Theorem itself, i.e., a statement showing cdf-closeness of $F({\boldsymbol{x}})$ and $F(\boldsymbol{g})$. It’s straightforward to obtain a Lévy distance bound just as in the degree-$1$ case, Corollary 61; in the exercises you are asked to show the following:

Corollary 69 In the setting of Corollary 66 we have the Lévy distance bound $d_{L}(F({\boldsymbol{x}}),F(\boldsymbol{y})) \leq O(2^k\epsilon^{1/5})$. In the setting of Remark 65 we have the bound $d_{L}(F({\boldsymbol{x}}),F(\boldsymbol{y})) \leq (1/\rho)^{O(k)} \epsilon^{1/8}$.

Suppose we now want actual cdf-closeness in the case that $\boldsymbol{y} \sim \mathrm{N}(0,1)^n$. In the degree-$1$ (Berry–Esseen) case we used the fact that degree-$1$ polynomials of independent Gaussians have good anticoncentration. The analogous statement for higher-degree polynomials of Gaussians is not so easy to prove; however, Carbery and Wright [CW01] have obtained the following essentially optimal result: