As a reminder, the only difference between this theorem and its “weak” form (proven in Chapter 9.4) is that we don’t assume $r, s \leq 1$. Below we will show that the two theorems are equivalent, via Hölder’s inequality. Given the Two-Function Hypercontractivity Induction Theorem from Chapter 9.4, this implies that to prove the Hypercontractivity Theorem for general $n$ we only need to prove it for $n = 1$. The $n = 1$ case is an elementary but technical inequality whose proof we defer to the end of the section.

Before carrying out these proofs, let’s take some time to interpret the Two-Function Hypercontractivity Theorem. One interpretation is simply as a generalization of Hölder’s inequality. Consider the case that the strings ${\boldsymbol{x}}$ and $\boldsymbol{y}$ in the theorem are fully correlated; i.e., $\rho = 1$. Then the theorem states that \begin{equation} \label{eqn:its-holder} \mathop{\bf E}[f({\boldsymbol{x}})g({\boldsymbol{x}})] \leq \|f\|_{1+r} \|g\|_{1+1/r} \end{equation} because the condition $\sqrt{rs} = 1$ is equivalent to $s = 1/r$. This statement is identical to Hölder’s inequality, since $(1+r)’ = 1+1/r$. Hölder’s inequality is often used to “break the correlation” between two random variables; in the absence of any information about how $f$ and $g$ correlate, we can at least bound $\mathop{\bf E}[f({\boldsymbol{x}})g({\boldsymbol{x}})]$ by the product of certain norms of $f$ and $g$. (If $f$ and $g$ have different “sizes” then Hölder lets us choose different norms for them; if $f$ and $g$ have roughly the same “size” then we can take $r = s = 1$ and get Cauchy–Schwarz.) Now suppose we are considering $\mathop{\bf E}[f({\boldsymbol{x}})g(\boldsymbol{y})]$ for $\rho$-correlated ${\boldsymbol{x}},\boldsymbol{y}$ with $\rho < 1$. In this case we might hope to improve \eqref{eqn:its-holder} by using smaller norms on the right-hand side; in the extreme case of independent ${\boldsymbol{x}}, \boldsymbol{y}$ (i.e., $\rho = 0$) we can use $\mathop{\bf E}[f({\boldsymbol{x}})g(\boldsymbol{y})] = \mathop{\bf E}[f] \mathop{\bf E}[g] \leq \|f\|_1 \|g\|_1$. The Two-Function Hypercontractivity Theorem gives a precise interpolation between these two cases; the smaller the correlation $\rho$ is, the smaller the norms we may take on the right-hand side.
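This interpolation can be sanity-checked numerically. The following sketch (not from the text; all names are illustrative) uses the fact that for $n = 1$ the joint distribution of $\rho$-correlated uniform bits is $\Pr[{\boldsymbol{x}} = a, \boldsymbol{y} = b] = (1 + ab\rho)/4$, so the inequality $\mathop{\bf E}[f({\boldsymbol{x}})g(\boldsymbol{y})] \leq \|f\|_{1+r}\|g\|_{1+s}$ for nonnegative $f, g$ with $\sqrt{rs} \leq \rho$ can be verified by exact enumeration:

```python
# Sanity check (n = 1): for rho-correlated uniform +-1 bits the joint
# pmf is P(x = a, y = b) = (1 + a*b*rho)/4.  We check the two-function
# inequality  E[f(x) g(y)] <= ||f||_{1+r} ||g||_{1+s}  for nonnegative
# f, g with sqrt(r*s) = rho.  Illustrative code, not from the text.

def norm(h, p):
    """p-norm of h: {-1, 1} -> R under the uniform distribution."""
    return ((abs(h[1]) ** p + abs(h[-1]) ** p) / 2) ** (1 / p)

def check(f, g, rho, r, s):
    assert abs((r * s) ** 0.5 - rho) < 1e-12  # sqrt(rs) = rho
    lhs = sum((1 + a * b * rho) / 4 * f[a] * g[b]
              for a in (-1, 1) for b in (-1, 1))
    rhs = norm(f, 1 + r) * norm(g, 1 + s)
    return lhs <= rhs + 1e-12

f = {1: 2.0, -1: 0.5}   # arbitrary nonnegative functions on {-1, 1}
g = {1: 0.3, -1: 1.7}
print(check(f, g, rho=0.5, r=0.5, s=0.5))   # True
```

Taking $\rho$ smaller (with $r = s = \rho$) shrinks the right-hand side toward $\|f\|_1\|g\|_1$, matching the interpolation described above.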

In the case that $f$ and $g$ have range $\{0,1\}$, these ideas yield another interpretation of the Two-Function Hypercontractivity Theorem, namely a two-set generalization of the Small-Set Expansion Theorem:

Remark 1 When $a$ and $b$ are not too close, the optimal choice of $r$ in the proof exceeds $1$. Thus the Generalized Small-Set Expansion Theorem really needs the full (non-weak) Two-Function Hypercontractivity Theorem; equivalently, the full Hypercontractivity Theorem.
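To see where the sets’ volumes enter, note that for indicator functions the norms in the Two-Function Hypercontractivity Theorem collapse to powers of the measures of the sets (a sketch; here $\alpha = \Pr[{\boldsymbol{x}} \in A]$ and $\beta = \Pr[\boldsymbol{y} \in B]$, notation introduced for illustration). With $f = 1_A$ and $g = 1_B$ we have $\|1_A\|_{1+r} = \mathop{\bf E}[1_A]^{1/(1+r)} = \alpha^{1/(1+r)}$, and so for $\sqrt{rs} \leq \rho$ the theorem yields
\begin{equation*}
\Pr[{\boldsymbol{x}} \in A,\ \boldsymbol{y} \in B] = \mathop{\bf E}[1_A({\boldsymbol{x}})\,1_B(\boldsymbol{y})] \leq \alpha^{\frac{1}{1+r}}\,\beta^{\frac{1}{1+s}}.
\end{equation*}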

We now turn to the proofs. We begin by showing that the Hypercontractivity Theorem and the Two-Function version are indeed equivalent. This is a consequence of the following general fact (take $T = \mathrm{T}_\rho$, $p = 1+r$, $q = 1+1/s$):
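The general fact rests on the standard duality between the $q$- and $q'$-norms, where $q'$ denotes the Hölder conjugate of $q$ (a sketch, one direction shown; the converse uses the supremum characterization of $\|\cdot\|_q$). Since $\mathop{\bf E}[f({\boldsymbol{x}})g(\boldsymbol{y})] = \mathop{\bf E}[(\mathrm{T}_\rho f)\,g]$, Hölder gives
\begin{equation*}
\mathop{\bf E}[f({\boldsymbol{x}})g(\boldsymbol{y})] \leq \|\mathrm{T}_\rho f\|_q \|g\|_{q'},
\qquad \text{and conversely} \qquad
\|\mathrm{T}_\rho f\|_q = \sup_{\|g\|_{q'} \leq 1} \mathop{\bf E}[(\mathrm{T}_\rho f)\,g],
\end{equation*}
so the norm bound $\|\mathrm{T}_\rho f\|_q \leq \|f\|_p$ and the two-function bound $\mathop{\bf E}[f({\boldsymbol{x}})g(\boldsymbol{y})] \leq \|f\|_p \|g\|_{q'}$ are interchangeable.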

Now suppose we prove the Hypercontractivity Theorem in the case $n = 1$. By the above proposition we deduce the Two-Function version in the case $n = 1$. Then the Two-Function Hypercontractivity Induction Theorem from Chapter 9.4 yields the general-$n$ case of the Two-Function Hypercontractivity Theorem. Finally, applying the above proposition again we get the general-$n$ case of the Hypercontractivity Theorem, thereby completing all needed proofs. These observations all hold in the context of more general product spaces, so let’s record the following for future use:

Remark 5 In traditional proofs of the Hypercontractivity Theorem for $\pm 1$ bits, this theorem is proven directly; it’s a slightly tricky induction by derivatives (see the exercises). For more general product spaces the same direct induction strategy also works but the notation becomes quite complicated.

Our remaining task, therefore, is to prove the Hypercontractivity Theorem in the case $n = 1$; in other words, to show that a uniformly random $\pm 1$ bit is $(p,q,\sqrt{(p-1)/(q-1)})$-hypercontractive. This fact is often called the “Two-Point Inequality” because (for fixed $p$, $q$, and $\rho$) it’s just an “elementary” inequality about two real variables.
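Written out explicitly (a sketch of the statement): for $n = 1$ every function is of the form $f({\boldsymbol{x}}) = a + b{\boldsymbol{x}}$, so that $\mathrm{T}_\rho f({\boldsymbol{x}}) = a + \rho b {\boldsymbol{x}}$, and the claim $\|\mathrm{T}_\rho f\|_q \leq \|f\|_p$ with $1 \leq p \leq q$ and $\rho = \sqrt{(p-1)/(q-1)}$ becomes: for all $a, b \in {\mathbb R}$,
\begin{equation*}
\left(\tfrac{1}{2}|a + \rho b|^q + \tfrac{1}{2}|a - \rho b|^q\right)^{1/q} \leq \left(\tfrac{1}{2}|a + b|^p + \tfrac{1}{2}|a - b|^p\right)^{1/p}.
\end{equation*}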