Wednesday, April 13, 2016

Weierstrass's approximation theorem

I had to mentor an Agrégation leçon entitled Examples of dense subsets. For my own edification (and that of the masses), I want to try to record here as many proofs of the Weierstrass density theorem as I can: every complex-valued continuous function on the closed interval $[-1;1]$ can be uniformly approximated by polynomials. I'll also include as a bonus the trigonometric variant: every complex-valued continuous and $2\pi$-periodic function on $\mathbf R$ can be uniformly approximated by trigonometric polynomials.

1. Using Stone's theorem.

This 1937–1948 theorem is probably the final conceptual brick in the edifice of which Weierstrass laid the first stone in 1885. It asserts that a subalgebra of the algebra of continuous functions on a compact Hausdorff (e.g., metric) space is dense for the uniform norm as soon as it contains the constants, separates points, and, in the case of complex-valued functions, is stable under complex conjugation. In all presentations that I know of, its proof requires establishing that the absolute value function can be uniformly approximated by polynomials on $[-1;1]$:

Stone truncates the power series expansion of the function \[ x\mapsto \sqrt{1-(1-x^2)}=\sum_{n=0}^\infty \binom{1/2}n (x^2-1)^n, \] bounding by hand the error term.
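As a sanity check, one can truncate this binomial series numerically and watch the uniform error on $[-1;1]$ decay. A quick sketch (the truncation order $400$ and the tolerance $0.05$ are arbitrary choices of mine):

```python
def abs_series(x, N):
    """Partial sum, up to order N, of the binomial series
    |x| = sqrt(1 - (1 - x^2)) = sum_n binom(1/2, n) (x^2 - 1)^n."""
    b, u, s = 1.0, 1.0, 0.0        # b = binom(1/2, n), u = (x^2 - 1)^n
    for n in range(N + 1):
        s += b * u
        b *= (0.5 - n) / (n + 1)   # binom(1/2, n+1) from binom(1/2, n)
        u *= x * x - 1
    return s

# The truncation error is worst at x = 0 and decays like 1/sqrt(N).
err = max(abs(abs_series(k / 500 - 1, 400) - abs(k / 500 - 1))
          for k in range(1001))
assert err < 0.05
```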

Bourbaki (Topologie générale, X, p. 36, lemme 2) follows a more elementary approach and begins by proving that the function $x\mapsto \sqrt x$ can be uniformly approximated by polynomials on $[0;1]$. (The absolute value function is recovered since $\mathopen|x\mathclose|=\sqrt{x^2}$.) To this aim, he introduces the sequence of polynomials given by $p_0=0$ and $p_{n+1}(x)=p_n(x)+\frac12\left(x-p_n(x)^2\right)$ and proves by induction the inequalities \[ 0\leq \sqrt x-p_n(x) \leq \frac{2\sqrt x}{2+n\sqrt x} \leq \frac 2n\] for $x\in[0;1]$ and $n\geq 0$ (the last bound for $n\geq 1$). This implies the desired result.
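Bourbaki's recursion is easy to test numerically; here is a minimal sketch that evaluates $p_n$ pointwise (rather than expanding it as a polynomial) and checks the stated inequalities on a grid:

```python
import math

def sqrt_approx(n):
    """Evaluate Bourbaki's n-th polynomial p_n at a point x of [0, 1],
    where p_0 = 0 and p_{k+1}(x) = p_k(x) + (x - p_k(x)^2) / 2."""
    def p(x):
        v = 0.0
        for _ in range(n):
            v += (x - v * v) / 2
        return v
    return p

n = 50
p = sqrt_approx(n)
# Check 0 <= sqrt(x) - p_n(x) <= 2 sqrt(x) / (2 + n sqrt(x)) on a grid,
# with a small slack for floating-point rounding.
for k in range(1001):
    x = k / 1000
    s = math.sqrt(x)
    assert -1e-12 <= s - p(x) <= 2 * s / (2 + n * s) + 1e-12
```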

The algebra of polynomials separates points on the compact set $[-1;1]$, hence is dense. To treat the case of trigonometric polynomials, consider Laurent polynomials on the unit circle.

2. Convolution.

Consider an approximation $(\rho_n)$ of the Dirac distribution, i.e., a sequence of continuous, nonnegative and compactly supported functions on $\mathbf R$ such that $\int\rho_n=1$ and such that for every $\delta>0$, $\int_{\mathopen| x\mathclose|>\delta} \rho_n(x)\,dx\to 0$. Given a continuous function $f$ on $\mathbf R$, form the convolutions defined by $f*\rho_n(x)=\int_{\mathbf R} \rho_n(t) f(x-t)\, dt$. It is classical that $f*\rho_n$ converges uniformly on every compact to $f$.

Now, given a continuous function $f$ on $[-1;1]$, one can extend it to a continuous function with compact support on $\mathbf R$ (defining $f$ to be affine on $[-2;-1]$ and on $[1;2]$, and to be zero outside of $[-2;2]$). We want to choose $\rho_n$ so that $f*\rho_n$ is a polynomial on $[-1;1]$. The basic idea is just to choose a parameter $a>0$, and to take $\rho_n(x)= c_n (1-(x/a)^2)^n$ for $\mathopen|x\mathclose|\leq a$ and $\rho_n(x)=0$ otherwise, with $c_n$ adjusted so that $\int\rho_n=1$. Let us write $f*\rho_n(x)=\int_{-2}^2 \rho_n(x-t) f(t)\, dt$; if $x\in[-1;1]$ and $t\in[-2;2]$, then $x-t\in [-3;3]$, so we just need to be sure that $\rho_n$ is given by the polynomial formula on that interval, which we get by taking, say, $a=3$. This shows that the restriction of $f*\rho_n$ to $[-1;1]$ is a polynomial function, and we're done.
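This convolution argument can be sketched numerically. In the following toy computation, my choices of test function ($\cos(\pi x)$, extended affinely to zero) and of parameters are arbitrary; the integrals are approximated by Riemann sums:

```python
import math

def convolve_with_kernel(f, n, a=3.0, grid=4000):
    """Approximate f * rho_n, where rho_n(x) = c_n (1 - (x/a)^2)^n on
    [-a, a] and 0 elsewhere, c_n normalizing the total integral to 1.
    f must vanish outside [-2, 2]."""
    h = 2 * a / grid
    mass = h * sum((1 - (-a + k * h) ** 2 / a ** 2) ** n
                   for k in range(grid + 1))
    c = 1.0 / mass

    def rho(x):
        return c * (1 - (x / a) ** 2) ** n if abs(x) <= a else 0.0

    def conv(x):
        # f * rho_n (x) = int_{-2}^{2} rho_n(x - t) f(t) dt
        ht = 4.0 / grid
        return ht * sum(rho(x - (-2 + k * ht)) * f(-2 + k * ht)
                        for k in range(grid + 1))
    return conv

def f(t):
    """cos(pi t) on [-1, 1], affine down to 0 on [-2, -1] and [1, 2]."""
    if abs(t) <= 1:
        return math.cos(math.pi * t)
    if abs(t) <= 2:
        return -(2 - abs(t))   # joins cos(+-pi) = -1 continuously
    return 0.0

conv = convolve_with_kernel(f, n=5000)
err = max(abs(conv(x) - f(x)) for x in [k / 10 - 1 for k in range(21)])
assert err < 0.05
```

For $n=5000$ the kernel has width of order $3/\sqrt n$, so the uniform error is governed by the modulus of continuity of $f$ at that scale.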

This approach is more or less that of D. Jackson (“A Proof of Weierstrass's Theorem,” Amer. Math. Monthly, 1934). The difference is that he considers continuous functions on a closed interval contained in $\mathopen]0;1\mathclose[$ which he extends linearly to $[0;1]$ so that they vanish at $0$ and $1$; he considers the same convolution, taking the parameter $a=1$.

As shown by Jackson, the same approach works easily (in a sense, more easily) for $2\pi$-periodic functions, considering the kernel defined by $\rho_n(x)=c_n(1+\cos(x))^n$, where $c_n$ is chosen so that $\int_{-\pi}^\pi \rho_n=1$.

3. Bernstein polynomials.

Take a continuous function $f$ on $[0;1]$ and, for $n\geq 0$, set \[ B_nf(x) = \sum_{k=0}^n f(k/n) \binom nk x^k (1-x)^{n-k}.\] It is classical that $B_nf$ converges uniformly to $f$ on $[0;1]$.
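For illustration, here is a direct numerical check of this convergence; the test function $x\mapsto\mathopen|x-1/2\mathclose|$ (continuous but not differentiable) and the degree are arbitrary choices:

```python
from math import comb

def bernstein(f, n):
    """Return x -> B_n f(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)."""
    coeffs = [f(k / n) * comb(n, k) for k in range(n + 1)]
    def B(x):
        return sum(c * x ** k * (1 - x) ** (n - k)
                   for k, c in enumerate(coeffs))
    return B

f = lambda x: abs(x - 0.5)
B = bernstein(f, 400)
# The error is largest near the kink at 1/2, of order 1/sqrt(n).
err = max(abs(B(k / 200) - f(k / 200)) for k in range(201))
assert err < 0.03
```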

There are two classical proofs of Bernstein's theorem. One is probabilistic and consists in observing that $B_nf(x)$ is the expected value of $f(S_n/n)$, where $S_n$ is the sum of $n$ i.i.d. Bernoulli random variables with parameter $x\in[0;1]$. Another (generalized as the Korovkin theorem, “On convergence of linear positive operators in the space of continuous functions”, Dokl. Akad. Nauk SSSR (N.S.), vol. 90,‎ 1953) consists in showing (i) that for $f=1,x,x^2$, $B_nf$ converges uniformly to $f$ (an explicit calculation), (ii) that if $f\geq 0$, then $B_nf\geq 0$ as well, and (iii) that, for every $x\in[0;1]$, one can squeeze $f$ in between two quadratic polynomials $f^-$ and $f^+$ such that $f^+(x)-f^-(x)$ is as small as desired.

A trigonometric variant is given by Fejér's theorem that the Cesàro averages of the Fourier series of a continuous, $2\pi$-periodic function converge uniformly to that function. In turn, Fejér's theorem can be proved in both ways, either by convolution (the Fejér kernel is nonnegative), or by a Korovkin-type argument (replacing $1,x,x^2$ on $[0;1]$ by $1,z,z^2,z^{-1},z^{-2}$ on the unit circle).
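As an illustration, one can compute Fejér means numerically. A sketch, with the Fourier coefficients computed by the trapezoid rule (exact up to aliasing for periodic functions on a uniform grid); the test function $t\mapsto\mathopen|\sin t\mathclose|$ is an arbitrary choice:

```python
import cmath, math

def fejer_mean(f, N, samples=512):
    """Cesaro (Fejer) mean sigma_N f = sum_{|k|<N} (1-|k|/N) c_k e^{ikt},
    with Fourier coefficients c_k computed on a uniform grid."""
    ts = [2 * math.pi * j / samples for j in range(samples)]
    fs = [f(t) for t in ts]
    def c(k):
        return sum(fs[j] * cmath.exp(-1j * k * ts[j])
                   for j in range(samples)) / samples
    coeffs = {k: (1 - abs(k) / N) * c(k) for k in range(-N + 1, N)}
    def sigma(t):
        return sum(ck * cmath.exp(1j * k * t)
                   for k, ck in coeffs.items()).real
    return sigma

f = lambda t: abs(math.sin(t))   # continuous and 2*pi-periodic
sigma = fejer_mean(f, N=150)
err = max(abs(sigma(t) - f(t))
          for t in [2 * math.pi * j / 200 for j in range(200)])
assert err < 0.05
```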

4. Approximation of step functions.

Let us show that for every $\delta\in\mathopen]0;1\mathclose[$ and every $\varepsilon>0$, there exists a polynomial $p$ satisfying the following properties:

$0\leq p(x)\leq \varepsilon$ for $-1\leq x\leq-\delta$;

$0\leq p(x)\leq 1$ for $-\delta\leq x\leq \delta$;

$1-\varepsilon\leq p(x)\leq 1$ for $\delta\leq x\leq 1$.

In other words, these polynomials approximate the (discontinuous) function $f$ on $[-1;1]$ defined by $f(x)=0$ for $x< 0$, $f(x)=1$ for $x> 0$ and $f(0)=1/2$.

A possible formula is $p(x)=\left(1- ((1-x)/2)^n\right)^{2^n}$, where $n$ is a large enough integer. First of all, one has $0\leq (1-x)/2\leq 1$ for every $x\in[-1;1]$, so that $0\leq p(x)\leq 1$. Let $x\in[-1;-\delta]$; then one has $(1-x)/2\geq (1+\delta)/2$, hence $p(x)\leq (1-((1+\delta)/2)^n)^{2^n}$, which can be made arbitrarily small when $n\to\infty$. Let finally $x\in[\delta;1]$; then $(1-x)/2\leq (1-\delta)/2$, hence, by Bernoulli's inequality, $p(x)\geq (1-((1-\delta)/2)^n)^{2^n}\geq 1- (1-\delta)^n$, which can be made arbitrarily close to $1$ when $n\to\infty$.
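These three estimates can be verified numerically. A short sketch (the choices $\delta=0.2$, $\varepsilon=10^{-3}$, $n=35$ are arbitrary):

```python
def p(x, n):
    """The polynomial p(x) = (1 - ((1-x)/2)^n)^(2^n), of degree n * 2^n."""
    return (1 - ((1 - x) / 2) ** n) ** (2 ** n)

delta, eps, n = 0.2, 1e-3, 35
xs = [k / 1000 - 1 for k in range(2001)]        # grid on [-1, 1]
assert all(0 <= p(x, n) <= 1 for x in xs)       # true on all of [-1, 1]
assert all(p(x, n) <= eps for x in xs if x <= -delta)
assert all(p(x, n) >= 1 - eps for x in xs if x >= delta)
```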

By translations and dilations, the discontinuity can be placed at any point of $[-1;1]$. Let now $f$ be an arbitrary step function and let us write it as a linear combination $f=\sum a_i f_i$, where each $f_i$ is a $\{0,1\}$-valued step function with a single jump. For every $i$, let $p_i$ be a polynomial that approximates $f_i$ as given above. The linear combination $\sum a_i p_i$ approximates $f$, with maximal error of the order of $\sup(\mathopen|a_i\mathclose|)$ near the jump points and $\varepsilon\sum\mathopen|a_i\mathclose|$ elsewhere.

Using uniform continuity of continuous functions on $[-1;1]$, every continuous function can be uniformly approximated by a step function. This concludes the proof.

5. Using approximation by piecewise linear functions.

As in the proof of Stone's theorem, one uses the fact that the function $x\mapsto \mathopen|x\mathclose|$ is uniformly approximated by a sequence of polynomials on $[-1;1]$. Consequently, so are the functions $x\mapsto \max(0,x)=(x+\mathopen|x\mathclose|)/2 $ and $x\mapsto\min(0,x)=(x-\mathopen|x\mathclose|)/2$. By translation and dilation, every continuous piecewise linear function on $[-1;1]$ with only one break point is uniformly approximated by polynomials. By linear combination, every continuous piecewise linear function is uniformly approximated by polynomials.
By uniform continuity, every continuous function on $[-1;1]$ can be uniformly approximated by continuous piecewise linear functions. Weierstrass's theorem follows.
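The linear-combination step can be made explicit: the piecewise linear interpolant of $f$ at nodes $b_0<\dots<b_m$ is a combination of the hinge functions $x\mapsto\max(0,x-b_i)$, each of which is uniformly approximable by polynomials. A sketch (the test function $\exp$ and the mesh are arbitrary choices):

```python
import math

def hinge_interpolant(f, nodes):
    """Piecewise linear interpolant of f at the given nodes, written as
    f(b_0) + s_0 (x - b_0) + sum_i (s_i - s_{i-1}) max(0, x - b_i),
    where s_i is the slope of f's interpolant on [b_i, b_{i+1}]."""
    b = nodes
    slopes = [(f(b[i + 1]) - f(b[i])) / (b[i + 1] - b[i])
              for i in range(len(b) - 1)]
    def g(x):
        y = f(b[0]) + slopes[0] * (x - b[0])
        for i in range(1, len(slopes)):
            y += (slopes[i] - slopes[i - 1]) * max(0.0, x - b[i])
        return y
    return g

f = math.exp
nodes = [-1 + k / 50 for k in range(101)]       # mesh of width 0.02
g = hinge_interpolant(f, nodes)
err = max(abs(g(x) - f(x)) for x in [k / 500 - 1 for k in range(1001)])
assert err < 1e-3
```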

6. Moments.

A linear subspace $A$ of a Banach space is dense if and only if every continuous linear form which vanishes on $A$ is identically $0$. In the present case, the dual of $C^0([-1;1],\mathbf C)$ is the space of complex measures on $[-1;1]$ (Riesz theorem, if one wishes, or the definition of a measure). So let $\mu$ be a complex measure on $[-1;1]$ such that $\int_{-1}^1 t^n \,d\mu(t)=0$ for every integer $n\geq 0$; let us show that $\mu=0$. This is the classical problem of showing that a complex measure on $[-1;1]$ is determined by its moments. In fact, the classical proof of this fact runs the other way round, and there must exist ways to reverse the arguments.

One such solution is given in Rudin's Real and complex analysis, where it is more convenient to consider functions on the interval $[0;1]$. So, let $F(z)=\int_0^1 t^z \,d\mu(t)$. The function $F$ is holomorphic and bounded on the half-plane $\Re(z)> 0$ and vanishes at the positive integers. At this point, Rudin makes a conformal transformation to the unit disk (setting $w=(z-1)/(z+1)$) and gets a bounded holomorphic function on the unit disk with zeroes at the points $(n-1)/(n+1)=1-2/(n+1)$, for $n\in\mathbf N$. Since the series $\sum 1/(n+1)$ diverges, these zeroes violate the Blaschke condition, so that $F$ vanishes identically, and one deduces that $\mu=0$.

In Rudin, this method is used to prove the more general Müntz–Szász theorem according to which the family $(t^{\lambda_n})$ generates a dense subset of $C([0;1])$ if and only if $\sum 1/\lambda_n=+\infty$.

Here is another argument. For every complex number $a$ such that $\mathopen|a\mathclose|>1$, one can expand $1/(t-a)$ as a power series in $t$ which converges uniformly on $[-1;1]$. By termwise integration, this quickly gives that
\[ F(a) = \int_{-1}^1 \frac{1}{t-a}\, d\mu(t) = 0. \]
Observe that this formula defines a holomorphic function on $\mathbf C\setminus[-1;1]$; by analytic continuation, one thus has $F(a)=0$ for every $a\not\in[-1;1]$.
Take a $C^2$-function $g$ with compact support on the complex plane. For every $t\in\mathbf C$, one has the following formula
\[ \frac1\pi \iint \bar\partial g(z) \frac{1}{t-z} \, dx\,dy = g(t), \]
which implies, by integration and Fubini, that
\[ \int_{-1}^1 g(t)\,d\mu(t) = \frac1\pi \iint \int_{-1}^1 \bar\partial g(z) \frac1{t-z}\,d\mu(t)\,dx\,dy = \frac1\pi \iint \bar\partial g(z) F(z)\,dx\, dy= 0. \]
On the other hand, every $C^2$ function on $[-1;1]$ can be extended to such a function $g$, so that the measure $\mu$ vanishes on every $C^2$ function on $[-1;1]$. Approximating a continuous function by a $C^2$ function (first take a piecewise linear approximation, and round the corners), we get that $\mu$ vanishes on every continuous function, as was to be proved.

7. Chebyshev/Markov systems.

This proof is due to P. Borwein and taken from the book Polynomials and polynomial inequalities, by P. Borwein and T. Erdélyi (Graduate Texts in Mathematics, vol. 161, 1995). Let us say that a sequence $(f_n)$ of continuous functions on an interval $I$ is a Markov system (resp. a weak Markov system) if for every integer $n$, every nonzero linear combination of $(f_0,\dots,f_n)$ has at most $n$ zeroes (resp. $n$ sign changes) in $I$.

Given a Markov system $(f_n)$, one defines a sequence $(T_n)$, where $f_n-T_n$ is the element of $\langle f_0,\dots,f_{n-1}\rangle$ which is closest to $f_n$ (so that $T_n$ is the error of best approximation). The function $T_n$ has $n$ zeroes on the interval $I$; let $M_n$ be the maximum distance between two consecutive zeroes.

Borwein's theorem (Theorem 4.1.1 in the mentioned book) then asserts that if the sequence $(f_n)$ is a Markov system consisting of $C^1$ functions, then its linear span is dense in $C(I)$ if and only if $M_n\to 0$.

The sequence of monomials $(x^n)$ on $I=[-1;1]$ is of course a Markov system. In this case, $T_n$ is (up to a scalar) the $n$th Chebyshev polynomial, characterized by $T_n(\cos t)=\cos(nt)$; its roots are the points $\cos((\pi+2k\pi)/2n)$, for $k=0,\dots,n-1$, and since the cosine function is $1$-Lipschitz while consecutive angles differ by $\pi/n$, one has $M_n\leq \pi/n$. This gives yet another proof of Weierstrass's approximation theorem.
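A quick numerical check of the spacing bound, with the normalization $T_n(\cos t)=\cos(nt)$:

```python
import math

n = 100
# Roots of the n-th Chebyshev polynomial T_n(cos t) = cos(n t) on [-1, 1].
roots = sorted(math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n))
# Maximum distance between two consecutive roots.
M_n = max(roots[i + 1] - roots[i] for i in range(n - 1))
assert M_n <= math.pi / n
```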