

January 17, 2013

Carleson’s Theorem

Posted by Tom Leinster

I’ve just started teaching an advanced undergraduate course on Fourier
analysis — my first lecturing duty in my new job at Edinburgh.

What I hadn’t realized until I started preparing was the
extraordinary history of false beliefs about the pointwise convergence
of Fourier series. This started with Fourier himself about 1800, and was
only fully resolved by Carleson in 1964.

The endlessly diverting index of Tom Körner’s book Fourier
Analysis alludes to this.

Here’s the basic set-up. Let $\mathbf{T} = \mathbf{R}/\mathbf{Z}$ be the
circle, and let $f\colon \mathbf{T} \to \mathbf{C}$ be an integrable
function. The Fourier coefficients of $f$ are
$$\hat{f}(k) = \int_{\mathbf{T}} f(x) \, e^{-2\pi i k x} \, d x \qquad (k \in \mathbf{Z}),$$
and the partial sums of the Fourier series of $f$ are
$$(S_n f)(x) = \sum_{k=-n}^{n} \hat{f}(k) \, e^{2\pi i k x}.$$
The classical positive result is that if $f$ is continuously differentiable
then $(S_n f)(x) \to f(x)$ for every $x \in \mathbf{T}$. In other words,
pointwise convergence holds for continuously differentiable functions.
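These definitions are easy to experiment with. Below is a short numerical sketch (my own illustration, not from the post): it approximates the coefficients $\hat{f}(k)$ by Riemann sums and watches $(S_n f)(x_0)$ converge for a smooth test function. The function, the point $x_0$, and the truncation levels are all arbitrary choices.

```python
# Sketch: approximate Fourier coefficients and partial sums on T = R/Z.
# The test function f(x) = exp(cos(2*pi*x)) is smooth, so (S_n f)(x) -> f(x)
# rapidly; nothing here depends on the specific choice.
import numpy as np

def fourier_coeff(f, k, m=4096):
    """Approximate fhat(k) = integral_0^1 f(x) e^{-2 pi i k x} dx
    by an m-point Riemann sum (very accurate for smooth periodic f)."""
    x = np.arange(m) / m
    return np.mean(f(x) * np.exp(-2j * np.pi * k * x))

def partial_sum(f, n, x):
    """(S_n f)(x) = sum_{k=-n}^{n} fhat(k) e^{2 pi i k x}."""
    return sum(fourier_coeff(f, k) * np.exp(2j * np.pi * k * x)
               for k in range(-n, n + 1))

f = lambda x: np.exp(np.cos(2 * np.pi * x))
x0 = 0.3
errors = [abs(partial_sum(f, n, x0) - f(x0)) for n in (2, 4, 8)]
print(errors)  # rapidly decreasing
```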

It was surely just a matter of time until someone
managed to extend the proof to all continuous functions. Riemann believed
this could be done, Weierstrass believed it, Dedekind believed it, Poisson believed it.
So, in Körner’s words, it ‘came as a considerable surprise’ when du
Bois–Reymond proved:

Theorem (du Bois–Reymond, 1876) There is a continuous
function $f\colon \mathbf{T} \to \mathbf{C}$ such that for some $x \in
\mathbf{T}$, the sequence $((S_n f)(x))$ fails to converge.

Even worse (though I actually don’t know whether this was proved at the
time):

Theorem Let $E$ be a countable subset of $\mathbf{T}$. Then
there is a continuous
function $f\colon \mathbf{T} \to \mathbf{C}$ such that for all $x \in
E$, the sequence $((S_n f)(x))$ fails to converge.

The pendulum began to swing. Maybe there’s some continuous $f$ such that
$((S_n f)(x))$ doesn’t converge for any $x \in \mathbf{T}$. This,
apparently, became the general belief, solidified by a discovery of
Kolmogorov:

Theorem (Kolmogorov, 1926) There is a Lebesgue-integrable
function $f\colon \mathbf{T} \to \mathbf{C}$ such that for all $x \in
\mathbf{T}$, the sequence $((S_n f)(x))$ fails to converge.

It was surely just a matter of time until someone managed to
adapt the counterexample to give a continuous $f$ whose Fourier series converged nowhere.

At best, the situation was unclear, and this persisted until relatively
recently. I have on my shelf a 1957 undergraduate textbook
called Mathematical Analysis by Tom Apostol. In the part on Fourier
series, he states that it’s still unknown whether the Fourier series of a
continuous function has to converge at even one point. This isn’t
ancient history; Apostol’s book was even on my own undergraduate
recommended reading list (though I can’t say I ever read it).

The turning point was Carleson’s theorem of 1964:

Theorem (Carleson, 1964) If $f \in L^2(\mathbf{T})$ then $(S_n f)(x) \to f(x)$ for almost all $x \in \mathbf{T}$.

In particular, his result implies:

If $f\colon \mathbf{T} \to \mathbf{C}$ is continuous then $(S_n f)(x) \to
f(x)$ for at least one $x \in \mathbf{T}$.

In fact, it implies something stronger:

If $f\colon \mathbf{T} \to \mathbf{C}$ is continuous then $(S_n f)(x) \to
f(x)$ for almost all $x \in \mathbf{T}$.

In fact, it implies something stronger still:

If $f\colon \mathbf{T} \to \mathbf{C}$ is Riemann integrable then $(S_n f)(x) \to
f(x)$ for almost all $x \in \mathbf{T}$.

This was soon strengthened even further by Hunt (in a way that apparently
Carleson had anticipated). ‘Recall’ that the spaces $L^p(\mathbf{T})$ get
bigger as $p$ gets smaller; that is, if $1 \leq q \leq p \leq \infty$ then
$L^q(\mathbf{T}) \supseteq L^p(\mathbf{T})$. So, if we could change the
‘2’ in Carleson’s theorem to something smaller, we’d have strengthened it. We
can’t take it all the way down to 1, because of Kolmogorov’s
counterexample. But Hunt showed that we can take it arbitrarily close to 1:

Theorem (Hunt, 1968) Let $1 \lt p \leq \infty$. If $f \in L^p(\mathbf{T})$ then $(S_n f)(x) \to f(x)$ for almost all $x \in \mathbf{T}$.

There’s an obvious sense in which Carleson’s and Hunt’s theorems can’t be
improved: we can’t change ‘almost all’ to ‘all’, simply because changing a
function on a set of measure zero doesn’t change its Fourier coefficients.

But there’s another sense in which they’re optimal: given any set of measure zero,
there’s some $L^2$ function whose Fourier series fails to
converge there. Indeed, there is such a continuous $f$:

Theorem (Kahane and Katznelson, 196?) Let $E$ be a measure
zero subset of $\mathbf{T}$. Then there is a continuous function $f\colon
\mathbf{T} \to \mathbf{C}$ such that for all $x \in E$, the sequence $((S_n
f)(x))$ fails to converge.

I’ll finish with a question for experts. Despite Carleson’s own proof having been
subsequently simplified, the Fourier analysis books I’ve seen say that all
proofs are far too hard for an undergraduate course. But what about the
corollary that if $f$ is continuous then $(S_n f)(x)$ must converge to
$f(x)$ for at least one $x$? Is there now a proof of this that might be
simple enough for a final-year undergraduate course?

Posted at January 17, 2013 2:15 PM UTC


30 Comments & 0 Trackbacks

Re: Carleson’s Theorem

I don’t think there is much simplification to the proof of Carleson’s theorem in restricting attention to continuous functions, even if one only wants convergence somewhere rather than almost everywhere. It’s now realised that pointwise convergence questions for $S_n f(x)$ are closely related to the boundedness properties of the Carleson maximal operator
$$S^* f(x) := \sup_n |S_n f(x)|.$$
(This is only a sublinear operator rather than a linear operator, but it’s still an operator nonetheless.) Roughly speaking, if one can prove a non-trivial bound on $S^* f$ (and in particular keep it finite almost everywhere) for all $f$ in a function space (e.g. $L^p$), then it is a relatively routine matter to demonstrate almost everywhere convergence in $L^p$; and conversely, if no such bound exists, then it is likely that (with perhaps a bit of nontrivial trickery) one can eventually cook up a counterexample in this function space for which one has pointwise convergence nowhere.

If one has a bit of regularity, e.g. $C^1$, then bounding $S^* f$ is relatively straightforward, but in spaces with zero regularity (e.g. $L^p$ or $C^0$) it’s much more difficult. The key problem here is the modulation invariance of $S^*$: if one multiplies $f$ by a character $e^{2\pi i k x}$, this essentially does not change $S^*$. (This invariance is more apparent if one replaces $S_n$ with the closely related double sum $S_{m,n} f(x) = \sum_{k=-m}^{n} \hat f(k) e^{2\pi i k x}$ in the definition of $S^*$.) The spaces $L^p$ and $C^0$ are also modulation invariant, and this basically forces any proof of almost everywhere (or even somewhere) pointwise convergence in these spaces to also be modulation invariant, which rules out a lot of standard techniques and requires instead tools such as time-frequency analysis.

Re: Carleson’s Theorem

Thanks very much. I finished my lecture just now by telling them that I’d asked this question here, and that I’d consider including the proof if it turned out there was a suitably simple one. Now I have an answer — one they might be relieved about.

Incidentally, I’m glad you’ve figured out how to typeset math here. Jacques Distler, who runs the blog, made a small change to the interface. There may be further changes soon.

Re: Carleson’s Theorem

Regarding pointwise convergence for Hölder continuous functions, I like the two-page note by Chernoff, Pointwise convergence of Fourier series.

On another note, as you write, the Carleson-Hunt theorem is optimal for the exponent in L^p because of Kolmogorov’s L^1 example, but there are still open questions for intermediate spaces like L log(L) (see Lacey, Carleson’s theorem: proof, complements, variations).

Re: Carleson’s Theorem

I think the lecture note proof of the decay rate for the Fourier coefficients of Lipschitz functions is needlessly complicated. If we consider, more generally, a Hölder continuous function $f$ of order $\alpha$ on $\mathbb{T}$, then the decay rate $|\hat{f}(k)| \lesssim (1+|k|)^{-\alpha}$ is an obvious consequence of the identity
$$\hat{f}(k) = -\int_{\mathbb{T}} f\left(x - \tfrac{1}{2k}\right) e^{-2\pi i k x} \, dx.$$
However, I don’t think the summation method argument adapts if you assume the coefficients are just $O(k^{-\alpha})$.
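The identity follows from the substitution $x \mapsto x + \frac{1}{2k}$: the shift moves the character $e^{-2\pi i k x}$ through half a period, flipping the sign of the integral. Here is a quick numerical check (my own sketch, not part of the comment), using a Lipschitz tent function; the choices of $k$ and grid size are arbitrary.

```python
# Check numerically that fhat(k) = -integral of f(x - 1/(2k)) e^{-2 pi i k x} dx:
# shifting f by 1/(2k) shifts the character by half a period and flips
# the sign of the integral. f is a Lipschitz tent function on [0, 1).
import numpy as np

m = 8192
x = np.arange(m) / m
f = lambda t: np.abs((t % 1.0) - 0.5)      # tent, periodic with period 1
k = 7
e = np.exp(-2j * np.pi * k * x)

fhat = np.mean(f(x) * e)                   # usual formula for fhat(k)
shifted = np.mean(f(x - 1 / (2 * k)) * e)  # integral of f(x - 1/(2k)) e^{-2 pi i k x}
print(abs(fhat + shifted))  # ≈ 0, i.e. fhat(k) = -shifted
```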

Re: Carleson’s Theorem

With regards to Carleson’s theorem, a MUCH simpler result along similar lines is the almost everywhere convergence of the Fejér means of an $L^1(\mathbb{T})$ function. The reason I mention this is that the proof has the same general outline as that of Carleson’s theorem: here we consider the Fejér maximal function, defined in an analogous way to the Carleson maximal function, and try to show it satisfies some interesting bound. I mention this as a point of interest rather than something suitable for inclusion in the course.

Re: Carleson’s Theorem

Even if the usual Fourier series of a continuous function $f\colon \mathbf{T} \to \mathbb{C}$ does not always converge pointwise to $f$ (much less uniformly to $f$), it might be nice to point out to your students that the averages of the finite Fourier approximations

$$A_N(f) = \frac{1}{N+1} \sum_{n=0}^N S_n(f)$$

do converge to $f$ uniformly! These are the Fejér means that Jonathan referred to.
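For concreteness, here is a small numerical illustration of that uniform convergence (mine, not Todd’s): averaging the partial sums weights the $k$th mode by $1 - \frac{|k|}{N+1}$, and for a continuous tent function the sup-norm error of $A_N(f)$ visibly shrinks. The grid size and the values of $N$ are arbitrary.

```python
# Fejér means A_N(f) = (1/(N+1)) sum_{n=0}^{N} S_n(f): averaging the
# partial sums weights mode k by (1 - |k|/(N+1)). For continuous f the
# convergence A_N(f) -> f is uniform; we watch the sup-norm error drop.
import numpy as np

m = 2048
x = np.arange(m) / m
f = np.abs(x - 0.5)                    # continuous tent function on T

fhat = np.fft.fft(f) / m               # fhat[k % m] ≈ k-th Fourier coefficient

def fejer_mean(N):
    out = np.zeros(m, dtype=complex)
    for k in range(-N, N + 1):
        out += (1 - abs(k) / (N + 1)) * fhat[k % m] * np.exp(2j * np.pi * k * x)
    return out.real

errs = [np.max(np.abs(fejer_mean(N) - f)) for N in (4, 16, 64)]
print(errs)  # decreasing: the uniform error of A_N(f) shrinks with N
```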

The way I think of this goes as follows. (All of this is really well known. Certainly it’s much better known to the analysts reading this thread than it is to a semi-ignorant category theorist like myself. And for all I know it’s covered in Körner’s book – I’ve never looked at the book, but have heard very nice things about it.)

First, there’s a Banach algebra $L^1(\mathbf{T})$, where on the unit circle $\mathbf{T}$ we use the normalized Haar measure, and the multiplication on $L^1$ is given by convolution. (This algebra doesn’t have an identity, but we can adjoin one, and it is useful to think of it as the Dirac distribution supported at the identity $1 \in \mathbf{T}$.) The spaces $L^p(\mathbf{T})$, for $1 \leq p \leq \infty$, are Banach modules under convolution product: the convolution product is

$$(K \ast f)(x) = \int_{\mathbf{T}} K(x - y) f(y) \, d y.$$

As a basic example, take $K$ to be the character $e_n(x) = e^{2\pi i n x}$. Then

$$(e_n \ast f)(x) = \int_{\mathbf{T}} e^{2\pi i n (x - y)} f(y) \, d y = \left( \int_{\mathbf{T}} f(y) e^{-2\pi i n y} \, d y \right) e_n(x),$$

where the integral in the parentheses gives the $n^{th}$ Fourier coefficient. Thinking of $e_n$ and $f$ as living in $L^2(\mathbf{T})$, this Fourier coefficient is the Hilbert space pairing $\langle e_n, f \rangle$, and so we get the neat little formula

$$e_n \ast f = \langle e_n, f \rangle e_n.$$

The operator $S_n\colon f \mapsto S_n(f)$ is defined by
$$f \mapsto \sum_{k = -n}^n \langle e_k, f \rangle e_k,$$
where we orthogonally project $f$ onto the subspace of $L^2(\mathbf{T})$ spanned by the orthonormal elements $e_{-n}, e_{-n+1}, \ldots, e_n$. Using the example above, we see that the operator $S_n$ can be described as the result of convolving with the function

$$D_n = e_{-n} + e_{-n+1} + \ldots + e_n,$$

i.e.,

$$S_n(f) = D_n \ast f.$$

This $D_n$ is called the $n^{th}$ Dirichlet kernel. It’s manifestly a finite geometric series, and thus we easily compute

$$D_n(x) = \sum_{k=-n}^{n} e^{2\pi i k x} = \frac{\sin((2n+1)\pi x)}{\sin(\pi x)},$$

so that (a) $\int_{\mathbf{T}} D_n(x) \, d x = 1$, and (b) $D_n$ piles up more and more mass near $0$ (indeed $D_n(0) = 2n+1$), and these two facts (a), (b) mean $D_n$ behaves something like an approximate Dirac distribution supported at $0$ – except it doesn’t taper off to zero a small distance away from $0$.

Now for a brief interlude on this business of “Dirac distribution” as identity. To repeat: the Banach algebra $L^1(\mathbf{T})$ has no identity for the convolution product (which is what the Dirac distribution $\delta$ would be, except that this “mystical” $\delta(x)$, which is infinite at $x = 0$, and zero away from $x = 0$, and has integral $1$, is not represented by an $L^1$ function). But, as a workaround, we can speak of approximate identities. A sequence of $L^1$ functions $K_n$ is an approximate identity if

1. For all $n$ we have
$$\int_{\mathbf{T}} K_n(x) \, d x = 1;$$

2. For each $\delta$ with $0 \lt \delta \lt \tfrac{1}{2}$, we have
$$\lim_{n \to \infty} \int_{\delta \leq |x| \leq 1/2} |K_n(x)| \, d x = 0.$$

The second condition says that away from the identity element $0$, an approximate identity is close to zero (in the $L^1$ sense), while still having total mass $1$ by the first condition.

Lemma: Let $\{K_n\}$ be an approximate identity. For each of the Banach modules $L^p(\mathbf{T})$ over $L^1(\mathbf{T})$, $1 \leq p \leq \infty$, we have

$$\lim_{n \to \infty} K_n \ast f = f$$

for all $f \in L^p(\mathbf{T})$. (This justifies the term “approximate identity”.)

It should also be said that convolutions $K \ast f$ inherit the regularity behavior of $K$ or $f$. For example, if $K$ or $f$ is of class $C^n$, then so is $K \ast f$ (even if $f$ or $K$, respectively, isn’t). Consequently, if $f$ is continuous, then convergence $K_n \ast f \to f$ in the $L^\infty$ norm becomes uniform convergence of continuous functions.

Now we head to our third example.

Define an operator (the $N^{th}$ Cesàro or Fejér mean) on continuous functions $f\colon \mathbf{T} \to \mathbb{C}$ by averaging the $S_n(f)$:

$$K_N(f) = \frac{1}{N+1} \sum_{n=0}^N S_n(f).$$

Thus $K_N$ is the result of convolving with an average of Dirichlet kernels: $K_N(f) = F_N \ast f$ where

$$F_N = \frac{1}{N+1} \sum_{n=0}^N D_n.$$

This $F_N$ is called a Fejér kernel. Now here is a pretty neat fact:

$$F_{2N} = \frac{1}{2N+1} D_N^2.$$

Indeed, $D_N^2 = \sum_{n=0}^{2N} D_n$ for essentially the same reason that $11111^2 = 123454321$ – the proof of either can be left as an exercise. Hence we have

$$F_N(x) = \frac{1}{N+1} \left( \frac{\sin((N+1)\pi x)}{\sin(\pi x)} \right)^2 \geq 0,$$

and the positive functions $\{F_n\}$ form an approximate identity. For, each of the $F_n$ has mass $1$ since each is an average of Dirichlet kernels which have mass $1$; and given any $\delta, \epsilon \gt 0$ there exists $N$ such that

$$\int_{\delta \leq |x| \leq 1/2} F_n(x) \, d x \lt \epsilon \quad \text{whenever } n \geq N,$$

since $F_n(x) \leq \frac{1}{(n+1)\sin^2(\pi \delta)}$ on that region.
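As a sanity check (my addition, not Todd’s), the identity behind the “pretty neat fact” can be verified numerically: squaring the Dirichlet kernel pointwise sums Dirichlet kernels, $D_N^2 = \sum_{n=0}^{2N} D_n$, by the same digit-carrying mechanism as $11111^2 = 123454321$.

```python
# Verify pointwise that D_N(x)^2 = sum_{n=0}^{2N} D_n(x), where
# D_n(x) = sum_{k=-n}^{n} e^{2 pi i k x} is the n-th Dirichlet kernel.
# Squaring the (2N+1)-term geometric sum produces the triangular
# coefficient pattern 1, 2, ..., 2N+1, ..., 2, 1 -- just like
# 11111^2 = 123454321.
import numpy as np

def dirichlet(n, x):
    return sum(np.exp(2j * np.pi * k * x) for k in range(-n, n + 1))

x = np.linspace(0.013, 0.987, 40)   # generic sample points on the circle
N = 5
lhs = dirichlet(N, x) ** 2
rhs = sum(dirichlet(n, x) for n in range(2 * N + 1))
gap = np.max(np.abs(lhs - rhs))
print(gap)  # ≈ 0 up to rounding
```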

Re: Carleson’s Theorem

Wow, Todd, that’s very comprehensive!

We’ll definitely do the standard results about Fejér kernels and approximations to the identity. The students haven’t seen Cesàro summation before, so it’ll be nice to introduce it: summing unsummable series is an appealing story to tell. And introducing the results in terms of the “mystical” delta function was exactly what I’d planned to do!

(To any of my students reading this: Todd’s comment gives a high-level, abstract, compressed account of a decent-sized chunk of the course. We’ll take it at a much more relaxed pace, spending several lectures on this material.)

On the other hand, I’ll enjoy reading your comment slowly when I get the chance, as at a quick glance there are some points of view there that I haven’t yet learned to appreciate (e.g. looking at things in terms of Banach algebras). Also, I hadn’t noticed your “pretty neat fact” before.

One thing that constrains this course is that the students haven’t seen Lebesgue integration (and I’m not going to teach it). At first I thought it would be impossible to do this stuff using only Riemann, but it turns out to be entirely possible. Indeed, Körner’s book and the identically-titled book by Stein and Shakarchi do exactly this.

Re: Carleson’s Theorem

Todd, one little thing. In the notes I inherited, there was a further condition on “approximate identities”. (In fact, the previous lecturer used the term “good approximation to the identity”, perhaps for that reason.) It’s that
$$\sup_n \| K_n \|_1 \lt \infty.$$
This follows immediately from your condition 1 if each $K_n$ is nonnegative, and is therefore true for the Fejér kernels.

Do you need that condition somewhere? At the moment I haven’t thought through where (if anywhere) you might need it, but I know it’s used in those notes I have.
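For what it’s worth, here is a quick numerical check (my addition, not part of the comment) that the Fejér kernels satisfy this uniform bound with constant $1$: they are nonnegative with $\| F_N \|_1 = 1$.

```python
# The Fejér kernel F_N = (1/(N+1)) * sum_{n=0}^{N} D_n is nonnegative
# with integral 1, so sup_N ||F_N||_1 = 1 and the uniform L^1 bound holds.
import numpy as np

m = 4096
x = np.arange(m) / m

def fejer(N):
    D = lambda n: sum(np.exp(2j * np.pi * k * x) for k in range(-n, n + 1))
    return sum(D(n) for n in range(N + 1)).real / (N + 1)

norms = [np.mean(np.abs(fejer(N))) for N in (3, 10, 30)]  # ≈ ||F_N||_1
print(norms)            # all ≈ 1.0
print(min(fejer(30)))   # ≥ 0 (up to rounding)
```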

Re: Carleson’s Theorem

It could be that I overlooked something; I was going on memory for all of this. I guess a lot of people would impose the positivity condition (so that $\int K \, d x = \int |K| \, d x = 1$) on approximate identities $K$ (or what are also called mollifiers, especially when smoothness assumptions are added to the mix).

At one point I had gone through all this reasonably carefully, when I was teaching real analysis or functional analysis, and for some reason I don’t recall mentioning uniform boundedness of the L1L^1 norms. But I’ll try to get back to you on this, after I find a certain book in my library.

My main memory though is that I fell in love with approximate identities and the Dirac distribution and what a powerful technique they bring to the table. I was lucky enough in my graduate school days to learn functional analysis from François Trèves, who taught a very abstract distributions-oriented approach – it was definitely not your ordinary meat-and-potatoes functional analysis course. It would be a nice thing for me to return to some time.

Re: Carleson’s Theorem

Well, I found the book I was looking for in my library (Measure and Integral by Wheeden and Zygmund), and their set-up for approximate identities is a little different. Anyway, I accept that the extra boundedness assumption you mentioned is probably there for a good reason, to make the estimates needed to prove the lemma I mentioned come out right. I’ll bet when I was teaching this, I included a positivity assumption which wasn’t mentioned in my comment above.

Re: Carleson’s Theorem

Tom, are you using any of Katznelson’s Introduction to Harmonic Analysis? It might be too compressed for your purposes, and in places it presupposes knowledge of the Lebesgue integral; but Chapter I (of the 2nd ed, Dover) contains many of the key results in a concise presentation.

In particular, Ch. I Section 2 talks about “summability kernels” which give an abstract characterization of the key features that make convolution with Fejer kernels behave nicely.

Re: Carleson’s Theorem

Thanks. I hadn’t looked at that bit of Katznelson before. At a quick read, it seems to be basically the same as part of what was covered by my predecessor Jim Wright, but done more tersely and more abstractly (and, as you point out, with the Lebesgue theory). The treatment looks pleasingly streamlined; I should read through it properly when I get the chance.

Re: Carleson’s Theorem

I decided to try to visualize the Dirichlet kernels once. You’ve inspired me to find them again. You can see an animation of the kernels $D_N$ for $N$ from $0$ to $200$ (thought of as functions on the circle) at YouTube:

Re: Carleson’s Theorem

Tom’s comment was dry English humour.

You sure about that?

I saw the Fejér one on your channel earlier and watched it too. I was a bit puzzled, because it looks a bit like Todd’s condition 2 for an approximate identity isn’t satisfied. I suppose it’s just slow convergence, or the area under the crinkly bit of the curve is smaller than it looks.

Re: Carleson’s Theorem

I think it’s probably okay: those little spikes where local maxima occur are being pushed very gradually inwards (even as they seem to grow in height a bit) as $N$ increases. It might be hard to tell whether condition 2 holds just from going up to $N = 200$.

Those are cool animations! Great to show the students; when I was teaching such things, I think I had them whip out their TI-85’s to graph a few kernels, but mostly just to get a general feel for their shapes. (It was also before YouTube was around.)

Re: Carleson’s Theorem

Regarding the “false beliefs” about pointwise convergence mentioned in the post: the false beliefs of Fourier form part of the backdrop of the second story that Lakatos tells in his Proofs and Refutations (his Appendix 1: Cauchy on the ‘Principle of Continuity’).