In this section we re-present Bolthausen’s original proof of Berry Esseen using Stein’s method and fill in some computational details. I hope to make the argument more transparent to people trying to learn Stein’s method and Berry-Esseen theorem. The letter will denote a constant that varies from line to line, but will be a fixed absolute constant.
The classical Berry-Esseen theorems gives bound between the normal distribution function and the cumulative distribution function of , where are iid with mean 0 variance 1 and . It states

where is universal. Various efforts have been spent on finding the optimal , which is certainly less than . But we are only interested in the existence of such a .
First we can restate (1)as follows. Define

where

is the collection of all length sequences of iid variables with prescribed first three absolute moments, ,,and respectively.

is the indicator that , and

, where is the standard normal density function, is the same as .

Then Berry-Esseen says

As a check of your understanding, the above supremum is bounded by in (1).
Next we want to get a recursive bound on etc. This will be achieved by manipulating the additive structure . In fact if we can prove

for some absolute constant , then we have

where we can choose a bigger on the right hand side, because the geometric sum can be bounded by , for some (the geometric decay dominates the growth). To bound , we will introduce some continuous approximations of the heavyside function . We define

where means taking minimum of the two sides, and maximum. In words, is the same as for and linearly interpolates and in between. We also denote , as the natural functional limit.
We also define the direct analogue of in the -smoothed case,

From the fact that (try drawing their graphs using the linear interpolation interpretation), one gets

This follows because if for a particular sequence and , , we get

The last bound uses the fact that the normal density is bounded uniformly by , and that is supported on the interval and bounded by . The case is treated similarly. Hence taking the supremum over we get (8).
Now we can make further abstraction of (3) by defining , then we can rewrite

This might seem tautological so far, but here is where Stein’s method comes in, because has the following nice represetation:

where . This is just elementary differentiation. Recall Stein’s method says that has standard normal distribution if and only if for every smooth ,

Thus we are reducing the problem of bounding difference of distribution functions to that of bounding the associated Stein operator. From now on we will write for , and when , we will write out explicitly.
Since ,

By write , we get

Thus for , . For , we have , and . Therefore

for . Together we have (15) for . Therefore (Bolthausen claims , but it doesn’t seem obvious and we don’t need it). Similarly, using (14), we get

which combined with (11) gives . By symmetry, the above inequalities hold for as well. So in summary we have , , and are all universally bounded.
Also from the definition , we get by a simple change of variable

Note could stand for if .
Using Stein’s equation again we get

Now recall we need to bound . By Stein’s equation, this reduces to

by linearity of expectation and exchangeability of ‘s; the form of is chosen merely as a convenience. The same argument below works for non-iid sequence as well.
We can write as

2 Responses to Bolthausen’s proof of Berry Esseen theorem

Though most of the steps are very detailed (thanks a lot for this, the proof in Bolthausens’s paper is quite hard to grasp at a first glance), I can’t see why eq (5) should be true. We need
sum(2^(-k)/sqrt(n-k), k, 0, n-1) infty, but where does the bound on the right-hand side come from? When I computed the expression for several n, I found that the bound seems to be quite sharp. Can you give me a hint on why it is true?

Dear Michael,
Consider first the sequence This converges to 2. Now consider . I can rewrite the latter as . The geometric series dominates the growth of , for instance, you can write . So we know that the latter series is summable because . Thus for any on the left hand side of eq (5) I can choose a bigger on the right hand side.