4 Answers

Essentially, the issue is to show that $\lim_{n\to\infty}(1- 1/n)^n=e^{-1}$
(and of course, $e^{-1} =1/e \approx 1/3$, at least very roughly).

It doesn't work at very small $n$ -- e.g. at $n=2$, $(1- 1/n)^n=\frac{1}{4}$. It passes $\frac{1}{3}$ at $n=6$, passes $0.35$ at $n=11$, and $0.366$ by $n=99$. Once you go beyond $n=11$, $\frac{1}{e}$ is a better approximation than $\frac{1}{3}$.
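The values quoted above are easy to verify numerically; a quick sketch in plain Python (the specific $n$ values just mirror the ones mentioned):

```python
import math

# Evaluate (1 - 1/n)^n for increasing n and compare against 1/e.
for n in [2, 6, 11, 99, 1000]:
    value = (1 - 1/n) ** n
    print(f"n={n:5d}  (1-1/n)^n = {value:.6f}  "
          f"|value - 1/e| = {abs(value - math.exp(-1)):.6f}")
```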

[Figure: $(1-1/n)^n$ plotted against $n$; the grey dashed line marks $\frac{1}{3}$, the red and grey line marks $\frac{1}{e}$.]

Rather than show a formal derivation (which can easily be found), I'm going to give an outline (that is, an intuitive, handwavy argument) of why a (slightly) more general result holds:

Fact 1: $\exp(x/n)^n=\exp(x)\quad$ This follows from basic results about powers and exponentiation.

Fact 2: When $n$ is large, $\exp(x/n) \approx 1+x/n\quad$ This follows from the series expansion for $e^x$.

(I can give fuller arguments for each of these but I assume you already know them)

Substitute (2) in (1). Done. (For this to work as a more formal argument would take some work, because you'd have to show that the remaining terms in Fact 2 don't become large enough to cause a problem when taken to the power $n$. But this is intuition rather than formal proof.)

[Alternatively, just take the Taylor series for $\exp(x/n)$ to first order. A second easy approach is to take the binomial expansion of $\left(1 + x/n \right)^n$ and take the limit term-by-term, showing it gives the terms in the series for $\exp(x)$.]
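Both routes can be sanity-checked numerically; here is a small sketch showing $(1+x/n)^n$ approaching $\exp(x)$ for the case of interest here, $x=-1$:

```python
import math

# Check that (1 + x/n)^n approaches exp(x) as n grows, for x = -1,
# i.e. the case (1 - 1/n)^n -> e^{-1} discussed above.
x = -1.0
for n in [10, 100, 10_000]:
    approx = (1 + x/n) ** n
    print(f"n={n:6d}  (1+x/n)^n = {approx:.8f}  "
          f"error vs exp(x) = {abs(approx - math.exp(x)):.2e}")
```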

More precisely, each bootstrap sample (or bagged tree) will contain, in expectation, a fraction $1-\frac{1}{e} \approx 0.632$ of the distinct items in the original sample.

Let's go over how the bootstrap works. We have an original sample $x_1, x_2, \ldots x_n$ with $n$ items in it. We draw items with replacement from this original set until we have another set of size $n$.

From that, it follows that the probability of choosing any one item (say, $x_1$) on the first draw is $\frac{1}{n}$. Therefore, the probability of not choosing that item is $1 - \frac{1}{n}$. That's just for the first draw; there are a total of $n$ draws, all of which are independent, so the probability of never choosing this item on any of the draws is $(1-\frac{1}{n})^n$.
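The argument above can also be checked by simulation; a minimal Monte Carlo sketch (the sample size and trial count below are arbitrary choices for illustration):

```python
import random

random.seed(0)
n = 1000        # original sample size (assumed for illustration)
trials = 2000   # number of bootstrap samples to draw

never_chosen = 0
for _ in range(trials):
    # Indices of the original items that appear at least once in this bootstrap sample.
    drawn = {random.randrange(n) for _ in range(n)}
    never_chosen += n - len(drawn)

# Fraction of items never selected; should be close to (1 - 1/n)^n ≈ 1/e ≈ 0.368.
print(never_chosen / (n * trials))
```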

Sampling with replacement can be modeled as a sequence of binomial trials where "success" is an instance being selected. For an original dataset of $n$ instances, the probability of "success" is $1/n$, and the probability of "failure" is $(n-1)/n$. For a sample size of $b$, the probability of selecting an instance exactly $x$ times is given by the binomial distribution:

$$P(X = x) = \binom{b}{x}\left(\frac{1}{n}\right)^x\left(\frac{n-1}{n}\right)^{b-x}$$

If our original dataset is big, we can use this formula to compute the probability that an instance is selected exactly $x$ times in a bootstrap sample of size $b = n$. For $x = 0$, the probability is $\left(\frac{n-1}{n}\right)^n = \left(1-\frac{1}{n}\right)^n \approx 1/e$, or roughly $0.368$. The probability of an instance being sampled at least once is thus $1 - 0.368 = 0.632$.
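A small sketch of that binomial computation using only Python's standard library (the helper name `p_exactly` is mine):

```python
from math import comb, exp

def p_exactly(x, b, n):
    """Binomial probability of selecting a given instance exactly x times
    in b draws, each succeeding with probability 1/n."""
    return comb(b, x) * (1/n) ** x * (1 - 1/n) ** (b - x)

n = 10_000  # large original dataset, with b = n as in a standard bootstrap
print(p_exactly(0, n, n))       # ≈ 1/e ≈ 0.368: never selected
print(1 - p_exactly(0, n, n))   # ≈ 0.632: selected at least once
```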

Needless to say, I painstakingly derived this using pen and paper, and did not even consider using Wolfram Alpha.

This can be easily seen by counting. How many possible samples are there in total? $n^n$. How many do not contain a specific value? $(n-1)^n$. The probability of a sample not containing a specific value is therefore $\frac{(n-1)^n}{n^n} = \left(1-\frac{1}{n}\right)^n$, which is about $1/3$ (more precisely, $1/e$) in the limit.
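For a small $n$, the counting argument can be verified by brute-force enumeration:

```python
from itertools import product

# Enumerate all n^n ordered samples (with replacement) from {0, ..., n-1}
# and count those that never contain item 0.
n = 5
total = 0
missing = 0
for sample in product(range(n), repeat=n):
    total += 1
    if 0 not in sample:
        missing += 1

print(total, n ** n)          # both 3125
print(missing, (n - 1) ** n)  # both 1024
print(missing / total)        # (1 - 1/n)^n = 0.32768
```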