I agree with Qiaochu and Mariano, so I'm voting to close. I'm happy to reopen the question if it is edited to be more focused (though I guess that's unlikely since there's already an accepted answer).
– Anton Geraschenko, Jan 22 '10 at 2:25


Douglas Zare said on meta, "if you have gone through the en.wikipedia.org/wiki/Borel-Cantelli_lemma under the notation usually used, then you should recognize what the questioner meant, which is why the question got detailed answers even though it looks close to meaningless to some." (tea.mathoverflow.net/discussion/216) It looks like the question is not as vague as I thought it was, so I'm voting to reopen. I still would prefer that the question defined its terms and asked something more precise. It wouldn't have been hard to make the question clearly meaningful.
– Anton Geraschenko, Feb 13 '10 at 21:19

If $x \in \limsup A_n$ then $x$ is in every one of the sets $\cup_{n\ge N} A_n$, which means that no matter how large you choose $N$, you can find an $n \ge N$ with $x \in A_n$. Thus the members of $\limsup A_n$ are exactly those elements of $X$ that belong to infinitely many of the $A_n$. If the $A_n$ are thought of as events (in the sense of probability), $\limsup A_n$ is again an event: it corresponds exactly to the occurrence of infinitely many of the $A_n$. This is why $\limsup A_n$ is sometimes written $\{x \in A_n \text{ infinitely often}\}$, or $\{A_n \text{ i.o.}\}$ for short.

Similarly, if $x\in \liminf A_n$ then $x$ is in at least one of the sets $\cap_{n\ge N} A_n$, which means $x \in A_n$ for all $n \ge N$. Thus, for $x$ to be in the $\liminf$, it must belong to all of the $A_n$ with at most finitely many exceptions. This is how the phrase "ultimately all of them" comes up.
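These two definitions can be made concrete with a small computation (an illustrative sketch of my own, not from the discussion above): for a periodic sequence of sets the tail unions and intersections stabilize, so $\limsup$ and $\liminf$ can be read off exactly from a finite window.

```python
from functools import reduce

# Toy example (ours): A_n = {1, 3} for even n and A_n = {2, 3} for odd n,
# so 1 and 2 each occur infinitely often, while 3 occurs in every A_n.
def A(n):
    return {1, 3} if n % 2 == 0 else {2, 3}

HORIZON = 50  # long enough that the period-2 pattern has stabilized

# limsup A_n = intersection over N of union_{n >= N} A_n:
# elements belonging to infinitely many A_n.
tail_unions = [set().union(*(A(n) for n in range(N, HORIZON)))
               for N in range(10)]
lim_sup = reduce(set.intersection, tail_unions)

# liminf A_n = union over N of intersection_{n >= N} A_n:
# elements belonging to all but finitely many A_n.
tail_intersections = [reduce(set.intersection, (A(n) for n in range(N, HORIZON)))
                      for N in range(10)]
lim_inf = reduce(set.union, tail_intersections)

print(lim_sup)  # {1, 2, 3}: each element occurs infinitely often
print(lim_inf)  # {3}: only 3 is in ultimately all of the A_n
```

Note that $\liminf A_n \subseteq \limsup A_n$ here, as it must be in general: belonging to all but finitely many of the sets implies belonging to infinitely many of them.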

Both of these operations, like their counterparts for sequences in metric spaces, concern only the tail of the sequence $\{A_n\}$: neither changes if an initial portion of the sequence is truncated. As a previous response pointed out, the sets $A_n$ are often defined to track the deviation of a sequence of random variables from a candidate limit, by setting $A_n = \{x: |Y_n(x) -Y(x)| \ge \epsilon\}$. The members of $\limsup A_n$ then represent those points $x$ for which $Y_n(x)$ deviates from $Y(x)$ by at least $\epsilon$ infinitely often, and whether this happens is determined solely by the tail of the sequence $\{Y_n\}$.

Here is a conceptual game that can be partially understood using these concepts. We have a deck of cards; on the face of each card an integer is printed, so the cards are $\{1,2,3,\ldots\}$. In the $n^{th}$ round of the game, the first $n^2$ cards are taken and shuffled, and you pick one of them. If your pick is 1, you win that round. Let $A_n$ denote the event that you win the $n^{th}$ round, so $P(A_n) = 1/n^2$; the complement $A_n^c$ of $A_n$ represents losing the $n^{th}$ round. The event $\limsup A_n$ consists of those scenarios in which you win infinitely many rounds; its complement is $\liminf A_n^c$, the scenarios in which you ultimately lose every round. Since $\sum_n P(A_n) = \sum_n 1/n^2 < \infty$, the Borel–Cantelli lemma gives $P(\limsup A_n) = 0$, or equivalently $P(\liminf A_n^c) = 1$. Thus, with probability one, a player of this game will experience that there comes a time after which he never wins.
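A quick Monte Carlo sketch (my own illustration; the function names are invented) makes the conclusion plausible: since $P(A_n) = 1/n^2$, the expected total number of wins is $\sum_n 1/n^2 = \pi^2/6 \approx 1.64$, and every simulated player's wins stop at some finite round.

```python
import random

random.seed(0)

# In round n a card is drawn uniformly from {1, ..., n^2}; the round is won
# when the drawn card is 1, so P(A_n) = 1/n^2.  Since sum_n 1/n^2 < infinity,
# Borel-Cantelli says each player wins only finitely many rounds.
def play(num_rounds):
    """Return the list of rounds won by one simulated player."""
    return [n for n in range(1, num_rounds + 1)
            if random.randrange(1, n * n + 1) == 1]

players = [play(10_000) for _ in range(1_000)]
total_wins = [len(w) for w in players]

# The average number of wins per player is near pi^2/6 ~ 1.64, and no
# player keeps winning: each win list is finite and typically very short.
print(sum(total_wins) / len(total_wins))
```

Of course, a finite simulation cannot literally verify an "infinitely often" statement; it only illustrates that wins thin out at the rate the lemma's hypothesis quantifies.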

As Johannes stated, the Borel-Cantelli lemmas (there are two) are the primary way in which these quantities (referred to as "infinitely often" and "almost always") appear.

The most common use is to prove things about limits of random variables. To see why this is the case, suppose you can show that, for any $\epsilon > 0$,
$$
P([|X_n| > \epsilon]\textrm{ i.o.}) = 0.
$$
(To show this with the first Borel-Cantelli lemma, you would establish $\sum_n P([|X_n| > \epsilon]) < \infty$.) From here, it follows that $P([\lim X_n = 0]) = 1$, because
$$
P([\lim X_n = 0]) = P\Big(\bigcap_i \bigcup_N \bigcap_{n\geq N} [|X_n| \leq 1/i]\Big) =: P(A)
$$
by definition of limit, and
$$
P(A^c)
= P\Big(\bigcup_i \bigcap_N \bigcup_{n\geq N} [|X_n| > 1/i]\Big)
\leq \sum_i P([|X_n| > 1/i]\textrm{ i.o.}) = 0,
$$
where the union bound (subadditivity) and the definition of "infinitely often" were employed.
Note that I worked out the symbols to make sure the math was correct, but you can see it in words: if, for every $\epsilon > 0$, the probability that infinitely many of your random variables exceed $\epsilon$ is zero, then it is intuitive that the limit of the sequence is 0 with probability 1.
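Here is a small simulation of my own (the toy sequence is an invented example, not one from the text) that mirrors this argument: take $X_n = 1$ with probability $1/n^2$ and $X_n = 0$ otherwise. For any $\epsilon \in (0,1)$, $\sum_n P(|X_n| > \epsilon) = \sum_n 1/n^2 < \infty$, so the first Borel–Cantelli lemma gives $P(|X_n| > \epsilon \textrm{ i.o.}) = 0$ and hence $X_n \to 0$ almost surely.

```python
import random

random.seed(1)

# X_n = 1 with probability 1/n^2, else 0.  The sum of these probabilities
# converges, so along almost every path only finitely many terms are nonzero.
def sample_path(length):
    return [1 if random.random() < 1.0 / n**2 else 0
            for n in range(1, length + 1)]

paths = [sample_path(5_000) for _ in range(200)]

# Each path has only a handful of nonzero entries (mean ~ pi^2/6), all
# concentrated at small n -- the path is eventually identically zero.
nonzero_counts = [sum(p) for p in paths]
print(max(nonzero_counts))
```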

To get a feel for more details (and the relationship to specific probabilistic quantities), try using this technique to prove limiting properties of various sequences of random variables (any probability textbook will have many such exercises; see for instance the excellent book by Resnick).

I'll also add that you can prove a weakened form of the SLLN (weakened meaning you need extra assumptions on which moments are finite) using Chebyshev's inequality and the limiting technique above. As you can guess, Chebyshev allows you to say something of the form $\sum_n P([|X_n| > \epsilon]) < \infty$, where $X_n$ is something fancier as needed for the SLLN (a normalized sum).
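To sketch one standard way this goes (under the common extra assumption of i.i.d. variables with mean zero and finite fourth moment, which is one typical choice, not necessarily the exact one intended above): write $S_n = X_1 + \cdots + X_n$. Expanding $E[S_n^4]$ and using independence and $E X_1 = 0$, the only surviving terms give
$$
E[S_n^4] = n\, E[X_1^4] + 3n(n-1)\big(E[X_1^2]\big)^2 \leq C n^2
$$
for some constant $C$. The fourth-moment version of Chebyshev's (Markov's) inequality then yields
$$
P\big(|S_n/n| > \epsilon\big) \leq \frac{E[S_n^4]}{n^4 \epsilon^4} \leq \frac{C}{n^2 \epsilon^4},
$$
which is summable in $n$. By Borel–Cantelli and the limiting technique above, $S_n/n \to 0$ with probability one.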

We compute $$\sum_n \mathbb P(E_n) = \sum_n e^{-\lambda n} < \infty,$$ so by the Borel–Cantelli lemma the event $E = \limsup E_n$ has probability zero. With probability one, there exists a (random) number $N$ such that for all $n \ge N$, $X_n \le n$.

Let's analyze this graphically. Make a plot with the horizontal axis representing time $n$, and the vertical axis $x = X_n$. Draw the line $x = n$. For small times $n$, these random points might jump above the line $x = n$. But the argument above shows that there is some (random) time $N$ after which the points $X_n$ all lie below the line $x = n$.
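The picture is easy to reproduce numerically (a sketch of my own, assuming, as the computation $\mathbb P(E_n) = e^{-\lambda n}$ suggests, that the $X_n$ are i.i.d. Exponential($\lambda$) and $E_n$ is the event $\{X_n > n\}$):

```python
import random

random.seed(2)

# Assumed setup: X_n ~ Exponential(lam) i.i.d., so P(X_n > n) = e^{-lam * n}.
lam = 1.0
samples = [random.expovariate(lam) for _ in range(1000)]

# Rounds n where the point (n, X_n) lies strictly above the line x = n.
exceedances = [n for n, x in enumerate(samples, start=1) if x > n]

# All exceedances happen at small n; past some random N, every point
# (n, X_n) sits below the line x = n, exactly as the argument predicts.
N = max(exceedances, default=0) + 1
print(N)
```

Since $\mathbb P(E_n) = e^{-\lambda n}$ decays geometrically, even the chance of a single exceedance beyond, say, $n = 10$ is already tiny, which is why the random time $N$ found by the simulation is so small.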