We first consider the case where k = 1, i.e. the case of real-valued random variables. Extensions to the vector case are discussed later. We start by defining convergence in probability.

Definition 1. (Convergence in probability) The sequence of random variables Zn converges in probability (or stochastically) to the random variable Z if for every ε > 0

lim_{n→∞} P(|Zn − Z| < ε) = 1.    (10.1)

We then write plim Zn = Z, or Zn →_p Z, or Zn → Z i.p. as n → ∞.
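As a concrete illustration of Definition 1, the following Python sketch estimates P(|Zn − Z| < ε) by Monte Carlo for Zn the mean of n independent Uniform(0,1) draws and Z = 1/2; the function name and parameter values are illustrative, not from the text.

```python
import random

def prob_within(n, eps=0.1, reps=2000, seed=0):
    """Monte Carlo estimate of P(|Zn - 1/2| < eps), where Zn is the
    mean of n independent Uniform(0,1) draws and the limit Z is 1/2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        zn = sum(rng.random() for _ in range(n)) / n
        if abs(zn - 0.5) < eps:
            hits += 1
    return hits / reps

# The estimated probability climbs toward 1 as n grows:
print([prob_within(n) for n in (5, 50, 500)])
```

For fixed ε the probability in (10.1) is increasing toward 1 along this sequence, which is exactly what the definition requires.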

We next define almost sure convergence.

Definition 2. (Almost sure convergence) The sequence of random variables Zn converges almost surely (or strongly, or with probability one) to the random variable Z if there exists a set N ∈ F with P(N) = 0 such that lim_{n→∞} Zn(ω) = Z(ω) for every ω ∈ Ω − N, or equivalently

P({ω ∈ Ω : lim_{n→∞} Zn(ω) = Z(ω)}) = 1.    (10.2)

We then write Zn →_a.s. Z, or Zn → Z a.s., or Zn → Z w.p.1 as n → ∞.
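A hypothetical numerical illustration of Definition 2: along a single sample path ω (here one stream of Uniform(0,1) draws), the running means Zn(ω) settle at 1/2, so the supremum of |Zi(ω) − 1/2| over the tail i ≥ n shrinks as n grows. Function names and the seed are illustrative.

```python
import random

def running_means(seed, n):
    """One sample path omega: Zi(omega) = mean of the first i draws."""
    rng = random.Random(seed)
    total, path = 0.0, []
    for i in range(1, n + 1):
        total += rng.random()
        path.append(total / i)
    return path

# Tail suprema sup_{i >= m} |Zi(omega) - 1/2| along one fixed path:
path = running_means(seed=42, n=5000)
tail_sup = [max(abs(z - 0.5) for z in path[m:]) for m in (10, 100, 1000)]
print(tail_sup)  # nonincreasing, shrinking toward 0
```

Pointwise convergence of the path is what must hold for P-almost every ω, in contrast to Definition 1, which only constrains each Zn marginally.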

The following theorem provides an alternative characterization of almost sure convergence.

Theorem 1. The sequence of random variables Zn converges almost surely to the random variable Z if and only if

lim_{n→∞} P({|Zi − Z| < ε for all i ≥ n}) = 1    (10.3)

for every ε > 0.

Proof. Define A = {ω ∈ Ω : lim_{n→∞} Zn(ω) = Z(ω)} and

A_n^ε = {ω ∈ Ω : |Zi(ω) − Z(ω)| < ε for all i ≥ n},

then (10.2) and (10.3) can be written equivalently as P(A) = 1 and lim_{n→∞} P(A_n^ε) = 1. Next define A^ε = ∪_{n=1}^∞ A_n^ε and observe that A_n^ε ↑ A^ε. By construction A^ε is the set of all ω ∈ Ω for which there exists some finite index n_ε(ω) such that |Zi(ω) − Z(ω)| < ε for all i ≥ n_ε(ω). Consequently A ⊆ A^ε; in fact A = ∩_{ε>0} A^ε. Now suppose (10.2) holds, i.e. P(A) = 1. Then, using the continuity theorem for probability measures given, e.g., in Billingsley (1979, p. 21), we have P(A^ε) = lim_{n→∞} P(A_n^ε) ≥ P(A) = 1, i.e. (10.3) holds. Conversely, suppose (10.3) holds; then P(A^ε) = 1. Observe that A^ε ↓ A as ε ↓ 0. Choosing ε = 1/k we have A = ∩_{k=1}^∞ A^{1/k} and, using again the continuity theorem for probability measures, P(A) = lim_{k→∞} P(A^{1/k}) = 1. ■

The above theorem makes it evident that almost sure convergence implies convergence in probability.

Example 2. Let Ω, F, and P be as in Example 1 and define Zn(ω) = 1 for ω ∈ [mn 2^{−kn}, (mn + 1) 2^{−kn}) and Zn(ω) = 0 otherwise, where the integers mn and kn satisfy n = mn + 2^{kn} and 0 ≤ mn < 2^{kn}. That is, kn is the largest integer satisfying 2^{kn} ≤ n. Let Z = 0, let A_n^ε be defined as above, and let B_n^ε = {ω ∈ Ω : |Zn(ω) − Z(ω)| < ε}. Then for ε < 1 we have B_n^ε = Ω − [mn 2^{−kn}, (mn + 1) 2^{−kn}) and hence P(B_n^ε) = 1 − 2^{−kn} → 1 as n → ∞. This establishes that Zn converges to zero in probability. Observe further that A_n^ε = ∩_{i≥n} B_i^ε = ∅. Consequently, Zn does not converge to zero almost surely. In fact, in this example Zn(ω) does not converge to 0 for any ω ∈ Ω, although Zn →_p 0.
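The sliding-interval construction can be checked numerically. The sketch below takes Zn to be the indicator of [mn 2^{−kn}, (mn + 1) 2^{−kn}), which is consistent with the computation of P(Bn) in the example; function names are illustrative.

```python
import math

def z(n, omega):
    """Indicator of [mn * 2**-kn, (mn + 1) * 2**-kn), where kn is the
    largest integer with 2**kn <= n and mn = n - 2**kn."""
    kn = int(math.log2(n))
    mn = n - 2 ** kn
    return 1 if mn * 2.0 ** -kn <= omega < (mn + 1) * 2.0 ** -kn else 0

# P(Zn = 1) = 2**-kn -> 0, so Zn -> 0 in probability ...
print([2.0 ** -int(math.log2(n)) for n in (10, 100, 1000)])
# ... yet for any fixed omega the path hits 1 once in every dyadic block
# [2**k, 2**(k+1)), so Zn(omega) does not converge to 0:
print([n for n in range(1, 200) if z(n, 0.3) == 1])  # [1, 2, 5, 10, 20, 41, 83, 166]
```

Because the intervals sweep across [0, 1) again and again, every fixed ω is hit infinitely often even though each individual hit has vanishing probability.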

Definition 3. (Convergence in rth mean) The sequence of random variables Zn converges in rth mean (r > 0) to the random variable Z if

lim_{n→∞} E|Zn − Z|^r = 0.

We then write Zn → Z in rth mean. For r = 2 we say the sequence converges in quadratic mean or mean square.

Remark 1. For all three modes of convergence introduced above one can show that the limiting random variable Z is unique up to null sets. That is, suppose Z and Z* are both limits of the sequence Zn; then P(Z = Z*) = 1.

Lyapounov’s inequality implies that E|Zn − Z|^s ≤ {E|Zn − Z|^r}^{s/r} for 0 < s ≤ r. As a consequence we have the following theorem, which tells us that the higher the value of r, the more stringent the condition for convergence in rth mean.

Theorem 3. Zn → Z in rth mean implies Zn → Z in sth mean for 0 < s ≤ r.
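Lyapounov’s inequality can be verified exactly for a finite discrete distribution; the sketch below (illustrative names and an assumed toy distribution) computes both sides of E|X|^s ≤ {E|X|^r}^{s/r}.

```python
def abs_moment(values, probs, p):
    """E|X|**p for a finite discrete random variable."""
    return sum(q * abs(v) ** p for v, q in zip(values, probs))

# Toy distribution: P(X = -2) = 0.2, P(X = 0.5) = 0.5, P(X = 3) = 0.3.
vals, probs = [-2.0, 0.5, 3.0], [0.2, 0.5, 0.3]
s, r = 1.0, 3.0
lhs = abs_moment(vals, probs, s)             # E|X|^s
rhs = abs_moment(vals, probs, r) ** (s / r)  # {E|X|^r}^(s/r)
print(lhs, rhs, lhs <= rhs)
```

Applying this with X = Zn − Z is exactly the step that yields Theorem 3: if the right side tends to 0, so must the left.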

The following theorem gives conditions under which convergence in rth mean implies convergence of the rth moments.

Theorem 4. Suppose Zn → Z in rth mean and E|Z|^r < ∞. Then E|Zn|^r → E|Z|^r. If, furthermore, Zn^r and Z^r are well-defined for all n (e.g. if Zn ≥ 0 and Z ≥ 0, or if r is a natural number), then also EZn^r → EZ^r.

By Chebyshev’s inequality we have P(|Zn − Z| ≥ ε) ≤ E|Zn − Z|^r/ε^r for r > 0. As a consequence, convergence in rth mean implies convergence in probability, as stated in the following theorem.

Theorem 5. Zn → Z in rth mean (r > 0) implies Zn →_p Z.

Theorem 5 and Corollary 1 show how convergence in probability can be implied from the convergence of appropriate moments. The converse is not true in general; in particular, Zn →_p Z does not imply Zn → Z in rth mean. In fact, even Zn →_a.s. Z does not imply Zn → Z in rth mean. These claims are illustrated by the following example.
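The Chebyshev-type bound above is easy to check by simulation; in this illustrative sketch, D ~ Uniform(−1, 1) stands in for Zn − Z, where the exact values are P(|D| ≥ 0.5) = 0.5 and E D²/0.5² = 4/3.

```python
import random

def tail_and_bound(eps=0.5, r=2, reps=100_000, seed=1):
    """Compare P(|D| >= eps) with E|D|**r / eps**r for D ~ Uniform(-1, 1)."""
    rng = random.Random(seed)
    draws = [rng.uniform(-1, 1) for _ in range(reps)]
    p_tail = sum(abs(d) >= eps for d in draws) / reps
    bound = sum(abs(d) ** r for d in draws) / reps / eps ** r
    return p_tail, bound

p_tail, bound = tail_and_bound()
print(p_tail, bound)  # roughly 0.5 vs 4/3; the bound holds
```

Since E|Zn − Z|^r → 0 forces the tail probability to 0 for each fixed ε, this one-line bound is the whole content of Theorem 5.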

Example 3. Let Ω, F, and P be as in Example 1 and define

Zn(ω) = 0 for ω ∈ [0, 1 − 1/n),
Zn(ω) = n for ω ∈ [1 − 1/n, 1).

Then Zn(ω) → 0 for all ω ∈ Ω and hence Zn →_a.s. 0. However, E|Zn| = 1 for all n, and hence Zn does not converge to 0 in rth mean with r = 1.
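Example 3 can be checked directly: pointwise, Zn(ω) is eventually 0 for every fixed ω < 1, while E|Zn| = n · P([1 − 1/n, 1)) = n · (1/n) = 1 for every n. A small illustrative sketch:

```python
def z3(n, omega):
    """Zn of Example 3: 0 on [0, 1 - 1/n), n on [1 - 1/n, 1)."""
    return n if omega >= 1 - 1 / n else 0

# For fixed omega = 0.9 the path is eventually 0 (a.s. convergence) ...
print([z3(n, 0.9) for n in (2, 5, 10, 20, 50)])  # [2, 5, 10, 0, 0]
# ... but E|Zn| = n * (1/n) = 1 for every n (no convergence in 1st mean):
print([n * (1 / n) for n in (2, 5, 10, 20, 50)])
```

The ever-taller spike on the ever-smaller interval is what keeps the first moment pinned at 1 even though every path dies out.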

The above example shows in particular that an estimator θ̂n that satisfies θ̂n →_p θ (or θ̂n →_a.s. θ) need not satisfy Eθ̂n → θ, i.e. need not be asymptotically unbiased. Additional conditions are needed for such a conclusion to hold. Such conditions are given in the following theorem, which states that convergence in probability implies convergence in rth mean, given that the convergence is dominated.

Theorem 6. Suppose Zn →_p Z and |Zn| ≤ Y a.s. for all n, where Y is some random variable with EY^r < ∞ (r > 0). Then Zn → Z in rth mean.

Under the assumptions of the above theorem, convergence of the rth moments also follows in view of Theorem 4. We also note that the existence of a random variable Y satisfying the requirements in Theorem 6 is certainly guaranteed if there exists a real number M such that |Zn| ≤ M a.s. for all n (choose Y = M).
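For contrast with Example 3, here is a dominated sequence (an illustrative construction, not from the text): Zn the indicator of [0, 1/n) under Lebesgue measure on [0, 1). It is bounded by Y = 1, converges to 0 in probability, and its rth moments E|Zn|^r = 1/n indeed vanish, as Theorem 6 predicts.

```python
def moment_r(n, r):
    """E|Zn|**r for Zn = indicator of [0, 1/n): 1**r * P([0, 1/n)) = 1/n."""
    return 1.0 ** r * (1.0 / n)

# |Zn| <= 1 for all n, and the rth moments shrink to 0 for every r > 0:
print([moment_r(n, r=2) for n in (1, 10, 100, 1000)])
```

The domination by a fixed integrable bound is precisely what rules out the escaping-mass behavior of Example 3.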

Now let Zn be a sequence of random vectors taking their values in R^k. Convergence in probability, almost surely, and in rth mean are then defined exactly as in the case k = 1, with the only difference that the absolute value |·| has to be replaced by ||·||, the Euclidean norm on R^k. Upon making this replacement, all of the results presented in this subsection generalize to the vector case with two obvious exceptions: first, in Corollary 1 the condition var(Zn) → 0 has to be replaced by the condition that the variances of the components of Zn converge to zero, or equivalently, that the variance-covariance matrix of Zn converges to zero. Second, the last claim in Theorem 4 continues to hold if the symbol Zn^r is interpreted as the vector of the rth powers of the components of Zn. Instead of extending the convergence notions to the vector case by replacing the absolute value |·| by the norm ||·||, we could have defined convergence in probability, almost surely, and in rth mean for sequences of random vectors by requiring that each component of Zn satisfies Definition 1, 2, or 3, respectively. That this leads to an equivalent definition is shown in the following theorem.

Theorem 7. Let Zn and Z be random vectors taking their values in R^k, and let Zn^(i) and Z^(i) denote their ith components, respectively. Then Zn →_p Z if and only if Zn^(i) →_p Z^(i) for i = 1, …, k. An analogous statement holds for almost sure convergence and for convergence in rth mean.

The theorem follows immediately from the following simple inequality:

|Zn^(i) − Z^(i)| ≤ ||Zn − Z|| ≤ √k max_{i=1,…,k} |Zn^(i) − Z^(i)|.
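The componentwise inequality behind Theorem 7 can be checked numerically; the helper below is illustrative.

```python
import math

def bounds_hold(x, y):
    """Check max_i |x_i - y_i| <= ||x - y|| <= sqrt(k) * max_i |x_i - y_i|."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    norm = math.sqrt(sum(d * d for d in diffs))
    return max(diffs) <= norm <= math.sqrt(len(x)) * max(diffs)

print(bounds_hold([1.0, 2.0, 3.0], [0.5, 2.5, 1.0]))  # True
```

The left inequality shows that convergence in norm forces each component to converge; the right one gives the converse, since the norm is squeezed to 0 whenever every component difference is.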

For sequences of random k × l matrices Wn, convergence in probability, almost surely, and in rth mean is defined as the corresponding convergence of the sequence vec(Wn).

We finally note the following simple fact: suppose Z1, Z2, …, and Z are nonrandom vectors; then Zn →_p Z, Zn →_a.s. Z, and Zn → Z in rth mean each hold if and only if Zn → Z as n → ∞. That is, in this case all of the concepts of convergence of random vectors introduced above coincide with the usual convergence concept for sequences of vectors in R^k.