Chebyshev's inequality

In probability theory, Chebyshev's inequality (also spelled Tchebysheff's inequality, Russian: Нера́венство Чебышёва, also called the Bienaymé-Chebyshev inequality) guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean. Specifically, no more than 1/k² of the distribution's values can be more than k standard deviations away from the mean (or equivalently, at least 1 − 1/k² of the distribution's values are within k standard deviations of the mean). In statistics, the rule is often called Chebyshev's theorem, about the range of standard deviations around the mean. The inequality has great utility because it can be applied to any probability distribution in which the mean and variance are defined. For example, it can be used to prove the weak law of large numbers.

In practical usage, in contrast to the 68–95–99.7 rule, which applies to normal distributions, Chebyshev's inequality is weaker, stating that a minimum of just 75% of values must lie within two standard deviations of the mean and 89% within three standard deviations.[1][2]
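As a rough illustration of these figures, the following Python sketch evaluates the guaranteed fraction 1 − 1/k² and compares it with the exact fraction for a normal distribution. The helper names `chebyshev_min_fraction` and `normal_fraction` are hypothetical and chosen only for this example.

```python
from statistics import NormalDist


def chebyshev_min_fraction(k: float) -> float:
    """Lower bound on the fraction of values within k standard deviations."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the bound is trivial, so clamp it at zero.
    return max(0.0, 1.0 - 1.0 / k**2)


def normal_fraction(k: float) -> float:
    """Exact fraction of a normal distribution within k standard deviations."""
    std = NormalDist()
    return std.cdf(k) - std.cdf(-k)


for k in (1, 2, 3):
    print(f"k={k}: Chebyshev >= {chebyshev_min_fraction(k):.4f}, "
          f"normal = {normal_fraction(k):.4f}")
```

For k = 2 and k = 3 the Chebyshev column gives 0.75 and about 0.889, against roughly 0.954 and 0.997 for the normal distribution.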

The term Chebyshev's inequality may also refer to Markov's inequality, especially in the context of analysis.

The theorem is named after Russian mathematician Pafnuty Chebyshev, although it was first formulated by his friend and colleague Irénée-Jules Bienaymé.[3]:98 The theorem was first stated without proof by Bienaymé in 1853[4] and later proved by Chebyshev in 1867.[5] His student Andrey Markov provided another proof in his 1884 Ph.D. thesis.[6]

Only the case k > 1 is useful. When k ≤ 1 the right-hand side 1/k² ≥ 1 and the inequality is trivial, as all probabilities are ≤ 1.

Because it can be applied to completely arbitrary distributions provided they have a known finite mean and variance, the inequality generally gives a poor bound compared to what might be deduced if more aspects are known about the distribution involved.

Suppose we randomly select a journal article from a source with an average of 1000 words per article, with a standard deviation of 200 words. We can then infer that the probability that it has between 600 and 1400 words (i.e. within k = 2 standard deviations of the mean) must be at least 75%, because by Chebyshev's inequality there is no more than a 1/k² = 1/4 chance of being outside that range. But if we additionally know that the distribution is normal, we can say there is a 75% chance the word count is between 770 and 1230 (a tighter interval for the same probability).
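The arithmetic of this example can be reproduced with a short Python sketch. It uses only the mean and standard deviation quoted above; the variable names are illustrative, and the normal interval is computed under the stated assumption that the word count is exactly normal.

```python
from statistics import NormalDist

mean, sd = 1000.0, 200.0  # values taken from the example above
k = 2.0

# Distribution-free guarantee from Chebyshev's inequality.
cheb_low, cheb_high = mean - k * sd, mean + k * sd
cheb_prob = 1.0 - 1.0 / k**2

# Central 75% interval if the word count is assumed to be normal:
# 12.5% of the probability is left in each tail.
dist = NormalDist(mean, sd)
norm_low, norm_high = dist.inv_cdf(0.125), dist.inv_cdf(0.875)

print(f"Chebyshev: at least {cheb_prob:.0%} in [{cheb_low:.0f}, {cheb_high:.0f}]")
print(f"Normal:    75% in [{norm_low:.0f}, {norm_high:.0f}]")
```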

As shown in the example above, the theorem typically provides rather loose bounds. However, these bounds cannot in general (remaining true for arbitrary distributions) be improved upon. The bounds are sharp for the following example: for any k ≥ 1,
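A standard distribution that attains the bound, used here as an illustrative assumption, takes the values −1, 0 and 1 with probabilities 1/(2k²), 1 − 1/k² and 1/(2k²). The Python sketch below checks with exact rational arithmetic that this distribution has mean 0, variance 1/k², and tail probability exactly 1/k². The function names are hypothetical.

```python
from fractions import Fraction


def three_point(k: Fraction) -> dict:
    """Values -1, 0, 1 with probabilities 1/(2k^2), 1 - 1/k^2, 1/(2k^2); needs k >= 1."""
    tail = 1 / (2 * k**2)
    return {-1: tail, 0: 1 - 2 * tail, 1: tail}


def check_sharp(k: Fraction) -> bool:
    """True if Pr(|X - mean| >= k * sigma) equals 1/k^2 exactly."""
    pmf = three_point(k)
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    # Compare squared deviations with (k * sigma)^2 to avoid square roots.
    tail_prob = sum(p for x, p in pmf.items() if (x - mean) ** 2 >= k**2 * var)
    return mean == 0 and var == 1 / k**2 and tail_prob == 1 / k**2


print(all(check_sharp(Fraction(n, 2)) for n in range(2, 11)))
```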

Markov's inequality states that for any real-valued random variable Y and any positive number a, we have Pr(|Y| > a) ≤ E(|Y|)/a. One way to prove Chebyshev's inequality is to apply Markov's inequality to the random variable Y = (X − μ)² with a = (kσ)².
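Written out, with μ and σ denoting the mean and standard deviation of X, and assuming σ > 0 and k > 0, the substitution can be sketched as follows. The chain uses the form of Markov's inequality with ≥ in place of >, which also holds.

```latex
\Pr\bigl(|X-\mu| \ge k\sigma\bigr)
  = \Pr\bigl((X-\mu)^2 \ge k^2\sigma^2\bigr)
  \le \frac{\operatorname{E}\bigl[(X-\mu)^2\bigr]}{k^2\sigma^2}
  = \frac{\sigma^2}{k^2\sigma^2}
  = \frac{1}{k^2}.
```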

It can also be proved directly. For any event A, let I_A be the indicator random variable of A, i.e. I_A equals 1 if A occurs and 0 otherwise. Then
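In one common presentation, sketched here on the assumption that σ > 0 and k > 0, the argument bounds the indicator by the squared standardized deviation:

```latex
\Pr\bigl(|X-\mu| \ge k\sigma\bigr)
  = \operatorname{E}\Bigl(I_{|X-\mu| \ge k\sigma}\Bigr)
  = \operatorname{E}\Bigl(I_{\left(\frac{X-\mu}{k\sigma}\right)^{2} \ge 1}\Bigr)
  \le \operatorname{E}\Bigl(\Bigl(\frac{X-\mu}{k\sigma}\Bigr)^{2}\Bigr)
  = \frac{1}{k^{2}}\,\frac{\operatorname{E}\bigl((X-\mu)^{2}\bigr)}{\sigma^{2}}
  = \frac{1}{k^{2}}.
```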

The direct proof shows why the bounds are quite loose in typical cases: the number 1 to the right of "≥" is replaced by [(X − μ)/(kσ)]² to the left of "≥" whenever the latter exceeds 1. In some cases it exceeds 1 by a very wide margin.

Fix t and let A_t be defined as A_t = {x ∈ X ∣ f(x) ≥ t}, and let 1_{A_t} be the indicator function of the set A_t. Then, it is easy to check that, for any x,

Navarro[19] proved that these bounds are sharp, that is, they are the best possible bounds for those regions when only the mean and the covariance matrix of X are known.

Stellato et al.[20] showed that this multivariate version of the Chebyshev inequality can be easily derived analytically as a special case of Vandenberghe et al.[21] where the bound is computed by solving a semidefinite program (SDP).

There is a straightforward extension of the vector version of Chebyshev's inequality to infinite-dimensional settings. Let X be a random variable which takes values in a Fréchet space 𝒳 (equipped with seminorms || ⋅ ||α). This includes most common settings of vector-valued random variables, e.g., when 𝒳 is a Banach space (equipped with a single norm), a Hilbert space, or the finite-dimensional setting as described above.

for every seminorm || ⋅ ||α. This is a generalization of the requirement that X have finite variance, and is necessary for this strong form of Chebyshev's inequality in infinite dimensions. The terminology "strong order two" is due to Vakhania.[22]

Let μ ∈ 𝒳 be the Pettis integral of X (i.e., the vector generalization of the mean), and let

Saw et al. extended Chebyshev's inequality to cases where the population mean and variance are not known and may not exist, but the sample mean and sample standard deviation from N samples are to be employed to bound the expected value of a new drawing from the same distribution.[27]

g_Q(x) = R if R is even; R if R is odd and x < a²; R − 1 if R is odd and x ≥ a².

This inequality holds even when the population moments do not exist, and when the sample is only weakly exchangeably distributed; this criterion is met for randomised sampling. A table of values for the Saw–Yang–Mo inequality for finite sample sizes (N < 100) has been determined by Konijn.[28] The table allows the calculation of various confidence intervals for the mean, based on multiples, C, of the standard error of the mean as calculated from the sample. For example, Konijn shows that for N = 59, the 95 percent confidence interval for the mean m is (m − Cs, m + Cs) where C = 4.447 × 1.006 = 4.47 (this is 2.28 times larger than the value found on the assumption of normality, showing the loss of precision resulting from ignorance of the precise nature of the distribution).

The bounds these inequalities give on a finite sample are less tight than those the Chebyshev inequality gives for a distribution. To illustrate this let the sample size N = 100 and let k = 3. Chebyshev's inequality states that at most approximately 11.11% of the distribution will lie at least three standard deviations away from the mean. Kabán's version of the inequality for a finite sample states that at most approximately 12.05% of the sample lies outside these limits. The dependence of the confidence intervals on sample size is further illustrated below.

For N = 10, the 95% confidence interval is approximately ±13.5789 standard deviations.

For N = 100 the 95% confidence interval is approximately ±4.9595 standard deviations; the 99% confidence interval is approximately ±140.0 standard deviations.

For N = 500 the 95% confidence interval is approximately ±4.5574 standard deviations; the 99% confidence interval is approximately ±11.1620 standard deviations.

For N = 1000 the 95% and 99% confidence intervals are approximately ±4.5141 and approximately ±10.5330 standard deviations respectively.

The Chebyshev inequality for the distribution gives 95% and 99% confidence intervals of approximately ±4.472 standard deviations and ±10 standard deviations respectively.
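These two figures follow from solving 1/k² = 1 − c for a confidence level c. A minimal Python sketch, with the hypothetical helper name `chebyshev_k`:

```python
import math


def chebyshev_k(confidence: float) -> float:
    """Smallest k with 1 - 1/k^2 >= confidence, for 0 < confidence < 1."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 / math.sqrt(1.0 - confidence)


for c in (0.95, 0.99):
    print(f"{c:.0%} interval: +/- {chebyshev_k(c):.3f} standard deviations")
```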

Although Chebyshev's inequality is the best possible bound for an arbitrary distribution, this is not necessarily true for finite samples. Samuelson's inequality states that all values of a sample will lie within √(N − 1) standard deviations of the mean. Chebyshev's bound improves as the sample size increases.

When N = 10, Samuelson's inequality states that all members of the sample lie within 3 standard deviations of the mean: in contrast Chebyshev's states that 99.5% of the sample lies within 13.5789 standard deviations of the mean.

When N = 100, Samuelson's inequality states that all members of the sample lie within approximately 9.9499 standard deviations of the mean: Chebyshev's states that 99% of the sample lies within 10 standard deviations of the mean.

When N = 500, Samuelson's inequality states that all members of the sample lie within approximately 22.3383 standard deviations of the mean: Chebyshev's states that 99% of the sample lies within 10 standard deviations of the mean.
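The Samuelson figures quoted above are values of √(N − 1). The Python sketch below reproduces them and also checks the bound on one extreme sample; it assumes the standard deviation is computed with divisor N (`statistics.pstdev`), and the helper names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def samuelson_radius(n: int) -> float:
    """Maximum distance from the mean, in standard deviations, for a sample of size n."""
    if n < 2:
        raise ValueError("need at least two observations")
    return math.sqrt(n - 1)


def max_deviation_in_sd(sample: list) -> float:
    """Largest |x - mean| in the sample, in units of the divisor-N standard deviation."""
    m, s = fmean(sample), pstdev(sample)
    return max(abs(x - m) for x in sample) / s


for n in (10, 100, 500):
    print(f"N={n}: within {samuelson_radius(n):.4f} standard deviations")

# A sample with a single outlier reaches the bound (up to rounding).
extreme = [0.0] * 9 + [1.0]
print(max_deviation_in_sd(extreme), samuelson_radius(len(extreme)))
```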

In the univariate case, i.e. n_ξ = 1, this inequality corresponds to the one from Saw et al.[27] Moreover, the right-hand side can be simplified by upper bounding the floor function by its argument

As N → ∞, the right-hand side tends to min{1, n_ξ/λ²}, which corresponds to the multivariate Chebyshev inequality over ellipsoids shaped according to Σ and centered at μ.
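The limiting expression can be explored numerically. The Python sketch below is only an illustration under strong assumptions: it draws standard normal vectors with zero mean and identity covariance, so that the squared ellipsoidal distance reduces to a plain sum of squares, and compares the empirical tail fraction with min{1, n_ξ/λ²}. The function names and the choice of distribution are not from the source.

```python
import random


def multivariate_bound(n_dim: int, lam: float) -> float:
    """Limiting Chebyshev bound min{1, n_dim / lam^2}."""
    return min(1.0, n_dim / lam**2)


def empirical_tail(n_dim: int, lam: float, draws: int = 20000, seed: int = 0) -> float:
    """Fraction of simulated vectors whose squared distance is at least lam^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Zero mean and identity covariance: the squared Mahalanobis
        # distance is just the sum of squared coordinates.
        dist_sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_dim))
        if dist_sq >= lam**2:
            hits += 1
    return hits / draws


n_dim, lam = 3, 3.0
print(empirical_tail(n_dim, lam), "<=", multivariate_bound(n_dim, lam))
```

For this well-behaved distribution the empirical fraction falls far below the bound, which holds for any distribution with the same mean and covariance.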

Chebyshev's inequality is important because of its applicability to any distribution. As a result of its generality it may not (and usually does not) provide as sharp a bound as alternative methods that can be used if the distribution of the random variable is known. To improve the sharpness of the bounds provided by Chebyshev's inequality a number of methods have been developed; for a review see e.g.[31]

The one-sided variant can be used to prove the proposition that for probability distributions having an expected value and a median, the mean and the median can never differ from each other by more than one standard deviation. To express this in symbols let μ, ν, and σ be respectively the mean, the median, and the standard deviation. Then

|μ − ν| ≤ σ.

There is no need to assume that the variance is finite because this inequality is trivially true if the variance is infinite.
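A few closed-form cases make the statement concrete. The Python sketch below checks |μ − ν| ≤ σ for three familiar distributions, using their standard textbook mean, median and standard deviation; the list of cases and the helper name are chosen only for illustration.

```python
import math

# (name, mean, median, standard deviation), from standard closed-form results.
cases = [
    ("exponential, rate 1", 1.0, math.log(2.0), 1.0),
    ("lognormal, mu=0, sigma=1", math.exp(0.5), 1.0,
     math.sqrt((math.e - 1.0) * math.e)),
    ("uniform on [0, 1]", 0.5, 0.5, math.sqrt(1.0 / 12.0)),
]


def within_one_sd(mean: float, median: float, sd: float) -> bool:
    """True if the mean and median differ by at most one standard deviation."""
    return abs(mean - median) <= sd


for name, mean, median, sd in cases:
    print(f"{name}: |mean - median| = {abs(mean - median):.4f}, sd = {sd:.4f}, "
          f"ok = {within_one_sd(mean, median, sd)}")
```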

The proof is as follows. Setting k = 1 in the statement for the one-sided inequality gives:

A distribution function F is unimodal at ν if its cumulative distribution function is convex on (−∞, ν) and concave on (ν, ∞).[45] An empirical distribution can be tested for unimodality with the dip test.[46]

The bounds on this inequality can also be sharpened if the distribution is both unimodal and symmetrical.[50] An empirical distribution can be tested for symmetry with a number of tests including McWilliam's R*.[51] It is known that the variance of a unimodal symmetrical distribution with finite support [a, b] is less than or equal to (b − a)²/12.[52]

Let the distribution be supported on the finite interval [ −N, N ] and the variance be finite. Let the mode of the distribution be zero and rescale the variance to 1. Let k > 0 and assume k < 2N/3. Then[50]

Symmetry of the distribution decreases the inequality's bounds by a factor of 2 while unimodality sharpens the bounds by a factor of 4/9.

Because the mean and the mode in a unimodal distribution differ by at most √3 standard deviations,[55] at most 5% of a symmetrical unimodal distribution lies outside (2√10 + 3√3)/3 standard deviations of the mean (approximately 3.840 standard deviations). This is sharper than the bounds provided by the Chebyshev inequality (approximately 4.472 standard deviations).

These bounds on the mean are less sharp than those that can be derived from symmetry of the distribution alone, which shows that at most 5% of the distribution lies outside approximately 3.162 standard deviations of the mean. The Vysochanskiï–Petunin inequality further sharpens this bound by showing that for such a distribution at most 5% of the distribution lies outside 4√5/3 (approximately 2.981) standard deviations of the mean.
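The numerical constants quoted in this comparison can be recomputed directly. The Python sketch below only evaluates the closed-form expressions given in the text, plus √20 and √10 for the Chebyshev and symmetry-only bounds at the 5% level; the dictionary labels are informal.

```python
import math

# Number of standard deviations outside which at most 5% can lie.
bounds_5pct = {
    "Chebyshev, 1/k^2 = 0.05": math.sqrt(20.0),
    "symmetry alone, 1/(2k^2) = 0.05": math.sqrt(10.0),
    "symmetric unimodal, via mode": (2 * math.sqrt(10.0) + 3 * math.sqrt(3.0)) / 3,
    "Vysochanskii-Petunin": 4 * math.sqrt(5.0) / 3,
}

for label, k in bounds_5pct.items():
    print(f"{label}: {k:.3f}")
```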

Symmetrical unimodal distributions

For any symmetrical unimodal distribution

at most approximately 5.784% of the distribution lies outside 1.96 standard deviations of the mode

at most 5% of the distribution lies outside 2√10/3 (approximately 2.11) standard deviations of the mode
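Both figures are consistent with a tail bound of 2/(9k²) about the mode, that is, the Chebyshev bound 1/k² scaled by 4/9 for unimodality and by 1/2 for symmetry. This closed form is inferred here from those two factors rather than quoted from a reference, so the Python sketch below should be read as an assumption-laden check of the arithmetic, with hypothetical function names.

```python
import math


def sym_unimodal_tail(k: float) -> float:
    """Assumed tail bound 2 / (9 k^2) for a symmetrical unimodal distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 2.0 / (9.0 * k**2)


def sym_unimodal_k(tail: float) -> float:
    """Value of k at which the assumed bound equals the given tail probability."""
    return math.sqrt(2.0 / (9.0 * tail))


print(f"{sym_unimodal_tail(1.96):.5f}")  # tail bound at k = 1.96
print(f"{sym_unimodal_k(0.05):.4f}")     # k for a 5% tail
```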

Normal distributions

DasGupta's inequality states that for a normal distribution at least 95% lies within approximately 2.582 standard deviations of the mean. This is less sharp than the true figure (approximately 1.96 standard deviations of the mean).
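The comparison can be checked numerically. The sketch below assumes DasGupta's bound for the normal distribution has the form Pr(|X − μ| ≥ kσ) ≤ 1/(3k²), which reproduces the 2.582 figure, and takes the exact value from the standard normal quantile function. The helper name is hypothetical.

```python
import math
from statistics import NormalDist


def dasgupta_k(tail: float) -> float:
    """k at which the assumed bound 1 / (3 k^2) equals the given tail probability."""
    return math.sqrt(1.0 / (3.0 * tail))


# Exact two-sided 95% point of the standard normal distribution.
exact_k = NormalDist().inv_cdf(0.975)

print(f"bound: {dasgupta_k(0.05):.3f}, exact: {exact_k:.3f}")
```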

Grechuk et al. developed a general method for deriving the best possible bounds in Chebyshev's inequality for any family of distributions, and any deviation risk measure in place of standard deviation. In particular, they derived a Chebyshev inequality for distributions with log-concave densities.[56]

One use of Chebyshev's inequality in applications is to create confidence intervals for variates with an unknown distribution. Haldane noted,[63] using an equation derived by Kendall,[64] that if a variate (x) has a zero mean, unit variance and both finite skewness (γ) and kurtosis (κ) then the variate can be converted to a normally distributed standard score (z):

The Environmental Protection Agency has suggested best practices for the use of Chebyshev's inequality for estimating confidence intervals.[65] This caution appears to be justified as its use in this context may be seriously misleading.[66]