The probability P(X = x) is the probability of the event that X = x. Similarly, P(a < X < b) is the probability of the event that X lies between a and b.

These probabilities may be estimated, empirical, or abstract (see Chapter 7 in Finite Mathematics or the Probability Summary for a discussion of these three kinds of probability).

For a finite random variable, the collection of numbers P(X = x) as x varies is called the probability distribution of X, and it is useful to graph the probability distribution as a histogram.

Press here for an on-line utility that will generate any probability distribution and also show you the histogram.

Example

Estimated Probability Distribution

Let X be the number of heads showing after each toss of three coins (see above). The following simulation shows the estimated probability distribution (relative frequency distribution) of X.

[Interactive simulation: tosses of Coin 1, Coin 2, and Coin 3, with a table listing, for each Value of X (0, 1, 2, 3), the count N and the Estimated Prob. Distribution]
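The simulation described above can be imitated with a short script. This is only a sketch: it assumes the three coins are fair, and the function name `estimate_distribution`, the seed, and the number of trials are illustrative choices rather than part of the original utility.

```python
import random
from collections import Counter


def estimate_distribution(num_trials, seed=0):
    """Estimate P(X = x), where X = number of heads in a toss of three coins."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    counts = Counter()
    for _ in range(num_trials):
        # Assumption: each coin is fair, so heads has probability 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(3))
        counts[heads] += 1
    # Relative frequency of each value of X.
    return {x: counts[x] / num_trials for x in range(4)}


estimated = estimate_distribution(10_000)
for x, freq in estimated.items():
    print(x, freq)
```

With more trials, the relative frequencies should settle closer to the probabilities obtained by counting outcomes.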

Empirical Probability Distribution

For the experiment above, the empirical probability distribution is given by the following histogram.

The empirical probability distribution is given by counting the number of combinations that give 0, 1, 2, or 3 heads.
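The counting argument can be checked by listing all the combinations explicitly. The sketch below assumes fair coins, so that the eight head/tail combinations are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 combinations of heads (H) and tails (T) for three coins.
outcomes = list(product("HT", repeat=3))

# Count how many combinations give 0, 1, 2, or 3 heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Assumption: the coins are fair, so every combination has probability 1/8.
distribution = {x: Fraction(counts[x], len(outcomes)) for x in range(4)}

for x, p in distribution.items():
    print(x, p)
```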

A Bernoulli trial is an experiment with two possible outcomes, called success and failure. Each outcome has a specified probability: p for success and q for failure (so that p+q = 1).

If we perform a sequence of n independent Bernoulli trials, then some of them result in success and the rest of them in failure. The probability of exactly x successes in such a sequence is given by

$P(\text{exactly } x \text{ successes in } n \text{ trials}) = C(n,x)\,p^x q^{n-x}$. Note: $q = 1-p$.

For an on-line utility which allows you to compute and graph the probability distribution for Bernoulli trials, press here.

If X is the number of successes in a sequence of n independent Bernoulli trials, with probability p for success and q for failure, then X is said to have a binomial distribution. This distribution is given by the above formula

$P(X = x) = C(n,x)\,p^x q^{n-x}$

for x running from 0 to n.

Examples
Suppose we toss an unfair coin, with p = P(heads) = 0.8 and q = P(tails) = 0.2, three times. Take X = number of heads.
Then the distribution is given by

| x | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Formula | $(0.2)^3$ | $3(0.8)^1(0.2)^2$ | $3(0.8)^2(0.2)^1$ | $(0.8)^3$ |
| Probability | 0.008 | 0.096 | 0.384 | 0.512 |

The probability distribution shown in the histogram above results from tossing a fair coin three times, and is also a binomial distribution (with n = 3 and p = 0.5).
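As a rough check on the binomial formula, the following sketch evaluates C(n,x) p^x q^(n-x) for the unfair coin with p = 0.8 and n = 3. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(n, p, x):
    """Probability of exactly x successes in n independent Bernoulli trials."""
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q ** (n - x)


# Unfair coin from the example: p = P(heads) = 0.8, tossed n = 3 times.
for x in range(4):
    print(x, round(binomial_pmf(3, 0.8, x), 3))
```

Calling `binomial_pmf(3, 0.5, x)` instead gives the distribution for three tosses of a fair coin.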

A collection of specific values, or "scores", x1, x2, . . ., xn of a random variable X is called a sample. If {x1, x2, . . ., xn} is a sample, then the sample mean of the collection is

$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum_i x_i}{n}$,

where n is the sample size: the number of scores.

The sample median m is the middle score (in the case of an odd-size sample), or average of the two middle scores (in the case of an even-size sample), when the scores in a sample are arranged in ascending order.

A sample mode is a score that appears most often in the collection. (There may be more than one mode in a sample.)

If the sample x1, x2, . . ., xn we are using consists of all the values of X from an entire population (for instance, the SAT score of every graduating high school student who took the test), we refer to the mean, median, and mode above as the population mean, median, and mode.

We write the population mean as $\mu$ instead of $\bar{x}$.

Example

Consider the following collection of scores:

11.5, 3, 5.5, 0.5, 3, 10, 2.5, 4

The sum is $\sum_i x_i = 40$, and n = 8, so that

$\bar{x} = \dfrac{\sum_i x_i}{n} = \dfrac{40}{8} = 5$.

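To get the sample median, arrange the scores in increasing order, and select the middle scores (two of them since n is even):

0.5, 2.5, 3, 3, 4, 5.5, 10, 11.5

The two middle scores are 3 and 4, so the sample median is m = (3 + 4)/2 = 3.5. The sample mode is 3, the only score that appears more than once.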
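The sample mean, median, and mode of this collection of scores can also be computed with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import mean, median, multimode

scores = [11.5, 3, 5.5, 0.5, 3, 10, 2.5, 4]

sample_mean = mean(scores)        # sum of the scores divided by n
sample_median = median(scores)    # average of the two middle scores, since n is even
sample_modes = multimode(scores)  # list of all scores that appear most often

print(sorted(scores))
print(sample_mean, sample_median, sample_modes)
```

`multimode` is used rather than `mode` because a sample may have more than one mode.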

If X is a finite random variable taking on values x1, x2, . . ., xn, the mean or expected value of X, written $\mu$, or E(X), is

$\mu = E(X) = x_1 \cdot P(X = x_1) + x_2 \cdot P(X = x_2) + \cdots + x_n \cdot P(X = x_n)$

$= \sum_i x_i \cdot P(X = x_i)$.

The median of X is the least number m such that

$P(X \le m) \ge 1/2$ and $P(X \ge m) \ge 1/2$.

(This definition holds for continuous variables as well.)

A mode of X is a number m such that P(X = m) is largest. This is the most likely value of X or one of the most likely values if X has several values with the same largest probability. For a continuous random variable, a mode is a number m such that the probability density function is highest at x = m.

The expected value, median, and mode of a random variable are the average, median, and mode we expect to get if we have a large number of X-scores. Conversely, if all we know about X is a collection of X-scores, then the average, median and mode of those scores are our best estimates of the expected value, median and mode of X.

Example

Suppose we toss an unfair coin, with p = P(heads) = 0.8 and q = P(tails) = 0.2, three times. Take X = number of heads. Then the distribution (see above) is given by

| x | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P(x) | 0.008 | 0.096 | 0.384 | 0.512 |

The expected value of X is given by

$E(X) = \sum_i x_i \cdot P(X = x_i)$
$= 0(0.008) + 1(0.096) + 2(0.384) + 3(0.512)$
$= 2.4$.

The median is 3, since $P(X \le 3) = 1 \ge 1/2$ and $P(X \ge 3) = 0.512 \ge 1/2$. Further, 3 is the least value of X with this property.
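The same quantities can be computed directly from the distribution table. The sketch below stores the distribution as a dictionary and applies the definitions of expected value, median, and mode given earlier. The function names are illustrative, and the median search looks only at the values X actually takes, which suffices for a finite random variable.

```python
distribution = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}


def expected_value(dist):
    """Sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in dist.items())


def rv_median(dist):
    """Least value m with P(X <= m) >= 1/2 and P(X >= m) >= 1/2."""
    for m in sorted(dist):
        p_at_most = sum(p for x, p in dist.items() if x <= m)
        p_at_least = sum(p for x, p in dist.items() if x >= m)
        if p_at_most >= 0.5 and p_at_least >= 0.5:
            return m
    return None


def rv_modes(dist):
    """All values of X that have the largest probability."""
    largest = max(dist.values())
    return [x for x, p in dist.items() if p == largest]


print(round(expected_value(distribution), 4))
print(rv_median(distribution))
print(rv_modes(distribution))
```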

The sample standard deviation is the square root, s, of the sample variance.

Population Variance and Population Standard Deviation

The population variance and standard deviation have slightly different formulas from those of the corresponding statistics for samples. Given a set of numbers x1, x2, . . . , xn, the population variance, $\sigma^2$, is

$\sigma^2 = \dfrac{\sum_i (x_i - \mu)^2}{n} = \dfrac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}$

The population standard deviation, $\sigma$, is the square root of the population variance.

We saw above that the sample mean is 5 (see the example "Mean, Median, and Mode of a Set of Data" above). The following table shows the differences from the mean and their squares, which we use to compute the sample variance and standard deviation.

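| $x_i$ | 11.5 | 3 | 5.5 | 0.5 | 3 | 10 | 2.5 | 4 |
|---|---|---|---|---|---|---|---|---|
| $x_i - \bar{x}$ | 6.5 | -2 | 0.5 | -4.5 | -2 | 5 | -2.5 | -1 |
| $(x_i - \bar{x})^2$ | 42.25 | 4 | 0.25 | 20.25 | 4 | 25 | 6.25 | 1 |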

The sum of the entries in the bottom row is $\sum_i (x_i - \bar{x})^2 = 103$. Therefore,

$s^2 = \dfrac{\sum_i (x_i - \bar{x})^2}{n-1} = \dfrac{103}{7} \approx 14.714$.

Also,

$s = \sqrt{14.714} \approx 3.836$.

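For the population variance, we divide 103 by n = 8 instead of 7, getting

$\sigma^2 = \dfrac{103}{8} = 12.875$,

so that the population standard deviation is $\sigma = \sqrt{12.875} \approx 3.588$.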
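The difference between dividing by n - 1 and dividing by n can be seen with the standard `statistics` module, which provides both versions. This is a sketch using the scores from the example; the variable names are illustrative.

```python
from statistics import pstdev, pvariance, stdev, variance

scores = [11.5, 3, 5.5, 0.5, 3, 10, 2.5, 4]

sample_variance = variance(scores)       # sum of squared differences divided by n - 1 = 7
sample_std_dev = stdev(scores)           # square root of the sample variance
population_variance = pvariance(scores)  # same sum divided by n = 8
population_std_dev = pstdev(scores)      # square root of the population variance

print(round(sample_variance, 3), round(sample_std_dev, 3))
print(round(population_variance, 3), round(population_std_dev, 3))
```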

The standard deviation $\sigma$ of a random variable X is defined to be the square root of its variance $\sigma^2$.
An alternate formula for the variance, useful for calculation, is

$\sigma^2 = E(X^2) - \mu^2$.

The variance and standard deviation of a random variable are the sample variance and sample standard deviation we expect to get if we have a large number of X-scores. Conversely, if all we know about X is a collection of X-scores, then the sample variance and sample standard deviation of those scores are our best estimates of the variance and standard deviation of X.

Example

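Let us look again at the experiment in which we toss an unfair coin, with p = P(heads) = 0.8 and q = P(tails) = 0.2, three times. (X = number of heads.) Here is the distribution with the $x^2$ scores added.

| x | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P(x) | 0.008 | 0.096 | 0.384 | 0.512 |
| $x^2$ | 0 | 1 | 4 | 9 |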
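One way to carry out this calculation is with the alternate formula, variance = E(X^2) - (E(X))^2, applied to the unfair-coin distribution. The sketch below is illustrative only; the variable names are not from the original text.

```python
from math import sqrt

distribution = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}

# Expected value of X and of X squared.
mu = sum(x * p for x, p in distribution.items())
expected_x_squared = sum(x**2 * p for x, p in distribution.items())

# Alternate formula: variance = E(X^2) - mu^2.
var_x = expected_x_squared - mu**2
std_dev_x = sqrt(var_x)

print(round(mu, 4), round(expected_x_squared, 4))
print(round(var_x, 4), round(std_dev_x, 4))
```

For a binomial distribution the variance also equals npq, which gives 3(0.8)(0.2) = 0.48 here and can serve as a check.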

A continuous random variable X may take on any real value whatsoever. The probabilities $P(a \le X \le b)$ are specified by means of a probability density curve, a curve lying above the x-axis with the total area between the curve and the x-axis being 1.

The probability $P(a \le X \le b)$ is given by the area enclosed by the curve, the x-axis, and the lines x = a and x = b.

For a calculus-based discussion of this and other distributions (the uniform, exponential, and beta distributions) go to the on-line section on probability density functions.

Probability of a Normal Distribution Being within k Standard Deviations of its Mean

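If X is a normal random variable with mean $\mu$ and standard deviation $\sigma$, then

$P(\mu - \sigma \le X \le \mu + \sigma) \approx 0.6827$

$P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.9545$

$P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.9973$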
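These probabilities do not depend on the particular mean and standard deviation, and they can be computed from the error function in Python's `math` module. The sketch below uses the standard identity that the probability of a normal variable lying within k standard deviations of its mean is erf(k / sqrt(2)); the function name `prob_within` is illustrative.

```python
from math import erf, sqrt


def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal random variable X."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```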

Normal Approximation to a Binomial Distribution

If X is the number of successes in a sequence of n independent Bernoulli trials, with probability p of success in each trial, and if the range of values of X three standard deviations above and below the mean lies entirely within the range 0 to n (the possible values of X), then

$P(a \le X \le b)$ is approximately equal to $P(a - 0.5 \le Y \le b + 0.5)$

where Y has a normal distribution with the same mean and standard deviation as X, that is, $\mu = np$ and $\sigma = \sqrt{np(1-p)}$.
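To get a feel for how close the approximation is, the following sketch compares the exact binomial probability with the normal approximation. The values n = 100, p = 0.5, a = 45, and b = 55 are hypothetical, chosen only for illustration, and the function names are likewise illustrative.

```python
from math import comb, erf, sqrt


def normal_cdf(y, mu, sigma):
    """P(Y <= y) for a normal random variable Y with mean mu and std. dev. sigma."""
    return 0.5 * (1 + erf((y - mu) / (sigma * sqrt(2))))


def binomial_exact(n, p, a, b):
    """Exact P(a <= X <= b) for a binomial random variable X."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(a, b + 1))


def binomial_normal_approx(n, p, a, b):
    """Normal approximation P(a - 0.5 <= Y <= b + 0.5)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return normal_cdf(b + 0.5, mu, sigma) - normal_cdf(a - 0.5, mu, sigma)


# Hypothetical example values (not from the text).
n, p, a, b = 100, 0.5, 45, 55
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Condition from the text: mu - 3*sigma and mu + 3*sigma lie within 0 to n.
condition_holds = 0 <= mu - 3 * sigma and mu + 3 * sigma <= n

exact = binomial_exact(n, p, a, b)
approx = binomial_normal_approx(n, p, a, b)
print(condition_holds, round(exact, 4), round(approx, 4))
```

The added and subtracted 0.5 is the adjustment that accounts for replacing a variable taking only whole-number values with a continuous one.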