Confidence Interval for a Population Mean

This file is part of a program based on the Bio 4835 Biostatistics class taught
at Kean University in Union, New Jersey. The course uses the following text:
Daniel, W. W. 1999. Biostatistics: a foundation for analysis in the
health sciences. New York: John Wiley and Sons.
The file follows this text very closely, and readers are encouraged to consult
the text for further information.

A) Confidence interval for a population mean

Estimating the mean

Estimating the mean of a normally distributed population entails drawing a
sample of size n and computing $\bar{x}$, which is used as a point estimate of $\mu$.

It is more meaningful to estimate $\mu$ by an interval that communicates information
regarding the probable magnitude of $\mu$.

Sample distributions and estimation

Interval estimates are based on sampling distributions. When the sample
mean is being used as an estimator of a population mean, and the population is
normally distributed, the sample mean will be normally distributed with mean,
$\mu_{\bar{x}}$, equal to the population mean, $\mu$, and variance $\sigma_{\bar{x}}^2 = \sigma^2/n$.

The 95% confidence interval

Approximately 95% of the values of $\bar{x}$ making up the sampling
distribution will lie within 2 standard deviations of the mean. The
interval is marked by the two points $\mu - 2\sigma_{\bar{x}}$ and $\mu + 2\sigma_{\bar{x}}$, so that 95% of the values are in the
interval $\mu \pm 2\sigma_{\bar{x}}$.

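Since $\mu$ and $\mu_{\bar{x}}$ are unknown, the
location of the distribution is uncertain. We can use $\bar{x}$ as a point estimate of $\mu$. If we construct intervals of the form $\bar{x} \pm 2\sigma_{\bar{x}}$ from repeated samples, approximately 95% of these intervals would contain $\mu$.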

Example

Suppose a researcher, interested in obtaining an estimate of the average level
of some enzyme in a certain human population, takes a sample of 10 individuals,
determines the level of the enzyme in each, and computes a sample mean of $\bar{x}$ =
22. Suppose further it is known that the variable of interest is
approximately normally distributed with a variance of 45. We wish to estimate $\mu$.

Solution

An approximate 95% confidence interval for $\mu$ is given by:

$\bar{x} \pm 2\sigma_{\bar{x}} = 22 \pm 2\sqrt{45/10} = 22 \pm 2(2.1213)$

(17.76, 26.24)
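The arithmetic for this example can be checked with a short Python sketch. The variable names are illustrative only, and the factor 2 is the rounded reliability coefficient for an approximate 95% interval.

```python
import math

x_bar = 22        # sample mean from the example
variance = 45     # known population variance
n = 10            # sample size

# Standard error of the mean: sqrt(sigma^2 / n)
standard_error = math.sqrt(variance / n)

# Approximate 95% interval: x_bar plus or minus 2 standard errors
lower = x_bar - 2 * standard_error
upper = x_bar + 2 * standard_error

print(f"standard error = {standard_error:.4f}")
print(f"approximate 95% interval = ({lower:.2f}, {upper:.2f})")
```

Run as written, this prints a standard error of 2.1213 and the interval (17.76, 26.24).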

Components of an interval estimate

This is the general form for an interval estimate.

estimator ± (reliability coefficient) × (standard error)

The general form for an interval estimate consists of three components.
These are known as the estimator, the reliability coefficient,
and the standard error.

Estimator: The interval estimate of $\mu$ is centered on the point
estimate of $\mu$. As noted in the table above, $\bar{x}$ is an
unbiased point estimator for $\mu$.

Reliability coefficient: Approximately 95% of the values of the standard
normal curve lie within 2 standard deviations of the mean. The z score in this case is called the reliability
coefficient. We use the value of z that gives the desired confidence
level. The proper z score depends on the value of $\alpha$
being used. Generally, the three values of $\alpha$ most
commonly used are .01, .05 and .10. Their corresponding z scores are
2.575, 1.96 and 1.645, respectively, as shown in the table below.

Table of reliability coefficients

Confidence level    $\alpha$    z
90%                 .10         1.645
95%                 .05         1.96
99%                 .01         2.575

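Standard error: The standard error equals $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, the standard deviation of the sampling distribution of $\bar{x}$.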
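As a rough illustration, the reliability coefficient and the standard error can be computed in Python with the standard library. The function names here are hypothetical helpers, not part of the course material. The values returned are slightly more precise than the rounded table values (for example 2.5758 rather than 2.575).

```python
from statistics import NormalDist

def reliability_coefficient(alpha):
    """z value that leaves alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def standard_error(sigma, n):
    """Standard error of the mean for a known population standard deviation."""
    return sigma / n ** 0.5

for alpha in (0.10, 0.05, 0.01):
    z = reliability_coefficient(alpha)
    print(f"alpha = {alpha:.2f}  confidence = {100 * (1 - alpha):.0f}%  z = {z:.4f}")
```

For the three common values of alpha this prints z values of 1.6449, 1.9600 and 2.5758.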

Interpretation of confidence intervals

The interval estimate for $\mu$ is expressed as:

$\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$

Assuming that we are using a value of $\alpha$ = .05, we can say that, in repeated
sampling, 95% of the intervals constructed this way will include $\mu$. This is
based on the probability of occurrence of different values of
$\bar{x}$.

The area under the curve of $\bar{x}$ that is outside the interval is called $\alpha$, and the area inside the interval is
called $1-\alpha$.

Interpretation of the interval

There are two ways in which interval estimates can be
interpreted. These are known as the probabilistic interpretation
and the practical interpretation.

The probabilistic interpretation results from repeated sampling. With
repeated sampling from a normally distributed population with a known standard
deviation, $100(1-\alpha)$ percent of all intervals of the form $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$ will, in the long run, include the
population mean, $\mu$. The quantity $1-\alpha$ is called the confidence
coefficient or confidence level, and the interval, $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, is called the confidence
interval for $\mu$.

Note that the percentage of intervals involved depends on the value of $\alpha$. With modern electronic
devices such as the TI-83 calculator and Microsoft Excel, it is possible to use
any value of $\alpha$. When statistics was
developing during the 20th century, such devices were not generally available,
so one had to use tables. These tables were very difficult to prepare, and
so only a few values of $\alpha$ were supported. The most
commonly used values of $\alpha$ are .01, .05, and .10. When
these are used in the formula $100(1-\alpha)$, they yield percentages of 99%,
95%, and 90%, respectively. The most widely used value for a confidence
level is 95%, which corresponds to $\alpha$ = .05. Using this figure, the
probabilistic interpretation says that in 100 samplings, about 95 of the resulting intervals should include $\mu$. For situations in which there is
neither time nor ability to do 100 samplings, the practical interpretation is
used.
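The probabilistic interpretation can be illustrated by simulation. The sketch below is only an illustration under assumed values: the population mean of 6.0, the standard deviation of 3.5, the sample size of 16 and the number of trials are arbitrary choices, and `coverage` is a hypothetical helper name.

```python
import random
from statistics import NormalDist

def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of z intervals, built from repeated samples, that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # reliability coefficient
    margin = z * sigma / n ** 0.5             # z times the standard error
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage(mu=6.0, sigma=3.5, n=16, alpha=0.05, trials=5000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

With many trials the printed fraction should be close to 0.95, which is the long-run behavior the probabilistic interpretation describes.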

The practical interpretation of the interval is used for a single sampling.
When sampling is from a normally distributed population with known
standard deviation, we are $100(1-\alpha)$ percent confident that the single
computed interval, $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, contains the population mean, $\mu$.

Precision

The precision of the estimate is the half-width of the interval, that is, how
far each limit lies from the point estimate. Precision is found by multiplying
the reliability coefficient by the standard error of the mean. This is also
called the margin of error.

Exercise 6.2.2

We wish to estimate the mean serum indirect bilirubin level of 4-day-old infants. The mean for a
sample of 16 infants was found to be 5.98 mg/dl. Assuming bilirubin levels in 4-day-old infants are approximately
normally distributed with a standard deviation of 3.5 mg/dl, find:
A) The 90% confidence interval for $\mu$
B) The 95% confidence interval for $\mu$
C) The 99% confidence interval for $\mu$

(1) Given
$\bar{x}$ = 5.98
$\sigma$ = 3.5
n = 16
(2) Sketch
(3) Calculations

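We start with the formula for an interval estimate and then substitute the values
given in the problem:

$\bar{x} \pm z_{(1-\alpha/2)}\,\sigma/\sqrt{n} = 5.98 \pm z_{(1-\alpha/2)}\,(3.5/\sqrt{16}) = 5.98 \pm z_{(1-\alpha/2)}\,(.875)$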

Then we need to determine the values of the reliability coefficient that will
be used in solving the three parts of the problem. We consult the Table
of Reliability Coefficients above. The correct value of the reliability
coefficient is multiplied by the standard error (.875). The resulting
value is subtracted from and then added to the value of $\bar{x}$ to give the boundaries of the
interval estimate.

A) 90% interval (z = 1.645)

5.98 ± 1.645 (.875)

5.98 - 1.439375, 5.98 + 1.439375
(4.5406, 7.4194)
Interpretation:
We estimate the population mean to be 5.98.
We are 90% confident that the true value of the mean lies between 4.5406
and 7.4194.

B) 95% interval (z = 1.96)

5.98 ± 1.96 (.875)
(4.265, 7.695)
Interpretation:
We estimate the population mean to be 5.98. We are 95% confident
that the true value of the mean lies between 4.265 and 7.695.

C) 99% interval (z = 2.575)

5.98 ± 2.575 (.875)
(3.7269, 8.2331)
Interpretation:
We estimate the population mean to be 5.98. We are 99% confident
that the true value of the mean lies between 3.7269 and 8.2331.

(4) Results

A higher confidence level gives a wider interval. There is less chance that
the interval misses the population mean, but the estimate is less precise.
Calculator answers are slightly more accurate because the calculator carries
more decimal places for z (for example, 1.6449 rather than 1.645 and 2.5758
rather than 2.575) instead of the rounded table values.
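The three parts of this exercise can be reproduced with a short Python sketch. The helper name `z_interval` and the dictionary names are illustrative assumptions. The table values of z are the rounded ones used in this file, and the more precise z values are computed for comparison.

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, z):
    """Interval estimate: x_bar plus or minus z times sigma / sqrt(n)."""
    margin = z * sigma / n ** 0.5   # precision (margin of error)
    return x_bar - margin, x_bar + margin

x_bar, sigma, n = 5.98, 3.5, 16
table_z = {90: 1.645, 95: 1.96, 99: 2.575}   # rounded reliability coefficients

intervals = {}
for level, z in table_z.items():
    low, high = z_interval(x_bar, sigma, n, z)
    intervals[level] = (low, high)
    # More precise z, as a calculator or spreadsheet would use
    precise_z = NormalDist().inv_cdf(1 - (1 - level / 100) / 2)
    p_low, p_high = z_interval(x_bar, sigma, n, precise_z)
    print(f"{level}%: table z -> ({low:.4f}, {high:.4f}); "
          f"precise z -> ({p_low:.4f}, {p_high:.4f})")
```

The two sets of limits agree to about two or three decimal places, which shows how small the effect of rounding z is in this problem.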

The t distribution

In most real-life situations the variance of the
population is unknown. We know that the z score, $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, is normally distributed if the population
is normally distributed and is approximately normally distributed when the
sample is large. However, it cannot be used because $\sigma$ is
unknown.

Estimation of the standard deviation

The sample standard deviation, $s = \sqrt{\sum (x_i - \bar{x})^2/(n-1)}$, can be used to replace $\sigma$.
If $n \geq 30$, then s is a good approximation of $\sigma$.
An alternate procedure is used when the samples are small. It is based on
Student's t distribution.

Student's t distribution

Student's t distribution is used as an alternative for z with small
samples. It uses the following formula:

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

Properties of the t distribution

1. Mean = 0
2. It is symmetrical about the mean.
3. Variance is greater than 1 but approaches 1 as the sample gets large.
For df > 2, the variance = df/(df-2) or, equivalently, (n-1)/(n-3).
4. The range is $-\infty$ to $+\infty$.
5. t is really a family of distributions: there is a different distribution
for each value of n-1, the divisor used in computing $s^2$.
6. Compared with the normal distribution, t is less peaked and has
higher tails.
7. The t distribution approaches the normal
distribution as n-1 approaches infinity.
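Properties 3 and 6 can be checked roughly by simulation. The sketch below draws repeated samples of size 10 from a standard normal population (an assumed setup chosen only for illustration), computes t for each sample, and compares the variance of the simulated t values with df/(df-2). The helper name `simulate_t` is hypothetical.

```python
import random
from statistics import mean, stdev, variance

def simulate_t(n, trials, seed=1):
    """t = (x_bar - mu) / (s / sqrt(n)) for samples from a normal population with mu = 0."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(mean(sample) / (stdev(sample) / n ** 0.5))
    return values

n = 10
df = n - 1
t_values = simulate_t(n, trials=20000)

# Property 3: variance should be near df / (df - 2), which exceeds 1
print(f"simulated variance = {variance(t_values):.3f}")
print(f"df / (df - 2)      = {df / (df - 2):.3f}")

# Property 6: more area in the tails than the normal curve
tail = sum(abs(t) > 1.96 for t in t_values) / len(t_values)
print(f"fraction with |t| > 1.96 = {tail:.3f} (0.05 for the normal curve)")
```

The simulated variance should land near 9/7, and the tail fraction should come out noticeably above 0.05, consistent with the higher tails of t.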

Confidence interval for a mean using t

When sampling is from a normal distribution whose standard deviation, $\sigma$,
is unknown, the
$100(1-\alpha)$ percent confidence interval for the population mean, $\mu$, is given
by:

$\bar{x} \pm t_{(1-\alpha/2)}\,\frac{s}{\sqrt{n}}$

Deciding between z and t

When constructing a confidence interval for a population mean, we must decide
whether to use z or t. Which one to use depends
on the size of the sample, whether the population is normally distributed, and
whether the population variance is known. There are various flowcharts
and decision keys that can be used to help decide. Mine appears below.

Key for deciding between z and t in confidence interval construction

1. Population normally distributed................2
Population not normally distributed...........5

2. Sample size is large (30 or higher)............3
Sample size is small (less than 30)............4

3. Population variance is known.............use z
Population variance not known.....use t (or z)

4. Population variance is known.............use z
Population variance is not known.......use t

5. Sample size is large (30 or higher)............6
Sample size is small (less than 30)............7

6. Population variance is known.............use z
Population variance not known
(central limit theorem applies)............use z

7. Must use a non-parametric method
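The key can be written as a small Python function. This is only a sketch of the decision logic: the function name and return strings are illustrative, and it assumes that a population that is not normally distributed leads to z when the sample is large and to a non-parametric method when the sample is small.

```python
def choose_statistic(population_normal, n, variance_known):
    """Return 'z', 't', or 'nonparametric' following the decision key."""
    large_sample = n >= 30
    if population_normal:
        if variance_known:
            return "z"            # entries 3 and 4, variance known
        # Entries 3 and 4, variance not known: use t.
        # For large samples the key also allows z as an approximation.
        return "t"
    if large_sample:
        return "z"                # entry 6: central limit theorem applies
    return "nonparametric"        # entry 7

print(choose_statistic(population_normal=True, n=10, variance_known=False))
print(choose_statistic(population_normal=False, n=50, variance_known=False))
print(choose_statistic(population_normal=False, n=10, variance_known=True))
```

The three calls print t, z and nonparametric, in that order.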

Example

In a study of preeclampsia, Kaminski and Rechberger found the mean systolic blood pressure of 10
healthy, nonpregnant women to be 119 with a standard
deviation of 2.1.
(Preeclampsia: Development of hypertension, albuminuria, or edema between the 20th week of pregnancy
and the first week postpartum. Eclampsia: Coma and/or convulsive seizures in
the same time period, without other etiology.)

a. What is the estimated standard error of the mean?
b. Construct the 99% confidence interval for the mean of the population from
which the 10 subjects may be presumed to be a random sample.
c. What is the precision of the estimate?
d. What assumptions are necessary for the validity of the confidence interval
you constructed?

(1) Given
n = 10
$\bar{x}$ = 119
s = 2.1

(2) Sketch of t distribution

(3) Calculations

Estimated standard error: $s/\sqrt{n} = 2.1/\sqrt{10}$ = .6640783086

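99% confidence interval

$\bar{x} \pm t_{(1-\alpha/2)}\,s/\sqrt{n} = 119 \pm 3.2498\,(.6640783086) = 119 \pm 2.1581$
(116.8419, 121.1581)

(The correct value of t for a 99% confidence interval with 9 degrees of freedom
is 3.2498.)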
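The calculations for parts a through c can be sketched in Python as follows. The t value of 3.2498 is taken from the text above rather than computed, and the variable names are illustrative.

```python
n = 10
x_bar = 119       # sample mean systolic blood pressure
s = 2.1           # sample standard deviation
t_value = 3.2498  # t for 99% confidence with n - 1 = 9 degrees of freedom

# a. Estimated standard error of the mean
standard_error = s / n ** 0.5

# c. Precision (margin of error): reliability coefficient times standard error
precision = t_value * standard_error

# b. 99% confidence interval
lower, upper = x_bar - precision, x_bar + precision

print(f"standard error = {standard_error:.10f}")
print(f"precision      = {precision:.4f}")
print(f"99% interval   = ({lower:.4f}, {upper:.4f})")
```

This gives a precision of about 2.1581 and an interval of about (116.8419, 121.1581).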