Approximate Hypothesis Tests: the z Test and the t Test

This chapter presents two common tests of the hypothesis that a population mean
equals a particular value and of the hypothesis that two population means are equal:
the z test and the t test.
These tests are approximate:
They are based on approximations to the probability distribution of the
test statistic when the null hypothesis is true, so their
significance levels are not exactly what they claim to be.
If the sample size is reasonably large and the population from which the
sample is drawn has a nearly normal distribution—a notion defined
in this chapter—the nominal significance levels of the tests are close to
their actual significance levels.
If these conditions are not met, the significance levels of the approximate
tests can differ substantially from their nominal values.
The z test is based on the
normal approximation; the t
test is based on Student's t curve, which approximates some probability
histograms better than the normal curve does.
The chapter also presents the deep connection between hypothesis tests and
confidence intervals, and shows how to compute approximate confidence
intervals for the population mean of nearly normal populations using Student's
t-curve.

z tests

In a previous chapter,
we constructed the z test for equality of two percentages using
independent random
samples from the two populations.
The original test statistic
was the difference φt − φc between the two
independent sample percentages.
If the null hypothesis that the
two population percentages are equal is true, the expected value of the test statistic,
E(φt − φc), is zero.
If, in addition, the sample sizes are large, we can
estimate SE(φt − φc) accurately using the pooled
bootstrap estimate of the SD of the
"null box":

SD(box) ~ s* = (φ×(1−φ))½,

where φ is the pooled sample percentage of the two samples.
The estimate of SE(φt − φc)
under the null hypothesis is

se = s*×(1/nt +
1/nc)½,

where nt and nc are the
sizes of the two samples.
If the null hypothesis is true, the Z statistic

Z = (φt − φc)/se

has expected value approximately equal to zero, and the normal curve is a good
approximation to its probability histogram.

This strategy—transforming a test statistic approximately to standard units under the
assumption that the null hypothesis is true, and then using the
normal approximation to determine the rejection
region for the test—works to construct approximate hypothesis tests in many other
situations, too.
The resulting hypothesis test is called a z test.
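The computation just described is simple enough to sketch in code. The following is a minimal Python sketch of the two-sample z test for percentages, using the pooled bootstrap estimate of SD(box); the sample sizes and counts are hypothetical, purely for illustration.

```python
# Sketch of the z test for equality of two percentages with the
# pooled bootstrap SE estimate. All data below are hypothetical.
from scipy.stats import norm

n_t, x_t = 400, 232   # treatment sample size and number of 1s (hypothetical)
n_c, x_c = 350, 176   # control sample size and number of 1s (hypothetical)

phi_t = x_t / n_t                  # treatment sample percentage
phi_c = x_c / n_c                  # control sample percentage
phi   = (x_t + x_c) / (n_t + n_c)  # pooled sample percentage

s_star = (phi * (1 - phi)) ** 0.5          # pooled estimate of SD(box)
se     = s_star * (1/n_t + 1/n_c) ** 0.5   # estimated SE of phi_t - phi_c
Z      = (phi_t - phi_c) / se              # the Z statistic

# Two-tail test at approximate significance level a = 5%:
a = 0.05
print(Z, "reject" if abs(Z) > norm.ppf(1 - a/2) else "do not reject")
```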
Suppose that we are testing a null hypothesis using a test statistic X,
and the following conditions hold:

We have a probability model for how the observations arise, assuming the null hypothesis is true.
Typically, the model is that under the null hypothesis, the data are like
random draws with or without replacement from a box of numbered tickets.

Under the null hypothesis, the test statistic X, converted to standard units,
has a probability histogram that can be approximated well by the normal curve.

Under the null hypothesis, we can find the expected value of the test statistic, E(X).

Under the null hypothesis, either we can find the SE of the test statistic, SE(X),
or we can estimate SE(X) accurately enough to ignore the error of the estimate of the SE.
Let se denote either the exact SE of X under the null hypothesis, or the estimated
value of SE(X) under the null hypothesis.

Then, under the null hypothesis, the probability histogram of the Z statistic

Z = (X − E(X))/se

is approximated well by the normal curve, and we can use the normal
approximation to select the rejection region for the test using Z as the
test statistic.
If the null hypothesis is true,

P(Z<za) ~ a

P(Z>z1−a) ~ a,

and

P(|Z|>z1−a/2) ~ a.

These three approximations yield three different z tests of the null hypothesis,
each at approximate significance level a:

Reject the null hypothesis whenever Z<za
(left-tail z test)

Reject the null hypothesis whenever Z>z1−a
(right-tail z test)

Reject the null hypothesis whenever |Z|>z1−a/2
(two-tail z test)

The word "tail" refers to the tails of the normal curve: In a left-tail test,
the probability of a Type I error is approximately
the area of the left tail of the normal curve, from minus infinity to
za.
In a right-tail test, the probability of a Type I error
is approximately the area of the right tail of the normal
curve, from z1−a to infinity.
In a two-tail test, the probability of a Type I error is approximately the
sum of the areas of both tails of the normal curve, the left tail
from minus infinity to za/2 and the right tail from
z1−a/2 to infinity.
All three of these tests are called z tests.
The observed value of Z is called the z score.

Which of these three tests, if any, should one use?
The answer depends on the probability distribution of Z when the alternative
hypothesis is true.
As a rule of thumb, if, under the alternative hypothesis,
E(Z) < 0, use the left-tail test.
If, under the alternative hypothesis, E(Z) > 0,
use the right-tail test.
If, under the alternative hypothesis, it is possible that E(Z) < 0
and it is possible that E(Z) > 0, use the
two-tail test.
If, under the alternative hypothesis, E(Z) = 0, use a different
approach: Consult a statistician.
Generally (but not always), this rule of thumb selects the test
with the most power for a given significance level.

P values for z tests

Each of the three z tests gives us a family of procedures for
testing the null hypothesis at any (approximate) significance level a
between 0 and 100%—we just use the appropriate quantile of the normal curve.
This makes it particularly easy to find the P value for a z test.
Recall that the P value is the smallest significance level for which we
would reject the null hypothesis, among a family of tests of the null hypothesis
at different significance levels.

Suppose the z score (the observed value of Z) is x.
In a left-tail test, the P value is the area under the normal curve to
the left of x: Had we chosen the significance level a so that
za=x, we would have rejected the null
hypothesis, but we would not have rejected it for any smaller value of a,
because for all smaller values of a, za<x.
Similarly, for a right-tail z test, the P value is the area
under the normal curve to the right of x:
If x=z1−a we would reject the null hypothesis
at approximate significance level a, but not at smaller significance levels.
For a two-tail z test, the P value is the sum of the area under the
normal curve to the left of −|x| and the area under the normal curve to the
right of |x|.
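As a concrete sketch, all three P values can be computed from the z score using the standard normal CDF; the z score below is hypothetical.

```python
# P values for the three z tests, given an observed z score x.
from scipy.stats import norm

x = -1.8                                # observed z score (hypothetical)

p_left  = norm.cdf(x)                   # left-tail: area to the left of x
p_right = 1 - norm.cdf(x)               # right-tail: area to the right of x
p_two   = 2 * (1 - norm.cdf(abs(x)))    # two-tail: both tails beyond |x|

print(p_left, p_right, p_two)
```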

Finding P values and specifying the rejection region for the z
test involves the probability distribution of Z under the assumption that
the null hypothesis is true.
Rarely is the alternative hypothesis sufficiently detailed to specify the probability
distribution of Z completely, but often the alternative does help us
choose intelligently among left-tail, right-tail, and two-tail z tests.
This is perhaps the most important issue in deciding which hypothesis to take as
the null hypothesis and which as the alternative:
We calculate the significance level under the null hypothesis, and that calculation
must be tractable.

How close the nominal significance level and power of a z test are to the actual
significance level and power depends on how well the normal curve approximates
the probability histogram of the test statistic in standard units.
If the original test statistic is a sample sum
or a sample mean of draws with replacement
(or a sum or difference of independent sample sums or sample means
),
its probability histogram can be approximated accurately by a normal curve if
the sample size is large; this is a consequence of the
central limit theorem.

However, to construct a z test, we need to know the expected value and SE of
the test statistic under the null hypothesis.
Usually it is easy to determine the expected value, but often the SE must be estimated
from the data.
Later in this chapter we shall see what to do if the SE cannot be estimated accurately,
but the shape of the distribution of the numbers in the population is known.
The next section develops z tests for the
population percentage and mean, and for the difference between two population means.

Examples of z tests

The central limit theorem assures us that the probability histogram of the sample mean of
random draws with replacement from a box of tickets—transformed to standard units—can
be approximated increasingly well by a normal curve as the number of draws increases.
In the previous section, we learned that the probability histogram of a sum or difference
of independent sample means of draws with replacement also can be approximated increasingly
well by a normal curve as the two sample sizes increase.
We shall use these facts to derive z tests for population means and percentages
and differences of population means and percentages.

z Test for a Population Percentage

Suppose we have a population of N units of which G
are labeled "1" and the rest are labeled "0."
Let p = G/N be the population
percentage.
Consider testing the null hypothesis that
p = p0 against the alternative hypothesis
that p ≠ p0, using a
random sample of n units drawn with replacement.
(We could assume instead that N >> n and allow the
draws to be without replacement.)

Under the null hypothesis, E(φ) = p0 and
SE(φ) = (p0×(1−p0))½/n½, so we can take

Z = (φ − p0)/((p0×(1−p0))½/n½)

as the Z statistic.
Provided n is large and p0 is not too close to zero or 100%
(say n×p0 > 30 and
n×(1−p0) > 30),
the probability histogram of Z will be approximated well by the normal
curve, and we can use Z in a z test.
For example, if we reject the null hypothesis when |Z| > 1.96,
the significance level of the test will be about 5%.
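A minimal sketch of this test in Python follows; the sample counts and the null value p0 are hypothetical.

```python
# Sketch of the z test for a population percentage.
from scipy.stats import norm

n, ones = 500, 275   # sample size and number of tickets labeled "1" (hypothetical)
p0 = 0.5             # null hypothesis: p = p0

phi = ones / n                            # sample percentage
se  = (p0 * (1 - p0)) ** 0.5 / n ** 0.5   # SE of phi under the null hypothesis
Z   = (phi - p0) / se

a = 0.05
print(Z, "reject" if abs(Z) > norm.ppf(1 - a/2) else "do not reject")
```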

z Test for a Population Mean

The approach in the previous subsection applies, mutatis mutandis, to testing
the hypothesis that the population mean equals a given value, even when the
population contains numbers other than just 0 and 1.
However, in contrast to the hypothesis that the population percentage
equals a given value, the null hypothesis that a more general population
mean equals a given value does not specify the SD of the population, which poses
difficulties that are surmountable (by approximation and estimation) if the sample
size is large.

Consider testing the null hypothesis that the population mean μ is equal to a
specific null value μ0, against the alternative hypothesis that
μ ≠ μ0, on the basis of a random sample with replacement
of size n.
Recall that the sample mean M
of n random draws with or without replacement from a box of numbered
tickets is an unbiased estimator of the
population mean μ:
If

M = (sum of sample values)/n,

then

E(M) = μ = (sum of population values)/N,

where N is the size of the population.
The population mean determines the expected value of the sample mean.
The SE of the sample mean of a random sample with replacement is

SD(box)/n½,

where SD(box) is the SD of the list of all the numbers in the box, and n
is the sample size.
As a special case, the sample percentage φ of n independent random
draws from a 0-1 box is an unbiased estimator
of the population percentage p, with SE equal to

(p×(1−p))½/n½.

In testing the null hypothesis that a population percentage p
equals p0, the null hypothesis specifies not only the
expected value of the sample percentage φ but also the
SE of the sample percentage, because the SD of the values in a 0-1 box
is determined by the population percentage p:

SD(box) = (p×(1−p))½.

The null hypothesis thus gives us all the information we need to standardize the sample
percentage under the null hypothesis.
In contrast, the SD of the values in a box of tickets labeled with arbitrary numbers
bears no particular relation to the mean of the values, so the null hypothesis
that the population mean μ of a box of tickets labeled with arbitrary numbers
equals a specific value μ0 determines the expected value of the
sample mean, but not the standard error of the sample mean.
To standardize the sample mean to construct a z test for the value of a
population mean, we need to estimate the SE of the sample mean under the null hypothesis.
When the sample size is large, the sample standard deviation s is likely
to be close to the SD of the population, and

se=s/n½

is likely to be an accurate estimate of SE(M).
The central limit theorem tells us that
when the sample size n is large, the probability histogram of the
sample mean, converted to standard units, is approximated well by the normal curve.
Under the null hypothesis,

E(M) = μ0,

and thus when n is large

Z = (M−μ0)/(s/n½)

has expected value zero, and its probability histogram is approximated well by
the normal curve, so we can use Z as the Z statistic in a z test.
If the alternative hypothesis is true, the expected value of Z could be
either greater than zero or less than zero, so it is appropriate to use a
two-tail z test.
If the alternative hypothesis is μ > μ0,
then under the alternative hypothesis, the expected value of Z is greater
than zero, and it is appropriate to use a right-tail z test.
If the alternative hypothesis is μ < μ0,
then under the alternative hypothesis, the expected value of Z is less
than zero, and it is appropriate to use a left-tail z test.
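The following is a minimal sketch of the z test for a population mean; the data, the null value μ0, and the random seed are hypothetical placeholders.

```python
# Sketch of the z test for a population mean with estimated SE.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.4, scale=3.0, size=200)  # hypothetical data
mu0 = 10.0                                          # null hypothesis: mu = mu0

M  = sample.mean()
s  = sample.std(ddof=1)          # sample standard deviation
se = s / len(sample) ** 0.5      # estimated SE of the sample mean
Z  = (M - mu0) / se

# Two-tail P value, per the alternative mu != mu0:
p_value = 2 * (1 - norm.cdf(abs(Z)))
print(Z, p_value)
```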

z Test for a Difference of Population Means

Consider the problem of testing the hypothesis that two population means are
equal, using random samples from the two populations.
Different sampling designs lead to different hypothesis testing procedures.
In this section, we consider two kinds of random samples from the two populations:
paired samples and independent samples, and construct
z tests appropriate for each.

Paired Samples

Consider a population of N individuals, each of whom is labeled with two numbers.
For example, the N individuals might be a group of doctors, and the two numbers
that label each doctor might be the annual payments to the doctor by an HMO under the
terms of the current contract and under the terms of a proposed revision of the contract.
Let the two numbers associated with individual i be ci
and ti.
(Think of c as control and t as treatment.
In this example, control is the current contract, and treatment is the proposed contract.)
Let μc be the population mean of the N values

{c1, c2, …,
cN},

and let μt
be the population mean of the N values

{t1, t2, …,
tN}.

Suppose we want to test the null hypothesis that

μ = μt − μc = μ0

against the alternative hypothesis that μ<μ0.
With μ0=$0, this null hypothesis is that the average annual payment to
doctors under the proposed revision would be the same as the average payment under the
current contract, and the alternative is that on average doctors would be paid less
under the new contract than under the current contract.
With μ0=−$5,000, this null hypothesis is that the proposed contract
would save the HMO an average of $5,000 per doctor, compared with the current contract;
the alternative is that under the proposed contract, the HMO would save even more than that.
With μ0=$1,000, this null hypothesis is that doctors would be paid an
average of $1,000 more per year under the new contract than under the old one;
the alternative hypothesis is that on average doctors would be paid less than an
additional $1,000 per year under the new contract—perhaps even less than they are
paid under the current contract.
For the remainder of this example, we shall take μ0=$1,000.

The data on which we shall base the test are observations of both
ci and ti for a sample of
n individuals chosen at random with replacement from the population of
N individuals (or a simple random sample of size n<<N):
We select n doctors at random from the N doctors under contract
to the HMO, record the current annual payments to them, and calculate what the
payments to them would be under the terms of the new contract.
This is called a paired sample, because the samples from the
population of control values and from the population of treatment values come
in pairs: one value for control and one for treatment for each individual in the sample.
Testing the hypothesis that the difference between two population means is equal to
μ0 using a paired sample is just the problem of testing the
hypothesis that the population mean μ of the set of differences

di = ti − ci,
i= 1, 2, …, N,

is equal to μ0.
Denote the n (random) observed values of
ci and ti by
{C1, C2, …, Cn}
and {T1, T2, …, Tn},
respectively.
The sample mean M of the differences between the observed values of
ti and ci is the difference
of the two sample means:

M = ((T1−C1) + (T2−C2) + … + (Tn−Cn))/n = Mt − Mc,

where Mt and Mc are the sample means of the observed values of
ti and ci, respectively.
M is an unbiased estimator of μ, and if n is large,
the normal approximation to its probability histogram will be accurate.
The SE of M is the population standard deviation of the N
values {d1, d2,
…, dN}, which we shall denote SDd,
divided by the square root of the sample size, n½.
Let sd denote the sample standard deviation of the n observed differences

(Ti−Ci),
i=1, 2, …, n:

sd = ( ((T1−C1−M)2 +
(T2−C2−M)2 +
… +
(Tn−Cn−M)2)/(n−1))½

(recall that M is the sample mean of the observed differences).
If the sample size n is large, sd is very likely to be close to
SDd, and so, under the null hypothesis,

Z = (M−μ0)/(sd/n½)

has expected value zero, and when n is large the probability histogram of
Z can be approximated well by the normal curve.
Thus we can use Z as the Z statistic in a z
test of the null hypothesis that μ=μ0.
Under the alternative hypothesis that μ<μ0
(doctors on the average are paid less than an additional $1,000 per
year under the new contract), the expected value of Z is less than zero,
so we should use a left-tail z test.
Under the alternative hypothesis μ≠μ0 (on average, the
difference in average annual payments to doctors is not an increase of $1,000,
but some other number instead), the expected value of Z could be positive
or negative, so we would use a two-tail z test.
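The following is a minimal sketch of the paired-sample z test, echoing the HMO example with μ0 = $1,000; the payment figures are hypothetical, generated pseudo-randomly for illustration.

```python
# Sketch of the paired-sample z test based on the differences.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
c = rng.normal(120_000, 15_000, size=100)   # current payments (hypothetical)
t = c + rng.normal(800, 2_000, size=100)    # proposed payments (hypothetical)
mu0 = 1_000                                 # null value of the mean difference

d   = t - c                       # observed differences T_i - C_i
M   = d.mean()                    # sample mean of the differences
s_d = d.std(ddof=1)               # sample SD of the differences
Z   = (M - mu0) / (s_d / len(d) ** 0.5)

# Left-tail test, per the alternative mu < mu0:
p_value = norm.cdf(Z)
print(Z, p_value)
```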

Independent Samples

Consider two separate populations of numbers, with population means
μt and μc, respectively.
Let μ=μt−μc
be the difference between the two population means.
We would like to test the null hypothesis that μ=μ0
against the alternative hypothesis that μ>μ0.
For example, let μt be the average annual payment by an
HMO to doctors in the Los Angeles area, and let μc be the
average annual payment by the same HMO to doctors in the San Francisco area.
Then the null hypothesis with μ0=0 is that the HMO pays doctors in the
two regions the same amount annually, on the average; the alternative hypothesis
is that, on the average, the HMO pays doctors in the Los Angeles area more.
Suppose we draw a random sample of size nt with
replacement from the first population, and independently draw a random sample
of size nc with replacement from the second population.
Let Mt and Mc be the sample
means of the two samples, respectively, and let

M = Mt − Mc

be the difference between the two sample means.
Because the expected value of Mt is μt
and the expected value of Mc is μc,
the expected value of M is

E(M) = E(Mt −
Mc) = E(Mt) −
E(Mc) = μt − μc = μ.

Because the two random samples are independent,
Mt and −Mc are independent random variables,
and the SE of their sum is

SE(M) = (SE2(Mt) +
SE2(Mc))½.

Let st and sc be the
sample standard deviations of the two samples, respectively.
If nt and nc are both very large,
the two sample standard deviations are quite likely to be close to the standard
deviations of the two populations, and so

st/nt½

is likely to be close to SE(Mt), and

sc/nc½

is likely to be close to SE(Mc).
Therefore, the pooled estimate of the standard error

sep =
( (st/nt½)2 +
(sc/nc½)2)½
= ( st2/nt +
sc2/nc )½

is likely to be close to SE(M).
Under the null hypothesis, the statistic

Z = (M − μ0)/sep
= (Mt − Mc − μ0)/(st2/nt +
sc2/nc)½

has zero expected value, and its probability histogram is approximated well by the
normal curve, so we can use it as the Z statistic in a z test.

Under the alternative hypothesis

μ = μt − μc > μ0,

the expected value of Z is greater than zero, so it is appropriate to use a
right-tail z test.

If the alternative hypothesis were μ≠μ0, under the alternative
the expected value of Z could be greater than zero or less than zero, so it would
be appropriate to use a two-tail z test. If the alternative hypothesis were
μ<μ0, under the alternative the expected value of Z
would be less than zero, so it would be appropriate to use a left-tail z test.
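The following is a minimal sketch of the independent-samples z test with the pooled SE estimate; both samples are hypothetical stand-ins for the two regions.

```python
# Sketch of the z test for a difference of population means,
# independent samples, pooled SE estimate. Data are hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
t_sample = rng.normal(130_000, 20_000, size=150)  # e.g., Los Angeles
c_sample = rng.normal(125_000, 18_000, size=120)  # e.g., San Francisco
mu0 = 0                                           # null value of the difference

M  = t_sample.mean() - c_sample.mean()
se = (t_sample.var(ddof=1) / len(t_sample)
      + c_sample.var(ddof=1) / len(c_sample)) ** 0.5   # pooled SE estimate
Z  = (M - mu0) / se

# Right-tail test, per the alternative mu > mu0:
p_value = 1 - norm.cdf(Z)
print(Z, p_value)
```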


t Tests

For the nominal significance level of the z test for a population mean to
be approximately correct, the sample size typically must be large.
When the sample size is small, two factors limit the accuracy of the z test:
the normal approximation
to the probability distribution of the sample mean can be poor,
and the sample standard deviation can be an inaccurate estimate of the
population standard deviation, so se is not an accurate estimate
of the SE of the test statistic Z.
For nearly normal populations, defined in the next subsection, the
probability distribution of the sample mean is nearly normal even when
the sample size is small, and the uncertainty of the sample standard
deviation as an estimate of the population standard deviation can be
accounted for by using a curve that is broader than the normal curve
to approximate the probability distribution of the (approximately)
standardized test statistic.
The broader curve is Student's t curve.
Student's t curve depends on the sample size:
The smaller the sample size, the more spread out the curve.

Nearly Normally Distributed Populations

A list of numbers is nearly normally distributed if the fraction
of values in any range is close to the area under the normal curve for the
corresponding range of standard units—that is, if the list has mean μ and
standard deviation SD, and for every pair of values a < b,

(the fraction of numbers in the list between a and b)

is approximately equal to

(the area under the normal curve between (a − μ)/SD and (b − μ)/SD ).

A list is nearly normally distributed if the normal curve is a good approximation to the
histogram of the list transformed to standard units.
The histogram of a list that is approximately normally distributed is (nearly)
symmetric about some point, and is (nearly) bell-shaped.

No finite population can be exactly normally distributed, because the normal
curve has positive area between every two distinct values—no matter how large or
small the values.
No population that contains only a finite number of distinct values can be exactly
normally distributed, for the same reason.
In particular, populations that contain only zeros and ones are not approximately
normally distributed, so results for the sample mean of samples drawn from
nearly normally distributed populations need not apply to the sample
percentage of samples drawn from 0-1 boxes.
Such results will be more accurate for the sample percentage when the population
percentage is close to 50% than when the population percentage is close to 0% or 100%,
because then the histogram of population values is more nearly symmetric.

Suppose a population is nearly normally distributed.
Then a histogram of the population is approximately symmetric about the mean
of the population.
The fraction of numbers in the population within ±1 SD of the mean of the
population is about 68%, the fraction of numbers within ±2 SD of the mean
of the population is about 95%, and the fraction of numbers in the population
within ±3 SD of the mean of the population is about 99.7%.
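A rough empirical check of near normality compares the fraction of a list within k SDs of its mean with the corresponding areas under the normal curve, as in the following sketch; the population list here is hypothetical, generated pseudo-randomly.

```python
# Compare empirical fractions within +-k SDs to normal-curve areas.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
population = rng.normal(50, 10, size=10_000)   # hypothetical population list

mu, sd = population.mean(), population.std()
for k in (1, 2, 3):
    frac = np.mean(np.abs(population - mu) <= k * sd)  # fraction of the list
    area = norm.cdf(k) - norm.cdf(-k)                  # area within +-k SUs
    print(k, round(frac, 4), round(area, 4))           # ~68%, ~95%, ~99.7%
```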


Student's t-curve

Student's t curve is similar to the normal curve, but broader.
It is positive, has a single maximum, and is symmetric about zero.
The total area under Student's t curve is 100%.
Student's t curve approximates some probability histograms more
accurately than the normal curve does.
There are actually infinitely many Student t curves,
one for each positive integer value of the degrees of freedom.
As the degrees of freedom increases, the difference between
Student's t curve and the normal curve decreases.

Consider a population of N units labeled with numbers.
Let μ denote the population mean of the N numbers,
and let SD denote the population standard deviation of the N numbers.
Let M denote the sample mean of a random sample of size n drawn
with replacement from a population, and let s denote the sample standard
deviation of the sample.
The expected value of M is μ,
and the SE of M is
SD/n½. Let

Z = (M − μ)/(SD/n½).

Then the expected value of Z is zero, the SE of Z is 1, and if n
is large enough, the normal curve is a good approximation to the probability histogram
of Z.
The more nearly normal the distribution of values in the population, the smaller
n needs to be for the normal curve to be a good approximation to the distribution
of Z.
Consider the statistic

T = (M − μ)/(s/n½),

which replaces SD by its estimated value (the sample standard deviation s).
If n is large enough, s is very likely to be close to SD, so T
will be close to Z; the normal curve will be a good approximation to the
probability histogram of T; and we can use T as the Z
statistic in a z test of hypotheses about μ.

For many populations, when the sample size is small—say less than 25, but the accuracy
depends on the population—the normal curve is not a good approximation to the
probability histogram of T.
For nearly normally distributed populations, when the sample size is intermediate—say 25–100,
but again this depends on the population—the normal curve is a good approximation
to the probability histogram of Z, but not to the probability histogram of
T, because of the variability of the sample standard deviation s
from sample to sample, which tends to broaden the probability distribution of
T (to make SE(T)>1).

For nearly normally distributed populations, Student's t curve is a better
approximation to the probability histogram of T than the normal curve is.
Student's t curve is broader and flatter than the normal curve, which
accounts for the extra variability in the distribution of T.
Actually, Student's t curve is not one curve: It is a family of curves,
one for each value of the degrees of freedom d.f., 1, 2, ….
In approximating the probability histogram of T, the appropriate value of
d.f. to use is n −1, one less than the sample size.
When d.f. is small, Student's t curve is much broader and flatter than
the normal curve.
As d.f. grows, Student's t curve gets closer and closer to the normal curve;
for d.f. over 200, the two curves are essentially indistinguishable.
For every value of the degrees of freedom, the total area under Student's
t curve is 100%, the curve has a single peak at zero, and the curve is
symmetric about zero.
[Figure: Student's t curve for various values of the degrees of freedom, showing
the area under the curve over a selected interval.]

The area under the normal curve between ±1.96 is 95%, but for Student's
t curve with 25 degrees of freedom, the area between ±1.96 is only about 93.9%:
Student's t curve with d.f.=25 is broader than the normal curve.
With 200 degrees of freedom, Student's t curve is slightly narrower, and the
area between ±1.96 is about 94.9%.

We define quantiles of Student t curves in the same way we defined quantiles of the
normal curve: For any number a between 0 and 100%, the a quantile of Student's t
curve with d.f.=d, td,a, is the unique
value such that the area under the Student t curve with d degrees
of freedom from minus infinity to td,a is equal
to a.
For example, td,50% = 0 for all values of
d. Generally, the value of td,a
depends on the degrees of freedom d.
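The areas and quantiles described above can be reproduced with scipy.stats.t, as in this short sketch.

```python
# Areas and quantiles of Student's t curve, checking the figures above.
from scipy.stats import t, norm

for df in (25, 200):
    area = t.cdf(1.96, df) - t.cdf(-1.96, df)   # area between +-1.96
    print(df, round(area, 4))                   # ~0.939 for 25, ~0.949 for 200

# The quantile t_{df,97.5%} approaches the normal quantile as df grows:
print(t.ppf(0.975, 25), t.ppf(0.975, 200), norm.ppf(0.975))
```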

t test for the Mean of a Nearly Normally Distributed Population

We can use Student's t curve to construct approximate tests of hypotheses about
the population mean μ when the population standard deviation is unknown, for
intermediate values of the sample size n.
The approach is directly analogous to the z test, but instead of using a
quantile of the normal curve, we use the corresponding quantile of Student's
t curve for the appropriate value of degrees of freedom.
However, for the test to be accurate when n is small or intermediate,
the distribution of values in the population must be nearly normal, which is a
somewhat bizarre restriction: It can require a very large sample to detect that the
population is not nearly normal, but if the sample is very large, we can use the
z test instead of the t test.
It is my opinion that the t test is over-taught and overused—because its
assumptions are not verifiable in the situations where it is potentially useful.

Consider testing the null hypothesis that μ=μ0 using the sample
mean M and sample standard deviation s of a random sample of size
n drawn with replacement from a population that is known to have a nearly
normal distribution.
Define

T = (M − μ0)/(s/n½).

Under the null hypothesis, if n is not too small, Student's t curve
with n−1 degrees of freedom will be an accurate
approximation to the probability
histogram of T, so

P(T < tn−1,a),

P(T > tn−1,1−a),

and

P(|T| > tn−1,1−a/2)

all are approximately equal to a.
As we saw earlier in this chapter for the Z statistic, these three
approximations give three tests of the null hypothesis μ=μ0
at approximate significance level a—a left-tail t test, a right-tail
t test, and a two-tail t test:

Reject the null hypothesis if T < tn−1,a
(left-tail)

Reject the null hypothesis if T > tn−1,1−a
(right-tail)

Reject the null hypothesis if |T| >
tn−1,1−a/2 (two-tail)

To decide which t test to use, we can apply the same rule of thumb we used for
the z test:

Use a left-tail t test if, under the alternative hypothesis, the expected
value of T is less than zero.

Use a right-tail t test if, under the alternative hypothesis, the expected
value of T is greater than zero.

Use a two-tail t test if, under the alternative hypothesis, the expected
value of T is not zero, but could be less than or greater than zero.

Consult a statistician for a more appropriate test if, under the alternative hypothesis,
the expected value of T is zero.

P-values for t tests are computed in much the same way as P-values for
z tests.
Let t be the observed value of T (the t score).
In a left-tail t test, the P-value is the area under Student's
t curve with n−1 degrees of freedom, from minus infinity to t.
In a right-tail t test, the P-value is the area under Student's t curve
with n−1 degrees of freedom, from t to infinity.
In a two-tail t test, the P-value is the total area under
Student's t curve with n−1 degrees of freedom between minus infinity and
−|t| and between |t| and infinity.
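The one-sample t test is easy to sketch in code; the small sample and the null value μ0 below are hypothetical. scipy.stats.ttest_1samp carries out the same two-tail computation directly and serves as a cross-check.

```python
# Sketch of the one-sample t test for a population mean.
import numpy as np
from scipy.stats import t, ttest_1samp

rng = np.random.default_rng(4)
sample = rng.normal(5.3, 1.2, size=15)   # small hypothetical sample
mu0 = 5.0                                # null hypothesis: mu = mu0

M, s, n = sample.mean(), sample.std(ddof=1), len(sample)
T = (M - mu0) / (s / n ** 0.5)

p_two = 2 * (1 - t.cdf(abs(T), df=n - 1))    # two-tail P value
print(T, p_two)
print(ttest_1samp(sample, mu0))              # cross-check with scipy
```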

There are versions of the t test for comparing two means, as well.
Just like for the z test, the method depends on how the samples from
the two populations are drawn.
For example, if the two samples are paired (if we are sampling individuals
labeled with two numbers and for each individual in the sample, we observe
both numbers), we just base the t test on the sample mean of the paired
differences and the sample standard deviation of the paired differences.
If M is the sample mean of the n observed differences,
sd is their sample standard deviation, and μ0 is the
null value of the difference between the two population means, the test statistic is

T = (M − μ0)/(sd/n½),

and the appropriate curve to use to find the rejection region for the test is Student's
t curve with n−1 degrees of freedom, where n is the number of
individuals (differences) in the sample.
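A minimal sketch of the paired t test follows; the two samples are hypothetical, and μ0 = 0 is assumed for illustration (scipy.stats.ttest_rel is a shortcut for that case).

```python
# Sketch of the paired t test based on the differences.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(5)
c     = rng.normal(100, 10, size=12)      # control observations (hypothetical)
treat = c + rng.normal(2, 3, size=12)     # treatment observations (hypothetical)
mu0 = 0                                   # null value of the mean difference

d = treat - c                             # paired differences
T = (d.mean() - mu0) / (d.std(ddof=1) / len(d) ** 0.5)
p_two = 2 * (1 - t.cdf(abs(T), df=len(d) - 1))   # two-tail P value
print(T, p_two)
```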

Two-sample t tests for a difference of means using independent samples depend on
additional assumptions, such as equality of the two population standard deviations;
we shall not present such tests here.

Hypothesis Tests and Confidence Intervals

There is a deep connection between hypothesis tests about parameters, and
confidence intervals for parameters.
If we have a procedure for constructing a level 100%×(1−a) confidence
interval for a parameter μ, then the following rule is a two-sided significance
level a test of the null hypothesis that μ = μ0:

reject the null hypothesis if the confidence interval does not contain μ0.

Similarly, suppose we have an hypothesis-testing procedure that lets us test the null
hypothesis that μ=μ0 for any value of
μ0, at
significance level a. Define

A = (all values of μ0 for which we would not reject the null
hypothesis that μ = μ0).

Then A is a 100%×(1−a)
confidence set for
μ:

P(A contains the true value of μ) = 100%×(1−a).

(A confidence set is a generalization of the idea of a confidence interval:
a 1−a confidence set for the parameter
μ is a random set that
has probability 1−a of containing μ.
As is the case with confidence intervals, the probability makes sense only
before collecting the data.)
The set A might or might not be an interval, depending on the nature of the test.
If one starts with a two-tail z test or two-tail t test, one
ends up with a confidence interval rather than a more general confidence set.

Confidence Intervals Using Student's t curve

The t test lets us test the hypothesis that the population mean μ
is equal to μ0 at approximate significance level a using a
random sample with replacement of size n from a population with a
nearly normal distribution.
If the sample size n is small, the actual significance level is
likely to differ considerably from the nominal significance level.
Consider a two-sided t test of the hypothesis μ=μ0
at significance level a.
If the sample mean is M and the sample standard deviation is s,
we would not reject the null hypothesis at significance level a if

|(M−μ0)/(s/n½)| ≤
tn−1,1−a/2.

We rearrange this inequality:

−tn−1,1−a/2 ≤
(M−μ0)/(s/n½) ≤
tn−1,1−a/2

−tn−1,1−a/2 ×
s/n½ ≤ M − μ0 ≤
tn−1,1−a/2 × s/n½

−M − tn−1,1−a/2 ×
s/n½ ≤ − μ0 ≤
−M + tn−1,1−a/2 ×
s/n½

M + tn−1,1−a/2 ×
s/n½ ≥ μ0 ≥
M − tn−1,1−a/2 ×
s/n½

M − tn−1,1−a/2 ×
s/n½ ≤ μ0 ≤
M + tn−1,1−a/2 ×
s/n½.

That is, we would not reject the hypothesis μ = μ0
provided μ0 is in the interval

[M − tn−1,1−a/2 ×
s/n½, M +
tn−1,1−a/2 ×
s/n½].

Therefore, that interval is a 100%×(1−a) confidence interval for μ:

P([M − tn−1,1−a/2 ×
s/n½, M +
tn−1,1−a/2 ×
s/n½] contains μ) ~ 1−a.
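This interval is simple to compute; the following is a minimal sketch with a hypothetical sample and a = 5%.

```python
# Sketch of the approximate 1-a confidence interval for mu
# using Student's t curve. The sample is hypothetical.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(6)
sample = rng.normal(20, 4, size=25)   # hypothetical nearly normal sample
a = 0.05

M, s, n = sample.mean(), sample.std(ddof=1), len(sample)
q = t.ppf(1 - a / 2, df=n - 1)        # the quantile t_{n-1, 1-a/2}
lo, hi = M - q * s / n ** 0.5, M + q * s / n ** 0.5
print(lo, hi)
```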


Summary

In hypothesis testing, a Z statistic is a random variable whose
probability histogram is approximated well by the normal curve if the null
hypothesis is correct:
If the null hypothesis is true, the expected value of a Z statistic is zero,
the SE of a Z statistic is approximately 1, and the probability that a
Z statistic is between a and b
is approximately the
area under the normal curve between a and b.
Suppose that the random variable Z is a Z statistic.
If, under the alternative hypothesis, E(Z)<0, the appropriate
z test to test the null hypothesis at approximate significance level a
is the left-tailed z test: Reject the null hypothesis
if Z<za, where
za is the a quantile of the normal curve.
If, under the alternative hypothesis, E(Z)>0, the appropriate
z test to test the null hypothesis at approximate significance level a
is the right-tailed z test:
Reject the null hypothesis if Z>z1−a.
If, under the alternative hypothesis, E(Z)≠0 but could be greater
than 0 or less than 0, the appropriate z test to test the null
hypothesis at approximate significance level a is the two-tailed z test:
reject the null hypothesis if |Z|>z1−a/2.
If, under the alternative hypothesis, E(Z)=0, a z test probably
is not appropriate—consult a statistician.
The exact significance levels of these tests differ from a
by an amount that depends on how closely the normal curve approximates the
probability histogram of Z.

Z statistics often are constructed from other statistics by transforming
approximately to standard units, which requires knowing the expected value and
SE of the original statistic on the assumption that the null hypothesis is true.
Let X be a test statistic; let E(X) be the expected value of
X if the null hypothesis is true, and let se be approximately equal to
the SE of X if the null hypothesis is true.
If X is a sample sum of a large random sample with replacement, a
sample mean of a large random sample with replacement, or a sum or difference of
independent sample means of large samples with replacement,

Z = (X−E(X))/se

is a Z statistic.

Consider testing the null hypothesis that a population percentage p is equal
to the value p0 on the basis of the sample percentage φ of a
random sample of size n with replacement.
Under the null hypothesis, E(φ)=p0 and

SE(φ) =
(p0×(1−p0))½/n½,

and if n is sufficiently large
(say n×p0>30 and n×(1−p0)>30,
but this depends on the desired accuracy), the normal approximation to

Z =
(φ−p0)/((p0 ×
(1−p0))½/n½)

will be reasonably accurate, so Z can be used as the Z statistic in a
z test of the null hypothesis p=p0.

Consider testing the null hypothesis that a population mean μ is equal to the
value μ0, on the basis of the sample mean
M of a random
sample of size n with replacement.
Let s denote the sample standard deviation.
Under the null hypothesis, E(M)=μ0, and if n is large,

SE(M)=SD/n½~s/n½,

and the normal approximation to

Z = (M−μ0)/(s/n½)

will be reasonably accurate, so Z can be used as the Z statistic in a
z test of the null hypothesis μ=μ0.

Consider a population of N individuals, each labeled with two numbers.
The ith individual is labeled with the numbers
ci and ti, i=1, 2,
…, N.
Let μc be the population mean of the N values
{c1, …, cN}
and let μt be the population mean of the N values
{t1, …, tN}.
Let μ=μt−μc be the difference
between the two means.
Consider testing the null hypothesis that μ=μ0 on the basis of a
paired random sample of size n with replacement from the population: that is,
a random sample of size n is drawn with replacement from the population, and
for each individual i in the sample, ci and
ti are observed.
This is equivalent to testing the hypothesis that the population mean of the N
values {(t1−c1), …, (tN−cN)}
is equal to μ0, on the basis of the random sample of size
n drawn with replacement from those N values.
Let Mt be the sample mean of the n observed
values of ti and let Mc be
the sample mean of the n observed values of ci.
Let sd denote the sample standard deviation of the n observed differences
{(ti−ci)}.
Under the null hypothesis, the expected value of

(Mt−Mc) is μ0,
and if n is large,

SE(Mt−Mc)~sd/n½,

and the normal approximation to the probability histogram of

Z =
(Mt−Mc−μ0)/(sd/n½)

will be reasonably accurate, so Z can be used as the Z statistic in a z
test of the null hypothesis that
μt−μc=μ0.

Consider testing the hypothesis that the difference
(μt−μc)
between two population means, μc and
μt, is equal to μ0, on the basis of the
difference (Mt−Mc)
between the sample mean Mc of a random sample of size
nc with replacement from the first population and the sample mean
Mt of an independent random sample of size
nt with replacement from the second population.
Let sc denote the sample standard deviation of the sample
of size nc from the first population and let
st denote the sample standard deviation of the sample
of size nt from the second population.
If the null hypothesis is true,

E(Mt−Mc)=μ0,

and if nc and nt are both large,

SE(Mt−Mc) ~
(st2/nt +
sc2/nc)½

and the normal approximation to the probability histogram of

Z =
(Mt−Mc−μ0)/(st2/nt + sc2/nc)½

will be reasonably accurate, so Z can be used as the Z statistic in a
z test of the null hypothesis that
μt−μc=μ0.

A list of numbers is nearly normally distributed if the fraction of numbers between
any pair of values, a<b, is approximately equal to the area under the
normal curve between (a−μ)/SD and (b−μ)/SD, where
μ is
the mean of the list and SD is the standard deviation of the list.
Student's t curve with d degrees of freedom is symmetric about 0, has a
single bump centered at 0, and is broader and flatter than the normal curve.
The total area under Student's t curve is 1, no matter what d is;
as d increases, Student's t curve grows closer and closer to
the normal curve.
Let M be the sample mean of a random sample of size n with
replacement from a population with mean μ and a nearly normal distribution,
and let s be the sample standard deviation of the random sample.
For moderate values of n (n<100 or so), Student's t curve
approximates the probability histogram of
(M−μ)/(s/n½) better than the normal
curve does, which can lead to an approximate hypothesis test about μ
that is more accurate than the z test:
Consider testing the null hypothesis that the mean μ of a population with a
nearly normal distribution is equal to μ0 from a random sample of
size n with replacement.
Let

T=(M−μ0)/(s/n½),

where M is the sample mean and s is the sample standard deviation.
The tests that reject the null hypothesis if T<tn−1,a
(left-tail t test), that reject the null hypothesis if
T>tn−1,1−a (right-tail t test),
and that reject the null hypothesis if |T|>tn−1,1−a/2
(two-tail t test) all have approximate significance level a.
How close the nominal significance level a is to the actual significance level
depends on the distribution of the numbers in the population, the sample size n, and a.
The same rule of thumb for selecting whether to use a left, right, or two-tailed z
test (or not to use a z test at all) works to select whether to use a left, right,
or two-tailed t test: If, under the alternative hypothesis, E(T)<0,
use a left-tail test.
If, under the alternative hypothesis, E(T)>0, use a right-tail test.
If, under the alternative hypothesis, E(T) could be less than zero or greater
than zero, use a two-tail test.
If, under the alternative hypothesis, E(T)=0, consult an expert.
Because the t test differs from the z test only when the sample
size is small, and from a small sample it is not possible to tell whether the population
has a nearly normal distribution, the t test should be used sparingly, if ever.

A 1−a confidence set for a parameter μ is like a 1−a
confidence interval for a parameter μ: It is a random set of values that has probability
1−a of containing the true value of μ.
The difference is that the set need not be an interval.
There is a deep duality between hypothesis tests about a parameter μ and confidence
sets for μ.
Given a procedure for constructing a 1−a confidence set for μ, the rule
reject the null hypothesis that μ=μ0 if the confidence
set does not contain μ0 is a significance level a test of the null
hypothesis that μ=μ0.
Conversely, given a family of significance level a hypothesis tests that allow
one to test the hypothesis that μ=μ0 for any value of
μ0, the set of all values μ0 for which one would not
reject the null hypothesis that μ=μ0 is a 1−a
confidence set for μ.