Stats without Tears
7. Normal Distributions

Summary:
The normal distribution (ND) is important for two reasons.
First,
many natural and artificial processes are ND.
You’ll look at some of those in this chapter.
Second,
any process can be treated as a ND through sampling.
That will be the subject of Chapter 8,
and it’s also the foundation of the inferential statistics
you’ll do in Chapters 9 through 11.

7A. Continuous Random Variables

You met random
variables back in Chapter 6.
Any random variable has a single numerical value, determined by
chance, for each outcome of a procedure.
Discrete random variables are limited to specified values, usually
whole numbers.
But a continuous random variable can
take any value at all, within some interval or across all the real
numbers.

Just as discrete probability models are used to model discrete
variables, continuous probability models are used to model
continuous variables. Of course, because a continuous random variable
has infinitely many possible values, you can’t make a table of
values and probabilities as you could do for a discrete distribution.
Instead, either there’s an equation, or just a density curve
(below).

A probability model is often called a
distribution, so you can say that a variable “is normally
distributed” (ND), that it “is a normal distribution”
(also ND), or that it “follows a normal probability model”.

There are lots of specialized continuous distributions, but
the normal distribution is most important by a
wide margin. Many, many real-life processes follow the normal model,
and the ND is also the key to most of our work in inferential
statistics.

This section will give you some concepts that are common to
all continuous distributions, and the rest of the chapter will talk
about special properties of the normal distribution and
applications. In Chapter 8, you’ll
apply the normal distribution to get a handle on the variation from
one sample to the next.

7A1. Density Curves

In Chapter 2, you learned to graph continuous data by
grouping the data in classes and
making a histogram, like the one below left.
This is wait times in a fast-food drive-through, with time in
minutes — not whole minutes, which would make a discrete
distribution, but minutes and fractional minutes.

Any sample you might take has a finite number of data points, so
you set up classes, place the data points in the classes, and then
draw a histogram. The height of each bar is proportional to the
frequency or relative frequency of that class.

But when you come to consider all the possible values of a
continuous variable, you have an infinite number of data points.
If you tried to assign them to classes, it would take you
forever, literally! Instead, you draw a smooth curve,
called a density curve, to show the possible values and how
likely they are to occur. An example is shown above right.

The density curve is a
picture of a continuous probability model. It doesn’t
just represent the data in a particular sample, but all possible data
for that variable — along with the probabilities of their
occurrence, as you’ll see next.

7A2. Probability and Continuous Distributions

Up to now, the height of a bar in a histogram has
been the number of data points in that class, or the relative
frequency of that class.
But how do you interpret the height of a density curve?

Answer: you don’t! The height of the curve
above any particular point on the x axis just doesn’t lend
itself to a simple interpretation. You might think it would be the
probability of that value occurring. But with infinitely many possible
values, “what’s
the likelihood of a wait time of exactly 4 minutes?” just
isn’t a meaningful question, because what about 3.99997 minutes
or 4.002 minutes?

Area = Probability

What is meaningful is the
probability within an interval, which equals the
area under the curve within that interval. For example, in this
illustration, the probability of a wait time of 6.4 to 9.5 minutes is
29.4%. In symbols,

P(6.4 ≤ x ≤ 9.5) = 29.4%

or

P(6.4 < x < 9.5) = 29.4%

That’s right —
the probability is the same whether you include or exclude the endpoints of the interval.

Okay, I lied. The height of the curve is meaningful,
but only if you’ve had some calculus. The curve is the graph of a
probability density function or pdf. The integral of that
curve from a to b is the area between x=a and x=b and is the
probability that the random variable will have a value between a and
b.

This explains why the probability is the same whether you
include or exclude either endpoint of the interval. The difference is
the area of a “rectangle” whose height is the height of the
density curve and whose width is the distance from a to
a — which is zero. Thus the area of the
“rectangle” is zero, and the probability of the random
variable taking any particular value, exactly, is zero.

Since area equals probability, and total probability must be 1,
total area must be 1. Every pdf — the height of every
density curve — is scaled so that the integral from
−∞ to +∞ is 1.
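If you'd like to see "area = probability" in action, here is a minimal Python sketch. It assumes a normal density with mean 5 and SD 2 purely as a stand-in, since this chapter doesn't give an equation for the drive-through curve, and the helper name area_under is hypothetical. It approximates the area with thin trapezoids, which is one simple way to integrate numerically.

```python
from statistics import NormalDist

# Assumed stand-in density: normal, mean 5, SD 2 (not the drive-through model)
model = NormalDist(mu=5, sigma=2)

def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = (pdf(a) + pdf(b)) / 2
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width

total_area = area_under(model.pdf, -35, 45)    # 20 SD either side of the mean
interval = area_under(model.pdf, 4, 7)         # P(4 <= x <= 7)
single_point = area_under(model.pdf, 4, 4)     # P(x = 4): zero width, zero area

print(total_area, interval, single_point)
```

The total comes out essentially equal to 1, the interval matches the difference of the two cdf values, and the single point has probability zero, which is why including or excluding an endpoint makes no difference.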

You can also have the probability for an interval with one boundary,
< or ≤ some value like the picture at right, or > or ≥ some
value. For example, 3.33 minutes is about 3 minutes and 20 seconds,
so the probability of waiting up to 3 minutes and 20 seconds is 20.6%:
P(x ≤ 3.33) = 20.6%.

The total area under any density curve equals the
probability that the random variable will take any one of its possible
values, which of course is 1, or 100%. So you can use the
complement to say that the probability of waiting 3 minutes and 20
seconds or more (or, more than 3 minutes and 20 seconds) is
100% − 20.6% = 79.4%.

Two Interpretations of Probability

You remember from
Interpreting Probability
Statements in Chapter 5 that every probability can be
interpreted as a probability of one or a
proportion of all. For example,
P(x > 3.33) = 79.4% can equally well be interpreted
in two ways:

Probability of one: “Any randomly selected person has a 79.4%
chance of waiting more than 3 minutes and 20 seconds.”

Proportion of all: “79.4% of people will wait more than 3
minutes and 20 seconds.”

Which interpretation you use in a given situation depends on
what seems simplest and most natural in the situation. Here, the
“proportion of all” interpretation seems simpler. But
you’re always free to switch to the other interpretation if it
helps you in thinking about a situation.

Area = Probability of One = Proportion of All

7B. The Normal Model

Why study the normal distribution?

First, it’s useful on its own.
Lots and lots of real-life distributions match the normal model:
body temperature or blood pressure of healthy
people, scores on most standardized tests, commute times on a given
route, lifetimes of batteries or light bulbs, heights of men or women,
weights of apples of a particular variety, measurement errors (in many
situations), and on and on.

Why is the ND so common?
In real life, very few events have just one cause; most things are the
result of many factors operating independently.
It turns out that if you take a lot of
independent random variables and add them up, their sum is
approximately ND.
For example, your IQ score
results from multiple genetic factors, countless occurrences in your
education and your family life, even transient factors like how well
you slept the night before the test. Most of these are independent of
each other, so the result of adding them is a ND.
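You can watch this happen in a small simulation. The sketch below is only an illustration under made-up assumptions: each "score" is the sum of 12 independent factors, and each factor is uniformly distributed between 0 and 1 (nothing like a bell curve on its own). If the sums behave like a ND, about 68% should land within one SD of the mean and about 95% within two.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Assumption for illustration: 12 independent uniform factors per score
scores = [sum(random.random() for _ in range(12)) for _ in range(20_000)]

m, s = mean(scores), stdev(scores)
within_1sd = sum(abs(x - m) <= s for x in scores) / len(scores)
within_2sd = sum(abs(x - m) <= 2 * s for x in scores) / len(scores)

print(f"mean = {m:.2f}, SD = {s:.2f}")
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
```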

Several mathematicians can claim the
discovery of the normal distribution. Abraham de Moivre
(1667–1754, French) was probably first, in 1733. But the name
of Carl Friedrich Gauss is permanently
coupled to the normal distribution — literally. Although
Sir Francis Galton coined the term normal distribution in
1889, Karl Pearson called it the
Gaussian distribution in 1905, and that’s still a
recognized synonym.

Second,
through sampling, even non-ND populations follow a normal model.
You’ll use this model in inferential statistics to make statements about a
whole population based on just one sample. You’ll learn about
this neat trick in Chapter 8.

7B1. Properties of the Normal Distribution

The normal distribution (ND) has the properties of other
continuous distributions as listed earlier.
In particular, area =
probability, and the total area under the density curve is the
total probability, which is 1. The ND also has these special
properties:

A ND is
completely described by its mean and SD. The mean locates the center of the curve, but has no effect on the
shape. For example, here are three normal curves with
μ = 0, 2, and 5 and σ = 4.

The standard deviation determines the shape of the curve,
but has
no effect on the location. Smaller SD means the data
stick closer to the mean, so the peak is higher and the tails are
shorter and fatter. Larger SD means the data vary more, so they
spread out from the mean: the peak is lower and the tails are longer
and thinner.
The second picture shows three normal curves
with μ = 2 and σ = 2, 4, and 6. (The
vertical scale is different from the first picture.)

The ND is symmetric — left and right
sides are mirror images of each other. This implies that the
mean, median and mode are all equal.

In principle,
the tails of the normal curve run out to ±∞.
However,
data points more than 3 standard deviations from the mean are rare.
(This is part of the Empirical Rule from
Chapter 3.)

The books all say that
inflection points are one SD above and below the mean.
Inflection points, if you haven’t had calculus, are where the
curve transitions between concave up and concave down.
The books don’t tell you that those points are far from obvious
visually. Just do the best you can when making sketches.
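If you don't have the pictures handy, a few lines of Python can stand in for them. This sketch uses NormalDist from Python's standard statistics module with the same μ and σ values as the curves described above; it prints peak heights rather than drawing anything, and it also checks the symmetry and the "more than 3 SD is rare" properties.

```python
from statistics import NormalDist

# Same SD (4), different means: the curve slides sideways, peak height unchanged.
for mu in (0, 2, 5):
    print(f"mu={mu}, sigma=4: peak height {NormalDist(mu, 4).pdf(mu):.4f}")

# Same mean (2), different SDs: larger SD means a lower peak.
for sigma in (2, 4, 6):
    print(f"mu=2, sigma={sigma}: peak height {NormalDist(2, sigma).pdf(2):.4f}")

# Symmetry: equal heights at equal distances left and right of the mean.
curve = NormalDist(2, 4)
print(curve.pdf(2 - 3), curve.pdf(2 + 3))

# Area more than 3 SD from the mean (both tails combined).
beyond_3sd = 2 * curve.cdf(2 - 3 * 4)
print(f"beyond 3 SD: {beyond_3sd:.4%}")
```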

All of this is the theoretical normal distribution. In
fact,
nothing in real life is perfectly ND,
because nothing in
real life has an infinite number of data points.
When we say something is ND, we mean it’s a close match,
not a perfect match.
“Normally distributed” (or ND) is short for
“using a normal distribution to model this data set, the
calculations will come out close enough to reality.”

This is a lot like what you did in
Chapter 3, when you computed the
statistics of a grouped
distribution. The statistics were only approximate, because of
the simplification you introduced by grouping, but the approximation
was good enough.

Now let’s get to some applications! There are two main
categories: “forward” problems, where you have the
boundaries and you have to find the area or probability, and
“backward” problems, where you have a probability or area
and you have to find the boundaries.

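In case you’re interested, the
pdf, the height of the density curve above a given x, is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
The cdf, the area to the left of a given x, is the integral of that,
just the same as finding the area under any curve to the left of a
given x:
$$F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\, dt.$$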
This integral doesn’t have a “closed form”, a finite
sequence of basic algebraic operations, so it must be found by
successive approximations. That’s what your calculator does with
normalcdf and Excel does with NORM.DIST.
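For the curious, here is a hedged Python sketch of both functions. The pdf is the usual normal density formula. For the cdf, the sketch leans on a standard identity that isn't covered in this book: the area to the left of x equals ½[1 + erf((x−μ)/(σ√2))], where erf is the "error function" that Python's math module computes by numerical approximation. The function names normal_pdf and normal_cdf are my own.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve above x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

print(normal_pdf(0, 0, 1))   # height at the peak of the N(0,1) curve
print(normal_cdf(0, 0, 1))   # area to the left of the mean
```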

7B2. From Boundaries, Find Probability

Summary:
Make a sketch, estimate the
probability (area), then compute it.

TI-83/84/89:
Use normalcdf(left bound,
right bound, mean, SD). I’ll
walk you through the TI-83/84 keystrokes in the first example below.
If you have a TI-89, press [CATALOG] [F3] [6] [ENTER]
(the plain [6] key makes N).

Example 1:
Heights of human children of a given age and sex are ND. One
study found that three-year-old girls’ heights
have a mean of
38.72″ and SD of 3.17″. What
percentage of three-year-old girls are 35″ to
40″ tall?

Solution: Take the time to make a sketch.
It doesn’t have to be
beautiful, but you should make it as accurate as you reasonably can.
It’s an important safeguard against making boneheaded mistakes.
Here’s what should be on your sketch:

Draw the axis line.

Label the axis, x or z as appropriate. x is the symbol for
real-world data points, and z is the symbol for z-scores in the
standard normal distribution, below.

Draw a vertical line in the middle of the distribution and write
the numerical value of the mean below the axis where that central line
meets it.
(If necessary, offset it with a tick
mark, as I did.)

Draw a horizontal line at about the right spot and show the
numerical value of the standard deviation.

Draw a line and show the value for each boundary.

Important: When you marked the SD, you set the
scale for the sketch. Now you have to honor that and place your
boundaries in proportion. For instance, in
this problem the mean is 38.72 and the left boundary is 35, which is
3.72 below the mean. Your left boundary therefore needs to be a bit
more than one SD (3.17) left of the mean. The right bound is 40, which is
1.28 above the mean, so your line needs to
be just over a third of a
SD to the right of the mean.

(Students often put in more numbers and lines, like the values of 1,
2, and 3 SD above and below the mean. That’s not wrong, but
it’s usually not helpful, and it definitely clutters up the
sketch.)

Shade the area you’re trying to find.

Look at your sketch and
estimate the area before you
pull out your calculator. That way, if you
make a mistake that leads to a ridiculous answer, you’ll
recognize it as ridiculous and fix it.

From my sketch, I estimate an area of 50%–60%. If
it’s 45% or 70% I won’t be terribly surprised, but if
it’s 5% or 99% I’ll know something is wrong.

Compute the area (below).

If you wish, add that number
to your sketch — not below the axis, please. Write
it within the
shaded area, if there’s room, or as a callout to the left or
right of the diagram, the way I did here.

Computing the Area

On a TI-83 or TI-84, press [2nd] [VARS] (which makes DISTR) and then [2] to
select normalcdf. Enter the left boundary (35), right
boundary (40), mean (38.72), and SD (3.17).

After entering the standard deviation, press [)] [ENTER] to get
the answer.

You always need to show your work, so write down
normalcdf(35,40,38.72,3.17) before you proceed to the
answer. (There’s no need to write down the keystrokes you
used.)

In this book, I
round probabilities to four decimal places, or two decimal
places if expressed as a percentage. The probability is

P(35 ≤ x ≤ 40) = 0.5365

That number matches my estimate of 50%–60%.

But the problem asked for a percentage.
(Always, always, always
look back at the problem and make sure you’re answering the question that was actually asked.)
The answer:
53.65% of three-year-old girls are 35″ to 40″ tall.
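If you'd rather check this on a computer than a calculator, here's a minimal Python sketch. It assumes Python 3.8 or later, where the standard statistics module provides NormalDist. The helper name normalcdf is borrowed from the TI function for readability; it's a hypothetical wrapper, not something built into Python.

```python
from statistics import NormalDist

def normalcdf(left, right, mean, sd):
    """Probability that a normal variable falls between left and right."""
    model = NormalDist(mean, sd)
    return model.cdf(right) - model.cdf(left)   # area to left of each boundary

p = normalcdf(35, 40, 38.72, 3.17)
print(f"P(35 <= x <= 40) = {p:.4f}")
```

The subtraction is the same idea as the sketch: the area left of 40, minus the area left of 35, leaves the shaded area between them.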

Example 2:
A three-year-old girl is randomly chosen. Would it be unusual
(unexpected, surprising) if
she’s over 45″ tall?

In Chapter 5 you learned to
call a low-probability event unusual (a/k/a surprising or unexpected).
The standard definition of unusual events is a probability below 0.05,
so really this problem is just asking you to find the probability and
compare it to 0.05.

Solution: The sketch is at right, and obviously the
probability should be small. The left boundary
is 45, but what’s the right boundary? The normal distribution
never quite ends, so the right boundary is ∞ (infinity).
TI-89s have a key for ∞, but TI-83s and TI-84s don’t
and Excel doesn’t, so
use 10^99 instead. (That’s 10 to the 99th power; the
[^] key on your TI calculator is between [CLEAR] and
[÷].)

Show your work:

P(x > 45) = normalcdf(45,10^99,38.72,3.17) =
0.0238

That’s rounded from 0.0237914986, and it’s in line
with my estimate of “small”. Now answer the question:
There’s only a 2.38% chance that a randomly selected three-year-old girl will be over 45″ tall, so that would be unusual.

Example 3:
For the same population, find and interpret P(x < 33).

Solution:
The sketch is at right, and again the expected probability is small.
The right boundary is 33, but what’s the left boundary? You
might want to use 0, since no one can be under 0″ tall,
but you could make the same argument for 1″ or
5″, so that can’t be right.

To locate the left boundary, remember that
you’re using a normal model to
approximate the data, and the normal distribution runs right out to
±∞. Therefore, the left boundary is
minus ∞ on a TI-89, or minus 10^99 on a TI-83/84. (Use the
[(-)] key, not the [−] subtraction key.)

P(x < 33) = normalcdf(-10^99,33,38.72,3.17) =
0.0356

The proportion of three-year-old girls under 33″ tall is 0.0356
or 3.56%;
or,
3.56% of three-year-old girls are under 33″ tall.
The other interpretation is
the chance that a randomly selected three-year-old girl is under 33″ tall is 0.0356
or 3.56%.
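Examples 2 and 3 can be sketched in Python too, again assuming the standard library's NormalDist. Its cdf method already measures the area from minus infinity up to a boundary, so no stand-in like 10^99 is needed; a right-hand tail is just the complement.

```python
from statistics import NormalDist

heights = NormalDist(38.72, 3.17)   # three-year-old girls' heights, in inches

# Right-hand tail: complement of the area to the left of 45
p_over_45 = 1 - heights.cdf(45)

# Left-hand tail: the cdf itself
p_under_33 = heights.cdf(33)

print(f"P(x > 45) = {p_over_45:.4f}, unusual: {p_over_45 < 0.05}")
print(f"P(x < 33) = {p_under_33:.4f}")
```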

Percentiles

Example 4:
What’s the percentile rank of a three-year-old girl who
is 33″ tall?

Solution: Long ago, in a galaxy called
Numbers about Numbers, you learned the definition of
percentiles. The
percentile rank of a data point is the percentage of the data set that
is ≤ that data point. So you need P(x ≤ 33). But
that’s exactly what you computed in the previous example: 3.56%.
So the 33″-tall girl is between the third and fourth
percentiles for her age group.

“That was P(x < 33), and for a
percentile I need P(x ≤ 33)!” I hear you yell.
But those two are equal. When we talked about
density curves, near the beginning of this
chapter, you learned that the area and probability
are the same whether you include or exclude the boundary.

And this is why it doesn’t make much difference whether
you define a percentile rank in terms of < or ≤, because the
probability in a continuous distribution is the same either way.

7B3. From Probability, Find
Boundaries

Summary:
Make a sketch, estimate the
value(s), then compute the value(s).

TI-83/84/89:
Use invNorm(area to left,
mean, SD). I’ll
walk you through the TI-83/84 keystrokes in the first example below.
If you have a TI-89, press
[CATALOG] [F3] [9] [▼ 3 times] [ENTER] (the plain [9] key makes I).

Excel:
In Excel 2010 or later, use
=NORM.INV(area to left, mean, SD).
In Excel 2007 or earlier, it’s
NORMINV rather than NORM.INV.

Example 5:
Blood pressure is stated as two numbers, systolic over
diastolic. The World Health Organization’s
MONICA Project
(Kuulasmaa 1998 [see “Sources Used” at end of book])
reported these parameters for the US:

Systolic: μ = 120, σ = 15

Diastolic: μ = 75, σ = 11

Blood pressure in the population is normally distributed. The
lowest 5% is considered “hypotensive”, according to
Kuzma and Bohnenblust (2005, 103) [see “Sources Used” at end of book].
What systolic blood pressure would be
considered hypotensive?

Always estimate your answer to guard against at least
some errors. In the sketch, x1 looks like it’s not
quite two SD left of the mean, so I’ll estimate
a pressure of 95 to 100. (Okay, I cheated by using
my calculator to make my “sketch”. But even with a real
pencil-and-paper sketch, you ought to be in the right ballpark.)

Now you’re ready to calculate.
TI-89 or Excel users, please see the
instructions above. On your TI-83 or TI-84,
press [2nd] [VARS] (which makes DISTR) and then [3] to
select invNorm. Enter the area to the left of the point
you’re interested in (.05), the mean (120), and the SD (15).

Show your work! Write down
invNorm(.05,120,15) before you proceed to the
answer. (There’s no need to write down the keystrokes you
used.)

Answer:
x1 = invNorm(.05,120,15) ≈ 95.33 → 95.
Systolic blood pressure (first number) under 95 would be considered hypotensive.
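Here's a rough Python counterpart to invNorm, assuming Python 3.8 or later. NormalDist's inv_cdf method takes an area to the left and returns the boundary. The wrapper name inv_norm is hypothetical, chosen only to echo the calculator.

```python
from statistics import NormalDist

def inv_norm(area_to_left, mean, sd):
    """Boundary that has the given area (probability) to its left."""
    return NormalDist(mean, sd).inv_cdf(area_to_left)

x1 = inv_norm(0.05, 120, 15)
print(f"Lowest 5% of systolic pressures: below {x1:.1f}")
```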

Example 6: The same source considers the top 5%
“hypertensive”. What is the minimum systolic blood pressure
that is hypertensive?

Solution: My “sketch” is at right. It’s
mostly straightforward — the x1 boundary is
between the 5% tail and the rest of the distribution.

But what’s up with the 1−0.05?
The problem asks you about the upper 5%, which is the area to the
right of the unknown boundary. But
invNorm on the calculator, and NORM.INV in
Excel, need area to left of the desired boundary.
The area to the left is the probability of
“not hypertensive”, and area is probability, so the area to
left is 1 minus the area to right, in this case 1−0.05.

Could you just write down 0.95? Sure, that would be correct.
But if the area to right was 0.1627 you’d probably make the
calculator compute 1 minus that for you, so why not be consistent?

x1 = invNorm(1−.05,120,15) =
144.6728044 → 145

(That’s actually a little liberal. Several sources that
I’ve seen give 140 as the threshold.)

Example 7: Kuzma and Bohnenblust describe the middle 80% as
“normal”. What is that range of systolic blood
pressure?

This problem wants you to find two boundaries, lower and upper.
You have to convert the 80% middle into two areas to left.
Here’s how. If the middle is 80%, then the two tails combined
must be 100% − 80% = 20%. But the curve is symmetric, so each
tail must be 20%/2 = 10%. Strictly speaking, I probably should
have written that computation on the diagram, instead of just a
laconic “0.1”, but it would take up a lot of space and the
computation was easy enough. You’ll probably do the
same — just be careful.

Once you have the areas squared away, the computation is
simple enough:

x1 = invNorm(.1,120,15) =
100.7767265 → 101

x2 = invNorm(1−.1,120,15) =
139.2232735 → 139

Check: The boundaries of the middle 80% (or the middle
any percent) should be equal distances from the mean.
(100.7767265+139.2232735)/2 = 120, so
at least it’s consistent. Answer:
Systolic b.p. of 101 to 139 is considered normal.
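The "middle area to two areas-to-left" conversion is easy to get wrong, so here it is as a small Python sketch. It assumes NormalDist from the standard library, and the function name middle_bounds is my own invention.

```python
from statistics import NormalDist

def middle_bounds(middle_area, mean, sd):
    """Lower and upper boundaries that enclose the middle area of a normal model."""
    tail = (1 - middle_area) / 2          # each tail gets half of what's left over
    model = NormalDist(mean, sd)
    return model.inv_cdf(tail), model.inv_cdf(1 - tail)

x1, x2 = middle_bounds(0.80, 120, 15)
print(round(x1), round(x2))
print((x1 + x2) / 2)                      # symmetry check: should be the mean
```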

Percentiles Again

Example 8:
What’s the 40th percentile for systolic blood pressure?

Sometimes the gods smile on us. The kth percentile is the
value that is ≥ k% of the population, so k% is
exactly the area to left that you need.

P40 = invNorm(.4,120,15) =
116.1997935 → 116

7C. The Standard Normal Distribution

Definition:
The
standard normal distribution is a normal distribution with a
mean of 0 and standard deviation of 1, sometimes written N(0,1).

The standard normal distribution is a picture of z-scores of
any possible real-world ND — more about that
later.

The standard normal distribution lets you make computations
that apply to all normal models, not just a particular model.
You’ll see some examples shortly, but first —

7C1. “Normal” and “Standard
Normal”

The main point about the standard normal distribution is that
it’s a stand-in for every ND from real life. How does
this work? Well, if you take any real data set and subtract the mean
from every data point, the mean of the new data set is 0. And if you
then divide that data set by the standard deviation (which
doesn’t change when you subtract a constant from every data
point), then the SD of the new-new data set is 1.

But all you did with those manipulations was replace the
numbers with z-scores. Remember the formula:
$$z = \frac{x-\mu}{\sigma}.$$
The standard normal distribution is what you get when you convert any normal model to z-scores.

Long ago, when dinosaurs ruled the earth —
okay, up through the early 1980s —
a “computer” was a person who used a slide rule to make computations.
(I swear I am not making this up.)
There were no statistical calculators and no Excel. The only way
for most people to make computations on a normal model was to look
up probabilities in printed tables. But obviously a book
couldn’t print tables for every normal model. So the printed
tables were for the standard normal distribution. If you had
boundaries and wanted the probability of the interval, you converted
your real-world numbers to z-scores, looked up the probabilities in
the table, and subtracted them. If you had a probability and needed a
boundary, you looked up the z-score in the table and then converted it
to a raw score using the mean and SD of your data set.
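To see that the table-era detour through z-scores gives the same answer as working with raw scores, here's a short Python sketch. It reuses the heights of three-year-old girls from Example 1 and assumes NormalDist from the standard library; cdf plays the part of the printed table, and the z_score helper is a name I made up.

```python
from statistics import NormalDist

heights = NormalDist(38.72, 3.17)   # model for the raw scores
standard = NormalDist(0, 1)         # standard normal distribution, N(0,1)

def z_score(x, mean, sd):
    """Convert a raw score to a z-score."""
    return (x - mean) / sd

z_low = z_score(35, 38.72, 3.17)
z_high = z_score(40, 38.72, 3.17)

p_raw = heights.cdf(40) - heights.cdf(35)            # straight from raw scores
p_std = standard.cdf(z_high) - standard.cdf(z_low)   # "look up" both, then subtract

print(f"z from {z_low:.2f} to {z_high:.2f}")
print(f"via z-scores: {p_std:.4f}, via raw scores: {p_raw:.4f}")
```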

The need to do normal computations the hard way has gone the
way of the dinosaurs, but I think this history is why many stats books
still use tables to do their computations. Inertia is a powerful
force in textbooks!

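The pdf and cdf functions for the
standard normal distribution are what you get when you set μ=0 and
σ=1 in the general equations for the
ND:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
and
$$F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt.$$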
Again, the integral must be found by
successive approximations. That’s where the tables in books come
from, and it’s what your calculator does with normalcdf
and Excel does with NORM.DIST.

7C2. Applying the Standard Normal
Distribution

I said above that the standard normal
distribution lets you make statements about all normal models. What
sort of statements? Well, the Empirical Rule for one.

Example 9: The Empirical Rule
says that 68% of the population in a normal model lies within one
SD of the mean.
How good is the rule?
In other words, what’s the actual proportion?

Solution: As usual, you start with a sketch.
This is the standard ND,
so the axis is z, not x.
There’s no need to mark the
mean or SD, because the z label identifies this as a
standard normal distribution and therefore μ = 0 and
σ = 1. Just label the boundaries.

Compute the probability the same way you’ve already
learned. (Both Excel and the TIs have special procedures available
for the standard normal distribution, but
it’s not worth taking brain cells to learn them, when
the regular procedures for the ND work just fine with
N(0,1).)

P(−1 ≤ z ≤ 1) =
normalcdf(−1,1,0,1) = .6826894809 →
68.27%

The Empirical Rule says 68% of the data are within
z = ±1. Actually it’s about 68¼%, close
enough.
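The same check works for any number of standard deviations. This is a minimal Python sketch assuming the standard library's NormalDist; the function name within is hypothetical.

```python
from statistics import NormalDist

standard = NormalDist(0, 1)

def within(k):
    """Proportion of a normal population within k SD of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {within(k):.2%}")
```

Because the computation is done on N(0,1), the results hold for every normal model, whatever its mean and SD.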

Example 10:
How many standard deviations must you go above and below the
mean to take in the middle 50% of the data in a normal model?

Solution: This is similar to finding the middle 80% of blood
pressures earlier, except now you’re making a statement about
all normal models, not just a particular one.

Shading the middle 50% leaves 100−50 = 50% in the
two tails combined, so each tail is 50/2 = 25%.

z1 = invNorm(.25,0,1) =
−.6744897495 → −0.67

By symmetry, z2 must be numerically equal to
z1 but have the opposite sign: z2 =
0.67.

50% of the data in any normal model are within about 2/3 of a SD of the mean.
Since the bounds of the middle 50% of the data are Q1 and Q3, the IQR
of any normal distribution is twice that, about
one and a third standard deviations.
More precisely, the IQR is 2×0.674 ≈
1.35 times the SD.

7C3. The z Function (Critical z)

There’s one special notation you’ll use when you
compute confidence intervals in
Chapter 9.

Definition: zarea (z with the area as a subscript) or z(area), also known as
critical z, is the z-score that divides the standard normal
distribution such that the right-hand tail has the indicated area.

This may seem a little weird, but really it’s just a
recipe to specify a number. Compare with the square root of 48. That
is the positive number such that, if you multiply it by itself, you
get 48. Or consider π: the number that you get when you divide
the circumference of a perfect circle by its diameter. Math is full
of numbers that are specified as recipes. An example will make things
clearer.

Example 11:
Find z0.025.

Solution: The problem is diagrammed at right.
Caution! 0.025 is an area, not a z-score, so you don’t
write 0.025 on the number line (the z axis). z0.025 is a z-score (though you don’t know its value yet), so
it goes on the number line.

Once you have your sketch, the computation is straightforward.
Have area (probability), compute boundary.
The area is 0.025, but it’s an area to right, and
invNorm needs an area to left, so you subtract from 1 as
usual:

z0.025 = invNorm(1−.025, 0, 1) =
1.959963986 → 1.96

Caution! You’re computing a boundary for the
right-hand tail. If you get a negative number, that can’t
possibly be right.

z0.025 = 1.96 makes sense, if you think
about it. If you also shaded in the left-hand tail with an area of
0.025, the two tails together would total 5%, leaving 95% in the
middle. The Empirical Rule says
that 95% of data are within 2 SD above and below the mean, and 1.96 is
approximately 2.
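Since critical z is just a recipe, it translates directly into a one-line function. Here's a hedged Python sketch, assuming NormalDist from the standard library; the name critical_z is mine, not a built-in.

```python
from statistics import NormalDist

def critical_z(right_tail_area):
    """z-score that leaves the given area in the right-hand tail of N(0,1)."""
    return NormalDist(0, 1).inv_cdf(1 - right_tail_area)   # convert to area to left

z = critical_z(0.025)
print(f"z(0.025) = {z:.2f}")
```

Any right-tail area under 0.5 gives a positive result, which is the same sanity check as the caution above.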

7D. Checking for Normality

How do you know whether a normal model is appropriate? How do
you know whether your data are normally distributed? A histogram can
rule out skewed data, or data with more than one peak.

But what if your data are unimodal and not obviously
skewed? Is that enough to justify a normal model? No, it’s
not. You need to perform a test called a normal probability plot.
You’ll need this procedure in Chapters 8 through 11, whenever you
have a small sample of numeric data.

Summary:
To check whether a normal model can
represent your sample, make a normal probability plot. This
plots the actual data points against the z-scores you would expect
for this number of points if they were ND. If the
plot is close to a straight line, a normal model is appropriate; if
the plot is far from a straight line, a normal model is not
appropriate.

That’s the bare outline, and you’ll get a little
bit more with the examples. For those who want the full theory,
it’s marked optional at the end of this section.

Technology:

Testing for normality can be automated partly or completely,
depending on what technology you have:

On a TI-83/84, you have two choices:
Normality Check on TI-83/84,
or the MATH200A program (shown below).
I strongly recommend the program, not just because I wrote
it ☺ but because it saves you a lot of work.
See Getting the Program.

On a TI-89, you have to do the plot and the computations
yourself. See the step-by-step procedure in
Normality Check on TI-89.

7D1. Checking Data Sets

Solution:
Put the data in any statistics list,
then press [PRGM], scroll down to MATH200A, and
press [ENTER] twice. Select Normality chk.

The program makes the plot, and you can look at the points to
determine whether they seem to be pretty much on a straight line. At
least, that’s the theory. In practice, most data sets are a lot
less clear cut than this one. It can be hard to tell whether the
points fit a line, particularly if you have only a few of them. The
plot takes up the whole screen, so deviations can look bigger than
they really are.

Fortunately, there’s a test for whether points lie on a
straight line. As you know from Chapter 4, the closer the
correlation coefficient r is to 1, the
closer the points are to a straight line.

The program computes r for you, and it also computes a
critical value★ to help you determine if the points are
close enough to a straight line. (For technical reasons,
the critical value is different from the
decision points of
Chapter 4.)
If r≥crit, it’s close enough to 1,
the points are close enough to a straight line,
and you can use a normal model.
If r<crit, it’s too far from 1, the
points are too far from a straight line, and you can’t use a
normal model.

For this data set,
r > crit, and therefore these vehicle weights fit the normal model.

★The “classic TI-83”
(non-“Plus” model) doesn’t compute the critical value,
so you have to do it yourself. See the formula in
item 4 in the next section.

Solution:
I entered them in a statistics list and then ran MATH200A Program part 4. The
result was the plot at the right.

You can see that the plot is curved. This is reinforced by
comparing r=0.9473 to crit=0.9639.
r < crit. The
points diverge too far from a straight line, and therefore
I cannot use a normal model for the lengths of my iTunes
songs.

7D2. Optional: How Normal
Probability Plots Work

The basic idea isn’t too bad. You make an
xy scatterplot where the x’s are the data points, sorted in ascending order, and the y’s are the expected z scores for a normal distribution.

Why would you expect that to
be a straight line? Recall the formula for a
z score: z = (x−x̅)/s. Breaking the one
fraction into two, you have z = x/s−x̅/s.
That’s just a linear equation, with slope 1/s and intercept
−x̅/s.
So an xz plot of any theoretical ND, plotting each data
point’s z score against the actual data value, would be
a straight line.

Further, if your actual data
points are ND, then their actual z scores
will match their expected-for-a-normal-distribution
z scores, and therefore a
scatterplot of expected z scores against actual data values will
also be a straight line.

Now, in real life no data set is ever exactly a ND, so
you won’t ever see a perfectly straight line.
Instead, you say that the closer the points are to a straight line, the
closer the data set is to normal. If the data points
are too far from a straight line — if their correlation
coefficient r is lower than some critical value — then
you reject the idea that the data set is ND.

Okay, so you have to plot the data points against what their
z-scores should be if this is a ND, and specifically for a
sample of n points from a ND, where n is your sample
size.
This must be built
up in a sequence of steps:

1. Divide the normal curve (mentally) into n regions of equal probability and take one probability from each region.
For technical reasons, the probability number you use for
region i is (i−.375)/(n+.25). This
formula is in many textbooks, and also in
Normal Probability Plots and Tests for Normality
(Ryan and Joiner 1976 [see “Sources Used” at end of book]).

2. Compute the expected z scores for those probabilities.
Working with the calculator, that’s just
invNorm of (i−.375)/(n+.25).

3. Plot those expected z scores against the data values.
This xy plot (or xz plot) has a correlation
coefficient r, computed just like any other correlation
coefficient.

4. Compare the r for your data set to the critical value for the size of your data set.
Ryan and Joiner
determined that the critical value for sample size n,
at the 0.05 significance
level, is
1.0063−.1288/√n−.6118/n+1.3505/n².
To make it a little easier on the calculator I rearranged it as
1.0063−.6118/n+1.3505/n²−.1288/√n.

In the same paper, they gave formulas for critical
values at other significance levels:

1.0071−0.1371/√n−0.3682/n+0.7780/n²
at α=0.10

0.9963−0.0211/√n−1.4106/n+3.1791/n²
at α=0.01
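
The steps above can be sketched in Python with nothing but the standard library. This is an illustration under stated assumptions, not the calculator program itself: the function names and the ten sample values are made up, and `NormalDist().inv_cdf` stands in for the calculator's invNorm. The coefficients are the Ryan and Joiner ones quoted above.

```python
from math import sqrt
from statistics import NormalDist, mean

def expected_z_scores(n):
    """Steps 1 and 2: expected z score for each of the n sorted positions."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Ordinary correlation coefficient of paired values."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Ryan-Joiner coefficients: (constant, 1/sqrt(n) term, 1/n term, 1/n^2 term)
COEFFS = {
    0.10: (1.0071, -0.1371, -0.3682, 0.7780),
    0.05: (1.0063, -0.1288, -0.6118, 1.3505),
    0.01: (0.9963, -0.0211, -1.4106, 3.1791),
}

def critical_value(n, alpha=0.05):
    """Step 4: critical value for sample size n at significance level alpha."""
    c0, c1, c2, c3 = COEFFS[alpha]
    return c0 + c1 / sqrt(n) + c2 / n + c3 / n**2

def normal_plot_r(data):
    """Step 3: r between the sorted data and their expected z scores."""
    xs = sorted(data)
    return pearson_r(xs, expected_z_scores(len(xs)))

# Hypothetical sample, for illustration only.
sample = [4.1, 5.3, 5.9, 6.4, 6.8, 7.1, 7.7, 8.2, 9.0, 10.4]
r = normal_plot_r(sample)
crit = critical_value(len(sample))
print(f"r = {r:.4f}, crit = {crit:.4f}")
print("treat as normal" if r >= crit else "don't treat as normal")
```

Python has no trouble with the formula in its original order, so the rearrangement made for the calculator isn't needed here.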

The closer the points are to a straight line, the
closer the data set is to fitting a normal model.
In other words, a larger r
indicates a ND, and a smaller r indicates a
non-ND. You can draw one of two conclusions:

If r is less than the critical value,
reject the hypothesis of normality at the 0.05 significance level
and say that the data set is not ND.

(If you haven’t studied hypothesis testing yet, another
way to say it is that you’re pretty sure the data set
doesn’t fit the normal model,
because a sample from a truly normal population would give
an r this low less than 5% of the time.)

If r is greater than the critical value, fail to
reject the hypothesis that the data set comes from a ND.

This doesn’t mean you are certain it does,
merely that you can’t rule it out. Technically you don’t
know either way, but practically it doesn’t matter. Remember
(or you will learn later) that inferential statistics procedures like
t tests are robust, meaning that they still work even if
the data are moderately non-normal. But if your data were extremely
non-normal, r would be less than the critical value. When
r is greater than the critical value, you don’t know
whether the data set comes from normal data or moderately non-normal data,
but either way your inferential statistics procedures are okay.

So the bottom line is, if r > CRIT,
treat the data as normal, and if r < CRIT,
don’t.
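
To get a feel for what the 0.05 significance level means here, you can simulate it. The sketch below is only a rough illustration: the sample size of 30, the 2000 trials, the seed, and the choice of an exponential distribution as the "clearly non-normal" comparison are all assumptions made for this example. It draws many samples, runs the r-versus-crit comparison on each, and reports what share of samples would be rejected.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def plot_r(data, zs):
    """Correlation between the sorted data and the expected z scores."""
    xs = sorted(data)
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)

def rejection_rate(draw, n=30, trials=2000, seed=1):
    """Share of simulated samples whose r falls below the 0.05 critical value."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    zs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    crit = 1.0063 - 0.1288 / sqrt(n) - 0.6118 / n + 1.3505 / n**2
    rejected = sum(
        plot_r([draw(rng) for _ in range(n)], zs) < crit
        for _ in range(trials)
    )
    return rejected / trials

# Samples that really do come from a normal distribution
normal_rate = rejection_rate(lambda rng: rng.gauss(0, 1))
# Samples from a strongly skewed (exponential) distribution
skewed_rate = rejection_rate(lambda rng: rng.expovariate(1.0))

print(f"normal samples rejected: {normal_rate:.3f}")
print(f"skewed samples rejected: {skewed_rate:.3f}")
```

The first rate should land somewhere near 0.05, since that is what the critical value is designed to deliver, and the second should be much higher.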

The normal probability plot is just one of many possible ways to
determine whether a data set fits the normal model. Another method,
the D’Agostino-Pearson test, uses numerical measures of the
shape of a data set called skewness and kurtosis to test for
normality. For details, see
Assessing Normality
in Measures of Shape: Skewness and Kurtosis.

Exercises for Chapter 7

Write out your solutions to these exercises,
making a sketch and
showing your
work for all computations. Then check your solutions against the
solutions page and get
help with anything you don’t understand.

Caution! If you don’t see how to start a
problem, don’t peek at the solution — you won’t learn
anything that way. Ask your instructor or a tutor for a hint. Or
just leave it and go on to a different problem for now. You may find
when you return to that “impossible” problem that you see
how to do it after all.

8
Scores on the math SAT are ND with a mean of 500 and standard
deviation of 100. What percentile is represented by a score of 735?

9
To join Mensa, you must be in the top 2% of the population on a
recognized intelligence test. Mensa accepts the SAT as a qualifying
test for membership. The mean on the combined three parts is 1500 and
the SD is 300. What’s the minimum combined
score to qualify you for Mensa?

10
Find z0.01.

11
For men’s heights, find P(x < 60″) and
write two interpretations.

12
Test scores are supposed to be ND, but this is questionable on
small tests. Here are scores from a recent quiz; do they fit the
normal model?

0.3   8.8   11.5   12   12.3   12.5   13   13.5   14.8

13
A small shop decided to stock formal wear for men and women in the
middle 90% of height. How tall must men and women be to shop there?