Statistics for Practical People

PART VI - How are Standard Deviation and Standard Error related?
Published July 1989

Editor's Note:
Because of difficulties in displaying a square root
symbol on the web, we have used exponential notation.
Whenever you see X^0.5 we are expressing
the square root of X.

In the previous parts of this series we have talked about what
Standard Deviation (SD) and Standard Error (SE) really mean. The
formulas for actually calculating them are not really important in
this day and age. How to use them and what they
mean become more important all the time.

A brief review:

SD is how spread out THINGS in the population
are, and this is calculated (somehow) from the
data in your sample. It is useful in describing
the population itself.

SE is how spread out the SAMPLE MEAN will
be around the true population mean. It is useful
in describing how close your cruise will be to
the right answer.
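
For readers who want to see the "somehow", here is a minimal sketch of how SD is usually calculated from a sample. The eight weights are made up purely for illustration, and the variable names are our own.

```python
import math
import statistics

# Hypothetical sample of eight weights in pounds (made up for illustration).
sample = [180, 195, 210, 225, 170, 205, 190, 225]

n = len(sample)
mean = sum(sample) / n

# Sample SD by hand: square root of the summed squared deviations
# from the mean, divided by (n - 1).
sd_by_hand = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# The standard library does the same calculation.
sd_library = statistics.stdev(sample)

print(f"n = {n}, mean = {mean:.1f} pounds")
print(f"SD by hand = {sd_by_hand:.2f}, SD from library = {sd_library:.2f}")
```

Both routes give the same number, which is the point: the arithmetic is routine, and the interpretation is what matters.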

HOW ARE STANDARD DEVIATION AND STANDARD ERROR RELATED?
It so happens that there is a very simple relationship
between SD and SE. You can calculate SE by the following
formula:

SE = SD / n^0.5

Now this is really quite a simple and beautiful little
formula. It should be considered THE most important
formula in all of statistics.

It starts out with the way the world IS (that's SD
- how spread out the data are, and there is virtually NOTHING you can
do about it).

It then talks about how hard you WORK (that's the
sample size "n"), and you ARE in
control of that. (Please note that this is how hard you work,
not how smart.)

It then tells you HOW GOOD your average is likely
to be with that amount of effort (the Standard
Error).

All this happens with a simple little formula that anybody
can understand and remember. It only gets ugly and
complicated looking when you have multiple layers of sampling
or lots of strata mixed together. The IDEA is simple and easy
to grasp.
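
The SE formula can be sketched in a few lines. The function name and the SD of 25 pounds are illustrative choices, not part of any particular package. The loop shows what "how hard you work" buys you: each time the sample size is multiplied by four, the standard error is cut in half.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SE = SD / n^0.5."""
    return sd / math.sqrt(n)

sd = 25.0  # pounds; an illustrative value for the spread of the population

for n in (4, 16, 64, 256):
    print(f"n = {n:3d}   SE = {standard_error(sd, n):.2f} pounds")
```

Notice that halving the SE costs four times the effort, which is why the last bit of precision is always the most expensive.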

WHAT IS A "t-TABLE"?
We have talked about using a Z-table to tell how far to
go when creating a confidence interval. You do this when
you know what the standard deviation REALLY is. If
you don't KNOW the standard deviation (and you
hardly ever do), you can still estimate it from the data
you gathered in your sample. The complication is that you won't
get it quite right. In general you will slightly
underestimate it, particularly if you have a small
sample. Since this is the case, you need to go out just a
little bit farther in each direction than a Z-table
would tell you.
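
The tendency to underestimate can be checked with a small simulation. This is only a sketch under stated assumptions: a normally distributed population with a true SD of 10, and sample sizes of 3 and 30 chosen for illustration. It draws many samples of each size and averages the SDs estimated from them.

```python
import random
import statistics

def average_sample_sd(n, true_sd=10.0, trials=2000, seed=1):
    """Average the SD estimated from many random samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        total += statistics.stdev(sample)
    return total / trials

true_sd = 10.0
avg_small = average_sample_sd(3, true_sd)
avg_large = average_sample_sd(30, true_sd)

print(f"true SD = {true_sd}")
print(f"average estimated SD, n = 3:  {avg_small:.2f}")
print(f"average estimated SD, n = 30: {avg_large:.2f}")
```

The small samples come in noticeably low on average, while the samples of 30 land much closer to the true value.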

Luckily, some nice person has figured this out, and
published another table called the "t-table".
It is very close to the Z-table except in the very small
sample sizes. In fact, after a sample size of about 30 or
so there is virtually no difference (which just means
that you are now getting a very good estimate of the
standard deviation). You often hear in statistics that
"after a sample size of 30 it is correct to use the
Z-table". This isn't really true, but there is so
little difference between the tables that nobody worries
about doing it.

The t-table value depends on the sample size you have
used to estimate the standard deviation. These tables sometimes
use a special term for "the sample size minus
1" (n-1). They call this the "degrees of freedom".
At any rate, the t-table just tells you how many standard deviations
(or standard errors) to go, each way, when you are making a confidence interval.
An example of such a t-table is shown below.
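
As a rough illustration of how the two tables compare, the sketch below lists a handful of standard two-sided 95% t-values as they appear in most published t-tables, next to the corresponding Z value. The selection of rows and the dictionary name are our own choices.

```python
from statistics import NormalDist

# Standard published two-sided 95% t-values, keyed by degrees of freedom (n - 1).
T_95 = {
    1: 12.706,
    2: 4.303,
    5: 2.571,
    10: 2.228,
    20: 2.086,
    30: 2.042,
    60: 2.000,
    120: 1.980,
}

# The matching Z-table value (about 1.96) from the standard normal distribution.
z_95 = NormalDist().inv_cdf(0.975)

print(f"Z value for 95%: {z_95:.3f}")
for df, t in T_95.items():
    print(f"n = {df + 1:3d}   df = {df:3d}   t = {t:6.3f}   t - Z = {t - z_95:6.3f}")
```

The gap between t and Z is large for tiny samples and shrinks quickly as the sample grows, which is where the "sample size of 30" rule of thumb comes from.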

A COMPLETE EXAMPLE
Suppose we have just done a sample of 21 weights, and we
calculate that the mean is 200 pounds. We want to describe
how spread out the population is, so we would calculate the
STANDARD DEVIATION from the data and find it to be 25
pounds. Now if the population itself is normally distributed
then we can make a confidence interval for the THINGS in
the population. Let's say we want a 95% confidence interval. How
many standard deviations do we go each way? We look in
the t-table under sample size 21 (or 20 degrees of freedom
depending on how the table is labeled) and get the t
value of 2.086.

We now know that 95% of the things in the population
are within ±2.086 standard deviations of the sample mean.
What is that in pounds? 2.086 * 25 pounds = 52.15 pounds
each way. The "confidence interval" is therefore
200 pounds ±52 pounds (between 148 and 252 pounds if you prefer to
state the end points).
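
The arithmetic for this first interval can be written out as a short sketch. The numbers are the ones from the example. The variable names are ours, and the t value is simply typed in from the table rather than computed.

```python
mean = 200.0   # sample mean, pounds
sd = 25.0      # standard deviation estimated from the sample, pounds
t_95 = 2.086   # t-table value for 95% with 20 degrees of freedom

half_width = t_95 * sd       # how far to go each way, in pounds
low = mean - half_width
high = mean + half_width

print(f"go {half_width:.2f} pounds each way")
print(f"95% interval for the THINGS: {low:.0f} to {high:.0f} pounds")
```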

And how close is our SAMPLE MEAN to the true
population mean? Well, even if the population was not normally
distributed we can still use its SD to estimate how
widely spread the sample means will be. Sample means are
approximately normally distributed once the sample is
reasonably large, whatever the shape of the population. We need to
calculate the STANDARD ERROR, and we do this using the SE
formula. 25 / 21^0.5 = ±5.46 pounds. Suppose
we have decided to get a 90% confidence interval
for the sample mean. We have to go out 1.725 standard
errors each way according to the t-table, and in units
this would be 1.725 * 5.46 = ±9.4 pounds. We can now estimate
that the true population mean is 200 ±9.4 pounds
(or 190.6 to 209.4 pounds).
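
The same steps for the sample mean can be sketched as follows, again using the numbers from the example. The variable names are ours and the t value is typed in from the table.

```python
import math

n = 21         # sample size
mean = 200.0   # sample mean, pounds
sd = 25.0      # standard deviation estimated from the sample, pounds
t_90 = 1.725   # t-table value for 90% with 20 degrees of freedom

se = sd / math.sqrt(n)       # standard error of the mean
half_width = t_90 * se       # how far to go each way, in pounds
low = mean - half_width
high = mean + half_width

print(f"SE = {se:.2f} pounds")
print(f"90% interval for the MEAN: {low:.1f} to {high:.1f} pounds")
```

Compare this with the interval for the THINGS: the only changes are swapping SD for SE and picking the t value for the confidence level you want.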

If you can follow the logic of this example you will
be able to do the most practical parts of statistical analysis.
It may take practice to do it quickly, but these are the main logical ideas
you need to understand. When you read a statistics book
there are a lot more terms you run into, but many of them
are just slightly different ways of saying the same
thing. Next time we will try to sort out a few of these so
they don't get in your way. Once you see the pattern you
will realize that SD and SE are really ALL you need to
worry about. The business of how to create a confidence
interval, and understanding standard deviation and standard
error, is the longest and hardest part of this series.

From now on it gets easier. Remember -- this
statistics business has to do with somebody's
MONEY and SWEAT, and if you can understand some of the
basics, you might save a lot of each. It's worth the effort.