Survey Sample Sizes and Margin of Error

The most accurate survey of a group of people is a vote: Just ask
everyone to make a decision and tally the ballots. It's 100%
accurate, assuming you counted the votes correctly.

(By the way, there's a whole other topic in math that describes
the errors people can make when they try to measure things like that.
But, for now, let's assume you can count with 100% accuracy.)

Here's the problem: Running elections costs a lot of money. It's
simply not practical to conduct a public election every time you want
to test a new product or ad campaign. So companies, campaigns and
news organizations ask a randomly selected small number of people
instead. The idea is that you're surveying a sample of people who
will accurately represent the beliefs or opinions of the entire
population.

But how many people do you need to ask to get a representative
sample?

The best way to figure this one out is to think about it backwards.
Let's say you picked a specific number of people in the United States
at random. What then is the chance that the people you picked do
not accurately represent the U.S. population as a
whole? For example, what is the chance that the percentage of those
people you picked who said their favorite color was blue does not
match the percentage of people in the entire U.S. who like blue best?

Of course, our little mental exercise here assumes you didn't do
anything sneaky like phrase your question in a way to make people
more or less likely to pick blue as their favorite color. Like, say,
telling people "You know, the color blue has been linked to
cancer. Now that I've told you that, what is your favorite color?"
That's called a leading question, and it's a big no-no in surveying.

Common sense will tell you (if you listen...) that the chance that
your sample is off the mark will decrease
as you add more people to your sample. In other words, the more
people you ask, the more likely you are to get a representative
sample. This is easy so far, right?

Okay, enough with the common sense. It's time for some math.
(insert smirk here) The formula that describes the
relationship I just mentioned is basically this:

    margin of error = 1 / √(number of people in the sample)
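
If you'd rather let a computer do the arithmetic, here's that rule of
thumb as a few lines of Python. (The function name margin_of_error is
just my own label; the formula is the one above.)

    from math import sqrt

    def margin_of_error(sample_size):
        # Rule-of-thumb margin of error for a poll: 1 / sqrt(n)
        return 1 / sqrt(sample_size)

    # A few sample sizes and the margins of error they buy you
    for n in (100, 400, 1600, 2500):
        print(f"{n:>5} people -> {margin_of_error(n):.1%}")
    # 100 -> 10.0%, 400 -> 5.0%, 1600 -> 2.5%, 2500 -> 2.0%

Notice the catch hiding in that square root: to cut the margin of
error in half, you have to quadruple the number of people you survey.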

How did someone come up with that formula, you ask?
Like most formulas in statistics, this one can trace its roots back to
pathetic gamblers who were so desperate to hit the jackpot that
they'd even stoop to mathematics for an "edge." If you
really want to know the gory details, the formula is derived from the
standard deviation of the proportion of times that a researcher gets
a sample "right," given a whole bunch of samples.

Which is mathematical jargon for..."Trust me. It works,
okay?"

So a sample of just 1,600 people gives you a margin of error of 1
divided by the square root of 1,600, which works out to 1/40, or 2.5
percent. That's pretty darn good for a poll.

You've probably heard that term — "margin of error" — a
lot before. Reporters throw it around like a hot potato — like if
they linger with it too long (say, by trying to explain what it
means), they'll just get burned. That's because many reporters have
no idea what a "margin of error" really represents.

I gave you the math up above. But let's talk about what that math
represents. When you do a poll or survey, you're making a very
educated guess about what the larger population thinks. If a poll has
a margin of error of 2.5 percent, that means that if you ran that
poll 100 times, asking a different sample of people each time, the
result would land within 2.5 points of the true percentage for the
whole population in about 95 of those 100 polls.
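
You can actually watch that happen with a little simulation. This
sketch (plain Python, assuming a population where the true figure
really is 50 percent) runs 1,000 polls of 1,600 people each and
counts how many land within 2.5 points of the truth:

    import random

    TRUE_SHARE = 0.50   # assume exactly half the population favors blue
    SAMPLE_SIZE = 1600  # 1/sqrt(1600) gives a 2.5 percent margin of error
    NUM_POLLS = 1000

    within_margin = 0
    for _ in range(NUM_POLLS):
        # Poll SAMPLE_SIZE random people; each favors blue with
        # probability TRUE_SHARE
        blue_fans = sum(random.random() < TRUE_SHARE
                        for _ in range(SAMPLE_SIZE))
        result = blue_fans / SAMPLE_SIZE
        if abs(result - TRUE_SHARE) <= 0.025:
            within_margin += 1

    print(f"{within_margin} of {NUM_POLLS} polls were within 2.5 points")
    # Expect a number in the neighborhood of 950 -- about 95 percent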

(WARNING: Math Geek Stuff!) Why 95 times out of 100? In reality, the
margin of error is what statisticians call a confidence interval. The
math behind it is much like the math behind the standard deviation.
So you can think of the margin of error at the 95 percent confidence
interval as being equal to about two standard deviations in your
polling sample. Occasionally you will see surveys with a 99 percent
confidence interval, which would correspond to roughly two and a half
standard deviations (2.58, if you're keeping score) and a much larger
margin of error.
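
For the geeks still reading, here's where the rule of thumb comes
from. The standard deviation of a polled proportion p from a sample
of n people is the square root of p*(1-p)/n, which is at its biggest
when p is 0.5. Multiply that worst case by those two-ish standard
deviations (1.96, strictly) and you get almost exactly 1 divided by
the square root of n. A quick sketch in Python, under those
assumptions:

    from math import sqrt

    def exact_moe(n, p=0.5, z=1.96):
        # z standard deviations of a sampled proportion
        return z * sqrt(p * (1 - p) / n)

    def rule_of_thumb(n):
        return 1 / sqrt(n)

    n = 1600
    print(f"exact: {exact_moe(n):.4f}  rule of thumb: {rule_of_thumb(n):.4f}")
    # exact: 0.0245  rule of thumb: 0.0250 -- the shortcut barely overshoots

(End of Math Geek Stuff!)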

If a poll says that 48 percent of registered voters surveyed are
likely to vote for Candidate A and 46 percent of those voters plan to
cast their ballots for Candidate B, you'll likely hear reporters
saying that Candidate A has a two-point lead. Now that's true in
this poll, but given the likely margin of error, a mathematician
wouldn't say that Candidate A has a two-point lead in the actual
race. There's just too much of a chance that Candidate A's true
support is enough lower than 48 percent, and Candidate B's true
support enough higher than 46 percent, that the two might actually
be tied, or that Candidate B might even have a slight lead. You
can't say for sure on the basis of a single poll with a two-point
gap.
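
Here's that two-point "lead" worked out in a few lines of Python,
assuming the poll carries a 2.5-point margin of error on each
candidate's number:

    MOE = 0.025  # assumed 2.5-point margin of error

    candidate_a = 0.48
    candidate_b = 0.46

    a_low, a_high = candidate_a - MOE, candidate_a + MOE
    b_low, b_high = candidate_b - MOE, candidate_b + MOE

    print(f"Candidate A: somewhere between {a_low:.1%} and {a_high:.1%}")
    print(f"Candidate B: somewhere between {b_low:.1%} and {b_high:.1%}")

    # A's range (45.5% to 50.5%) overlaps B's (43.5% to 48.5%), so the
    # "lead" could just as easily be a tie, or even a deficit
    print("Too close to call" if b_high >= a_low else "A genuinely leads")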

If you want to get a more accurate picture of who's going to win
the election, you need to look at more polls. Just as asking more
people in one poll helps reduce your margin of error, looking at
multiple polls can help you get a more accurate view of what people
really think. Analysts such as Nate Silver and Sam Wang have created models that
average multiple polls to help predict which candidates are most
likely to win elections. (Silver got his start using baseball
statistics to predict future on-field performance, which goes to show
that numbers can help you predict things other than elections.) In
2012, Silver was 50-for-50 in predicting state results in the
presidential election, based on his model for averaging publicly
available polls.
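
The simplest version of that idea is just a weighted average, with
bigger polls counting for more. Here's a toy sketch; the poll numbers
are invented for illustration, and real models like Silver's also
adjust for things like pollster quality and timing, which this
doesn't:

    from math import sqrt

    # (candidate's share, sample size) for several hypothetical polls
    polls = [(0.48, 900), (0.50, 1600), (0.47, 600), (0.49, 1100)]

    total_people = sum(n for _, n in polls)
    weighted_avg = sum(share * n for share, n in polls) / total_people

    # Treating all the polls as one big pooled sample shrinks the
    # margin of error (this assumes the polls are truly comparable)
    pooled_moe = 1 / sqrt(total_people)

    print(f"average: {weighted_avg:.1%}  margin of error: {pooled_moe:.1%}")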

Now, remember that the size of the entire population doesn't
matter when you're measuring the accuracy of polls, as long as the
population is much bigger than the sample. You could have a nation of
250,000 people or 250 million and that won't affect how big your
sample needs to be to come within your desired margin of error.
The Math Gods just don't care.

Sometimes you'll see polls with anywhere from 600 to
1,800 people, all promising the same margin of error. That's because
pollsters often want to break down their poll results by the gender,
age, race or income of the people in the sample. To do that, the
pollster needs to have enough women, for example, in the overall
sample to ensure a reasonable margin of error among just the women.
And the same goes for young adults, retirees, rich people, poor
people, etc. That means that in order to have a poll with a margin of
error of five percent among many different subgroups, a survey will
need to include many more than the minimum 400 people to get that
five percent margin in the overall sample.
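
You can see the squeeze by flipping the formula around: the sample
size you need is 1 divided by your target margin of error, squared.
A quick sketch, with an assumed 50/50 split between men and women:

    def required_sample(moe):
        # Flip the margin-of-error formula: n = (1 / moe) squared
        return round((1 / moe) ** 2)

    print(required_sample(0.05))  # 400 people for 5 percent overall

    # If women are half your respondents and you want a 5 percent
    # margin among women alone, that subgroup needs 400 people by
    # itself...
    women_share = 0.5  # assumed split, for illustration
    print(int(required_sample(0.05) / women_share))  # ...so 800 in total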