
In a post last week that presented an automated survey of North Carolina voters, we
described a three-point lead for John Edwards over Hillary Clinton (34% to 31%)
among Democrats as "statistically insignificant" and said that a six-point advantage
meant that Rudy Giuliani "runs ahead" of Newt Gingrich among Republicans (31%
to 25%). But reader "Thomas" asked a good question:

When I look at the results for the
Republican candidates, there's a 6-point gap between Giuliani and Gingrich. But
the size of the sample is only 735. Do you think this gap between the two
candidates is really statistically more significant than the gap between the
two Democratic candidates? I'm especially concerned with the size of the
samples, and the way the interviews were conducted (automatically).

Thomas' question gets at an important issue for pre-election
polls: How do we know when a lead is really
a lead?

Let's get to the heart of the matter: The PPP survey
of North Carolina Republicans reported a "margin of error of +/- 3.6%." Presumably, Thomas doubled that margin (getting
+/- 7.2%) and compared it to the 6-point margin separating Giuliani and
Gingrich. That's a reasonable instinct, because the reported "margin of error"
applies to each percentage separately. Apply it to each candidate's percentage
and you get two ranges that overlap: somewhere between 27.4% and 34.6% for
Giuliani and between 21.4% and 28.6% for Gingrich. So how can that be a
statistically meaningful lead?
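For readers who want to check the arithmetic, here is a minimal Python sketch of the conventional per-candidate margin of error. It assumes a simple random sample and the worst case of p = 0.5 that pollsters typically use when they report a single figure; the function name `reported_moe` is ours, for illustration only.

```python
from math import sqrt

Z95 = 1.96  # two-sided 95% critical value of the standard normal

def reported_moe(n: int, p: float = 0.5) -> float:
    """Margin of error, in percentage points, for a single proportion.

    Defaults to p = 0.5, the worst case pollsters usually report.
    """
    return 100 * Z95 * sqrt(p * (1 - p) / n)

n = 735
moe = reported_moe(n)
print(f"reported MoE: +/- {moe:.1f}")      # reported MoE: +/- 3.6
print(f"doubled MoE:  +/- {2 * moe:.1f}")  # doubled MoE:  +/- 7.2
```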

The issue gets a bit technical, but the bottom line is that the
statistical formula for a confidence interval (the formal term for "margin of
error") for the difference of two
percentages from the same sample produces something slightly smaller than just
doubling the reported margin of error. I'll let my colleague, Prof. Charles Franklin,
explain:

While [doubling the margin of
error] is the correct conclusion when there are only two possible survey
responses, it is not correct when there are more than two possible responses,
which is in fact virtually always the case. The difference between the "twice
the margin of error" rule and the correct calculation for the confidence
interval of a difference of multinomial proportions will depend on how large
the proportion of survey responses going to options other than the top two
candidates is.

Franklin's paper** has the complete formula and more details
for those interested (see also Kish, Survey Sampling, 1965, pp. 498-501), but
the bottom line is that the margin of error for a difference of two percentages
gets slightly smaller as the percentage falling into other categories
(undecided or third candidates) gets larger. Franklin illustrates that point
with the following graphic. The horizontal blue lines represent the reported
margins of error (times two) for various sample sizes. The diagonal purple
lines show how the margin of error for the difference of two percentages
declines as the total of the percentages on which it is based ("p1 + p2")
declines.
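For reference, the standard textbook result for two shares estimated from the same sample (the multinomial case treated in Kish) is written out below. We assume, without having the revised paper in hand, that it is equivalent to the formula in Franklin's paper. Here $p_1$ and $p_2$ are the two candidates' shares, $n$ is the sample size, and the multinomial covariance $\operatorname{Cov}(\hat p_1, \hat p_2) = -p_1 p_2 / n$ is what produces the extra $2p_1p_2$ term:

$$
\operatorname{Var}(\hat p_1 - \hat p_2)
= \frac{p_1(1-p_1) + p_2(1-p_2) + 2p_1p_2}{n}
= \frac{p_1 + p_2 - (p_1 - p_2)^2}{n}
$$

$$
\text{MoE}_{95\%}(\hat p_1 - \hat p_2) = 1.96\sqrt{\frac{p_1 + p_2 - (p_1 - p_2)^2}{n}}
$$

With only two possible responses, $p_1 + p_2 = 1$ and the numerator reduces to $4p_1p_2$, so the margin is exactly twice the single-proportion margin $1.96\sqrt{p_1p_2/n}$. Holding the gap $p_1 - p_2$ fixed, the margin shrinks as $p_1 + p_2$ falls below 1.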

In this case, the margin of error for the 31% to 25%
Giuliani lead is +/- 5.43, which would be just barely significant. So what do we make of that? Thomas' question
implies that we should be skeptical about "barely significant" differences
given that, in this case, the survey was automated. Let's consider that.
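A minimal sketch of the difference-of-proportions calculation, assuming a simple random sample and the multinomial variance formula from Kish; `diff_moe` is a hypothetical helper name. With the rounded published percentages it gives roughly +/- 5.4, in line with the +/- 5.43 quoted above (the small gap presumably comes from rounding in the published figures).

```python
from math import sqrt

Z95 = 1.96  # two-sided 95% critical value

def diff_moe(p1: float, p2: float, n: int, z: float = Z95) -> float:
    """Margin of error (points) for p1 - p2 estimated from one multinomial sample."""
    return 100 * z * sqrt((p1 + p2 - (p1 - p2) ** 2) / n)

giuliani, gingrich, n = 0.31, 0.25, 735
lead = 100 * (giuliani - gingrich)
moe = diff_moe(giuliani, gingrich, n)
print(f"lead = {lead:.0f} points, MoE of difference = +/- {moe:.2f}")
# lead = 6 points, MoE of difference = +/- 5.39
print("significant at 95%:", lead > moe)
# significant at 95%: True
```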

First, we need to keep in mind that this sort of
significance test only takes into account the purely random variation that
comes from drawing a sample rather than interviewing the entire population. Other
potential errors could come from low rates of coverage or response (if the
missing respondents hold different opinions than those interviewed) or
from the wording or order of the questions. Unfortunately, the "margin of
error" as we know it is not a measure of total error. So even when a lead
passes the test of "statistical significance," other sources of error could
still make the result wrong. Poll consumers should keep that in mind.

Also, the error margins calculated above assume a "simple
random sample," but most political polls involve some weighting and other minor
deviations from pure random sampling, which increase the error margin slightly.
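One common way to account for weighting is a design effect (deff): the variance is multiplied by deff, which is equivalent to shrinking the sample to an effective size of n / deff. The deff values below are hypothetical, chosen only to show how quickly a "barely significant" lead can slip; the actual design effect for this survey was not reported.

```python
from math import sqrt

Z95 = 1.96  # two-sided 95% critical value

def diff_moe(p1, p2, n, deff=1.0, z=Z95):
    """MoE (points) for p1 - p2, inflated by a design effect.

    deff = 1.0 corresponds to a simple random sample; deff > 1 shrinks
    the effective sample size to n / deff.
    """
    n_eff = n / deff
    return 100 * z * sqrt((p1 + p2 - (p1 - p2) ** 2) / n_eff)

for deff in (1.0, 1.2, 1.3):  # hypothetical design effects
    moe = diff_moe(0.31, 0.25, 735, deff)
    print(f"deff={deff}: +/- {moe:.2f}, 6-point lead significant: {6 > moe}")
```

Under these assumed values, a design effect of 1.2 leaves the 6-point lead just inside significance, while 1.3 pushes it outside.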

Finally, keep in mind that the reported margin assumes a 95%
level of confidence. That is, we are 95% certain a 31% to 25% lead on a simple
random sample of 735 respondents did not occur by chance alone. But there is
nothing magic about 95%; it is just the commonly accepted standard used by most
public opinion pollsters. If we wanted to be 99% certain, that 6-point lead
would just miss "statistical significance."
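Changing the confidence level only swaps the critical value. A minimal sketch, assuming a simple random sample and using Python's standard `statistics.NormalDist` (Python 3.8+) to get the critical value; `diff_moe` is a hypothetical helper name:

```python
from math import sqrt
from statistics import NormalDist

def diff_moe(p1, p2, n, confidence=0.95):
    """Two-sided MoE (points) for p1 - p2 at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%, ~2.58 for 99%
    return 100 * z * sqrt((p1 + p2 - (p1 - p2) ** 2) / n)

for level in (0.95, 0.99):
    moe = diff_moe(0.31, 0.25, 735, level)
    print(f"{level:.0%}: +/- {moe:.2f}, 6-point lead significant: {6 > moe}")
```

At 99% the margin for the difference grows to roughly 7 points, so the 6-point lead no longer clears it.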

All of which brings us to a lesson: As Professor Franklin
likes to put it, we gain little by getting obsessed with "statistical
significance," except when it is a few days before an election (and even then,
it helps to look at many surveys, as we do here on Pollster, rather than just a few). For
a survey like this one, the concept of statistical significance provides an
objective check, but it is more of a guide than a source of absolute rules.
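Looking at many surveys helps partly because independent samples add up. A minimal sketch, assuming k independent polls of equal size, no house effects, and no real change in opinion between them (all strong assumptions), shows the margin of error for a single share shrinking with the square root of the number of polls. The four-poll scenario and the name `pooled_moe` are hypothetical.

```python
from math import sqrt

Z95 = 1.96  # two-sided 95% critical value

def pooled_moe(n_per_poll: int, k_polls: int, p: float = 0.5) -> float:
    """MoE (points) for one share after pooling k equal-sized polls.

    Treats the pooled respondents as one simple random sample of size
    n_per_poll * k_polls, so the MoE falls by a factor of sqrt(k_polls).
    """
    n_total = n_per_poll * k_polls
    return 100 * Z95 * sqrt(p * (1 - p) / n_total)

print(f"one poll of 735:   +/- {pooled_moe(735, 1):.1f}")  # one poll of 735:   +/- 3.6
print(f"four polls of 735: +/- {pooled_moe(735, 4):.1f}")  # four polls of 735: +/- 1.8
```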

**Charles wanted to make a few small revisions to his paper,
which we should have posted soon.

Comments

Thanks for the timely write-up! I just covered this topic yesterday in my junior-level statistics class, discussing confidence intervals for non-independent samples.

Thanks, too, for emphasizing the total error concept. Far too many people apply the statistics they learned as freshmen, and never consider the effects of poor design in survey instruments or sampling plans (because we save that stuff for the grad students).

Let me attempt a non-academic analysis to simplify things (maybe oversimplify).
First, let's clear out as much noise as possible -- two candidates, no undecideds. Candidate A leads candidate B, 53-47, MoE of +/- 3.5%. This means that 95% of the time, A is somewhere between 49.5 and 56.5 and B is between 43.5 and 50.5. Is there a situation where B is ahead? Yes, but it is less likely than A having a double-digit lead. There is about a 7% chance that B is ahead and an 86% chance that A is ahead -- which side do you want to bet on? The overwhelming likelihood is that A is really ahead and that the lead is somewhere in the vicinity of 6. That is all we can say from any one poll. Basically, once a candidate's lead exceeds the MoE, he is likely to be ahead, and the closer it gets to double the MoE, the greater that likelihood is.
The media vary between calling a one-point move in approval rating a "shift in popularity" and calling a small lead a "virtual tie," a phrase I would like banned from analysis. A one- or two-point lead is a lead: maybe small, maybe insignificant, but a lead. Last year, when analyzing the Tester-Burns race on my web site, I noted that the race was clearly tight, but I couldn't believe that Burns was ahead, because not one poll over the previous six months had shown him ahead. If he were up a point or two, that should have shown up somewhere; in fact, random variation could have made it show up as a 4- or 5-point lead. I concluded that there was no way I could pick Burns to win, since all the evidence pointed the other way.
Professor Franklin is absolutely right that we shouldn't get too excited about the significance of any one poll, but neither can we dismiss it: a lead is a lead.
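The commenter's "chance that B is ahead" can be approximated with a normal distribution. A minimal sketch, assuming a pure two-candidate race with no undecideds (so the lead is 2·p_A - 1 and moves twice as fast as A's share), a simple random sample, a sample size backed out of the +/- 3.5 margin, and an informal flat-prior reading in which the true lead is treated as normally distributed around the observed lead. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z95 = 1.96  # two-sided 95% critical value

def n_from_moe(moe_points: float) -> float:
    """Sample size implied by a reported MoE, using the worst case p = 0.5."""
    return (Z95 * 0.5 / (moe_points / 100)) ** 2

def prob_b_ahead(share_a: float, moe_points: float) -> float:
    """Approximate chance that the trailing candidate B is really ahead."""
    n = n_from_moe(moe_points)
    se_share = sqrt(share_a * (1 - share_a) / n)
    lead = 2 * share_a - 1   # with no undecideds, lead = p_A - (1 - p_A)
    se_lead = 2 * se_share   # the lead's standard error is twice the share's
    return NormalDist().cdf(-lead / se_lead)

print(f"implied sample size: {n_from_moe(3.5):.0f}")
print(f"P(B actually ahead) ~ {prob_b_ahead(0.53, 3.5):.3f}")
```

Under these assumptions the result is a few percent, the same order of magnitude as the rough figure in the comment.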

When deciding if someone is ahead, the proper approach is to use one-sided, not two-sided, confidence intervals. Consider, for instance, a poll where candidate A has 54% and candidate B has 46%, with a "95% confidence interval of plus or minus 3%." What we are interested in is whether candidate A is over 50%. If the true value for candidate A were really 58%, it would lie outside the plus or minus 3% interval, but candidate A would obviously still win. Thus the proper estimate is a one-sided confidence interval, running from minus X% to any plus value (up to 100% for candidate A). The effect is to make the proper value of X about 5/6 of 3% (i.e., about 2.5%), since in a one-sided confidence interval the 95% interval runs from -1.645 sigma up (not from -1.96 sigma to +1.96 sigma).
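The one-sided adjustment is easy to check. A minimal sketch using Python's standard `statistics.NormalDist` (Python 3.8+), taking the commenter's hypothetical poll (A at 54%, reported two-sided margin of +/- 3 points) at face value:

```python
from statistics import NormalDist

std_normal = NormalDist()
z_two_sided = std_normal.inv_cdf(0.975)  # about 1.96: 2.5% in each tail
z_one_sided = std_normal.inv_cdf(0.95)   # about 1.645: all 5% in the lower tail

two_sided_moe = 3.0  # the poll's reported +/- 3 points
one_sided_moe = two_sided_moe * z_one_sided / z_two_sided

print(f"ratio = {z_one_sided / z_two_sided:.3f}")        # ratio = 0.839
print(f"one-sided margin = {one_sided_moe:.2f} points")   # one-sided margin = 2.52 points
print(f"lower bound for A = {54 - one_sided_moe:.2f}%")   # lower bound for A = 51.48%
```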

Thanks a lot for such a thorough answer! But still, I keep getting obsessed with the "statistical significance" of the margin of error. Why? I live in France and, as you may know, there are presidential and legislative elections in a few months. The number of polls published is, as usual, pretty amazing. And as you may also know, the French polling industry relies only on quota sampling. Contrary to what French pollsters like to repeat publicly, this method prevents them from calculating a "margin of error." That's why I can't stop thinking about this technical issue. The more polls are published, the better the idea we can get of the candidates' ranking. It's also better to focus on trends rather than on each poll separately.
