Demystifying the Science and Art of Political Polling - By Mark Blumenthal

May 31, 2006

Words Worth Remembering

One of the great rewards of writing this blog is the incredible diversity of its readers -- everyone from ordinary political junkies to some of the most respected authorities in survey research. I heard indirectly from one of the latter over the weekend regarding my post on the NSA phone records issue on the CBS Public Eye blog, and I would like to share his comments. They underscore the caution we should all exercise before placing too much faith in any one survey question about an issue of public policy.

There are few academics more respected on the subject of writing survey questions than Professor Howard Schuman of the University of Michigan. In 1981, along with co-author Stanley Presser (now a professor at the University of Maryland), he wrote Questions and Answers in Attitude Surveys, a book that remains required reading for graduate students of survey methodology. After my article appeared on CBS Public Eye last Friday, someone posted a link to the members-only listserv of the American Association for Public Opinion Research (AAPOR). Schuman read it and posted some thoughts to the listserv that I will reproduce below.

But first, I thought it a little ironic that I had nearly quoted Schuman at the end of the Public Eye post, cutting the key quotation at the last minute only because my piece was already running long. It came from the seminal article, "Problems in the Use of Survey Questions to Measure Public Opinion," that Schuman co-authored with Jacqueline Scott for the journal Science in 1987 (vol. 236, pp. 957-959). The article described several experiments that compared results when similar questions were asked using an open-ended format (where respondents answer in their own words) or a closed-ended format (where respondents choose from a list of alternatives).

The experiments yielded some very big differences, but also revealed shortcomings with both formats. They demonstrated that both open and closed-ended questions have the potential to produce highly misleading results. Schuman and Scott concluded with this recommendation (p. 959):

There is one practical solution to the problems pointed to in this report. The solution requires giving up the hope that a question, or even a set of questions, can be used to assess preferences in an absolute sense or even the absolute ranking of preferences and relies instead on describing changes over time and differences across social categories. The same applies to all survey questions, including those that seem on their face to provide a picture of public opinion.

Schuman's reaction to my Public Eye piece (quoted with his permission) shows that his philosophy has not changed over the years:

Mark Blumenthal's relearning of the effects of different formulations of
questions is useful, but might go even further to recognize that the
timing of a poll (and a few other features) can also produce quite
different results. Given polls on any issue, but especially a new one,
we should all keep in mind the old verse about the Elephant, a copy of
which can be found at:

May 30, 2006

Rasmussen Update

Scott Rasmussen launched a new look for his Rasmussen Reports web page last week, so I thought it might be a good time to update the chart that compares the Bush job rating as measured by the Rasmussen automated tracking survey to the results of other polls. As you'll see, the Rasmussen surveys still show a consistent "house effect" benefiting President Bush but generally track the same long-term trends as the average of conventional telephone surveys.

Rasmussen, as most regular readers and political web junkies know, conducts a regular national nightly tracking survey on the Bush job rating and other measures of political and economic opinion. They use what is known as an Interactive Voice Response (IVR) methodology. Respondents hear a recorded voice that asks them to answer questions by pressing the buttons on their touch tone telephones. Unlike other public pollsters who typically conduct surveys on behalf of media organizations, Rasmussen's clients are paid subscribers to his web site. However, Rasmussen has become one of the most heavily trafficked web sites for poll data because they make many results available for free.

Those who want to dig deeper into the strengths and weaknesses of the IVR methodology should review my previous description of SurveyUSA's IVR methodology and my article last fall in Public Opinion Quarterly (html or pdf).

The chart below -- created and updated especially for MP readers by our friend Charles "Political Arithmetik" Franklin -- shows how the Rasmussen reading of the Bush job rating compares to other polls since 2004.** The chart makes clear that the approval percentage on the Rasmussen poll is consistently three to four percentage points higher than other surveys (Franklin tells me that the average difference is 3.6 percentage points, with half of the cases showing a difference of between 2.6 and 4.3).
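To make the arithmetic behind a "house effect" concrete, here is a minimal sketch of the calculation: average the gap between one pollster's readings and the contemporaneous average of other polls. All of the numbers below are invented purely for illustration; Franklin's actual estimate used the full 2004-2006 series.

```python
# Sketch: a "house effect" estimated as the average gap between one pollster's
# approval readings and the contemporaneous average of other polls.
# The values here are hypothetical, not Franklin's actual data.
rasmussen = [48, 47, 45, 44, 42, 41]   # hypothetical Rasmussen approval %
other_avg = [44, 43, 42, 40, 39, 37]   # hypothetical average of other polls

gaps = [r - o for r, o in zip(rasmussen, other_avg)]
house_effect = sum(gaps) / len(gaps)
middle = sorted(gaps)[1:-1]            # drop the extremes to see the middle range
print(f"mean house effect: {house_effect:.1f} points")
print(f"middle range of gaps: {min(middle)} to {max(middle)} points")
```

Franklin's reported figures (a 3.6-point average, with half the gaps between 2.6 and 4.3) come from exactly this kind of summary applied to the real series.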

It may be too early to tell, but the recent change in Rasmussen's party weighting procedure does not appear to have reduced the house effect (see also my two-part discussion of Rasmussen and party weighting). Note that Rasmussen's new design will now regularly update unweighted trends in party identification.

[Click on the chart to see a full-size version]

The two trend lines also generally rise and fall in parallel. Franklin reports that the Rasmussen trend line is slightly less sensitive to short term changes than the average of other polls - for a one percentage point movement in the average of all polls, Rasmussen changes an average of 0.78 percentage points.
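Franklin's 0.78 figure can be read as a regression slope: regress changes in one pollster's readings on changes in the average of all polls, and a slope below one indicates a damped response. The change values below are invented purely to illustrate the calculation, not taken from the actual series.

```python
# Sketch: "sensitivity" as the least-squares slope from regressing one
# pollster's day-to-day changes on changes in the average of all polls.
# All change values here are hypothetical.
avg_changes = [1.0, -2.0, 0.5, -1.0, 1.5, -0.5]   # hypothetical poll-average moves
ras_changes = [0.8, -1.5, 0.4, -0.9, 1.1, -0.3]   # hypothetical Rasmussen moves

n = len(avg_changes)
mean_x = sum(avg_changes) / n
mean_y = sum(ras_changes) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(avg_changes, ras_changes))
         / sum((x - mean_x) ** 2 for x in avg_changes))
print(f"estimated sensitivity: {slope:.2f}")  # a value below 1 means a damped response
```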

So generally speaking, the Rasmussen survey appears to pick up the same trends as other polls, a critical issue for this daily measure of political opinion. However, zoom in on the polls conducted in the last 18 months (see the chart below) and you can see that Rasmussen does not always pick up the same trends as other polls. For example, the conventional polls show a downward trend in the Bush rating beginning in mid-January (after a brief up-tick in late December and early January). The Rasmussen surveys showed no such decline until late February. In fact, Rasmussen was showing a slight increase as late as mid-February while the conventional polls were showing the beginning of a pronounced decline.

[Click on the chart to see a full-size version]

The two charts also show quite clearly the usual pattern of random sampling error that ought to discourage readers from making too much of the typical up-and-down ticks. Within any given week, Rasmussen's results show the totally random "scatter" of variation within the expected range (in this case, 3%). Rasmussen uses a three-day rolling average to smooth out the day-to-day variation, but compare Rasmussen's result from any given three-day period to the three days just before or after and you will often see random shifts of 3-4 percentage points up or down. The lesson here is to pay little attention to the day-to-day variation and focus instead on a weekly or (better yet) monthly average.
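The mechanics of a three-day rolling average are simple: each reported number is the mean of the three most recent nightly samples. A short sketch, using invented nightly values, shows how the smoothing works:

```python
# Sketch: a three-day rolling average like the one Rasmussen reports.
# The nightly values below are invented for illustration.
nightly = [44, 47, 43, 45, 48, 44, 46]   # hypothetical nightly approval %

rolling = [sum(nightly[i - 2:i + 1]) / 3 for i in range(2, len(nightly))]
print([round(r, 1) for r in rolling])    # → [44.7, 45.0, 45.3, 45.7, 46.0]
```

Note how a noisy 44-to-48 nightly swing becomes a gentle drift once averaged, which is exactly why the weekly or monthly average is the more trustworthy read.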

Rasmussen typically offers the same advice, although he provides the latest weekly averages for paid subscribers only. However, those curious about the latest blip up or down are advised to calculate their own weekly or monthly average and compare to the most recent week or month.

**Because Rasmussen releases a three day rolling average, we plotted the results released every third day in the graphs above.

May 26, 2006

MP on NSA Polls on CBS Public Eye

Today I accepted an invitation to contribute to the "Outside Voices" feature on the CBS News blog "Public Eye." My post -- about lessons learned from the conflicting NSA telephone records polls -- is now up. Here's my bottom line:

The lesson we could all stand to learn here is that on issues of public
policy no single question provides a precise, “scientific” measure of
the truth. The most accurate read of public opinion usually comes from
comparing the sometimes conflicting results of many different questions
on many different polls and understanding the reasons for those
differences.

Check it out. You might want to click thru just to see the non-mysterious photo.

May 25, 2006

Generic Ballot: What Does it Measure?

Several readers have asked for my opinion on the so-called "generic vote" or "generic ballot" question asked on national surveys to gauge Congressional vote preference. Given the obvious inability to tailor a national question to match 435 individual House races, this question asks about party rather than candidate names: "If the 2006 election for U.S. House of Representatives were being held today, would you vote for the Republican candidate or the Democratic candidate in your district?" Although most national surveys currently show a statistically significant Democratic lead on this question (see the summaries by RealClearPolitics or the Polling Report), many analysts have questioned the predictive accuracy of these results, particularly this far out from the election.

I raise this topic today because of a new batch of surveys from Democracy Corps that show how the generic vote compares to other measures of candidate preference in parallel surveys conducted in three congressional districts. To be totally honest, I am skeptical about the utility of the generic ballot question but have never formed a strong opinion on this controversy, largely because I ask the generic question so rarely on internal campaign studies. I want to direct readers to the Democracy Corps surveys because they provide some examples consistent with my own experience that help explain my own skepticism.

How accurate is the "generic ballot" for Congressional vote preference? An analysis by the Pew Research Center in October 2002 concluded that in off-year elections, the final pre-election measure of the generic vote (presumably as reported among "likely voters") has been "an accurate predictor of the partisan distribution of the national vote," showing an average error between 1954 and 1998 of only 1.1%. Similarly but less formally, MyDD's Chris Bowers did a simple comparison last year and found that on average these results for the final polls in 2002 and 2004 came reasonably close to the final margins.

However, these final surveys are obviously taken very late in the campaign and typically among likely voters. The surveys we are seeing now typically report results among registered voters, a distinction that, according to various reports by the Gallup organization, consistently tips the scale in favor of the Democrats. For example, a recent (subscribers-only) analysis reports that "Democrats almost always lead on the generic ballot among registered voters, even in elections in which Republicans eventually win a majority of the overall vote for the House of Representatives."

In another report released in February, Gallup's David Moore put it more plainly: "Our experience over the past two mid-term elections, in 1998 and 2002, suggests that the [registered voter] numbers tend to overstate the Democratic margin by about ten and a half percentage points." Similarly, taking a somewhat longer view ("most of the last decade"), Gallup's Lydia Saad reported last September that "the norm" is a five-point Republican deficit among all registered voters that "converts to a slight lead among likely voters." Make "some adjustments" to the generic vote, she wrote, and one can "make a fairly accurate guess about how many seats each party would win."

Not everyone agrees. For example, in a recent column entitled "Don't Bet on the Generic Vote," Jay Cost argued that the problem is less about a "consistent Democrat skew" than about the weak predictive value of the question this far out from an election.

This debate will obviously continue, and while I am now paying closer attention to it, I have not done so over the years. As an internal campaign pollster, I have helped conduct hundreds of surveys for candidates for Congress that almost never included the "generic ballot." We focus instead on questions that provide the candidates' names. We have included the generic ballot question in a few rare instances, and the results have been very different from those of the questions that include names, and often quite puzzling. The experience left me and my colleagues skeptical about what the generic ballot measures.

Now comes a unique release of data from Democracy Corps -- a joint project of Democratic consultants Stan Greenberg, James Carville and Bob Shrum -- that allows for a comparison of the generic ballot to more direct measurements of vote preference within three individual districts. In fact, the latest Democracy Corps release provides ordinary political junkies with a great window into the surveys that internal campaign pollsters like me typically conduct for our clients.

Having said this, I should note that in highlighting this release, I am breaking a longstanding rule I set for this blog of refraining from "comment on any race in which we are also polling." So in the interests of full disclosure: Joe Sestak, the Democratic candidate in Pennsylvania's 7th CD, is a client of my firm, although I am not working on his race personally. I have chosen to break my usual rule largely because the three surveys help demonstrate two things of interest to MP readers: (1) the questions that internal campaign pollsters use to assess where a race stands and (2) a close-up view of the generic vote and what it may and may not measure. I will not comment here about the implications of the results of these surveys for any of the candidates involved, except to reiterate the advice offered before that consumers should always read partisan polls with a larger-than-usual grain of salt.

Back to the Democracy Corps survey. The three districts polled involve Republican incumbents that independent analysts rated, according to Democracy Corps, as falling in "middle or lowest tier in their assessments of the 50-60 races in play this election cycle." The questionnaire includes:

The so-called re-elect referendum question that includes the name of the incumbent, but not the challenger (Q32)

A ballot question that includes the names of both candidates (Q33)

An "informed" question following brief positive descriptions of both candidates in each race (Q40, Q42 & Q44)

An "informed" question following an exchange of potential negative attacks by the two candidates (Q46)

Campaign pollsters typically use both reelect referenda and informed vote questions to try to get past the tendency of well known incumbents to lead relatively unknown challengers in early horserace questions involving candidate names. Campaign pollsters typically find they get a better read on the way a race will ultimately play out with balanced "informed" questions using the same basic format as those in the Democracy Corps surveys.

Now, check the table of results above and note the difference between the generic ballot test in each district and any of the vote questions involving candidate names that follow. The results of the generic preference question in each district are very different from those based on the names of the candidates, including both the current snapshot and the informed questions that attempt to simulate the exchange of information that would occur in a campaign. Specifically, note how much lower the generic preference for the Republican candidate is in each district compared to the Republican's performance on the "named" preference questions that follow later in the survey.

My conclusion: I am not sure what the generic vote is measuring right now, but it clearly measures something different than current candidate preference and something different again from the informed questions that try to predict how voters will respond as they learn more about the candidates.

This is not to say that the generic vote is useless. As we get closer to the election, pre-election polls will do an increasingly better job of projecting the likely electorate. Moreover, voters will become better acquainted with the candidates and start to form more lasting preferences. Thus, my hunch is that as we get closer to November, the generic vote will gradually become a better measure of actual vote preference.

Right now, however, the value of the generic vote is mostly for comparisons with polls conducted by the same organization using identical language at this point in prior election cycles. For example, the Pew Center did just that and concluded, "there has been only a handful of occasions since 1994 when either party has held such a sizable advantage in the congressional horse race."

But remember the limitations: These generic questions may be telling us more about voters' general attitudes about politics right now than about their candidate preference. And, as with any poll, tomorrow's opinions may be different.

May 23, 2006

SurveyDNA

Here is another interesting piece of news announced during the AAPOR conference: Automated pollster SurveyUSA plans to introduce a new service later this year called "SurveyDNA." The subscription-only service will allow subscribers to "take apart" SurveyUSA polls and re-weight and re-tabulate the results as they see fit.

For now, this press release on the SurveyUSA web site has the only available details on the new service, which promises the following features.

SurveyDNA subscribers will be able to:

Examine SurveyUSA's unweighted data and see how SurveyUSA weighting changed the data.

I emailed SurveyUSA's Jay Leve for more details, and he says that while they have yet to finalize pricing and are not yet ready to release a demo to the public, they are actively seeking beta testers at this address: betatester@surveyusa.com

Many survey research organizations routinely deposit their respondent-level data in academic archives (such as the Roper Center or the Interuniversity Consortium for Political and Social Research - ICPSR) or make it available directly (as the Pew Research Center does) months after the initial release so that scholars can slice, dice and re-weight the results as they see fit. What will make this announced SurveyUSA service unique, assuming it lives up to its promise, is that users will get an immediate (i.e., "real-time") ability to manipulate and re-tabulate respondent-level data using web-based software rather than having to use their own statistical package.

May 22, 2006

NPR on AAPOR

I am back today from four days at the annual conference of the American Association for Public Opinion Research (AAPOR). Given the various "day job" tasks that have accumulated in my absence, I am grateful that Marc Rosenbaum of National Public Radio, another AAPOR conference attendee, filed a helpful first-person account on the NPR blog Mixed Signals. Rosenbaum provides a nice review of his experience at the conference. He was certainly right about the rain.

As for my own experience over four very full and exhausting days, my preference is to try to share the results from some of the more interesting papers a little at a time over the upcoming summer months rather than trying to summarize it all now.

But before moving on to other things, I thought I would pass along Marc Rosenbaum's helpful assessment of a panel on exit polls and theories about a stolen election in 2004:

There also was a session called, "Who Really Won the Election 2004?" This was an opportunity for the cyber-active bloggers who think the Ohio vote was somehow fraudulent to present their best case. They didn't. Their presentations were confusing, if not incoherent to this listener, and they all seemed to boil down to one complaint: namely, that the vote totals didn't match the exit polls. The problem with that argument is that if you can give good reasons why the exit polls were wrong in Ohio (and there are many), their entire complaint disappears.

As regular readers might guess, I had the same reaction. The panel on exit polls largely rehashed old arguments, although it did feature coherent rebuttals of the exit-polls-as-evidence-of-fraud theories by regular MP commenters Mark Lindeman and Elizabeth Liddle. Those new to the issue may want to review my exit poll FAQ and other recent posts on the subject, especially these two from last year. Mark Hertsgaard's Mother Jones review of the various conspiracy theories also provides a helpful reality check on some of the other anecdotes still frequently cited by the conspiracy theorists as evidence of fraud.

May 18, 2006

That Immigration Speech Instant Reaction Poll

Here is a less-than-instant reaction to an instant-reaction survey fielded Monday night following the immigration address by President George Bush. CNN conducted the survey (story, results) among Americans who reported viewing the speech. The survey showed 79% of speech viewers expressing a positive reaction and only 18% a negative reaction. This spurred John Podhoretz of the National Review Online (via Kaus) to crow:

Unless this CNN poll was an outlier, last night was anything but undramatic. It was a grand slam for President Bush. You can't do better than having 79 percent thinking favorably of your speech on a divisive issue, and a 25 percent jump in support for your policies.

Actually, as data on comparable surveys show, this speech was an outlier and you can do better. The survey, conducted by CNN's new survey partner the Opinion Research Corporation (ORC), appears to use the same basic methodology and an identical question as Gallup used in previous speech reaction surveys. Here are the reactions of the speech audience - using precisely the same questions - to the last eight State of the Union addresses:

The positive reaction to the Bush speech (79%) was lower than what Presidents Bush and Clinton received in all but two of their last eight SOTU addresses, and the "very positive" score (40%) was lower than all eight. Not surprisingly, the partisanship of the immigration speech audience was heavily Republican - 41% Republican, 23% Democrat - a result in line with the Republican skew of recent Bush SOTU addresses. I wrote about these issues in more depth back in January here, here and here.

Also of note, on Tuesday, the Washington Post's Chris Cillizza devoted his Parsing the Polls feature to a discussion of the merits of instant reaction polls. Cillizza gathered comments from Andrew Kohut of the Pew Center, Mike Traugott of the University of Michigan and the Post's own Claudia Deane. It's worth a click.

And yes, most of the caution about putting too much faith in instant reactions also applies to our recent obsession with polls on the NSA phone records issue. Keep in mind, however, that a survey of speech watchers can be especially misleading because fans of the President tend to make up a disproportionate share of the audience.

May 17, 2006

AAPOR

As noted in an earlier post, I am in Montreal, Canada tonight on the eve of the annual conference of the American Association for Public Opinion Research (AAPOR). In that context I want to pass along some personal news: In an election concluded just a few weeks ago, I was elected the Associate Chair of AAPOR's Publications and Information committee. This new status should have little impact on this blog, but I want to pass it along in the interests of full disclosure.

Some background: As its web site will tell you, AAPOR is an organization of professionals that conduct surveys in "academic institutions, commercial firms, government agencies and non-profit groups, as both producers and users of survey data." AAPOR's active membership includes many of the well-known public pollsters, but for every name you might recognize there are hundreds of other dedicated professionals in government and academia who devote their careers to the advancement of survey methodology.

My increased formal involvement in the organization should come as little surprise to regular readers who know that I depend on AAPOR's annual conferences and its official journal Public Opinion Quarterly to stay current with latest developments in survey methodology. In my volunteer capacity as associate chair (and then as full chair the following year), I will help oversee AAPOR's web site, its members-only newsletters and listserv and other efforts to communicate its mission to the general public.

Now having said that, the views expressed here are mine and mine alone, and it will stay that way. I do not speak for AAPOR or its membership, nor will I in the foreseeable future.

And as long as we are on the topic of the AAPOR conference, one note: Last year I tried to attend the conference, soak up all it had to offer, and still post to the blog at the end of each day. The end result was one very sleep-deprived MP and some blog posts that would have benefited from a few more days of reflection. So no repeat performance this year, although I do not rule out the possibility of posting a quick item or two as time allows. I'm told we have wireless internet access in the conference rooms, so you never know. However, expect posting to be on the light side this week, with plenty to follow in the weeks to come.

Post/ABC on NSA Records - Part II

As expected, the Washington Post and ABC News continued conducting interviews over the weekend and released complete results Tuesday afternoon. The release last Friday morning was based on the first 505 interviews -- the complete sample consisted of 1,103 adults. The overall survey has much bad news for President Bush and the Republicans, as reported separately by Dan Balz and Rich Morin of the Post and Gary Langer of ABC News. However, the summary of results put out by the Washington Post includes new data on the NSA questions we have been puzzling over for the last few days.

One thing is clear: The new results obtained from Friday through Monday night are virtually identical to the original Thursday night sample. The latest release from the Post (excerpted below) includes two lines of data for each question: The first line, labeled 5/11, shows the now controversial results from the first night of interviewing. The second line, labeled 5/15, shows the data collected from Friday through Monday night. As the tables show, the results differ by at most a percentage point or two:

[Click on the image to see a full size version].

The Post summary also breaks out results for party, ideology, vote registration and religion so that we can compare the Thursday night interviews to those collected since. There are a few small differences -- the Thursday night interviews have slightly more respondents reporting an income over $50,000 annually, for example -- but none that can explain why the Post/ABC results on the NSA records questions differ so much from the subsequent poll questions asked by Newsweek and USAToday/Gallup (see previous discussion of these differences here and here).

The bottom line: These new results help rule out two widely floated hypotheses for the discrepancies. One was that "views changed that much in one day" (as Editor and Publisher speculated Saturday). The new results show virtually no change in the Post/ABC results after Thursday. Another theory was a skewed sample resulting from the relatively small number of interviews conducted in a single evening. The new results show that on party, ideology and religion the Thursday night interviews are well within sampling error of those conducted since.

So with respect to the questions on the NSA phone records database, these new data suggest that the differences between the Post/ABC survey and the others were mostly a matter of question wording and question order.

May 16, 2006

NSA Records Polling - Update

I want to continue to discuss the differences in the polls on the NSA records database issue, especially as new surveys are released. We have no new polls to consider for the moment, but I do want to pass on a bit of overlooked commentary (sent by an alert reader). In a live chat yesterday morning, the Washington Post's Tom Edsall, after receiving several questions on the Post poll released last Friday, passed along this commentary:

Tom Edsall: The very intelligent and gracious Claudia Deane of our polling staff wrote the follow[ing] in reply to an inquiry:

Yes, our Thursday night poll did show a much more positive reaction than the Gallup poll conducted on Friday and Saturday evenings or a Newsweek poll conducted Thursday and Friday nights.

I don't think it's a question of the number of respondents -- we do get a larger number of respondents for our 4 day projects, but 502 is a perfectly respectable sample size. The margin of sampling error on a sample of 500 is plus or minus 5 percentage points, not much different than the plus or minus 3 percentage points on a 1,000 person sample.

More likely some combination of 1) a learning curve as people come to understand the scope of the program, 2) question wording, and 3) question order.

We're looking into all these things now to see what else we can learn. The biggest difference between opinions the two surveys seems to be among Democrats, who were substantially more negative in the Gallup poll.
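Deane's margin-of-error figures follow from the standard formula for a percentage near 50% under simple random sampling. A quick sketch; note that the exact formula gives roughly 4.4 and 3.1 points, and published figures like the plus-or-minus 5 and 3 above are typically rounded or inflated slightly for design effects:

```python
import math

# The conventional 95% margin of sampling error for a percentage near 50%,
# under simple random sampling: MOE = 1.96 * sqrt(0.5 * 0.5 / n).
def margin_of_error(n: int) -> float:
    """95% margin of error in percentage points for a 50/50 proportion."""
    return 1.96 * math.sqrt(0.25 / n) * 100

print(f"n=500:  +/- {margin_of_error(500):.1f} points")   # → +/- 4.4
print(f"n=1000: +/- {margin_of_error(1000):.1f} points")  # → +/- 3.1
```

The square-root relationship is why doubling a sample does not halve the margin of error, which is the substance of Deane's point.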

The table below is my compilation of the results by party released by the three organizations (more details in previous posts here, here and here). The Newsweek and Gallup results are remarkably consistent and - as Deane noted - the differences were larger among Democrats.

One other note: As reader Karl pointed out in a comment yesterday, the Gallup survey was done entirely over the weekend, and the Newsweek survey was completed on Friday night. Pollsters disagree on the merits of weekend interviewing, but many - including my own firm - recommend that clients confine interviewing to evenings before weekdays (i.e., Sunday through Thursday).

I have written about this issue before, and Karl's comment got me to go back and re-read my own previous posts on the subject. One example involves a Newsweek poll from early October 2004, conducted immediately after the first debate, that showed John Kerry jumping to a two-point lead over George Bush (49% to 47%). I wrote:

We have good reason to be cautious [about this result], however, as Newsweek did most of its interviews on Friday night and Saturday afternoon. According to the Newsweek release, it conducted the survey "after the debate ended" with interviewing done Thursday through Saturday.

Because so many adults are away from home on Friday nights and Saturdays, very [few] survey organizations conduct polls on only Fridays and Saturdays. In my experience, weekend-only interviewing yields respondents that are more aware of current events and political figures even after demographic weighting...

To be clear, I am urging caution about these results, not disbelief. We can have much more confidence when we see results less reliant on Friday-Saturday interviewing. I'm assuming many, many new polls will follow in the next 72 hours.

Good advice then and now. Keep in mind that other surveys confirmed the Kerry trend picked up by the Newsweek poll, but most showed Bush still ahead, albeit by a narrower margin than before the debate.

But also consider this additional bit of data that I learned about in 2005:

MP has his own doubts about the reliability of weekend interviewing but was surprised to see evidence presented by the ABC pollsters at the recent AAPOR conference showing no systematic bias in partisanship for weekend interviews. The ABC pollsters looked at their pre-election tracking surveys conducted between October 1 and November 1 of 2004, and compared 14,000 interviews conducted on weeknights (Sunday to Thursday) with 6,597 conducted on Fridays and Saturdays. Party ID was 33% Dem, 30% GOP on weeknights, 32% Dem 30% GOP on weekends, a non-significant difference even with the massive sample size.
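The ABC comparison above can be checked with a standard two-sample z-test for proportions, using the figures as reported (33% Democratic of 14,000 weeknight interviews vs. 32% of 6,597 weekend interviews):

```python
import math

# Sketch: two-sample z-test on the ABC figures -- Democratic share of 33%
# among 14,000 weeknight interviews vs. 32% among 6,597 weekend interviews.
n1, p1 = 14000, 0.33
n2, p2 = 6597, 0.32

pooled = (p1 * n1 + p2 * n2) / (n1 + n2)              # pooled Democratic share
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(f"z = {z:.2f}")  # |z| < 1.96 means not significant at the 5% level
```

The test statistic comes out well under 1.96, consistent with the ABC pollsters' conclusion that the one-point difference is non-significant even at these sample sizes.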

So to be honest, I have no theory to offer regarding what impact weekend interviewing might have had on any of these surveys, except that caution is in order, as it should have been for yours truly last Friday. Ah...physician, heal thyself.

Again, we will have more new surveys to consider soon. So stay tuned.

"Professional pollster Mark Blumenthal started Mystery Pollster to provide better interpretation of polling results and methodology... offers much needed help to Political Wire readers" - Political Wire