Why follow polls?

August 5th, 2016, 10:00am by Sam Wang

Before the 2016 campaign season, I had reservations about restarting this site’s polling analysis. However, there was one big reason in favor of doing it. It has to do with your readership of the site – and how you can best influence the outcome.

The biggest reason not to re-start the site was the market for statistical politics, which looked saturated. In 2004, this strange hobby was made newly possible by an abundance of polls. Now there are many sites, most notably the NYT’s The Upshot, ESPN’s FiveThirtyEight, Electoral-vote.com, and HuffPollster (which provides our polling data). The Princeton Election Consortium may now seem redundant.

Yet that is not the case. Site traffic is the highest it has been since October 2012. Traffic for July 2016 grew over 50-fold – that’s a factor of 50 – over July 2012. Some of this arises from the bizarrity of this year’s campaign. But it feels like there’s something else at work too.

I think it might be the relative purity of PEC’s calculation. I try to make the snapshot relatively transparent; the code and data are open-source. The prediction is based on polls only, and is meant to be accurate and stable. Since the start, the November election win probability has hovered in the range of Clinton 80-85%. It isn’t designed to swing around every day; in my view, a poll-based “prediction” that moves around a lot is no prediction at all. In principle, the probability and electoral Meta-Analysis should be as boring to watch as possible – and of course eventually land on the outcome.

A practical reason to follow polls is that you can learn where to invest time and money. There is value in triaging races: some can be set aside because they are in the bag, others because they are hopeless. Instead, donors and activists should invest in races where the outcome is uncertain, i.e. those whose win probability is in the 20-80% range for either side.
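The triage rule above can be sketched in a few lines of Python. This is only an illustration: the race names and win probabilities are made up, and the `triage` helper is hypothetical rather than part of PEC's code. Only the 20-80% window comes from the text.

```python
# Hypothetical win probabilities (for one side) in made-up races.
# These numbers are illustrative only, not PEC estimates.
races = {
    "Race A": 0.97,
    "Race B": 0.55,
    "Race C": 0.08,
    "Race D": 0.20,
    "Race E": 0.72,
}

def triage(win_probs, low=0.20, high=0.80):
    """Split races into competitive ones and ones to set aside.

    A race counts as competitive when the win probability lies in
    [low, high]. With the default 20-80% window, the other side's
    probability (1 - p) then falls in the same range.
    """
    competitive, set_aside = [], []
    for name, p in win_probs.items():
        (competitive if low <= p <= high else set_aside).append(name)
    return competitive, set_aside

competitive, set_aside = triage(races)
print("Invest here:", competitive)
print("Set aside:", set_aside)
```

Races A and C fall outside the window (one in the bag, one hopeless for the side in question), so effort would go to B, D, and E.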

That’s where the left sidebar comes in.

For Democrats, the ActBlue site goes to a list of five candidates for Senate in close races. They are all women: Ann Kirkpatrick (AZ), Catherine Cortez Masto (NV), Deborah Ross (NC), Maggie Hassan (NH), and Katie McGinty (PA). Current polls put these races within five percentage points. All are within reach for either Democrats or Republicans. Depending on how these races fall, under current polling conditions Democrats would end up with 48 to 53 seats. So control of the Senate is up for grabs.

In addition, control of the House could go either way. So the Democratic Congressional Campaign Committee is on the list.

For Republicans, the logic is exactly the same. These races could go either way, and so I provide a link to the National Republican Senatorial Committee.

Note the absence of some high-profile Senate races, for example Wisconsin, Florida, and Illinois. For now, these races do not look competitive. I have also omitted downticket races such as North Carolina’s governor’s race. These are important, but in most cases, data is lacking.

I will close with a second reason to resist the siren call of this hobby. Work in the lab is going better than ever. We have exciting new projects in understanding the cerebellum’s contributions to cognitive and social function, including the neuroscience of autism. In addition, we are working on some nifty methods that may help researchers record neural activity with better resolution. All of that takes attention and time.

However, even if I am posting less often in the coming weeks, the automated calculation will continue to update multiple times a day!

128 Comments so far ↓

Off Topic–I am getting my PhD in Nutrition Science, and recently ran across some studies that relate to autism, in case you are interested. A published study reported that CNS lymphatic vessels have been identified (doi: 10.1038/nature14432), and I just attended a conference where Dr. Carolyn Slupsky (UC Davis) presented data from one of her studies, which found a change in the intestinal microbiome related to the pesticide chlorpyrifos that may be related to autism. I thought of you and your research. Is it possible that there is a gut/brain connection via the lymph system?

For me, it’s not the purity of the polls that keeps me coming back here every four years since 2008; it’s the fact that your commentary and analysis are substantively different from and often better than 538’s–especially so now that 538 is bigger than it was in 2008, and isn’t just Nate Silver blogging.

Also, hey, bonus news of the neuroscience world, which I left many years ago but remain interested in.

I hope you will find time to continue doing this. The analysis is first rate, and the writing clear and always providing useful insight. I’d echo Eric’s comments above about the “bonus” in following your work in neuroscience. Best wishes in all your pursuits.

This is a great site! Good to have multiple perspectives and models on this sort of stuff, and I think your approach comes across as level-headed. I’m glad you have the models set up to update even when you are busy, because even though stability in prediction is a good thing, I’m compulsive about checking!

Absolutely it is the straightforward *scientific* analysis without hidden, questionable “secret sauce” ingredients.

Furthermore – the transparency of analysis provides a clear on-going scientific test of what this approach can accomplish. No site with hidden analysis techniques can provide that. It would be a loss if this site were not run for every Presidential election year, and also a (lesser) loss if it was not run for each mid-term. In this sense, the market cannot become “saturated” for PEC since we need PEC to provide a standard against which all other sites are measured.

Another data-driven fan, I see. Losing the PEC would be a great sorrow. Nate has gone a bit too commercial, I fear, and too slick. E-V is the only peer in the group where the analysis is done for the pleasure of understanding what’s going on.

I like that both of them only use state polls in their electoral map calculations. Electoral-vote.com’s algorithm tends to produce swingier results early in the campaign, since they only use the single most recent poll or the mean of the past week’s polls if there are some. But it is simple and transparent.

Their commentary does tend toward conventional-wisdom punditry, whereas Sam tends to hang back from that sort of thing unless something really extraordinary is happening.

Indeed. The “rigging issue” is yet another reason for PEC to continue.
Not only is PEC the best source of truth, but its value increases as more national races are so analyzed.
If what’s needed is funding or Princeton volunteers or something else, let us help you continue this in addition to the important work on the brain.
A rational look at the election is counterintuitively straightforward in a world of bias, and bias is comprehension’s worst enemy.
PEC might have more value than you suppose!

The most recent NH polls were wildly divergent from each other; there haven’t been many, and the latest few go back to Trump’s convention-bounce period (during which there was a single poll showing Trump way ahead) and the period before that. In a situation like that, the average you get will depend heavily on your averaging technique.

Most likely, New Hampshire will not look so weird any more once there are a few more polls for PEC’s model to average.

I am not sure how much confidence one can have in the NH polls. A sample of 532 is small, and it just uses adults. NH’s population is 1,330,608, and there are 916,808 registered voters.
A 95% confidence level with a ±5-point margin of error needs a minimum sample size of 384 (for registered voters). Not sure what it would be for adults.
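The sample-size arithmetic in this comment can be checked with the standard textbook formulas. The sketch below assumes simple random sampling and the worst-case proportion p = 0.5, which real polls only approximate. The function names are hypothetical, and the only inputs taken from the comment are the 532-person sample and the 916,808 registered voters.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level

def required_sample_size(margin, population=None, p=0.5, z=Z_95):
    """Sample size needed for a given margin of error (as a fraction).

    Uses n0 = z^2 * p * (1 - p) / margin^2, with an optional finite
    population correction n = n0 / (1 + (n0 - 1) / population).
    """
    n0 = z**2 * p * (1 - p) / margin**2
    if population is None:
        return n0
    return n0 / (1 + (n0 - 1) / population)

def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error (as a fraction) for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

registered_voters = 916_808  # figure quoted in the comment above
n_needed = required_sample_size(0.05, population=registered_voters)
print(f"Needed for +/-5 points: about {round(n_needed)}")
print(f"Margin for n=532: +/-{100 * margin_of_error(532):.1f} points")
```

Under these assumptions the 384 figure holds for a population of that size, and a sample of 532 corresponds to a margin of a little over ±4 points. For populations this large the required sample size barely depends on the population, so the answer for adults would be essentially the same.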

I think sample size is less of an issue here than the fact that these are mostly old polls.

7/21 was in the midst of Trump’s convention bounce period following Hillary Clinton’s toughest couple of weeks of the campaign. And even given that, the “InsideSources/NH Journal” result does seem like an outlier. Sam’s method of dealing with outliers is simply to use median-based averaging, which blunts their effect. But in this case it puts old tied polls at the median.

You too? I’m always torn between increasing the visibility and influence of Sam’s great work and inadvertently increasing the chances of losing one of the last civil and sane refuges where data still has meaning.

On a side note, as another 8-year fan I’d sure love to see some of these accolades translate into making that thermometer on the left rise a little more quickly! (for the party of your choice, of course)

I tell people too. And brought John here. Thank you again, Sam, for providing truth and explication.
PS Though I’ve been reading the suggested articles to understand why the EV estimator’s projected red line still goes so far down when only the grey area goes near that far, not the black line which signifies the median, I’m not clear on it. And if the lower red area should match the grey, why not the upper area as well? If McMullin wins a lawsuit allowing him to appear on most state ballots, or otherwise Trump loses popularity, wouldn’t the upper edge of the red prediction line become taller? Or from now on is the line expected only to contract, partly because debates (lacking formal debate rules/ fact checking) favor the GOP? How can we influence the format– a whitehouse.gov petition?

Thanks for PEC Dr. Wang.
I’ve been a reader/lurker for many years and recommend your site to everyone who desires a reliable prediction of what might be the election outcomes. Please continue your good work and I am so glad you are back this year because of the “bizarrity” of the election. You even teach us new words!

The median estimators are indispensable imo…been following since 2008…missed 2004. Your work is “very” much appreciated. Statistical data is about the only safe harbor from serial irrationality, prevarications and what has become one of the more bizarre presidential elections in history.

This site makes a big contribution to the discussion of the political races. AFAIK it was the first site to point out how, in a race with many entrants, the rules of the Republican primary allowed someone like Donald Trump, with a plurality of support, to gain a majority of the delegates. The idea was then picked up by others.

The Upshot at the NY Times, which to me seems modeled on this site, is pretty good. Imitation is the sincerest form of flattery.

That PEC is a hobby, as opposed to your profession, is a big part of the appeal to me. PEC has no need for a clickbait-y “nowcast,” no need to game out 1% hypotheticals. A calm, reasoned look at the polls and making predictions based on state-based claims as opposed to endorsements or the like is exactly what this cycle needs. Thank you so much for continuing on this year, it’s very appreciated.

I certainly hope you keep doing this! I follow your analysis for two reasons. One, because it’s better than the rest. I haven’t forgotten that you were more accurate than Nate Silver in 2008, even though he gets all the press. Accuracy matters. And two, because as you say, your polling is boring to watch: it is very easy to get all worked up, listening to mainstream media’s moment-by-moment analysis of presidential races. But it’s very stressful, and in my view, not at all healthy. PEC kept me sane in 2008, 2012, and again this year; I know I can ignore all the bloviating in the press because you’ve got the real pulse on the race.

I’d like to echo all the praise for your work on this site, Sam. But I have a substantive question. I had an argument with a friend about how big a swing the Democrats need in order to capture the house. He was arguing something like 20-30 points, due to gerrymandering, whereas your House graph has a threshold around 4-5 points. That estimate is based on someone else’s analysis around the 2014 elections, if I recall correctly. Since you’ve worked on the gerrymandering problem, is there a way to be more rigorous about how large a D advantage would have to be to offset the gerrymandering?

Not that this is particularly rigorous, but the way gerrymandering works isn’t by creating enormous advantages in districts, it’s by creating *small* advantages in as many districts as possible – making 4 districts 55-45 in favor of R while the 5th district is 70-30 in favor of D, for instance.

I don’t know exact thresholds, I assume it varies a bit from state to state, and each district has presumably had a tiny amount of drift … and candidate quality might even matter on the margin.

But in general, I would expect a “small” overall swing of around 5 points to be about what it takes to overcome gerrymandering – and create an enormous wave in the process – as opposed to an enormous one. (If it would take an enormous swing, the gerrymanderers could have gotten more seats by distributing voters a little more evenly, instead.)
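The five-district example in this comment can be worked through directly. The sketch below is a toy model, not an analysis of real districts: it assumes equal-sized districts, a two-party vote, and a uniform swing applied to every district, and `seats_won` is a hypothetical helper.

```python
# Democratic vote share (in percent) per district, from the example
# in the comment above: four 55-45 R districts and one 70-30 D district.
d_share = [45, 45, 45, 45, 70]

def seats_won(shares, swing=0):
    """Count districts won after adding a uniform swing (in points)
    to every district's vote share; a win needs more than 50%."""
    return sum(1 for s in shares if s + swing > 50)

statewide = sum(d_share) / len(d_share)
print(f"Statewide D share: {statewide:.0f}%")
for swing in (0, 3, 6):
    print(f"Swing {swing:+d}: D wins {seats_won(d_share, swing)} of {len(d_share)}")
```

In this toy map the statewide vote is an even split, yet Democrats hold one seat in five. Any uniform gain of more than 5 points in vote share flips the other four at once, which is the sense in which a modest swing can produce a wave.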

538 is basically playing games with numbers. Electoral-vote.com is decent. I believe they use means instead of medians, but the big difference is that they are much more pundits than analysts (good ones in my view).

HuffPost I have never investigated much. RealClearPolitics has good numbers, but their articles get in the way.

You provide rigorous analytics and a clear methodology and a level of conversation well beyond what any of the other sites provide.

I wish your prediction said Sanders was winning the race right now, but your state predictions and meta margin are the most accurate around. Sometimes truth has to trump desire.

So far, the Black Swans have all been kicking sand in Trump’s face. Fears that terrorist attacks would favor Trump failed to play out. The conservative candidate may in general benefit from the fear generated by terrorist attacks, but that presumes a baseline level of competence; Trump does not project a reassuring fatherly presence.
My guess is also that anyone who thinks Trump would deal better with an economic crisis is already voting for Trump.

I am still in shock over the recent polls in GA, though I am not sure that the current trend will hold up over time. Ultimately it will come down to whether or not the Black and Hispanic voters actually go to the polls in GA.

Prof. Wang, I just want to say that I appreciate the work you have done on this site! I follow it often and usually learn something new when I do. I am a practicing statistician who has been doing election modeling for quite a while. But, until I learned about your site, I had only been using means, simple regression, and so forth. Consider me a happy convert and disciple, as I have adopted use of medians.

Hi Sam,
First, thanks for all the work you do both in maintaining the site, and in making it clear and accessible even to those of us without statistics backgrounds. This is a great service.
I’m probably missing something simple here, but while the upper and lower bounds of the prediction ranges for the meta-margin, and for the wider range for the EV, move pretty much symmetrically, the upper bound for the red zone of the EV estimator is pretty much flat. Why is that?
Much appreciated!

This hobby of yours provides such an oasis of reason and calm to the larger world – thank you for your commitment to it. I always tell people who are freaking out about the possibility that Trump will be elected that so long as Sam Wang’s analysis isn’t showing that as a serious likelihood, I’m not jumping on the Freak Out Train, no matter what the political media has to say.

Linzer’s opening statements are interesting: he’s been less bullish on a Clinton win than most analysts all year, on the basis of fundamentals models. But he’s stating there that the chances of a President Trump are now about 25%, which honestly isn’t *that* far off from everyone else, on the level with which you can make these kinds of predictions.

How did Florida move to a +2 Clinton? Don’t understand how that could be the median from what I’m seeing at Huffpost Pollster. Latest poll is +1 Clinton, penultimate +6 Clinton and the earliest one (over a week old) +7 Clinton. What could I be missing here? Thanks!

PEC is drawing on (I think) a Pollster feed that generally uses the results on the three-way or four-way race (Johnson and perhaps Stein included) when available. What you posted were the two-way race results in the latest Florida polls. With Johnson and Stein included the results are Tied, Clinton +4, and Clinton +5.

In addition I think that PEC is also adding the JMC Analytics Trump +5 poll into its mix, since its polling dates are about the same as those of the oldest of the three polls you noted (the NBC/WSJ/Marist one). (When there’s a tie for the third oldest poll, all of the polls in the tie are included.) Using those four polls you get a median of Clinton +2.
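The median described here is easy to verify. The snippet below is just a check of the arithmetic using the four margins listed in this comment thread; it is not PEC's actual script, and the sign convention (positive for Clinton, negative for Trump) is an assumption made for the example.

```python
from statistics import median

# Clinton-minus-Trump margins, in points, as listed in the comments
# above: Tied, Clinton +4, Clinton +5, and Trump +5 (JMC Analytics).
margins = [0, 4, 5, -5]

fl_median = median(margins)  # even count: mean of the two middle values
print(f"Median margin: Clinton {fl_median:+g}")
```

With an even number of polls the median is the midpoint of the two middle values, here Tied and Clinton +4.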

I’m sorry I was unclear. I meant the CSV file here at the PEC of state polls (http://election.princeton.edu/code/data/2016_StatePolls.csv). I have no idea what’s happening behind the scenes, but I am surprised that the Huffpost website has the polls I mentioned while the PEC apparently and for whatever reason does not have them yet.

Thanks David. Looking at the PEC csv file you link to, appears as if something’s mangled, either in the scraping or in the data Huffpost pushes out. For the most recent FL polls, see two Quinnipiac polls with the same info (dates, pop, LV, live calls, etc.) outside of the results (+1 Clinton and tied). Ditto two Suffolk polls. Unless I’m missing something, I assume this can’t be right. Also, the median for these four polls would be +2.5 Clinton and not +2 Clinton.

In no way do I understand everything you write about. But I regularly check this site because:
It’s the purity of the process/analysis.
The math info that I don’t always grok but feel like I come away with a better big picture understanding of how polls *should* work.
The comments which are civil and illuminating.

To quote someone on FB today: this site is my non-prescription Xanax!!!

Thanks for keeping this going. People come to me asking my opinion of a candidate and their chances of winning. I think it wastes our time. I refer them to your site and tell them about PEC’s accuracy in past presidential elections. The fervent inevitably ignore me and the concerned thank me. Keep up all of your good work.

“A practical reason to follow polls is that you can learn where to invest time and money.”

Thank you for that. Nice. It’s probably my lack of understanding but when I go to ActBlue I never see this type of information laid out. Seems like everything is devoted to funding ActBlue infrastructure. That’s fine but not the same.