Today on The Takeaway: The Home Stretch

November 7th, 2016, 8:33am by Sam Wang

Today on The Takeaway with host John Hockenberry: How certain is the Presidential race? What about the Senate? Who picks the bug? The show airs nationwide at various times starting at 9:00am Eastern (I’m scheduled at 30-40 minutes after the hour). Find a radio station near you, stream at thetakeaway.org, or listen to the segment here.

The House generic preference has had an overall downward trend since July, accentuated by the latest polling swing. This is in contrast to the apparent tendency of the presidential and Senate meta-margins to hew to their means and the strong positive trend exhibited by the Obama net approval rating. I have not tried to fit a trend line to any of these. Perhaps I am missing something subtle. Thoughts?

I have no idea what is going on with that, to be honest. But just to throw out some information, I looked at a couple of recent polls where Clinton had a small lead and the Republicans were winning the generic House ballot (Bloomberg and Franklin Pierce), and crosstabs were available.

There are some funny aspects to the demographic breakdowns–for example, women were only 51-49 in Bloomberg, and only 48.5-51.5 in Franklin Pierce–the first is probably going to be a bit low, and the second is way low (to steal a line, I’ll eat a bug if more men than women vote in 2016). Bloomberg was also 70% white, and Franklin Pierce was 76.3% white. Again the first is probably a bit high, and the second is way high (again, bug-worthy). Finally, the non-white margin was only +27 in Bloomberg, +28 in Franklin Pierce–both low.

I don’t want to start “unskewing”, but it is interesting to me that both of these polls produced a Clinton lead despite all that. Anyway, I wonder if something strange is still happening with the samples in these polls.

It also doesn’t square well with party favorable ratings: -2% for the Dems and -22% for the Reps according to Pollster. Incumbents do have an advantage, and I don’t know if these polls even consider being a voter a criterion.

Sam,
Re: the Congress and gerrymandering: When I first started coming to this site a few months ago, I thought I saw something indicating that there’s a push toward no longer being able to gerrymander congressional districts. And, I thought I had seen a link on the left sidebar somewhere on this site where a reader could go to register his/her desire to stop congressional district gerrymandering. Is this the case or was I just imagining or wishfully thinking? With Congressional districts having been gerrymandered in 2010 to give Republicans the advantage, it seems like Democrats won’t be able to regain the House majority unless gerrymandering of Congressional districts is disallowed. But, in a given presidential cycle, if there was a strong trend toward the Democrat and enough House seats up for grabs, couldn’t the Democrats theoretically regain a majority despite gerrymandering? Right now, it appears that while Democrats won’t gain a majority in the House in this election, they may gain several seats to at least bring a little more balance to the House. Or, am I misreading some of the polls, such as the HuffPost pollster?

I’ll be listening to the Takeaway on our local national public radio station (NPR) throughout the day today, as I have been doing throughout this election cycle. Hoping to catch your comments, Sam. Thanks for the heads up that you’ll be on today.

Sam – A (perhaps naive) question about your forecast: why do the scenarios in the “All possible outcomes” electoral vote histogram in which Clinton gets fewer than 270 EV clearly include more than just the 2.5% probability of the left tail, while your overall forecast for Clinton is >99%?

I’m feeling unreasonably nervous about that having weird and unpredictable psychological effects back on the populace, though I suppose it’s still obscure enough at the moment that it won’t do much damage.

But I think they’d argue it’s better than relying on things like the questionable exit-poll leaks in 2004.

One hypothesis: we’re seeing the incumbency effect in action. Early on in the cycle, the respondents to the polls answer in the theoretical, generic sense. Late in the cycle, the respondents become aware of who’s actually on the ballot and respond with a specific person in mind. So for those in an R district who would prefer to vote for a D, the reality is they often have no choice or the challenger is particularly weak. Back in the summer or early fall, they might not have known that. 1-2 weeks before the election, they’re much more aware.

I have no data to suggest this is what’s actually going on, but it makes intuitive sense to me.

Another common narrative is that as Clinton becomes more likely, voters move to split ballots in order to keep Clinton in check. This strikes me as less likely given the analyses which suggest voters split their tickets less often now than they used to.

The FBI announcement late Sunday night is positive for team blue, but will not be reflected in state-level polls soon. Not sure how many one-day polls will be done for tomorrow. Even if they are, there is not much of an incentive to publish them on Tuesday; they will be swamped by election day coverage: “Lookee here! Long lines!” “Look here, some random guy who appears intimidating!”

It will be big. But not as big as it otherwise would have been had Mr. Comey not decided to play kingmaker over what wound up being nothing. After this is all over, we need a serious review of FBI procedures. A very serious review.

The Republicans used the FBI through their Congressional oversight authority to try to torpedo a political opponent. And the director of the FBI allowed it to happen. He is either stupid, incompetent, or simply does not care about the reputation of the department that he has been charged to oversee.

If you want to blame someone, Blame Obama. He is in the direct chain of command to the director of the FBI. He allowed such a man to become FBI director. So it’s on him. And it’s up to him to fix it before Hillary becomes President.

I’m running a polling place here in San Francisco tomorrow. I’ll be there from 6 am to 10 pm if not later. I want to follow the election during the day but I can’t play audio. Any suggestions for where to find coverage on my smart phone that doesn’t involve audio?

The polling data in NH shows negative impressions of both major party candidates, but especially Trump, that are unprecedented in Presidential election polling data. It could be there’s an unusually large number of undecided voters, especially Republicans, who can’t decide who they want to vote against the most.

A simple explanation could be that the sample sizes in the NH polls are smaller than for other states. I don’t know if the difference is large enough to account for the observed variability, but a quick look at the polls at Real Clear shows samples from NH that are mostly 500-700 likely voters, while polls in other states almost all include more than 800.
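To put rough numbers on the sample-size point, here is a minimal Python sketch of the textbook 95% margin of error for one candidate's share. It assumes simple random sampling and a share near 50%, and it ignores weighting and design effects, which real polls have. The function name `margin_of_error` is my own, not anything from the pollsters.

```python
import math

def margin_of_error(n, share=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for one
    candidate's share under simple random sampling (an assumption)."""
    return 100 * z * math.sqrt(share * (1 - share) / n)

# Sample sizes in the range mentioned in the comment above.
for n in (500, 700, 800, 1000):
    print(n, round(margin_of_error(n), 1))
```

Under these assumptions the margin of error is roughly 4.4 points at 500 respondents and about 3.5 points at 800, so smaller samples add some noise, though perhaps not enough on their own to explain very large swings.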

The really pro-Trump polls seem to be from earlier in the week than the really pro-Hillary polls, for the most part–but also, they’re from different pollsters that might just have very different house effects. It seems like there’s a bimodal distribution.

Similarly in the Senate race, today’s snapshot shows Dems with about a 70% chance, but the projection for tomorrow is 79%. Surely the Bayesian model can’t be predicting such a large movement towards Dems in the polls over the next 24 hours?

Can you explain further – I’d love to understand it better. The thing is, there’s only one day left, so I don’t see how either random drift or Bayesian drift has enough time to change, e.g., 70% to 79%.

Hmm, maybe I see it now. Sam calculates several things: (1) the current EV/Senate snapshot based on polls, (2) using the snapshot, the meta-margin, and (3) a “typical” relationship between (1) and (2), based on the history of the race.

It looks like the Nov projections are based on drifts in the meta-margin, which are converted to Clinton/Dem win probabilities using the “typical” relationship above. Very sensible weeks or months out. But when we are this close to the end of the race, there simply isn’t enough time for the meta-margin to drift very much, so the discrepancy I see must be due to the fact that the current meta-margins are somewhat atypical. I.e., the meta-margins are a little lower than you would expect, based on the median EV and number of Dem Senate seats.

So this is another way (beyond the bayesian drift) that Nov win probabilities push things back towards a “regression to the mean”.

I asked this same question on the 99% thread. This was Sam’s response:

“The histogram’s not used to estimate win probability. Think of the width of that histogram as showing the standard deviation (spread of all possibilities) as opposed to the standard error (how well we know where the midpoint of the histogram is).”

FWIW, I do not believe it makes sense for the random-drift algorithm to reduce Trump’s win probability from what it is today according to the EV histogram, but that is apparently what the random-drift algorithm does (even with zero drift time).

The Bayesian algorithm has a regression-to-the-mean effect, and since HRC’s lead was greater on average than it is now, it does make sense for the Bayesian algorithm to reduce Trump’s chances from what they are today.

1) “Sam’s argument is that the fringes of the EV histogram aren’t telling us how likely those outcomes are because they don’t take [correlated and uncorrelated errors] into account.”

The fact that uncorrelated error “cancels out” does show up in the histogram. That’s why there is a bell-curve shape, and the wings have small probabilities. The wings represent the tiny, but non-zero, chance it won’t cancel out; the middle of the bell curve shows the higher chance it does cancel out. I don’t think we can collapse this distribution to the median; there is data there.

Correlated error is handled by the drift prediction, didn’t really think of it that way, but that’s what it does!

2) “The error of the median is hard to estimate, and Sam probably chose too restrictive of an error value this year. If he’d chosen the error that he now thinks he probably should have used, the Trump win % would be more like 99% prediction deserves”

I think Sam wanted to change the error in the drift value (0.8%) from yesterday to today (correlated error), not the polling error uncertainty in the median of MM (uncorrelated polling errors). Both of those numbers can be calculated from the data: the standard deviation of the daily differences in MM gets you the drift error, and the polling uncertainty gets you the error in MM.
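As a rough illustration of the first of those two calculations, the sketch below takes a series of daily meta-margin values and returns the sample standard deviation of the day-to-day changes. The series is made up for illustration and the function name `daily_drift_sd` is hypothetical; this is not taken from the site's code.

```python
import statistics

def daily_drift_sd(meta_margins):
    """Sample standard deviation of day-to-day changes in a
    meta-margin series (one value per day, in percent)."""
    diffs = [b - a for a, b in zip(meta_margins, meta_margins[1:])]
    return statistics.stdev(diffs)

# Made-up daily meta-margin values, for illustration only.
mm_series = [3.0, 3.2, 2.9, 2.7, 2.8, 2.5, 2.6]
print(round(daily_drift_sd(mm_series), 2))
```

Note that day-to-day changes in a polling median mix true movement with sampling noise, so a number obtained this way is at best an upper-ish estimate of real drift.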

I assumed there was a regression assumption based on what I’ve read, but I may be wrong. Perhaps I shouldn’t have commented with such certainty :)

The caveat here is that I haven’t taken a long look at the code and/or all the assumptions that go into it. Though my understanding of it is that it’s actually quite basic yet effective. It finds a standard error and calculates a probability.

Now we ask whether it’s unreasonable that the probability shifts by 0.09 in one day. There is the polling error; the SD between the final polls and the final results is probably not that small. But let’s assume that it’s only half a point or so.

So the question is, I think, whether or not a roughly 1 point change between the polls and the final result could produce a 9 point shift in the probability. I think it is most definitely possible.

Maybe I’m misunderstanding things here. But I don’t see what is so puzzling.

The 0.8 value is definitely the estimated likely error in the meta-margin. The drift value per day is much much smaller than that and is negligible at this point. It is estimated from the historical performance of the final predicted meta-margin as compared to the actual meta-margin of the election, but the number of elections is too small to generate much confidence in the 0.8 value or to give much confidence in the correct shape of the distribution.

Thanks – I still want to understand where the uncertainty in MM originates in the code. The only place there is uncertainty in MM is in EV_predictor.m, and all of it is given in a few lines:

That’s it. The drift value per root day is 0.4. There is nowhere else that adds error, and the prediction is taken right after it.

That’s why I’m thinking he’s referring to drift per day as 0.8 (which is 2x the 0.4 that is in the code, maybe because of some MATLAB definition of sigma), but if you could point to how error in MM is added, it would be helpful.

Scott H, good catch. You are correct that this is the relevant part of the code.

This code specifies that the minimum drift is 0.5%. The 0.8% was from a previous alpha version that we’re not using. I apologize – I will need to fix this post.

I never actually thought that this parameter would become important, which is why I remembered an older value. To tell the truth, I always viewed the home stretch as incidental. This attention arises from the presence of FiveThirtyEight in the news…I wonder if we are better off without the comparison.

I hadn’t read through the code, so you are right that 0.4 is the drift factor (with the limitation of only growing with the square root of the time until the election and not being allowed to grow past 3, which is how it can be that large of a value), but there is also the max(MMdrift,0.5) in the following line (line 12), which enforces a minimum uncertainty of 0.5 on election day. So it looks like it is actually a final likely error estimate of 0.5% rather than 0.8%. I’m not seeing anything weird in how MATLAB handles the tpdf function and standard deviations that would justify that 0.4 (or 0.5) being effectively doubled to 0.8 (or 1). Line 18 normalizes the MM – MMdrift values so that 1 MMdrift = 1 sigma, and a plot of now vs Mrange pretty clearly shows 0.5% as 1 sigma for today.
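For readers without MATLAB, here is a minimal Python sketch of the rule as described in this thread: drift of 0.4 per root day, a cap of 3, and a floor of 0.5. It is my reading of the comments, not a port of EV_predictor.m; in particular I am assuming the cap applies to the drift itself. The thread mentions tpdf, i.e. a t-distribution, whereas this sketch swaps in a normal distribution so it needs only the standard library. All function and parameter names are hypothetical.

```python
import math

def mm_sigma(days_to_election, drift_per_root_day=0.4, floor=0.5, cap=3.0):
    """Uncertainty in the meta-margin (percent): square-root-of-time
    drift, capped at `cap`, never below `floor` (assumed reading)."""
    drift = min(drift_per_root_day * math.sqrt(days_to_election), cap)
    return max(drift, floor)

def win_probability(meta_margin, sigma):
    """P(final margin > 0) under a normal error model. This is a
    simplification; the thread says the actual code uses a t-distribution."""
    return 0.5 * (1 + math.erf(meta_margin / (sigma * math.sqrt(2))))

for days in (0, 1, 7, 30, 100):
    print(days, mm_sigma(days))
```

With a floor of 0.5, the drift term only matters once 0.4 times the square root of the remaining days exceeds 0.5, which is why the floor rather than the drift sets the uncertainty on the eve of the election.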

Is there a way to use the data on who turned out in early voting to improve the forecast? While these data become available late in the game and don’t provide data on how people voted, they seem to be a very valuable source of information. Any thoughts on how the 2020 model could have an early-voting data strike-date that helps reduce uncertainty on the eve of the election?

I went to 538 a few days ago because some here were talking about a ‘twitter war’. Anyhow, looking at his tipping point chart, he had NH at the center. Not only that, he had it as the last state Clinton won, meaning if Trump took it he would win the election. Just today he finally added NC and Florida to the Clinton column. I have no idea what polls he was looking at previously to keep Florida as Trump.

To be cynical here, I remember when I did read the site regularly he would call out some pollsters for having a biased model and then adjusting it to the norm just before the election so they could claim their polls were accurate. Just saying….

Bonus, those two states only bring a couple of points of certainty for a Clinton victory. 68.8% I think it needs a few more digits of precision to make me buy in. After all, I don’t know if he rounded up or down to that .8 ;)

In coding the probability of >270, I ran into the quandary of doing “simulations” vs. a closed-form solution. I went with the closed form because the computation doesn’t take that long when there are only like 15 uncertain states, which of course may not always be the case.
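One way to get an exact answer without enumerating every combination of states is to build the electoral-vote distribution one state at a time by convolution. The sketch below is not the commenter's code; it assumes each state is an independent win/lose event with a given probability, which ignores correlated polling error, and the names `ev_distribution` and `prob_at_least` are my own.

```python
def ev_distribution(states):
    """Exact distribution of a candidate's total electoral votes.

    states: list of (electoral_votes, win_probability) pairs, treated
    as independent (an assumption). Returns dist where dist[k] = P(total == k).
    """
    dist = [1.0]  # before any state is added, the total is 0 with certainty
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for total, prob in enumerate(dist):
            new[total] += prob * (1 - p)   # candidate loses this state
            new[total + ev] += prob * p    # candidate wins this state
        dist = new
    return dist

def prob_at_least(dist, threshold=270):
    """Probability that the total reaches the threshold."""
    return sum(dist[threshold:])

# Toy example with two hypothetical states worth 3 and 4 EV.
toy = ev_distribution([(3, 0.5), (4, 0.5)])
print(prob_at_least(toy, 4))
```

The work grows with the number of states times the total electoral votes rather than doubling with each added uncertain state, so it stays fast even when every state is treated as uncertain.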

Anyways, I coded up a simulation method, and used the probabilities for a given candidate in each of the 51 districts as input. Basically just looped through them all, generated a “random” number between 0-1, checked if it was greater than the district probability and summed the EVs. I did this 10000 times, and counted the times the sum was above 270. That’s my simple methodology.
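A minimal Python sketch of that kind of simulation follows. It is an illustration, not the commenter's code: the function name and parameters are hypothetical, states are treated as independent, a state counts as won when the random draw falls below its win probability, and the threshold is 270 or more.

```python
import random

def simulate_win_probability(states, trials=10_000, threshold=270, seed=None):
    """Monte Carlo estimate of P(total EV >= threshold).

    states: list of (electoral_votes, win_probability) pairs, assumed
    independent. A state is won when a uniform draw is below its probability.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        total = sum(ev for ev, p in states if rng.random() < p)
        if total >= threshold:
            wins += 1
    return wins / trials

# Toy input: one hypothetical block of 270 EV that is a coin flip.
print(simulate_win_probability([(270, 0.5)], trials=10_000, seed=1))
```

The estimate carries sampling noise of about sqrt(p(1-p)/N), which is at most roughly half a percentage point at 10,000 trials, so run-to-run differences of a point or more are to be expected at that size.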

538 says they simulate the outcome 10,000 times.

Based on my little sandbox, 10,000 iterations yielded about a 3% range of outcomes. I determined that 2^24 (I think it was 24) was a good trade-off between speed and accuracy, and it still took under 5 seconds.

I’m also a refugee. There are plenty of tabloid headlines on other news sites, but I find it reassuring to look at actual facts, computed by actual scientists using actual algorithms. So do a lot of people. 538 has become another junk heap, with scary headlines, sports predictions, etc. They shouldn’t have diversified.

I want to eat less than 0.5 bug per lifetime of doing this. When the probability of Trump greater than 240 EV ticked up after the first Comey announcement, that made me wonder if I was using up more of my probability allotment than planned.

Nice interview. I saw what Hockenberry was trying to do there. Now that the presidential horserace is less exciting, the media is trying to talk up the poll-aggregator horserace, the 65-percenters vs. the 99 percenters.

Sam — No matter what happens tomorrow, in addition to your statistical wisdom, many thanks for providing the nudge that got us to open our wallets and throw money at the candidates in an efficient manner. I’m a Dem and your site encouraged us to give that staggering $360K+. And I presume some Repubs gave through NRSC also. Amazing.

Question, Sam: On your final podcast before election day you said to watch the margin of victory in New Hampshire and North Carolina to get an idea of which direction the race was heading. At the time you said to use NH +5-6 for Hillary and NC +2-3 for Hillary as the baselines for comparison. Would you say that these are still accurate, or should they be a little closer given the tightening/wackiness of polling out of New Hampshire?

Huffington Pollster currently has NH at ~+3 Hillary and NC at ~+2 Hillary. Are these good numbers to use tomorrow? I ask because I intend to take your advice on what to watch tomorrow night.

I actually suspect neither of these may be as good indicators as they used to be–because the closing poll numbers in NH are completely incomprehensible, and Democratic early turnout in NC may be down relative to 2012 entirely because of the state government’s efforts to suppress the African-American vote. (But on the other hand, in the latter case, that may be accidentally baked into turnout models that were assuming lower AA turnout because of the absence of Obama on the ticket.)

Yeah I would say if FL ends up between even and Clinton +2-3 then the polls were pretty much correct (I count a polling error within 2% to be “accurate”, but that’s just my subjective take). Above 2-3% or Trump wins, the polls were off.

Remember, all, that the polls underestimated Obama’s percentage by 2% in 2012… the prediction was a two percent victory and it was actually four percent. That could be because poll aggregates include polls (as I believe they do this year) that are skewed Republican, or because of the Democratic GOTV effort or enthusiasm for President Obama. But it has been real in several recent elections. It would not be a surprise if the 3-4 percent prediction for Hillary becomes the median between the two Obama victories, which would be about 6 percent. And one wonders if that’s the current state of presidential politics in the nation: D +6%.

I think that’s the point of picking a close state or two to use as bellwethers for the polls. We don’t really have a way of knowing for sure (though there are obvious suspects) which polls are skewed or not, and attempting to guess and fix it brings us to the dirty business of “unskewing” them. But we can look at a state like Florida as the results start coming in and take the results in relation to the polls as an indication of where things may stand in terms of accuracy.

I also wanted to thank you, Dr. Wang. Your insistence on watching the data not the drama has been very reassuring. Even if Clinton loses, my wife and I will have had a few steady, non-neurotic weeks to look back on.

New state polls came out today. The polling was done by the little-known Trafalgar Group. It shows Hillary is down in PA, FL and MI. A quick survey of its Facebook page shows its strong association with the Republican Party. Is this some sort of lame attempt to fire up the Trump base?

I’ve been wondering if eventually we get to the point where the world gets spammed with so many tendentious polls, with the specific intention of messing with poll aggregators, that it just poisons them into irrelevance.

@Matt McIrvin I think it’s bound to happen to some extent, and it perhaps is already happening in the form of online polls that Trump touts in his rallies. Although, calling actual people is a lot more difficult than putting up an online poll and stuffing it with biased results.

I notice something interesting re: NH. Several polls in field around Oct 29 to Nov 2 show ties or slight Trump leads. A plausible theory is that there was a post-Comey non-response bias occurring. In a state with no early voting, such swings could have more of an effect than say NC or FL, which have had more stable polling. Another simpler theory is that a couple of R leaning pollsters happened to strike around that time.

Recently, there are four NH polls in field as late as Nov 6. These show Clinton +2, +6, +11, +11. A four day trailing poll median would have NH at +8.5, instead of the seven day which shows +1.
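The four-day figure can be checked with a short Python sketch of a trailing-window median. The function `trailing_median` is hypothetical, and the end dates attached to the four polls are placeholders I assumed (the comment only says they were in the field as late as Nov 6); only the four margins come from the comment.

```python
from datetime import date, timedelta
import statistics

def trailing_median(polls, as_of, window_days):
    """Median margin of polls whose end date falls within the last
    `window_days` days up to and including `as_of`."""
    cutoff = as_of - timedelta(days=window_days)
    recent = [margin for end, margin in polls if cutoff < end <= as_of]
    return statistics.median(recent)

# Margins (Clinton lead, points) from the comment; dates are assumed placeholders.
nh_polls = [
    (date(2016, 11, 6), 2),
    (date(2016, 11, 6), 6),
    (date(2016, 11, 6), 11),
    (date(2016, 11, 6), 11),
]
print(trailing_median(nh_polls, as_of=date(2016, 11, 7), window_days=4))  # 8.5
```

With an even number of polls the median is the average of the two middle values, here 6 and 11, which is where the 8.5 comes from. A longer window would pull in older polls and can move the median a lot when the recent and older polls disagree.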

This movement alone would give Clinton 272 EV at a poll median of at least 4, and thus a meta margin of 4.

I’m not suggesting a change to the model this late in the game, but it is striking that the difference between 3 days in one state can almost double the MM. Even on Dr. Wang’s more conservative estimates of polling uncertainty, a MM of 4 would seem insurmountable.