Thursday, November 01, 2012

Two Men Say They're Jesus, One of Them Must Be Wrong

One of the more interesting questions that will get resolved in this political cycle is whether or not Nate Silver has this election correct. He has remained quite bullish on Barack Obama's campaign and has been saying, despite what would seem to be a significant amount of countervailing evidence, that Obama's chances of winning are greater than 70%. As I write, he has it at 79% and has Obama winning 300 electoral votes.

The question is whether his model will work. Ace commenter Brian, writing over at his place, has confidence that it will. In the comments section over at his blog, he said the following:

Well, the way I see it, Silver's (apparently radical) approach to predicting who people are going to vote for is...looking at every possible data set asking people who they are going to vote for.

He nailed it last time around. If he does it again, a lot of people who run their mouths for a living--and who base their predictions on things like their memories and impressions of campaigns past, intangibles like "enthusiasm", etc.--are going to have to contend with the fact that their intuitions just don't seem to matter very much. And there will be strong empirical evidence of that.

I had taken the view that you have to pay attention to the behavior of the campaigns and, from my perspective, the Obama campaign has been acting in a way that is strongly reminiscent of the way the George H. W. Bush campaign was acting in 1992 -- lotsa personal attacks, anger and scrambling around. Meanwhile, Romney's campaign has been moving forward and contesting a number of states that had been assumed unavailable, including Minnesota.

In a longish but very good article over at RedState, Dan McLaughlin makes what I think is a very important observation about the limitations of Silver's polling, using an example that I personally experienced:

Mathematical models are all the rage these days, but you need to start with the most basic of facts: a model is only as good as the underlying data, and that data comes in two varieties: (1) actual raw data about the current and recent past, and (2) historical evidence from which the future is projected from the raw data, on the assumption that the future will behave like the past. Consider the models under closest scrutiny right now: weather models such as hurricane models. These are the best kind of model, in the sense that the raw data is derived from intensive real-time observation and the historical data is derived from a huge number of observations and thus not dependent on a tiny and potentially unrepresentative sample.

Yet, as you watch any storm develop, you see its projected path change, sometimes dramatically. Why? Because the models are highly sensitive to changes in raw data, and because storms are dynamic systems: their path follows a certain logic, but does not track a wholly predictable trajectory. The constant adjustments made to weather models ought to give us a little more humility in dealing with models that suffer from greater flaws in raw data observations, smaller sample sizes in their bases of historical data, or that purport to explain even more complex or dynamic systems – models like climate modeling, financial market forecasts, economic and budgetary forecasting, or the behavior of voters. Yet somehow, liberals in particular seem so enamored of such models that they decry any skepticism of their projections as a “War on Objectivity,” in the words of Paul Krugman. Conservatives get labeled “climate deniers” or “poll deniers” (by the likes of Tom Jensen of PPP, Markos Moulitsas, Jonathan Chait and the American Prospect) or, in the case of disagreeing with budgetary forecasts that aren’t really even forecasts, “liars.” But if history teaches us anything, it’s that the more abuse that’s directed towards skeptics, the greater the need for someone to play Socrates.

Consider an argument Michael Lewis makes in his book The Big Short: nearly everybody involved in the mortgage-backed securities market (buy-side, sell-side, ratings agencies, regulators) bought into mathematical models valuing MBS as low-risk based on models whose historical data didn’t go back far enough to capture a collapse in housing prices. And it was precisely such a collapse that destroyed all the assumptions on which the models rested. But the people who saw the collapse coming weren’t people who built better models; they were people who questioned the assumptions in the existing models and figured out how dependent they were on those unquestioned assumptions. Something similar is what I believe is going on today with poll averages and the polling models on which they are based. The 2008 electorate that put Barack Obama in the White House is the 2005 housing market, the Dow 36,000 of politics. And any model that directly or indirectly assumes its continuation in 2012 is – no matter how diligently applied – combining bad raw data with a flawed reading of the historical evidence.

Emphasis mine. As regular readers of this feature know, I was a program analyst for Bank of America during the mortgage boom of the mid 2000s, and for our line of business (corporate relocation) one of my tasks was to estimate the value of future business opportunities. We used a mathematical model that frankly green-lighted every venture that a sales manager proposed, because we assumed we'd get and convert a certain number of mortgages because, well, we would. We always had.

In 2005, the model worked very, very well. I left B of A in 2006 when the office relocated -- the office left me, I guess. In the subsequent years, many of my colleagues who went out to Oregon ended up losing their jobs there when the overall market changed and -- crucially -- when B of A acquired Countrywide.

The problem with the model we used in 2005 was that we didn't have enough historical data to test the validity of the model longitudinally. And we couldn't have seen that our parent company would make what in retrospect was a ridiculous mistake in acquiring Countrywide, which was hip-deep in the mess but looked like a going concern at the time.

I don't have a lot of time for this post, so I don't want to oversimplify things, but McLaughlin sums up what I think Silver's problem is going to be in this cycle:

Poll toplines are simply the sum of their internals: that is, different subgroups within the sample. The one poll-watchers track most closely is the partisan breakdowns: how each candidate is doing with Republican voters, Democratic voters and independent voters, two of whom (the Rs & Ds) have relatively predictable voting patterns. Bridging the gap from those internals to the topline is the percentage of each group included in the poll, which of course derives from the likely-voter modeling and other sampling issues described above. And therein lies the controversy.

My thesis, and that of a good many conservative skeptics of the 538 model, is that these internals are telling an entirely different story than some of the toplines: that Obama is getting clobbered with independent voters, traditionally the largest variable in any election and especially in a presidential election, where both sides will usually have sophisticated, well-funded turnout operations in the field. He’s on track to lose independents by double digits nationally, and the last three candidates to do that were Dukakis, Mondale and Carter in 1980. And he’s not balancing that with any particular crossover advantage (i.e., drawing more crossover Republican voters than Romney is drawing crossover Democratic voters). Similar trends are apparent throughout the state-by-state polls, not in every single poll but in enough of them to show a clear trend all over the battleground states.

If you averaged Obama’s standing in all the internals, you’d capture a profile of a candidate that looks an awful lot like a whole lot of people who have gone down to defeat in the past, and nearly nobody who has won. Under such circumstances, Obama can only win if the electorate features a historically decisive turnout advantage for Democrats – an advantage that none of the historically predictive turnout metrics are seeing, with the sole exception of the poll samples used by some (but not all) pollsters. Thus, Obama’s position in the toplines depends entirely on whether those pollsters are correctly sampling the partisan turnout.

Emphasis in original. This is why I've been squawking throughout this cycle about D +7 models: you have to assume that voter enthusiasm on the Obama side is at the same level as 2008, or even exceeding it, for the toplines to make any sense. Based on my instincts and nearly 40 years of observing these things, I don't see anything like that out there. Perhaps Obama's team can squeeze every conceivable vote out of his own coalition in numbers sufficient to counter the Republican efforts, and in sufficient numbers to counteract the evidence that independents are supporting his opponent. We'll likely know in six days.
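To make the internals-versus-topline arithmetic concrete, here is a minimal sketch. Every number in it is hypothetical and chosen only for illustration; none comes from an actual poll, and the function name `topline` is my own. It shows how the same internals can produce opposite toplines depending on the partisan mix assumed in the sample.

```python
# Hypothetical numbers throughout; none come from an actual poll.

def topline(party_mix, support):
    """Weighted sum of subgroup support, weighted by each group's sample share."""
    if abs(sum(party_mix.values()) - 1.0) > 1e-9:
        raise ValueError("party shares must sum to 1")
    return sum(party_mix[g] * support[g] for g in party_mix)

# Assumed share of each group backing each candidate (the "internals").
obama = {"D": 0.92, "R": 0.06, "I": 0.43}
romney = {"D": 0.06, "R": 0.92, "I": 0.51}

# Two assumed samples: one D +7, one with even party identification.
d_plus_7 = {"D": 0.38, "R": 0.31, "I": 0.31}
even = {"D": 0.35, "R": 0.35, "I": 0.30}

margin_d_plus_7 = topline(d_plus_7, obama) - topline(d_plus_7, romney)
margin_even = topline(even, obama) - topline(even, romney)

print(f"D +7 sample: Obama margin {margin_d_plus_7 * 100:+.1f}")
print(f"Even sample: Obama margin {margin_even * 100:+.1f}")
```

With these assumed inputs, Obama leads in the D +7 sample and trails in the even one, although the internals are identical in both. The only thing that changed is the party mix.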

17 comments:

Anonymous
said...

Mark, I know you said you were not very convinced by the 'methodology' of unskewedpolls.com, but based upon your post today, it seems to me that your model, for lack of a better term, consists of your gut feelings and a reading that virtually all state polls are skewed left in the same manner that unskewedpolls thinks they are. Is that a correct interpretation?

2--Just to be clear, I think Silver's model probably works to the extent that Silver himself does: that based on the data at hand, there's a ~75% chance that Obama will win the election. That doesn't mean that he or I are asserting that the election is a done deal. A 25% (or 20%, or 30%, or whatever in that ballpark) chance of Romney winning is a very probable event. Less probable things happen all the time.

So if Romney wins, I'll be disappointed, and a little surprised. But I won't be shocked. And I don't think Silver will be either.

3--Criticizing the tendency of some liberal pundits to over-interpret or over-sell Silver's model is totally fair. Dismissing someone with a near-slavish devotion to empiricism as an ideologue (e.g., Scarborough) is not. That's really the kind of "backlash" to which Ezra Klein (and I) were responding.

4--It is worth asking why all the notable skeptics of the 538 model seem to be conservative.

First of all, the idea that "virtually all state polls are skewed left" is not what I'm saying. And I'm reasonably certain you know that.

And I'm also certain that you realize that state polls have varied in this cycle quite a lot; the Minnesota Poll here is an excellent example -- it's dubious in the extreme to assume that the Minnesota electorate has gone from D +13 to D +5 in one month. That just doesn't happen. Either one number is ridiculous or both are. I've lived in Minnesota for 20 years now and based on my experience, D +5 makes sense.

I don't have a "model," per se. But based on what I've seen, read and experienced, there's almost no way I can believe that a D +7 or D +8 model in most states, or nationally, makes any sense, especially given what happened in 2010.

If it isn't clear, my view is that modeling is only as good as the data that goes into the model and I think most of the data are suspect, especially in this cycle.

My guess is that party affiliation in the overall electorate is probably even or maybe D +1 in this cycle and that Republicans are significantly more motivated to get to the polls than the Democrats are. Couple that with the consistent polling results that indicate that independents are skewing toward Romney by anywhere from 8-15 points, and it's awfully difficult for me to see Obama winning.

To believe the polls you see, you'd have to assume that Obama is going to keep virtually everyone who voted for him in 2008, or that Democratic party identification has gone up in a significant way in the last four years. Do you honestly believe either of those things is true?

We'll have a lot of time to pick over the particulars later on, but I suspect that at least a few things will become evident when we do the post-mortem:

1) The response rate to polling in most cases is so poor that it renders a lot of polling significantly meaningless, because the pollsters are only reaching about 9% of the people they are intending to reach;

2) That a lot of people who don't intend to vote for Obama in this cycle aren't reflected in the polling; and

3) One of the missing pieces in the Republican coalition in 2008, evangelicals, will be back this cycle and that will make a difference in places like Ohio and Iowa.

For what it's worth, this post from Jim Geraghty might help you understand what's going on with the spreads between the state and national polls. Geraghty is a conservative, but he's mostly looking at numbers in this piece and what he's reporting goes a long ways toward explaining the differences.

Thanks. I do understand the distinction that you are making, but a lot of people that I talk to don't.

4--It is worth asking why all the notable skeptics of the 538 model seem to be conservative.

You're a scientist, so you're used to questioning your premises, but let's face it -- most people don't test models that tell them things they like or want to hear. That seems like the Occam's Razor reason to me.

Mark, I don't mean to put words into your mouth. I am just trying to understand where you are coming from. You dismiss the notion that virtually all state polls are leaning left. (When I describe them that way, I don't mean that they are intentionally skewed left. I am asserting that you must think that all or most state polls have an inherent left lean in this cycle that is not warranted). But there are a whole slew of numbers crunchers, Nate Silver being only one of them, who are coming to similar conclusions. In fact, Silver is one of the more conservative statisticians to model similar outcomes with the available data. Sam Wang at the Princeton Election Consortium has the probabilities of Obama winning at 93% (random drift) and 98% (Bayesian). And like Silver, Wang has a really solid track record. There are others too. Pollster, Nate Cohn, DeSart and Holbrook, etc. Even RCP, which has a decidedly right lean, is finding Obama to have an Electoral College advantage (with tossups). The reason they are all reaching the same conclusions is that virtually all state polls, when aggregated, point to Obama being in good shape for reelection.

State polls have been far more reliable than national polls in predicting both individual state outcomes and, when aggregated, national popular vote outcomes. There is always a chance that this will not be the case in this cycle, but given the persistence of the state numbers (not state vs. national) in the battleground states, unless we are looking at a case of systemic polling failure in the battleground states, it seems probable that Obama is going to win. Probable...as in a 3 out of 4 chance. The only thing I can see that would change that is systemic polling failure, meaning that not only are some polls overstating Obama’s numbers, but that almost all state polls have been consistently biased in his favor. And you are definitely right about one thing...if that is happening, we won't know till Election Day.

Could it be that this year, almost all the pollsters at the state level have massively whiffed? I suppose it is possible. You noted that survey response rates have fallen to historical lows, but I don't know why that would necessarily help Obama. It's fairly well known that surveys that use automatic dialers have the lowest response rates, and are barred from calling mobile phones. But this is believed to advantage Conservative outcomes. And although there are certainly house effects in the results of different polling firms...after all, every Likely Voter poll is a model of sorts...it seems unlikely to me that politically-leaning pollsters would intentionally distort their results to such an extent that they would discredit themselves. And the entire point of poll aggregation is to smooth biases across the board. Lastly, if, in fact, there is systemic failure in polling, wouldn't it be just as likely that polls are underestimating Obama’s vote share?
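The distinction between house effects and systemic failure can be sketched with a toy calculation. The "true" margin, the per-pollster leans, and the shared bias below are all assumed values, and `poll_average` is a made-up helper. This is not how any actual aggregator works; it only shows that averaging cancels leans that point in different directions but leaves untouched a bias that every poll shares.

```python
from statistics import mean

TRUE_MARGIN = 2.0                       # assumed "real" Obama lead, in points
HOUSE_EFFECTS = [2.0, -1.0, 1.0, -2.0]  # assumed per-pollster leans; they sum to zero

def poll_average(shared_bias):
    """Average of polls that each report truth + own house effect + a shared bias."""
    polls = [TRUE_MARGIN + h + shared_bias for h in HOUSE_EFFECTS]
    return mean(polls)

avg_house_effects_only = poll_average(0.0)  # offsetting leans cancel out
avg_with_shared_bias = poll_average(3.0)    # a common 3-point skew survives averaging

print(avg_house_effects_only, avg_with_shared_bias)
```

In this toy setup the first average recovers the assumed true margin, while the second is off by exactly the shared bias. Whether any such shared bias exists in this cycle, and in which direction, is the question being argued here.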

Could it be that this year, almost all the pollsters at the state level have massively whiffed?

It doesn't take a massive whiff to miss things. In fact, just a few points make a big difference. In a sample of 600 responses, 30 responses can swing things significantly.
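A quick back-of-the-envelope check of the 600-and-30 figure, assuming a two-candidate race and a simple random sample (real likely-voter polls are weighted, so treat this as a rough sketch only):

```python
import math

n = 600        # sample size from the comment above
switched = 30  # respondents who move from one candidate to the other

# Each candidate's share moves by switched / n; the gap between them moves by twice that.
share_shift = switched / n
margin_shift = 2 * share_shift

# Standard 95% margin of error for a share near 50%, simple random sample assumed.
moe = 1.96 * math.sqrt(0.5 * 0.5 / n)

print(f"share shift:  {share_shift * 100:.1f} points")
print(f"margin shift: {margin_shift * 100:.1f} points")
print(f"95% margin of error: about {moe * 100:.1f} points")
```

Thirty switched responses move each candidate's share by 5 points and the margin between them by 10, which is larger than the roughly 4-point margin of error a sample of 600 carries on its own.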

I suppose it is possible. You noted that survey response rates have fallen to historical lows, but I don't know why that would necessarily help Obama. It's fairly well known that surveys that use automatic dialers have the lowest response rates, and are barred from calling mobile phones. But this is believed to advantage Conservative outcomes.

That would have been true in 2004 or 2008, but I'm not convinced it's true any more. There are increasingly more people who have abandoned landlines and the political skew would be less pronounced now. The thought was that younger people eschew landlines and older people have them, but I know a lot of people in my demographic who don't any more.

And although there are certainly house effects in the results of different polling firms...after all, every Likely Voter poll is a model of sorts...it seems unlikely to me that politically-leaning pollsters would intentionally distort their results to such an extent that they would discredit themselves.

The problem with that theory is that only political junkies pay attention to who is doing what polls. You know the difference between Rasmussen and Zogby, but I'd wager that 90-95% of the electorate doesn't. And only political junkies keep score of such things over time.

Perhaps my experience with the consistently biased Minnesota Poll here has colored my views on this, but it seems to me that polling, especially when done by news organizations, is more about driving an agenda than about identifying trends. They've gone with Mason-Dixon this time and their second poll seemed a lot closer to reality than the first one, which had a D +13 spread. In the past, though, the Minnesota Poll always overcounted Democrats by substantial margins. If the Minnesota Poll were to be believed, we'd have elected Governor Skip Humphrey in 1998. He finished third.

And the entire point of poll aggregation is to smooth biases across the board. Lastly, if, in fact, there is systemic failure in polling, wouldn't it be just as likely that polls are underestimating Obama’s vote share?

Maybe. But again, what is the purpose of the poll?

What has been striking about this cycle is the disconnect between what I have seen the campaigns do versus what the polls say. The campaigns are constantly doing their own internal polling and they don't particularly pay a lot of attention to what the outside polls say. And if you watch what the campaigns have been doing, it's obvious that the Obama campaign has been scrambling to fix leaks in a variety of states -- Ohio, Pennsylvania, Iowa, Wisconsin. Romney's campaign, on the other hand, has been moving its resources into those states. Those sorts of things don't happen in a vacuum.

Remember, Silver made his bones in 2008 in large part because he had Obama's internal polling. My understanding is that he doesn't have that this year. It makes a big difference in what he sees.

I can also tell you this -- although it's anecdotal, I know plenty of conservatives who are refusing to answer polling. I know I'm one of them. With Caller ID I can see who is calling and I've been able to blow off polling that has been done. I took a call from a pollster who appeared to be calling from Salt Lake City one time and they have tried to call me back multiple times, but I ignore the call now.

I'm the first to admit there's a fair amount of supposition in my observations, but I've been watching these things since 1972. I've seen lots of losing campaigns over the years and this Obama campaign is behaving like a losing campaign. The only time I really couldn't tell who would win was 2004. It was obvious early on that Obama would win in 2008. This time, I don't see it.

What I remember from the last week of 2004 is Rove pretending that Bush was competitive in California, and Kerry being dumb enough to take the bait. I, too, have been following elections closely since 72. I actually went house to house raising money for McGovern that year...by myself. And I have been an avid observer ever since. And I am not picking up the same vibe as you. I see a really close contest, but Romney's last-ditch attempt to jump into PA, MI and MN now smacks of pure desperation. It's not quite the bluff that Rove's Cali gambit was in 2004 because Romney is dead without Ohio, and he's finally facing the fact that Obama's small but stubborn lead and ground game in that state aren't going Romney's way. So he is scrambling to cover those points. And there is so much money floating around on both sides, that both campaigns can go there. But between that and the dissembling ads regarding Jeep and GM...wow does he seem desperate. Then, having Sandy suck all the oxygen out of the campaigns for 5 days. Hell, I almost felt sorry for Mitt.

My guess is that PA is a total head fake. Mitt's buying ad time in Pittsburgh, which means he is buying ad time in Akron, Warren, Youngstown and Canton. So it's a total whitewash. But he needs to get Michigan, or Minn/Wisconsin, and I am guessing he is second guessing himself on not having gone all in on those states from the start. It's close as hell, and I imagine we're gonna be up late on Tuesday night.

My guess is that PA is a total head fake. Mitt's buying ad time in Pittsburgh, which means he is buying ad time in Akron, Warren, Youngstown and Canton.

Nope -- Canton and Akron are both in the Cleveland market -- I was in Cleveland, Akron and Canton two years ago for a vacation. Akron is about 20 miles south of Cleveland and Canton is another 10-15 miles south of Akron. We stayed in Richfield, OH, which is actually closer to Akron but only about 12 miles from Cleveland. Pittsburgh is over 100 miles away from those places. And Youngstown has its own television stations. See for yourself.

Romney is buying in Pittsburgh because he thinks he can get enough votes in western and central Pennsylvania to have a shot at the state. Philadelphia is an expensive market and his chances there are limited, but the rest of Pennsylvania is potentially good territory for him, especially given Obama's record on coal.

I really think we all ought to do a little friendly (and concrete) prognosticating. At RealClearPolitics you can make an electoral map, save it, and then link to it. Mr. D, are you planning such a thread for the weekend/Monday?

I really think we all ought to do a little friendly (and concrete) prognosticating. At RealClearPolitics you can make an electoral map, save it, and then link to it. Mr. D, are you planning such a thread for the weekend/Monday?

I can reprise my earlier post. I am down for Romney winning with 295. I'll get something up for the weekend.