Number-Crunching the 2008 Election

By John Tierney January 9, 2008 7:54 am

Well, I guess I picked the wrong day to tout the Intrade futures market.

On Monday, citing the formidable record of Intrade’s bettors in calling elections, I noted their near-certainty that Barack Obama and John McCain would win the New Hampshire primary. The bettors were only half right. As one Lab reader, Tony, observed after the results came in last night, the traders apparently fell victim to the same Obama-is-unstoppable cascade that had journalists and politicos writing off Hillary Clinton’s chances.

But the traders did at least realize their mistake even before the polls closed (presumably because some of them were insiders who got hold of the closely guarded exit polls), and Mrs. Clinton’s upset victory was obvious on Intrade long before it was proclaimed on television. While the networks’ election wizards were still cautioning that the race was too close to call, the graphs at Intrade showed Mr. Obama’s shares plummeting while Mrs. Clinton’s shares soared.

I still think Intrade is a great time-saving device — it was the quickest way to know who won New Hampshire — but I’m sufficiently chastened by the traders’ fallibility to consider alternate tools for analyzing politics. I’ve sought help from Ian Ayres, an econometrician at Yale Law School and the author of “Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart.” Here’s his guide to political math:

Statistical prediction has a long tradition in political science and law. But for the first time, massive datasets are allowing candidates to make predictions about individual voters. In the old days, get-out-the-vote drives might have targeted particular neighborhoods, but today, using dozens of variables concerning demographics and even credit card purchases, political parties are starting to target individual households. We’re even beginning to see microtargeting of political messages.

Now the Republican Party may predict not only that two neighbors are both leaning toward voting Republican, but that one cares more about the environment and the other cares more about the economy. Segregated messages mean that supporters of the same candidate may develop separate and individualized views of what the candidate stands for. Increasingly, political operatives are able to make individualized predictions about your politics: how you’ll vote and what your hot-button concerns are.

But it’s also possible now for individuals to use the results of data crunching to make their own predictions about politics and government. Pauline Kim and colleagues have found a statistical algorithm that predicted Supreme Court justices’ votes to affirm or reverse more accurately than a panel of 83 experts (for example, in Chavez v. Martinez it bested two of three experts).

Ray Fair has been predicting presidential elections for years. In the famous “Fair Model,” the incumbent share of the two-party vote is a function of the following variables:
•VOTE = Incumbent share of the two-party presidential vote.
•PARTY = 1 if there is a Democratic incumbent at the time of the election and -1 if there is a Republican incumbent.
•PERSON = 1 if the incumbent is running for election and 0 otherwise.
•DURATION = 0 if the incumbent party has been in power for one term, 1 if the incumbent party has been in power for two consecutive terms, 1.25 if the incumbent party has been in power for three consecutive terms, 1.50 for four consecutive terms, and so on.
•WAR = 1 for the elections of 1920, 1944, and 1948 and 0 otherwise.
•GROWTH = growth rate of real per capita GDP in the first three quarters of the election year (annual rate).
•INFLATION = absolute value of the growth rate of the GDP deflator in the first 15 quarters of the administration (annual rate) except for 1920, 1944, and 1948, where the values are zero.
•GOODNEWS = number of quarters in the first 15 quarters of the administration in which the growth rate of real per capita GDP is greater than 3.2 percent at an annual rate except for 1920, 1944, and 1948, where the values are zero.
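
The variable definitions above are mechanical enough to compute directly. The sketch below is a minimal Python illustration, not Fair’s own code: the function names are hypothetical, DURATION’s “and so on” is read as adding 0.25 for each additional term, and INFLATION is assumed to be the compounded annual rate implied by the GDP-deflator level at the start of the administration and 15 quarters later. It only builds the inputs; it does not reproduce Fair’s estimated coefficients.

```python
from typing import Sequence

# Election years for which WAR = 1 and INFLATION/GOODNEWS are set to zero, per the definitions above.
WAR_YEARS = {1920, 1944, 1948}


def party(democratic_incumbent: bool) -> int:
    """PARTY: 1 for a Democratic incumbent, -1 for a Republican incumbent."""
    return 1 if democratic_incumbent else -1


def person(incumbent_running: bool) -> int:
    """PERSON: 1 if the incumbent is running, 0 otherwise."""
    return 1 if incumbent_running else 0


def duration(consecutive_terms: int) -> float:
    """DURATION: 0 after one term, 1 after two, then 0.25 more per extra term (assumed pattern)."""
    if consecutive_terms < 1:
        raise ValueError("the incumbent party has held office for at least one term")
    if consecutive_terms == 1:
        return 0.0
    return 1.0 + 0.25 * (consecutive_terms - 2)


def war(year: int) -> int:
    """WAR: 1 for 1920, 1944, and 1948, 0 otherwise."""
    return 1 if year in WAR_YEARS else 0


def goodnews(year: int, quarterly_growth: Sequence[float]) -> int:
    """GOODNEWS: quarters among the first 15 with real per capita growth above 3.2% (annual rate)."""
    if year in WAR_YEARS:
        return 0
    return sum(1 for g in quarterly_growth[:15] if g > 3.2)


def inflation(year: int, deflator_start: float, deflator_after_15q: float) -> float:
    """INFLATION: |annualized deflator growth| over 15 quarters, in percent (assumed compounding)."""
    if year in WAR_YEARS:
        return 0.0
    annual_rate = (deflator_after_15q / deflator_start) ** (4 / 15) - 1.0
    return abs(annual_rate) * 100.0
```

For example, `duration(3)` returns 1.25, matching the three-term case in the list, and a quarter with growth of exactly 3.2 percent does not count toward GOODNEWS because the definition says “greater than.”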

You can compare his predicted vote shares with the actual vote shares here.

Fair has noted that “the 2008 election looks very close.” A somewhat pessimistic economic forecast, like the current forecast from the US model on Fair’s website, leads to a modest Democratic victory. A more neutral economic forecast leads to a dead heat, clearly too close to call. Strong growth with modest inflation would lead to a modest Republican victory. (Remember that the estimated standard error is 2.54 percentage points, and that the uncertainty of the economic forecasts themselves adds to the uncertainty of any prediction of VOTE.)
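
The 2.54-point standard error is what makes a forecast near 50 percent “too close to call.” A minimal Python sketch, assuming the model’s errors are normally distributed and independent of the economic-forecast errors (assumptions the post does not state), converts a predicted VOTE into a rough probability that the incumbent party wins the two-party vote. The `forecast_se` parameter is a hypothetical stand-in for the extra uncertainty about growth and inflation.

```python
import math

FAIR_SE = 2.54  # estimated standard error of VOTE, in percentage points (from the text above)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def incumbent_win_probability(predicted_vote: float, forecast_se: float = 0.0) -> float:
    """P(VOTE > 50) under the normal-error assumption.

    forecast_se is a hypothetical extra standard error (in points) for the economic
    forecasts; it is combined with FAIR_SE assuming the two errors are independent.
    """
    total_se = math.sqrt(FAIR_SE ** 2 + forecast_se ** 2)
    return normal_cdf((predicted_vote - 50.0) / total_se)


for vote in (48.0, 50.0, 52.0):
    model_only = incumbent_win_probability(vote)
    with_forecast = incumbent_win_probability(vote, forecast_se=2.0)
    print(f"VOTE = {vote:.1f}: {model_only:.2f} (model error only), {with_forecast:.2f} (plus forecast error)")
```

Under these assumptions, a predicted 52 percent translates to only about a 78 percent chance of winning, and widening the error for uncertain economic forecasts pulls that figure further toward a coin flip.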

Fair has also created a prediction tool, where you can make predictions about the 2008 election yourself by plugging in your own estimates for growth and inflation between now and the election. (It’s not statistical, but here’s a tool to help predict which candidate has substantive positions closest to yours.)

I invite you to use the Fair Model, or any other tool, and post predictions on the 2008 election. For the past year, Intrade has made the Democratic Party the favorite to win the White House (at this writing, the market gives a 63 percent chance that a Democrat will triumph in November). On Election Night, we can see which crowd has more wisdom, the Intrade bettors or Lab readers.
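
One way to keep score on Election Night is to grade each probability forecast, not just its pick. The minimal Python sketch below uses the Brier score (the mean squared gap between a forecast probability and the 0-or-1 outcome; lower is better), a standard scoring rule that the post itself does not mention. Only the 63 percent Intrade figure comes from the text; the reader forecasts are hypothetical placeholders.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need one outcome per forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


INTRADE_DEM = 0.63  # Intrade's probability of a Democratic win at the time of writing
reader_forecasts = [0.55, 0.70, 0.40]  # hypothetical Lab-reader probabilities of a Democratic win
crowd_dem = sum(reader_forecasts) / len(reader_forecasts)  # simple average as the "crowd" forecast

for democrat_wins in (1, 0):
    print(
        f"Democrat wins = {democrat_wins}: "
        f"Intrade {brier_score([INTRADE_DEM], [democrat_wins]):.3f}, "
        f"readers {brier_score([crowd_dem], [democrat_wins]):.3f}"
    )
```

With a single race, the score mostly rewards whoever leaned the right way; it becomes informative only when averaged over many contests, such as all of the state primaries.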

Presumably, since this blog appears under the “Science” heading, there are going to be some future posts about the big statistical problems inherent in validating a model with lots of parameters tuned to historical data and then applying it to future outcomes.

For American presidential elections, the number of data points is pretty small for validating a complicated model such as Fair’s. Also, there are lots of reasons to think that the underlying social, economic, and technological realities of American politics are changing (ever more rapidly?).

Lots of careful work on these statistical problems has been done by lots of very smart scientists. I think this work merits more attention when models like this are publicized under the heading of “Science.”

It was Yogi Berra who said, “Prediction is hard, especially about the future.” There was one caucus in one state, and suddenly there was a wave of momentum to carry a candidate to the nomination. The problem with waves is that they are cyclical. For some reason, prognosticators are more than willing to draw trendlines far into the future even though they are looking at time frames much narrower than the period of the wave. We are hammered with predictions from multiple sources all predicting the same thing, and suddenly it’s wrong. Thank God we have the good sense to actually hold elections and not let our lives be run by polls.
I suspect climate change will run a similar course, but it will take longer for the climate cycles to sort themselves out. I wish much more emphasis would go into solutions that stress resource conservation, which would benefit us economically, rather than simple CO2 elimination, which could shoot us in the foot on global competitiveness.

When all was said and done in NH, it was glaringly obvious that the media pundits and politicos got it all wrong. Watching all the “analysis” last night, I was struck by only one thing: the media would never admit to being way off the mark. They blamed tears and hurt feelings for the fact that women got out the vote for Hillary Clinton. We’re still in Oz, folks.

I am glad the press held back for a while last night before declaring victory for the Democrats one way or the other. They are correctly responding to the pressure to get it right after the 2000 election fiasco. It has always seemed crazy to see projected victories when just 3% of a vote is in or something. Particularly when elections may be very close this year, people should just wait … and hear what the voters actually say with their votes.

Ray Fair’s model seems quite logical in its formulation, but it is only useful if it provides better information than current tracking polls. Fair’s results have errors of at least 3 percentage points in three of the last five elections, meaning the model error is larger than the statistical error in each poll, and much larger than the error of the poll average.

Take 2004, for example. Tracking polls (//edition.cnn.com/ELECTION/2004/special/polls/index.html) showed Bush’s percentage fluctuating between 46% and 54% from July through October, with an average of about 50%. Fair’s prediction was 54%, while the actual vote was 51.2%. So why follow a regression model when we can just look at the polls (notwithstanding the recent Clinton/McCain surprises)?

Models are always a poor substitute for real data, but real data are much more difficult to manipulate; hence the tendency to rely on models. Frequently, the more parameters in the model, the better the fit, but that doesn’t mean the additional parameters have any real significance. Models also have a hard time estimating the effects of men with ‘do my ironing’ signs and crying jags.
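
The point about extra parameters is easy to demonstrate: in an ordinary least-squares regression, adding a predictor can never lower the in-sample R², even when the predictor is pure noise. The standard-library Python sketch below regresses a made-up “vote share” of random noise on an intercept plus a growing number of random-noise predictors. The sample size of 24 and all of the data are hypothetical, chosen only to resemble the couple of dozen elections available to a presidential model.

```python
import random
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(columns: List[List[float]], y: List[float]) -> float:
    """In-sample R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


rng = random.Random(2008)
N_ELECTIONS = 24  # hypothetical sample size
vote = [50.0 + rng.gauss(0.0, 3.0) for _ in range(N_ELECTIONS)]  # outcome: pure noise
noise_predictors = [[rng.gauss(0.0, 1.0) for _ in range(N_ELECTIONS)] for _ in range(12)]

fits = [r_squared(noise_predictors[:k], vote) for k in range(len(noise_predictors) + 1)]
for k, r2 in enumerate(fits):
    print(f"{k:2d} noise predictors -> in-sample R^2 = {r2:.3f}")
```

This says nothing about whether any particular model is overfit; it only shows why in-sample fit alone cannot settle the question, and why checking predictions against elections the model was not fitted to matters.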

John Zogby’s analysis of the results (Huffington Post) … winds up with this groaner:

“On the other side, most of us did a whole lot better coming close to the numbers on the Republican side of the aisle. But this is one of those cases that remind us that pre-election polls are guides to voter attitudes and shifts. All things considered in this and other cases, we pollsters still do a creditable job.”

But don’t worry … we’ll go through the entire “she/he is up/down” saga over and over, and all the NH polling uncertainties about exit polls will be forgotten. There is precious airtime to fill.

“When all was said and done in NH, it was glaringly obvious that the media pundits and politicos got it all wrong.”

Not exactly true; the results from the Republican race made sense. I think McCain didn’t do quite as well as the polls and Intrade suggested, and this could be attributed to his performance in the last two debates. Whatever the case, the results made sense. I just don’t see anything at all that could have caused the sea change in the Democratic race. We’ll have to wait and see what transpires in other races. I have a feeling that America is going to see again a situation where the Republican race is “right” and, on the same night in the same state, the Dem race is “wrong.”

I was on Intrade holding Obama stock at 8 pm. As soon as I heard the report from NPR that Hillary was in the lead I sold my Obama stock and bought Hillary. I didn’t have any insider knowledge, just NPR, but since Obama was supposed to win by a large margin I knew as soon as I heard Hillary was ahead that wasn’t going to happen, so I switched horses.
Made a nice bit of money, too.

It’s all well and nice to say after the fact that there were “bubbles,” but the truth is that many, many experts believed in those prices. With the information available at the time, they were willing to risk their own money on the belief that the prices were correct, unlike most analysts and post-mortem pundits.

Also, Intrade gave Obama a 90% chance of winning. I’d like to think that we witnessed a very dramatic primary, rather than conclude that Intrade is no good.

Intrade also may not be a big enough market to get good predictions. (And politics is a relatively easy market to be irrational in, ’cause everyone has strong wishes as well as convictions, and the press very quickly develops a consensus.)

Fair’s model is useful, as are polls, but I don’t find either super-reliable. Overfitting can make a model look pretty impressive, and Lord knows there are factors like charisma, Electoral College structure, and Iraq that aren’t fully considered there.

FWIW, my reluctant analysis is that Clinton wins the Democratic nomination unless John Edwards drops out and endorses Obama soon. My reasoning is that in New Hampshire, Clinton led strongly among registered Democrats, and in many of the upcoming primaries only registered Dems will be able to vote. RealClearPolitics poll averages also favor Clinton, except in South Carolina.

I’d guess Edwards wouldn’t want to drop out before South Carolina because he’s counting on that (perhaps wishfully) to revive his candidacy. Results in and after South Carolina I don’t know.
It may be a factor that Edwards wants to be VP to *whoever* wins, and would therefore shy away from endorsing Obama. It will probably be a factor that Clinton will have more momentum than is reflected in the current poll averages, with wins in NH and probably NV.

Statistics say the average person has 1 breast and 1 testicle. There are some things that statistics can’t tell you for sure.

Plus, we look at polls and they change from day to day (or hour to hour). Predicting what will happen months from now involves so many variables, I suspect it’s more like economics – wave your hands and make up any explanation. (How about we have a prediction of what the polls are going to do between now and election day?) Of course, it helps to be analyzing the past, not the future.

Why does no one even consider the possibility that the polls were right, and the divergence is due to the fact that a large percentage of the votes cast were cast on Diebold voting machines? It would be interesting to see how the actual vote counts in districts using hand-counted ballots, and those using Diebold machines, compare to polling predictions. If there is a vast discrepancy, that would indicate a need for further investigation.

Interesting, but it’s important to recall here that voters decide who they think the winner should be, while investors are concerned with who the winner will be. In each case, they’re influenced by such projections, but the projections are dispositive for the investor. It is a stupid short-term decision to buy against the market, but in electoral terms, there are many reasons for voting that way.

I wish elections were fair, but it is clear that the persuasion of the media has influenced the majority of Americans’ minds. The Constitution means nothing to Americans anymore, and the process for candidate potentials is “Selection,” NOT “Election.” The American people feel like they have a voice and some control, but the facts laid out in history prove otherwise. Look at the genealogy of all the past Presidents and the organizations or clubs they belonged to. You will see some shocking similarities in bloodlines. Americans have been dumbed down over many years by the media, and it shows. It’s time to wake up, people…

Only half joking when I write this, but what if the NH primary was rigged by the Clinton machine? Consider:

1. The Clintons’ vast and deep control of the Democratic Party machinery

2. The strong polling data showing Obama with a surging lead in NH voters. How else could Hillary make up 12% in less than 24 hours?

3. The massive increase in new and young voters (a strong Obama demographic)

4. The chaos provided by this surge at the polling stations.

5. Haven’t fact checked this one yet, but check out the alleged disparity between hand-counted and Diebold machine votes in NH primary: //presscue.com/node/38034

I need 2-3 more compelling conspiracy theory points. Would also welcome any comments on folks that know how NH primary stations are run, the logistics of “rigging the count.” But if Bush can do it in Ohio twice, why can’t the Clintons?

Not worth the risk, you say? Not clear. I’d suggest that the winner of the NH Democratic primary has a greater than 50% chance of being the next president of the United States.

Only half joking here, but thought I’d kick this tire and see if any other boots can be put to it.

All the touting of the election betting markets as predictors is wildly overblown, for the following reasons. The markets change predictions as the campaigns progress. Their final predictions are made on the eve of the election, with all the benefits of publicly reported polls. Very few elections are hard to predict the day before, so the fact that the gamblers are almost always right is hardly impressive. When the polls are wrong, as with the Dems in NH, so are the gamblers.

The new $500 tax rebate coming to every family soon could be used to buy a device like the iPhone. There have been successful experiments in online voting in Europe. Encourage this, and encourage the next generation to read and write more, to think independently before becoming voters, and to vote once a month on issues through the iPhone. Government will be streamlined, and the cost of government and taxes will be reduced. The subject is organization. Let us improve ourselves and then search for capable and honest people to fill government positions. The rest will follow.

Politicians and political scientists have been far ahead of this person for at least 50 years. Politicians use statistical mapping based on real data, not the often hysterical rantings of the populace, and certainly not the so-called pundits, to design congressional districts and run elections. That is why House elections can be predicted with extreme accuracy, and incumbents win from 97% to 99% of the time every two years. That is also the reason, to give an example that can easily be checked, that Cincinnati, Ohio, is split into two House districts, the First and the Second, with the Afro-American population split between the two districts rather than put into one, a result that gives Republicans two House seats. This is also true of many other districts throughout the U.S.

More generally, it is true that neighborhoods have been selected for analysis, but it is also true that individuals within neighborhoods have been targeted, in that any politician running in an election will look at data from past elections to determine where to put his or her emphasis before the campaign begins. Labor unions have long targeted individual members rather than entire neighborhoods, and this was based on data. This type of data continues to be used to decide where to devote resources of all types, based on who actually voted in past elections. Moreover, the use of math in politics goes back to at least 1647. I read, in the Cambridge University library, one of the first such books, published in that year and titled Mathematical Politics. It should also be noted that the polls showing Hillary in the lead nationally by as much as 20 points have been ignored in the rush to overanalyze the results of two states that are outliers from the national norm. There is not a single national pundit who does not simply rely on the latest poll and draw absurd conclusions from it for a given moment, as do your bettors.

One of the best pollsters, John Zogby, gave up making predictions based on polls after 2004, when he apologized for doing so, admitting that he was a pollster, not a seer. There are many problems, too many to enumerate here, but in the end, what bettors are using as their guide are the polls they see, and they are putting in their own numbers based on those polls. What was crucial for Iowa and NH, as it will be in the national primaries, is the party affiliation of the voter. Hillary Clinton is clearly the favorite of Democratic Party members nationally (and I am not a supporter of either Clinton or Obama, nor would I vote for them), and in those states where the primary is primarily among registered party members, Hillary is very likely to win. The author’s claim about the certainty of Obama and McCain wins should disqualify him from any serious comment, and I do not know why I am bothering with this.

I’ve thoroughly enjoyed the furious back-pedaling the pundits are doing today after calling Hillary’s campaign all but dead after a defeat in one *caucus*.

The exit polls showing that Hillary won big in NH with older women are something we need to remember. Obama may excite the masses, particularly young voters under 30. But when election day comes in November, it’s the little old ladies who are lined up in precinct after precinct to vote, while those enthused 20-somethings miss their appointment with the voting booth because they are doing midterms, in love, out of love, can’t get a ride, or decided it’s all a crock anyhow.

People have gotten excited about charismatic leaders before, but very few have been elected. We did have Kennedy and Camelot, but we never got Eugene McCarthy, George McGovern, Ralph Nader, Gary Hart, Bill Bradley, or Howard Dean as Presidents. Most likely, we won’t get Obama as president either.

Who got it wrong? Based on the evidence, various sources were predicting both a McCain and an Obama victory. They got McCain right and got the first- and second-place positions switched on the Democratic side. The only wrong is to predict one way when the evidence points another (and perhaps not to properly consider the predictive distribution in addition to the MLE or MAP estimate). A bad (incorrect) outcome is not the same as a bad decision (picking a winner when the evidence points otherwise). Think about it: for the Dems there were Hillary Clinton, Barack Obama, John Edwards, Bill Richardson, and Dennis Kucinich. It is not as if the media was predicting a Kucinich or a Richardson win.

I think the only numbers that really matter are those that show that Clinton outpolled McCain by 24,000 votes, and that Clinton and Obama together outpolled McCain and Romney by 63,000 votes. If I were anything but a self-anointed authority (read: network analyst), I would say that the Republicans have a serious problem with the actual voters of New Hampshire. The analysts may have to invent another way to justify their salaries.


About

John Tierney always wanted to be a scientist but went into journalism because its peer-review process was a great deal easier to sneak through. Now a columnist for the Science Times section, Tierney previously wrote columns for the Op-Ed page, the Metro section and the Times Magazine. Before that he covered science for magazines like Discover, Hippocrates and Science 86.

With your help, he's using TierneyLab to check out new research and rethink conventional wisdom about science and society. The Lab's work is guided by two founding principles:

Just because an idea appeals to a lot of people doesn't mean it's wrong.

But that's a good working theory.

Comments and suggestions are welcome, particularly from researchers with new findings. E-mail tierneylab@nytimes.com.