Twitter Can’t Yet Predict Elections

The folks over at the Washington Post must have needed copy desperately for Monday’s opinion page if they were willing to publish a piece titled, “How Twitter can help predict an election.”

In the column, Indiana University sociologist Fabio Rojas asserts: “Twitter discussions are an unusually good predictor of U.S. House elections.” He and his fellow researchers “found a strong correlation between a candidate’s ‘tweet share’ and the final two-party vote share,” and it doesn’t even matter whether the tweets are positive or negative. Content is irrelevant.

If you believe that, would you be interested in buying the Brooklyn Bridge from me? I have a sale on it this week only.

Rojas asserts that he and his co-authors of a new paper “predicted the winner in 404 out of 406 competitive races” after looking at more than 500,000 tweets that mentioned Republican and Democratic candidates for Congress in 2010.

Normally, when political scientists or journalists write about “competitive” races they are talking about contests where at least two candidates have at least some chance of victory. Obviously, there weren’t 406 “competitive” House races in 2010 under that definition (at the Rothenberg Political Report, we rated just more than 100 House races as “not safe,” and far fewer in the truly competitive categories), so Rojas must be using the term to describe contested races.

Most races aren’t real competitions, of course. Relatively few House challengers run robust campaigns, and voters generally are unfamiliar with challengers.

Since House re-election rates have been over 90 percent in 19 of the past 23 elections, you don’t need polls or tweet counts to predict the overwhelming majority of race outcomes. In most cases, all you need to know is incumbency (or the district’s political bent) and the candidates’ parties to predict who will win.

So, it’s possible that tweet counts simply reflect — and are a function of — existing name recognition, which is crucial in races where one candidate has it and the other doesn’t.

But other than that, the idea that the content of tweets is irrelevant, and that it doesn’t matter if the tweets originate from inside a district or from people who cannot even vote in the race, seems to fly in the face of logic and everything that political scientists believe. Count me as skeptical about Rojas’s entire argument.

Rojas, who holds graduate degrees in sociology from the University of Chicago and an undergraduate degree in mathematics from the University of California, Berkeley, teaches in the Sociology Department at Indiana, and his department’s website describes his areas of expertise as Organizational Analysis, Political Sociology of Social Movements, Sociology of Education and Mathematical Sociology.

I looked through his 11-page curriculum vitae and found publications and papers presented on black power, activism and protests, African-American studies, organizational change and even “A Game Theory Model of Sexually Transmitted Disease Epidemics.” But there is nothing on campaigns, elections or even public opinion.

Few reputable political scientists will accept Rojas’s assertion that, “New research in computer science, sociology and political science shows that data extracted from social media platforms yield accurate measurement of public opinion.”

“No one I know is saying anything close to this,” Michael S. Lewis-Beck, the F. Wendell Miller Distinguished Professor of Political Science at the University of Iowa and a well-known authority on both American politics and election forecasting, wrote in an email to me this week.

“The most that folks are saying,” Lewis-Beck continued, “is ‘wouldn’t it be nice if these social media data were representative of the larger public, but that seems hard to prove so far.’”

Political scientists and professional pollsters I have spoken to are interested in learning what tweets may tell us about voters and campaigns, but Rojas’s opinion piece — including his view that “digital democracy” will put professional pollsters out of work — promises far too much and delivers far too little.

But while it may be easy to dismiss the opinion piece, it isn’t difficult to understand why the folks at the Washington Post would devote space to it. It’s provocative. It will get clicks on their website, even though the argument is weak.

I was disappointed that National Journal would publish a follow-up piece that was so uncritical and without comment from experienced political scientists or pollsters.

Technology and new media are hot topics, of course, and everyone in the media strives to be the first one to discover some new trend or truth. But Twitter can’t predict elections just yet, and it can’t cure cancer either.

Wow Stu – I thought you would be above. . . Talk about a man feeling threatened – even if it were true, and they did get 404 out of 406 correct, and it doesn’t matter how many of those were actually contested – the rate of return was much higher than the Rothenberg report. . . Isn’t that what this boils down to?

I am surprised someone didn’t tell you not to comment on this – otherwise – your name itself would add validity to their thesis – even more so if what you wrote was going to look like a hack job on their work.

Mike Donatello

I wrote a much briefer commentary, from the POV of the research community. I’m continually frustrated that headlines like these are paraded for notoriety, and tacitly endorsed by both editors who should know better and professional societies that have a legitimate interest in promoting properly conducted research.

I mean, even if you don’t think his methodology makes sense, you can’t argue with results. If his model successfully predicted 404/406 races, then clearly it does have predictive power.

mabramso

Not true. In 1989, it rained the first two days of the year in CA where I lived at the time. I could have predicted it would rain every day the rest of the year — I had a model that predicted well thus far. The problem is that I lived in the Mojave Desert!

Whether Stu rates these races as Safe or not, roughly 400 of the 435 races are SUPER-EASY to predict each year, and another 15 are not difficult if you are paying any attention at all. So there are 20 left that are toss-ups. All you have to do to reach 404/406 is NOT make predictions about the toss-ups. Heck, I could predict 406/406 EVERY year if I choose to ignore tossup races — and it’s pretty easy.

http://knoesis.org/amit/ Amit Sheth

Trying to forecast based on simplistic measures like frequency and popularity, without understanding coverage (incl. demographics), participation, etc., is bound to disappoint. Here is a slightly more nuanced look (Are Twitter Users Equal in Predicting Elections? Insights from Republican Primaries and 2012 General Election): http://j.mp/Vz3v0E and some examples of the insights that are possible to get: http://twitris.knoesis.org/election/insights/13

John Allegro

The collectivist tendencies imported by Stuart Chase are, in all essentials, the German definition of socialism offered by Werner Sombart when he described socialism as a regulated, planned, and controlled system.