Can Twitter Predict Major Events Such as Mass Protests?

The idea that the Twitter stream is a window into the future is persuasive. But is it true?

The idea that social media sites such as Twitter can predict the future has a controversial history. In the last few years, various groups have claimed to be able to predict everything from the outcome of elections to the box office takings for new movies.

These claims have attracted plenty of criticism. So it’s interesting to see a new one come to light.

Today, Nathan Kallus at the Massachusetts Institute of Technology in Cambridge says he has developed a way to predict crowd behaviour using statements made on Twitter. In particular, he has analysed the tweets associated with the 2013 coup d’état in Egypt and says that the civil unrest associated with this event was clearly predictable days in advance.

It’s not hard to imagine how the future behaviour of crowds might be embedded in the Twitter stream. People often signal their intent to meet in advance and even coordinate their behaviour using social media. So this social media activity is a leading indicator of future crowd behaviour.

That makes it seem clear that predicting future crowd behaviour is simply a matter of picking this leading indicator out of the noise.

Kallus says this is possible by mining tweets for any mention of future events and then analysing trends associated with them. “The gathering of crowds into a single action can often be seen through trends appearing in this data far in advance,” he says.

It turns out that exactly this kind of analysis is available from a company called Recorded Future based in Cambridge, which scans 300,000 different web sources in seven different languages from all over the world. It then extracts mentions of future events for later analysis.

It’s this data that Kallus has analysed to predict significant protests. “We find that the mass of publicly available information online has the power to unveil the future actions of crowds,” he says.

First, Kallus defines a significant protest as one that receives much more mainstream media coverage than usual.
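One way to make that definition concrete is to flag days on which mainstream coverage rises well above a trailing baseline. The following is a minimal sketch under stated assumptions, not Kallus’s actual procedure: the function name, the seven-day window, the threshold of three standard deviations and the sample counts are all invented for illustration.

```python
from statistics import mean, stdev

def significant_days(counts, window=7, k=3.0):
    """Return indices of days whose article count exceeds the
    trailing-window mean by more than k standard deviations.

    `window` and `k` are illustrative choices, not values from the paper.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the spread at 1.0 so a very quiet week does not
        # turn a tiny blip into a "significant" day.
        sigma = max(stdev(history), 1.0)
        if counts[i] > mu + k * sigma:
            flagged.append(i)
    return flagged

# Hypothetical daily counts of mainstream articles about protests.
daily_articles = [4, 5, 3, 6, 4, 5, 4, 5, 40, 6]
print(significant_days(daily_articles))  # prints [8]
```

Only the day with 40 articles is flagged. The day after it is not, because the spike itself inflates the trailing baseline.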

He then analyses the mainstream coverage to see when significant protests actually occurred and looks for activity in the Twitter feed that preceded them. If that activity is a genuine predictive indicator, then similar patterns of activity seen later should signal protests to come.
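The idea of a leading indicator can be sketched in a few lines: count how often a future date is mentioned in posts written before that date arrives. This is only a toy illustration. The record format (a pair of posting date and referenced date), the function name and the sample records are assumptions, and nothing here reflects Recorded Future’s actual data or API.

```python
from collections import Counter
from datetime import date

def advance_mentions(mentions, min_lead_days=1):
    """Count mentions per referenced date, keeping only those posted
    at least `min_lead_days` before the date they refer to."""
    counts = Counter()
    for posted_on, refers_to in mentions:
        if (refers_to - posted_on).days >= min_lead_days:
            counts[refers_to] += 1
    return counts

# Invented toy records: (date posted, future date mentioned).
mentions = [
    (date(2020, 1, 5), date(2020, 1, 10)),
    (date(2020, 1, 7), date(2020, 1, 10)),
    (date(2020, 1, 10), date(2020, 1, 10)),  # same-day post, not a forecast
    (date(2020, 1, 8), date(2020, 1, 13)),
]
counts = advance_mentions(mentions)
print(counts[date(2020, 1, 10)], counts[date(2020, 1, 13)])  # prints 2 1
```

A sustained rise in these advance counts for a particular date is the kind of trend that would then be compared against the dates on which significant protests were later reported.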

Kallus tests this idea by studying the tweets associated with the 2013 coup d’état in Egypt. The unrest centered on the anniversary of President Morsi’s rule, which triggered significant protests during which he was removed from power by the Egyptian army.

Kallus says that evidence of the protests was clearly visible in the Twitter feed well in advance, as were the early protests that occurred before the anniversary. What’s more, the social media content predicted that the protests would go on for weeks beyond the anniversary.

Kallus’s conclusion that tweets can accurately predict significant protests in advance is an interesting one. There’s no question that the evidence is there to be found in social media in retrospect. And there is no shortage of people who make these kinds of predictions about historical events using historical data.

The bigger question is whether it’s possible to pick out this evidence in advance. In other words, is it possible to make predictions before the events actually occur?

That’s not so clear, but there are good reasons to be cautious. First of all, while it’s possible to correlate Twitter activity to real protests, it’s also necessary to rule out false positives. There may be significant Twitter trends that do not lead to significant protests in the streets. Kallus does not adequately address the question of how to tell these things apart.
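The false-positive problem can be stated in numbers using precision and recall, two standard measures for judging this kind of forecast. The sketch below is a generic illustration and not an evaluation from Kallus’s work; the function name and the day indices are hypothetical.

```python
def precision_recall(predicted_days, actual_days):
    """Compare predicted protest days with days on which protests occurred.

    Precision: share of predictions that were followed by a real protest.
    Recall: share of real protests that were predicted.
    """
    predicted, actual = set(predicted_days), set(actual_days)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical day indices: four Twitter alarms, three real protests.
precision, recall = precision_recall([3, 8, 12, 20], [8, 20, 25])
print(precision)  # prints 0.5
```

In this made-up case, half of the alarms were false positives, and one of the three protests was missed entirely. A retrospective match between tweets and a known protest says nothing about either figure.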

Then there is the question of whether tweets are trustworthy. It’s not hard to imagine that when it comes to issues of great national consequence, propaganda, rumor and irony may play a significant role. So how to deal with this?

There is also the question of demographics and whether tweets truly represent the intentions and activity of the population as a whole. People who tweet are overwhelmingly likely to be young, but there is also a silent majority that plays a hugely important role. So can the Twitter firehose really represent the intentions of this part of the population too?

The final challenge is in the nature of prediction. If the Twitter feed is predictive, then what’s needed is evidence that it can be used to make real predictions about the future and not just historical predictions about the past.