Posted by kdawson on Friday April 02, 2010 @01:24PM
from the wisdom-of-crowds dept.

Hugh Pickens writes "Kevin Kelly writes that researchers at the Social Computing Lab at HP Labs in Palo Alto have found that social media content can predict real world outcomes. In their study, the researchers built a model that used chatter from Twitter to predict accurately the box-office revenues of upcoming movies weeks before the movies were released. When the sentiment of the tweet was factored in (how favorable it was toward the new movie), the prediction was even more exact. To quantify the sentiments in 3 million tweets, the team used anonymous workers from Amazon's Mechanical Turk to rate a sample of tweets, and then trained an algorithmic classifier to derive a rating for the rest. But predicting box office receipts may be only the beginning. 'This method can be extended to a large panoply of topics [PDF], ranging from the future rating of products to agenda setting and election outcomes,' the researchers write. 'At a deeper level, this work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.'"

The Delphi method [wikipedia.org] is a systematic, interactive forecasting method which relies on a panel of experts.

Of course, in this case the "experts" are the movie-going public, who know more about their tastes in movies than anyone in Hollywood. The Delphi method depends on large panels, and in this case the researchers are using large panels indeed. Finally, the iteration is provided by the later tweeters reading earlier tweets before they post.

Putting aside the fact that increasing sample size does not necessarily increase the power of a predictor, you apparently didn't get the point of their method.

So method A was to simply "grep RamboIX" in these 3 million tweets. That alone already correlated with the box office outcome. However, that also catches messages like "RamboIX suxx, no way I'm going to see or even download this".
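
For illustration only, method A might look something like this in Python. The tweets, the count_mentions helper, and the case-insensitive whole-word matching are my own assumptions, not anything taken from the study.

```python
import re

def count_mentions(tweets, title):
    """Count tweets that mention the title (case-insensitive, whole word)."""
    pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
    return sum(1 for tweet in tweets if pattern.search(tweet))

# Made-up sample tweets, purely hypothetical.
tweets = [
    "Can't wait for RamboIX this weekend!",
    "RamboIX suxx, no way I'm going to see or even download this",
    "Anyone up for lunch?",
    "ramboix trailer looks great",
]

mentions = count_mentions(tweets, "RamboIX")
print(mentions)
```

Note that the "suxx" tweet is counted just like the enthusiastic ones, which is exactly the weakness of plain keyword matching.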

So method B was to use machine learning algorithms, combined with some initial work by human drones, to assign a degree of "positiveness" to each message about RamboIX.

While this has nothing to do with increasing sample size, it took the accuracy of the prediction to a whole new level.
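
A minimal sketch of method B, assuming a tiny hand-labeled sample stands in for the Mechanical Turk ratings. The summary does not say which classifier the researchers used, so the Naive Bayes model here is my own stand-in, and the tweets, labels, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled):
    """Build word and document counts from (text, label) pairs.

    Labels are 'pos' or 'neg'; both must appear at least once.
    """
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the more likely label under Naive Bayes with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical human-rated sample (the "human drones" step).
labeled = [
    ("RamboIX looks awesome, can't wait", "pos"),
    ("loved the RamboIX trailer, going opening night", "pos"),
    ("RamboIX suxx, no way I'm going to see this", "neg"),
    ("RamboIX looks terrible, skipping it", "neg"),
]
model = train(labeled)

# The classifier then rates the tweets nobody labeled by hand.
unlabeled = [
    "the RamboIX trailer looks awesome",
    "RamboIX suxx, skipping",
]
labels = [classify(model, tweet) for tweet in unlabeled]
positive_share = labels.count("pos") / len(labels)
print(labels, positive_share)
```

The share of positive tweets is one possible "positiveness" signal to feed into a predictor alongside the raw mention count. A real system would need far more labeled data than four tweets.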