TwitterStand: Separating the Wheat from the Chaff in Breaking News

Twitter is an electronic medium that allows a large user populace to communicate with each other simultaneously. Inherent to Twitter is an asymmetrical relationship between friends and followers, which provides an interesting social network-like structure among the users of Twitter. Twitter messages, called tweets, are restricted to 140 characters and thus are usually very focused. Twitter is becoming the medium of choice for keeping abreast of rapidly breaking news. This project explores the use of tweets to build a news processing system. The result is analogous to a distributed news wire service. The difference is that the identities of the contributors/reporters are not known in advance and there may be many of them. The tweets are not sent according to a schedule: they occur as news is happening, are noisy, and usually arrive at a high rate.

The goal of this exploratory research project is to find effective methods for making Twitter a useful news gathering mechanism. Challenges addressed in this project include removing the noise; determining tweet clusters of interest, bearing in mind that the methods must be online; and determining the relevant location associated with the tweets.
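
To make the online requirement concrete, the following is a minimal sketch of one-pass clustering in Python. It is an illustration only and is not taken from the project: the abstract does not say which clustering method is used, so the bag-of-words representation, the cosine similarity measure, the threshold value, the stopword list, and the names `tokenize`, `cosine`, and `OnlineClusterer` are all assumptions. Each tweet is examined once as it arrives and is either added to the most similar existing cluster or used to start a new one.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the project).
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "is", "at", "and"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusterer:
    """Hypothetical one-pass clusterer: every tweet is processed exactly once."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cutoff, chosen arbitrarily
        self.clusters = []          # each cluster: {"centroid": Counter, "tweets": [str]}

    def add(self, tweet):
        """Assign the tweet to a cluster and return that cluster's index."""
        vec = Counter(tokenize(tweet))
        best_idx, best_sim = None, 0.0
        for i, cluster in enumerate(self.clusters):
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx is not None and best_sim >= self.threshold:
            cluster = self.clusters[best_idx]
            # The centroid is kept as summed term counts; cosine ignores scale.
            cluster["centroid"].update(vec)
            cluster["tweets"].append(tweet)
            return best_idx
        # No sufficiently similar cluster exists, so start a new one.
        self.clusters.append({"centroid": vec, "tweets": [tweet]})
        return len(self.clusters) - 1
```

Under these assumptions, calling `add` on two tweets about the same event returns the same cluster index, while a tweet on an unrelated topic opens a new cluster. A real system would also need to expire stale clusters and filter noisy tweets before clustering, which this sketch leaves out.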

The broad impact of this research is to make it easier to disseminate late-breaking news and to enhance the distributed news gathering and reporting process. The Web site (http://www.cs.umd.edu/~hjs/hjscat.html) reports results of this and related research.