Johns Hopkins researchers track flu through Twitter

By Michael Johnsen

BALTIMORE — Twitter users who are "too sick to go to work, but not sick enough not to tweet about it" are helping to predict flu trends. Johns Hopkins researchers recently created a Twitter search method that helps differentiate between people merely tweeting about the flu in general and those who are actually sick, the university announced Thursday. The new tweet-screening method not only delivers real-time data on flu cases, but also filters out online chatter that is not linked to actual flu infections.

Comparing their method, which is based on analysis of 5,000 publicly available tweets per minute, to other Twitter-based tracking tools, the Johns Hopkins researchers reported their real-time results track more closely with government disease data that takes much longer to compile.

“When you look at Twitter posts, you can see people talking about being afraid of catching the flu or asking friends if they should get a flu shot or mentioning a public figure who seems to be ill,” stated Mark Dredze, an assistant research professor in the Department of Computer Science who uses tweets to monitor public health trends. “But posts like this don’t measure how many people have actually contracted the flu. We wanted to separate hype about the flu from messages from people who truly become ill.”

Dredze, who is also a research scientist at the Johns Hopkins Human Language Technology Center of Excellence, led a team that in mid-2011 released one of the first and most comprehensive studies showing that Twitter data can yield useful public health information. Last summer, the U.S. Department of Health and Human Services sponsored a contest challenging researchers to design an online application that could track major disease outbreaks.

This winter, as the United States entered an unusually severe and early flu season, Twitter-based flu projections have drawn increasing attention. Many public tweets, such as, “I’m so sick this week with the flu,” can indicate a rise in the flu rate. Collecting enough of these tweets can help health officials gauge the scope and severity of an epidemic.

But the reliability of many computer models can be weakened by too many tweets that point to flu-related news reports and other matters not directly linked to a flu case, according to David Broniatowski, a School of Medicine postdoctoral fellow in the Department of Emergency Medicine's Center for Advanced Modeling in the Social, Behavioral and Health Sciences. “For example,” he said, “a recent spike in Twitter flu activity was caused by discussions about basketball legend Kobe Bryant's flu-like symptoms during a recent game. Bryant's health notwithstanding, such tweets do very little to help public health officials prepare our nation for the next big outbreak.”

To improve their accuracy when using tweets to track the flu, the Johns Hopkins team developed sophisticated statistical methods based on human language processing technologies. The methods are designed to filter out the chatter. The system can distinguish, for example, between “I have the flu” and “I’m worried about getting the flu.”
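The distinction between a tweet reporting an infection and a tweet merely mentioning the flu can be illustrated with a toy example. The sketch below is not the Johns Hopkins system, which relies on statistical language processing; it is a simple keyword heuristic, and the phrase lists, labels, and the function name `classify_tweet` are all hypothetical choices made for illustration.

```python
import re

# Hypothetical phrase lists, chosen only for illustration.
# The researchers' actual method is statistical, not a fixed keyword list.
INFECTION_PATTERNS = [
    r"\bi have the flu\b",
    r"\bi(?:'ve| have) got the flu\b",
    r"\bi(?:'m| am) (?:so )?sick\b.*\bflu\b",
    r"\bcame down with the flu\b",
]
CONCERN_PATTERNS = [
    r"\bworried about\b",
    r"\bafraid of\b",
    r"\bflu shot\b",
    r"\bshould i\b",
]


def classify_tweet(text):
    """Label a tweet as 'infection', 'awareness', or 'unrelated'."""
    # Lowercase and normalize curly apostrophes so the patterns match.
    normalized = text.lower().replace("\u2019", "'")
    if "flu" not in normalized:
        return "unrelated"
    # Expressions of concern take priority over infection phrases.
    if any(re.search(p, normalized) for p in CONCERN_PATTERNS):
        return "awareness"
    if any(re.search(p, normalized) for p in INFECTION_PATTERNS):
        return "infection"
    # Any other flu mention (news, celebrities) counts as general chatter.
    return "awareness"


if __name__ == "__main__":
    for tweet in [
        "I have the flu",
        "I'm worried about getting the flu",
        "Kobe Bryant played through flu-like symptoms",
    ]:
        print(tweet, "->", classify_tweet(tweet))
```

A fixed list like this would miss most real phrasing, which is one reason a statistical approach trained on many example tweets is likely to perform better than hand-written rules.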

Another advantage of the Johns Hopkins flu projection method is that it can produce real-time results. By comparison, the U.S. Centers for Disease Control and Prevention, which records flu-related symptoms from hospital visits, typically takes two weeks to publish data on the flu’s prevalence.

To check the reliability of their enhanced system, the Johns Hopkins researchers recently compared their results to CDC data for the same period. The researchers said that during November and December 2012, their system demonstrated a substantial improvement in tracking with CDC figures as compared with previous Twitter-based tracking methods. “In late December,” Dredze added, “the news media picked up on the flu epidemic, causing a somewhat spurious rise in the rate produced by our Twitter system. But our new algorithm handles this effect much better than other systems, ignoring the spurious spike in tweets.”

The researchers have also used their Twitter data to produce United States maps that document the stark differences between last year’s mild flu season and the much higher incidence of the virus in the winter of 2012-2013.

“This new work demonstrates that Twitter posts can be used to guide public health officials in their response to outbreaks of infectious diseases,” Dredze said. “Our hope is that the new technology can be used to track other diseases as well.”

Twitter has produced a video about Johns Hopkins’ use of tweets to track public health trends.