Twitter Posts Betray Illness

Tweets reveal whether you have influenza, according to Penn State researchers.

In 2008, when Google began tracking flu-related search terms as a way to estimate flu infections, researchers were optimistic about the potential of the Internet as a medium for data mining. Since then, Google Flu Trends hasn't performed as well as hoped.

Now it's Twitter's turn. Scientists from Pennsylvania State University claim to have developed a way to identify Twitter posts that are viral in the medical sense of the word.

The researchers obtained information from Penn State University's Health Services about 104 individuals who had been diagnosed with influenza by a medical professional during the 2012-2013 flu season. They also obtained data about 122 people who had not been diagnosed with the flu during that period. After discarding the data of a handful of individuals for a variety of reasons, the researchers analyzed the tweets from both groups to determine whether they could diagnose influenza from Twitter posts.

The researchers demonstrated that they could indeed make that determination with greater than 99% accuracy by combining text analysis, anomaly detection, and social network analysis.

There are related projects underway: The Parkinson's Voice Initiative, for example, is an effort to detect Parkinson's symptoms from voice analysis. But voice analysis involves active user participation; Twitter data is published and awaiting data miners.

The implications from a healthcare perspective are promising, as the Penn State research suggests another method to complement traditional epidemiological data collection.

The implications from a privacy perspective, however, are rather chilling: "It would seem that simply avoiding discussing an illness is not enough to hide one's health in the age of big data," the researchers conclude.

The Penn State researchers note that although they focused on remotely reconstructing a confidential diagnosis of influenza, the technique could be used to identify diseases that carry greater social stigma, such as HIV. Using social media now clearly carries a potential social cost.

At the same time, awareness of this technique could undermine it. That was part of the problem with Google Flu Trends -- news reports about influenza and about the way researchers were trying to correlate Google search queries with influenza cases made Google Flu Trends less accurate. There was more to it than that, however.

Reports in Nature in 2013 and Science in 2014 took issue with the accuracy of Google Flu Trends data during the 2011-2012 and 2012-2013 flu seasons. The paper that appeared in Science, "The Parable of Google Flu: Traps in Big Data Analysis," cited problems with Google's algorithm and what the paper's authors called "big data hubris," the assumption that online data collection can replace, rather than augment, traditional data collection methods.

Google has been taking steps to improve Flu Trends. But the authors of the Science paper (David Lazer and Gary King of Harvard, Ryan Kennedy of the University of Houston, and Alessandro Vespignani of Northeastern University) claim in a separate paper, "Google Flu Trends Still Appears Sick: An Evaluation of the 2013-2014 Flu Season," that the issues identified with Google Flu Trends have gotten worse.

Despite some positive effects from Google's effort to dampen anomalous data spikes, the researchers say a major issue is Google's lack of transparency and lack of communication with researchers, who want access to Google's data to check its results. "[Google Flu Trends] has not been very forthcoming with [its data] in the past, going so far as to release misleading example search terms in previous publications."

"We review the Flu Trends model each year to determine how we can improve. We welcome feedback on how we can refine Flu Trends to help estimate flu levels and complement existing surveillance systems," a Google spokesperson said via email.

Thomas Claburn has been writing about business and technology since 1996, for publications such as New Architect, PC Computing, InformationWeek, Salon, Wired, and Ziff Davis Smart Business. Before that, he worked in film and television, having earned a not particularly useful ...

Considering there is absolutely no way to verify any of the information collected this way without access to that person's medical records (and the person would have had to visit a medical professional to confirm the diagnosis in the first place), this seems like an entirely redundant exercise.

"After discarding the data of a handful of individuals for a variety of reasons, the researchers set out to analyze the tweets from both groups in their study to determine whether they could diagnose influenza from Twitter posts."

In other words, they discarded the individuals who hadn't given any clue about their flu in their tweets. :D

My diagnosis: Crowd awareness of Google Flu Trends corrupts its trend results. If you have the flu and know Google is watching your searches, you may modify your keyword choices to maintain a little privacy.

Healthcare providers just don't get it. They refuse to see the need to fully secure their protected health information from unauthorized users -- and from authorized users who abuse their access privileges. As a result, they don't allocate enough budgetary resources for securing medical data.