Model predicts flu outbreaks seven weeks out using Google search data

A new statistical model, built on search data from Google and principles from weather modeling, can predict flu outbreaks up to seven weeks in advance.

The results, published Monday in the Proceedings of the National Academy of Sciences, signify a transition in the study of infectious disease from modeling past outbreaks and events to predicting future ones.

The researchers, Jeffrey Shaman of Columbia University and Alicia Karspeck of the National Center for Atmospheric Research in Boulder, Colo., used data from the Google Flu Trends project, which keeps track of searches for flu-related topics and ties them to the geographic location of the searcher. Such data is now available for 28 countries as well as many local areas within those countries. The study focused on New York City.

The Google project made waves in 2008 when its estimates of flu activity nearly matched those of the U.S. Centers for Disease Control and Prevention. Shaman and Karspeck’s model builds on that data by factoring in current conditions when projecting the timing of a flu outbreak.

Just as meteorologists continually rerun their models with the latest weather data, the flu model is regularly updated with the latest data from Google. Applying the approach to New York City’s flu data from 2003 to 2008, Shaman and Karspeck found they could predict the peak of flu season up to seven weeks before it occurred.
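
The article does not spell out the researchers' algorithm, but the general idea of folding each week's new data into a running estimate can be illustrated with a textbook scalar Kalman filter. The sketch below is only that, an illustration under assumed numbers: the function names, the growth rate, and the variances are hypothetical and are not taken from the study.

```python
def forecast(mean, var, growth=1.1, process_var=1.0):
    """Project last week's estimate one week forward (assumed simple growth)."""
    return growth * mean, growth ** 2 * var + process_var


def update(mean, var, obs, obs_var):
    """Blend the forecast with a new observation, weighted by their uncertainties."""
    gain = var / (var + obs_var)          # closer to 1 when the forecast is less certain
    return mean + gain * (obs - mean), (1.0 - gain) * var


def run_filter(weekly_obs, mean=10.0, var=4.0, obs_var=4.0):
    """Alternate forecast and update steps over a series of weekly observations."""
    history = []
    for obs in weekly_obs:
        mean, var = forecast(mean, var)
        mean, var = update(mean, var, obs, obs_var)
        history.append((mean, var))
    return history


if __name__ == "__main__":
    # Made-up weekly flu-activity readings, for illustration only.
    for week, (m, v) in enumerate(run_filter([12.0, 15.0, 19.0]), start=1):
        print(f"week {week}: estimate {m:.2f}, variance {v:.2f}")
```

Each pass through the loop mirrors the weather analogy: the model makes a projection, a new reading arrives, and the estimate is pulled toward that reading in proportion to how uncertain the projection was.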

And because the approach used data from multiple locations within New York City, the researchers were also able to estimate how much error was present in their predictions. That capability is important, and it resembles the way meteorologists attach a degree of certainty to a rain forecast.
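
One simple way to picture a forecast that carries its own degree of certainty is to look at how closely several independent predictions agree. The sketch below assumes a hypothetical list of predicted peak weeks, one per location or model run, and reports the share that fall within a week of the most common answer. It is an illustration of the idea, not the method used in the study.

```python
from collections import Counter


def peak_confidence(predicted_peak_weeks, tolerance=1):
    """Return the most common predicted peak week and the share of
    predictions that land within `tolerance` weeks of it."""
    most_common_week, _ = Counter(predicted_peak_weeks).most_common(1)[0]
    agreeing = sum(
        abs(week - most_common_week) <= tolerance
        for week in predicted_peak_weeks
    )
    return most_common_week, agreeing / len(predicted_peak_weeks)


if __name__ == "__main__":
    # Hypothetical predictions of the peak week from ten separate runs.
    predictions = [5, 5, 6, 5, 9, 4, 5, 8, 5, 6]
    week, share = peak_confidence(predictions)
    print(f"Peak expected in week {week}; {share:.0%} of runs agree within one week")
```

When most predictions cluster around the same week, the share is high and the forecast can be stated with more confidence. When they scatter, the share drops, much as a forecaster's stated chance of rain does.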

While the results are impressive, Shaman and Karspeck believe the study is only the beginning. Just as the best weather forecasts synthesize the results of many different models, the researchers expect flu predictions to become more accurate as other teams develop models that can be combined with this one. The better the modeling gets, the easier it will be for the CDC and other organizations to get vaccines and treatments to the people and places that need them most.