health, crowds, and data mining

Yesterday Google released Google Flu Trends, which analyzes search terms for indicators of flu activity. With the onset of flu season, people start searching for keywords such as “flu vaccine,” which Google detects and charts. The example below reveals that we are just a couple of weeks away from the time of year when a large outbreak has occurred in the past:

The true genius behind this system is that Google is not directly involved in data collection. Data is collected passively as users submit searches. Incredibly, according to Google, Flu Trends can detect flu activity up to two weeks faster than the CDC (the US Centers for Disease Control and Prevention)! For details on Google’s tracking method, check out their blog post Tracking Flu Trends.
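To make the idea of passive collection concrete, here is a rough Python sketch that computes, week by week, the share of searches mentioning a flu-related term. This is not Google's method, which is described in their own post. The keyword list, the log format, and the function name `weekly_flu_share` are all assumptions invented for illustration, and the sample log is made up.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list; the terms Google actually tracks are not given here.
FLU_TERMS = ("flu vaccine", "flu symptoms", "influenza")

def weekly_flu_share(search_log):
    """Return {(iso_year, iso_week): fraction of queries mentioning a flu term}.

    search_log is an iterable of (date, query_text) pairs, an assumed format.
    """
    total = Counter()
    flu = Counter()
    for day, query in search_log:
        iso = day.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        total[week] += 1
        if any(term in query.lower() for term in FLU_TERMS):
            flu[week] += 1
    return {week: flu[week] / total[week] for week in sorted(total)}

# Made-up sample log spanning two consecutive weeks.
sample_log = [
    (date(2008, 11, 3), "flu vaccine near me"),
    (date(2008, 11, 4), "cheap flights"),
    (date(2008, 11, 10), "flu symptoms"),
    (date(2008, 11, 11), "Influenza treatment"),
    (date(2008, 11, 12), "weather boston"),
    (date(2008, 11, 13), "flu vaccine side effects"),
]

shares = weekly_flu_share(sample_log)
for week, share in shares.items():
    print(week, round(share, 2))
```

Using a share of all searches rather than a raw count keeps the signal from rising just because overall search volume rises. A rising share from one week to the next is the kind of trend a chart like the one above would show.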

In a similar fashion, David Bates of Harvard Medical School is creating an epidemic surveillance system that analyzes the electronic health records of several Boston-area medical centers every night. When an outbreak is in the works, not all the sick people go to one hospital. Two might show up at one hospital and three at another. The next day several more go. By the time authorities are aware of an outbreak, it is weeks too late. Performing surveillance on data from several hospitals simultaneously greatly expands the quantity of information available and can potentially catch outbreaks early enough to contain them.
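Here is a small Python sketch of why pooling helps. It is not Bates's system. The hospital names, the daily counts, and the simple "mean plus two standard deviations" rule in the hypothetical `is_unusual` helper are all assumptions chosen to keep the example short.

```python
from statistics import mean, stdev

def is_unusual(history, today, z=2.0):
    """Flag today's count if it exceeds the historical mean by z standard deviations.

    A deliberately simple rule, assumed for illustration only.
    """
    return today > mean(history) + z * stdev(history)

# Made-up daily counts of flu-like visits over the past week (hypothetical hospitals).
history = {
    "hospital_a": [1, 3, 0, 2, 4, 1, 3],
    "hospital_b": [2, 1, 3, 2, 0, 3, 3],
    "hospital_c": [3, 1, 2, 1, 2, 3, 2],
}
today = {"hospital_a": 4, "hospital_b": 4, "hospital_c": 3}

# Each hospital checked on its own.
per_site = {name: is_unusual(history[name], today[name]) for name in history}

# All hospitals pooled: add up the counts day by day, then apply the same rule.
pooled_history = [sum(day) for day in zip(*history.values())]
pooled_today = sum(today.values())
pooled_flag = is_unusual(pooled_history, pooled_today)

print(per_site)     # no single hospital looks unusual
print(pooled_flag)  # the combined count does
```

With these numbers, a few extra patients at each hospital stay within that hospital's normal day-to-day variation, but the combined total stands out clearly against the combined history.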

Data mining in health that transcends a single unit (like a hospital) has only just begun. Personal health record systems like Google Health and Microsoft HealthVault optionally aggregate health data from a variety of sources (e.g. hospitals, clinics, insurers, pharmacies). Determining health trends is one of Google’s primary goals with this system:

Once again, while Google and Microsoft are both investing heavily in platform development and partner recruitment, the data is entered, imported, and managed by the consumer. For an interesting post on the positive and negative ramifications of Google Health, check out Tree of Knowledge.