Use of Social Media to Monitor and Predict Outbreaks and Public Opinion on Health Topics

Alessio Signorini

The world in which we live has changed rapidly over the last few
decades. Threats of bioterrorism, influenza pandemics, and
emerging infectious diseases, coupled with unprecedented population
mobility, have led to the development of public health surveillance
systems. These systems are useful in detecting and responding to
infectious disease outbreaks but often operate with a considerable
delay and fail to provide the necessary lead time for optimal
public health response.

In contrast, syndromic surveillance systems rely on clinical
features (e.g., activities prompted by the onset of symptoms) that
are discernible prior to diagnosis to warn of changes in disease
activity. Although less precise, these systems can offer
considerable lead time. Patient information may be acquired from
multiple existing sources established for other purposes,
including, for example, emergency department primary complaints,
ambulance dispatch data, and over-the-counter medication sales.
Unfortunately, these data are often expensive, sometimes difficult
to obtain, and almost always hard to integrate.

Fortunately, the proliferation of online social networks makes
much more information about our daily habits and lifestyles freely
available and easily accessible on the web. Twitter, Facebook, and
Foursquare are only a few examples of the many websites where
people voluntarily post updates on their daily behaviors, health
status, and physical location.

In this thesis we develop and apply methods to collect, filter and
analyze the content of social media postings in order to make
predictions. As a proof of concept, we used Twitter data to predict
public opinion in the form of the outcome of a popular television
show. We then used the same methods to monitor and track public
perception of influenza during the H1N1 epidemic, and even to
predict disease burden in real time, which is a measurable advance
over current public health practice. Finally, we used
location-specific social media data to model human travel and show
how these data can improve our prediction of disease burden.