Privacy Invasion or Innovative Science?

Academia, social media data, and privacy

A practical discussion of how potentially revolutionary, yet ethically questionable data---such as that from facebook---is currently being handled in academia.

With every day that passes, the users of social media websites are providing scientists with ever-richer, larger datasets on human behavior. At the same time, machine-learning techniques allow us to exploit this data to accurately predict who these users are and how they will behave in the future. I begin this talk by outlining the need for public datasets containing rich information on individuals and their social relations. I then show how in practice, distribution and use of such datasets by academics is awkward and confused. I conclude with some consideration of how "enhancing" datasets by, for example, inferring missing or hidden data using machine learning classifiers, creates yet another ethical grey-zone.