The coming ‘ethical’ crisis? Data scraping young people’s lives

In 2007, Savage and Burrows predicted a ‘coming crisis of empirical sociology’ as mainstream sociological methods were muscled out by new commercial data analytics techniques. Reflecting on their paper nearly a decade later, they admit that the scale of disruption caused by ‘big data’ (as it is now known) was unimaginable, even at the time (Burrows & Savage 2014).

Our conceptualisation of ‘data’, and the language we use to describe it, have been irreversibly changed by the arrival of big data. For a new breed of data analysts, any dataset that is less than ‘total’ or ‘complete’ has become ‘small data’. The very language of data has been transformed by a new lexicon of analytics, real-time, tracking, scraping and so on. What has remained relatively unchanged, however, is our language for talking about the ethics of ‘big data’.

This short piece focuses on one particular aspect of big data’s methodology – ‘data scraping’ – and the ethical questions it raises for researching young people’s lives through digital data.

According to Marres and Weltevrede (2013), scraping is an ‘automated’ method of capturing online data. It involves programming a piece of software (i.e. giving it instructions) to extract data from a particular source, creating a ‘big’ dataset that would be too onerous to capture manually.
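To make the idea of ‘giving instructions’ concrete, here is a minimal sketch of what a scraper does, written in Python using only its standard library. Everything in it is an assumption made for illustration: the page content, the `post` and `author` class names, and the `PostScraper` and `scrape` names are invented, and no real site or tool is being described.

```python
from html.parser import HTMLParser

# Hypothetical page content, invented for illustration only.
SAMPLE_PAGE = """
<html><body>
  <div class="post"><span class="author">user_a</span><p>First example post.</p></div>
  <div class="post"><span class="author">user_b</span><p>Second example post.</p></div>
</body></html>
"""


class PostScraper(HTMLParser):
    """Collects the author and text of each <div class="post"> block."""

    def __init__(self):
        super().__init__()
        self.posts = []      # one dict per post found
        self._field = None   # field that the next chunk of text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "post":
            # A new post begins: start an empty record for it.
            self.posts.append({"author": "", "text": ""})
        elif tag == "span" and attrs.get("class") == "author" and self.posts:
            self._field = "author"
        elif tag == "p" and self.posts:
            self._field = "text"

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None

    def handle_data(self, data):
        # Text outside an author or paragraph element is ignored.
        if self._field and self.posts:
            self.posts[-1][self._field] += data.strip()


def scrape(page):
    """Return a list of {'author', 'text'} records extracted from the page."""
    parser = PostScraper()
    parser.feed(page)
    return parser.posts


if __name__ == "__main__":
    for post in scrape(SAMPLE_PAGE):
        print(post["author"], "said:", post["text"])
```

Two posts become two rows of data here; the same instructions would work unchanged on two million. Note also what is absent: nothing in the code asks the authors of those posts whether they agree to be captured.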

Over the last few years, ‘scraping’ has been much lauded as a means by which data capture can be ‘scaled up’ to new analytical heights, particularly in relation to one of the most popular sources for big data capture – social media. Whilst ‘scraping’ techniques have advanced quickly, discussion of the ethical frameworks and language we need to interrogate these techniques robustly has been much slower to develop.

As one of the largest groups of social media users, young people are particularly relevant to these debates. Data scraped from social media inevitably captures the conversations, thoughts and expressions of young people’s lives, even if only as an ‘inadvertent’ by-product of research.

In 2010, Michael Zimmer reported on a study that had captured the profile data of a whole cohort of American college students on Facebook. The data had been taken without permission and a failure to appropriately anonymise the data had seen the identities of the students revealed. Zimmer’s article provided a robust critique of a growing data capture trend where all data not hidden by privacy settings was seen as consensually ‘public’, and available for analysis.

The ethical lessons learnt from incidents such as these have tended to focus more on greater care for data anonymisation and security, and less on issues of consent and intrusion. Again, Zimmer (2010) has been particularly vocal in refuting claims that techniques such as anonymisation through aggregation are ‘enough’[1].

How do these debates connect with young people’s social media data? Television programmes such as Teens and The Secret Life of Students[2] have played a significant role in perpetuating the idea that young people are less concerned than adults about having their data made public. However, studies have repeatedly shown that young people are highly concerned about privacy online (boyd, 2014; Berriman & Thomson, 2015), and the disclosure of their digital data (Bryce & Fraser, 2014).

A little while ago, I became aware that ‘scraping’ has a colloquial meaning in some UK secondary schools. According to Urban Dictionary (think Wikipedia for slang terms and phrases), the term ‘scrape’ is used to describe:

a person intruding on something. To say that one has come out of nowhere and intruded on a conversation.

[E.g.] ‘two people have a conversation’, ‘another person listens in’
one person out the original two people says “scrape out” to the other person.

This colloquial definition makes reference to ‘scraping’ as an unwelcome form of eavesdropping and intrusion on a private conversation. In the context of these ethical discussions, this definition seems particularly apt. It emphasises that privacy is a concern for young people, and that unsolicited ‘scraping’ of private conversations is ethically and morally contentious.

At present, there is a lack of serious ethical debate about the scraping of young people’s digital data. The presumption of public-as-consent doesn’t cut it. We need a new ethical language for talking about these issues, and young people’s voices need to be represented in these debates.