There has been a data rush in the past decade brought about by online communication and, in particular, social media (Facebook, Twitter, YouTube, among others), which promises a new age of digital enlightenment. But social data is compromised: it is being seized by specific economic interests, it leads to a fundamental shift in the relationship between research and the public good, and it fosters new forms of control and surveillance.

Compromised Data: From Social Media to Big Data explores how we perform critical research within a compromised social data framework. The expert, international lineup of contributors explores the limits and challenges of social data research in order to invent and develop new modes of doing public research. At its core, this collection argues that we are witnessing a fundamental reshaping of the social through social data mining.

Our chapter explores some of the analytical and ethical issues involved in using big data for social media research, drawing on our Mapping Movements work. You can find our presentation from the colloquium here.

The second day of the Compromised Data colloquium was fascinating, and I’m looking forward to chasing down further work from many of the presenters.

The opening session started with Lisa Blackman discussing experiments with repurposing commercial software tools to explore contagion in complex environments, drawing on controversies around psychic research of the nineteenth century (including work on automatic writing). I liked the idea of ‘haunted data’: the ways in which research takes on a new life after publication, and may begin to be circulated by non-academic networks in ways that the original researchers never intended.

Ingrid M. Hoofd raised some interesting questions about the ways in which academic institutions, researchers, media, and activists may be becoming implicated in problematic representational regimes in their use of social media. She discussed The Guardian’s Reading the Riots project, which she argued simultaneously made claims to build an empirically-based analysis of the reasons behind the riots while also being based in, and reinforcing, existing stereotypes around class and race.

Yuk Hui’s work on self-archiving the massive amounts of digital objects which we generate notes the difference between merely storing and archiving this material: archiving requires the addition of contextual framing. The theoretical framework of Hui’s work is accompanied by attempts to design self-archiving tools which will allow users to create physical objects through which to share their archives.

Netlytic visualization for the #compdata13 network

The following session explored other attempts to combine analysis with software design. Fenwick McKelvey discussed network diagnostic tools, some of which may be helpful in better understanding NSA surveillance. He also raised questions about the structure of crowdsourced research: often, he notes, researchers set their aims and create the infrastructure for crowd participation, rather than allowing the ‘crowd’ (however that might be defined) to do more in setting research goals and processes.

Robert W. Gehl’s presentation focused on critical reverse engineering approaches, including suggestions about how these may be applied in the humanities. He argued that critical reverse engineering allows us to understand the ways in which new technologies and systems are not radical breaks with the past, but rather come from a particular history and series of struggles. He looked in detail at how this applied to TalkOpen, an attempt to create an alternative to Twitter.

Anatoliy Gruzd talked about some of the work currently happening at the Dalhousie University Social Media Lab, including the creation of the Netlytic tool, which may be useful for visualizing networks and is currently being used to explore a number of different online communities and discussions.

In the session on audience engagement, Gavin Adamson looked at some of the ways in which social media is affecting mental health coverage (noting that audiences much prefer to share positive news stories, rather than those framed through the lens of violence/risk); Mariluz Sanchez discussed the use of social media in transmedia storytelling; and Kamilla Pietrzyk gave a thought-provoking presentation on the research she’s beginning on the effects of read receipts on online communication.

Alessandra Renzi and Ganaele Langlois kicked off the final session with a conversation about some of the issues involved in data/activism, exploring the ways in which militant research methods might be combined with critical software studies. They argued that much of the discussion around participatory culture takes a celebratory approach to understanding political participation, and that we need to think about the ways in which being ‘active’ differs from resisting existing systems and building alternatives. They also raised many of the questions around the relationship between researchers and activists that Tim and I covered in our talk, including some we hadn’t considered.

David Karpf challenged the idea that online activism, particularly petitions, is a spontaneous example of ‘organising without organisations’. Instead, he argues, a closer look at online petition sites demonstrates that we are seeing organising with different organisations. The organisations involved in MoveOn.org and Change.org both make choices about their platforms which shape the kinds of petitions created (those on MoveOn tend to be more political). MoveOn’s prompts guiding members’ creation of petitions also serve as an educational tool, drawing in part on (Saul Alinsky’s?) ideas about political organising.

Today’s presentations on big data research at Compromised Data? raised some important questions about the role that big data is playing in academic research and government policy, as well as about the methodological challenges faced by big data researchers.

Greg Elmer’s opening remarks positioned the ‘compromised data?’ theme in the broader context of neoliberal policies and the Canadian government’s anti-environmental policies. Joanna Redden’s work on the increasing incorporation of big data research into Canadian policy-making and government service provision expanded on this theme. Redden pointed out that the turn towards big data is framed in the language of efficiency and money-saving, but that we should be concerned about the quality of the data being used, including the erasure of poverty as those who are not online (or online less) become invisible, and as services which generate oppositional forms of knowledge have their funding cut. We should also remain aware of the ways in which a reliance on big data research can change government processes, changing the role of bureaucrats and changing the relationship between citizens and the government. We need to recognise that neoliberalism is not just a political project, but also one which aims to change how we think: big data is not neutral, but rather is easily incorporated within this system.

Taina Bucher’s exploration of shifts in the Twitter APIs complemented this well, inviting us to look more deeply at the role of APIs in shaping how we interact with data. Bucher argues that while there’s a risk of seeing APIs as just another convenient tool to gather data, we need to critically analyse software tools and understand the power relations embedded in their design. Her empirical research in 2010 and 2011 focused on shifts in the Twitter APIs, in which the initial openness which helped Twitter to grow was increasingly shut down.

Jean Burgess and Axel Bruns also touched on the consequences of Twitter’s API as they discussed Twitter research and the politics of data access. To begin with, they point out, there’s a disproportionate focus on Twitter in academic research because it’s the easiest social media data to access. At the same time, much of the work is biased by limitations in the software tools used to study the platform: key tools like TwapperKeeper and DataSift were constrained in important ways by the changes to Twitter APIs. There are also biases that come from a focus on the low-hanging fruit, such as a focus on hashtags rather than on more complex layers of interaction like follower networks and @replies. Burgess and Bruns argue that we need to be reaching beyond the easily-available data in order to build a better picture of how people are using Twitter.

Carolin Gerlitz provided one model for doing this, outlining an approach based on a model of social media as multivalent: producing data that is both standardised and vague, and therefore allows for multiple readings. Gerlitz argued that more research needs to be open to the multiple use practices involved in social media. Frauke Zeller’s work also provided useful templates for research which is open to the multiple meanings of social media texts, suggesting that there are benefits to an iterative approach in which qualitative and quantitative analysis mutually inform each other.

Daniel Pare and Mary Francoli’s research raised concerns about existing approaches in big data research, particularly focusing on the literature on political engagement and mobilisation. Like others, they pointed out that the data which is most easily available is not necessarily the most accurate; a focus on big data research on social media is problematic when it’s used as a simple measure of broader political trends. There’s also far too little recognition of the ways in which assumptions about what ‘democracy’ means shape research on political mobilisation and engagement online, and of the inherently political nature of social media platforms.

Asta Zelenkauskaite’s work on mainstream media’s approaches to big data also highlighted the contested nature of these platforms, inviting us to consider the difference between social media engagement as a top-down process and what it might look like if it was driven by consumer interests. Sidneyeve Matrix’s presentation served as a useful complement to this, examining the shift towards niche social networks—often paid, gated communities—that support consumers’ use of their geolocative data.

The day’s presentations opened up some vital questions that are being asked far too infrequently in big data research, and in the broader big data community, about the political and methodological issues involved in the push towards big data as a magical cure-all. I’m looking forward to tomorrow’s presentations, as well as to talking about how these concerns relate to the research Tim and I are doing.