There has been a data rush in the past decade brought about by online communication and, in particular, social media (Facebook, Twitter, YouTube, among others), which promises a new age of digital enlightenment. But social data is compromised: it is being seized by specific economic interests, it leads to a fundamental shift in the relationship between research and the public good, and it fosters new forms of control and surveillance.

Compromised Data: From Social Media to Big Data explores how we perform critical research within a compromised social data framework. The expert, international lineup of contributors explores the limits and challenges of social data research in order to invent and develop new modes of doing public research. At its core, this collection argues that we are witnessing a fundamental reshaping of the social through social data mining.

Our chapter explores some of the analytical and ethical issues involved in using big data for social media research, drawing on our Mapping Movements work. You can find our presentation from the colloquium here.

The second day of the Compromised Data colloquium was fascinating, and I’m looking forward to chasing down further work from many of the presenters.

The opening session started with Lisa Blackman discussing experiments with repurposing commercial software tools to explore contagion in complex environments, drawing on controversies around psychic research of the nineteenth century (including work on automatic writing). I liked the idea of ‘haunted data’: the ways in which research takes on a new life after publication, and may begin to be circulated by non-academic networks in ways that the original researchers never intended.

Ingrid M. Hoofd raised some interesting questions about the ways in which academic institutions, researchers, media, and activists may be becoming implicated in problematic representational regimes in their use of social media. She discussed The Guardian’s Reading the Riots project, which she argued simultaneously made claims to build an empirically-based analysis of the reasons behind the riots while also being based in, and reinforcing, existing stereotypes around class and race.

Yuk Hui’s work on self-archiving the massive amounts of digital objects which we generate notes the difference between merely storing, and archiving, this material: archiving requires the addition of contextual framing. The theoretical framework of Hui’s work is accompanied by attempts to design self-archiving tools which will allow them to create physical objects through which to share their archives.

Netlytic visualization for the #compdata13 network

The following session explored other attempts to combine analysis with software design. Fenwick McKelvey discussed network diagnostic tools, some of which may be helpful in better understanding NSA surveillance. He also raised questions about the structure of crowdsourced research: often, he notes, researchers set their aims and create the infrastructure for crowd participation, rather than allowing the ‘crowd’ (however that might be defined) to do more in setting research goals and processes.

Robert W. Gehl’s presentation focused on critical reverse engineering approaches, including making suggestions about how these may be applied to the humanities. He argued that critical reverse engineering allows us to understand the ways in which new technologies and systems are not radical breaks with the past, but rather come from a particular history and series of struggles, looking in detail at how this applied to attempts to create an alternative to Twitter, TalkOpen.

Anatoliy Gruzd talked about some of the work currently happening at the Dalhousie University Social Media Lab, including the creation of the Netlytic tool, which may be useful for visualizing networks and is currently being used to explore a number of different online communities and discussions.

In the session on audience engagement, Gavin Adamson looked at some of the ways in which social media is affecting mental health coverage (noting that audiences much prefer to share positive news stories, rather than those framed through the lens of violence/risk); Mariluz Sanchez discussed the use of social media in transmedia storytelling; and Kamilla Pietrzyk gave a thought-provoking presentation on the research she’s beginning on the effects of read receipts on online communication.

Alessandra Renzi and Ganaele Langlois kicked off the final session with a conversation about some of the issues involved in data/activism, exploring the ways in which militant research methods might be combined with critical software studies. They argued that much of the discussion around participatory culture takes a celebratory approach to understanding political participation, and that we need to think about the ways in which being ‘active’ differs from resisting existing systems and building alternatives. They also raised many of the questions around the relationship between researchers and activists that Tim and I covered in our talk, including some we hadn’t considered.

David Karpf challenged the idea that online activism, particularly petitions, is a spontaneous example of ‘organising without organisations’. Instead, he argues, a closer look at online petition sites demonstrates that we are seeing organising with different organisations. The organisations involved in MoveOn.org and Change.org both make choices about their platforms which shape the kinds of petitions created (those on MoveOn tend to be more political). MoveOn’s prompts guiding members’ creation of petitions also serve as an educational tool, drawing in part on (Saul Alinsky’s?) ideas about political organising.

Today’s presentations on big data research at Compromised Data? raised some important questions about the role that big data is playing in academic research and government policy, as well as about the methodological challenges faced by big data researchers.

Greg Elmer’s opening remarks positioned the ‘compromised data?’ theme in the broader context of neoliberal policies and the Canadian government’s anti-environmental policies. Joanna Redden’s work on the increasing incorporation of big data research into Canadian policy-making and government service provision expanded on this theme. Redden pointed out that the turn towards big data is framed in the language of efficiency and money-saving, but that we should be concerned about the quality of the data being used, including the erasure of poverty as those who are not online (or online less) become invisible, and as services which generate oppositional forms of knowledge have their funding cut. We should also remain aware of the ways in which a reliance on big data research can change government processes, changing the role of bureaucrats and changing the relationship between citizens and the government. We need to recognise that neoliberalism is not just a political project, but also one which aims to change how we think: big data is not neutral, but rather is easily incorporated within this system.

Taina Bucher’s exploration of shifts in the Twitter APIs complemented this well, inviting us to look more deeply at the role of APIs in shaping how we interact with data. Bucher argues that while there’s a risk of seeing APIs as just another convenient tool to gather data, we need to critically analyse software tools and understand the power relations embedded in their design. Her empirical research in 2010 and 2011 focused on shifts in the Twitter APIs, in which the initial openness which helped Twitter to grow was increasingly shut down.

Jean Burgess and Axel Bruns also touched on the consequences of Twitter’s API as they discussed Twitter research and the politics of data access. To begin with, they point out, there’s a disproportionate focus on Twitter in academic research because it’s the easiest social media data to access. At the same time, much of the work is biased by limitations in the software tools used to study the platform: key tools like TwapperKeeper and DataSift were constrained in important ways by the changes to Twitter APIs. There are also biases that come from a focus on the low-hanging fruit, such as a focus on hashtags rather than on more complex layers of interaction like follower networks and @replies. Burgess and Bruns argue that we need to be reaching beyond the easily-available data in order to build a better picture of how people are using Twitter.
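To make the distinction between these layers a little more concrete, here is a minimal sketch in Python of how the same set of tweets can be read either as a hashtag count or as an @reply network. It is not the method used by Burgess and Bruns or by any of the tools mentioned above: the sample tweets, the function names, and the (author, text) input format are all hypothetical, and the sketch works on plain strings rather than on any real Twitter API.

```python
import re
from collections import Counter

# Hypothetical input format: a list of (author, text) pairs.
HASHTAG = re.compile(r"#(\w+)")
LEADING_MENTION = re.compile(r"\s*@(\w+)")


def hashtag_counts(tweets):
    """Count how often each hashtag appears (case-insensitive)."""
    counts = Counter()
    for _author, text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


def reply_edges(tweets):
    """Count directed (author -> addressee) edges for tweets that open with an @mention."""
    edges = Counter()
    for author, text in tweets:
        match = LEADING_MENTION.match(text)
        if match:
            edges[(author.lower(), match.group(1).lower())] += 1
    return edges


# Invented sample data, for illustration only.
sample = [
    ("alice", "Marching downtown now #occupy"),
    ("bob", "@alice which street are you on?"),
    ("alice", "@bob Broadway and 14th"),
    ("carol", "Solidarity from afar #Occupy #solidarity"),
]

print(hashtag_counts(sample))
print(reply_edges(sample))
```

In this toy sample the conversation between the two accounts that reply to each other carries no hashtag at all, so a hashtag-only collection would miss it entirely; that is the kind of blind spot the low-hanging-fruit critique points to.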

Carolin Gerlitz provided one model for doing this, outlining an approach based on a model of social media as multivalent: producing data that is both standardised and vague, and therefore allows for multiple readings. Gerlitz argued that more research needs to be open to the multiple use practices involved in social media. Frauke Zeller’s work also provided useful templates for research which is open to the multiple meanings of social media texts, suggesting that there are benefits to an iterative approach in which qualitative and quantitative analysis mutually inform each other.

Daniel Pare and Mary Francoli’s research raised concerns about existing approaches in big data research, particularly focusing on the literature on political engagement and mobilisation. Like others, they pointed out that the data which is most easily available is not necessarily the most accurate; a focus on big data research on social media is problematic when it’s used as a simple measure of broader political trends. There’s also far too little recognition of the ways in which assumptions about what ‘democracy’ means shape research on political mobilisation and engagement online, and of the inherently political nature of social media platforms.

Asta Zelenkauskaite’s work on mainstream media’s approaches to big data also highlighted the contested nature of these platforms, inviting us to consider the difference between social media engagement as a top-down process and what it might look like if it was driven by consumer interests. Sidneyeve Matrix’s presentation served as a useful complement to this, examining the shift towards niche social networks—often paid, gated communities—that support consumers’ use of their geolocative data.

The day’s presentations opened up some vital questions that are being asked far too infrequently in big data research, and in the broader big data community, about the political and methodological issues involved in the push towards big data as a magical cure-all. I’m looking forward to tomorrow’s presentations, as well as to talking about how these concerns relate to the research Tim and I are doing.

In October Tim and I will be presenting on the methodological underpinnings of our Mapping Movements project at the Compromised Data? colloquium at Ryerson University. Our paper examines some of the problems with big data research on social movements:

Social movement research and big data: critiques and alternatives

This paper examines the growing use of big data, social media-oriented approaches in the study of social movements, including Occupy and the Arab Spring, and suggests an alternative research methodology. We argue that although big data studies provide valuable contributions to the literature, there are both analytical and ethical reasons to complement this work with fieldwork. The Mapping Movements project provides a framework for a blended approach, developing mixed methods in order to examine the physical and the online aspects of social movements, with case studies of social movements in North America, Africa, and Europe; our preliminary research, from 2012, analysed the uses and perceptions of Twitter within Occupy Oakland, combining Twitter data with fieldwork from Oakland, including interviews with activists. Subsequent fieldwork and data collection were focused on the 2013 World Social Forum, held in Tunis, and Greek antifascist movements in 2013.

Recent years have seen a growth in the use of quantitative analyses in social movement research, taking advantage of the huge volume of data available through platforms like Twitter and YouTube. There has been considerable work on the Occupy movement focusing on hashtags (Conover, Davis, et al., 2013; Conover, Ferrara, Menczer, & Flammini, 2013), YouTube linking networks (Thorson et al., 2013), and even Facebook (Gaby & Caren, 2012), despite the difficulties involved in accessing data on the platform. Similar analyses have focused on the Arab Spring (Papacharissi & Oliveira, 2012; Starbird & Palen, 2012). This work makes important contributions to our understanding of these movements; a large-scale, quantitative approach enables a comprehensive overview of Twitter coverage across the lifespan of these social movements. Such data can demonstrate key patterns of activity, such as periods of heightened or lessened online communication, and in particular how these patterns develop over time in response to events affecting the movement. Datasets also provide valuable details on information sources cited or the attention received by individual users.

However, big data approaches have significant blind spots, and are most effective when complemented by qualitative methods, especially fieldwork and other direct contact with movement participants. Although other research has adopted a mixed-methods approach (Costanza-Chock, 2012), there has been little active reflection on this methodology when it comes to social movement research. The approach which we have framed for the Mapping Movements project ties together big data research, participant observation, and interviews, working to complement and test data gathered through each technique. Such an approach is particularly vital for social movement research, where the online platforms used by participants may be different between movements, and also where the platforms employed – and their functions – change in response to the evolving needs and concerns of the movement.

Our preliminary research suggests that this methodology highlights issues which may not be visible to big data approaches. Interviews from the case studies indicate that many activists are currently engaging in strategic avoidance of social media. Participants also engage in self-censorship when they do use social media. This means that important participants and tactics are effectively hidden from the view of research based purely on big data. Similarly, our research suggests that many participants perceive Twitter and other social media platforms to be engaging in censorship or otherwise limiting activists’ online presence, with tweets or other material disappearing suspiciously, or accounts associated with activism being unfollowed. Recent developments, including the apparently targeted shut-down of Greek left-wing Facebook profiles (Ματθαίος, 2013) and the introduction of a Twitter ‘report’ button, are likely to further diminish the visibility of certain kinds of social movement activism on social media.

There are also important ethical issues associated with big data research on social movements. Chesters argues that ethical social movement research requires reciprocity with movement participants (including an openness to being challenged), and that we remember the “academy has no a priori reason or justification for making demands upon those it seeks knowledge of” (2012, p. 155). This suggests two ethical critiques of big data approaches to social movement studies. The first is that in gathering data from public or semi-public spaces we are drawing on participants’ activism, and transforming it into “commodifiable objects of knowledge” (Chesters, 2012, p. 145). The second is that the distance involved means that there is little dialogue involved with movement participants, and few chances for them to challenge the researcher’s position of power. Whereas participant observation and interviews frequently require the researcher to answer difficult questions about their work (as has happened in the case of our research), it is possible to carry out big data research without ever interacting with movement participants. If research is published in paywalled journals, participants may never even be able to read it, let alone comment on it.

The Mapping Movements methodology is not just an approach for gathering research data, but also shapes how findings and discussions are later disseminated. Our research enables a nuanced analysis of social media use by activists, looking beyond the object of study (the social medium of choice) at a quantitative level, to examine the intersections between the online and the physical aspects of social movements, and how these influence one another and affect the social media strategies at hand.

The stitching is based on the graph of the data that is in the table. By Jessica Kelly

There’s been a significant push in Internet Studies over the last few years towards ‘big data’ studies, which aggregate huge volumes of information (such as tweets or website linking patterns) and subject them to analysis, often quantitative analysis. Much of this research provides valuable insights into how people are using the Internet and its impacts on society, politics, and economics. At IR13 last year there were plenty of projects which took a ‘big data’ approach to the study of social movements, particularly the Arab Spring and Occupy, and provided important analysis about how they organise and communicate. And, of course, my collaborator on the Mapping Movements project, Tim Highfield, is doing excellent work in the area.

However, I do think that there are important aspects of this shift towards big data that we need to maintain a critical approach towards. Part of the reason why ‘big data’ is so appealing is that it looks like Science: there are numbers! and statistical analysis! There have been claims that it will allow us to ‘do away with the need for hypothesis and theory’ (presumably ridding ourselves of the biases contained in these processes). It fits within our perceptions of what ‘proper’ science should be: more objective, less reliant on qualitative methods like participant observation and interviews. This notion of science has, of course, been critiqued from a number of perspectives. Emily Martin’s ‘The Egg and the Sperm’, for example, provided an excellent demonstration of how profoundly even ‘hard’, supposedly objective, science is shaped by cultural assumptions, including those surrounding gender.

The shift towards ‘big data’ is not only linked to the uptake of new analytical tools, it is also linked to our (gendered) ideas of what science should look like. As more funding becomes available for big data research, it is important to bear in mind the ways in which our assumptions structure the value we place on different research, and the ways in which access to different research fields is gendered. While many women provide vital contributions in STEM fields, there continue to be significant structural barriers to participation by women and minority groups in these areas. Devaluing qualitative research in favour of quantitative big data not only builds on misplaced assumptions about the value of ‘hard sciences’, it also adds to the factors excluding marginalised perspectives from academia.


This is not to say that we should abandon big data approaches. As I said, I believe that they provide many helpful insights. There’s also some fascinating work out there that uses big data in ways that undermine the assumptions that this research must be ‘objective’ – Zizi Papacharissi and Maria de Fatima Oliveira’s work on affective publics springs to mind here. Tim and I are approaching the use of big data by drawing together big data approaches and participant observation, interviews, and other qualitative methods. So the issue is not so much whether we use big data, as whether we remain aware of the ways in which its use is structured by our assumptions about what constitutes ‘science’, and of ways in which this may privilege some groups’ participation over others.

It was refreshing to begin the conference with plenary speakers bringing excellent feminist and queer analysis to bear on Internet Studies. Mary Gray, Larissa Hjorth, and Susanna Paasonen all posed challenges to the dominant focuses of Internet Studies. Gray questioned technology- (and particularly device-) centred approaches, and the accompanying focus on ‘big data’. (I’ve also been having some useful discussions around this latter focus as gendered: this push towards a ‘scientific’ and quantitative approach has important implications when women are still discouraged, both subtly and unsubtly, from engagement in STEM fields and the statistical training required for big data projects.)

smartphones as caravans, courtesy of @evestirling

Gray also critiqued the ongoing focus on normative users in research, looking instead at ‘boundary publics’ – in this case queer rural youth. Hjorth, similarly, implicitly challenged the common focus on (young, well-off) men and technology use by looking at the ways in which mobile use is affecting the space of ‘the domestic’, and the relationship between mothering, smartphones, and labour.

Finally, Susanna Paasonen provided a useful counter to the assumption (perhaps more common in popular narratives than academic discourse) that digital content is disembodied. This is often tied either to a narrative of loss (of authenticity, of tactility), or a narrative of freedom (from physical limitations). Paasonen argues, in contrast, that the materiality of consuming digital content matters: digital content is always mediated through particular devices, which have different affordances and encourage particular kinds of uses. Paasonen also got quite a few scandalised/delighted titters from the audience by showing a short clip from a porn film (which, she notes, she inherited in Super 8 form from her parents). While this added some (more) humour to the talk, I think it’s also important politically. The politics of pornography have always played a large role in discussions of the Internet. We need to be able to talk productively about pornography not only in order to understand the Internet, but also because it plays such a large role in the construction of sexuality and desire in many societies.

I’m looking forward to reading more work by all of these speakers, especially as my still-slightly-jetlagged self is having some trouble processing the more theoretically dense aspects of the talks in aural form. Corrections, comments, and reading suggestions are welcome!
