Introduction

Data journalism in question

What is data journalism? What is it for? What might it do? What opportunities and limitations does it present? Who and what is involved in making and making sense of it? This book is a collaborative experiment responding to these and other questions. It follows on from another edited book, The Data Journalism Handbook: How Journalists Can Use Data to Improve the News (O’Reilly Media, 2012).1 Both books assemble a plurality of voices and perspectives to account for the evolving field of data journalism. The first edition started through a “book sprint” at MozFest in London in 2011, which brought together journalists, technologists, advocacy groups and others in order to write about how data journalism is done. As we wrote in the introduction, it aimed to “document the passion and enthusiasm, the vision and energy of a nascent movement”, to provide “stories behind the stories” and to let “different voices and views shine through”. The 2012 edition is now translated into over a dozen languages – including Arabic, Chinese, Czech, French, Georgian, Greek, Italian, Macedonian, Portuguese, Russian, Spanish and Ukrainian – and is used for teaching at many leading universities and at teaching and training centres around the world. It is also a well-cited source for researchers studying the field.

While the 2012 book is still widely used (and this book is intended to complement rather than to replace it), a great deal has happened since 2012. On the one hand, data journalism has become more established. In 2011 data journalism as such was very much a field “in the making”, with only a handful of people using the term. It has subsequently become socialised and institutionalised through dedicated organisations, training courses, job posts, professional teams, awards, anthologies, journal articles, reports, tools, online communities, hashtags, conferences, networks, meetups, mailing lists and more. There is also broader awareness of the term through events which are conspicuously data-related, such as the Panama Papers, which whistleblower Edward Snowden then characterised as the “biggest leak in the history of data journalism”.

On the other hand, data journalism has become more contested. The 2013 Snowden leaks helped to establish a transnational surveillance apparatus of states and technology companies as a matter of fact rather than speculation. These leaks suggested how citizens were made knowable through big data practices, showing a darker side to familiar data-making devices, apps and platforms.2 In the US, the launch of Nate Silver’s dedicated data journalism outlet FiveThirtyEight in 2014 was greeted by a backlash for its over-confidence in particular kinds of quantitative methods and its disdain for “opinion journalism”.3 While Silver was acclaimed as “lord and god of the algorithm” by The Daily Show’s Jon Stewart for successfully predicting the outcome of the 2012 elections, the statistical methods that he advocated were further critiqued and challenged after the election of Donald Trump in 2016. These elections, along with the Brexit vote in the UK and the rise of populist right-wing leaders around the world, were said to correspond with a “post-truth” moment, characterised by a widespread loss of faith in public institutions, expert knowledge and the facts associated with them, and the mediation of public and political life by online platforms which left their users vulnerable to targeting, manipulation and misinformation.

Whether this “post-truth” moment is taken as evidence of failure or as a call to action, one thing is clear: data can no longer be taken for granted, and nor can data journalism. Data does not just provide neutral and straightforward representations of the world, but is rather entangled with politics and culture, money and power. Institutions and infrastructures underpinning the production of data – from surveys to statistics, climate science to social media platforms – have been called into question. Thus it might be asked: Which data, whose data and by which means? Data about which issues and to what end? Which kinds of issues are data-rich and which are data-poor? Who has the capacities to benefit from it? What kinds of publics does data assemble, which kinds of capacities does it support, what kinds of politics does it enact and what kinds of participation does it engender?

Towards a critical data practice

Rather than bracketing such questions and concerns, this book aims to “stay with the trouble”, as the prominent feminist scholar Donna Haraway puts it.4 Instead of treating the relevance and importance of data journalism as an assertion, we treat this as a question which can be addressed in multiple ways. The collection of chapters gathered in the book aims to provide a richer story about what data journalism does, with and for whom. Through our editorial work we have encouraged both reflection and a kind of modesty in articulating what data journalism projects can do, and the conditions under which they can succeed. This entails the cultivation of a different kind of precision in accounting for data journalism practice: specifying the situations in which it develops and operates. Such precision requires broadening the scope of the book to include not just the ways in which data is analysed, created and used in the context of journalism but also more about the social, cultural, political and economic circumstances in which such practices are embedded.

The subtitle of this new book is “towards a critical data practice”. It reflects both our aspiration as editors to bring critical reflection to bear on data journalism practices and the increasingly critical stances of data journalism practitioners. The notion of “critical data practice” is a nod to Philip E. Agre’s notion of “critical technical practice”, which he describes in terms of having “one foot planted in the craft work of design and the other foot planted in the reflexive work of critique”.5 As we have written elsewhere, our interest in this book is understanding how critical engagements with data might modify data practices, making space for public imagination and interventions around data politics.6

Alongside contributions from data journalists and practitioners writing about what they do, the book also includes chapters from researchers whose work may advance critical reflection on data journalism practices, from fields such as anthropology, science and technology studies, (new) media studies, internet studies, platform studies, the sociology of quantification, journalism studies, indigenous studies, feminist studies, digital methods and digital sociology. Rather than assume a more traditional division of labour such that researchers provide critical reflection and practitioners offer more instrumental tips and advice, we have sought to encourage researchers to consider the practical salience of their work, and to provide practitioners with space to reflect on what they do outside of their day-to-day deadlines. None of these different perspectives exhausts the field, and our objective is to encourage readers to attend to the different aspects of how data journalism is done. In other words, this book is intended to function as a multidisciplinary conversation starter, and – we hope – a catalyst for collaborations.

We do not assume that “data journalism” refers to a unified set of practices. Rather it is a prominent label which refers to a diverse set of practices which can be empirically studied, specified and experimented with. As one recent review puts it, we need to interrogate the “how of quantification as much as the mere fact of it”, the effects of which “depend on intentions and implementation”.7 Our purpose is not to stabilise how data journalism is done, but rather to draw attention to its manifold aspects and open up space for doing it differently.

A collective experiment

It is worth briefly noting what this book is not. It is not just a textbook or handbook in the conventional sense: the chapters don’t add up to an established body of knowledge, but are rather intended to indicate interesting directions for further inquiry and experimentation. The book is not just a practical guidebook of tutorials or “how tos”: there are already countless readily available materials and courses on different aspects of data practice (e.g. data analysis and data visualisation). It is not just a book of “behind the scenes” case studies: there are plenty of articles and blog posts showing how projects were done, including interviews with their creators. It is not just a book of recent academic perspectives: there is an emerging body of literature on data journalism scattered across numerous books and journals.8

Rather, the book has been designed as a collective experiment in accounting for data journalism practices and a collective invitation to explore how such practices may be modified. It is collective in that, as with the first edition, we have been able to assemble a comparatively large number of contributors (more than seventy) for a short book, and the editorial process has benefitted from recommendations from contributors. Through what could be considered a kind of curated “snowball editorial”, we have sought to follow how data journalism is done by different actors, in different places, around different topics, through different means. Through the process we have trawled through many shortlists, longlists, outlets and datasets to curate different perspectives on data journalism practices. Though there were many, many more contributors we would have liked to include, we had to operate within the constraints of a printable book while also giving voice to a balance of genders, geographies and themes.

It is experimental in that the chapters provide different perspectives and provocations on data journalism, which we invite readers to further explore through actively configuring their own blends of tools, datasets, methods, texts, publics and issues. Rather than inheriting the ways of seeing and ways of knowing that have been “baked into” elements such as official datasets or social media data, we encourage readers to enrol them into the service of their own lines of inquiry. This follows the spirit of “critical analytics” and “inventive methods” which aim to modify the questions which are asked and the way problems are framed.9 Data journalism can be viewed not just in terms of how things are represented, but in terms of how it organises relations – such that it is not just a matter of producing data stories (through collecting, analysing, visualising and narrating data), but also attending to who and what these stories bring together (including audiences, sources, methods, institutions and social media platforms). Thus we may ask, as Noortje Marres recently put it: “What are the methods, materials, techniques and arrangements that we curate in order to create spaces where problems can be addressed differently?”. The chapters in this book show how data journalism can be an inventive, imaginative, collaborative craft, highlighting how data journalists interrogate official data sources, make and compile their own data, try new visual and interactive formats, reflect on the effects of their work and make their methods accountable and code re-usable.

The online beta of the book is intended to provide an opportunity to publicly preview a selection of chapters before the printed version of the book is published. We hope this process will elicit comments and encounters (and perhaps testing out in contexts of teaching and training) before the book takes its final shape. If the future of data journalism is uncertain, then we hope that readers of this book will join us both in critically taking stock of what journalism is and has been, and in intervening to shape its future.

An overview of the book

To stay true to our editorial emphasis on specifying the setting, we note that the orientation of the book and its selection of chapters is coloured by our interests and those of our friends, colleagues and networks at this particular moment – including growing concerns about climate change, environmental destruction, air pollution, tax avoidance, (neo)colonialism, racism, sexism, inequality, extractivism, authoritarianism, algorithmic injustice and platform labour. The chapters explore how data journalism makes such issues intelligible and experienceable, as well as the kinds of responses it can mobilise. The selection of chapters also reflects our own oscillations between academic research, journalism and advocacy, as well as the different styles of writing and data practice associated with each of these. We remain convinced of the generative potential of encounters between colleagues in these different fields, and several of the chapters attest to successful cross-field collaborations.

After the introduction, the book starts with a “taster menu” on doing issues with data. This includes a variety of different formats for making sense of different themes in different places – including looking at the people and scenes behind the numbers for home demolitions in occupied East Jerusalem (Haddad), multiplying memories of trees in Bogota (Magaña), tracing connections between agricultural commodities, crime, corruption and colonialism across several countries (Sánchez and Villagrán), investigating extractive industries in Peru (Salazar), mobilising for road safety in the Philippines (Rey and Mendoza), putting carbon emissions into context (Clark), engaging publics with data graphics on Instagram (Alaali), counting transgender lives (Talusan), and mapping segregation in the US (Williams). The chapters in this section illustrate a breadth of practices from visualisation techniques to building campaigns to engaging audiences around data on Instagram.

The third section focuses on how journalists assemble data, including projects on themes such as land conflicts (Shrivastava and Paliwal), air pollution (Naik and Salve) and knife crime (Barr). It also includes accounts of how to obtain and work with data in countries where it may be less easy to come by, such as in Cuba (Carmona et al) and China (Ma). Assembling data may also be a way of engaging with readers (Coelho) and assembling interested actors around an issue, which may in itself constitute an important outcome of a project. Gathering data may involve gradually and creatively piecing together fragments of information from disparate sources, including documents, interviews and investigative fieldwork (Boros). As well as using data, other types of stories may be surfaced by exploring how numbers are made (Verran).

The fourth section is concerned with different ways of working with data. This includes working with graph databases (Haddou), algorithms (Stray), code (Simon) and varieties of digital and computational methods (Zhang; Rey). Contributors examine emerging issues and opportunities arising from working with sources such as text data (Maseda) and data from the web, social media and other online devices (Weltevrede). Others look at practices for making data journalistic work transparent, accountable and reproducible (Leon; Mazotte). Databases may also afford opportunities for collaborative work on large investigative projects (Díaz-Struck and Romera). Feminist thought and practice may also inspire different ways of working with data (D'Ignazio).

The fifth section is dedicated to examining different ways in which data can be experienced. Several pieces reflect on contemporary visualisation practices (Aisch and Rost; Stabe), as well as how readers respond to and participate in making sense with visualisations (Kennedy et al). Other pieces look at how data is mediated and presented to readers through databases (Rahman and Wehrmeyer), web-based interactives (Bentley), TV and radio (de Jong) and comics (Amancio).

The sixth section is dedicated to emerging approaches for investigating data, platforms and algorithms. The digital is taken as a site of investigation, as highlighted by BuzzFeed News projects on viral content, misinformation and digital culture (Silverman). Chapters in this section examine different ways of reporting on algorithms (Diakopoulos), as well as how to conduct longer-term collaborations in this area (Elmer). Several chapters look at how to work with social media data to explore how platforms participate in shaping debate, including storytelling approaches (Vo) and repurposing data to see how platforms and data industries see humans (Lavigne). A final chapter explores affinities between digital methods research and data journalism, including how data can be used to tell stories about web tracking infrastructures (Rogers).

The seventh section is on organising data journalism, and attends to different types of work in the field which are considered indispensable but are not always prominently recognised. This includes the changing role of data journalism in newsrooms (Pilhofer; Klein); how data journalism has changed over the past decade (Rogers); how platforms and the gig economy shape cross-border investigative networks (Candea); entanglements between data journalism and movements for open data and civic tech (Baack); open source coding practices (Pitts); data journalism and gender (Vaca); audience measurement practices (Petre); archiving data journalism (Broussard); organising transnational collaborations (Ottaviani and Govindasamy); and the role of the #ddj hashtag in connecting data journalism communities on Twitter (Au and Smith).

The eighth section looks at training data journalists and the development of data journalism around the world. This includes chapters on teaching data journalism at universities in the US (Phillips); hackathons and bootcamps in Central Asia (Valeeva); and MOOCs and local training initiatives in Turkey (Dag). Others argue for the importance of empowering marginalised communities to tell their stories (Constantaras), and caution against “digital universalism” and underestimating innovation in the “periphery” (Chan).

Data journalism does not happen in a vacuum and the ninth section surfaces its various social, political, cultural and economic settings. A chapter on the genealogies of data journalism in the United States serves to encourage reflection on the various historical practices and ideas which shape it (Anderson). Other chapters look at the economics and sustainability of data journalism (Steiger); data journalism as a response to broader societal processes of datafication (Lewis and Radcliffe); different forms and formats of data journalism (Cohen); the publics that data journalism assembles (Parasie); and how data journalism projects are valued through awards (Loosen). Two chapters reflect on different approaches to measuring the impact of data journalism projects (Bradshaw; Green-Barber). Others examine issues around data journalism and colonialism (Young) and indigenous data sovereignty (Kukutai and Walter).

The tenth and final section closes with reflections, challenges and possible future directions for the field. This includes chapters on opportunities and pitfalls of knowing society through data (Didier); data journalism and digital liberalism (Boyer); and whether data journalism can live up to its earlier aspirations to become a field of inspired experimentation, interactivity and play (Usher). An afterword from Noortje Marres reflects on data journalism as a form of reporting from the perspective of digital sociology.

Twelve challenges for critical data practice

Drawing on the time that we have spent exploring the field of data journalism through the development of this book, we would like to provide twelve challenges for “critical data practice”. These consider data journalism in terms of its capacities to shape relations between different actors as well as to produce representations about the world.

How can data journalism projects account for the collective character of digital data, platforms, algorithms and online devices, including the interplay between digital technologies and digital cultures?

How can data journalism projects tell stories about big issues at scale (e.g. climate change, inequality, multinational taxation, migration) while also affirming the provisionality and acknowledging the models, assumptions and uncertainty involved in the production of numbers?

How can data journalism projects tell stories both with and about data, including the various actors, processes, institutions, infrastructures and forms of knowledge through which data is made?

How can data journalism projects cultivate their own ways of making things intelligible, meaningful and relatable through data, without simply uncritically advancing the ways of knowing “baked into” data from dominant institutions, infrastructures and practices?

How can data journalism projects acknowledge and experiment with the visual cultures and aesthetics that they draw on, including through combinations of data visualisations and other visual materials?

How can data journalism projects make space for public participation and intervention in interrogating established data sources and re-imagining which issues are accounted for through data, and how?

How might data journalists cultivate and consciously affirm their own styles of working with data, which may draw on, yet remain distinct from, fields such as statistics, data science and social media analytics?

How can the field of data journalism develop memory practices to archive and preserve its work, and to situate that work in relation to the practices and cultures that it draws on?

How can data journalism projects collaborate around transnational issues in ways which avoid the logic of the platform and the colony, and affirm innovations at the periphery?

How can data journalism support marginalised communities to use data to tell their own stories on their own terms, rather than telling their stories for them?

How can data journalism projects develop their own alternative and inventive ways of accounting for their value and impact in the world, beyond social media metrics and impact methodologies established in other fields?

How might data journalism develop a style of objectivity which affirms, rather than minimises, its own role in intervening in the world and in shaping relations between different actors in collective life?

Words of thanks

We are most grateful to Amsterdam University Press for being so supportive with this experimental project, including the publication of an online beta as well as their support for an open access digital version of the book. It is perhaps also an apt choice, given that several of the contributors convened at one of the first European conferences on data journalism which took place in Amsterdam in 2010. Open access funding is supported by a grant from the Netherlands Organisation for Scientific Research (NWO, 324-98-014).

The vision for the book was germinated through discussions with friends and colleagues associated with the Public Data Lab. We particularly benefited from conversations about the book with Andreas Birkbak, Erik Borra, Noortje Marres, Richard Rogers, Tommaso Venturini and Esther Weltevrede. We were also provided with space to develop the direction of this book through events and visits to Columbia University (in discussion with Bruno Latour); Utrecht University; the University of California Berkeley; Stanford University; the University of Amsterdam; the University of Miami; Aalborg University Copenhagen; Sciences Po, Paris; the University of Cambridge; London School of Economics; Cardiff University; Lancaster University and the International Journalism Festival in Perugia. Graduate students taking the MA course in data journalism at King’s College London helped us to test the notion of “critical data practice” which lies at the heart of this book.

Our longstanding hope to do another edition was both nurtured and materialised thanks to Rina Tsubaki, who helped to gather support from the European Journalism Centre and the Google News Initiative. We are grateful to Adam Thomas, Bianca Lemmens, Biba Klomp, Letizia Gambini, Arne Grauls and Simon Rogers for providing us with both editorial independence and enduring support to scale up our efforts. The editorial assistance of Daniela Demarchi has been tremendously valuable in helping us to chart a clear course through sprawling currents of texts, footnotes, references, emails and spreadsheets.

Most of all, we would like to thank all of the data journalism practitioners and researchers who were involved in the project (whether through writing, correspondence or discussion) for accompanying us, and for supporting this experiment with their contributions of time, energy, materials and ideas without which the project would not have been possible. This book is, and continues to be, a collective undertaking.