What is Data Journalism For? Cash, Clicks, and Cut and Trys


The daily refreshing of FiveThirtyEight’s interactive 2016 election forecast maps was all but a ritual among my fellow Washingtonians, from politicians to journalists to students to government workers and beyond. Some of this ilk favored The New York Times’ Upshot poll aggregator; the more odds-minded, RealClearPolitics; and those with more exotic tastes, The Guardian’s US election coverage. For these serial refreshers, all was and would be right with the world so long as the odds were ever in Hillary Clinton’s favor in the US presidential election’s version of The Hunger Games: the bigger the spread, the better.

We know how this story ends: even going into election day, Nate Silver’s map gave Hillary Clinton a 71.4% chance of winning. Perhaps it is high time to get over the 2016 US election; after all, obsession with election maps may be a particularly American pastime, thanks to the regular cycle of national elections, though that is not to say that a worldwide audience isn’t also paying attention.1 But until link rot destroys the map, it’s there, still haunting journalists and Clinton supporters alike and providing fodder for Republicans to remind their foes that the “lamestream media” is “fake news”. Politics aside, data journalists should not forget the 2016 US presidential election: even if the quantification was correct to the best of anyone’s knowledge, the failures in mapping and visualization have become one more tool with which to dismantle journalists’ claim to epistemic authority (or, more simply, their claim to be “authorized knowers”).

Yes, it is unfair to conflate data journalism with electoral prediction; it is certainly far more than that, particularly from a global vantage point. But it sometimes seems that this is what data journalism’s ultimate contribution looks like: endless maps, clickable charts, and calculators prone to user error, oversimplification, and marginalization, regardless of the rigor of the computation and the statistical prowess that produced them. With the second edition of this handbook now in your hands, we can declare that data journalism has reached a point of maturation and self-reflection, and as such, it is important to ask: “What is Data Journalism For?”

Data journalism, as it stands today, still only hints at the potential it has offered to reshape and reignite journalism. The first edition of this handbook began as a collaborative project in a large group setting at the 2011 Mozilla Festival, an effort I observed but quickly doubted would ever materialize into a tangible result (I was wrong). This second edition is being published by Amsterdam University Press and distributed in the US by the University of Chicago Press, with solicited contributors, suggesting that the freewheeling nature of data journalism has been exchanged, somewhat, for professionalism, order, and legitimacy. And indeed, this is the case: data journalism is mainstream, taught in journalism schools, and normalized into the newsroom.2 Data journalism has also become standardized and, as such, has changed little over the past five to seven years; reviews of cross-national data journalism contests reveal limited innovation in form and topic (most often: politics), with maps and charts still the go-to.3 Interactivity is limited to what those in information visualization consider “entry level techniques” (Young, Hermida and Fulda, 2017); moreover, data journalism has not gone far enough to visualize “dynamic, directed, and weighted graphs”.4 Data journalists are still dealing with pre-processed data rather than original “big data,” and this data is “biggish” at best: government data rather than multi-level data of the depth and size an ISP might collect.

This critique flows largely from a Western-centered, if not US-centered, perch, but that does not undermine the essential call to action I put forward: data journalists are still sitting on a potentially revolutionary toolbox for journalism that has yet to be unleashed. The revolution, however, if executed poorly, only stands to further undermine both the user experience and the knowledge-seeking efforts of news consumers and, at worst, to further seed distrust in news. If data journalism continues to look as it has for the past five to ten years, it does little to advance the cause of journalism in the digital and platform era. Thus, to start asking the existential question “What is data journalism for?”, I propose that data journalists, along with less data-focused but web-immersed journalists who work in video, audio, and code, as well as the scholars who poke and prod them, need to rethink data journalism’s origin story, its present rationale, and its future.

Data Journalism in the US: The Origin Story

The origin story is the story we tell ourselves about how and why we came to be, and more often than not it is viewed through rose-tinted glasses and padded with braggadocio rather than grounded in reality. The origin story of data journalism in the US goes something like this: in the primordial world before data journalism, data journalism existed in an earlier form, known in the US as computer-assisted reporting, which offered an opportunity to bring social science rigor to journalism.

In the mythos of data journalism’s introduction to the web, data journalists would become souped-up investigative journalists, empowered with the superior computational prowess of the 21st century, who set the data (or documents) free in order to tell stories that would otherwise go untold. But beyond investigating stories, data journalists were also somehow to save journalism with their new web skills, bringing a level of transparency, personalization, and interactivity to news that news consumers would appreciate, learn from, and, of course, click on. The stories of yesteryear’s web, as it were, would never be the same. Data journalism would right wrongs and provide the much-needed objective foundation that journalism’s qualitative assessments lacked, doing so at a scale and with a prowess unimaginable before our present real-time, interactive digital environment, replete with powerful cloud-based servers that offload the computational pressure from any one news organization. Early signs of success would chart the way forward and even turn ordinary readers into investigative collaborators or citizen scientists, as with The Guardian’s coverage of the MP expenses scandal or WNYC’s Cicada project, which got a small army of New York-area residents to build soil thermometers to help chart the arrival of the dreaded summer insects. And this inspired orchestration of journalism, computation, crowds, data, and technology would continue, pushing truth to justice.

The Present: The ‘Hacker Journalist’ as Just Another (Boring) Newsroom Employee

The present has not moved far past the origin story that today’s data journalists have told themselves, either in vision or in reality. What has emerged are two distinct types of data journalism: “investigative” data journalism, which carries the noble mantle of journalism’s efforts forward, and daily data journalism, which can be optimized for the latest viral click interest and might mean anything from rushed, ASAP journalistic cartography to turning public opinion polling or a research study into an easily shareable meme with the veneer of journalism attached. Data journalism, at best, has gotten boring and overly professional, and at worst, has become another strategy to generate digital revenue.

It is not hyperbole to say that data journalism could have transformed journalism as we know it, but so far it has not. At the 2011 MozFest, a headliner hack of the festival was a plugin of sorts that would allow anyone’s face to become the lead image of a mock-up Boston Globe home page. That was fun and games, but The Boston Globe was certainly not going to allow user-generated content, without any kind of pre-filtering, to actually appear on its home page. Similarly, during the birth of the first Data Journalism Handbook, the data journalist was the “hacker journalist,” imagined as coming from technology into journalism, or at least as using the spirit of open source and hacking to inspire projects that bucked the conventional processes of institutional journalism and provided room for experimentation, imperfection, and play: tinkering that might not lead to something great in form or content, but might well hack journalism nonetheless.5 In 2011, the story was of outsiders moving into journalism; in 2018, the story is of insiders professionalizing programming in journalism. The spirit of innovation and invention has become decidedly corporate, decidedly white-collar, and decidedly less fun.6

Boring is OK, and it serves a role. Some of the professionalization of data journalism has been justified by the “data journalist as hero” self-perception: data journalists as those who, thanks to a different set of values (e.g. collaboration, transparency) and skills (visualization, assorted computational skills), could bring truth to power in new ways. The Panama and Paradise Papers are perhaps among the best expressions of this vision. But investigative data journalism requires time, effort, and expertise that go far beyond data crunching, drawing on many other, more traditional sources of data: primarily interviews, on-location reporting, and documents. Regularly occurring, groundbreaking investigative journalism is an oxymoron, though not for lack of effort; the European Data Journalism Network, the US’ Institute for Nonprofit News, and the Global Investigative Journalism Network showcase the vast network of would-be investigative efforts. The truth is that a game-changing investigation is not easy to come by, which is why we can generally count these high-level successes on our ten fingers, and why the crowdsourced investigative success of The Guardian’s MP expenses project from 2010 has yet to be replaced by anything newer.

What’s past is prologue when it comes to data journalism. Snow Fall, The New York Times’ revolutionary immersive storytelling project from 2012, which went on to win a Pulitzer, re-emerged in December 2017 as “Deliverance from 27,000 Feet” or “Everest”. Five years later, The New York Times featured yet another longform story about a disaster on a snowy mountain, just a different one (but by the same author, John Branch). In those five years, “Snowfall” or “Snowfalled” became shorthand, within The New York Times and outside it, for adding interactive pizzazz to a story; after 2012, a debate raged not just at The Times but in other US and UK newsrooms as to whether data journalists should spend their time building tools that could auto-Snowfall any story or working on innovative one-off projects.7 Meanwhile, Snow Fall, minimally interactive at best in 2012, remained minimally interactive at best in its year-end 2017 form.

“But wait,” a data journalist might proclaim, “Snow Fall isn’t data journalism: maybe a fancy trick of some news app developers, but there’s no data in Snow Fall!” Herein lies the issue: maybe data journalists don’t think Snow Fall is data journalism, but why not? What is data journalism for if it is not to tell stories in new ways, with new skills that take advantage of the best of the web?

Nor can data journalism be just for maps or charts, and mapping or charting data does not give data journalism intellectual superiority over immersive digital journalism efforts. What can be mapped is mapped. Election mapping in the US aside, the ethical consequences of quantifying and visualizing the latest available data into clickable coherence need critique. At its most routine, data journalism becomes the vegetables of visualization. This is particularly true given the growing day-to-day demand for data journalism projects. Perhaps it is a new labor statistic, city cycling data, recycling rates, or the results of an academic study: visualized because it can be visualized (and, maybe, because it will get clicked on more). At worst, data journalism can oversimplify to the point of dehumanizing the very subjects its work is supposed to illuminate. Maps of migrants and their flows across Europe take the form of interactive arrows or genderless person icons. As human geographer Paul Adams argues, digital news cartography has rendered the refugee crisis into a disembodied series of clickable actions, the very opposite of what it could do as journalism: turn unknown “refugees” into people readers can empathize with, more than a number.8 Before mapping yet another social problem or academic study, data journalists need to ask: to what end are we mapping and charting (or charticle-ing, for that matter)?

And somewhere between Snow Fall and migration maps lies the problem: what is data journalism for? The present mainly provides evidence of professionalization and isomorphism, with an edge of corporate incentive: data journalism is not just to aid news consumers in their understanding of the world but also to pad the bottom lines of news organizations. Surely that is not all data journalism can be.

The Future: How Data Journalism Can Reclaim its Worth (and Be Fun, too)

What is data journalism for? Data journalism needs to go back to its roots of change and revolution, of inspired hacking and experimentation, of a self-determined vision of renegades running through a tired and uninspired industry to force journalists to confront their presumed authority over knowledge, narrative, and distribution. Data journalists need to own up to their hacker inspiration and hack the newsroom as they once promised to do; they need to move past a focus on profit and professionalism within their newsrooms. Reclaiming outsider status will bring us closer to the essential offering that data journalism promised: a way to think about journalism differently, a way to present journalism differently, and a way to bring new kinds of thinkers and doers into the newsroom, and beyond that, a way to reinvigorate journalism.

In the future, I imagine data journalism as unshackled from the term “data” and instead focused on the word “journalism.” Data journalists presumably have skills that the rest of the newsroom or other journalists do not: the ability to understand complicated data or guide a computer to do so, the ability to visualize this data in a presumably meaningful way, and the ability to code. Data journalism, however, must become what I have called interactive journalism: it needs to shed its vegetable impulse of map and chart cranking, as well as its scorn for technologies and skills that are not data-intensive, such as 360 video, augmented reality, and animation. In my vision of the future, there will be a lot more of the BBC’s “Secret Life of the Cat” interactives and The New York Times’ dialect quizzes; there will be more projects that combine 360 video or VR with data, like Dataverse’s effort funded by the Journalism 360 immersive news initiative. There will be a lot less election mapping and cartography that illustrates the news of the day, reducing far-away casualties to clickable lines and flows. Hopefully, we will see the end of the new trend toward interactives showing real-time polling results (a new fetish of top news outlets in the US). Rather, there will be a lot more originality, fun, and inspired breaking of what journalism is supposed to look like and what it is supposed to do. Data journalism is for accountability, but it is also for fun and for the imagination; it gains its power not just because an MP might resign or a trendline becomes clearer, but also because ordinary people see the value of returning to news organizations and to journalists, since journalists fill a variety of human information needs: for orientation, for entertainment, for community, and beyond.

And to really claim superior knowledge about data, data journalists intent on rendering data knowable and understandable need to collect this data on their own; data journalism is not just for churning out new visualizations of data gathered by someone else. At best, churning out someone else’s data makes the data providers’ assumptions visible; at worst, data journalism becomes as stenographic as a press release for the data provider. Yet many data journalists have little interest in collecting their own data and find it outside the boundaries of their roles. As Washington Post data editor Steven Rich explained in a tweet, the Post “and others should not have to collect and maintain databases that are no-brainers for the government to collect. This should not be our fucking job”.9 At the same time, however, the gun violence statistics Rich was frustrated at having to maintain are more empowering than he realized: embedded in government data are assumptions and decisions about what to collect that deserve sufficient inquiry and consideration. The data is not inert, but filled with presumptions about what facts matter. Journalists seeking to take control over the domain of facticity need to be able to explain why the facts are what they are; indeed, the systematic production of fact is how journalists have claimed their epistemic authority for most of modern journalism.

What is data journalism for, then? So much more than it is now: it can be for fun, play, and experimentation. It can be for changing how stories get told and for inviting new ways of thinking about them. But it also stands to play a vital role in re-establishing the case for journalism as truth-teller and fact-provider. In creating and knowing data, and in being able to explain the process of observation and data collection that led to a fact, data journalism might well become a key line of defense for the claim that professional journalists can and do gather facts better than any other occupation, institution, or ordinary person ever could.

Christina Niederer, Wolfgang Aigner, and Alexander Rind, ‘Survey on visualizing dynamic, weighted, and directed graphs in the context of data-driven journalism’, Proceedings of the International Summer School on Visual Computing (2015), pp. 49–58.