Data before the (alternative) facts.

Ambiguity and uncertainty, cultural richness, idiosyncrasy and subjectivity are par for the course in humanities research. One mistake we can make when approaching computational technologies that proffer access to greatly increased quanta of data, however, is to assume that these technologies are objective. They are not. Just like regular old-fashioned analogue research, they are subject to subjectivity, prejudice and error, and can display or reflect the human prejudices of the user, the software technician, or the organisation responsible for designing the information architecture; for designing this supposedly “objective” machine or piece of software for everyday use by regular, everyday humans.

In ‘Raw Data’ Is an Oxymoron, Daniel Rosenberg argues that data is rhetorical.[1] Christine Borgman elaborates on this, stating that “Data are neither truth nor reality. They may be facts, sources of evidence, or principles of an argument that are used to assert truth or reality.”[2] But one person’s truth is, or can be in certain cases, antithetical to another person’s reality. We saw an example of this in Trump aide Kellyanne Conway’s assertion that the Trump White House was offering “alternative facts” regarding the size of the crowd that attended Trump’s inauguration. Conway’s comments referred to the “alternative facts” offered by White House Press Secretary Sean Spicer, who performed an incredible feat of rhetorical manipulation (of data) by arguing that “this was the largest audience to ever witness an inauguration, period.”

While the phrase “alternative facts” may have been coined by Conway, the concept is not necessarily new;[3] at least not when it comes to data and how big data is, or can be, used and manipulated. So what exactly am I getting at? Big data can be manipulated and interpreted to reflect different users’ interpretations of “truth.” This means that data is vulnerable (in a sense). This is particularly applicable to the way we categorise data, and so it concerns the epistemological implications of architecture such as the “search” function, and of the categories we decide best assist the user in translating data into information, knowledge and the all-important wisdom (à la the “data, information, knowledge and wisdom” (DIKW) hierarchy). Johanna Drucker noted this back in 2011, and her observations are still being cited:

It is important to recognize that when we organize data into categories (according to population, gender, nation, etc.), these categories tend to be treated as if they were discrete and fixed, when in fact they are interpretive expressions (Drucker 2011).[4]

This correlates with Rosenberg’s idea of data as rhetorical, and also with Rita Raley’s argument that data is “performative.”[5] Like rhetoric and performance, then, data can be fashioned to reflect the intentions of the user; it can be used to falsify truth, or to insist that perceived truths are in fact false; that news is “fake news.” Data provided the foundations for the arguments voiced by each side over the ratio of Trump inauguration attendees to Obama inauguration attendees, with each side insisting that their interpretation of the data allowed them to present facts.

This is one of the major concerns surrounding big data research, and Rosenthal uses it to draw a nice distinction between notions of falsity as they are presented in the sciences and the humanities respectively. The sciences maintain a high awareness of what are referred to as “false-positive results,”[6] whereas in the humanities, so-called “false positives,” that is, theories that are later disproven or refuted (insofar as it is possible to disprove a theory in the humanities), are not discarded but incorporated into the critical canon, becoming part of the bibliography of critical works on a given topic: “So while scientific truth is falsifiable, truth in the humanities never really cancels out what came before.”[7]

But perhaps a solution of sorts is to be found in the realm of the visual, and in the evolution of the visual (as opposed to the linguistic, which is vulnerable to rhetorical manipulation) as a medium through which we can attempt to study data. Rosenthal argues that visual experiences of data representations trigger different cognitive responses:

When confronted with the visual instead of the linguistic, our thought somehow becomes more innocent, less tempered by experience. If data is, as Rosenberg suggests, a rhetorical function that marks the end of the chain of analysis, then in data-driven literary criticism the visualization allows us to engage with that irreducible element in a way that language would not. As Edward R. Tufte (2001, 14), a statistician and author on data visualization, puts it, in almost Heideggerian language, ‘Graphics reveal data.’[8]

So “Graphics reveal data,” and it was to graphics that people turned in response to Conway and Spicer’s assertions regarding the “alternative facts” surrounding Trump’s inauguration crowd. Arguably, these graphics (below) express the counterargument to Conway and Spicer’s assertions in a more innately understandable way than any linguistic rebuttal, irrespective of how eloquent that rebuttal may be.

But things don’t end there (and it’s also important to remember that visualisations can themselves be manipulated). Faced with graphic “alternative-alternative facts” in the aftermath of both his own assertions and Conway’s defence of them, Spicer countered the visual sparsity of the graphic image by retroactively clarifying that his “largest audience” assertion incorporated material external to the datasets used by the news media to compile their figures, taking in viewers on YouTube and Facebook, and those watching over the internet the world over. The “largest audience to ever witness an inauguration, period” is, apparently, “unquestionable,” and it was obtained by manipulating data.