Citizen Technologist

Data Science and Expanding Our Sources of Ethical Inspiration

Recent public controversies regarding the collection, analysis, and publication of data sets about sensitive topics—from identity and sexuality to suicide and emotion—have helped push conversations around data ethics to the fore. In popular articles and emerging scholarly work (some of it supported by our backers at CTSP), scholars, practitioners and policymakers have begun to flesh out the longstanding conceptual and practical tensions expressed not only in the notion of “data ethics,” but in related categories such as “data science,” “big data,” and even plain old “data” itself.

Against this uncertain and controversial backdrop, what kind of ethical commitments might bind those who work with data, such as researchers, analysts, and (of course) data scientists? One impulse might be to claim that the unprecedented size, scope, and attendant possibilities of so-called “big data” sets require a wholly new kind of ethics, one built with digital data’s particular affordances in mind from the start. Another impulse might be to suggest that even though “big data” seems new or even revolutionary, its ethical problems are not; after all, we’ve been dealing with issues like digital privacy for quite some time.

This tension between new technological capabilities and established ethical ideas recalls the “uniqueness debate” in computer ethics begun in the late 1970s. Following this precedent, we think that contemporary data ethics requires a bi-directional approach: one that is critical of both “data” (and the tools, technologies, and humans that support its production, analysis, and dissemination) and “ethics” (the frameworks and ideas available for moral analysis). With this in mind, our project has been investigating the relevance and viability of historical codes of professional and research ethics in computing, information science, and data analytics contexts for thinking about data ethics today.

The question of data ethics itself isn’t just an abstract or academic problem. Codes of professional ethics can serve a range of valuable functions, such as educating professionals in a given field, instilling positive group norms, and setting a benchmark against which negative or unethical behavior may be censured. Moreover, the most effective or influential codes have not arisen in theoretical or conceptual vacuums; instead, as Jacob Metcalf writes, they represent “hard-won responses to major disruptions, especially medical and behavioral research scandals. Such disruptions re-open questions of responsibility, trust and institutional legitimacy, and thus call for codification of new social and political arrangements.”

The initial ambit of our research was to determine where these codes fall short in light of contemporary technical and social developments, and to make concrete recommendations for improving them for our brave new, ubiquitously digital world. The immediate challenge, however, lies in identifying those codes that might be most relevant to the emerging, often contested, and frequently indistinct field of “data science.”

Metcalf notes three areas of applied ethics that ought to guide our thinking about data ethics. The first and perhaps most obvious of these areas is computing ethics, especially as represented by major ethics codes developed by groups like the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). The second area, bioethics, helps show how ethical concerns for data scientists must also include attention to the ethics of conducting and disseminating research on or about human subjects. Finally, Metcalf turns his attention to journalism ethics and its strong commitment to professional identity and moral character. To this list, we’d also add ethical codes developed by statisticians as well as professional survey and market research associations.

However, if professional codes of ethics are often “hard-won responses to major disruptions,” we also argue that it is important to pay attention to the nature of the “disruption” in question. Doing so may point towards previously underappreciated or overlooked ethical domains—in this case, domains that might help us better come to terms with the rise of “big data” and an epistemological shift in how we produce, understand, and act on knowledge about the world.

But how do we identify additional domains of ethical consideration that might be relevant to data ethics, especially if they’re not immediately obvious or intuitive?

Based on our initial thinking and research, we suggest that one answer to this question lies in the metaphors we use to talk about data science and “big data” today. As Sara Watson writes in her essay “Data is the New _____,” metaphors have always been integral to “introducing new technologies in our everyday lives and finding ways to familiarize ourselves with the novelty.” Indeed, there has been a great deal of discussion around the language we use to talk about data today (Microsoft Research’s Social Media Collective has put together a useful reading list on the topic).

From data “floods” to data “mining,” from data as “the new oil” to data as “nuclear waste,” our discussions of data today invoke a host of concerns that aren’t adequately captured in the usual set of ethical codes and professional domains.

For example, if much of data work involves “cleaning,” to the point that some professionals might even describe themselves, however snarkily, as data “janitors,” what might the ethical and other professional commitments around preservation, stewardship, or custodianship tell us about the role of data scientists today? Or, if data really is “toxic,” how might the ethics of those professionals who regularly work with nuclear or hazardous materials help inform our understanding of the ethical handling of data?

Discourses of cleanliness, toxicity, and other environmental metaphors around big data and data ethics are among the chief emerging themes of our ongoing analysis of ethics codes. Categorizing and unpacking these conceptual frameworks will, we believe, help both scholars and technologists fundamentally rethink their relationship and responsibilities to ethical practice in computation, without needing to reinvent wheels or throw babies out with the bathwater.

We’ll be presenting our full findings publicly in the fall – so stay tuned!