Don’t Just Build It, They Probably Won’t Come

Here, Ruth Mostern gives some background to her article, “Don’t Just Build It, They Probably Won’t Come: Data Sharing and the Social Life of Data in the Historical Quantitative Social Sciences”. Her article appears in the October 2016 issue of IJHAC: A Journal of the Digital Humanities, a multi-disciplinary, peer-reviewed forum for research on all aspects of arts and humanities computing, published twice a year.

On two occasions in my career, I have become involved with projects to build large-scale data repositories for history and the humanities. These alluring projects seek to gather hundreds or thousands of specialists, each with a small-scale dataset about some aspect of the historical human experience, to upload content that collectively grows into a coherent global resource at the scale of big data.

My article in IJHAC: A Journal of the Digital Humanities is inspired by the two projects I have worked on, and it is based on research that I conducted through one of them, as a PI with the Collaborative for Historical Information and Analysis (CHIA), founded by the great world historian Patrick Manning at the University of Pittsburgh in 2008. The article, co-authored with my graduate student researcher Marieka Arksey, reviews the literature about content contribution to repository projects, summarizes surveys of the CHIA community, and offers a set of recommendations about how to foster participation in such projects in the future. The article regretfully concludes that CHIA, like many such projects before it, did not succeed in creating a viable collection of datasets that are useful to world historians and likely to improve their research outcomes.

CHIA was my second repository project, and part of my disappointment in its failure to reach critical mass is that the same thing happened to the first such venture I was involved with. In the late 1990s, I joined the staff of the Electronic Cultural Atlas Initiative (ECAI), founded by Lewis Lancaster, an eminent professor of Buddhist Studies. Lew’s research tracked how the corpus of sacred Buddhist texts changed as the religion moved out of its South Asian heartland and spread through Asia and beyond.

A long-time digital innovator, he dreamed of contextualizing the historical geography of Buddhism with spatial data about the polities, cultures and climates through which the religion spread. He gathered dozens of experts on Asian historical and cultural geography. He enlisted pioneers in interactive mapping, digital librarianship, web design, and database management. The idea was to create an ever-expanding interactive digital atlas that would ultimately encompass all the world’s cultural experiences.

Each dataset – each sighting of a version of a Buddhist canon at a particular place and time, each route of pilgrimage – would necessarily be a tiny one, compiled by an expert and traditionally accessible only to other specialists in a small field. However, by bringing together hundreds and thousands of such datasets in one repository, and by creating map and timeline search and display tools, we would collate these abstruse materials into building blocks for an unparalleled, even revolutionary way to tell the human story, one that was complex and simple at the same time. I was a graduate student at the time, and ECAI changed my life.

What went wrong? Both CHIA and ECAI had a beautiful vision, both were led by charismatic and famous founders, both received grant funding, both brought together scholars with content and developers with infrastructure. However, both, like many similar projects, failed to gain traction.

My IJHAC article aims to figure out why. In short, it concludes that many repository developers fail to account for data’s embeddedness in social life. To rectify this problem, repository developers should hire staff to solicit and curate data, collaborate with cohesive scholarly communities, identify forms of participation that do not require sharing of complete datasets, and incorporate peer review into the repository lifecycle.

Regrettably, in this domain, a great idea and a strong will to succeed do not seem to be enough.