III. Methods

What methods and techniques still work, and which new methods and techniques show promise — especially at scale? How should we organize future studies? What assumptions and practices should we abandon?

Considering knowledge infrastructures historically reveals what might otherwise appear as necessary features to be, instead, historical creations which could have followed other paths. For example, the nineteenth century witnessed a massive and global shift in modes of classification (to the genetic form, favored by Darwin), and yet this occurred without generalized academic conferences on classification. Instead, each discipline painfully learned the same lessons in isolation. In retrospect, the degree of redundant effort seems staggering. Clearly the same pattern is presently underway, with many disciplines each struggling to find its own path through the maze of related issues, including appropriate cyberinfrastructure, trustworthy and enduring institutions, data management practices, and handoffs with other disciplines. One role for scholars of knowledge infrastructures, then, might be to help decrease the amount of effort this struggle demands, for example by organizing and synthesizing collective conversations about how to shape infrastructures for knowledge work in the 21st century.

Workshop participants emphasized that we cannot remain simple bystanders to the current transformations. Nor should we be mere critics of the emerging inequalities and tensions described in Theme 2 of this report. Instead, our task is to co-design new infrastructures, and refashion old ones, with scientists, knowledge institutions, and policymakers as our partners. Put another way, we can play a key part in today’s grand challenge: to debalkanize scholarship by assembling a methodological repertoire that can match the geographic and temporal scale of emerging knowledge infrastructures. This is an exhilarating possibility. Imagine what might have happened if scholars of the 15th and 16th centuries could have experimented directly with the sociotechnical reconfigurations that accompanied the advent of the printing press — as we can do today.

Our call for methodological and collaborative innovation is best explained via an analogy in the natural sciences. Twenty years ago, the average ecologist worked on a patch of land no larger than a hectare, typically for a few months or a year, gathered data over a thirty-year career, published results, and then gradually lost the data. With the creation of the Long Term Ecological Research Network (LTER), the National Science Foundation began to change the nature of research. Today, at a number of sites nationally and in consonance with international projects, ecologists are able to look beyond the scale of a field and timeframe of a career: they now have the prospect of studying ecology and climate locally, nationally, globally, and over spans of time that more closely match those of ecological change.

How did this happen? In the last twenty years, new sensor grids have come to cover the oceans, land, sky and space. These technologies did not solve the question of scaling by themselves; instead, they posed new problems, as streams of data from extremely heterogeneous sources poured into the hands of scientists (Courain 1991; Edwards 2010; Hey et al. 2009). Standardizing data has proven to be a crucial activity in scaling up the sciences, but it is never easy and rarely, if ever, complete (Bowker 2000; Edwards et al. 2011; Gitelman 2013). While preservation has been recognized as an issue (Blue Ribbon Task Force on Sustainable Digital Preservation and Access 2010), no general response to long-term preservation of datasets exists in any branch of the sciences; instead, we find a conflicted field of partial solutions ranging from supercomputer centers to university libraries (Borgman 2007; Bowker 2005). Preserving the meaning of data is a human affair, requiring continuous curation. For these reasons, managing and preserving ecological data for the long term ultimately required new organizational forms. LTER represents the beginning, not the end, of that transformation.

We advocate a similar revolution in the study of knowledge infrastructures, using the lens of what Stewart Brand has called “the long now” (Bowker et al. 2010; Brand 1999; Ribes & Finholt 2009). The need for thinking in stretches of years to decades is quite apparent. Paul David’s classic study on the “productivity paradox” of computing showed that introducing computers into the workplace did not immediately yield the productivity gains promised. In fact, productivity declined for twenty years before moving upwards. The cause, he argued, was that it took about 20 years to “think” the new technology: to move from using the computer as a bad, very expensive typewriter to realizing the potentials of new ways of working, which could happen only after a substantial period of social, cultural, organizational and institutional adjustments (David 1990; Landauer 1995).

With the advent of the Internet, we are changing our knowledge generation and expression procedures root and branch. Yet currently we remain bound to the book and article format and to the classic nineteenth-century technology of files and folders. It took well over 200 years for printed books to acquire the intellectual armature we now consider intuitive (such as the index, table of contents, bibliography, footnotes, and generally agreed rules on plagiarism). Even page numbers were once an innovation. Infrastructure researchers need a form of analysis that is actually responsive to the scale, scope and rhythms of the changes we are studying. Yet we are caught in the same cycle as the early ecologists mentioned above: our projects for studying social change come in three- to five-year chunks, usually limited to three to five sites. Unlike the quantitative social sciences, which have benefitted enormously from now-vast stores of accumulated demographic, economic, and polling data, the qualitative social sciences have accumulated relatively little data across the years, and particularly little across sites of research or across researchers. We reinvent the wheel with each investigation.

How can the qualitative social sciences accumulate, compare, and share data? Potential solutions exist. We present seven interlocking steps to meet the challenges for the future of sociotechnical studies. Together these make up our vision for an institution supporting long-term and large-scale qualitative research:

Create and nourish mechanisms for large-scale, long-term research. We need to go beyond one-off projects to develop systems and standards for collecting, curating and using similar kinds of data, while simultaneously protecting subjects’ identities and interests. Similarly, we need mechanisms to build and nourish research teams that are larger and far more persistent than those of the short-term, project-by-project work currently (and historically) typical of qualitative research. Organized research efforts at the scale of NSF science & technology centers would be a start, and funder investments on that scale could provide a powerful signal of need and reward, but innovation at all institutional levels and across disciplines will also be critical.

Build interdisciplinary collaborations across natural and social sciences. Sociotechnical phenomena do not rest within the domain of a single discipline or research approach. For example, climate change is simultaneously a matter of individual action and state policy, of technological innovation and economic reorganization. It demands the participation of social science but stretches well beyond it, requiring collaboration with ecological, hydrological, and biological scientists. Integrated assessment modeling, a popular and powerful tool for studying climate change impacts and adaptation, desperately needs better, more constructive contributions from the qualitative disciplines (Beck 2010; Hulme 2009, 2010; Lahsen 2010; van der Sluijs et al. 2008). This insight is far from new, but the fruits of previous integrative efforts have been modest; real innovation in knowledge infrastructures is needed.

Develop comparative analysis techniques for studying large-scale, long-term data. Comparison across cases is among the most revealing qualitative research methods, encouraging the identification of crucial similarities and differences as well as enabling generalization. The key to comparison is sharing data across teams of investigators. This means investing in the creation of comparable data, i.e. data that are properly documented to facilitate sharing.

Create sustainable, shareable data archives. We must explore ways to federate the data collected over multiple investigative projects. Researchers need to publish their data alongside their articles, as researchers in the natural sciences and economics do today. Only in this way can researchers discern trends happening beyond their noses, long as these may be. Significant confidentiality issues exist, and should be addressed through the creation of new kinds of consent forms and anonymization procedures. The Human Relations Area Files and the Pew Research Center’s Internet and American Life Project are two of the few extant examples of shared qualitative, long-term data; these models should be emulated and extended.

Build better software for qualitative work. The infrastructure of knowledge infrastructures research has not kept up with the ambitions of this emerging area. Tools for collecting and organizing qualitative data remain tedious, fragile, and intended for small-scale efforts. As examples, consider NVivo and AtlasTI, the best-developed such tools. Although their current incarnations claim to support teams of researchers, anyone who has worked with them will know that the single-investigator paradigm continues to dominate their function; at best, each can support a handful of investigators working simultaneously. Each claims to support “large-scale” analysis, but quickly becomes unusable when handling more than a few hundred documents. The result is that even after decades of development, project after project continues to confront, and often to fail at, this challenge.

Integrate qualitative work with statistical techniques and social network analysis. The strengths of qualitative research (detailed, in-depth, meaning-oriented investigations) must be combined with those of quantitative and semi-quantitative approaches, such as social network analysis, whose strengths are scope and summation. This kind of integration has proven very powerful in the field of history (through the work of Annales school historians such as Fernand Braudel and Emmanuel Le Roy Ladurie), yet it remains unusual. No software of which we are aware has effectively surmounted this challenge, though the “controversy mapping” tools under development at Bruno Latour’s Médialab (Sciences Po, Paris) show promise.

Imagine new forms of cyberscholarship. In the social sciences, we continue to use the computer as a glorified typewriter. Some remarkable experiments, often in conjunction with new media artists, have demonstrated new possibilities (see, for example, the multi-modal journal Vectors). However, these remain one-off ventures and generally suffer from the marked absence of funding for new forms of expression.

When we begin to actively scale up qualitative social science, we will have to deploy the data storage, visualization, hypertext, and collective-creation possibilities of the web and social media. Further, we must, as a community, develop new tools for textual analysis that match the availability of electronic data. The digital humanities are already making remarkable strides in this area.