I. Change

Theme 1: How are knowledge infrastructures
changing?

What does it mean to “know” in an age of
social networks, big data, interdisciplinary research, and new modes of access
to “bigger,” “wider,” “longer,” and “faster” information? How is knowledge now
being generated, maintained, revised, and spread? How are open data, web publication, and
commodity tools affecting concepts of expertise, processes of peer review, and
the quality of knowledge?

Building on extensive literatures in science &
technology studies, including previous work by members of this group (Edwards et al. 2007), Edwards (2010)
defined knowledge infrastructures as “robust networks of people, artifacts, and
institutions that generate, share, and maintain specific knowledge about the
human and natural worlds.” This framing aimed to capture routine,
well-functioning knowledge systems such as the world weather forecast infrastructure,
the Centers for Disease Control, or the Intergovernmental Panel on Climate
Change. Under this definition, knowledge infrastructures include individuals,
organizations, routines, shared norms, and practices.

Key to the infrastructure perspective is infrastructures’
modular, multi-layered, rough-cut character. Infrastructures are not systems, in the
sense of fully coherent, deliberately engineered, end-to-end processes. Rather,
infrastructures are ecologies or complex adaptive systems; they consist of
numerous systems, each with unique origins and goals, which are made to
interoperate by means of standards, socket layers, social practices, norms, and
individual behaviors that smooth out the connections among them. This adaptive
process is continuous, as individual elements change and new ones are
introduced — and it is not necessarily always successful. The current situation
for knowledge infrastructures is characterized by rapid change in existing
systems and introduction of new ones, resulting in severe strains on those
elements with the greatest inertia.

The workshop concluded that at least the following phenomena
require sustained attention:

Knowledge in perpetual motion.
A transition is underway from what Weinberger (2012)
calls “knowledge as a series of stopping points” — printed journal articles,
books, textbooks, and other fixed products — to a world where knowledge is perpetually in motion. Today,
what we call “knowledge” is constantly being questioned, challenged, rethought,
and rewritten. As Weinberger describes the current situation, we face a
world of abundant information, hyperlinked ideas, permission-free resources, highly
public interaction, and massive, unresolved disagreement. Individual expertise
is (many argue) being replaced by the wisdom of crowds: noisy and endlessly
contentious, but also rich, diverse, and multi-skilled. In part, this means that
the divide between knowledge producers and knowledge consumers is increasingly
and radically blurred. In such a world, the missions of educational
institutions such as schools and colleges, research institutions such as
laboratories and universities, and memory institutions such as libraries,
archives, and museums are bleeding into each other more than ever before. New
forms of collective discovery and knowledge production, such as crowdsourced
encyclopedias, wikis of all sorts, shared scientific workflows, and citizen
science are springing up within and across many academic disciplines (De Roure et al. 2011; De Roure et al. 2010; Goble & De Roure 2007;
Shilton 2009; Shirky 2009, 2010; Takeda et al. 2013; Wade & Dirks 2009).
The quality and durability of knowledge produced by such efforts remain
uncertain, but their tremendous vigor and growing utility cannot be questioned.

Shifting
borders of tacit knowledge and common ground. Consider that the
study of knowledge by the social sciences and the humanities has been based on
the same premises now being challenged by these emerging forms. For example, several
decades of scholarship in sociology and anthropology of knowledge established
the difficulty of communicating local practices and understandings without
face-to-face contact (H. M. Collins 1985; H. M. Collins & Pinch 1993). The phrase “distance
matters” — because technology-mediated communication makes it more difficult to
establish common ground — became a watchword in computer-supported cooperative
work. Tacit knowledge and common ground were, and still are, regarded as major
stumbling blocks to long-distance collaboration (Olson & Olson 2000; Olson et al. 2009).
Yet an increasing amount of important knowledge work occurs under precisely
these conditions; both technology and human skills are evolving to meet the
challenge (Rosner 2012; Rosner et al. 2008; Vertesi 2012; Wiberg et al. 2012).
In a world of Skype, Google Hangouts,
Twitter, YouTube videos, and highly developed visualization techniques, the
roles of tacit knowledge and common ground are changing, and a renewal of our
understanding is required (Cummings et al. 2008).

Complexities
of sharing data across disciplines and domains. Excitement continues
to mount over new possibilities for sharing and “mining” data across scientific
disciplines. Vast data repositories are already available to anyone who cares
to use them, and many more are on the way. Yet data sharing raises urgent
questions (Borgman 2012).
In science, at least, the meaning of data is tightly dependent on a precise
understanding of how, where, and when they were created (Bechhofer et al. 2010; Burton & Jackson 2012; Gitelman 2013; Ribes
& Jackson 2013; Vertesi & Dourish 2011).
But the rapid “commodification” of data
— the presentation of datasets as complete, interchangeable products in readily
exchanged formats — may encourage misinterpretation, overreliance on weak or
suspect data sources, and “data arbitrage” based more on availability than on
quality. Will commodified data lead
to dangerous misunderstandings, in which scientists from one discipline
misinterpret or misuse data produced by another? How far can the
standardization of data and metadata be carried, and at what scale? What new
kinds of knowledge workers are needed to bridge the gaps, both technical and
social, among the many disciplines called on to address major scientific and
social issues such as climate change, biodiversity triage, or health care for
an aging population? Can the reputation systems of science be re-tuned to
recognize and compensate these vital, but too often invisible and unrewarded,
workers?

New norms for what counts as knowledge. Scientific data analysis
increasingly uncovers significant and useful patterns we cannot explain, while
simulation models too complex for any individual to grasp make robust
predictions (e.g., of weather and climate change). Will these phenomena add up, as some predict, to an “end of theory” (Anderson 2008)? The question of how to evaluate simulation
models — of whether they can be “validated” or “verified,” and whether they
require a fundamentally different epistemology than theory and experiment — had
already been puzzling both scientists and philosophers for several decades (Giere 1999; Heymann 2010; Jackson 2006; Morgan & Morrison 1999;
Norton & Suppe 2001; Oreskes et al. 1994; Petersen 2007; Sismondo 1999;
Sundberg 2009, 2010a, 2010b; Suppe 2000).
Data-driven science poses a similar but even harder problem of evaluation. Do we “know” things if we cannot explain
why they are true? Whatever the case, norms for what can count as
“knowledge” are clearly changing (Anderson 2008; Hey et al. 2009).

Massive shifts in publishing
practices, linked to new modes of knowledge assessment. Historically,
knowledge institutions depended on costly, hierarchically organized forms of
credentialing, certification, and publishing. These set severe limits not only
on outputs (in the form of published articles, books, etc.), but also on who
could count as a valid participant in knowledge assessment practices such as
peer review. Today, these mechanisms are challenged on all fronts. Much less
costly modes of publication permit the early release and broad dissemination of
virtually all data and models used in science; one result is a broad-based
movement toward publication practices that permit results to be readily
reproduced, at least in the computational sciences (Stodden 2010a, 2010b, 2011).
Commodified data analysis tools and
widely available software skills permit a much larger number of participants to
analyze data and run models. Networked social forms permit many more
participants to comment publicly on knowledge products, bypassing traditional
credentialing and certification mechanisms (De Roure et al. 2011; De Roure et al. 2010; Kolata 2013).

Challenges to traditional
educational institutions. Both research universities and teaching
colleges face extraordinary challenges. For decades, costs to students have
risen faster than inflation, while Coursera, open courseware, and online
universities offer new, lower-cost alternatives. The majority of university
students no longer attend four-year residential programs. Many of those who do appear
more motivated by the university as a rite of passage and a lifestyle than by
learning itself, as reflected in numerous measures of student learning and the
amount of time spent studying (Babcock & Marks 2010, 2011; Mokhtari et al. 2009).
Classroom teaching competes directly with online offerings; professors are no
longer seen as infallible experts, but as resources whose facts can be checked
in real time. As institutions, research universities display patent-seeking
behavior that makes them increasingly difficult to distinguish from
corporations, and indeed corporate sponsorship and values have penetrated
deeply into most universities. Some have been more effective than others at
building firewalls between sponsors’ interests and researchers to protect researchers’
objectivity, but no institution is immune to these challenges (Borgman et al. 2008). K-12 education faces related, but different
challenges, as schools struggle to adapt teacher training, equipment, and
teaching methods to the screen-driven world most children now inhabit. Major benefits will accrue to institutions
and students that find effective ways to meet these challenges — and doing so
will require new visions of their place in larger infrastructures of knowledge,
from national science foundations to corporate laboratories to the education of new
generations of researchers.

Navigating across scales of space
and time, and rates of change. Given the layered nature of
infrastructure, navigating among different scales — whether of time and space,
of human collectivities, or of data — represents a critical challenge for the
design, use, and maintenance of robust knowledge infrastructures. A single
knowledge infrastructure must often track and support fluid and potentially
competing or contradictory notions of knowledge. Often invisible, these notions
are embodied in the practices, policies, and values embraced by individuals,
technical systems, and institutions. For example, sustainable knowledge
infrastructures must somehow provide for the long-term preservation and
conservation of data, of knowledge, and of practices (Borgman 2007; Bowker 2000, 2005; Ribes & Finholt 2009). In the current
transformation, sustaining knowledge requires not only resource streams, but
also conceptual innovation and practical implementation. Both historical and contemporary studies are needed to investigate how
knowledge infrastructures form and change, how they break or obsolesce, and
what factors help them flourish and endure.

Standards
and ontologies. A
quintessential tension surrounds the deployment of standards and ontologies in
knowledge infrastructures. Fundamentally, it consists in the opposition between the desire for universality and the need for
change.

Robust hypotheses require information in standardized
formats. Thus the spread of a particular disease around the world cannot be
tracked unless everyone is calling it the same thing. At the same time, medical
researchers frequently designate new diseases, thus unsettling the existing
order. For example, epidemiologists have sought to trace the phenomenon of AIDS
back to periods predating its formal naming in the 1980s (Grmek 1990; Harden 2012). However, using historical
medical records to do so has proven difficult because prior record-keeping
standards required the specification of a single cause of death, precluding
recognition of the more complex constellation of conditions that characterize
diseases such as AIDS.
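The record-keeping constraint can be made concrete with a small schema contrast. This is a hypothetical illustration only — the record types and field names below are invented for the sketch, not drawn from any actual historical death-certificate format:

```python
from dataclasses import dataclass

@dataclass
class HistoricalRecord:
    cause_of_death: str  # the old standard permits exactly one cause

@dataclass
class ModernRecord:
    conditions: list[str]  # a constellation of co-occurring conditions

def to_historical(record: ModernRecord) -> HistoricalRecord:
    """Flattening a modern record discards all but one condition."""
    return HistoricalRecord(cause_of_death=record.conditions[0])

modern = ModernRecord(conditions=["Kaposi's sarcoma",
                                  "Pneumocystis pneumonia",
                                  "immunodeficiency"])
flattened = to_historical(modern)
# The single-cause record retains no trace of the co-occurrence pattern
# that would let a later researcher recognize the underlying syndrome.
print(flattened.cause_of_death)
```

The point of the sketch is that the loss is structural, not accidental: once the single-cause schema is the installed standard, no amount of careful data entry can preserve the pattern that retrospective identification of a syndrome like AIDS requires.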

How might one solve this problem (if it is solvable at all)?
One could review the old records and try to convert them into modern forms.
This could work to an extent; some fields, such as climate science, routinely
investigate historical data before adjusting and re-standardizing them in
modern forms to deepen knowledge of past climates (Edwards 2010).
Yet this is possible largely because the number and variety of records are
relatively limited. In many other fields such a procedure would be extremely
difficult and prohibitively expensive. Alternatively, one could introduce a new
classificatory principle, such as the Read Clinical Classification, which would
not permit that kind of error to propagate. Here too, due to the massive
inertia of the installed base, it would cost billions of dollars to make the
changeover. On top of that, it would complicate backward compatibility: every
new archival form challenges the old (Derrida 1996).
In practice, this adds up to very slow
updating of classification standards and ontologies, marked by occasional
tectonic shifts.

Today, hopes for massively distributed knowledge
infrastructures operating across multiple disciplines consistently run headlong
into this problem. Such infrastructures are vital to solving key issues of our
day: effective action on biodiversity loss or climate change depends on sharing
databases among disciplines with different, often incompatible ontologies. If
the world actually corresponded to the hopeful vision of data-sharing
proponents, one could simply treat each discipline’s outputs as an “object” in an
object-oriented database (to use a computing analogy). Discipline X could
simply plug discipline Y’s outputs into its own inputs. One could thus
capitalize on the virtues of object-orientation: it would not matter what
changed within the discipline, since the outputs would always be the same.
Unfortunately, this is unlikely — perhaps even impossible — for both
theoretical and practical reasons (Borgman et al. 2012).
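The computing analogy in the preceding paragraph can be sketched in code. The interface and discipline names below are hypothetical, assumed purely for illustration: so long as discipline Y honors a stable interface, discipline X is insulated from Y's internal revisions — which is exactly the property that a shift in Y's underlying ontology destroys:

```python
from typing import Protocol

class SpeciesCounts(Protocol):
    """The stable 'output' interface discipline Y is assumed to expose."""
    def count(self, region: str) -> int: ...

class TaxonomyV1:
    """Discipline Y's implementation; its internals may change freely."""
    def count(self, region: str) -> int:
        return {"amazon": 40000, "arctic": 1700}.get(region, 0)

def diversity_report(source: SpeciesCounts, region: str) -> str:
    """Discipline X consumes only the interface, never the internals."""
    return f"{region}: {source.count(region)} species"

print(diversity_report(TaxonomyV1(), "amazon"))
# The analogy fails when the *meaning* of "species" itself changes: a new
# ontology may split or merge taxa, so the same call returns numbers no
# longer commensurable with the old ones — the interface signature
# survives, but the shared semantics behind it do not.
```

The object-oriented promise holds only as long as the categories named in the interface keep their meanings fixed, and it is precisely those meanings that evolve as disciplines evolve.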

An “object-oriented” solution to these incompatibilities is
theoretically improbable because the fundamental ontologies of disciplines
often change as those disciplines evolve. This is among the oldest results in the
history of science: Kuhn’s term “incommensurability” marks the fact that “mass”
in Newtonian physics means something fundamentally different from “mass” in
Einsteinian physics (Kuhn 1962).
If Kuhnian incommensurability complicates individual disciplines, it has even
larger impacts across disciplines. A crisis shook virology, for example, in the
1960s when it was discovered that “plant virus” and “animal virus” were not
mutually exclusive categories. Evolutionary biology suffered a similar, and
related, crisis when it was learned that some genes could jump between species
within a given genus, and even between species of different genera (Bowker 2005).
Suddenly, disciplines that previously had no need to communicate with each
other found that they had to do so, which then required them to adjust both
their classification standards and their underlying ontologies.

In practice, an object-oriented solution to ontological
incompatibilities is unlikely because we
have not yet developed a cadre of metadata workers who could effectively
address the issues, and we have not yet fully faced the implications of the
basic infrastructural problem of maintenance. We do know that it takes
enormous work to shift a database from one medium to another, let alone to
adjust its outputs and algorithms so that it can remain useful both to its home
discipline and to neighboring ones. Thus three results of today’s scramble to
post every available scrap of data online are, first, a plethora of “dirty”
data, whose quality may be impossible for other investigators to evaluate;
second, weak or nonexistent guarantees of long-term persistence for many data
sources; and finally, inconsistent metadata practices that may render reuse of
data impossible — despite intentions to the contrary.

We expect our knowledge infrastructures to permit effective
action in the world; this is the whole impulse behind Pasteur’s Quadrant or Mode
II science (Gibbons et al. 1994; Jackson et al. 2013; Stokes 1997).
And yet, in general, scientific knowledge infrastructures have not been crafted
in such a way as to make this easy. What policymakers need and what scientists
find interesting are often too different — or, to put it another way, a yawning
gap of ontology and standards separates the two. Consider biodiversity
knowledge. In a complex series of overlapping and contradictory efforts,
taxonomists have been trying to produce accounts of how species are distributed
over the Earth. However, the species database of the Global Biodiversity
Information Facility, which attempts to federate the various efforts and is
explicitly intended for policy use, does not produce policy-relevant outputs (Slota & Bowker forthcoming). The maps of distribution are
not tied to topography (necessary to consider alternative proposals such as
protecting hotspots or creating corridors), they give single observations
(where what is needed is multiple observations over time, so one can see
trends), and for political reasons, they do not cover many parts of the planet
(which one needs in order to make effective global decisions). Similarly, in
the case of climate change, for decades the focus on “global climate” — an
abstraction relevant for science, but not for everyday life — has shaped
political discourse in ways that conflicted with the local, regional, and
national knowledge and concerns that matter most for virtually all social and
political units. Climate knowledge infrastructures have been built to produce
global knowledge, whereas the climate knowledge most needed for policymaking is
regional, culturally specific, and focused on adaptation (Hulme 2009).