The Digital Future is Now: A Call to Action for the Humanities

Abstract

The digital humanities are at a critical moment in the transition from a specialty area to a full-fledged community with a common set of methods, sources of evidence, and infrastructure — all of which are necessary for achieving academic recognition. As budgets are slashed and marginal programs are eliminated in the current economic crisis, only the most articulate and productive will survive. Digital collections are proliferating, but most remain difficult to use, and digital scholarship remains a backwater in most humanities departments with respect to hiring, promotion, and teaching practices. Only the scholars themselves are in a position to move the field forward. Experiences of the sciences in their initiatives for cyberinfrastructure and eScience offer valuable lessons. Information- and data-intensive, distributed, collaborative, and multi-disciplinary research is now the norm in the sciences, while remaining experimental in the humanities. Discussed here are six factors for comparison, selected for their implications for the future of digital scholarship in the humanities: publication practices, data, research methods, collaboration, incentives, and learning. Drawing upon lessons gleaned from these comparisons, humanities scholars are "called to action" with five questions to address as a community: What are data? What are the infrastructure requirements? Where are the social studies of digital humanities? What is the humanities laboratory of the 21st century? What is the value proposition for digital humanities in an era of declining budgets?

Introduction

This is a pivotal moment for the digital humanities. The community has laid a foundation of research methods, theory, practice, and scholarly conferences and journals. Can we seize this moment to make digital scholarship a leading force in humanities research? Or will the community fall behind, not-quite-there, among the many victims of the massive restructuring of higher education in the current economic crisis? Much is at stake in the community’s ability to argue for the value of digital humanities scholarship and to assemble the necessary resources for the field to move from "emergent" to "established."

The sciences, arts, and humanities have converged and diverged in various ways over the
centuries. In the area of digital scholarship, many interests are in common across the
disciplines. It is the pace of adoption that is divergent. The sciences, and to a lesser
extent the social sciences, have been successful in developing the technical, social, and
political infrastructure for digital scholarship under the rubrics of
cyberinfrastructure (the term used in the U.S.) and eScience (the term more widely used in the U.K. and elsewhere) [U.K. Research Council e-Science Programme 2009]; [Atkins et al. 2003]. Digital scholarship remains emergent in the humanities, while eScience has become the norm in the sciences. The humanities need not emulate the sciences, but can learn useful lessons by studying the successes (and limitations) of cyberinfrastructure and eScience initiatives.

Leaving definitions of "the humanities" to the reader, I offer two complementary
definitions of "digital humanities" as a useful scope statement. Frischer’s
definition is "the application of information technology as an
aid to fulfill the humanities’ basic tasks of preserving, reconstructing, transmitting,
and interpreting the human record"
[Frischer 2009, 15].
One resulting from the UCLA Mellon seminar claims that "Digital
humanities is not a unified field but an array of convergent practices that explore a
universe in which print is no longer the exclusive or the normative medium in which
knowledge is produced and/or disseminated"
[Digital Humanities Manifesto 2008]. Taken together, these definitions present the digital humanities as a new set of practices, using new sets of technologies, to address research problems of the discipline.

Problem Statement

Interest in the digital humanities has grown steadily for several decades. The Digital
Humanities Conferences have occurred annually since 1989, sponsored by the Alliance of
Digital Humanities Organizations. Constituent organizations of the Alliance have held
conferences since 1973 [Alliance of Digital Humanities Organizations 2009]. MITH (Maryland
Institute for Technology in the Humanities 2009)
celebrated its tenth anniversary, and IATH (Institute for Advanced
Technology in the Humanities 2009) at the University of Virginia its 17th anniversary. Academic research in
the digital humanities at UCLA, Duke, Stanford, King’s College London, and elsewhere also
appears to be thriving. Funding continues apace, with the Mellon Foundation, Council on
Library and Information Resources, National Endowment for the Humanities, U.K. Arts and
Humanities Research Council, and others focusing on infrastructure, tools, and services to
support humanities scholarship in digital environments. Yet digital scholarship remains a
backwater in much of the humanities. Concerns about publishing, tenure, and promotion for
digital humanities scholars are a continuing theme in the conferences and in the
literature of the field [Friedlander 2008]; [Friedlander 2009]; [Unsworth et al. 2006].

Despite many investments and years of development, basic infrastructure for the digital humanities is still lacking. Those who wish to gather and analyze digital data for humanities problems often find the overhead daunting, as exemplified by this emailed complaint from a history student in my scholarly communication course, who is pursuing a doctoral dissertation about the German enlightenment:

I’m finding that something as simple as constructing my maps of
related concepts are not easily applied to primary sources in digital libraries.
So what use are the digital libraries, if all they do is put digitally unusable information on the web? The digital libraries don’t offer a platform for traditional note taking, much less for larger scale analysis, either quantitative or qualitative.
(emphasis added; quoted with permission)

"Digital libraries," the term used by my student, usually
implies the existence of tools, services, and a library imprimatur of cataloging and
curation. Her complaint is more about digital collections, which often lack basic
capabilities for retrieval or analysis. This distinction is particularly relevant to the
digital humanities. Content in digital collections may be "relatively
raw," as [Lynch 2002] puts it; others can add layers of
interpretation, presentation, tools, and services, but these layers may be maintained
separately from the content [Borgman 1999]; [Borgman 2000];
[Lynch 2002]. The invisibility of essential infrastructure for digital scholarship in the humanities is but one of many challenges to be addressed in growing the field. Until analytical tools and services are more sophisticated, robust, transparent, and easy to use for the motivated humanities researcher, it will be difficult to attract a broad base of interest within the humanities community.

Whose problem is it to improve the situation — that is, to design, develop, and deploy the
scholarly infrastructure for digital humanities? As my UCLA colleague, Johanna Drucker,
put it so well, "Leaving it to 'them' is unfair, wrongheaded,
and irresponsible. Them is us"
[Drucker 2009, B8]. She believes
that the digital humanities are at a "critical juncture," and is concerned that her fellow scholars are deferring responsibility for action to librarians, computer scientists, technology developers, publishers, and others.

The operative terms in "digital humanities scholarship" are the latter two. Scholarly
methods are as deeply seated in the humanities as they are in the sciences [Borgman 2007]. Only those who do the work and who require the infrastructure are in a position to take the field forward. Librarians and technology developers are essential partners, but those who conduct the research must take the lead.

This article, based on a keynote presentation to the most recent Digital Humanities
Conference, reviews and reflects upon the differences between the approaches of the
sciences and the humanities to digital scholarship [Borgman 2009]. First, I frame the notion of scholarly information infrastructure, then compare the approaches to digital scholarship of the sciences and the humanities. My analysis concludes with a call to action for the humanities community.

Scholarly Information Infrastructure

The term scholarly information infrastructure encompasses the technology, services,
practices, and policy that support research in all disciplines. Cyberinfrastructure and
eScience — both coined initially in reference to the sciences and technology, and both now
used more broadly — refer to an infrastructure that enables forms of scholarship that are
information- and data-intensive, distributed, collaborative, and multi-disciplinary.
eResearch has become the collective term for variants such as eScience, eSocial Science,
and eHumanities [Borgman 2007]. The report of the Commission on Cyberinfrastructure
for the Humanities and Social Sciences
[Unsworth et al. 2006] was modeled on the
strategy for science and technology [Atkins et al. 2003], while diverging to
emphasize the humanities’ motivations to make cultural heritage more widely available for
teaching, research, and outreach. Todd Presner and Chris Johanson make a similar
argument: digital humanities offers the opportunity to reconceptualize society as our cultural heritage migrates to digital formats, thus altering our relationship to knowledge and culture.

The technical and policy infrastructure for scholarship is being built rapidly,
particularly for the sciences [National Science Foundation 2007]; [Hey et al. 2009]. Rare are the encompassing visions for scholarly
infrastructure that originate in the humanities. Amy Friedlander provides a
notable exception [Friedlander 2009]. She identified four research areas in digital scholarship where the interests of humanists, technology researchers, and others converge. These are scale, language and communication, space and time, and social networking. Issues of scale are of general interest because methods and problems must be approached much differently when one has, for example, the full text of a million books rather than a handful. Inspection is no longer feasible; only computational methods can examine corpora on that scale. Issues of language and communication, which are central to the humanities, are of broader interest for problems such as pattern detection and cross-language indexing and retrieval. Space and time encompass the new research methods possible with geographic information systems, geo-tagged documents and images, and the increased ability to make temporal comparisons. Social network analysis, long popular in sociology and bibliometrics, has become generalized to include patterns of social relationships in older texts or in online communication. Cross-cutting agendas such as these can be very influential in the design of an encompassing infrastructure. Humanities researchers need to be at the table as fundamental infrastructure decisions are being made.

Science [and/or/versus] the Humanities

The humanities and the sciences each encompass broad swaths of scholarship, with much
internal diversity. These two communities have significant commonalities, while differing
in important ways. Identified here are six factors for comparison, selected for their
implications for the future of digital scholarship in the humanities: publication
practices, data, research methods, collaboration, incentives, and learning. The first five
of these are drawn from longer analyses published elsewhere [Borgman 2007];
the last is drawn from the NSF Task Force on Cyberlearning
[Borgman et al. 2008]. The sequence of topics is cumulative to reflect how the boundaries are blurring between the sciences and the humanities.

Publication Practices

Scholarly journal publication is shifting rapidly toward electronic formats, especially
in the sciences. Some journals are dropping print publication altogether; others are
declaring the online version (usually released several weeks to several months prior to
the printed edition) to be the edition of record. Under pressure from authors, the
majority of scholarly journals now appear to allow online posting of some form of
pre-print or post-print [SHERPA/RoMEO 2009].

For physics and related areas of computer science and mathematics, arXiv is the locus
of scholarly communication. Monthly deposits of new papers now number more than 5,000;
the site, which contains over 500,000 papers, typically receives 50,000 visits per hour
[ArXiv.org 2009]. At least three iPhone applications are
available for arXiv searching and retrieval. ArXiv, similar repositories in fields such
as economics, and institutional repositories such as ePrints, employ standard data
structures that make their contents readily discoverable by search engines [Open Archives 2009]; [Research Papers in Economics 2009]; [EPrints 2010]. It is little wonder that our science colleagues claim they never go to their campus libraries any more; their libraries come to them.

In the humanities, neither journal nor book publishing has moved rapidly toward online
publication, despite pioneering efforts such as the 1990 launch of the Journal of Post Modern Culture as an electronic-only journal and
the 2005 launch of Vectors as an online-only multi-media
journal [Journal of Post Modern Culture 2000]; [Vectors 2009]; [Hamma 2009]; [King et al. 2006]; [Whalen 2009].
A few of the established humanities journals have begun online versions that take advantage of digital technologies. Beginning in March 2010, for example, the JSAH (Journal of the Society of Architectural Historians) will publish an online version that will support "zoomable images, video, GIS map integration, Adobe Flash VR, 3-D models, and online reference linking" — while continuing to publish its static print version.

The reasons for the slow adoption of digital publishing in the humanities are many,
from not trusting online dissemination to a general reluctance to experiment with new
technologies, even those well proven — "professionally indisposed to
change" as Ken Hamma puts it [Hamma 2009]. Monographic publishing, which is
core to humanities scholarship, has begun a seismic shift toward digital publishing
[Jaschik 2008]; [Jaschik 2009]; [Poe 2001];
[Willinsky 2006]; [Willinsky 2009]. A growing number of
university presses are offering online access to monographs they publish in print,
whether or not they also offer digital-only or print-on-demand formats. Other university
presses are reinventing themselves in digital form [Rice University 2008]. The
University of California Press recently announced a partnership with the California
Digital Library, which hosts the university’s institutional repository, to offer
"a suite of publishing services robust and flexible enough to
support the complexities of content, format, and dissemination that increasingly
define scholarly communications"
[University of California Publishing Services 2009].

The "love affair with print"
[Whalen 2009] of art historians and other humanities
scholars places not only "traditional" humanities scholarship at risk but also that
of digital humanities. The distinction between print and digital publication is as much
about epistemology as genre. Digital publishing is not simply repackaging a book or
article as a computer file, although even a searchable PDF has advantages over paper. By
incorporating dynamic multi-media or hypermedia, digital publishing offers different
ways of expressing ideas and of presenting evidence for those ideas [Lynch 2002]; [Presner 2010]; [Presner & Johanson 2009]. When digital scholarship is published in print venues, much of its sophistication is lost.

Digital publishing differs from print publishing in several ways. One is the shorter time from submission to publication. Speed of publication is a much greater concern in the sciences than in the humanities, but much of the delay in print publishing arises from the physical production of the journal or book; reviewing time varies little between print and digital formats. The humanities could benefit from faster turnaround, reaching audiences much sooner.

A second advantage of digital publishing — even more critical — is the larger audience
for online publications. Anyone with an online connection and a subscription (in the
case of fee-paid content), anywhere in the world, can read digital publications. Only
those with access to a physical copy can read print-only publications. The number of
titles and the number of copies of scholarly books and journals published in print form
are decreasing rapidly, thus limiting both publishing outlets and readership. Maureen
Whalen’s concern for art history, with its continuing reliance on print publishing, is
that "the voices of authority ... will be talking amongst
themselves"
[Whalen 2009].

Two other consequences of the inexorable shift toward digital publication should be of
concern to the humanities. One is that print material — including older material —
becomes "widowed" as students and scholars alike search only online. The widowing
problem was recognized early in the days of online catalogs, and was a major impetus for
research libraries to digitize their entire back catalogs rather than only records of
new material [Borgman 2000]; [Lynch 2003]; [Lynch & Garcia-Molina 1995].

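The other is that work available online tends to be cited more often than work available
only in print, as a number of studies have reported. While the details of these studies
are much contested among authors, editors, librarians, and publishers, the basic point
that easier discovery is associated with higher citation is difficult to dispute. Like
authors in other fields, scholars in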
the humanities desire recognition in the form of citations to their work. Universities
consider citation metrics in hiring and promotion decisions, despite known problems in
their use for evaluating scholarly productivity [Bollen & Van de Sompel 2008]; [Kurtz & Bollen 2010]; [Monastersky 2005]; [Reedijk & Moed 2008].

In sum, the sciences have benefited from online publication in ways that the humanities have not (yet). Digital publication is faster, reaches a wider audience, and tends to increase the citation rate over print-only publication. As the proportion of print-only publication continues to decrease, those for whom it is their only venue risk reaching an ever smaller and more closed community with their scholarship. Curation of digital objects is a concern in all fields, and is a topic that has the attention of management in libraries and archives. Nonetheless, digital publication has become the norm, and those who cling to print publication as the only acceptable format for promotion and tenure may be left out of the academic mainstream.

Data in Digital Scholarship

Central to the notion of cyberinfrastructure and eScience is that "data" have become
essential scholarly objects to be captured, mined, used, and reused. This trend has
been under way in science for many years, to varying degrees by field. As the technical and communications infrastructure became sufficiently robust to support large-scale data analysis and exchange, data became more valuable commodities. The availability of large volumes of data has enabled scientists to ask new questions, in new ways. Environmental scientists can conduct longitudinal analyses and make comparisons between locales using datasets compiled from multiple sources. Similarly, genome data offer analytical power at much finer granularity, and at larger scales.

While "data" is a less familiar term in the humanities, the availability of large
text, image, audio, and multi-media corpora has a similar result, enabling scholars in
multiple fields to interrogate sources in new ways [Crane et al. 2007]. Judging by presentations at the 2009 Digital Humanities Conference,
data is becoming a popular term, whether framed in terms of "mining"
or "cultural analytics." Data mining "is the process of
identifying patterns in large sets of data . . . to uncover previously unknown, useful
knowledge"
[National Centre for Text Mining 2009]. Cultural
analytics is a term that arose in the humanities as an analog to "visual
analytics,"
"business analytics," and "web analytics," and includes the
use of "computer-based techniques for quantitative analysis and
interactive visualization" to identify patterns in large cultural data sets
[Manovich 2009].

What Are Data?

The increasing value of data raises the question of "what are data?" Definitions
associated with archival information systems offer a useful starting point: "A reinterpretable representation of information in a formalized manner
suitable for communication, interpretation, or processing. Examples of data include a
sequence of bits, a table of numbers, the characters on a page, the recording of
sounds made by a person speaking, or a moon rock specimen"
[Consultative Committee for Space Data Systems 2002, 1–9].

Another way to think about data is by origin. In the context of cyberinfrastructure,
the four categories of data identified in an influential U.S. policy report
[Long-Lived Data Collections 2005], and incorporated in National Science
Foundation strategy [Cyberinfrastructure Vision for 21st Century Discovery 2007], are now widely accepted. Observational data include weather measurements and attitude surveys, either of which may be associated with specific places and times or may involve multiple places and times (e.g., cross-sectional, longitudinal studies). Computational data result from executing a computer model or simulation, whether for physics or cultural virtual reality. Replicating the model or simulation in the future may require extensive documentation of the hardware, software, and input data. In some cases, only the output of the model might be preserved. Experimental data include results from laboratory studies such as measurements of chemical reactions or from field experiments such as controlled behavioral studies. Whether sufficient data and documentation to reproduce the experiment are kept varies by the cost and reproducibility of the experiment. Records of government, business, and public and private life also yield useful data for scientific, social scientific, and humanistic research.

Data as Evidence

The need to address categories and levels of data is a pragmatic concern for managing
information. Yet data are often in the eye of the beholder. In Buckland’s terms, data
are "alleged evidence"
[Buckland 1991]. What counts as good data varies widely, as
one person’s noise is often another person’s signal. Similarly, the choices of data
depend heavily on the questions being asked [Scheiner 2004].

Whether any given set of observations or records can be considered data depends on
context, even in the sciences. In our research on science and technology researchers in
the environmental sciences, we found differing views of data on concepts as basic as
temperature. Some of the computer science and engineering researchers interviewed said, roughly, "temperature is temperature," whereas biologists gave much more
nuanced descriptions of how temperature was measured:
"There are hundreds of ways to measure temperature. 'The
temperature is 98' is low-value compared to, 'the temperature of the surface,
measured by the infrared thermopile, model number XYZ, is 98.' That means it is
measuring a proxy for a temperature, rather than being in contact with a probe, and
it is measuring from a distance. The accuracy is plus or minus .05 of a degree. I
[also] want to know that it was taken outside versus inside a controlled
environment, how long it had been in place, and the last time it was calibrated,
which might tell me whether it has drifted…"
[Borgman et al. 2007]. Thus these two groups of researchers, often working side-by-side in the field as collaborators, had very different perspectives on what were acceptable data for their evidentiary purposes.
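
The contrast between the two views can be made concrete with a small sketch. The Python below is illustrative only: the class and field names are hypothetical, not drawn from the study, and the values simply echo the biologist's example. It shows a bare reading next to one that carries the context the biologist asked for, and reports which pieces of that context were never recorded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BareReading:
    """The 'temperature is temperature' view: a value and nothing else."""
    value: float


@dataclass
class DocumentedReading:
    """A value plus the context needed to judge it as evidence.

    All field names are hypothetical, chosen to mirror the quotation.
    """
    value: float
    target: str        # what was measured, e.g. "surface"
    instrument: str    # e.g. "infrared thermopile"
    model: str         # e.g. "XYZ"
    contact: bool      # False means a proxy measured from a distance
    accuracy: float    # plus or minus, in degrees
    setting: Optional[str] = None               # "outside" or "controlled"
    days_in_place: Optional[int] = None
    days_since_calibration: Optional[int] = None

    def missing_context(self) -> List[str]:
        """Return the optional context fields that were never recorded."""
        optional = ("setting", "days_in_place", "days_since_calibration")
        return [name for name in optional if getattr(self, name) is None]


bare = BareReading(98.0)
documented = DocumentedReading(
    value=98.0,
    target="surface",
    instrument="infrared thermopile",
    model="XYZ",
    contact=False,
    accuracy=0.05,
    setting="outside",
)

# Both readings hold the same number; only one can say what is still unknown.
print(documented.missing_context())
```

Both objects hold the same number, yet only the second lets a later reader ask whether the sensor may have drifted. That gap is the difference in evidentiary standards described above.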

Studies of scientific practice, such as our work in embedded sensor networks, are
providing insights for the design of cyberinfrastructure and eScience. The social
studies of science and technology is a large and burgeoning field, with multiple
journals and book series, and a scholarly society established more than 40 years ago
[Van House 2004]. No comparable body of research on scholarly practices in the
humanities exists, with the exception of research on information-seeking behavior
[Anderson 2004]; [Bates 1996a]; [Bates 1996b]; [Bates et al. 1993]; [Bates et al. 1995]; [Case 2006]; [Siegfried et al. 1993]; [Stone 1982]; [Tibbo 2003]; [Wiberley 2003]; [Wiberley & Jones 1994]. Lacking an external perspective, humanities scholars need to be
particularly attentive to unstated assumptions about their data, sources of evidence,
and epistemology. We are only beginning to understand what constitute data in the
humanities, let alone how data differ from scholar to scholar and from author to reader.
As Allen Renear remarked, "In the humanities, one person’s data is
another’s theory"
(personal communication, June 22, 2009).

Data Sources

The sciences and humanities differ greatly in their sources of data and the degree of
control they have over those data [Borgman 2007]. Scientific data sources vary by discipline.

Scientists, generally speaking, use data that were created by and for scientific purposes. They usually generate their own data, as in field observations or laboratory studies, or may acquire data from collaborators or other scientists. They may also acquire data from repositories in their field or from government sites, such as records of rainfall or river flow. Scientific documentation such as laboratory and field notebooks is sometimes considered to be data and sometimes metadata.

The social sciences occupy the middle position between the sciences and humanities on a
continuum of data sources and control. Those at the scientific end of the scale gather
their own observations, whether opinion polls, surveys, interviews, or field studies;
build models of human behavior; and conduct experiments in the laboratory or field.
Other social scientists rely on records collected by others, such as economic indicators
or demographic data from the census. Government and corporate records are often of
interest, as are the mass media. A number of important data repositories exist,
especially for large social surveys (e.g., [Survey Research Center 2009a]; [Survey Research Center 2009b]; [UK Data Archive 2009]).

The humanities and arts are the least likely of the disciplines to generate their own data in the forms of observations, models, or experiments. Humanities scholars rely most heavily on records, whether newspapers, photographs, letters, diaries, books, articles; records of birth, death, marriage; records found in churches, courts, schools, and colleges; or maps. Any record of human experience can be a data source to a humanities scholar. Many of those sources are public while others are private. Cultural records may be found in libraries, archives, museums, or government agencies, under a complex mix of access rules. Some records are embargoed for a century or more. Some may be viewable only on site, whether in print or digital form. Data sources for humanities scholarship are growing in number and in variety, especially as more records are digitized and made available to the public.

Lynch’s dichotomy of raw material vs. interpretation has
a number of implications for the digital humanities. Two are of concern here. One is that raw materials are more likely to be curated for the long term than are scholars’ interpretations of those materials. It is the nature of the humanities that sources are reinterpreted continually; what is new is the necessity of making explicit decisions about what survives for migration to new systems and formats. Second is the implication for control of intellectual property. Generally speaking, humanities scholars have far less control over the intellectual property rights of their sources — these raw materials — than do scientists, whose data usually are original observations or specimens. Typically, scholars can read, view, and cite cultural records, but often need explicit permission to reproduce them — and frequently need to pay a fee, especially in the case of images, to include them in reports of their research.

Intellectual property constraints on publishing of digital humanities scholarship are
quite different from those that usually apply in other disciplines. Rights to reproduce
material remain closely tied to a print model, specified by number of copies printed and
by temporal rules on sale that are irrelevant to online publication. Even cultural
institutions as sophisticated as the Getty Trust encounter structural barriers to online
publication of humanities scholarship [Whalen 2009]. The policy shift
toward data sharing, well under way in the sciences, generally presumes that those who
produce the data have the authority to release or deposit them for reuse [OECD 2007]; [Arzberger et al. 2004].

In sum, "what are data?" is an important question for the humanities. The answer
will determine what data are produced, how they are captured, and how they are curated
for reuse. Data sharing in the humanities raises a complex set of issues (not that they
are simple in the sciences) that must be addressed. The humanities community needs a
critical mass of digital resources and needs common tools, services, and repositories if
it is to move beyond "boutique projects"
[Friedlander 2009] to a solid foundation of theory and method.

Research Methods

Questions of "what are data?" are inextricable from the choice of research method.
Many of the sciences, especially those "big science" areas that require large-scale
instrumentation and produce vast volumes of data, are in transition to a data-driven
paradigm [Bell et al. 2009]; [Foster 2009]. As the analysis, modeling, and merging of data become more central to scientific research, partnerships between scientists and computer scientists are becoming the norm.

An important case example of the changing role of data in science is the Sloan Digital
Sky Survey, begun in 1992 by Jim Gray, Alex
Szalay, and others [Gray et al. 2005]; [Gray & Szalay 2002]; [Szalay 2008]. It was the first major astronomical survey founded on the premise that the resulting data would be openly and freely available, both to the astronomy community and to the public at large. Not only did astronomers mine the Sloan datasets for research purposes (more than 1700 scholarly papers were published), but many more users of these data were students and amateur astronomers. Amateurs, whose backyard telescopes could never yield data of such quality, also made important discoveries.

The Sloan Digital Sky Survey is significant for its openness, research productivity,
and community engagement, and because it instantiates the "value chain" of scholarship
[Borgman 2007]. On the SDSS site, papers are linked to the datasets on which
they are based and datasets are linked to papers about them. One can enter the chain
from either point and follow the relationships. While the project has ceased collecting
new observations, the Sloan data remain available for use and are a canonical experiment
in curation of large-scale datasets [Choudhury et al. 2008]; [Choudhury & Stinson 2007]. Astronomers and computer scientists are now engaged in the next
generation project, Panoramic Survey Telescope and Rapid Response System, which is
yielding about twenty times as much data as Sloan [PAN-STARRS 2009].
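
The two-way linking that makes up this "value chain" can be sketched in a few lines. The Python below is a minimal illustration, not a description of how the SDSS site is built: the `ValueChain` class, its method names, and the paper and dataset identifiers are all hypothetical.

```python
from collections import defaultdict


class ValueChain:
    """Hypothetical two-way index between papers and datasets."""

    def __init__(self):
        self._datasets_by_paper = defaultdict(set)
        self._papers_by_dataset = defaultdict(set)

    def link(self, paper, dataset):
        # Record the relationship in both directions at once.
        self._datasets_by_paper[paper].add(dataset)
        self._papers_by_dataset[dataset].add(paper)

    def datasets_for(self, paper):
        """Enter the chain from a paper and list its datasets."""
        return sorted(self._datasets_by_paper.get(paper, set()))

    def papers_for(self, dataset):
        """Enter the chain from a dataset and list papers about it."""
        return sorted(self._papers_by_dataset.get(dataset, set()))


chain = ValueChain()
chain.link("paper-A", "dataset-1")
chain.link("paper-A", "dataset-2")
chain.link("paper-B", "dataset-1")

print(chain.datasets_for("paper-A"))
print(chain.papers_for("dataset-1"))
```

Storing each link in both directions is what allows a reader to start from either a publication or a dataset and follow the relationship to the other.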

Humanities scholars are more likely to find their data sources in the library — their traditional laboratory — than in the skies. While the library continues to be more central to scholarship in the humanities than it is to other fields, the characteristics of that relationship are changing. The use of physical space and of library staff has changed radically in the last two decades, largely in response to flat or declining university library budgets. Campus libraries have been consolidated in efforts to minimize the number of public service points to be staffed. Books, journals, and other physical materials have been moved to remote facilities, paged from the stacks upon request. Professional librarians, while a smaller proportion of library staffs, are turning their attention away from collection building — given the budget crises — and toward making the best use of the materials they have. The sciences are placing less demand on the physical library, allowing university libraries to reconfigure their spaces to benefit faculty and students in the humanities. Prime floor space previously devoted to card catalogs, journals, and book stacks is now available for groups to work together with physical and digital resources. More librarians have backgrounds in the humanities than in the sciences, and many are eager to partner with humanities scholars in building better tools and services for discovering, interpreting, and using scholarly content.

At most universities today, humanities scholars and students are the primary constituency for physical books, journals, and records. This community also makes the finest distinctions among editions, printings, and other variants — distinctions that are sometimes overlooked in the transition from print to digital form. For general reading, any edition may suffice, and some degradation in image quality may be an acceptable tradeoff for access to large corpora of books and journals. Scholars are much more dependent on metadata to identify and compare variants, and may require physical copies to examine characteristics of printing and paper, annotations, and other details.

Differences in the methods of using print and digital objects are being thrown into
sharp relief by mass digitization projects, most recently by the intense public debate
over Google’s book-scanning project. Concerns include not only the quality of scanning
and of metadata, but the possibility that libraries will discard physical copies of
books for which scans are available [University of California 2009]; [Duguid 2007]; [Nunberg 2009]; [Samuelson 2009]. Also lost in most
of these discussions is the distinction between scanning for search and access purposes
(the Google approach) and scanning for preservation purposes, which has higher standards
for image quality and for metadata [NINCH 2002]; [Mass Digitization 2006]; [Greenstein et al. 2004].

Digital humanities projects have yet to achieve the same scale of data, audience, or
participation as the Sloan Digital Sky Survey. However, several long-lived digital
humanities projects have made important contributions to research methods and data
quality. Perseus is usually considered the first digital library in the humanities,
with planning begun in 1985 and services available by 1987 [Perseus Digital Library 2009]; [Crane et al. 2001]; [Marchionini & Crane 1994]. The initial collections
of Perseus cover the history, literature, and culture of the Greco-Roman world. The
project has since expanded into other areas and has conducted significant research on the
classification, management, and use of visual and textual materials [Crane 2006]; [Mahoney 2002]; [Smith et al. 2002]. Rome Reborn, begun in 1997 at UCLA, was
first concerned with digital library problems such as metadata, organization of
historical and architectural periods, and representing relationships between textual
sources and visual models [Rome Reborn 2009]; [Frischer 2009]. Now the
system exists in multiple manifestations, supports three-dimensional
"fly-throughs," audio typical of the time period (including spoken Latin), and
gladiator fights in the amphitheater using the latest computer graphics technology.
Perseus, Rome Reborn, and newer projects such as HyperCities integrate map layers from
Google Earth and other sources, which broadens their scope, audience, and
interoperability with other components of the scholarly information infrastructure [HyperCities 2009]; [Presner 2010].

In sum, choices of data sources, research methods, and research problems are
inextricably linked. Research methods in the sciences and in the humanities are becoming
more data-driven. The key to "better" data — that is, data suitable for curation, reuse, and sharing — is capturing data as cleanly as possible and as early as possible in its life cycle. Agreements about data sources, structures, and formats will further the development of information infrastructure for digital humanities scholarship.

Collaboration

The size of collaborations is increasing in all fields, as measured by the number of
co-authors on papers, and at the fastest rate in the sciences [Cronin 2005]. In
sciences that rely heavily on instrumentation, such as astronomical observatories and particle accelerators, collaborations are large, diverse, and essential. Sciences that are more inductive and are conducted in field settings, such as habitat biology, tend to work in smaller groups. Sciences of all sizes are grappling with data management issues, as data are the glue — and often the product — of collaboration.

As noted above, the new forms of scholarship characterized by eResearch are information- and data-intensive, distributed, collaborative, and multi-disciplinary. Collaborations, when effective, produce new knowledge that is greater than the sum of what the participating individuals could accomplish alone. In fields where collaboration is the norm, graduate students learn teamwork, whether in the laboratory, the field, or in group work on data collection and analysis. Science dissertations frequently are carved out of larger group projects, with the student identifying a research problem worthy of sustained investigation. Funding agencies in the sciences consider dissertations to be important products of awards to faculty investigators. Dissertations and theses are listed explicitly in National Science Foundation annual reports, for example.

While the digital humanities are increasingly collaborative, elsewhere in the humanities the image of the "lone scholar" spending months or years alone in dusty archives, followed years later by the completion of a dissertation or monograph, still obtains. Students often are discouraged from conducting dissertation research under a faculty grant. Instead, they are expected to spend yet more time identifying funding for solo research. When one is groomed to work alone and does so for the years required to complete the doctorate, collaborative practices do not come easily.

Friedlander argues that for digital humanities to thrive, "one component must be a set
of organizational topics and questions that do not bind research into legacy categories
and do invite interesting collaborations that will allow for creative
cross-fertilization of ideas and techniques and then spur new questions to be pursued by
colleagues and students" [Friedlander 2009, 6]. As she
suggests, the digital humanities need to move beyond large numbers of small,
uncoordinated projects. Collaborative projects attract more resources and more
attention. If properly designed, they also may be more sustainable, creating platforms
on which new projects can be constructed. The plethora of boutique digital humanities
projects risks the same fate as most digital learning objects. While intended for
general use, they lack a common technical platform, common data structures, and common means to
aggregate or decompose modules to a useful level of granularity [Borgman et al. 2008].

Scholarly collaboration is much studied but little understood. Among the predictors of
success are the ability to achieve a common vocabulary and shared knowledge
[Kanfer et al. 2000]; [Olson & Olson 2000]. The more disciplines involved,
the more effort is required to achieve common ground. Investments must be made in
learning enough about each other’s disciplines that at least a pidgin language is
established [Galison 1997]. Relationships take time, and must be nurtured. One
important measure of success, and a worthwhile goal in eResearch, is that papers
suitable for publication in each of the participating disciplines arise from a joint
project. The recent multi-national, multi-disciplinary, multi-year funding awards for
innovative uses of data included several humanities-computer science partnerships [ODH 2009]. Virtual Vellum, for example, applies advanced computational
methods to explore authorship of 15th century manuscripts [Ainsworth 2009]. The results are likely to advance the state of optical character recognition and other computing techniques with broad application.

In sum, the digital humanities community could benefit from more collaborative partnerships within the field and between the humanities and disciplines such as computer science. Collaboration requires investment in listening skills, always being alert to nuanced differences in assumptions, theories, definitions, and methods. Lessons and skills learned from these partnerships can enhance the scholarship of all participants. Common technology platforms also are important to achieve interoperability and sustainability, and can be leveraged as investments across projects.

Incentives to Participate

Constructing a critical mass of data sources for scholarship in any field presumes that people will share the products of their research. Because data and collaboration are so central to the methods of digital scholarship, data sharing is an important indicator of success for eResearch, although practices are somewhat different in the sciences and in the humanities.

The public nature of scholarship has deep roots. Notions of "open science" date
back at least to Francis Bacon, with scientific findings being accepted only after peer
review. Scholars’ incentives to share their results include recognition and acceptance
of their work, which in turn drives hiring and promotion. In the sciences, authors may
be required to release data as a condition of publishing the papers on which they are
based. Funding agencies also are becoming more assertive about the release of data that
result from grants. However, publishing data is a far less mature practice than is
publishing books and articles. Releasing a major dataset rarely brings as much
recognition as releasing a major paper or book, but that balance is shifting, at least
in the sciences [Borgman 2007]; [Hey et al. 2009].

Scholars compete as well as collaborate, and thus have reasons not to share their data
sources. The following are disincentives that apply to all disciplines, albeit to
varying degrees [Borgman 2007]: (1) faculty receive more rewards for publishing papers and books than for releasing data; (2) the effort of individuals to document their data for use by others is much greater than the effort required to document them only for use by themselves and their research team; (3) data and sources offer a competitive advantage and are essential to establishing the priority of claims; and (4) data are often viewed as one’s own intellectual property to be controlled, whether or not the data (or their sources) are legally owned. Means exist to address each of these concerns, but all are complex responses to a complex environment.

The first disincentive is the most universal across disciplines. The sciences and
medicine are under the greatest pressure to release their data. In these disciplines the
reward structure is adapting, and repositories and data structures exist. While humanities scholars are under less pressure to release their data and sources, they are contributing models, modules, and tools to participatory projects and shared collections.

Data documentation, the second disincentive, is an issue in all fields, and as the volume of data increases, consistent documentation becomes progressively more necessary. Once data are captured cleanly, sharing them later becomes less of a problem. Humanities scholars are acutely aware of the importance of metadata and finding aids in discovering sources. Metadata are equally important for data curation. Scholars understand the roles that documentation must play, while librarians and archivists have the expertise in documentation standards, practices, and technologies. Data documentation is thus an obvious area of partnership for humanities scholars and information professionals, together addressing the requirements for sustainability of research products.

The third disincentive — competitive advantage — is often addressed in the sciences
through embargoes, whereby the investigators have a set period of time (from a few
months to a few years, depending on the field) after the end of a grant before being
required to share their data. Embargoes serve two complementary purposes: they protect
the scholars’ control over data, and they ensure that others will have access to the
data within a reasonable time period. In the humanities, scholars are similarly
concerned about controlling access to the sources of their data, whether the Dead Sea
Scrolls or a set of manuscripts in a university archive, until they have published their
research. As data sources such as manuscripts and out-of-print books are digitized and
made publicly available, individual scholars will be less able to hoard their sources.
This effect of digitization on humanities scholarship has been little explored, but
could be profound. Open access to sources promotes participation and collaboration,
while the privacy rules of libraries and archives ensure that the identity of
individuals using specific sources is not revealed. Libraries and archives endeavor to
maintain privacy in the use of digital as well as print sources. However, when digital
content is controlled by commercial entities, protecting the privacy of users is a
greater concern [Mass Digitization 2006]; [Hoofnagle 2009].

Intellectual property, the fourth disincentive to share data and sources, is the most
intractable. The need to establish data sharing agreements in collaborative projects
arose early in eScience initiatives and is far from resolved [David 2003]; [David & Spence 2003]. In the case of the sciences, ownership — or at least control — usually
can be clarified through negotiation. If the research depends upon material acquired
from others, such as cell lines, rules on data release will be governed by contract. The
reliance of humanities scholarship on cultural records, as discussed above, creates
particularly complex intellectual property challenges in the sharing of data. For
example, an art historian usually can publish his or her notes, but not the paintings on
which the research is based. In the case of cultural models such as digital cities, it
can be difficult to distinguish between data that represent an individual city and the
model in which those data are incorporated. Difficulties in separating data from models
(a problem in the sciences and in the humanities) plague both curation and data release
efforts [HyperCities 2009]; [Rome Reborn 2009]; [Serving and Archiving Virtual Environments 2009].

In sum, the digital humanities encounter most of the same incentives and disincentives for sharing data and sources faced by the sciences and by other disciplines. The details play out somewhat differently, of course. The need to build critical masses of cultural sources and interoperable technology platforms affirms the need to broker agreements about data. If the infrastructure for the digital humanities errs toward openness, as is the norm in much of the sciences, the field will advance more quickly.

Learning

The last comparison between the sciences and humanities, but by no means the least, is
the role of information technology in learning. "Cyberlearning," as argued by the
National Science Foundation’s Task Force, can leverage the nation’s investment in
cyberinfrastructure to benefit learning at all ages — "from K to
grey"
[Borgman et al. 2008]. This argument was made earlier in
the humanities, claiming that cyberinfrastructure could serve the humanities both
for scholarship and for making cultural material more readily accessible for learning and
outreach [Unsworth et al. 2006]. Cyberlearning is defined as the use of networked computing and communications technologies to support learning. The scope of cyberlearning concerns in the Task Force report was necessarily constrained to the U.S. and to the domains funded by the NSF, which do not include the arts and humanities. However, the Task Force noted explicitly that the value of cyberlearning encompasses the sciences, social sciences, humanities, and arts, and is an important international initiative.

Several of the recommendations for advancing the state of cyberlearning have analogies
for advancing the state of digital humanities. One is the need to build a vibrant field
by promoting cross-disciplinary communities, publishing best practices, and recruiting
diverse talents. The Cyberlearning Task Force made a careful distinction between
cyberlearning as learning with distributed computing technologies and
workforce development as teaching people about cyberinfrastructure. The
latter is also a concern of the National Science Foundation [National Science Foundation 2007]. In the humanities as in the sciences, people need to learn about cyberinfrastructure before they can learn with it — or use it for their research and teaching.

Another analogous recommendation from the cyberlearning report is the need to instill a
"platform perspective." As noted earlier, the takeup rate of digital learning
modules has been limited by reliance on unique tools, proprietary software, and general
lack of interoperability. Unless products are easily adapted to new uses, others have
little incentive to invest in them. Both cyberinfrastructure and cyberlearning
initiatives are constructing common technical platforms that will improve the
sustainability and reuse of tools, services, and content. Some of these technical
platforms can be leveraged for digital humanities scholarship. Where capabilities are
lacking, the community can work in concert to construct them. Common platforms and
standards are among the goals of the Mellon-funded Bamboo project, for example [Project Bamboo 2009].

The Cyberlearning Task Force also recommended initiatives to enable students to use
data. By embedding data skills early in the science curriculum — in the primary grades
where feasible — students can learn to "think like
scientists" early on. Hands-on science approaches endeavor to engage students
in "real" science, making it more interesting and exciting than purely textbook
approaches [Pea et al. 2003]. Projects like the Sloan Digital
Sky Survey and eBird encourage individuals to contribute their observations — whether
about the sky or about birds in their backyards — for use in scientific investigations
[Sloan Digital Sky Survey 2006]; [eBird 2009]. The same promise applies to the
humanities. If students can explore cultural records from the early grades and learn to
construct their own narratives, they may find the study of humanities more lively. By
the time they are college students, they will have learned methods of collaborative work
and the use of distributed tools, sources, and services. Projects such as Perseus,
HyperCities, and the Valley of the Shadow already enable students in humanities courses
to engage in new forms of collaborative discourse [Perseus Digital Library 2009]; [Ayers 2003]; [Presner 2010].

Lastly, the Task Force made a strong recommendation to the NSF to promote open
educational resources. Educational content resulting from cyberlearning grants should be
made available online with permission for unrestricted use and recombination. New
proposals for research and development in cyberlearning should include plans to make
their materials available and sustainable. These recommendations are relevant to all
disciplines. Open educational resources are growing rapidly in variety and number
[Atkins et al. 2007]; [Baker 2009]; [Thierstein 2009]. Licensing models
such as Creative Commons [Creative Commons 2009] now include specific
capabilities for licensing learning materials [ccLearn 2009] and
scientific data [Science Commons 2009]. Digital humanities projects, whether or not they include a learning component, also can benefit from Creative Commons licenses. The owners of intellectual property retain their copyright; they simply license it for reuse under publicly stated conditions. Intellectual property owned by others must not be appropriated, of course, but usually it can be linked if not specifically licensed.

Openness matters for the digital humanities for reasons of interoperability, discovery,
usability, and reusability. Open resources — that is, those that can be used under
license or are in the public domain — are more malleable for research and for learning.
They can be mixed up and mashed up, and others can add value to them. Resources that
are available via open repositories also are more readily discovered than those posted
on local websites [OER Commons 2009]; [Open Education 2009]; [Case 2006]; [Atkins et al. 2007].

In sum, cyberlearning is important for the digital humanities for a number of reasons. One is the need to learn how to use and how to evaluate digital cultural materials early; graduate school is rather late. Second is the need to build common technology platforms for digital humanities scholarship, which will advance the field by leveraging efforts and resources and by increasing interoperability. Third is the value of open access to resources, which then become more malleable for research and for learning. Last is the need to build a strong community of digital humanities scholars, one that represents a much larger portion of the humanities than is the case today.

Summary

My student’s complaint, "So what use are the digital libraries, if all they do is put digitally unusable information on the web?" nicely captures the challenges facing the humanities today. Digital content, tools, and services all exist, but they are not necessarily useful or usable. Much work remains to build the scholarly infrastructure necessary for digital scholarship to become mainstream in the humanities. Humanities scholars must lead the effort, because only they understand the goals and requirements for what should be built. Librarians, archivists, programmers, and computer scientists will be essential collaborators, each bringing complementary skills.

A number of developments in cyberinfrastructure, eScience, and eResearch offer guidance to the digital humanities community in the quest to become a more established field with a broader base of infrastructure. One is in the area of publication practices. The humanities lag in digital publication of journals and books. Digital publishing, while far from a panacea, offers a number of advantages in the speed, scope, and format of communication. Scholarly print publishing is on the decline, and those who publish only in print form risk being isolated, talking only to each other. More digital-only venues are needed, where dynamic and visual work can be published in its vernacular form.

Another area is the dissemination and use of data. The humanities community should continue to clarify its choices of data and data sources, for these will drive what content is produced, captured, managed, and available for reuse. Questions of data are closely related to research methods, which also are evolving. Data-driven research methods are most valuable when they enable scholars to ask new questions in new ways.

Collaboration is essential in digital humanities projects. Few individuals have the range of expertise required to execute these projects alone. Humanists should continue to seek out complementary partners and encourage people to listen and learn from each other. Working together is also more likely to lead to common platforms and other means of reducing the overhead of technical projects.

In both the sciences and the humanities, incentives to share one’s writing are more obvious than are incentives to share one’s data and sources. In the sciences, data release is being encouraged (or required) by journals and funding agencies, and data-driven research methods can draw upon large corpora that grow as new observations are contributed. In the humanities, data release is less of an issue, but the availability of common technical platforms, tools, and services will promote the sharing of data and sources. The disincentives to share are complex in both the sciences and the humanities, but are being addressed. As the sciences learn how to share data and to share credit for their findings, the humanities can build upon their best practices. Intellectual property constraints remain a major stumbling block, and the considerations vary between the sciences and the humanities.

Opportunities for using cyberinfrastructure for learning exist in all disciplines. Distributed access to scholarly content, common technical platforms, and open resources will advance the humanities as well as the sciences.

A Call to Action

In the process of developing the keynote presentation for the 2009 Digital Humanities Conference and in writing this paper, I consulted many individuals in the digital humanities community for their thoughts on the issues facing the field. From these discussions and my analyses above, five pressing problems emerged.

What are data?

What constitute data in the humanities? What are data sources? How are they made, shared, valued, used, and reused? Answering these questions will enable the digital humanities community to be more articulate about its scope and its goals, and better positioned to identify its requirements for infrastructure.

What are the infrastructure requirements?

The sciences have struggled with this question for a decade or two already. They have
convened workshops and study panels, and launched funding initiatives addressed
specifically at defining, designing, and deploying the necessary infrastructure for
eScience. The humanities have tackled this question on a much smaller scale, leaving
them in the position of building upon the infrastructure constructed by and for other
disciplines. As Johanna Drucker put it so well, "them is us" [Drucker 2009]. It is time for the community to articulate its own requirements and to act upon them.

Where are the social studies of digital humanities?

Why is no one following digital humanities scholars around to understand their practices, in the way that scientists have been studied for the last several decades? This body of research has informed the design of scholarly infrastructure for the sciences, and is a central component of cyberinfrastructure and eScience initiatives. Given how rapidly scholarship in the humanities is evolving, it is fertile ground for behavioral research. The humanities community should invite more social scientists as research partners and should make itself available as an object of study. In doing so, the community can learn more about itself and apply the lessons to the design of tools, services, policies, and infrastructure.

What is the humanities laboratory of the 21st century?

This is a question of great concern to research libraries as well as to humanities scholars. The library continues to be a laboratory for the humanities, but not the only laboratory. Humanities scholars run computing laboratories and may work in distributed virtual environments for research and for learning. Humanists need to partner both with librarians and with the information technology planning and policy groups on their campuses. These communities urgently need to "think together" about the common challenges faced in a time of shrinking budgets for collections, physical space, staffing, and technology services.

What is the value proposition for digital humanities in an era of declining budgets?

For universities, the current economic recession is like no other. Public and private universities alike are re-examining core principles as budgets are slashed by 10% to 30% from one year to the next. Nothing is sacred, and "because it’s beautiful" is not a viable economic argument. The sciences have been remarkably effective at making the argument for their value in economic and political terms, whether to university administrations, legislatures, funding agencies, or the general public. While the humanities will have difficulty making parallel arguments in terms of economic competitiveness and medical advances, they have plenty to offer in terms of cultural understanding, writing and design skills, and critical thinking. Digital scholarship also promotes technical skills, which can be highlighted.

Digital projects require resources in the form of computers, software, staff, and content. Non-digital scholarship also costs money, of course, but more often in the form of travel and subsistence expenses for research in remote archives. Tradeoffs in travel and digitization can be made more explicit. The number of people who will use and benefit from any given project also can be made clearer. Investments in common technical platforms and standards that leverage resources across larger numbers of people and projects are easier to justify.

The digital humanities community has produced some beautiful work and made many
advances in technology, design, and standards. Now is the moment to consolidate that
knowledge and to articulate the community’s requirements and goals. Go forth and do
great things...

Acknowledgements

I am grateful to the colleagues who provided thoughtful commentary on an earlier draft
of this paper, including Murtha Baca, Gregory Britton, and Maureen Whalen of the Getty
Trust; Johanna Drucker, Alberto Pepe, Todd Presner, and Katie Shilton of UCLA; Amy Friedlander, Council on Library and Information Resources; Bernard Frischer, University of Virginia; Alexander Parker, Harvard University; and two anonymous reviewers.

Many other people were very generous with their time in response to my inquiries about
the past, present, and future of the digital humanities, including (in alphabetical
order) William Dutton, Oxford Internet Institute; Neil Fraistat, University of
Maryland; Richard Furuta, Texas A&M; Kimberly Garmoe and Anne Gilliland, UCLA; Charles Henry, Council on Library and Information Resources; Jason Hewitt, UCLA; Jieh Hsiang, National Taiwan University; Marina Jirotka, University of Oxford; Matthew Kirschenbaum, University of Maryland; Clifford Lynch, Coalition for Networked Information; Lev Manovich, University of California, San Diego; Ann O’Brien, Loughborough University; Susan Parker, UCLA; Allen Renear, University of Illinois; David Robey, University of Oxford; Ben Shneiderman, University of Maryland; Harold Short and Paul Spence, King’s College, London; Joshua Sternfeld, UCLA; Sarah Thomas, Bodleian Library; Sharon Traweek, UCLA; Anne Trefethen, University of Oxford; John Unsworth, University of Illinois; Sarah Watstein and Robert Winter, UCLA.

Borgman, Christine. “Scholarship in the Digital Age: Blurring the Boundaries between the Sciences and the Humanities”. Presented at Digital Humanities 2009 (2009). http://works.bepress.com/borgman/216/.

David & Spence 2003 David, P. A. & Spence, M.
(2003). "Towards Institutional Infrastructures for E-Science: The Scope of the Challenge."
Oxford Internet Institute Research Reports: University of Oxford. Retrieved from http://129.3.20.41/eps/le/papers/0502/0502002.pdf on 30 September 2006.

David 2003 David, P. A. (2003). The Economic Logic of 'Open Science' and the Balance between Private Property Rights and
the Public Domain in Scientific Data and Information: A Primer. Stanford Institute for Economic Policy Research. Retrieved from http://siepr.stanford.edu/publicationsprofile/445.

Directory of Open Access Journals 2009 Directory of Open
Access Journals. (2009). Open Society Initiative, Scholarly Publishing and Academic
Resources Coalition. Retrieved from http://www.doaj.org on 16 August 2009.

Directory of Open Access Repositories 2008 Directory of Open
Access Repositories. (2008). University of Nottingham, UK and University of Lund, Sweden. Retrieved from http://www.opendoar.org on 16 August 2009.

Effect of Open Access 2009 The effect of open access and
downloads (hits) on citation impact: a bibliography of studies. (2009). The Open
Citation Project - Reference Linking and Citation Analysis for Open Archives. Retrieved
from http://opcit.eprints.org/oacitation-biblio.html on 16 August 2009.

Facts about Open Access 2005 The Facts about Open Access: A
Study of the Financial and Non-Financial Effects of Alternative Business Models on
Scholarly Journals (2005). Kaufman-Wills Group LLC: Association of Learned and
Professional Society Publishers. Retrieved from http://sippi.aaas.org/Pubs/ on 27 September 2007.

Friedlander 2008 Friedlander, A. (2008). Head in the Clouds and Boots on the Ground: Science, Cyberinfrastructure and
CLIR. Kanazawa Institute of Technology Library Roundtable. Retrieved from http://www.clir.org/pubs/resources/articles.html on 15 August 2008.

Friedlander 2009 Friedlander, A. (2009). "Asking
questions and building a research agenda for digital scholarship." In Working Together or
Apart: Promoting the Next Generation of Digital Scholarship. Washington, DC, Council on
Library and Information Resources. CLIR Publication No. 145: 1-15. Retrieved from http://www.clir.org on 15 June 2009.

Maryland Institute for Technology in the Humanities 2009 Maryland Institute for Technology in the Humanities. (2009). University of
Maryland. Retrieved from http://mith.umd.edu/ on 6 August 2009.

Mass Digitization 2006 Mass Digitization: Implications for
Information Policy (2006). Report from "Scholarship and Libraries in
Transition: A Dialogue about the Impacts of Mass Digitization Projects" —
Symposium held on March 10-11, 2006, University of Michigan: National Commission on
Libraries and Information Science. ED495775. Retrieved from http://www.eric.ed.gov/ and http://www.lib.umich.edu/mdp/symposium/NCLIS-report.pdf on 10 September 2009.

NINCH 2002 NINCH Guide to Good Practice in the Digital
Representation and Management of Cultural Heritage Materials (2002). National Initiative
for a Networked Cultural Heritage. Retrieved from http://www.nyu.edu/its/humanities/ninchguide/ on 8 September 2009.

National Centre for Text Mining 2009 National Centre for
Text Mining. (2009). Retrieved from http://www.nactem.ac.uk/ on 2 September 2009.
