Scarcity or Abundance? Preserving the Past in a Digital Era

Roy Rosenzweig

This article was originally published in American Historical Review 108, 3 (June 2003): 735-762 and is reprinted here with permission.

On October 11, 2001, the satiric Bert Is Evil web site, which
displayed photographs of the furry Muppet in Zelig-like proximity to
villains such as Adolf Hitler (see Figure 1), disappeared from the web–a
bit of collateral damage from the September 11th attacks. Following the
strange career of Bert Is Evil shows us possible futures of the past in
a digital era–futures that historians need to contemplate more carefully
than they have done so far.

In 1996, Dino Ignacio, a twenty-two-year-old Filipino web designer,
created Bert Is Evil ("brought to you by the letter H and the CIA"),
which became a cult favorite among early tourists on the World Wide Web.
Two years later, Bert Is Evil won a "Webby" as the "best weird site."
Fan and "mirror" sites appeared with some embellishing on the "Bert Is
Evil" theme. After the bombing of the U.S. embassies in Kenya and
Tanzania in 1998, sites in the Netherlands and Canada paired Bert with
Osama bin Laden.1

This image made a further global leap after September 11. When
Mostafa Kamal, the production manager of a print shop in Dhaka,
Bangladesh, needed some images of bin Laden for anti-American posters,
he apparently entered the phrase "Osama bin Laden" in Google's image
search engine. The Osama and Bert duo was among the top hits. "Sesame
Street" being less popular in Bangladesh than in the Philippines, Kamal
thought the picture a nice addition to an Osama collage. But when this
transnational circuit of imagery made its way back to more Sesame
Street-friendly parts of the world via a Reuters photo of anti-American
demonstrators (see Figure 2), a storm of indignation erupted. Children's
Television Workshop, the show's producers, threatened legal action. On
October 11, 2001, a nervous Ignacio pushed the delete key, imploring
"all fans [sic] and mirror site hosts of 'Bert is Evil' to stop the
spread of this site too."2

Ignacio's sudden deletion of Bert should capture our interest as
historians since it dramatically illustrates the fragility of evidence
in the digital era. If Ignacio had published his satire in a book or
magazine, it would sit on thousands of library shelves rather than
having a more fugitive existence as magnetic impulses on a web server.
Although some historians might object that the Bert Is Evil web site is
of little historical significance, even traditional historians should
worry about what the digital era might mean for the historical record.
U.S. government records, for example, are being lost on a daily basis.
Although most government agencies started using e-mail and word
processing in the mid-1980s, the National Archives still does not
require that digital records be retained in that form, and governmental
employees profess confusion over whether they should be preserving
electronic files.3
Future historians may be unable to ascertain not only whether Bert is
evil, but also which undersecretaries of defense were evil, or at least
favored the concepts of the "evil empire" or the "axis of evil." Not
only are ephemera like "Bert" and government records made vulnerable by
digitization, but so are traditional works–books, journals, and
film–that are increasingly being born digitally. As yet, no one has
figured out how to ensure that the digital present will be available to
the future's historians.

But, as we shall see, tentative efforts are afoot to preserve our
digital cultural heritage. If they succeed, historians will face a
second, profound challenge–what would it be like to write history when
faced by an essentially complete historical record? In fact, the Bert Is
Evil story could be used to tell a very different tale about the
promiscuity and even persistence of digital materials. After all,
despite Ignacio's pleas and Children's Television Workshop's threats, a
number of Bert "mirror" sites persist. Even more remarkably, the
Internet Archive–a private organization that began archiving the web in
1996–has copies of Bert Is Evil going back to March 30, 1997. To be
sure, this extraordinary archive is considerably more fragile than one
would like. The continued existence of the Internet Archive rests
largely on the interest and energy of a single individual, and its
collecting of copyrighted material is on even shakier legal ground. It
has put the future of the past–traditionally seen as a public
patrimony–in private hands.

Still, the astonishingly rapid accumulation of digital data–obvious
to anyone who uses the Google search engine and gets 300,000 hits–should
make us consider that future historians may face information overload.
Digital information is mounting at a particularly daunting rate in
science and government. Digital sky surveys, for example, access over 2
billion images. Even a dozen years ago, NASA already had 1.2 million
magnetic tapes (many of them poorly maintained and documented) with
space data. Similarly, the Clinton White House, by one estimate, churned
out 6 million e-mail messages per year. And NARA is contemplating
archiving military intelligence records that include more than "1
billion electronic messages, reports, cables, and memorandums."4

Thus historians need to be thinking simultaneously about how to
research, write, and teach in a world of unheard-of historical abundance
and how to avoid a future of record scarcity. Although these prospects
have occasioned enormous commentary among librarians, archivists, and
computer scientists, historians have almost entirely ignored them. In
part, our detachment stems from the assumption that these are
"technical" problems, which are outside the purview of scholars in the
humanities and social sciences. Yet the more important and difficult
issues about digital preservation are social, cultural, economic,
political, and legal–issues that humanists should excel at. The "system"
for preserving the past that has evolved over centuries is in crisis,
and historians need to take a hand in building a new system for the coming
century. Historians also tend to assume a professional division of
responsibility, leaving these matters to archivists. But the split of
archivists from historians is a relatively recent one. In the early
twentieth century, historians saw themselves as having a responsibility
for preserving as well as researching the past. At that time, the vision
and membership of the American Historical Association–embracing
archivists, local historians, and "amateurs" as well as university
scholars–was considerably broader than it later became.5

Ironically, the disruption to historical practice (to what Thomas
Kuhn called "normal science") brought by digital technology may lead us
"back to the future." The struggle to incorporate the possibilities of
new technology into the ancient practice of history has led, most
importantly, to questioning the basic goals and methods of our craft.
For example, the Internet has dramatically expanded and, hence, blurred
our audiences. A scholarly journal like this one is suddenly much more
accessible to high school students and history enthusiasts. And the work
of history buffs is similarly more visible and accessible to scholars.
We are forced, as a result, to rethink who our audiences really are.
Similarly, the capaciousness of digital media means that the page limits
of journals like this one are no longer fixed by paper and ink costs. As
a result, we are led to question the nature and purpose of the scholarly
journals–why do they publish articles with particular lengths and
structures? Why do they publish particular types of articles? The
simultaneous fragility and promiscuity of digital data requires yet more
rethinking–about whether we should be trying to save everything, who is
"responsible" for preserving the past, and how we find and define
historical evidence.

Historians, in fact, may be facing a fundamental paradigm shift from
a culture of scarcity to a culture of abundance. Not so long ago, we
worried about the small numbers of people we could reach, pages of
scholarship we could publish, primary sources we could introduce to our
students, and documents that had survived from the past. At least
potentially, digital technology has removed many of these limits: over
the Internet, it costs no more to deliver the AHR to 15 million people than 15,000 people;
it costs less for our students to have access to literally millions of
primary sources than a handful in a published anthology. And we may be
able to both save and quickly search through all of the products of our
culture. But will abundance bring better or more thoughtful
history?6

Historians are not unaware of these challenges to the ways that we
work. Yet, paradoxically, these fundamental questions are often
relegated to more marginal professional spaces–to casual lunchtime
conversations or brief articles in association newsletters. But in this
time of rapid and perplexing changes, we need to engage with issues
about access to scholarship, the nature of scholarship, the audience for
scholarship, the sources for scholarship, and the nature of scholarly
training in the central places where we practice our craft–scholarly
journals, scholarly meetings, and graduate classrooms. That scholarly
engagement should also lead us, I believe, to public action to advocate
the preservation of the past as a public responsibility–one that
historians share. But I hope to persuade even those who do not share my
particular political stance that professional historians need to shift
at least some of their attention from the past to the present and future
and reclaim the broad professional vision that was more prevalent a
century ago. The stakes are too profound for historians to ignore the
futures of the past.

Although historians have mostly been silent, archivists, librarians,
public officials, and others have loudly warned about the threatened
loss of digital records and publications for at least two decades. Words
such as "disaster" and "crisis" echo through their reports and
conference proceedings. As early as 1985, the Committee on the Records
of Government declared, "the United States is in danger of losing its
memory."7 More than
a dozen years later, a project called "Time and Bits: Managing Digital
Continuity" brought together archivists, librarians, and computer
scientists to address the problem once again. Conferees watched the
Terry Sanders film Into the Future: On the Preservation of Knowledge in
the Electronic Age, and some likened it to Rachel Carson's Silent Spring
and themselves to the environmentalists of the 1960s and 1970s. A Time
and Bits web site assembled conference materials and promoted "ongoing
digital dialogue." But, as if to prove the conference's point, the site
disappeared in less than a year. Computer scientist Jeff Rothenberg may
have been over-optimistic when he quipped, "Digital documents last
forever–or five years, whichever comes first."8

Those worried about a problem like digital preservation that lacks
public attention are prone to exaggerate. Probably the greatest
distortion has been the implicit suggestion that we have somehow fallen
from a golden age of preservation in which everything of importance was
saved. But much–really, most–of the record of previous historical eras
has disappeared. "The members of prehistoric societies did not think
they lived in prehistoric times," Washington Post writer Joel Achenbach
observes. "They merely lacked a good preservation medium." And
non-digital records that have survived into this century–from Greek and
Chinese antiquities to New Guinean folk traditions to Hollywood
films–are also seriously threatened.9

Another exaggeration involves stories about the grievous losses that
never occurred. One widely repeated story is that computers can no
longer read the data tapes from the 1960 U.S. Census. In truth, as
Margaret Adams and Thomas Brown from the National Archives have shown,
the Census Bureau had by 1979 successfully copied almost all the records
to newer "industry-compatible tapes." Yet, even in debunking one of the
persistent myths of the digital age, Adams and Brown reveal some of the
key problems. In just a decade and a half, migrating the census tapes to
a readable format "represented a major engineering challenge"–hardly
something we expect to face with historical records originating from
within our own lifetimes. And although "only 1,575 records . . . could
not be copied because of deterioration," the absolute nature of digital
corrosion is sobering.10 Print books and records decline slowly and
unevenly–faded ink or a broken-off corner of a page. But digital records
fail completely–a single damaged bit can render an entire document
unreadable. Here is the key difference from the paper era: we need to
take action now because digital items very quickly become unreadable, or
recoverable only at great expense.
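
The contrast between gradual and total failure can be made concrete with a minimal sketch. It assumes, purely for illustration, that zlib compression stands in for any tightly encoded digital format; the helper flip_bit and the sample text are hypothetical and come from no system discussed here. One inverted bit in plain text spoils a single character, while one inverted bit in the checksum of the compressed copy makes the decoder reject the whole document.

```python
import zlib


def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with a single bit inverted (hypothetical helper)."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


text = ("Print books and records decline slowly and unevenly. " * 20).encode("ascii")

# Case 1: plain text. One damaged bit alters exactly one character;
# the rest of the document remains legible, much like a faded word on paper.
damaged_plain = flip_bit(text, 10)
changed_characters = sum(a != b for a, b in zip(text, damaged_plain))

# Case 2: the same text in compressed form. The final bytes of a zlib
# stream hold a checksum of the contents, so one damaged bit there makes
# the decoder refuse the entire document.
packed = zlib.compress(text)
damaged_packed = flip_bit(packed, len(packed) - 1)
try:
    zlib.decompress(damaged_packed)
    recovered = True
except zlib.error:
    recovered = False

print("characters changed in plain text:", changed_characters)
print("compressed copy recovered:", recovered)
```

Not every bit in every format is this unforgiving, and specialized recovery tools can sometimes salvage part of a damaged file, which is the costly "digital archaeology" at issue here.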

This has already happened–albeit not as much as sometimes suggested.
"Ten to twenty percent of vital data tapes from the Viking Mars
mission," notes Deanna Marcum, the president of the Council on Library
and Information Resources, "have significant errors because magnetic tape is
too susceptible to degradation to serve as an archival storage medium."
Often, records lack sufficient information about their organization and
coding to make them usable. According to Kenneth Thibodeau, director of
the National Archives and Records Administration's Electronic Records
Archives program, NARA lacked adequate documentation to make sense
of several hundred reels of computer tapes from the Department of Health
and Human Services and data files from the National Commission on
Marijuana and Drug Abuse. Some records could be recovered by future
digital archaeologists but sometimes only through an unaffordable "major
engineering challenge."11 The greatest concern is not over what has
already been lost but what historians in fifty years may find that they
can't read.

Many believe–incorrectly–the central problem to be that we are
storing information on media with surprisingly short life spans. To be
sure, acid-free paper and microfilm last a hundred to five hundred
years, whereas digital and magnetic media deteriorate in ten to thirty
years. But the medium is far from the weakest link in the digital
preservation chain. Well before most digital media degrade, they are
likely to become unreadable because of changes in hardware (the disk or
tape drives become obsolete) or software (the data are organized in a
format destined for an application program that no longer works). The
life expectancy of digital media may be as little as ten years, but very
few hardware platforms or software programs last that long. Indeed,
Microsoft only supports its software for about five years.12

The most vexing problems of digital media are the flipside of their
greatest virtues. Because digital data are in the simple lingua franca
of bits, of ones and zeros, they can be embodied in magnetic impulses
that require almost no physical space, be transmitted over long
distances, and represent very different objects (for instance,
pictures or sounds as well as text). But the ones and zeros lack
intrinsic meaning without software and hardware, which constantly change
because of technological innovation and competitive market forces. Thus
this lingua franca requires translators in every computer application,
which, in turn, operate only on specific hardware platforms. Compounding
the difficulty is that the languages being translated keep changing
every few years.
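
A small sketch may make the point about "intrinsic meaning" less abstract. The four bytes below are an arbitrary illustration, and the four readings are assumptions chosen for the example rather than any real file format. The same ones and zeros become a word, two quite different numbers, or four shades of gray, depending entirely on the rule the software applies.

```python
# Four arbitrary bytes: the "ones and zeros" as they sit on a disk.
raw = bytes([0x48, 0x69, 0x21, 0x00])

# Reading 1: the first three bytes treated as ASCII characters.
as_text = raw[:3].decode("ascii")

# Readings 2 and 3: all four bytes treated as one 32-bit whole number.
# Different hardware families order the bytes differently.
as_little_endian = int.from_bytes(raw, "little")
as_big_endian = int.from_bytes(raw, "big")

# Reading 4: each byte treated as the brightness of one grayscale pixel.
as_pixels = list(raw)

print("as text:         ", as_text)
print("as number (LE):  ", as_little_endian)
print("as number (BE):  ", as_big_endian)
print("as pixel values: ", as_pixels)
```

Nothing in the bytes themselves says which reading is correct. That knowledge lives in the application and the hardware, which is why their obsolescence strands the data.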

The problem is still worse because of the ability of digital media
to create and represent complex, dynamic, and interactive
objects–another of their great virtues. Even relatively simple documents
that appear to have direct print analogs turn out to be more complex.
Printing out e-mail messages makes rapid searches of them impossible and
often jettisons crucial links to related messages and attachments. In
addition, multimedia programs, which generally rely on complicated
combinations of hardware and software, quickly become obsolete. Nor is
there any good way to preserve interactive and experiential digital
creations. That is most obviously true of computer games and digital
art, but even a large number of ordinary web pages are generated out of
databases, which means that the specific page you view is your own
"creation" and the system can create an infinite number of pages.
Preserving hypertextually linked web pages poses the further problem
that to save a single page in its full complexity could ultimately
require you to preserve the entire web, because virtually every web page
is linked to every other. And the dynamic nature of databases
destabilizes mundane business and governmental records since they are
often embedded in systems that automatically replace old data with new–a
changeability that, notes archival educator Richard Cox, threatens "the
records of any modern day politician, civic leader, businessperson,
military officer, or leader."13

While these technical difficulties are immense, the social,
economic, legal, and organizational problems are worse. Digital
documents–precisely because they are in a new medium–have disrupted
long-evolved systems of trust and authenticity, ownership, and
preservation. Reestablishing those systems or inventing new ones is more
difficult than coming up with a long-lived storage mechanism.

How, for example, do we ensure the "authenticity" of preserved
digital information and "trust" in the repository? Paper documents and
records also face questions about authenticity, and forgeries are hardly
unknown in traditional archives. The science of "diplomatics," in fact,
emerged in the seventeenth century as a way to authenticate documents
when scholars confronted rampant forgeries in medieval documents. But
digital information–because it is so easily altered and copied, lacks
physical marks of its origins, and, indeed, even the clear notion of an
"original"–cannot be authenticated as physical documents and objects
can. We have, for example, no way of knowing that forwarded e-mail
messages we receive daily have not been altered. In fact, the public
archive of Usenet discussion groups contains hundreds of deliberately
and falsely attributed messages. "Fakery," write David Bearman and
Jennifer Trant, "has not been a major issue for most researchers in the
past, both because of the technical barriers to making plausible
forgeries, and because of the difficulty with which such fakes entered
an authoritative information stream."14 Digital media, tools, and networks have altered
the balance.

"It took centuries for users of print materials to develop the web
of trust that now undergirds our current system of publication,
dissemination, and preservation," notes Abby Smith, a leading figure in
library and preservation circles. Digital documents are disrupting that
carefully wrought system by undercutting our expectations of what
constitutes a trusted and authentic document and repository. But to make
the transition to a new system requires not just technical measures
(such as digital signatures and "watermarks") but, as Clifford Lynch,
the executive director of the Coalition for Networked Information,
observes, also figuring out responsibility for guaranteeing claims of
authorship and financing for a system of "authentication and integrity
management."15
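
The gap between technical measures and a system of trust can be seen in a minimal sketch of a fixity check, here assuming a SHA-256 digest recorded when a document is deposited. The message text, the address, and the function name fingerprint are all hypothetical. The check shows that a copy has changed since the digest was recorded; it cannot say who wrote the original, who recorded the digest, or who will keep recording it.

```python
import hashlib


def fingerprint(document: bytes) -> str:
    """Return the SHA-256 digest of a document as hexadecimal text."""
    return hashlib.sha256(document).hexdigest()


# Hypothetical message as it was first deposited in a repository.
original = b"From: a.historian@example.org\nThe meeting is on Tuesday.\n"
recorded_digest = fingerprint(original)  # stored alongside the deposit

# A forwarded copy in which one word has been quietly changed.
forwarded = original.replace(b"Tuesday", b"Thursday")

intact = fingerprint(original) == recorded_digest
alteration_detected = fingerprint(forwarded) != recorded_digest

print("original matches recorded digest:", intact)
print("alteration detected in forwarded copy:", alteration_detected)
```

The check is only as trustworthy as the party holding the recorded digest, which returns the problem to the institutional questions of responsibility and financing raised above.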

Such questions are particularly hard to answer since digitization
also undercuts our sense of who owns such materials and, thus, who has
the right and responsibility to preserve them. Consumers (including
libraries) have traditionally purchased books and magazines under the
"first sale" doctrine, which gives those who buy something the right to
make any use of it, including lending or selling it to others. But most
digital goods are licensed rather than sold. Because contract law
governs licenses, vendors of digital content can set any restrictions
they choose–they can say that the contents may not be copied or cannot
be viewed by more than one person at a time. Adobe's eBook reader even
includes a warning that a book may not be read aloud.16

But if libraries don't own digital content, how can they preserve
it? The problem will become even worse if publishers widely adopt copy
protection schemes as they are seriously considering doing for
electronic books. Even a library that had the legal right to preserve
the content would have no reason to assume that it would be able to do
so; meanwhile, the publisher would have little incentive to keep the
protection system functioning in a new software environment. In general,
digital rights management systems and other forms of "trusted computing"
undercut preservation efforts by embedding centralized control in
proprietary systems. "If Microsoft, or the U.S. government, does not
like what you said in a document you wrote," speculates Free Software
advocate Richard Stallman, "they could post new instructions telling all
computers to refuse to let anyone read that document."17

Licensed and centrally controlled digital content not only erodes
the ability of libraries to preserve the past, it also undercuts their
responsibility. Why should a library worry about the long-term
preservation of something it does not own? But then, who will?
Publishers have not traditionally assumed preservation responsibility
since there is no obvious profit to be made in ensuring that something
will be available or readable in a hundred years when it is in the
public domain and can't be sold or licensed.18

The digital era has not only unsettled questions of ownership and
preservation for traditional copyrighted material, it has also
introduced a new, vast category of what could be called semi-published
works, which lack a clear preservation path. The free content available
on the web is protected by copyright even though it has not been
formally registered with the Library of Congress Copyright Office or
sold by a publisher. That means that a library that decided to save a
collection of web pages–say, those posted by abortion rights
organizations–would technically be violating copyright.19 The absence of this
"process" is the most fundamental problem facing digital preservation.
Over centuries, a complex (and imperfect) system for preserving the past
has emerged. Digitization has unsettled that system of responsibility
for preservation, and an alternative system has not yet emerged. In the
meantime, cultural and historical objects are being permanently
lost.

Four different systems generally preserve cultural and historical
documents and objects. Research libraries take responsibility for books,
magazines, and other published cultural works, including moving images
and recorded sound. Government records fall under the jurisdiction of
the National Archives and a network of state and local archives.20 Systems for
maintaining other cultural and historical materials are less formal or
centralized. "Records" and "papers" from businesses, voluntary
associations, and individuals have found their way into local historical
societies, specialized archives, and university special collections.
Finally, the semi-published body of material we have called "ephemera"
has been most often saved by enthusiastic individuals–for example,
postcard and comic book collectors–who might later deposit their hoard
in a permanent repository.21

While research libraries have tried to save relatively complete sets
of published works, other historical sources have generally only been
preserved in a highly selective and sometimes capricious fashion–what
archivists call "preservation through neglect." Materials that lasted
fifty or one hundred years found their way into an archive, library, or
museum. Although this inexact system has resulted in many grievous
losses to the historical record, it has also given us many rich
collections of personal and organizational papers and ephemera.22

But this "system" will not work in the digital era because
preservation cannot begin twenty-five years after the fact. What might
happen, for example, to the records of a writer active in the 1980s who
dies in 2003 after a long illness? Her heirs will find a pile of
unreadable 5 1/4" floppy disks with copies of letters and poems written
in WordStar for the CP/M operating system or one of the more than fifty
now-forgotten word-processing programs used in the late 1980s.23

Government archives similarly continue to rely on the unwarranted
assumption that records can be appraised and accessioned many years
after their creation. A recent study, "Current Recordkeeping Practices
within the Federal Government," which surveyed more than forty federal
agencies, found widespread confusion about "policies and procedures for
managing, storing, and disposing of electronic records and systems."
"Government employees," it concluded, "do not know how to solve the
problem of electronic records–whether the electronic information they
create constitutes records and, if so, what to do with the records.
Electronic files that qualify as records–particularly in the form of
e-mail, and also word processing and spreadsheet documents–are not being
kept at all as records in many cases."24

This uncertainty and disarray would not be so serious if we could
assume that it could be simply sorted out in another thirty years. But
if we hope to preserve the present for the future, then the technical
problems facing digital preservation as well as the social and political
questions about authenticity, ownership, and preservation policy need to
be confronted now.

At least initially, archivists and librarians tended to assume that
a technical change–the rise of digital media–required a technical
solution. The simplest technical solution has been to translate digital
information into something more familiar and reassuring like paper or
microfilm. But, as Rothenberg points out, this is a "rear-guard action"
that destroys "unique functionality (such as dynamic interaction,
nonlinearity, and integration)" and "core digital attributes (perfect
copying, access, distribution, and so forth)" and sacrifices the
"original form, which may be of unique historical, contextual, or
evidential interest."25

Another backward-looking solution is to preserve the original
equipment. If you have files created on an Apple II, then why not keep
one in case you need it? Well, sooner or later, a disk drive breaks or a
chip fails, and unless you have a computer junkyard handy and a talent
for computer repair, you are out of luck. "Technological preservation,"
moreover, requires intervention before it is too late to save not just
the files but also the original equipment. The same can be said of what
is probably the most widely accepted current method of digital
preservation–"data migration," or moving the documents from a medium,
format, or computer technology that is becoming obsolete to one that is
becoming more common.26 When the National Archives saved the 1960 U.S.
Census tapes, they used migration, and large organizations use this
strategy all the time–moving from one accounting system to another.
Because we have lots of experience migrating data, we also know that it
is time consuming and expensive. One estimate is that data migration is
equivalent to photocopying all the books in a library every five
years.27

Some, like Rothenberg, also worry about the loss of
functionality in migrating digital files. Moreover, the process can't be
automated because "migration requires a unique new solution for each new
format or paradigm and each type of document that is to be converted
into that new form." Rothenberg is also derisive about the practice of
translating documents into standardized formats and then re-translating
as new formats emerge, which he finds "analogous to translating Homer
into modern English by way of every intervening language that has
existed during the past 2,500 years."28

Rothenberg's favored alternative is "emulation"–developing a system
that works on later generations of hardware and software but mimics the
original. In principle, a single emulation solution could preserve a
vast store of digital documents. In addition, it holds the greatest
promise for preserving interactive and multimedia digital creations. But
critics of emulation tellingly note that it is only a theoretical
solution. Probably the best strategy is to reject the all-or-nothing,
magic-bullet approaches implicit in the proposals of the most passionate
advocates of any particular strategy–whether creating hard copies,
preserving old equipment, migrating formats, or emulating hardware and
software. Margaret Hedstrom, one of the leading figures in digital
preservation research, argues persuasively that "the search for the Holy
Grail of digital archiving is premature, unrealistic, and possibly
counter-productive." Instead, we need to develop "solutions that are
appropriate, effective, affordable and acceptable to different classes
of digital objects that live in different technological and
organizational contexts."29

But even the most calibrated mix of technical solutions will not
save the past for the future because, as we have seen, the problems are
much more than technical and involve difficult social, political, and
organizational questions of authenticity, ownership, and responsibility.
Multiple experiments and practices are under way–more than can be
discussed here. But I want to focus on some widely discussed approaches
or experiments as illustrative of some of the possibilities and
continuing problems.

One of the earliest and most influential approaches to digital
preservation (and digital authenticity) was what archivists call the
"Pitt Project," a three-year (1993-1996) research effort funded by the
National Historical Publications and Records Commission (NHPRC) and centered at the University of Pittsburgh
School of Information and Library Studies. For historians, what is most
interesting (and sometimes puzzling) about the Pitt Project approach is
the way that it simultaneously narrows and broadens the role of archives
and archivists through its focus on "records as evidence" rather than
"information." "Records," David Bearman and Jennifer Trant explain, "are
that which was created in the conduct of business" and provide "evidence
of transactions." Data or information, by contrast, Bearman "dismisses
as non-archival and unworthy of the archivist's attention."30 From this point of
view, the government's record of your Social Security account is vital
but not the "information" contained in letters that you and others might
have written complaining about the idea of privatizing Social
Security.

The Pitt Project produced a pathbreaking set of "functional
requirements for evidence in electronic record keeping"–in effect,
strategies and tactics to ensure that electronic records produce legally
or organizationally acceptable evidence of their transactions. Such a
focus responds particularly well to worries about the "authenticity" of
electronic records. But for historians (and for some archivists), the
focus on records as evidence rather than records as sources of
information, history, or memory seems disappointingly narrow. Moreover,
as Canadian archivist Terry Cook points out, the emphasis on
"redesigning computer systems' functional requirements to preserve the
integrity and reliability of records" and assigning "long-term custodial
control . . . to the creator of archival records" privileges "the
powerful, relatively stable, and continuing creators of records capable
of such reengineering" and ignores artists, activists, and "marginalized
and weaker members of society" who have neither the resources nor
inclination to produce "business acceptable communications."31

While the Pitt Project emphasizes archival professionalism, a
narrowing of the definition of recordkeeping, a rejection of the
custodial tradition in archives, and planning for more careful
collecting in the future rather than action in the present, the Internet
Archive has taken precisely the opposite approach. It represents a
grass-roots, immediate, enthusiast response to the crisis of digital
preservation that both expands and further centralizes archival
responsibility in ways that were previously unimaginable. Starting in
September 1996, Brewster Kahle and a small staff sent "crawlers" out to
capture the web by moving link-by-link and completing a full snapshot
every two months. Although in part a philanthropic venture funded by
Kahle, the Internet Archive also has a commercial side. Kahle's
for-profit web navigation service, Alexa Internet (bought by Amazon in
1999 for $300 million), is what actually gathers the web snapshots,
which it uses to analyze patterns of web use, and then donates them to
the Internet Archive.32
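
The link-by-link method can be made concrete with a small sketch. The Python below is only an illustration, not the actual crawler used by Alexa or the Internet Archive: the `crawl` function, the `LinkExtractor` class, and the four-page `TOY_WEB` are hypothetical names invented here, and the "web" is an in-memory dictionary rather than live sites. It shows a breadth-first traversal that captures every page reachable by links from a starting point and nothing else.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, start):
    """Breadth-first crawl of `pages` (a dict of URL -> HTML) from `start`.

    Returns the URLs captured, in the order they were visited.
    """
    seen = {start}
    queue = deque([start])
    snapshot = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # dead link: nothing to capture
        snapshot.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return snapshot


# Hypothetical toy web; "/orphan" has no inbound links.
TOY_WEB = {
    "/home": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/home">Home</a>',
    "/news": '<a href="/missing">Gone</a>',
    "/orphan": "<p>No page links here.</p>",
}

captured = crawl(TOY_WEB, "/home")
```

Here `captured` holds `/home`, `/about`, and `/news`. The dead link `/missing` yields nothing, and `/orphan`, which no page links to, is never visited: a crawler of this kind sees only what links lead it to.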

By February 2002, the Internet Archive (IA) had gathered a
monumental collection of more than 100 terabytes of web data–about 10
billion web pages or five times all the books in the Library of
Congress–and was gobbling up 12 terabytes more each month. The previous
fall, it began offering public access to most of the collection through
what Kahle called the "Wayback Machine"–a wry reference to the device
used by the time-traveling Mr. Peabody in the Rocky and Bullwinkle
cartoons of the 1960s. Astonishingly, a single individual with a very
small staff has created the world's largest database and library in just
five years.33

In December 2001, shortly after the Wayback Machine became public,
the search engine company Google unveiled "Google Groups," another
massive digital archive–this one under purely commercial auspices.
Google Groups provides access to more than 650 million messages posted
over the past two decades to "Usenet," the online discussion forums that
predate even the Internet. Although "ownership" seems like a dubious
concept in relation to a public discussion forum, Google purchased the
archive from Deja.com, which had brought the groups to the web but then
collapsed in the Internet bust. Despite Deja.com's failure, Google sees
the Usenet Archive as another attractive feature in its stable of online
information resources and tools.34

Both IA and Google Groups are libraries organized on principles that
are more familiar to computer scientists than to librarians, as Peter
Lyman, who knows both worlds as the head of the University of California
at Berkeley library and as a member of the IA board, points out. The
library community has focused on developing "sophisticated cataloging
strategies." But computer scientists, including Kahle, have been more
interested in developing sophisticated search engines that operate
directly on the data we see (the web pages) rather than on the metadata
(the cataloging information). Whereas archival and library projects
focus on "high-quality collections built around select themes" and make
the unit of cataloging the web page, the computer science paradigm
"allows for archiving the entire Web as it changes over time, then uses
search engines to retrieve the necessary information."35

Projects designed by librarians and archivists generally have the
advantages of precision and standardization. They favor careful
protocols and standards such as the Dublin Core, the OAIS (Open Archival
Information System), and the EAD (Encoded Archival Description). But the
expense and difficulty of the protocols and procedures mean that less
well funded and staffed archives and libraries often ignore them.
Responding to presentations by advocates of standards at a conference,
computer scientist Jim Miller warned that if archivists push for too
much cataloging metadata "they might end up with none."36

The Internet Archive, which is the child of the search engines and
the computer scientists, is an extraordinarily valuable resource. Most
historians will not be interested now, but in twenty-five or fifty years
they will delight in searching it. A typical college history assignment
in 2050 might be to compare web depictions of Muslim Americans in 1998
and 2008. But any appreciation of the IA must acknowledge its
limitations. For example, large numbers of web pages do not exist as
"static" HTML pages; rather, they are stored in databases, and the pages
are generated "on the fly" by search queries. As a result, the IA's
crawlers do not capture much of the so-called "deep web" that is stored
in databases. Multimedia files–streaming media and Flash–also do not
seem to be captured. In addition, the Internet Archive's crawls cannot
go on forever; at some point, they stop, since, as one of the computer
scientists who manages them acknowledges, "the Web is essentially
infinite in size." Anyone who browses the IA regularly encounters such
messages as "Not in Archive" and "File Location Error" or even "closed
for maintenance."37

Some pages are missing for legal and economic as well as technical
reasons. Private, gated sites are off-limits to the Internet Archive's
crawlers. And many ungated sites also discourage the crawlers. The New
York Times allows free access to its current contents, but charges for
articles more than one week old. If the IA gathered up and preserved the
Times's content, there would be no reason for anyone to pay the Times
for access to its proprietary archive. As a result, the Times includes a
"robots exclusion" file on its site, which the IA respects. Even those
sites without the robots exclusion file and without any formal copyright
are still covered by copyright law and could challenge the IA's
archiving of their content. To avoid trouble, the IA simply purges the
pages of anyone who complains. It is as if Julie Nixon could write to
the National Archives and tell them to delete her father's tapes or an
author could withdraw an early novel from circulation.38
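
How a robots exclusion file works can be sketched with the parser in Python's standard library (`urllib.robotparser`). The file contents, the crawler name `example-archiver`, and the `news.example` addresses are hypothetical; nothing here reproduces the Times's actual file or the IA's software. The sketch assumes only that a well-behaved crawler asks permission before each capture.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion file: shuts out one named crawler entirely
# and keeps every other crawler out of /archive/ only.
ROBOTS_TXT = """\
User-agent: example-archiver
Disallow: /

User-agent: *
Disallow: /archive/
"""


def may_capture(robots_txt, user_agent, url):
    """Return True if the exclusion rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named archiving crawler is barred from the whole site.
archiver_allowed = may_capture(
    ROBOTS_TXT, "example-archiver", "https://news.example/today.html"
)
# Any other crawler may read current pages but not the paid archive.
other_current = may_capture(
    ROBOTS_TXT, "other-bot", "https://news.example/today.html"
)
other_archive = may_capture(
    ROBOTS_TXT, "other-bot", "https://news.example/archive/1999.html"
)
```

With this file, `archiver_allowed` and `other_archive` are false while `other_current` is true. Compliance is voluntary: the file is a request that the crawler chooses to honor.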

Thus the Internet Archive is very far from the complete solution to
the problem of digital preservation. It does not deal with the digital
records that vex the National Archives and other repositories because
they lack the public accessibility and minimal standardization in HTML
of web pages. Nor does it include much formally published
literature–e-books and journals–which is sold and hence gated from view.
And even for what it has gathered, it has not yet hatched a long-term
preservation plan, which would have to incorporate a strategy for
continuing access to digital data that are in particular (and
time-bound) formats. Even more troubling, it has no plan for how it will
sustain itself into the future. Will Kahle continue to fund it
indefinitely?39 What if Amazon and Alexa no longer find it worthwhile to
gather the data, especially since acquisition costs are doubling every
year?

Similar questions could be raised about "Google Groups." What if the
company decides that there is no prospect of gaining adequate
advertising revenue by making old newsgroup messages available (as,
indeed, Deja.com previously determined)? While appreciating Google's
entrepreneurial energy in preserving and making available an enormous
body of historical documents, we should also look carefully at the way
private corporations have suddenly entered into a realm–archives–that
was previously part of the public sector–a reflection of the
privatization sweeping across the global economy. At least so far, our
most important, and most imaginatively constructed, digital collections
are in private hands.40

Given that the preservation of cultural heritage and national
history are arguably social goods, why shouldn't the government take the
lead in such efforts? One reason is that at least some key aspects of
the digital present–the Bert story, for example–do not follow national
boundaries and, indeed, erode them. If national archives were part of
the projects of state-building and nationalism, then why should states
support post-national digital archives? The declining significance of
state-based national archives may mirror the decline of the contemporary
national state. So far, the Smithsonian Institution and the Library of
Congress have worked with the Internet Archive only where they needed
its help in documenting some particularly national stories–the elections
of 1996 and 2000 and the September 11th attacks.

Another reason for the limited government role is that the digital
preservation crisis emerged most dramatically during the anti-statist
Reagan revolution of the 1980s. In the 1970s, for example, the
electronic records program of the National Archives made a modest,
promising start. But, as archivist Thomas E. Brown writes, it went into
"a near total collapse in the 1980s." The staff dropped to seven people
by 1983, and, amazingly, this beleaguered group charged with guarding
the nation's electronic records had no access to computer facilities.
Things began to improve in the early 1990s, but, after 1993, the
electronic records program suffered from further cutbacks in the federal
work force. An underfunded and understaffed National Archives was hardly
in a position to develop a solution to the daunting and mounting problem
of electronic federal records.41

The Library of Congress also initially eschewed a leading role in
preserving digital materials, as the National Research Council later
complained. Here, too, one could detect the weakening influence of the
state. The library's high-profile effort in the digital realm was
"American Memory," which digitized millions of items from its
collections and placed them online. Teachers, students, and researchers
love American Memory, but it did nothing to preserve the growing number
of "born digital" objects. Not coincidentally, American Memory was a
project that could attract large numbers of private and corporate
donors, who often saw sponsorship as good advertising and who paid for
three-quarters of the project.42

Better developed state-centered approaches to digital preservation
have, not surprisingly, emerged outside the United States–in Australia
and Scandinavia, for example. Norway requires that digital materials be
legally deposited with the national library in return for copyright
protection.43 One of the key ways that the Library of Congress could help
preserve the future of digital materials would be to aggressively assert
its copyright deposit claims, which would finesse some of the legal and
ownership issues troubling the Internet Archive.44

Nevertheless, the National Archives and the Library of Congress have
very recently begun–prodded by outside critics and supported belatedly
by Congress–to take a more aggressive approach on digital preservation.
The archives is proposing a "Redesign of Federal Records Management" to
respond to the reality that "a large majority of electronic record
series of continuing value are not coming into archival custody." It is
also working closely with the San Diego Supercomputer Center on
developing "persistent object preservation" (POP), which creates a
description of a digital object (and groups of digital objects) in
simple tags and schemas that will be understandable in the future; the
records would be "self-describing" and, hence, independent of specific
hardware and software. The computer scientists maintain that records in
this format will last for three hundred to four hundred years.45
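
What "self-describing" might mean in practice can be suggested by a brief sketch. This is not the POP format itself, whose tags and schemas are not reproduced here; the `describe` and `read_back` functions and the memo's field names are assumptions made for illustration. The idea shown is only that a record wrapped in plain, labeled text can be read back by any generic XML parser, without the program that created it.

```python
import xml.etree.ElementTree as ET


def describe(record):
    """Wrap a record's fields in plain, human-readable XML tags."""
    root = ET.Element("record")
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_back(xml_text):
    """Recover the fields using nothing but a generic XML parser."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Hypothetical memo; the field names are illustrative, not a real schema.
memo = {
    "creator": "Office of Records",
    "date": "1999-04-12",
    "format": "plain text",
    "body": "Budget request approved.",
}

xml_text = describe(memo)
recovered = read_back(xml_text)
```

The round trip returns the original fields, and the intermediate `xml_text` is legible even without a parser, since each value sits between tags that name it.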

In December 2000, the Library of Congress launched the most
important initiative, the National Digital Information Infrastructure
Program (NDIIP). Even this massive and important federal initiative bore
the marks of the anti-statist, privatization politics of the 1980s.
Congress gave the library $5 million for planning and promised another
$20 million when it approved the plan. But the final $75 million will
only be distributed as a match against an equal amount in private
funds.46

Although the future of the digital present remains perilous, these
recent initiatives suggest some encouraging strategies for preserving
the range of digital materials. A combination of technical and
organizational approaches promises the greatest chance of success, but
privatization poses grave dangers for the future of the past. Advocates
of digital preservation need to mobilize state funding and state power
(such as the assertion of eminent domain over copyright materials) but
infuse it with the experimental and ad hoc spirit of the Internet
Archive. And we need to recognize that, for many digital materials
(especially the web), the imperfect computer-science paradigm probably
has more to recommend it than the more careful and systematic approach
of the librarians and archivists. What is often said of military
strategy seems to apply to digital preservation: "the greatest enemy of
a good plan is the dream of a perfect plan."47 We have never preserved everything;
we need to start preserving something.

Given the enormous barriers to saving digital records and
information, it comes as something of a surprise that many continue to
insist that a perfect plan–or at least a pretty good plan–will
eventually emerge. Techno-optimists such as Brewster Kahle dream most
vividly of the perfect plan and its startling consequences. "For the
second time in history," Kahle writes with two collaborators, "people
are laying plans to collect all information–the first time involved the
Greeks which culminated in the Library of Alexandria . . . Now . . .
many [are] once again to take steps in building libraries that hold
complete collections." Digital technology, they explain, has "gotten to
the point where scanning all books, digitizing all audio recordings,
downloading all websites, and recording the output of all TV and radio
stations is not only feasible but less costly than buying and storing
the physical versions."48 Librarians and archivists remain skeptical of
such predictions, pointing out the enormous costs of cataloging and
making available what has been preserved, and that we have never saved
more than a fraction of our cultural output. But, whatever our degree of
skepticism, it is still worth thinking seriously about what a world in
which everything was saved might look like.

Most obviously, archives, libraries, and other record repositories
would suddenly be freed from the tyranny of shelf space that has always
shadowed their work. Digitization also removes other long-term scourges
of historical memory such as fire and war. The 1921 fire that destroyed
the 1890 census records provided a crucial spark that finally led to the
creation of the National Archives. But what if there had been multiple
copies of the census? The ease–almost inevitability–of the copying of
digital files means that it is considerably less likely today that
things exist in only a single copy.49

What would a new, virtual, and universal Alexandria library look
like? Kahle and his colleagues have forcefully articulated an expansive
democratic vision of a past that includes all voices and is open to all.
"There are about ten to fifteen million people's voices evident on the
Web," he told a reporter. "The Net is a people's medium: the good, the
bad and the ugly. The interesting, the picayune and the profane. It's
all there." Advocates of the new universal library and archive wax even
more eloquent about democratizing access to the historical record.
"The opportunity of our time is to offer universal access to all of
human knowledge," said Kahle.50

Kahle's vision of cultural and historical abundance merges the
traditional democratic vision of the public library with the resources
of the research library and the national archive. Previously, few had
the opportunity to come to Washington to watch early Thomas Edison films
at the Library of Congress. And the library could not have served them
if they had. Democratized access is the real payoff in electronic
records and materials. It may be harder to preserve and organize digital
materials than it is paper records, but, once that is accomplished, they
can be made accessible to vastly greater numbers of people. To open up
the archives and libraries in this way democratizes historical work.
Already, people who had never had direct access to archives and
libraries can now enter. High school students are suddenly doing primary
source research; genealogy has exploded in popularity because you no
longer have to travel to distant archives.

This vision of democratic access also promises direct and unmediated
access to the past. Electronic commerce enthusiasts tout
"disintermediation"–the elimination of the insurance and real estate
broker and other intermediaries–and the emergence of a marketplace like
eBay made up of only buyers and sellers. In theory, the universal
digital library might bring a similar cultural disintermediation in
which people interested in history make direct contact with the
documents and artifacts of the past without the mediation of cultural
brokers like librarians, archivists, and historians. Sociologist Mike
Featherstone speculates on the emergence of a "new culture of memory" in
which the existing "hierarchical controls" over access would disappear.
This "direct access to cultural records and resources from those outside
cultural institutions" could "lead to a decline in intellectual and
academic power" in which the historian, for example, no longer stands
between people and their pasts.51 The "Wayback Machine" encapsulates this
vision of disintermediation by suggesting that everyone, like Mr. Peabody
and his boy Sherman, can jump in a time machine and find out what
Columbus or Edison was "really" like. Of course, most historians would
argue that, while digital collections may put "the novice in the
archive,"52 he or she is not so likely to know what to do there. Still,
the balance of power
may shift. Ask any travel agent how the widespread access to information
undercuts professional control.

Most historians have not embraced this vision in which everyone
becomes his or her own historian. Nor have they enthusiastically
endorsed the vision of a universal library that contains all voices and
all records. In my informal polling, most historians recoil at the
thought that they would need to write history with even more sources.53
Historians are not particularly hostile to new technology, but they are
not ready
to welcome fundamental changes to their cultural position or their modes
of work. Having lived our professional careers in a culture of scarcity,
historians find that a world of abundance can be unsettling.

Abundance, after all, can be overwhelming. How do we find the forest
when there are so many damned trees? Psychologist Aleksandr Luria made
this point in his famous study of a Russian journalist, "S" (S. V.
Shereshevskii), who had an amazingly photographic memory; he could
reproduce complex tables of numbers and long lists of words that had
been shown to him years earlier. But this "gift" turned out to be a
curse. He could not recognize people because he remembered their faces
so precisely; a slightly different expression would register as a
different person. Grasping the larger point of a passage or abstract
idea "became a tortuous ... struggle against images that kept rising to
the surface in his mind." He lacked, as psychologist Jerome Bruner
notes, "the capacity to convert encounters with the particular into
instances of the general."54

If historians are to set themselves "against forgetting" (in Milan
Kundera's resonant phrase), then they may need to figure out new ways to
sort their way through the potentially overwhelming digital record of
the past. Contemporary historians are already groaning under the weight
of their sources. Robert Caro has spent twenty-six years working his way
through just the documents on Lyndon B. Johnson's pre-vice-presidential
years–including 2,082 boxes of Senate papers. Surely, the injunction of
traditional historians to look at "everything" cannot survive in a
digital era in which "everything" has survived.55

The historical narratives that future historians write may not
actually look much different from those that are crafted today, but the
methodologies they use may need to change radically. If we have, for
example, a complete record of everything said in 2010, can we offer
generalizations about the nature of discourse on a topic simply by
"reading around"? Wouldn't we need to engage in some more methodical
sampling in the manner of, say, sociology? Would this revive the
social-scientific approaches with which historians flirted briefly in
the 1970s? Wouldn't historians need to learn to write complex searches
and algorithms that would allow them to sort through this overwhelming
record in creative, but systematic, ways? The future gurus of historical
research methodology may be the computer scientists at Google who have
figured out how to search the equivalent of a 100-mile-high pile of
paper in half a second. "To be able to find things with high accuracy
and high reliability has an incredible impact on the world"–and, one
might add, future historians. Future graduate programs will probably
have to teach such social-scientific and quantitative methods as well as
such other skills as "digital archaeology" (the ability to "read" arcane
computer formats), "digital diplomatics" (the modern version of the old
science of authenticating documents), and data mining (the ability to
find the historical needle in the digital hay).56 In the coming years, "contemporary
historians" may need more specialized research and "language" skills
than medievalists do.
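
The kind of methodical sampling raised here can be illustrated in a few lines. The corpus, the `estimate_share` function, and the search term are all hypothetical, and a real study would need to consider how the sample is drawn and how large it must be. The sketch simply estimates how often a term appears in a large body of messages by reading a random five hundred of them rather than all ten thousand.

```python
import random


def estimate_share(documents, matches, sample_size, seed=0):
    """Estimate the share of documents satisfying `matches` from a random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(documents, sample_size)
    hits = sum(1 for doc in sample if matches(doc))
    return hits / sample_size


# Hypothetical corpus: 10,000 short messages, exactly 30% mentioning "archive".
corpus = ["note about the archive"] * 3000 + ["note about the weather"] * 7000

estimate = estimate_share(corpus, lambda doc: "archive" in doc, sample_size=500)
true_share = 3000 / len(corpus)
```

The estimate will not match the true share exactly, but it should fall close to it, and the gap narrows as the sample grows. That trade between effort and precision is what "reading around" cannot quantify.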

Historians have time to think about changing their methods to meet
the challenge of a cornucopia of historical sources. But they need to
act more immediately on preserving the digital present or that
reconsideration will be moot; they will be struggling with a scarcity,
not an overabundance, of sources. Surprisingly, however, historians
themselves have been scarce on this issue.57 Archivists and librarians have
intensely debated and discussed digitization and digital preservation
for more than a decade. They have written hundreds of articles and
reports, undertaken research projects, and organized conferences and
workshops. Academic and teaching historians have taken almost no part in
these conferences and have contributed almost nothing to this burgeoning
literature. Historical journals have published nothing on the topic.58

Part of the reason is that preserving the born-digital materials for
future historians seems like a theoretical and technical issue,
tomorrow's problem or at least someone else's problem. Another reason
for this disinterest is the divorce of archival concerns from the
historical profession–a part of the general narrowing of the concerns of
professional historians over the past century. In the late nineteenth
and early twentieth centuries, historians and archivists were closely
aligned. Perhaps the most important committee of the American Historical
Association in the 1890s was the Historical Manuscripts Commission,
which led to the AHA's influential Public Archives Commission. Archival
concerns found a regular place in the AHA's Annual Meeting, the American
Historical Review, and especially the voluminous AHA annual reports.
Most important, the AHA led the fight to establish the National
Archives. But in 1936 (in the midst of an earlier technological upheaval
that came with the emergence of microfilm), the Conference of Archivists
left the AHA to create the Society of American Archivists. The
professions charged with writing about the past and preserving the
records of the past have sharply diverged in the past seven decades.
Today, only 82 of the 14,000 members of the AHA identify themselves as
archivists.59

But historians ignore the future of digital data at their own peril.
What, for example, about the long-term preservation of scholarship that
is–increasingly–originating in digital form? Not only do historians need
to ensure the future of their own scholarship, but linking directly from
footnotes to electronic texts–an exciting prospect for scholars–will
only be possible if a stable archiving system emerges.60 For the foreseeable
future, librarians and archivists will be making decisions about
priorities in digital preservation. Historians should be at the table
when those decisions are made. Do they wish to endorse, for example, the
Pitt Project's emphasis on preserving records of business transactions
rather than "information" more broadly?

One of the most vexing and interesting features of the digital era is
the way that it unsettles traditional arrangements and forces us to ask
basic questions that have been there all along. Some are about the
relationship between historians and archival work. Should the work of
collecting, organizing, editing, and preserving of primary sources
receive the same kind of recognition and respect that it did in earlier
days of the profession? Others are about whose overall responsibility it
is to preserve the past. For example, should the National Archives
expand its role in preservation beyond official records? For many years,
historians have taken a hands-off approach to archival questions. With
the unsettling of the status quo, they should move back more actively
into this realm. If the web page is the unit of analysis for the digital
librarian and the link the unit of analysis for the computer scientists,
what is the appropriate unit of analysis for historians? What would a
digital archival system designed by historians look like? And how might
we alter and enhance our methodologies in a digital realm? For example,
in a world where all sources were digitized and universally accessible,
arguments could be more rigorously tested. Currently, many arguments
lack such scrutiny because so few scholars have access to the original
sources–a problem that has arisen especially sharply in the recent
controversies over Michael A. Bellesiles' Arming America: The Origins of
a National Gun Culture (2000). In a new digital world, would historians
then be held to the same standard of "reproducible" results as
scientists?61

Of course, when historians get to the preservation table, they will
discover a cultural and professional clash between their own impulses,
which are to save everything, and those of librarians and archivists who
believe that selection, whether passive or active, is inevitable. The
National Archives, for example, only permanently accessions 2 percent of
government records.62 This conflict surfaced in the 1980s and 1990s,
when librarians tried to bring in scholars to discuss priorities in
preserving books that were deteriorating because of acidic paper.
Librarians found the discussion "frustrating." "Many scholars," recalls
Deanna Marcum, declared "that everything had to be saved and they could
not make choices." Not surprisingly, scholars have responded very
differently to Nicholson Baker's sharp attack on the microfilming and
disposal of aging books and newspapers in Double Fold than have
archivists and librarians. Whereas many scholars have shared Baker's
outrage that books and newspapers have been destroyed, archivists and
librarians have responded in outrage to what they see as his failure to
understand the pressures that make it impossible to save everything.
Whereas historians with their gaze fixed on the past worry about
information scarcity (the missing letter or diary), archivists and
librarians recognize that we now live in a world of overwhelming
information abundance.63 If historians are going to join in preservation
discussions, they will have to make themselves better informed about the
simultaneous abundance of historical sources and scarcity of financial
resources that lead archivists and librarians to respond with
exasperation to scholars' blithe insistence that everything must be
saved.

Preservation of the past is, in the end, often a matter of allocating
adequate resources. Perhaps the largest problem facing the preservation
of electronic government records has nothing to do with technology; it
is, as various reports have noted, "the low priority traditionally given
to federal records management." In the absence of new resources, the
costs of preservation will come from the money that our society, in the
aggregate, allocates for history and culture. Richard Cox, for example,
has argued that a greater portion of the budget of the National
Historical Publications and Records Commission (NHPRC) should go to
electronic records preservation and management and correspondingly less
money should go to the letterpress Documentary Editions that the
commission also funds, since "most of the records represented by the
documentary editions are not immediately threatened." This stance does
not endear him to documentary editors, who are much better represented
among professional historians than are archivists.64

The alternative to squabbling over inadequate resources that are
appropriated for these purposes is joint action to secure further funds.
When Shirley Baker, president of the Association of Research Libraries,
challenged historian Robert Darnton's favorable review of Nicholson Baker's book
and noted "choices have always had to be made" in the absence of
"greater public commitment to the preservation of the historical
record," Darnton responded by urging the establishment of "a new kind of
national library dedicated to the preservation of cultural artifacts"
(including disappearing digital records) and funded by income generated
by the sale or rental of bandwidth.65 Such state-based solutions return us to the kind
of alliance between historians and archivists that led to the building of
the National Archives in the 1930s, an era of growing rather than waning
confidence in the nation-state. Historians need to join in lobbying
actively for adequate funding for both current historical work and
preservation of future resources. They should also argue forcefully for
the democratized access to the historical record that digital media make
possible. And they must add their voices to those calling for expanding
copyright deposit–and opposing copyright extension, for that matter–of
digital materials so as to remove some of the legal clouds hanging over
efforts like the Internet Archive and to halt the ongoing privatization
of historical resources. Even in the absence of state action, historians
should take steps individually and within their professional
organizations to embrace the culture of abundance made possible by
digital media and expand the public space of scholarship–for example,
making their own work available for free on the web, cross-referencing
other digital scholarship, and perhaps depositing their sources online
for other scholars to use. A vigorous public domain today is a
prerequisite for a healthy historical record.66

More than a century ago, Justin Winsor, the third president of the
AHA, concluded his Presidential Address–focused on a topic that would be
considered odd today, that of preserving manuscript sources for the
study of history–with a plea to the AHA "to convince the National
Legislature" to support a scheme "before it is too late" to preserve and
make known "what there is still left to us of the historical manuscripts
of the country." For founders of the historical profession such as
Winsor, the need to engage with history broadly defined–not just how it
was researched but also how it was taught in the schools or preserved in
archives–came naturally; it was part of creating a historical
profession.67 In the early twenty-first century, we are likely to be
faced with
recreating the historical profession, and we will be well served by such
a broad vision of our mission. If the past is to have an abundant
future, if the story of Bert Is Evil and hundreds of other stories are
to be fully told, then historians need to act in the present.

About the Author

This article has benefited greatly from the generous and astute
comments of a number of friends and colleagues: Joshua Brown, Michael
Grossberg, Deborah Kaplan, Gary Kornblith, Michael O'Malley, Kelly
Schrum, Abby Smith, James Sparrow, Robert Townsend, and four anonymous
readers for the American Historical Review. My thanks also to Laurel
Thatcher Ulrich and Pat Denault of the Charles Warren Center at Harvard
University for providing the congenial setting in which most of this was
written.

Roy Rosenzweig is College of Arts and Sciences Distinguished Professor
of History and director of the Center for History and New Media (
http://chnm.gmu.edu ) at George Mason University. His books include
The Presence of the Past: Popular Uses of History in American Life
(1998), co-authored with David Thelen; The Park and the People: A
History of Central Park (1992), co-authored with Elizabeth Blackmar; and
Eight Hours for What We Will: Workers and Leisure in an Industrial City,
1870-1920 (1983). He is working on a book examining how new media and
technology have changed–and might change–historical research and
scholarship, teaching, museums, and archives, as well as popular history
making.

4. Arcot Rajasekar, Richard Marciano, and Reagan Moore, "Collection-Based
Persistent Archives," http://www.sdsc.edu/NARA/Publications/OTHER/Persistent/Persistent.html;
U.S. Congress, House Committee on Government Operations,
Taking a Byte out of History: The Archival Preservation of Federal
Computer Records, HR 101-987 (Washington, D.C., 1990); National Academy
of Public Administration, The Effects of Electronic Recordkeeping on
the Historical Record of the U.S. Government (Washington, D.C.,
1989), 8, 29; Joel Achenbach, "The Too-Much-Information Age," Washington
Post (March 12, 1999): A01; General Accounting Office (hereafter, GAO), Information
Management: Challenges in Managing and Preserving Electronic
Records (Washington, D.C., 2002), 11, 66. See also Alexander
Stille, The Future of the Past (New York, 2002), 306; Richard
Harvey Brown and Beth Davis-Brown, "The Making of Memory: The Politics
of Archives, Libraries, and Museums in the Construction of National
Consciousness," History of the Human Sciences 11, no. 4 (1998):
17-32; Deanna Marcum, "Washington Post Publishes Letter from Deanna
Marcum," CLIR Issues, no. 2 (March/April 1998),
http://www.clir.org/pubs/issues/issues02.html#post.

5. John Higham, History: Professional Scholarship in America (1965; rpt. edn.,
Baltimore, 1983), 16-20. See also American Historical Association
Committee on Graduate Education, The Education of Historians in the
21st Century (Urbana, Ill., forthcoming 2004). To observe this
broader vision is not to deny the very different historical
circumstances (such as the disorganization of archives), the obvious
blindness of the early professional historians on many matters (such as
race and gender), and the early tensions between "amateurs" and
professionals.

6. For interesting observations on "abundance" in two different realms of
historical work, see James O'Toole, "Do Not Fold, Spindle, or Mutilate:
Double Fold and the Assault on Libraries," American Archivist
64 (Fall/Winter 2001): 385-93; John McClymer, "Inquiry and Archive in a
U.S. Women's History Course," Works and Days 16, nos. 1-2
(Spring/Fall 1998): 223. For a sweeping statement about political and
cultural implications of "digital information that moves frictionlessly
through the network and has zero marginal cost per copy," see Eben
Moglen, "Anarchism Triumphant: Free Software and the Death of
Copyright," First Monday 4, no. 8 (August 1999),
http://www.firstmonday.dk/issues/issue4_8/moglen/index.html.

7. Committee on the Records of Government, Report (Washington, D.C., 1985), 9 (the
committee was created by the American Council of Learned Societies, the
Council on Library Resources, and the Social Science Research Council
with funding from the Mellon, Rockefeller, and Sloan foundations); John
Garrett and Donald Waters, Preserving Digital Information: Report of
the Task Force on Archiving of Digital Information (Washington,
D.C., 1996); Paul Conway, Preservation in the Digital World
(Washington, D.C., 1996), http://www.clir.org/pubs/reports/conway2/index.html.
For other reports with
similar conclusions, see, for example, the 1989 report of the National
Association of Government Archives and Records Administrators, cited in
Margaret Hedstrom, "Understanding Electronic Incunabula: A Framework for
Research on Electronic Records," American Archivist 54 (Summer
1991): 334-54; House Committee on Government Operations, Taking a
Byte out of History; Committee on an Information Technology Strategy for
the Library of Congress, Computer Science and Telecommunications Board,
Commission on Physical Sciences, Mathematics, and Applications, and the
National Research Council, LC21: A Digital Strategy for the Library of
Congress (Washington, D.C., 2000), http://books.nap.edu/html/lc21/index.html;
GAO, Information Management; NHPRC Electronic
Records Agenda Final Report (Draft) (St. Paul, Minn., 2002).

8. Margaret MacLean and Ben H. Davis, eds., Time and Bits: Managing Digital
Continuity (Los Angeles, 1998), 11, 6; Jeff Rothenberg, Avoiding
Technological Quicksand: Finding a Viable Technical Foundation for
Digital Preservation (Washington, D.C., 1998),
http://www.clir.org/pubs/reports/rothenberg/contents.html. The 1997
conference "Documenting the Digital Age" has also disappeared from the
web, nor is it available in the Internet Archive. The Sanders film is
available from the Council on Library and Information Resources,
http://www.clir.org/pubs/film/future/order.html.

9. Achenbach, "Too-Much-Information Age." See also Stille, Future of the Past;
Council on Library and Information Resources (hereafter, CLIR),
The Evidence in Hand: Report of the Task Force on the Artifact in
Library Collections (Washington, D.C., 2001),
http://www.clir.org/pubs/reports/pub103/contents.html.

10. Margaret O. Adams and Thomas E. Brown, "Myths and Realities about the 1960
Census," Prologue: Quarterly of the National Archives and Records
Administration 32, no. 4 (Winter 2000),
http://www.archives.gov/publications/prologue/winter_2000_1960_census.html.
See also letter of August 15,
1990, from Kenneth Thibodeau, which says that recovering the records
took "substantial efforts" by the Bureau of the Census, quoted in House
Committee on Government Operations, Taking a Byte out of History, 3.
According to Timothy Lenoir, it is now too expensive to rescue the
computer tapes that represent Douglas Engelbart's pioneering
hypermedia-groupware system called NLS (for oNLine System)–the basis of
many of the features of personal computers. Timothy Lenoir, "Lost in the
Digital Dark Ages" (paper delivered at "The New Web of History: Crafting
History of Science Online," Cambridge, Mass., March 28, 2003).

11. Marcia Stepanek, "From Digits to Dust," Business Week (April 20, 1998); House
Committee on Government Operations, Taking a Byte out of History, 16;
Jeff Rothenberg, "Ensuring the Longevity of Digital Documents,"
Scientific American (January 1995): 42-47. See also Garrett and
Waters, Preserving Digital Information. Many Vietnam records are stored
in a database system that is no longer supported and can only be
translated with difficulty. As a result, the Agent Orange Task Force
could not use important herbicide records. Stille, Future of the
Past, 305.

12. Most Microsoft software moves into what the company calls the "non-supported
phase" after just four or five years, although it offers a more limited
"extended support phase" that lasts up to seven years. After that, you
are out of luck. Microsoft, "Windows Desktop Product Life Cycle Support
and Availability Policies for Businesses," October 15, 2002,
http://www.microsoft.com/windows/lifecycle.mspx; Lori Moore, "Q&A: Microsoft
Standardizes Support Lifecycle," Press Pass: Information for Journalists
(October 15, 2002), http://www.microsoft.com/presspass/features/2002/Oct02/10-15support.asp.
On media longevity, see Rothenberg, Avoiding
Technological Quicksand; MacLean and Davis, Time and Bits; Margaret
Hedstrom, "Digital Preservation: A Time Bomb for Digital Libraries"
(paper delivered at the NSF Workshop on Data Archiving and Information
Preservation, March 26-27, 1999), http://www.uky.edu/~kiernan/DL/hedstrom.html;
Frederick J. Stielow, "Archival Theory
and the Preservation of Electronic Media: Opportunities and Standards
below the Cutting Edge," American Archivist 55 (Spring 1992):
332-43; Charles M. Dollar, Archival Theory and Information
Technology: The Impact of Information Technologies on Archival
Principles and Methods (Ancona, Italy, 1992), 27-32; GAO, Information
Management, 50-52.

14. Margaret Hedstrom, "How Do We Make Electronic Archives Usable and Accessible?"
(paper delivered at "Documenting the Digital Age," San Francisco,
February 10-12, 1997); Luciana Duranti, "Diplomatics: New Uses for an
Old Science," Archivaria 28 (Summer 1989): 7-27; Peter B.
Hirtle, "Archival Authenticity in a Digital Age," in Council on
Library and Information Resources, Authenticity in a Digital
Environment (Washington, D.C., 2000),
http://www.clir.org/pubs/reports/pub92/contents.html; CLIR, Evidence in Hand;
Susan Stellin, "Google's Revival of a Usenet Archive Opens Up a Wealth
of Possibilities But Also Raises Some Privacy Issues," New York Times
(May 7, 2001): C4; David Bearman and Jennifer Trant, "Authenticity of
Digital Resources: Towards a Statement of Requirements in the Research
Process," D-Lib Magazine 4, no. 6 (June 1998),
http://www.dlib.org/dlib/june98/06bearman.html.

16. Brewster Kahle, Rick Prelinger, and Mary E. Jackson, "Public Access to Digital
Materials" (white paper delivered at the Association of Research
Libraries and Internet Archive Colloquium "Research in the
'Born-Digital' Domain," San Francisco, March 4, 2001), available at
http://www.dlib.org/dlib/october01/kahle/10kahle.html.

19. Committee on Intellectual Property Rights in the Emerging Information
Infrastructure, National Research Council, Digital Dilemma.

20. As with our network of research libraries, this system is a modern invention.
The first public governmental archive came with the French Revolution;
the British Public Record Office opened in 1838, and the National
Archives is of startlingly recent vintage: the legislation establishing
it did not come until 1934. Donald R. McCoy, "The Struggle to Establish
a National Archives in the United States," in Guardian of Heritage:
Essays on the History of the National Archives, Timothy Walch, ed.
(Washington, D.C., 1985), 1-15.

21. Don Waters, "Wrap Up" (paper delivered at the DAI Institute, "The State of
Digital Preservation: An International Perspective," Washington, D.C.,
April 25, 2002), available at
http://www.clir.org/pubs/reports/pub107/contents.html; Dale Flecker,
"Preserving Digital Periodicals," in CLIR, Building a National
Strategy for Digital Preservation.

22. Michael L. Miller, "Assessing the Need: What Information and Activities Should
We Preserve?" (paper delivered at "Documenting the Digital Age," San
Francisco, February 10-12, 1997), copy in possession of author. To be
sure, it has been biased toward the preservation of the records of the
rich and powerful, although in more recent years energetic, "activist
archivists" have sought out more diverse sets of materials. Ian
Johnston, "Whose History Is It Anyway?" Journal of the Society of
Archivists 22, no. 2 (2001): 213-29.

23. See Adrian Cunningham, "Waiting for the Ghost Train: Strategies for Managing
Electronic Personal Records before It Is Too Late" (paper delivered at
the Society of American Archivists Annual Meeting, Pittsburgh, August
23-29, 1999), available at
http://www.rbarry.com/cunningham-waiting2.htm. For numbers of commercial
word-processing programs, see House Committee on Government Operations,
Taking a Byte out of History, 15.

24. SRA International, Report on Current Recordkeeping Practices within the
Federal Government (Arlington, Va., 2001),
http://www.archives.gov/records_management/pdf/report_on_recordkeeping_practices.pdf.
This report responded to an earlier GAO report: U.S. General Accounting Office, National
Archives: Preserving Electronic Records in an Era of Rapidly Changing
Technology (Washington, D.C., 1999). Archival consultant Rick Barry
reports that four-fifths of e-mail creators he surveyed "do not have a
clue" whether their e-mail was an official record and that most are
"largely unaware" of official e-mail policies. Quoted in David A.
Wallace, "Recordkeeping and Electronic Mail Policy: The State of Thought
and the State of the Practice" (paper delivered at the Annual Meeting of
the Society of American Archivists, Orlando, Florida, September 3,
1998), http://www.rbarry.com/wallace.html.

26. See Stewart Granger, "Emulation as a Digital Preservation Strategy,"
D-Lib Magazine 6, no. 10 (October 2000),
http://www.dlib.org/dlib/october00/granger/10granger.html, on this as the
"dominant" approach. An even earlier intervention version of "migration"
is to move digital objects to "standardized" formats immediately or as
quickly as possible, to put them in non-proprietary, open-source,
commonly accepted formats (for instance, ASCII for text, .tiff for
images, etc.) that are likely to be around for a long time. Of course,
popular standards are no guarantee of longevity; in 1990, NARA was
arguing that spreadsheets formatted for Lotus 1-2-3 were not a
preservation problem since the program was so "widespread." House
Committee on Government Operations, Taking a Byte out of History, 12.

27. Warwick Cathro, Colin Webb, and Julie Whiting, "Archiving the Web: The PANDORA
Archive at the National Library of Australia" (paper delivered at
"Preserving the Present for the Future Web Archiving," Copenhagen, June
18-19, 2001). See also Diane Vogt-O'Connor, "Is the Record of the 20th
Century at Risk?" CRM: Cultural Resource Management 22, no. 2
(1999): 21-24.

29. Margaret Hedstrom, "Digital Preservation: Matching Problems, Requirements and
Solutions" (paper delivered at the NSF Workshop on Data Archiving and Information
Preservation, March 26-27, 1999),
http://cecssrv1.cecs.missouri.edu/NSFWorkshop/hedpp.html (accessed March 2002 but
unavailable in May 2003). See also Margaret Hedstrom, "Research Issues
in Digital Archiving" (paper delivered at the DAI Institute, "The State
of Digital Preservation: An International Perspective," Washington, D.C.,
April 25, 2002, available at
http://www.clir.org/pubs/reports/pub107/contents.html). Rothenberg himself is
currently undertaking research on emulation, and other emulation
research is going on at the University of Michigan and Leeds University
and at IBM's Almaden Research Center in San Jose, California. Daniel
Greenstein and Abby Smith, "Digital Preservation in the United States:
Survey of Current Research, Practice, and Common Understandings" (paper
delivered at "Preserving History on the Web: Ensuring Long-Term Access
to Web-Based Documents," Washington, D.C., April 23, 2002), copy in
possession of author. More recently, Rothenberg has apparently tempered
his position on emulation versus migration.

30. David Bearman and Jennifer Trant, "Electronic Records Research Working
Meeting, May 28-30, 1997: A Report from the Archives Community,"
D-Lib Magazine 3, nos. 7-8 (July/August 1997),
http://www.dlib.org/dlib/july97/07bearman.html; Terry Cook, "The Impact of David
Bearman on Modern Archival Thinking: An Essay of Personal Reflection and
Critique," Archives and Museum Informatics 11 (1997): 23. See
further Margaret Hedstrom, "Building Record-Keeping Systems: Archivists
Are Not Alone on the Wild Frontier," Archivaria 44 (Fall 1997): 46-48.
See also David Bearman and Ken Sochats, "Metadata Requirements for
Evidence," in University of Pittsburgh, School of Information Sciences,
the Pittsburgh Project, http://www.archimuse.com/papers/NHPRC/. (Many parts of this site have
disappeared, but this undated paper is available at
http://www.archimuse.com/papers/NHPRC/BACartic.html.) David Bearman, "An
Indefensible Bastion: Archives as Repositories in the Electronic Age,"
in Bearman, ed., Archival Management of Electronic Records (Pittsburgh,
1991), 14-24; Margaret Hedstrom, "Archives as Repositories–A
Commentary," in ibid.

31. Cook, "Impact of David Bearman on Modern Archival Thinking," 15-37. From
another perspective, the Pitt Project broadened, rather than narrowed,
the concerns of electronic archivists, since previously the focus had
been on statistical databases. In one effort to join the emphasis on
records as evidence with a broader social cultural focus, Margaret
Hedstrom argues that "to benefit fully from the synergy between business
needs and preservation requirements, cultural heritage concerns should
be linked to equally critical social goals, such as monitoring global
environment change, locating nuclear waste sites, and establishing
property rights, all of which also depend on long-term access to
reliable, electronic evidence." Quoted in Richard J. Cox, "Searching for
Authority: Archivists and Electronic Records in the New World at the
Fin-de-Siècle," First Monday 5, no. 1 (January 3, 2000),
http://firstmonday.org/issues/issue5_1/cox/index.html. The Pitt Project has
been the subject of enormous discussion and significant debate among
archivists; a full and nuanced treatment of the subject is beyond the
scope of this article. Whereas Cook offers serious criticism of Bearman,
the leader of the project along with Richard Cox, he also celebrates
Bearman as "the leading archival thinker of the late twentieth century."
Linda Henry offers a sweeping attack on Bearman and other advocates of a
"new paradigm" in electronic records management in "Schellenberg in
Cyberspace," American Archivist 61 (Fall 1998): 309-27. A more
recent critique is Mark A. Greene, "The Power of Meaning: The Archival
Mission in the Postmodern Age," American Archivist 65, no. 1
(Spring/Summer 2002): 42-55. Terry Cook puts the story in historical
perspective (but from his particular perspective) in "What Is Past Is
Prologue: A History of Archival Ideas since 1898, and the Future
Paradigm Shift," Archivaria 43 (Spring 1997), available at
http://www.rbarry.com/cookt-pastprologue-ar43fnl.htm. The project "Preservation
of the Integrity of Electronic Records" (called the UBC Project because
it was carried out at the University of British Columbia) and the
InterPARES project (International Research on Permanent Authentic
Records in Electronic Systems), which built on the UBC Project, have
taken a different approach, but they share the Pitt Project's emphasis
on the problem of "authenticity" and on "records" rather than the
broader array of sources that generally interest historians. Luciana
Duranti, The Long-Term Preservation of Authentic Electronic Records:
Findings of the InterPARES Project (Vancouver, 2002),
http://www.interpares.org/book/index.htm. The December 2002 draft of the
NHPRC Electronic Records Agenda Final Report suggests
that the consensus among archivists is moving toward a broader
definition of records. My understanding of these issues has been greatly
aided by attending the December 8-9, 2002, meeting convened to discuss
that agenda and by conversations with Robert Horton of the Minnesota
Historical Society, who is the leader of that effort.

37. Raymie Stata, "The Internet Archive" (paper delivered at the conference
"Preserving Web-Based Documents," Washington, D.C., April 23, 2002). On
deep versus surface web, see Lyman, "Archiving the World Wide Web"; Roy
Rosenzweig, "The Road to Xanadu: Public and Private Pathways on the
History Web," Journal of American History 88, no. 2 (September
2001): 548-79, also available at
http://chnm.gmu.edu/assets/historyessays/e1/roadtoxanadu1.html.
Kahle himself indicates many of the problems and limitations of the
Internet Archive in Brewster Kahle, "Archiving the Internet: Bold
Efforts to Record the Entire Internet Are Expected to Lead to New
Services" (paper presented at "Documenting the Digital Age," San
Francisco, February 10-12, 1997), copy in possession of author.

38. On robots exclusion, see http://www.robotstxt.org/wc/exclusion-admin.html.
Apparently, the IA will
retroactively block a site without direct request, if it simply posts
the robots.txt file. This would seem to mean that if someone took over
an expired domain name, they could then block access to the prior
content. There is some evidence, however, that the IA does not actually
"purge" the content; it simply makes it inaccessible. For an intense
discussion of these issues, see the hundreds of online postings in "The
Wayback Machine, Friend or Foe?" Slashdot (June 19-20, 2002),
http://ask.slashdot.org/askslashdot/02/06/19/1744209.shtml. For a
pessimistic assessment of the legality of the IA's practices (though not
explicitly directed at it), see I. Trotter Hardy, "Internet Archives and
Copyright" (paper delivered at "Documenting the Digital Age," San
Francisco, February 10-12, 1997), copy in possession of author.

39. Insiders have commented to me that the IA would disappear if Kahle left the
project. But there are very recent signs that the IA is broadening its
base of financial support.

41. Thomas Brown, "What Is Past Is Analog: The National Archives Electronic Records
Program since 1968" (paper delivered at the OAH Annual Meeting,
Washington, D.C., 2002), copy in possession of author. In 1997, Kenneth
Thibodeau estimated that NARA invested only token amounts (2 percent of
its budget) in electronic records. Gardner, "Report on Documenting the
Digital Age."

43. "Background Information about PANDORA: The
National Collection of Australian Online Publications," PANDORA,
http://pandora.nla.gov.au/background.html; Cathro, Webb, and Whiting, "Archiving the Web";
Colin Webb, "National Library of Australia" (paper delivered at the DAI
Institute, "The State of Digital Preservation: An International
Perspective," Washington, D.C., April 25, 2002, available at
http://www.clir.org/pubs/reports/pub107/contents.html). For British efforts to
cope with digital materials, see Jim McCue, "Can You Archive the Net?"
Times (London) (April 29, 2002). On Sweden and Norway, see
Warwick Cathro, "Archiving the Web," National Library of Australia
Gateways 52 (August 2001), http://www.nla.gov.au/ntwkpubs/gw/52/p11a01.html/.

44. There is anecdotal evidence that this is being seriously considered.

45. National Archives and Records Administration, Proposal for a Redesign of Federal
Records Management (July 2002), 10,
http://www.archives.gov/records_management/initiatives/rm_redesign.html;
Richard W. Walker, "For the Record, NARA Techie
Aims to Preserve," Government Computer News 20, no. 21 (July 30, 2001),
http://www.gcn.com/vol20_no21/news/4752-1.html/; GAO, Information Management, 50.
So far, POP remains, as a NARA
staff member explained in April 2001, "beyond the state of the art of
information technology." Adrienne M. Woods, "Toward Building the
Archives of the Future" (paper delivered at the Society of California
Archivists' Annual Meeting, April 27, 2001), accessed online May 1,
2002, but not available as of June 20, 2002. See also Kenneth Thibodeau,
"Overview of Technological Approaches to Digital Preservation and
Challenges in Coming Years" (presentation at the DAI Institute, "The
State of Digital Preservation: An International Perspective,"
Washington, D.C., April 24-25, 2002, available at
http://www.clir.org/pubs/reports/pub107/contents.html). In June 2002, the GAO
reported that, in general, NARA's
electronic records project "faces substantial risks" and "is already
behind schedule." GAO, Information Management, 3.

47. The quote is often incorrectly attributed to Carl von Clausewitz. It could be that
it is simply a reworking of Voltaire's remark that "le mieux est l'ennemi
du bien" (the best is the enemy of the good) or of George S. Patton's
dictum, "A good plan violently executed now is better than a perfect
plan executed next week."

49. McCoy, "Struggle to Establish a National Archives in the United States," 1, 12.
Indeed, one digital preservation program–LOCKSS (Lots of Copies Keep
Stuff Safe)–relies on precisely this principle: http://lockss.stanford.edu/.

50. Lee Dembart, "Go Wayback," International Herald Tribune (March 4, 2002),
http://www.iht.com/cgi-bin/generic.cgi?template=articleprint.tmplh&ArticleId=50002;
"Seeing the Future in the Web's
Past," BBC News (November 12, 2001). See also Joseph Menn, "Net Archive
Turns Back 10 Billion Pages of Time," Los Angeles Times
(October 25, 2001): A1; Heather Green, "A Library as Big as the World,"
Business Week Online (February 28, 2002),
http://www.businessweek.com/technology/content/feb2002/tc20020228_1080.htm.
The dream of a universal archive is
also the nightmare of privacy advocates. In the paper era, the physical
bulk of personnel files and bank, criminal, and medical records made
them more likely to wind up in landfills than in archives. Even when
preserved, the possibility of retrospective prying (was your neighbor's
grandfather a deadbeat or a drunk?) was reduced by the sheer tedium of
sorting through thousands of pages of records. But what if sophisticated
data-mining tools ("tell me everything about my neighbors") made such
searching easy? Even the "public" material on the web poses ethical
challenges for historians. "The woman who is going to be elected
president in 2024 is in high school now, and I bet she has a home page,"
exclaims Kahle. The Internet Archive has "the future president's home
page!" Perhaps. But it also has the home pages of many other high school
students, at least some of whom are going through serious emotional
turmoil that they might later prefer to keep from public view. Kahle
himself wrote a prescient 1992 article, the "Ethics of Digital
Librarianship," which worries about "types of information that will be
accessible" as "the system grows to include entertainment, employment,
health and other servers." Menn, "Net Archive Turns Back 10 Billion
Pages"; Wood, "CNET's Web Know-It-All"; Kahle quoted in John Markoff,
"Bitter Debate on Privacy Divides Two Experts," New York Times
(December 30, 1999): C1. See also Jean-François Blanchette and Deborah
G. Johnson, "Data Retention and the Panoptic Society: The Social
Benefits of Forgetfulness," Information Society 18 (2002):
33-45; Marc Rotenberg, "Privacy and the Digital Archive: Outlining Key
Issues" (paper delivered at "Documenting the Digital Age," San
Francisco, February 10-12, 1997), copy in possession of author; "Wayback
Machine, Friend or Foe?"

51. Mike Featherstone, "Archiving Cultures," British Journal of Sociology 51, no.
1 (January 2000): 178, 166. For examples of enthusiastic prophecy about
such changes, see Frances Cairncross, The Death of Distance: How the
Communications Revolution Will Change Our Lives (Boston, 1997); Kevin
Kelly, "New Rules for the New Economy," Wired 5, no. 9 (September 1997),
http://www.wired.com/wired/archive/5.09/newrules_pr.html. For a sober and
sensible critique, see John Seely Brown and Paul Duguid, The Social Life
of Information (Boston, 2000), 11-33.

53. See, for example, Geoffrey J. Giles, "Archives and Historians: An Introduction,"
in Archives and Historians: The Crucial Partnership
(Washington, D.C., 1996), 5-13, who writes that "there is too much
archival material for the archivists and for the historian to deal with"
and notes feelings of "envy" of "ancient and medieval historians, who
have so little material with which to work."

55. Linton Weeks, "Power Biographer," Washington Post (April 25, 2002):
C01. Carl Bridenbaugh's derisive view of sampling provides a good
example of the traditional view that historians should look at
everything. "The Great Mutation," AHR 68, no. 2 (January 1963): 315-31, also available with
other Presidential Addresses at http://www.theAHA.org/info/AHA_History/cbridenbaugh.htm. Nevertheless,
historians have always struggled with the problem of how to deal with
large numbers of sources. Even medievalists worry about how to make
sense of the huge numbers of documents that survive from twelfth-century
Italy. Still, the digital era vastly increases the scale of the problem.

56. Stellin, "Google's Revival of a Usenet Archive Opens Up a Wealth of
Possibilities"; Hedstrom, "How Do We Make Electronic Archives Usable and
Accessible?" (paper delivered at "Documenting the Digital Age," San
Francisco, February 10-12, 1997), copy in possession of author.

57. To be sure, a number of key figures in digital archives and library circles
(for example, Daniel Greenstein, Margaret Hedstrom, Abby Smith, Kenneth
Thibodeau, Bruce Ambacher) have doctoral degrees in history, but they do
not currently work as academic historians. Still, it would be logical
for academic historians to build alliances with these scholars who have
a foot in both camps. Thus far, academic historians have been much more
likely to build ties to historians working in museums and historical
societies than to those in archives and libraries.

58. It is difficult to prove a negative, but one searches in vain through the
participant lists at key digital archives conferences for the names of
practicing historians. One exception was the Committee on the Records of
Government, which had a historian, Ernest R. May, as its chair and
another, Anna K. Nelson, as its project director. But perhaps
significantly, that committee had a mandate that dealt as much with
paper as electronic records: Committee on the Records of Government,
Report (1985). Another partial exception was the February 1997
conference "Documenting the Digital Age" sponsored by NSF, MCI
Communications Corporation, Microsoft Corporation, and History
Associates Incorporated, which included a few public and museum-based
historians but only one university-based historian. Similarly, history
journals have provided almost no coverage of these issues. Archivists
are not reading historians, either. Richard Cox analyzed the almost
1,200 citations in 61 articles on electronic records management
published in the 1990s and found only a handful of references to work by
historians. Cox, "Searching for Authority."

59. Cox, "Messrs. Washington, Jefferson, and Gates." Robert Townsend, Assistant
Director of Research and Publications, AHA, kindly supplied membership information. One
imperfect but telling indicator of the changing interests of
professional historians: Between 1895 and 1999, the American Historical
Review published thirty-one articles with one of the following words in
the title: archive or archives, records, manuscripts, correspondence.
Only four of those appeared after World War II, and they were in 1949,
1950, 1952, and 1965. Some representative titles include: Charles H.
Haskins, "The Vatican Archives," AHR 2,
no. 1 (October 1896): 40-58; Waldo Gifford Leland, "The National
Archives: A Programme," AHR 18, no. 1 (October
1912): 1-28; Edward G. Campbell, "The National Archives Faces the
Future," AHR 49, no. 3 (April 1944): 441-45. For
a good, brief overview of the AHA's active, early archive and manuscript
work, see Arthur S. Link, "The American Historical Association,
1884-1984: Retrospect and Prospect," AHR 90,
no. 1 (February 1985): 1-17. NARA's "Timeline for the National Archives and
Records Administration and the Development of the U.S. Archival
Profession," http://www.archives.gov/research_room/alic/reference_desk/NARA_timeline.html,
highlights the
role of the AHA. It should be
noted, however, that the AHA has made a notable contribution to
archival issues through its central role in the National Coordinating
Committee for the Promotion of History (NCC), which
was crucial, for example, in winning the independence of the National
Archives in 1984. The new National Coalition for History, which has
replaced the NCC, has also made archival concerns
central to its work. Access to archives and primary sources was, of
course, a central preoccupation–indeed, an obsession–of early
"scientific" and professional historians. See Bonnie G. Smith, "Gender
and the Practices of Scientific History: The Seminar and Archival
Research in the Nineteenth Century," AHR 100,
no. 4 (October 1995): 1150-76.

60. Deanna B.
Marcum, "Scholars as Partners in Digital Preservation," CLIR
Issues, no. 20 (March/April 2001),
http://www.clir.org/pubs/issues/issues20.html. "Scholars," warns the CLIR Task Force on
the Artifact in Library Collections, "may not see preservation of
research collections as their responsibility, but until they do, there
is a risk that many valuable research sources will not be preserved."
CLIR, Evidence in Hand.

61. I am
indebted to Jim Sparrow for a number of the ideas in this paragraph. For
detailed coverage of "How the Bellesiles Story Developed," see History
News Network, http://hnn.us/articles/691.html.

62. House
Committee on Government Operations, Taking a Byte out of History, 4. For
the assumption of selectivity among archivists, see, for instance,
Richard J. Cox, "The Great Newspaper Caper: Backlash in the Digital
Age," First Monday 5, no. 12 (December 2000),
http://firstmonday.org/issues/issue5_12/cox/index.html.

63. Abby
Smith, The Future of the Past: Preservation in American Research
Libraries (Washington, D.C., 1999),
www.clir.org/pubs/reports/pub82/pub82text.html; Marcum, "Scholars as
Partners in Digital Preservation"; Nicholson Baker, Double Fold:
Libraries and the Assault on Paper (New York, 2001). Compare, for
example, Cox, "Great Newspaper Caper," and O'Toole, "Do Not Fold,
Spindle, or Mutilate," with Robert Darnton, "The Great Book Massacre,"
New York Review of Books (April 26, 2001),
www.nybooks.com/articles/14196. In 1996, the Modern Language
Association (MLA) issued a statement arguing "that for practical
purposes, all historical publications, even those produced by
mass-production techniques designed to minimize deviations from a norm,
have unique physical qualities that may have value as a carrier of
(physical) evidence in a given research project." CLIR, Evidence in
Hand.

64. GAO,
Information Management, 16; Cox, "Messrs. Washington, Jefferson, and
Gates." Cox's article responded, in part, to an earlier article by
Raymond W. Smock that argues, "historians should not rely on archivists
alone to make decisions about what history to save or to publish."
Smock, "The Nation's Patrimony Should Not Be Sacrificed to Electronic
Records," Chronicle of Higher Education (February 14, 1997): B4-5.

65. Robert
Darnton, Sarah A. Mikel, and Shirley K. Baker, "The Great Book Massacre:
An Exchange," New York Review of Books (March 14, 2002),
www.nybooks.com/articles/15195.

66. See, for
example, Vincent Kiernan, "'Open Archives' Project Promises Alternative
to Costly Journals," Chronicle of Higher Education (December 3, 1999);
Budapest Open Access Initiative, www.soros.org/openaccess. On questions
of public domain and privatization, see Lawrence Lessig, The Future of
Ideas: The Fate of the Commons in a Connected World (New York,
2001).