I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation.

Tuesday, September 8, 2015

Infrastructure for Emulation

I've been writing a report about emulation as a preservation strategy.
Below the fold, a discussion of one of the ideas that I've been thinking
about as I write, the unique position national libraries are in
to assist with building the infrastructure emulation needs to succeed.

Less and less of the digital content that forms our cultural heritage consists
of static documents; more and more is dynamic. Static digital documents
have traditionally been preserved by migration. Dynamic content is generally not
amenable to migration and must be preserved by emulation.

Successful emulation requires that the entire software stack be preserved:
not just the bits the content creator generated and over which the creator
presumably has rights allowing preservation, but also the operating system,
libraries, databases and services upon which the execution of the bits depends.
The creator presumably has no preservation rights over this software,
even though it is necessary for the realization of their work. A creator wishing to ensure that
future audiences can access their work has no legal way to do so. In fact,
creators cannot even legally sell their work in any durably accessible form.
They do not own an instance of the infrastructure upon which it depends,
they merely have a (probably non-transferable) license to use an instance of it.

Thus a key to future scholars' ability to access the cultural heritage of
the present is that in the present all these software components be
collected, preserved, and made accessible. One way to do this would be for
some international organization to establish and operate a global archive
of software. In an initiative called PERSIST, UNESCO is considering setting
up such a Global Repository of software. The technical problems of doing so
are manageable, but the legal and economic difficulties are formidable.

The intellectual property frameworks, primarily copyright and the contract
law underlying the End User License Agreements (EULAs), under which
software is published differ from country to country. At least in the US,
where much software originates, these frameworks make collecting,
preserving and providing access to collections of software impossible
except with the specific permission of every copyright holder. The
situation in other countries is similar. International trade negotiations
such as the TPP are being used by copyright interests to make these
restrictions even more onerous.

For the hypothetical operator of the global software archive to
identify the current holder of the copyright on every software
component that should be archived, and negotiate permission with each
of them for every country involved, would be enormously expensive.
Research has shown that the resources devoted to current digital
preservation efforts, such as those for e-journals, e-books and the
Web, suffice to collect and preserve less than half of the material
in their scope. Absent major additional funding, diverting resources
from these existing efforts to fund the global software archive would
be robbing Peter to pay Paul.

Worse, the fact that the global software archive would need to obtain
permission before ingesting each publisher's software means that there
would be significant delays before the collection would be formed,
let alone be effective in supporting scholars' access.

An alternative approach worth considering would separate the issues
of permission to collect from the issues of permission to provide
access. Software is subject to copyright. In the paper world, many countries had
copyright deposit legislation allowing their national library to
acquire, preserve and provide access (generally restricted to readers
physically at the library) to copyright material. Many countries,
including most of the major software producing countries, have
passed legislation extending their national library's rights to the digital domain.

The result is that most of the relevant national libraries already
have the right to acquire and preserve digital works, although not the
right to provide unrestricted access to them. Many national libraries
have collected digital works in physical form. For example, the German
National Library's CD-ROM collection includes half a million items.
Many national libraries are crawling the Web to ingest Web pages
relevant to their collections.

It does not appear that national libraries are consistently exercising
their right to acquire and preserve the software components needed to
support future emulations, such as operating systems, libraries and
databases. A simple change of policy by major national libraries could
be effective immediately in ensuring that these components were
archived. Each national library's collection could be accessed by
emulations on-site. No time-consuming negotiations with publishers
would be needed.

An initial step would be for national libraries to assess the set
of software components that would be needed to provide the basis for
emulating the digital artefacts already in their collections,
which of them were already to hand, and what could be done to acquire
the missing pieces. The German National Library is working on a project
of this kind with the bwFLA team at the University of Freiburg, which
will be presented at iPRES2015.
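
The assessment described above is essentially a set comparison, and a minimal sketch in Python may make it concrete. Everything here is an assumption for illustration: the artefact and component names are placeholders, and a real inventory would need far richer metadata (versions, hardware platforms, licences) than plain strings.

```python
def assess(artefacts, holdings):
    """Compare what the collection's artefacts need with what the library holds.

    artefacts: dict mapping an artefact name to the set of software
               components it depends on (hypothetical names).
    holdings:  set of software components already in the collection.
    Returns (needed, on_hand, missing) as sets of component names.
    """
    # Union of every artefact's dependencies.
    needed = set().union(*artefacts.values())
    on_hand = needed & holdings   # required and already collected
    missing = needed - holdings   # required but still to be acquired
    return needed, on_hand, missing


# Placeholder data, not a real inventory.
artefacts = {
    "artefact-1": {"os-a", "player-x"},
    "artefact-2": {"os-b", "player-x"},
}
holdings = {"os-a", "database-y"}

needed, on_hand, missing = assess(artefacts, holdings)
print(sorted(missing))  # → ['os-b', 'player-x']
```

Note that `database-y` is held but not needed by any artefact in this toy inventory, so it appears in neither result set.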

The technical infrastructure needed to make these diverse national
software collections accessible as a single homogeneous global
software archive is already in place. Existing emulation frameworks
access their software components via the Web, and the Memento protocol
aggregates disparate collections into a single resource.
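
To illustrate the kind of aggregation meant here, the following is a minimal in-memory sketch in Python. Memento (RFC 7089) works over HTTP, with TimeGates that negotiate on a requested datetime and TimeMaps that list an archive's captures; this sketch makes no HTTP requests and only models the aggregation logic. The library names, URIs and capture dates are all hypothetical.

```python
from datetime import datetime, timezone


def aggregate_timemap(archives, original_uri):
    """Merge every archive's captures of original_uri into one sorted list.

    archives: dict mapping an archive name to a dict of
              original URI -> list of (capture datetime, memento URI).
    """
    merged = []
    for archive_name, holdings in archives.items():
        for captured, memento_uri in holdings.get(original_uri, []):
            merged.append((captured, archive_name, memento_uri))
    return sorted(merged)  # ordered by capture datetime


def closest_memento(timemap, wanted):
    """Return the capture nearest in time to the requested datetime, or None."""
    if not timemap:
        return None
    return min(timemap, key=lambda entry: abs(entry[0] - wanted))


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical holdings of one software component in two national libraries.
COMPONENT = "http://example.org/os-image"
archives = {
    "library-a": {COMPONENT: [(utc(1998, 6, 1), "http://a.example/1998/os-image")]},
    "library-b": {COMPONENT: [(utc(2001, 3, 15), "http://b.example/2001/os-image"),
                              (utc(1995, 8, 24), "http://b.example/1995/os-image")]},
}

timemap = aggregate_timemap(archives, COMPONENT)
best = closest_memento(timemap, utc(1999, 1, 1))
print(best[1], best[2])  # → library-a http://a.example/1998/os-image
```

The point of the sketch is that the requester asks one question of the aggregate and never needs to know which library holds the answer.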

Of course, absent publisher agreements it would not be legal for
national libraries to make their software collections accessible
in this way. But negotiations about the terms of access could proceed
in parallel with the growth of the collections. Global agreement would
not be needed; national libraries could strike individual, country-specific
agreements which would be enforced by their access control systems.

Incremental partial agreements would be valuable. For example, agreements
allowing scholars at one national library to access preserved software
components at another would reduce duplication of effort and storage
without posing additional risk to publisher business models.
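
As a rough sketch of how an access control system might enforce such an accumulation of partial agreements, consider the Python below. It assumes on-site readers may always use their own library's collection, and that each agreement simply names the other sites whose readers are admitted; the publisher and library names are hypothetical, and real agreements would carry many more conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    reader_site: str      # library where the reader is physically present
    holding_library: str  # library whose collection holds the component
    publisher: str        # publisher of the requested component


# Hypothetical agreements: (publisher, holding library) -> other sites admitted.
AGREEMENTS = {
    ("publisher-x", "library-a"): {"library-b"},
}


def may_access(request, agreements=AGREEMENTS):
    """Decide whether a request is covered by on-site access or an agreement."""
    # Assumption: on-site access to a library's own collection is always allowed.
    if request.reader_site == request.holding_library:
        return True
    # Otherwise access requires an agreement naming the reader's site.
    admitted = agreements.get((request.publisher, request.holding_library), set())
    return request.reader_site in admitted


print(may_access(Request("library-b", "library-a", "publisher-x")))  # → True
print(may_access(Request("library-c", "library-a", "publisher-x")))  # → False
```

Because each agreement is a separate entry, a new one extends access without renegotiating any of the others.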

By breaking the link that makes building collections dependent on
permission to provide access, by basing collections on the existing
copyright deposit legislation, and by making success depend on the
accumulation of partial, local agreements instead of a few comprehensive
global agreements, this approach could cut the Gordian knot that has so
far prevented the necessary infrastructure for emulation from being established.

The Irish government has offered to cooperate with the US under the terms of the two countries' "Mutual Legal Assistance Treaty", but the US has not taken up the offer. This shows that the goal of the struggle is not to get timely access to the e-mails in question, but to establish a legal precedent that the US has jurisdiction over data anywhere in the world in the custody of companies with US operations.

Under whose jurisdiction would the global software archive be established? And how many other governments would claim jurisdiction over it?