Feature

Electronic Records Research Working Meeting, May 28-30, 1997: A Report from the Archives Community

by David Bearman and Jennifer Trant

Issues of digital preservation have caught the attention of the digital
libraries community in recent years, not least because of the work
of the Task Force on Archiving of Digital Information (1). However,
concerns in the archives and records management communities about
electronic records have been quite different from those expressed in the
library and preservation communities. Collaboration between these
communities is essential if we are to design systems that ensure the
long-term preservation of electronic records.

A decade ago, most archivists thought about electronic records issues
much the way that librarians do today – as a problem of documenting and
preserving data files in specialized repositories. Since then, networked
computing has transformed the mechanisms of business communications.
Archivists have increasingly adopted the view that the fundamental issues
in records capture and retention, whether in paper or electronic form, are
the identification of records, their classification by provenance and their
retention in the context of their use, so that they can be understood. Only
when these challenges have been met do questions of how or where to keep
records, or how to provide access to them, arise. Thus the “archives” as
files in need of retention and the “archives” as repository become issues
only after what are currently the most difficult challenges of day-to-day
recordkeeping have been met.

Librarians and the preservation community still focus their attention on
electronic objects prepared and published as coherent entities to reside in
repositories. Thus they generally ignore the very real problem of acquiring
coherent records from disparate business information systems not designed
to keep records and rife with undocumented software and hardware
dependencies. Nor do they usually deal with objects whose content is
frequently splattered with proprietary, personal, private and legally
troublesome non-public data. So when archivists go to meetings of
librarians and preservationists focused on keeping electronic “archives”
they generally find the discussion overlooks the front end of the issue,
where records “happen.” Librarians and preservationists, meanwhile, find
it hard to understand how archivists can seemingly shrug off the back-end,
long-term retention issues as not terribly interesting and dependent on
technology developments very much out of the hands of either community.

In May 1997, a working meeting of international researchers and
practitioners of the archival approach to electronic recordkeeping was
organized in Pittsburgh by Archives & Museum Informatics. This meeting
focused primarily on the issues at the “front end,” before records can be
brought together to become the problem of any repository. The following
summary, however, is directed to the larger community. Issues of electronic
record creation and capture are shared by all those who have become
dependent upon technological systems to support their business processes.

Background

The 1997 meeting was modeled on similar sessions in 1994 and 1991. The
purpose of the meeting was to bring researchers familiar with work being
done worldwide together to define a set of clearly articulated research
questions that were the logical “next steps” for the field. Participants
were treated to a healthy dose of background reading to make it possible
for presenters to assume familiarity with the state of ongoing projects
worldwide (2).

The meeting confirmed the degree to which common ground has been reached
in the past several years. However, much research has focused on particular
portions of the problem, while many solutions that appear independent are
actually interdependent. Tensions are emerging between practitioners who
want to “just get on with it” and researchers who seem to be “peeling an
onion.” These tensions reveal a critical juncture in the development of
solutions for electronic records management. After a long period of
developing models, agreeing on terminology and defining problems, we seem
ready to begin serious testing of proposed solutions.

Much research remains, though. The following themes were explored in
presentations and breakout group discussions:

What makes an electronic record? How are records defined and what
metadata ensures their “recordness”?

Does policy adoption contribute to more effective electronic records
management? If so, what policies can best ensure electronic accountability
and integrity of records?

What business events generate records? How are these events recognized?

How can metadata about electronic records be captured? How can they
best be stored and maintained in relation to records?

How can records be maintained? What are the requirements for using them?

I. Definition of Records

The first session dealt with an issue that is crucial both to the law and
the technology of electronic records – the definition of electronic record.
Archivists distinguish between records and information or data; not all
information or data is a record. Records are created in the conduct of
business and communicated between the parties to that business. Some
archivists believe that information must also be “set aside” in the course
of business to be considered a record. In any case, having been transacted
in a particular business context is crucial to a record; an adequate record
will therefore contain evidence of the context of its creation. The
consensus, largely developed since 1990, is that

records are evidence of transactions (relationships of acts), means of
action and information about acts;

records are known by their metadata – for example, forms documentation;
ideal records metadata can be defined from societal understanding of
recordness;

any given record will be a better/worse (more/less risky) record for
having complete/incomplete metadata; and

the metadata is about content, context and structure.

Research into the definition of records has been concentrated in two major
groups of researchers, at the University of Pittsburgh and the University of
British Columbia. Both were asked to summarize their findings about what
makes a record a record. Presentations by Luciana Duranti, Maria Guercio,
Richard Cox and Wendy Duff focused on the source of authority for, and
universality of, records metadata requirements. Driven by pragmatism, the
University of Pittsburgh team looked for “warrant” in the sources
considered authoritative by the practitioners of ancillary professions on
whom archivists rely – lawyers, auditors, IT personnel, etc. (See Duff,
Wendy M., “Compiling Warrant in Support of Functional Requirements,”
Bulletin of the American Society for Information Science, June/July 1997,
pp. 12-13.) In the European tradition, the UBC team examined the authority
of diplomatics, a discipline grounded in the juridical systems of early
modern Europe. To many, their differences on sources of authority (a more
philosophical issue about the nature of truth) were overshadowed by their
apparent agreement on basic characteristics and most concrete metadata
requirements of electronic records.

Subsequent discussions demonstrated that neither definition is adequate
for those responsible for managing electronic records, nor does either
provide the algorithmic specificity that systems need to recognize records
when they are created by business events. The definitions put forward need to be
synthesized, and the common core elements of an electronic record must be
identified in a high-level definition useful across systems and
communities. Variable sets of metadata drawn from the warrant of different
juridical, business, organizational and procedural contexts could
supplement this core. In combination with an architecture to express
content, context and structure, a shared definition would provide a model
that maps the differing concepts and languages of the research projects.
These shared semantics would enable collaboration across the discipline and
would provide a means of communication with record creators, users and
researchers in other disciplines.

A tension was inherent in the discussion of definitions of electronic
records. While a more generalized framework was seen as necessary to bridge
the philosophical differences of the researchers, it would not serve the
needs of those who are building systems. There, concrete expressions of
both the semantics and the syntax of electronic records and their
associated metadata are required urgently. The utility of the definition is
the basic issue.

II. Policy

The second session dealt with electronic record policy formulation at an
institutional, national and international level. Presenters included Luisa
Moscato of the Records Management Office of New South Wales and Greg O’Shea
of the Australian Archives, who presented the process by which they
formulated a coordinated Australian professional policy, policies at the
state and national archives, and the cross-sectoral Australian Records
Management Standard, AS4390 (3). Peter Horsman of the Dutch National
Archives has worked within the Dutch civil service to formulate policies in
the Netherlands. In both countries, policy development has served as a major
vehicle for clarifying the roles and values of archival organizations, an
unanticipated benefit regardless of whether the policies themselves result
in better recordkeeping.

The presenters agreed that broad frameworks directing people and
organizations to keep electronic records need to be accompanied by specific
performance standards, monitoring/reporting mechanisms, rewards and
penalties. The presentations reinforced the view that records result from
business processes and are the responsibility of process managers. Policy
is a strategic, and not fundamentally a technological, issue. But as yet we
know little about the acceptance of, or adherence to, policies, the costs of
implementing (or even developing) them or the appropriate level of
granularity in implementation.

Discussion focused on the feasibility of implementing electronic records
management policies. If much of the responsibility for the creation and
retention of records is shifted to the desktop of individuals, how do we
maintain the quality of records? What are viable strategies in terms of
hardware and software implementation? Can we develop a generic set of
specifications? What role can professional “best practices” play, and how
do we train people to meet these new requirements?

Changes in policy require changes in accountability structures as well.
Can policies be enforced? Which mechanisms work? How can project managers,
whose output is measured in other business terms, be held accountable for
records management? Some organizations respond more readily to policy
changes than others. What kinds of organizations respond best to policy?
Which respond best to design? Which to implementation? Which to standards?
What strategies are available as alternatives in less formal working
environments? Are there identifiable and measurable differences between
industries and between the public and private sectors?

III. Recognizing Record-Creating Events

The third session explored two questions: What business events generate
records, and how are these events recognized? Most archivists believe that
few, if any, of the electronic information systems in organizations create
records, or at least records adequate to serve as evidence of business
transactions. Since organizations participate in far too many
record-creating events, in too distributed a fashion, to assign the
responsibility for making record-creating decisions to an office of
recordkeepers, the systems that communicate records must themselves
implement the decision to create records.

Groups presenting in this session included Artificial Intelligence
Atlanta, a team engaged in research with the Department of Defense, and
ASTRA (a Swedish pharmaceutical firm) and the Swedish National Archives,
jointly involved in research to develop methods for electronic
recordkeeping in the pharmaceutical industry (an industry well represented
at this meeting because its firms are both heavily regulated and face huge
long-term liabilities that can be defended only with their now largely
electronic scientific records). Both teams are attempting to find methods
to identify a record-creating event, or a business transaction, that
requires a record to be created. How can a system recognize a “trigger”
event? The ASTRA team used STEP (4) to model the business process and
identify such events (which they have termed causa), while the DoD team
tried to develop a set-based logic to identify events and provide
“automated decision support for classification” to a human records
classifier. Both acknowledged that models of types of actions don’t
necessarily conform to actions as conducted; matching the process model to
real events has proven difficult. Unfortunately, even if the business model
succeeded, the archival rules to which it would relate are not as formal as
they need to be. The expressions in set theory proposed by AIA look highly
algorithmic but are in fact too vague in operation.

Research questions focus on distinguishing creator vs. organizational
requirements. A tension was recognized between the creation of functional
and efficient business systems and the implementation of full electronic
records capture functionality. For those in the group who felt that one of
the primary characteristics of an electronic record was that it was “set
aside,” classification became a key moment in the process (5). Much work
has focused on how to classify documents consistently. Work-flow systems
that position the creation of a record within a function and link that
function to a pre-defined classification were seen as promising. Another
tack would be to identify functions assigned to personnel classifying a
record in order to narrow the possibilities available to them and improve
accuracy. Both of these approaches suggest the creation of a structured
electronic workspace where work is done within functional areas as an aid
in the record capture process. Such a space would also allow system
implementation methodologies that rigorously test adherence to recordkeeping requirements.

Reliance upon an understanding of the business processes carried out by
organizations raised questions regarding modeling of workflow itself. What
data is required about the function being performed and how is its location
in workflow related to a captured record? Clear models for functional
requirements specification are needed. But what is the role of the
archivist within an interdisciplinary team that is creating new systems to
support electronic work? Communicating recordkeeping requirements to
systems designers and implementors is a major challenge that would be aided
by a consistent and unambiguous model of events and activities. The model
should establish a synthesis between the various models proposed and the
business processes and functions identified.

IV. Capturing Records

The fourth session continued the exploration of the relationship between
business processes, business transactions, record creation, record capture
and recordkeeping systems. If the record-creating event and the
requirements of “recordness” are both known, focus shifts to capturing the
metadata and binding it to the record contents. The National Archives of
Canada team, represented by John McDonald, has been exploring interfaces in
the work environment constructed to enable the capture of electronic
records. David Bearman, of Archives & Museum Informatics, has been building
models of how the metadata captured in record creation can best be
structured for future use and how to ensure its inviolability and its
readability over time. If a record consists of both its metadata and
its content, how can these two facets be bound together? Bearman is
exploring reference models which might provide a generic record metadata
structure and examining how these models relate to other metadata
standardization activities, such as the Dublin Core.

McDonald reported on a vision developed at the National Archives of
Canada where recordkeeping is transparent, incorporated into an overall IT
strategy and integrated into tools and technology. But what does
“transparent” mean, and what does this world look like? How do we
articulate the relationships between programs, work processes and
activities within an organization? What is required in order to specify
built-in capture and retention rules (to enable automated disposition)? How
can systems be designed that support the relatively unstructured
environment in the modern office, where work processes are complex, ad hoc
and dynamic? Can recordkeeping be made invisible? Or should those
responsible for record creation be made aware of their actions?

Even if systems could be designed and implemented to automate the capture
of electronic records, research is still needed into the required metadata.
How do we model recordkeeping systems that enable records and their
metadata to remain meaningful over time? How can we ensure the integrity of
a record through time? Will metadata have to be “registered”? What metadata
is required to support future re-use? How does metadata required for
electronic records map to that for other functions – information discovery
for example?

If an encapsulated object approach is taken, what are the characteristics
of a good envelope? Are there existing technologies or standards that can
be adapted or implemented? Are there standard syntaxes that are “good
enough” for some situations? Can we assign value metrics around the
capture, management, retention and migration of electronic records? What
are the costs vs. the benefits of various strategies?
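The envelope questions above can be made more concrete with a small sketch. The Python below is purely illustrative and is not a method presented at the meeting: it assumes a hypothetical `RecordEnvelope` that holds a record's content together with context and structure metadata (following the content/context/structure distinction in the consensus on records definition) and binds them with a SHA-256 digest over a canonical JSON serialization. A digest of this kind only detects alteration. It does not by itself make a record inviolable, since anyone able to change the record could also recompute the digest, which is one reason the question of whether metadata must be "registered" elsewhere matters.

```python
import hashlib
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecordEnvelope:
    """Hypothetical envelope binding record content to its metadata."""
    content: bytes
    context: dict    # e.g. creator and business function (illustrative)
    structure: dict  # e.g. format and software dependencies (illustrative)
    digest: str

    @staticmethod
    def compute_digest(content: bytes, context: dict, structure: dict) -> str:
        # Hash the content separately, then hash it together with the metadata
        # serialized as canonical JSON (sorted keys), so identical metadata
        # always yields the same digest.
        payload = {
            "content_sha256": hashlib.sha256(content).hexdigest(),
            "context": context,
            "structure": structure,
        }
        canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    @classmethod
    def seal(cls, content: bytes, context: dict, structure: dict) -> "RecordEnvelope":
        """Create an envelope whose digest covers content and metadata."""
        return cls(content, context, structure,
                   cls.compute_digest(content, context, structure))

    def verify(self) -> bool:
        """True if neither content nor metadata has changed since sealing."""
        return self.digest == self.compute_digest(
            self.content, self.context, self.structure)


# Illustrative usage with hypothetical values
record = RecordEnvelope.seal(
    b"Purchase order 1042 approved",
    {"creator": "Purchasing Office", "function": "procurement"},
    {"format": "text/plain", "encoding": "utf-8"},
)
print(record.verify())  # True
altered = replace(record, content=b"Purchase order 9042 approved")
print(altered.verify())  # False
```

Canonical serialization is the key design choice here: two semantically identical metadata sets must always produce the same digest, or verification would fail spuriously after a harmless re-serialization.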

Test-bed projects are needed to benchmark and cost various approaches to
the capture and retention of electronic records and their associated
metadata. The semantics and syntax of a generic attribute set need to be
designed and tested against the functionality required. The effectiveness
of metadata in reducing software dependencies must be evaluated and tested
in a variety of circumstances.

V. Maintaining Records Over Time

Consensus exists that exact replication of digital objects is rarely
feasible or cost effective, and that migration should replace technology
refreshment as the preservation strategy. Migration, however, is inherently
imperfect: implementation dependency choices have their costs downstream,
and the gap between functional (semi-active) records and non-functional
records (representations) is, from a practical migration perspective, absolute.

Researchers in this session included Margaret Hedstrom of the University
of Michigan, Anne Marie Makerenko from Babson College Archives, and Alan
Murdock, representing a team from Pfizer Ltd., a British pharmaceutical
company. Their practical research questions focused on the costs and
mechanics of maintaining electronic archives. How can we model event-driven
records retention scheduling? What are migration cost elements? What risks
arise from what loss under what circumstances? And can models be developed
and/or partners be found in highly regulated industries where long-term
retention of electronic records is a legislated mandate?
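As one way of framing the question of event-driven retention scheduling, the sketch below assumes that a retention rule names a hypothetical triggering event (for example, the closing of a contract) and a retention period in years, so that a record's earliest disposition date is unknown until the event occurs. The rule, event name and dates are illustrative assumptions, not drawn from the projects described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical rule: keep records of a class for `years` after an event."""
    record_class: str
    trigger_event: str  # illustrative event name, e.g. "contract_closed"
    years: int


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February becomes 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only 29 February can fail here, when the target year is not a leap year
        return d.replace(year=d.year + years, day=28)


def disposition_date(rule: RetentionRule, events: Dict[str, date]) -> Optional[date]:
    """Earliest disposal date, or None if the triggering event has not occurred."""
    occurred = events.get(rule.trigger_event)
    if occurred is None:
        return None  # retention clock has not started; the record must be kept
    return add_years(occurred, rule.years)


# Illustrative usage with hypothetical values
rule = RetentionRule("contract file", "contract_closed", 7)
print(disposition_date(rule, {}))  # None
print(disposition_date(rule, {"contract_closed": date(1997, 5, 30)}))  # 2004-05-30
```

Returning `None` rather than a default date keeps records whose triggering event has not yet happened out of any disposal run, which is the central difference from schedules based only on a record's creation date.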

It is evident to the researchers that much remains to be determined
before scalable solutions are available. Though practitioners keep asking
for “core” definitions and implementable procedures, it is not yet clear
that “cores” are workable. The last mile is proving hard to travel because
frameworks aren’t good at the detailed semantics, because functional
requirements are far from specifications and because the real costs of
migrations depend on so many local variables. Concrete implementations are
necessary to build our understanding of these factors, but comparative
analysis and detailed reporting on choices made and the rationales for them
will be critical to building shared strategies.

As Margaret Hedstrom observed, we need to improve our knowledge of
alternatives to exact replication. What strategies are appropriate to
different types of records and different preservation goals? How much
functionality must be maintained in an archival electronic record? What is
acceptable information loss? Could we consider the preservation of
surrogates? Can we reconstruct context and structure? We need criteria for
the creation and evaluation of surrogates as preservation tools.

Again, implementation became a major theme. How can we devise migration
programs without a detailed understanding of the costs and benefits of
particular approaches to migration? How do we assess the risks involved in
information loss? We are unable to ensure that particular methods will work
in all situations; how do we support local decision-making to enable the
best conclusions for a particular situation? What are the project
management and quality assurance techniques that will be most effective
throughout the process?

Besides the need to maintain more explicit contextual metadata, it
remains unclear whether or how the requirements for long-term preservation
of records are fundamentally different from the requirements for the
preservation of other types of digital information. If they are, then how
are they different? Where can we collaborate with the broader community,
and where must specific archival solutions be developed?

Toward a Research Agenda

Outside of Australia, where strong community leadership is creating an
environment (now driven by standards and law) that requires action on
electronic records, the archival community remains technically and
economically ill-prepared to step up to this formidable challenge.
Archivists have not yet found a way to enlist others in an ongoing fashion
to help solve problems that cannot be addressed by archivists alone. This
articulation of open issues may aid in convincing others to join the research.

For the near term, the most promising areas for research seem to require
greater specificity and granularity in their focus. In the definition of
records, we need to identify the concrete risks associated with different
definitions in different circumstances and to develop an executable
specification of recordness. In
policy, we need to define the concrete costs and benefits of specific
policies and their implementation through organizational, national and
international mechanisms. To understand record creation, we need testable
models of the kinds of records created by different business processes. In
the arena of capturing records, we need tests of registry mechanisms for
software and hardware dependency metadata and for business context
metadata, and we need to test proposed structures for the inviolable
storage of metadata and records’ content. For the maintenance of records
over time, we need comparative migration data, comparable measures of the
effectiveness of different systems architectures and strategic solutions
for the universal retention of records (obviating the need for each
institution to invest in its own migration of dependencies). Finally, we
need very detailed and granular research into the needs of users and how
they are articulated so that metadata on the content and context of records
will support the research process.

None of these problems is going to be easy to solve. The research agenda
meeting in the spring of 1997 articulated a full set of open questions
which will provide grist for researchers and practitioners for a long time
to come. The archivists participating looked forward to the
interdisciplinary collaboration necessary to move beyond open questions to
workable solutions.

David Bearman and Jennifer Trant are with Archives & Museum Informatics in
Pittsburgh, Pennsylvania. Participants in the meeting discussed in this
article have contributed to the Proceedings, which are published in
Archives and Museum Informatics: the cultural heritage informatics
quarterly, Vol. 11, no. 3-4, available from Kluwer Academic Publishers
(kapis.www.wkap.nl).

Notes

(1) Commission on Preservation and Access and the Research Libraries Group,
Task Force on Archiving of Digital Information, Preserving Digital
Information, Washington, DC: Commission on Preservation and Access, May 1,
1996.

(2) This material is available on CD-ROM by Archives & Museum Informatics,
under the title Electronic Records Research Resources, 1997.

(3) Australian Records Management Standard, AS4390, Australian Standards
Institute, Australian Council on Archives. Keeping Electronic Records:
Policy for Electronic Recordkeeping in the Commonwealth Government.

(4) STEP, the Standard for the Exchange of Product Model Data, is the
familiar name for ISO 10303, developed by ISO TC184/SC4
(Industrial-Automation Systems and Integration/Industrial Design). See
"STEP on a Page"