Abstract

Preserving digitally encoded information over the long term, following the OAIS Reference Model, requires that the information remain accessible, understandable and usable by a specified Designated Community. These are significant challenges for repositories. It will be argued that the infrastructure needed to support this preservation must be seen in the context of the broader science data infrastructure which international and national funders seek to put in place. Moreover, aspects of the preservation components of this infrastructure must themselves be preservable, resulting in a recursive system which must also be highly adaptable, loosely coupled and asynchronous. Even more difficult is judging whether any proposal is actually likely to be effective. From the earliest discussions of concerns about the preservability of digital objects there have been calls for some way of judging the quality of digital repositories. In this chapter several interrelated efforts which contribute to solutions for these issues will be outlined. Evidence about the challenges which must be overcome and the consistency of demands across nations, disciplines and organisations will be presented, based on extensive surveys which have been carried out by the PARSE.Insight project (http://www.parse-insight.eu). The key points about the revision of the OAIS Reference Model which is underway will be provided; OAIS provides many of the key concepts which underpin the efforts to judge solutions. In the past few years the Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC) document has been produced, as well as a number of related checklists. These efforts provide the background of the international effort (the RAC Working Group, http://wiki.digitalrepositoryauditandcertification.org) to produce a full ISO standard on which an accreditation and certification process can be built.
If successful, this standard and associated processes will allow funders to have an independent evaluation of the effectiveness of the archives they support, and data producers to have a basis for deciding which repository to entrust with their valuable data. It could shape the digital preservation market. The CASPAR project (http://www.casparpreserves.eu) is an EU part-funded project with a total spend of 16 MEuros which is trying to faithfully implement almost all aspects of the OAIS Reference Model, in particular the Information Model. The latter involves tools for capturing all types of Representation Information (Structure, Semantics and all Other types), and tools for defining the Designated Community. This chapter will describe implementations of tools and infrastructure components to support repositories in their task of long-term preservation of digital resources, including the capture and preservation of digital rights management and evidence of authenticity associated with digital objects. In order to justify their existence, most repositories must also support contemporaneous use of contemporary as well as “historical” resources; the authors will show how the same techniques can support both, and hence link to the fuller science data infrastructure.

2 OAIS Reference Model

The OAIS Reference Model provides a number of models for repositories: a Functional Model, to which it is relatively easy to map an existing archive system; an Information Model, which is rather more challenging; an Information Packaging Model; and federation models. It also provides preservation perspectives, including types of migration and a variety of software-related processes. A number of overall strategies, processes and supporting infrastructures may be derived from these.

2.1 OAIS Information Model

The Information Model provides the concepts to support the long-term understandability of the preserved data. This introduces the idea of Representation Information.

Figure 1. OAIS information model

The UML diagram in Figure 1 means that:

• An Information Object is made up of a Data Object and Representation Information.
• A Data Object can be either a Physical Object or a Digital Object. An example of the former is a piece of paper or a rock sample.
• A Digital Object is made up of one or more Bits.
• A Data Object is interpreted using Representation Information.
• Representation Information is itself interpreted using further Representation Information.

Figure 1 shows that Representation Information may contain references to other Representation Information. Representation Information is itself an Information Object, which may have its own Digital Object and further Representation Information needed to understand that Digital Object, as shown in compact form by the “interpreted using” association. The resulting set of objects can be referred to as a Representation Network. The question of where this recursion ends is answered by the concept of Designated Community, which is touched on further in sections 2.3.1.1 and 2.3.
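The recursion described above can be illustrated with a minimal Python sketch. It is not taken from OAIS or from any CASPAR tool: the class names, the `required_rep_info` helper and the example entries (a FITS standard, a data dictionary) are all hypothetical, and the Designated Community is reduced to a simple set of names it is assumed to understand already.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RepresentationInformation:
    """Information needed to interpret a Data Object or other RepInfo."""
    name: str
    # Further Representation Information needed to interpret this item.
    interpreted_using: List["RepresentationInformation"] = field(default_factory=list)


@dataclass
class InformationObject:
    """A Data Object together with its Representation Information."""
    data_object: bytes  # a Digital Object: one or more bits
    interpreted_using: List[RepresentationInformation] = field(default_factory=list)


def required_rep_info(obj: InformationObject, known: Set[str]) -> List[str]:
    """Walk the Representation Network depth-first, stopping at anything
    the (hypothetical) Designated Community already understands."""
    needed: List[str] = []
    seen: Set[str] = set()
    stack = list(reversed(obj.interpreted_using))
    while stack:
        rep = stack.pop()
        if rep.name in seen or rep.name in known:
            continue  # the recursion ends at the community's knowledge base
        seen.add(rep.name)
        needed.append(rep.name)
        stack.extend(reversed(rep.interpreted_using))
    return needed


# Hypothetical Representation Network for an astronomy data file.
ascii_table = RepresentationInformation("ASCII")
english = RepresentationInformation("English")
fits_standard = RepresentationInformation("FITS standard", [ascii_table, english])
data_dictionary = RepresentationInformation("Data dictionary", [english])
observation = InformationObject(b"\x00\x01", [fits_standard, data_dictionary])

# A community that already understands ASCII and English needs less.
print(required_rep_info(observation, known={"ASCII", "English"}))
print(required_rep_info(observation, known=set()))
```

In this sketch, changing the definition of the Designated Community (the `known` set) changes how much Representation Information the repository must hold, which is the sense in which the Designated Community bounds the recursion.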

Figure 2 shows more detail; in particular, it breaks out the semantic and structural information and recognises that there may be “other” Representation Information, such as software.

Figure 2. Representation information object

The types of Representation Information are very diverse and highly likely to be discipline dependent, although there will be some commonalities.

2.1.1 Role of Significant Properties

At this point it is worth comparing the concept of Significant Properties with that of Representation Information. The former is widely used in the library community but it is hard to see how it applies to, for example, science data.

Clearly, Significant Properties focus on those aspects of digital objects which can be evaluated in some way and checked as to whether they have been preserved. This is an especially important consideration after a transformation of the digital object. However, the meaning associated with a Significant Property is nowhere defined. Therefore Significant Properties, while useful, do not in themselves contribute to Understandability. For example, a Significant Property might be that a text character is red; however, the meaning of that redness is not defined.

The question then is what their significance is. Giaretta (2009) argues that the role that Significant Properties play is more closely related to authenticity. Essentially, the data curator checks that the selected Significant Properties are unchanged after a transformation in order to be satisfied that the new, transformed version may be used as an authentic copy.
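The curator's check can be sketched in Python as follows. This is only an illustration under stated assumptions: a Significant Property is modelled as a named function that extracts a comparable value from a digital object, and the `changed_properties` helper and the example documents are hypothetical rather than part of any existing tool.

```python
from typing import Any, Callable, Dict, List

# Assumption: a Significant Property is modelled as a named function that
# extracts a comparable value from a digital object.
PropertyExtractor = Callable[[Any], Any]


def changed_properties(
    original: Any,
    transformed: Any,
    properties: Dict[str, PropertyExtractor],
) -> List[str]:
    """Return the names of the selected properties whose values differ."""
    return [
        name
        for name, extract in properties.items()
        if extract(original) != extract(transformed)
    ]


# Hypothetical document before and after a format migration.
original = {"text": "Warning", "colour": "red", "encoding": "latin-1"}
migrated = {"text": "Warning", "colour": "red", "encoding": "utf-8"}
lossy = {"text": "Warning", "colour": "black", "encoding": "utf-8"}

# The curator selects which properties matter; encoding is deliberately omitted.
selected = {
    "text": lambda doc: doc["text"],
    "colour": lambda doc: doc["colour"],
}

print(changed_properties(original, migrated, selected))
print(changed_properties(original, lossy, selected))
```

An empty result supports treating the transformed version as an authentic copy with respect to the selected properties. It says nothing about what the redness means; that would still have to be supplied by Representation Information.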

This view of Significant Properties allows the concept to be included in the revision of OAIS which is being prepared, and to be related to Representation Information. It also allows Significant Properties of scientific data to be clearly defined; however, that is the topic of another paper.