Problem: a typical characteristic of digital archives that aim for “long-term preservation” is that the life cycle of the technical infrastructure on which they are based is much shorter than the period for which their contained materials should be preserved. This means that migrations from one archival system to another are inevitable. In the simplest case this could be nothing more than a migration of AIPs from one storage medium to another. However, in most cases this will also involve the migration of metadata, and the contents of each AIP from the source system may need to be taken apart and re-assembled on the destination system. This will result in changes to the AIP’s internal structure that must be accounted for in the migrated (structural) metadata. Finally, such migrations may also involve one or more metadata enrichment steps (for example, because the availability of new or improved characterisation tools makes it possible to automatically extract technical and preservation metadata that couldn’t be established within the old system).

Scalability Challenge

Issue champion

To be defined

Other interested parties

Possible Solution approaches

ALL

At the most basic level we would like to ensure that the system migration does not result in the loss or alteration of any archived objects. In the case of a pure medium migration this could be realised very easily using checksums. More sophisticated mechanisms are needed for migrations where, as an example, AIPs that are held together in a physical container (e.g. a TAR file) on the source system need to be taken apart and subsequently re-assembled on the destination system. In that case we will need to check the integrity of each single file within the AIP, before and after the migration.
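As an illustration of the pure medium migration case, the checksum comparison could look like the minimal sketch below. It assumes SHA-256 as the checksum algorithm and a plain file copy as the migration step. Both are assumptions made for the example, and the function names `sha256_of` and `migrate_and_verify` are hypothetical rather than part of any SCAPE tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so that large AIPs do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(source: Path, destination: Path) -> bool:
    """Copy one object to a new location and compare checksums.

    The plain copy stands in for the real medium migration (assumption).
    Returns True if the object is bit-identical after the migration.
    """
    checksum_before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    checksum_after = sha256_of(destination)
    return checksum_before == checksum_after
```

In practice the checksum taken before the migration would probably come from the metadata already held by the source system, rather than being recomputed at migration time.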

Possible issues: Due to the wide variety of legacy, publicly available and custom-built archiving systems used by different repositories, and the resulting variety of data models and structures, it may be difficult to establish use cases that are sufficiently generic to be of interest to more than one SCAPE partner. The best approach may be to start with a limited number of relatively simple, generic and universally applicable use cases, such as:

Migrate one object from one medium to another and verify the integrity of the migrated object.

Migrate a set of files from one container file to another and verify the integrity of all constituent components.

We could then establish a checkpoint where, based on the outcome of the work on these simple use cases, we decide whether continuing the work on this scenario is worth further effort. A more thorough understanding of the problem space (including key aspects for validation) would in itself be a useful output here.
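The container-to-container use case could be approached along the lines of the sketch below, which builds a per-file checksum manifest for the source and destination containers and reports the differences. It assumes a TAR file as the source container, a ZIP file as the destination container, SHA-256 checksums, and unchanged file paths inside the container. All of these are assumptions for illustration, and the function names are hypothetical.

```python
import hashlib
import tarfile
import zipfile
from pathlib import Path


def digest_stream(handle, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of an open binary stream."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def tar_manifest(tar_path: Path) -> dict[str, str]:
    """Map every regular file in a TAR container to its checksum."""
    manifest = {}
    with tarfile.open(tar_path, "r") as archive:
        for member in archive.getmembers():
            if member.isfile():
                with archive.extractfile(member) as handle:
                    manifest[member.name] = digest_stream(handle)
    return manifest


def zip_manifest(zip_path: Path) -> dict[str, str]:
    """Map every file in a ZIP container to its checksum."""
    manifest = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                with archive.open(info) as handle:
                    manifest[info.filename] = digest_stream(handle)
    return manifest


def compare_manifests(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, added or altered after migration."""
    shared = set(before) & set(after)
    return {
        "missing": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "altered": sorted(name for name in shared if before[name] != after[name]),
    }
```

A migration would pass this check when all three lists in the report are empty. If the destination system renames or restructures files, the comparison would additionally need a mapping from old to new paths, which is not covered here.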

EXL

This scenario sounds like a requirement for AIP migration. There has been some recent work in this area. See this paper.

KEEPS

Watch can contribute to the solution with the following triggers:

Monitor new repository systems or new versions of existing ones

Monitor repository systems features and tools

Monitor repository systems popularity and support

Monitor operating systems

Monitor policies (policies may require functionality that is not supported by the current repository system)

Context

Ideally this should be a representative cross-section of AIPs in a repository. However, the solutions that are needed for this scenario will most likely be highly dependent on the data (and metadata) models used by the source and destination systems, as well as on the specific hard- and software infrastructures.
At the time of writing, KB is exploring making a dataset of AIPs available.

Lessons Learned

Notes on Lessons Learned from tackling this Issue that might be useful to inform the development of Future Additional Best Practices, Task 8 (SCAPE TU.WP.1 Dissemination and Promotion of Best Practices)

Training Needs

Is there a need for providing training for the Solution(s) associated with this Issue? Notes added here will provide guidance to the SCAPE TU.WP.3 Sustainability WP.

Evaluation

Describe the success criteria for solving this issue: what are you able to do? What does the world look like?

Automatic measures

What automated measures would you like the solution to give to evaluate the solution for this specific issue? Which measures are important? If possible, specify very specific measures and your goal, e.g.:

process 50 documents per second

handle 80 GB files without crashing

identify 99.5% of the content correctly

Manual assessment

Apart from the automated measures that you would like to get, do you foresee any necessary manual assessment to evaluate the solution of this issue? If possible, specify measures and your goal, e.g.:

Solution installable with basic Linux system administration skills

User interface understandable by non-developer curators