Abstract

This paper describes work undertaken by the Data Intensive Cyber Environments Center (DICE) at the University of North Carolina at Chapel Hill and the University of Liverpool on the development of an integrated preservation environment. The environment has been presented at the National Coordination Office for Networking and Information Technology Research and Development (NITRD), at the National Science Foundation, and at the European Commission. The underlying technology is based on the integrated Rule-Oriented Data System (iRODS), which implements a policy-based approach to distributed data management. By distinguishing the phases of the data life cycle according to how data management policies evolve, the infrastructure can be tuned to support data publication, data sharing, data analysis and data preservation. It is therefore possible to build generic data management infrastructure that can evolve to meet the management requirements of each user community, federal agency and academic research project. To manage the properties of the data collections, we have developed and integrated scalable digital library services that support the discovery of, and access to, material organized as a collection.

The integrated preservation environment prototype implements technologies that address a wide range of preservation requirements, from parsing legacy document formats, to enforcing preservation policies, to validating trustworthiness assessment criteria. Each capability has been demonstrated and is deployed in multiple instances, both in the United States as part of the DataNet Federation Consortium (DFC) and in Europe through several projects, primarily the FP7 SHAMAN project.