Authoritative papers on key developments and practical applications for digital preservation, viewed through the prism of the Preserv project.

Preserv 2

Towards Repository Preservation Services, Final Report to JISC

23 July 2009

Preserv 2 investigated the preservation of data in digital institutional repositories, focussing in particular on managing storage, data and file formats. Preserv 2 developed the first repository storage controller, which will be a feature of EPrints version 3.2 software (due 2009). In a breakthrough application, Preserv 2 used OAI-ORE to show how data can be moved between two repository software platforms with quite distinct data models, from an EPrints repository to a Fedora repository. The largest area of work in Preserv 2 was on file format management and an 'active' preservation approach. This involves identifying file formats, assessing the risks posed by those formats and taking action to obviate those risks where doing so can be justified. Preserv 2 developed a repository interface that presents formats by risk category. Providing risk scores through live PRONOM, a technical format registry service, was shown to be feasible. All this and more is revealed in the final report from the project.
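
The 'active' approach described above can be sketched in a few lines. Files are identified, each format gets a risk score, and the files are then grouped by risk category for an administrator. This is a minimal Python illustration under stated assumptions, not the Preserv 2 or EPrints implementation. The format labels, the 0-100 risk scale and the category thresholds are all hypothetical. A live system would obtain scores from a registry service such as PRONOM rather than from a hard-coded table. The final step, taking action such as migration, is not shown.

```python
from collections import defaultdict

# HYPOTHETICAL risk scores (0 = low risk, 100 = high risk). A real system
# would query a format registry service instead of using a fixed table.
HYPOTHETICAL_RISK_SCORES = {
    "PDF 1.4": 15,
    "TIFF 6.0": 20,
    "Microsoft Word 97-2003": 55,
    "Unknown": 90,
}

# HYPOTHETICAL category boundaries: (exclusive upper bound, label).
CATEGORIES = [(30, "low"), (70, "medium"), (101, "high")]


def categorise(score: int) -> str:
    """Map a numeric risk score to a risk category label."""
    for upper, label in CATEGORIES:
        if score < upper:
            return label
    raise ValueError(f"score out of range: {score}")


def group_by_risk(files: dict[str, str]) -> dict[str, list[str]]:
    """Group file names by the risk category of their identified format.

    `files` maps a file name to the format it was identified as. Formats
    missing from the score table are treated as 'Unknown'.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, fmt in sorted(files.items()):
        score = HYPOTHETICAL_RISK_SCORES.get(fmt, HYPOTHETICAL_RISK_SCORES["Unknown"])
        groups[categorise(score)].append(name)
    return dict(groups)


if __name__ == "__main__":
    deposit = {
        "thesis.pdf": "PDF 1.4",
        "scan.tif": "TIFF 6.0",
        "draft.doc": "Microsoft Word 97-2003",
        "data.xyz": "Mystery format",
    }
    for category, names in group_by_risk(deposit).items():
        print(category, names)
```

Treating unrecognised formats as high risk is a deliberately cautious design choice. An unidentified file is one the repository cannot yet plan for.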

Using OAI-ORE to Transform Digital Repositories into Interoperable Storage and Services Applications, Code4Lib

30 March 2009

Published in Code4Lib Journal, Issue 6.

Aim: to find a way of effectively replicating a whole IR across any repository platform.

On OAI-ORE: it provides opportunities to bind low-level objects into multiple collections that can be used by higher-level software, while avoiding the need to make copies.

On future uses of OAI-ORE: it is likely to gain momentum as different storage technologies, such as cloud storage and open storage, are used more widely.

On reconceptualising repositories: binding objects in this manner would allow the construction of a layered repository, where the core is the storage and binding and all other software and services sit on top of this layer. In this scenario, if a repository wanted to change its software, we could simply swap the software instead of migrating the objects from one software platform to another. More work may open the possibility of two repository software platforms simultaneously managing the same set of objects.
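
The idea of binding objects without copying can be made concrete with a minimal sketch. It serialises an OAI-ORE Resource Map as RDF/XML using only Python's standard library. The terms `ore:ResourceMap`, `ore:describes`, `ore:Aggregation` and `ore:aggregates` come from the ORE vocabulary. The `resource_map` helper and the example URIs are hypothetical, and this is not the serialisation produced by the Preserv 2 EPrints-to-Fedora work. An aggregation only lists URIs, so the same file can appear in several resource maps without being duplicated.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and the OAI-ORE vocabulary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("ore", ORE)


def resource_map(rem_uri: str, aggregation_uri: str, resource_uris: list[str]) -> str:
    """Serialise a minimal ORE Resource Map (RDF/XML) describing one Aggregation.

    The Aggregation only references its resources by URI, so binding an
    object into a new collection never requires copying the object itself.
    """
    root = ET.Element(f"{{{RDF}}}RDF")

    # The Resource Map, and the Aggregation it describes.
    rem = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": rem_uri})
    ET.SubElement(rem, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ORE + "ResourceMap"})
    ET.SubElement(rem, f"{{{ORE}}}describes", {f"{{{RDF}}}resource": aggregation_uri})

    # The Aggregation, listing each aggregated resource.
    agg = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": aggregation_uri})
    ET.SubElement(agg, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ORE + "Aggregation"})
    for uri in resource_uris:
        ET.SubElement(agg, f"{{{ORE}}}aggregates", {f"{{{RDF}}}resource": uri})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # HYPOTHETICAL URIs: the same PDF is bound into two different aggregations.
    shared = "http://repo.example.org/id/eprint/42/paper.pdf"
    print(resource_map("http://repo.example.org/rem/42",
                       "http://repo.example.org/agg/42", [shared]))
    print(resource_map("http://repo.example.org/rem/collection-1",
                       "http://repo.example.org/agg/collection-1",
                       [shared, "http://repo.example.org/id/eprint/43/data.csv"]))
```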

From the Desktop to the Cloud: Leveraging Hybrid Storage Architectures in your Repository, Open Repositories 2009

6 February 2009

4pp extended abstract, accepted for the Open Repositories Conference 2009 (OR09), Atlanta, May 2009 (updated 1 April).

Different types of storage services are available to repositories, from local disks to the distributed 'cloud', offering choices of scale, bandwidth and cost. Rather than adopting a single storage approach, repositories facing growing data volumes and data types are likely to choose a combination of services, or 'hybrid' storage. This short paper introduces the EPrints storage controller, which allows repositories using this software to integrate with emerging network, storage and cloud services. Three storage plug-ins are available so far for the storage controller: a local storage plug-in that also supports the legacy local disk layout, a plug-in for the Sun STK5800 server, and one for Amazon S3/CloudFront.
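
The following short Python sketch shows a storage controller that fronts several plug-ins, giving 'hybrid' storage. It is an illustration built on assumptions, not the EPrints storage controller or its API (EPrints itself is written in Perl). The `StoragePlugin` interface, the in-memory stand-ins for local disk and a cloud service, and the policy of writing every object to all configured plug-ins are all hypothetical.

```python
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """HYPOTHETICAL interface that every storage back end implements."""

    name: str

    @abstractmethod
    def store(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def retrieve(self, key: str) -> bytes: ...


class InMemoryPlugin(StoragePlugin):
    """Stand-in for a real back end such as local disk or a cloud object store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def retrieve(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if the object is absent


class StorageController:
    """Writes each object to every plug-in and reads from the first copy found."""

    def __init__(self, plugins: list[StoragePlugin]) -> None:
        self.plugins = plugins

    def store(self, key: str, data: bytes) -> list[str]:
        """Store `data` under `key` in every plug-in; return the plug-in names used."""
        for plugin in self.plugins:
            plugin.store(key, data)
        return [plugin.name for plugin in self.plugins]

    def retrieve(self, key: str) -> bytes:
        """Return the first available copy, falling back through the plug-ins in order."""
        for plugin in self.plugins:
            try:
                return plugin.retrieve(key)
            except KeyError:
                continue  # this back end lacks the object; try the next one
        raise KeyError(key)


if __name__ == "__main__":
    local, cloud = InMemoryPlugin("local"), InMemoryPlugin("cloud")
    controller = StorageController([local, cloud])
    print(controller.store("eprint/1/paper.pdf", b"%PDF-1.4 ..."))
```

Replicating to every back end is only one possible policy. A controller could equally route objects to a single back end chosen by rule.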

EPrints & Preservation, DPC/RSP/DCC/JISC Workshop, Tackling the Preservation Challenge: Practical Steps for Repository Managers, London

Preserv 2 did get to go to the pre-Christmas DPC ball, in the guise of EPrints. Some of the preservation support developed in the project can be implemented in repository software. This presentation illustrates features being added to a pre-release version of EPrints (and later to other repository software), including a storage controller and a simplified risk analysis interface aimed at repository administrators. The workshop aimed to bring together representatives of the repository and preservation communities. More from the workshop is available. You can also listen to the RSP podcast 'Digital preservation: are repositories doing enough for preservation?', in which all the speakers from the workshop describe practical tools and services to help repositories.

Preservation as a Process of a Repository (slides), Sun Preservation and Archiving Special Interest Group (PASIG) Fall Meeting 2008, Baltimore

What is a repository? What services and processes are repositories based on? We may have taken the answer for granted, but with network and 'cloud' based services emerging, repositories are changing. This presentation reappraises repository services and processes, and illustrates how and where services such as preservation can act in emerging repository architectures. Approaches for managing repository storage and file format risk analysis within this framework are presented, and an early example of an interface for managing the risk process is shown. These services have been implemented for EPrints, and will be supported in other repository software soon. More from the PASIG Fall 2008 meeting.

Smart Storage and Preservation: How Digital Repositories Can Participate (slides), DLF Fall Forum 2008, Providence, RI

There is a view that digital preservation, however daunting and complex, is a problem to be solved by each repository location. Aimed at repository and library managers, the presentation illustrates a more flexible and practical approach to technical preservation activities, such as handling format issues, based on modular services. A 'smart storage' approach is being developed that deals separately with each stage of the preservation workflow: format identification, planning and risk assessment, and action (including migration). It is based on a series of autonomous Web-based services, combining massive storage with the intelligence provided by the respective services. The presentation identifies tools and services that can contribute towards smart storage, and shows how repository managers can specify and select components based on non-technical, policy-led considerations. This presentation is also available from the Oxford Research Archive. More presentations from the Forum.
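
The separation of format identification, risk assessment and action into independent services can be sketched as a pipeline of interchangeable functions. In this minimal Python illustration, the three service functions, their return values and the policy threshold are all hypothetical placeholders for remote Web-based services. It only shows that one component can be swapped for another without touching the rest. Identifying formats by file extension is used purely for brevity. Signature-based tools such as DROID are far more reliable.

```python
from dataclasses import dataclass
from typing import Callable

# Each workflow stage is just a function, so any stage can be replaced.
Identify = Callable[[str], str]      # file name -> format label
Assess = Callable[[str], int]        # format label -> risk score
Act = Callable[[str, str], str]      # (file name, format label) -> action taken


@dataclass
class PreservationWorkflow:
    identify: Identify
    assess: Assess
    act: Act
    risk_threshold: int  # policy choice: act only at or above this score

    def run(self, filename: str) -> str:
        """Identify, assess, and act only if the risk meets the policy threshold."""
        fmt = self.identify(filename)
        score = self.assess(fmt)
        if score >= self.risk_threshold:
            return self.act(filename, fmt)
        return "no action"


# HYPOTHETICAL stand-ins for remote identification, registry and action services.
def identify_by_extension(filename: str) -> str:
    return filename.rsplit(".", 1)[-1].lower() if "." in filename else "unknown"


def toy_risk_lookup(fmt: str) -> int:
    return {"pdf": 10, "doc": 60}.get(fmt, 90)


def flag_for_migration(filename: str, fmt: str) -> str:
    return f"flag {filename} ({fmt}) for migration"


if __name__ == "__main__":
    workflow = PreservationWorkflow(identify_by_extension, toy_risk_lookup,
                                    flag_for_migration, risk_threshold=50)
    for name in ["paper.pdf", "report.doc", "README"]:
        print(name, "->", workflow.run(name))
```

The threshold is a policy setting rather than code. This mirrors the idea that managers select components and behaviour on policy-led grounds.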

OAI-ORE, PRESERV2 and Digital Preservation, Ariadne, Issue 57, 30 October 2008

ORE stands for Object Reuse and Exchange, and a first production release had just appeared when this article was published. As a result, its potential to transform digital content collections has yet to be fully appreciated. Since copying and aggregation are an intrinsic part of preservation management, ORE provides a new tool to help tackle digital preservation. The paper reveals how this approach might be applied at a whole-repository level.

Towards smart storage for repository preservation services (paper and slides), iPRES 2008: The Fifth International Conference on Preservation of Digital Objects, London

The paper examines, at a conceptual and practical level, how preservation intelligence can be built into software-based digital preservation tools and services on the Web and across the network ‘cloud’. The aim is to create ‘smart’ storage for long-term, continuous data monitoring and management. Some early examples are presented, focussing on storage management and format risk assessment. The slides are also available from the iPRES 2008 site, and the paper is in the consolidated iPRES 2008 conference proceedings (PDF 17MB). See the full iPRES conference programme and the Ariadne report on this conference.

Applying Open Storage to Institutional Repositories (slides and 2pp extended abstract), 2nd European Workshop on the Use of Digital Object Repository Systems in Digital Libraries (DORSDL2), at ECDL 2008, Aarhus

This work introduces a new ‘open’ approach to repositories: open storage combines open source software with standard hardware storage architectures. Examples include platforms provided by Sun Microsystems, which we use in this work. It describes how this approach has been allied to the OAI framework for Object Reuse and Exchange (ORE). This enables repositories managed with different software to share and copy data more easily, and to be provided with additional services such as preservation services. This presentation is also available from the DORSDL2 programme. See the D-Lib report on this workshop.

How to Make Preservation into the Repository's Friend, Repository Fringe

31 July - 1 August 2008

Slides, Repository Fringe, Edinburgh.

What's the connection between the financial crisis and digital preservation services? Hopefully none. But when the financial hurricane was blowing in the summer of 2008 (before the eye of the storm that was to follow), Preserv compared, as a topic for discussion, the architecture of banks, national libraries and digital server farms. Large institutions have typically presented a picture of reassurance and trust through their buildings (if only the banks had been as faithful to traditional financial engineering as to architectural engineering). Digital preservation is about projecting trust and reassurance, but is it doing this effectively through its architecture and public relations? More from the Repository Fringe.

Reshaping Preserv 2 from a Life(cycle) perspective (slides), JISC Digital Curation and Preservation Projects Forum, London

A brief overview of the evolution of the Preserv project, from its start through Preserv 2 to the present. It is set in the context of preservation costs and the DCC Lifecycle model, the focus of the meeting. It shows how the changes have moved the project away from a one-size-fits-all preservation service model towards a more flexible framework. That framework connects repositories requiring preservation with many preservation tools and services, in most cases through the Web.

Preservation and storage management for Institutional Repositories (slides), Repositories Support Project Summer School 2008, The Wirral

A workshop for repository managers exploring the relation between different types of data and their storage and preservation needs. The data types covered included images and audio, as well as institutional data types such as Web sites and repositories. The session included the presentation and an associated group worksheet, and also covered a number of preservation support tools.

From open storage to smart storage: enabling EPrints repository preservation (slides), Sun Preservation and Archiving Special Interest Group (PASIG) meeting, San Francisco

The presentation introduces a storage controller for EPrints repository software. The controller supports a pluggable storage layer for repositories, providing the ability to store objects in different locations based on metadata or type. It also enables direct interaction between the repository software and open storage platforms. Other presentations from this PASIG meeting are also available. See Neil Jefferies' overview of preservation projects at Oxford, including Preserv 2.
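
Storing objects in different locations based on metadata or type amounts to an ordered rule table consulted at write time. The sketch below is hypothetical throughout. The rule format, the MIME-type and size conditions, and the destination names are illustrative assumptions, not the configuration syntax of the EPrints storage controller.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    mime_pattern: str    # shell-style pattern matched against the MIME type, e.g. "video/*"
    min_size_bytes: int  # the rule applies only to objects at least this large
    destination: str     # name of a HYPOTHETICAL storage back end


# HYPOTHETICAL rules, checked in order; the first match wins.
RULES = [
    Rule("video/*", 0, "bulk-object-store"),
    Rule("*", 500 * 1024 * 1024, "bulk-object-store"),
    Rule("application/pdf", 0, "local-disk"),
]
DEFAULT_DESTINATION = "local-disk"


def choose_destination(mime_type: str, size_bytes: int, rules: list[Rule] = RULES) -> str:
    """Return the destination of the first matching rule, else the default."""
    for rule in rules:
        if fnmatchcase(mime_type, rule.mime_pattern) and size_bytes >= rule.min_size_bytes:
            return rule.destination
    return DEFAULT_DESTINATION


if __name__ == "__main__":
    print(choose_destination("video/mp4", 2_000_000))
    print(choose_destination("application/pdf", 300_000))
```

Keeping the rules as data rather than code lets a repository change where new objects go without modifying the controller itself.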

Preservation and Storage Formats for Repositories, Repositories Support Project (RSP) briefing paper

This 2pp paper for repository managers, written for RSP by Steve Hitchcock from Preserv, explains how formats affect preservation, considers which formats repositories should use for deposit and storage, and describes the practical steps repositories can take to produce an initial preservation plan. More RSP briefing papers are available.

Sun STK5800 and EPrints, Sun Webinar (combined slide show and audio, displayed in a Flash player within the browser)

An initial exploration of architectures providing Sun STK5800 (Honeycomb) large-scale, resilient storage to EPrints repositories. Possibilities include smaller repositories jointly using a single Honeycomb, and preservation service providers combining several servers into a "Honeycomb cloud". Other options include using a Sun Fire X4500 (Thumper) storage machine locally. The presentation shows how this arrangement might be combined with a Honeycomb for scientific data projects. This joint Sun-EPrints presentation is part of the Sun Digital Libraries Webinar Series.

Applying preservation metadata to repositories (slides), with an associated worksheet, Repositories Support Project (RSP) Professional Briefing event, London

Focussed on a practical exercise described in the worksheet, the workshop aimed to demystify preservation for managers of institutional repositories. It did so by linking 20 selected items from the PREMIS preservation metadata dictionary to activities that repositories perform today. A slightly amended and shortened version of this workshop was given at a later RSP Briefing in Bournemouth (29 February 2008).

EPrints and the Sun StorageTek 5800 System, A Persistent, Scalable and Interoperable Solution, Sun Microsystems White Paper

Les Carr from Preserv sketches some initial architectures and considers the practical implementation of this Sun storage system with an EPrints repository. "At the most superficial level, simply having easy access to that quantity of storage can revolutionise the use to which repositories can be put – High Definition video, large collections of high resolution images, automated experimental data collection activities that span years and decades – diverse activities of an institution’s research community can all be accommodated within the institutional repository. At a more profound level, the adoption of such an object store can help turn the repository into a kind of ‘thin client’, relieving it of the responsibility of simulating a persistent object store." This paper was prepared for distribution at the Sun PASIG meeting in Paris (November 2007).

Preserv Us! The story of the Preserv project by the people behind it (video, available in various formats, including an embedded player, high quality and podcast formats). Produced by ECS-TV, School of Electronics and Computer Science, University of Southampton

The video tells the story through commentary and personal perspectives from the main project participants.

Laying the Foundations for Repository Preservation Services, Final Report to JISC

7 March 2007

The PRESERV project (2005-2007) investigated long-term preservation for institutional repositories (IRs). It did so by identifying preservation services in conjunction with specialists, such as national libraries and archives, and by building support for those services into popular repository software, in this case EPrints. Developments described in this report include:

- the PRONOM-ROAR format profiling service for repositories;
- a rich set of preservation metadata based on PREMIS for a repository-preservation services model;
- a survey of repository preservation policy;
- important changes to EPrints repository software to support preservation.

All this and more is revealed in the final report from the project.
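
At its core, a format profile of the kind produced by PRONOM-ROAR is a tally of identified formats across a repository's files. The minimal sketch below assumes that the identification step has already been run, for example by a tool such as DROID, and that its results are available as (file, format) pairs. The sample data and format labels are hypothetical, and this is not the PRONOM-ROAR code.

```python
from collections import Counter
from typing import Iterable


def format_profile(identifications: Iterable[tuple[str, str]]) -> list[tuple[str, int, float]]:
    """Return (format, count, percentage) rows, most common format first.

    `identifications` is an iterable of (file_name, format_label) pairs.
    Formats with equal counts keep the order in which they were first seen.
    """
    counts = Counter(fmt for _, fmt in identifications)
    total = sum(counts.values())
    return [(fmt, n, round(100 * n / total, 1)) for fmt, n in counts.most_common()]


if __name__ == "__main__":
    # HYPOTHETICAL identification results for a tiny repository.
    sample = [
        ("a.pdf", "PDF 1.4"),
        ("b.pdf", "PDF 1.4"),
        ("c.doc", "Word 97-2003"),
        ("d.bin", "Unidentified"),
    ]
    for row in format_profile(sample):
        print(row)
```

Reporting percentages alongside counts makes profiles of repositories of very different sizes easier to compare.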

Survey of Repository Preservation Policy and Activity, Preserv paper

The survey put a series of objective, practical questions to selected repositories. These were among the largest content providers identified by the Registry of Open Access Repositories (ROAR), and were amenable to format profiling using the PRONOM-DROID service from the National Archives of the UK. The aim of the survey is to inform the investigation of preservation services for repositories. The profiles provide a benchmark against which to assess the preliminary preservation requirements of these repositories. Preservation service providers need to know the scale and shape of the task facing them, and this survey will help them understand repositories and construct appropriate services. The results were revealing:

No repositories surveyed had a formal preservation policy

Preservation policy is being preceded by de facto policies on file formats and transformations without provision for acquiring source versions

Preservation Metadata for Institutional Repositories: applying PREMIS, Preserv paper

Currently, the authoritative reference on preservation metadata is the PREMIS Data Dictionary (2005). This analysis attempts to map the five entity types identified in the PREMIS Data Dictionary (intellectual entities, objects, events, agents and rights) to potential metadata sources identified in an IR-preservation service provider model. These sources are:

- the author/IR submitter (via the repository deposit interface);
- the IR software (in this case EPrints);
- associated tools (in this case the file format identification tool PRONOM-DROID);
- IR policy;
- preservation service providers.

An additional source of metadata is environment registries. The interim findings are that PREMIS appears to provide an excellent basis on which to assess the needs of IRs with respect to preservation metadata. They also show that it is possible to map the PREMIS elements to an extended model incorporating preservation services and registries. Preliminary evidence shows that most data can be provided by the sources identified, although some elements may need to be adapted or omitted. More implementation and testing are required.
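
The sketch below illustrates how a single PREMIS entity can draw on more than one of the sources listed above. It assembles a record for an Event entity. The object identifier would come from the repository software, while the agent and the outcome come from the format identification tool. The field names follow PREMIS Event semantic units (eventIdentifier, eventType, eventDateTime, eventOutcome, linkingAgentIdentifier, linkingObjectIdentifier). The identifier values, the 'format identification' event-type string and the flat record layout are simplifying assumptions, not a schema-valid PREMIS serialisation.

```python
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class PremisEvent:
    """Simplified, flat record whose field names mirror PREMIS Event semantic units.

    camelCase names are kept deliberately so they match the PREMIS terms.
    """
    eventIdentifier: str
    eventType: str
    eventDateTime: str
    eventOutcome: str
    linkingAgentIdentifier: str
    linkingObjectIdentifier: str


def format_identification_event(object_id: str, tool: str, result: str) -> PremisEvent:
    """Combine repository-supplied (object_id) and tool-supplied (tool, result) metadata.

    All identifier values passed in are HYPOTHETICAL examples.
    """
    return PremisEvent(
        eventIdentifier=f"urn:uuid:{uuid.uuid4()}",
        eventType="format identification",
        eventDateTime=datetime.now(timezone.utc).isoformat(),
        eventOutcome=result,
        linkingAgentIdentifier=tool,
        linkingObjectIdentifier=object_id,
    )


if __name__ == "__main__":
    event = format_identification_event("eprint:42/paper.pdf", "DROID (example agent)", "PDF 1.4")
    print(asdict(event))
```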

Repository Models and Policies for Preservation (slides), DPC Briefing on Policies for Digital Repositories: models and approaches, London

An update on Preserv models, the PRONOM-ROAR format profiling implementation, and preliminary results from a survey of repository policy. More presentations from the DPC Briefing Day.

PRESERV: preservation services for institutional repositories (poster). Joint US-UK (NDIIPP-JISC) Digital Preservation Workshop, Washington, DC, USA

Starting from the original preservation service provider schematic, the poster develops a hierarchical range of OAIS-based models for investigation. It also illustrates, for the first time, the use of PRONOM to provide format profiles for IRs covered in the Registry of Open Access Repositories (ROAR).

Preservation Metadata for Institutional Repositories, Preserv draft paper

Begins an investigation into supporting preservation metadata within the practical, real and growing content of IRs, with reference to a number of OAIS-based models. This draft paper was later split into two papers, on:

Capturing preservation metadata from institutional repositories (slides), DCC Workshop on the Long-term Curation within Digital Repositories, Cambridge, UK

The presentation builds a graphical picture of Preserv and distributed preservation services for IRs. It then asks how this model fits with two key preservation references: the OAIS model and the PREMIS initiative on preservation metadata. More from the workshop.