The author is a Forbes contributor. The opinions expressed are those of the writer.

Adam Zurek is head of the department of Scientific Documentation of Cultural Heritage at the Wroclaw University Library in Poland. The Wroclaw University Library is in the process of digitizing about 800,000 pages of rare and often ancient European manuscripts, books and maps dating back to the Middle Ages, material that was rarely accessible, especially to the general public, until this project began. In all, 1,100 medieval manuscripts were digitized, together with old prints, maps and music.

Material in this library includes works by Martin Luther, Cervantes and Shakespeare, as well as rare maps and liturgical texts. The effort is partially funded by the European Regional Development Fund, with the goal of creating the largest digital archive of medieval manuscripts and ancient geographical atlases in Poland.

The digitized contents are going into a 300 TB storage system built from IBM System x servers, IBM Storwize V7000 Unified Disk Systems and IBM System Storage SAN24B-4 Fibre Channel switches. The system helps collect, preserve, store, manage and index text, images, audio and video as part of the digitization project. It automatically and non-disruptively places frequently accessed data where it can be served quickly, allowing readers to open, view, retrieve and explore the library documents online without long waits.
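
The article does not describe how the IBM system decides which data counts as "frequently accessed," so the following is only a rough illustration of the general idea behind access-frequency tiering: count reads per object and keep the most-read objects on a faster tier. It is not IBM's algorithm. The function `plan_tiers`, the object IDs, the read log and the capacity measured in objects are all hypothetical.

```python
from collections import Counter


def plan_tiers(all_objects, access_log, fast_capacity):
    """Split objects into (fast, slow) sets by read frequency.

    all_objects: every object ID in the archive.
    access_log: iterable of object IDs, one entry per read request.
    fast_capacity: how many objects fit on the fast tier (hypothetical unit).
    """
    counts = Counter(access_log)  # Counter returns 0 for never-read objects
    # Most-read first; ties broken by ID so the plan is deterministic.
    ranked = sorted(all_objects, key=lambda obj: (-counts[obj], obj))
    return set(ranked[:fast_capacity]), set(ranked[fast_capacity:])


# Usage with made-up object IDs and a made-up read log.
objects = ["atlas_1650", "luther_bible", "map_1500", "psalter_12c"]
reads = ["luther_bible", "map_1500", "luther_bible",
         "atlas_1650", "luther_bible", "map_1500"]
fast, slow = plan_tiers(objects, reads, fast_capacity=2)
print(sorted(fast))  # ['luther_bible', 'map_1500']
print(sorted(slow))  # ['atlas_1650', 'psalter_12c']
```

A production tiering engine would typically work on storage extents rather than whole documents and would re-plan periodically as reading patterns shift; this sketch makes a single pass over a fixed log.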

The Polish library's digitization project is only one of many efforts to digitize old analog content and make it available for digital access and use. These efforts are taking place all over the world and range from ancient works such as the Dead Sea Scrolls, accessible at http://www.deadseascrolls.org.il, to the conversion of more modern audio and video collections into digital formats. The Library of Congress, motion picture studios and other content repositories have made digital copies of older content available for digital access.

These efforts support both free and paid access to past entertainment, news and historical events, as well as general cultural information. The creation and preservation of these digital troves, and the resulting access to them, allow us to better understand where we came from and what it was like to be alive in other times and other lands. Our connected society offers greater access to this information than ever existed in the past, but it has also created vulnerabilities that could threaten the longevity and future use of digital content, and with it our access to our growing digital legacy.

The issues facing long-term digital content retention include the limited shelf life of many digital storage devices. For instance, whereas silver halide film can remain readable after 100 years, digital tape may last only 30 years. Some technologies might extend the life of physical media, such as M-Disc optical media, which may physically store data for hundreds of years. But even before digital media decays, the formats and applications used to create and view digitized content can become obsolete. At the current rate of change in digital technology, reading even 15-year-old content can be challenging. We need better ways to future-proof stored content, as well as to make sure the bits themselves can still be accessed in the future.
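
Making sure the bits are still intact is commonly handled with fixity checking: record a checksum when content is archived, then recompute it periodically to catch silent corruption before the last good copy is lost. The sketch below uses only Python's standard `hashlib`; the manifest layout (a dict from file path to SHA-256 digest), the helper names and the file name are hypothetical choices for illustration.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest):
    """Return paths that are missing or whose digest no longer matches.

    manifest: hypothetical dict mapping file path -> SHA-256 hex digest
    recorded when the content was archived.
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failures.append(path)
    return failures


# Usage: record a digest at ingest, then simulate corruption and re-check.
with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "page_0001.tif")  # hypothetical file name
    with open(page, "wb") as f:
        f.write(b"scanned page bytes")
    manifest = {page: sha256_of(page)}
    ok_before = verify_fixity(manifest)      # -> []
    with open(page, "ab") as f:
        f.write(b"\x00")                     # one extra byte stands in for bit rot
    failed_after = verify_fixity(manifest)   # -> [page]
```

A fixity check only detects damage to the bits; it does nothing about format obsolescence, which is the harder problem discussed next.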

This is an issue that the Storage Networking Industry Association (SNIA) Long Term Retention Committee (see http://www.snia.org/ltr) is addressing. At the upcoming Storage Developer Conference on September 16-19, the group will discuss methods for the long-term retention of content that get around format obsolescence. The group is working on long-term retention technologies as well as on uses for the Linear Tape File System (LTFS) and for archive storage in the cloud.

The committee calls this long-term retention technology the Self-Contained Information Retention Format (SIRF), and it has been developing and promoting the format. SIRF defines a logical container for a set of digital preservation objects and a catalog. The catalog contains metadata about the entire contents of the container as well as about the individual objects. This self-describing, self-contained and extensible format should allow digital content to be reconstructed decades or more after it is archived.
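
To make the container-plus-catalog idea more concrete, here is a toy, in-memory sketch. It is not the SNIA SIRF specification, whose actual catalog layout should be taken from the standard itself. The class names and fields (`container_id`, `mime_type`, `size_bytes`, `sha256`) are hypothetical, chosen only to show a catalog that describes both the container as a whole and each object inside it, written as plain JSON so a future reader could interpret it without the software that produced it.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class PreservationObject:
    name: str
    data: bytes
    mime_type: str  # format hint a future reader needs to interpret the bytes


@dataclass
class Container:
    container_id: str
    description: str
    objects: list = field(default_factory=list)

    def add(self, obj):
        self.objects.append(obj)

    def catalog(self):
        """Container-level metadata plus one metadata record per object."""
        return {
            "container_id": self.container_id,
            "description": self.description,
            "object_count": len(self.objects),
            "objects": [
                {
                    "name": o.name,
                    "mime_type": o.mime_type,
                    "size_bytes": len(o.data),
                    # A digest lets a future reader confirm the bytes are intact.
                    "sha256": hashlib.sha256(o.data).hexdigest(),
                }
                for o in self.objects
            ],
        }

    def catalog_json(self):
        # Plain, sorted JSON keeps the catalog human-readable without special tools.
        return json.dumps(self.catalog(), indent=2, sort_keys=True)


# Usage with a hypothetical container ID and placeholder content.
box = Container("example-0001", "Digitized manuscript (illustrative)")
box.add(PreservationObject("folio_001r.tif", b"placeholder image bytes", "image/tiff"))
box.add(PreservationObject("transcription.txt", "Incipit".encode("utf-8"),
                           "text/plain; charset=utf-8"))
print(box.catalog_json())
```

Recording format information and checksums in the catalog, rather than relying on file extensions or external databases, is what lets the container describe itself on its own.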

We still have a way to go before digital preservation can retain recoverable information for hundreds or even thousands of years, as older analog retention methods could, but people are taking important steps to carry our digital content into the far future. We need to do this; otherwise, the record of our digital civilization could be lost to future generations, and even the digitized copies of ancient analog archives could be at risk.