The European Commission has awarded £6M [$9.84 million USD] to archiving and digital preservation specialists to create E-ARK [European Archival Records and Knowledge Preservation], a method of archiving data that is set to become the gold standard across Europe. The system will ensure that current digital archives, including ‘big data,’ are future-proofed. (Big data refers to data sets so large that they are difficult to manage with traditional software and databases.)

E-ARK will pilot an end-to-end, OAIS-compliant e-archival service covering ingest, vendor-neutral archiving, and reuse of both structured and unstructured data (that is, databases as well as records), addressing the needs of data subjects, owners and users. The pilot and methodology will also focus on the essential pre-ingest phase of data export and normalisation in source systems. The pilot will integrate tools currently in use in partner organisations and provide a framework for providers of these and similar tools, ensuring compatibility and interoperability. A core component of the project is the integration platform, which uses the existing ESSArch Preservation Platform (EPP) application as its Archival Information System; EPP is already in production at the National Archives of Norway and Sweden. To achieve scalability, E-ARK will adopt a data management and storage layer for this tool on top of the proven open-source Cloudera CDH4 distribution of Apache Hadoop, enabling storage and computational power to be seamlessly added to the system.
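For readers unfamiliar with OAIS, the stages named above (pre-ingest, ingest, archiving, reuse) can be pictured as a chain of information packages: a Submission Information Package (SIP) is handed to the archive, stored as an Archival Information Package (AIP), and served to users as a Dissemination Information Package (DIP). The Python sketch below is only an illustration of that flow under simplifying assumptions. The class, function, and field names are hypothetical, and it does not reflect the E-ARK specifications or the ESSArch or Hadoop APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from hashlib import sha256


@dataclass
class Package:
    """A generic OAIS information package (hypothetical, simplified structure)."""
    kind: str                                   # "SIP", "AIP", or "DIP"
    files: dict[str, bytes]                     # file name -> content
    metadata: dict[str, str] = field(default_factory=dict)


def pre_ingest(exported: dict[str, bytes]) -> Package:
    """Pre-ingest: normalise exported file names and wrap the files as a SIP."""
    normalised = {name.strip().lower(): data for name, data in exported.items()}
    return Package("SIP", normalised)


def ingest(sip: Package) -> Package:
    """Ingest: turn a SIP into an AIP, recording a SHA-256 checksum per file."""
    if sip.kind != "SIP":
        raise ValueError("ingest expects a SIP")
    checksums = {
        f"sha256:{name}": sha256(data).hexdigest()
        for name, data in sip.files.items()
    }
    return Package("AIP", dict(sip.files), checksums)


def access(aip: Package, wanted: list[str]) -> Package:
    """Reuse: build a DIP from the requested files after verifying checksums."""
    if aip.kind != "AIP":
        raise ValueError("access expects an AIP")
    selected = {}
    for name in wanted:
        data = aip.files[name]
        if sha256(data).hexdigest() != aip.metadata[f"sha256:{name}"]:
            raise ValueError(f"checksum mismatch for {name}")
        selected[name] = data
    return Package("DIP", selected)


if __name__ == "__main__":
    # Assumed sample export from a source system (made-up data).
    sip = pre_ingest({" Report.CSV ": b"a,b\n1,2\n"})
    aip = ingest(sip)
    dip = access(aip, ["report.csv"])
    print(dip.kind, sorted(dip.files))
```

A real archive would add descriptive and preservation metadata, format migration, and durable storage; the sketch keeps only the package hand-offs and a basic fixity check.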

The project will spend three years creating a standard archival process at a pan-European level, supported by guidelines and recommended practices that will cater for a range of data from different types of sources, including records management systems and databases.

The University’s Dr. Janet Delve and Professor David Anderson described the undertaking as “mammoth” and said the problem is becoming larger by the day.

“The size of the problem is huge. We are looking at years of accumulated data across almost 30 countries that have been stored using a variety of different methods and on different systems,” said Dr. Delve. “With the onset of e-government and open data initiatives, archives now have to cope with storing huge amounts of digital material. The size of the problem is growing because of the colossal quantity of electronic data generated on a daily basis from organisations as diverse as banks, public health organisations and national archives.”

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.