Protecting the Future of Data Archives

Over at Enterprise Storage Forum, Henry Newman from Instrumental writes that while the word “archive” is synonymous with tape for many people, it is ultimately the software that ensures your data can still be accessed after it is archived.

A complete examination requires looking at everything from interfaces to archive formats. No matter what anyone tells you, there is data that does not need to be on primary storage, and with the exponential growth of data, some of which might not be used for years, there is a need for archiving data—and for making sure that you’ll be able to access it and use it long after formats and interfaces have changed. In the future, requirements for archive systems are going to have to deal with the following issues:

End-to-end data integrity: Sooner or later, the movement of data around the data center or around the world is going to have to address end-to-end integrity. This kind of information is going to have to be immutable and live with the object for its life. It is also going to have to be validated at access. We need a standards-based framework to do this and therefore a standards body to work this out.
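The validate-at-access idea above can be sketched in a few lines. This is a minimal illustration, not a real archive system: the in-memory `store` dictionary and the function names are hypothetical, and SHA-256 stands in for whatever fingerprint a standards-based framework would eventually mandate. The key point is that the fingerprint is recorded once, travels with the object, and is checked every time the object is read.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Integrity record that lives with the object: a SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()

def archive(store: dict, name: str, data: bytes) -> None:
    # Seal the object at ingest; the digest is immutable for its life.
    store[name] = {"data": data, "sha256": fingerprint(data)}

def retrieve(store: dict, name: str) -> bytes:
    obj = store[name]
    # Validate at access: refuse to return silently corrupted data.
    if fingerprint(obj["data"]) != obj["sha256"]:
        raise IOError(f"integrity check failed for {name!r}")
    return obj["data"]
```

A real end-to-end scheme would also verify the digest at every hop (network transfer, media migration), not just at final read.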

Security: This includes far more than UNIX user and group permissions and deals with things like the mandatory access controls that exist in SELinux. Equally important is auditing what happens with each user and each activity, including file access. All we have to do is look at the huge number of security breaches to know why this needs to be done as soon as possible.
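The auditing requirement, stripped to its essence, is that every access is attributed to a user and recorded before data is served. The sketch below is an assumption-laden illustration (the logger name, function, and log fields are all made up for this example); production systems would use the OS audit facility (e.g. Linux auditd alongside SELinux) rather than application logging.

```python
import logging
from datetime import datetime, timezone

# Hypothetical audit channel; a real system would write to tamper-evident storage.
audit_log = logging.getLogger("archive.audit")
logging.basicConfig(level=logging.INFO)

def audited_read(path: str, user: str) -> bytes:
    """Record who touched which object, and when, before serving it."""
    audit_log.info("time=%s user=%s action=read object=%s",
                   datetime.now(timezone.utc).isoformat(), user, path)
    with open(path, "rb") as f:
        return f.read()
```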

Format migration: How do you migrate formats as technology changes? There needs to be agreement and understanding that you cannot keep objects in the same digital format for decades, much less thousands of years. And there needs to be agreement on how objects can and should be changed and how it all relates to integrity management and security management.
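One way to tie migration to the integrity and provenance requirements is to make every format conversion re-seal the object and append a provenance record, so the chain of changes is auditable decades later. The sketch below is illustrative only; the object schema and field names are assumptions, not an established format.

```python
import hashlib

def migrate(obj: dict, converter, new_format: str) -> dict:
    """Convert an archived object's payload, re-seal its integrity,
    and append a provenance record of the change."""
    new_data = converter(obj["data"])
    return {
        "data": new_data,
        "format": new_format,
        "sha256": hashlib.sha256(new_data).hexdigest(),
        # Provenance: what the object was, and its previous fingerprint,
        # so the migration itself can be audited later.
        "provenance": obj.get("provenance", []) + [
            {"from": obj["format"], "to": new_format,
             "old_sha256": obj["sha256"]},
        ],
    }
```

A usage example: migrating a CSV payload to JSON would call `migrate(obj, csv_to_json, "json")` with a converter function supplied by the archive operator.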

Secondary media formats: If these formats are used for disaster recovery on secondary media, then they have to support everything from data integrity to security and even, potentially, the provenance of the object. If they are going to be used to restore in the event of a disaster, how can you trust the integrity of the data unless you have all of that information?
