‘Digital Dark Age May Doom Some Data’

What stands a better chance of surviving 50 years from now, a framed photograph or a 10-megabyte digital photo file on your computer’s hard drive? The concern for archivists and information scientists is that, with ever-shifting platforms and file formats, much of the data we produce today could eventually fall into a black hole of inaccessibility.


As long as the file specification is open source, the most important file types will be usable in the future. One might consider storing a few virtual machines with legacy applications to open such older files.

It may be solvable on a personal level if you stick to it, but not in general. Where are you going to find legacy applications in the future? Or even guarantee there is a virtual machine to run them?

Anyway, this is already happening now. I have a few examples, related to messaging. One is a huge archive of instant messages stored by Miranda in one large file nobody understands (last time I checked there was no export feature); another is an SMS archive stored on the iPhone (it seems to use the SQLite format, but that isn’t helping much); another is an RTF file generated by yet another program that is only readable on Windows… Time passes and you inevitably give up, because there’s just no time to go back to installing virtual machines or trying to make existing software convert the old formats to newer ones. Sadly, he may be right.
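For what it’s worth, an SQLite container is actually one of the friendlier cases: even with no documentation of the schema, a few lines of Python can enumerate the tables and peek at rows. A sketch (nothing iPhone-specific is assumed; it just reads whatever tables the file happens to contain):

```python
import sqlite3

def dump_unknown_sqlite(path):
    """List every table in an unknown SQLite file and show a few rows."""
    conn = sqlite3.connect(path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        print(f"-- {table} --")
        for row in conn.execute(f'SELECT * FROM "{table}" LIMIT 5'):
            print(row)
    conn.close()
    return tables
```

That won’t tell you what every column means, but it gets the raw data out of the proprietary application’s grip, which is half the battle.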

Yep, but moving to open data formats and standards when archiving and preserving important documents is still a huge step, and among the most important steps forward in solving the problems mentioned. Proprietary, locked-down and obsolete data formats are by far the biggest part of the whole problem.

Like the article says, many archives and national governments have moved from proprietary data formats to open data formats for this exact reason.

But, of course, CDs and magnetic tapes may still degrade over time, etc.

You’re so right, kroc. Even a very recent format can be unusable: an archive from Windows XP’s built-in backup tool, for example, can’t be opened in Vista. If a format can’t even survive one generation, then indeed we’re looking at a black hole.

Isn’t this basically why the ODF project exists? AFAIK it only covers office-suite documents, but is it their goal (at least eventually) to include all kinds of data, like email, IMs, even backup archives? Or is there perhaps another project with those goals?

As long as the file specification is open source, the most important file types will be usable in the future.

Indeed, that does solve the file format problem, but that’s not the only problem.

One might consider to store a few virtual machines with legacy applications to open such older files.

Problem solved.

Clever idea, but you still sit with a “what medium” issue. For example, let’s say we use VMs for the legacy applications, and thousands of SATA hard drives to store all the data.

Who says we will be able to read data from a SATA drive in 50 years? I have stacks of 5.25″ floppies in my garage. The data is there, but ten years down the line I no longer have the hardware to read it. We could have a similar problem in 50 years with SATA hard drives.

The other problem is failing hardware. You store valuable data on a 1-terabyte drive for future use. The drive goes faulty, and you lose an unbelievable amount of data in one go. Books and other printed material don’t have that vulnerability.

The other problem is failing hardware. You store valuable data on a 1-terabyte drive for future use. The drive goes faulty, and you lose an unbelievable amount of data in one go. Books and other printed material don’t have that vulnerability.

What about fire? Or any other kind of natural disaster? Obviously if your house burns down your drive will burn too, but it’s much easier to back up all of your data onto new drives than it is to “back up” your books and other physical documents.

Maybe the problem is solved. Let’s take a file from several generations ago: say a book report originally done on a 5.25″ floppy with a Commodore 64 or Apple IIc computer.

My old floppies actually still work and are even still writable, so the argument that the medium isn’t trustworthy doesn’t fly here. So let’s assume that I no longer have the machines, nor a hard-copy printout to just type it all over again like a scribe back in the day. In this scenario, how would a person go about getting the data off the 5.25″ floppy so it can be used again, or at least viewed, printed, or exported?

Off the top of my head I think one would need:

a) a 5.25 floppy drive

b) software to read it, which would probably be written in Pascal; I think that’s what those machines used back then.

c) would an emulator work? I think those mostly just run ROM images, so they are kind of “dumb,” for lack of a better term.

I think that is a realistic example for this general problem we are discussing.

For really important historical documents, perhaps we should, for the reason you have given. One would imagine that the Flintstones had a laser printer, and that design might be a good place to start for designing modern stone tablet printing technologies. However, I suspect that it may only have had a small dinosaur inside, likely now extinct, chipping the letters out with its teeth.

One would imagine that the Flintstones had a laser printer, and that design might be a good place to start for designing modern stone tablet printing technologies. However, I suspect that it may only have had a small dinosaur inside, likely now extinct, chipping the letters out with its teeth.

Actually, the tech was a little more sophisticated than that. It utilized a small but highly trained prehistoric bird using a hammer and chisel.

It can be a similar problem with proprietary data formats after, say, a century.

To be honest, I think it’s much ado about nothing. If there’s a need to read these formats, someone will find a way. I think we can reasonably presume that the internet and its archived content will still exist in some way or another in 20 to 50 years. Beyond that, let’s say that technology will probably solve it.

To paraphrase someone else in this thread, it’s not like the Vikings sat around wondering “gee, I wonder if this runestone will last thousands of years. And what if no one understands my writing?”

The problem is not the formats but how to store our data for such extended periods of time. If you can store it, you can always store the instructions for the formats with the data.

There are problems we need to worry about NOW and there are those that we don’t. This is one of the don’ts.

it’s not like the Vikings sat around wondering “gee, I wonder if this runestone will last thousands of years. And what if no one understands my writing?”

That only suggests that you may not be too interested in history.

We would know much less about Viking history if we didn’t have any original Viking documents left. They, for example, wrote about their travels to America, which made many historians interested in researching the subject, and now we know, from those documents and from archeological findings together, that Vikings really did travel to America before Christopher Columbus.

As long as we have a way to actually store our data for that long, the format is of lesser importance.

If you really want to preserve stuff, both sides of the coin are important, of course. Like the other commenter said: “dismissing the file format issue with a ‘what could possibly go wrong’ is irresponsible”.

I think we can reasonably presume that the internet and its archived content will still exist in some way or another in 20 to 50 years. Beyond that, let’s say that technology will probably solve it.

Probably? But how? That is what the article tries to ask.

If you can store it you can always store the instructions for the formats with the data.

You can, but is it always possible? If you use open standards and formats, that is easy. If you use proprietary closed formats, it can get much trickier, especially if companies don’t want to fully open those instructions and details of their proprietary formats for others to use too.
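As a sketch of what “storing the instructions with the data” could look like in practice, here is a toy self-describing container: the payload travels together with a plain-text description of its own format, so a future reader has the decoding instructions alongside the bytes. The field names are invented purely for illustration:

```python
import json
import base64

def wrap_with_spec(payload: bytes, format_name: str, spec_text: str) -> str:
    """Bundle raw data with a human-readable description of its format."""
    return json.dumps({
        "format": format_name,
        "format_spec": spec_text,  # the decoding instructions, in prose
        "data": base64.b64encode(payload).decode("ascii"),
    }, indent=2)

def unwrap(container: str):
    """Recover the payload bytes and the format description."""
    doc = json.loads(container)
    return base64.b64decode(doc["data"]), doc["format_spec"]
```

With a closed proprietary format, of course, you can’t write the `format_spec` in the first place, which is exactly the point above.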

Preserving their writings for thousands of years wasn’t their concern and that’s not why they wrote

That’s mostly nonsense, again suggesting that you know and care little about history. This goes way off the original topic, but anyway:

In old and ancient times, including the Viking age, when writing was not such an everyday thing as it is today, it was often very much the intention that the few written documents of those times, like chronicles of history or books containing religious myths, would be carefully preserved and/or copied, for decades and centuries. If the Viking people had not been interested in preserving their sagas for future generations, also in written form, we would probably not have them left to read nowadays.

People shouldn’t be shortsighted and undervalue the importance of preserving historical documents. Most of our culture (and, a note to one-track tech geeks: that includes technology) is based on history and historical achievements. Because of historical documents we don’t have to trust false claims about important historical events; such things often matter for political decision making too, for example. Actually, we are human beings largely because of our culture and history.

it’s not like the vikings sat around wondering “geee, i wonder if this runestone will last thousands of years. And what if noone understands my writing?”

I don’t know about Vikings. But ancient Egyptians thought a great deal about this kind of thing. At least about the medium. Although they had a goal of transmitting certain information and stories on forever, I’m not sure that they appreciated the format problem. Egyptians, living in a time in which change was quite gradual by our standards, tended to think of their world and culture as being ever-unchanging. They expected that in a thousand years their descendants would live in the same world as they did. That people would come and go, but that the world they knew would remain forever.

The fabric of our world, in contrast, is ever changing. What world will we leave behind when our days are ended? And what kinds of worlds might follow? One of those possible worlds is post nuclear holocaust. Another is post asteroid strike. Another is post-plague or post bio-weapon. All that we take for granted is very fragile indeed. It is possible, and perhaps even likely, that the human race may have to pull itself up by its bootstraps one day, and if that were to happen, it would likely happen with little to no warning. If we care about those future generations (and that is not a rhetorical question) then we should remain prepared at all times to leave the human race the best chance of recovering knowledge quickly. This involves every one of the many links in the chain that allows us to bring up Google, type in a term, hit “I’m feeling lucky” and read/play/render/hear the information we want.

The point of all this? I guess that this is potentially about more than whether we’ll be able to listen to the Eurythmics or view our family photos 15 years hence. This could potentially affect how long the human race remains trapped, suffering, in the next dark age. The human race is yet too shortsighted for us to dismiss that scenario out of hand.

This can be a problem in the short term, say 20–30 years, but not in the long term. If in 200–300 years some archaeologist wants to read data from today (assuming, of course, that the medium, like a disc or CD, survives), it will be no problem at all. It won’t matter whether the specification for the file is open or completely lost; the power of computers will be so great that they will “break” the code in minutes.

It is like with Egyptian hieroglyphs. Nobody used that writing for thousands of years, but we are able to read it now.

The 20th century will be a “dark age”. Right at the beginning of the century we changed the technology for making paper, and it is practically certain that all books printed in the 20th century (and now) will completely fall apart after 100–200 years.

You ever hear of the Rosetta Stone? The only reason we are able to read that language is because we found a common text across scripts that helped us decipher it. Otherwise we still wouldn’t be able to read it. Information loss is a real and long-standing problem. The Romans had running water and roads the likes of which weren’t built again until well past 1800.

You are totally off the mark. No language can be translated simply by raw processing power. This is why Navajo speakers were used for top-secret radio communications in WW2. The Welsh Guards also communicated by radio in Welsh in Bosnia, because they knew the Serbs couldn’t understand Welsh.

The Egyptian hieroglyphs were only translated because the Rosetta Stone was discovered in the early 1800s. It carried exactly the same passage written in three scripts (two of which were well known), including the hieroglyphs. It was then realised that the hieroglyphs were a written form of the language that survives as Coptic, still spoken in Egypt. This eventually made further translations possible.

I really do believe that the digital storage issue is going to be a big problem in years to come, especially for the majority of the computer- and digital-camera-owning public, who aren’t as proactive as us.

The biggest problem I see is redundancy in storage. I mean, ffs: I backed up all 30GB of my photos from one PC to another over the weekend, and the source PC’s hard drive decided to die whilst copying!

Lucky I mirror that particular directory on a *third* hard drive on another PC I have.

I see JPEG still being readable many, many years from now… Reading ancient formats won’t be the problem… Getting the digital data to even LAST that long is where the issue will be.

IMHO the safest photo is one which has been printed to paper! By the time the oceans rise due to global warming, flood my garage and destroy my prized lifetime of photos – I probably would have had a few dozen hard drives die on me 🙂

I seriously doubt that people who have taken digital photos and kept them by the usual means (on their PC) will still have them in 50 years’ time, unlike my parents and grandparents, who still have theirs.

I remember when we were told that CDs would last all our lives. Well, I rescued some of the first CDs I burned 10 years ago, and they had become almost transparent; obviously they couldn’t be read, and I had to throw them away. And I was a bit angry, because they contained some music I had downloaded via dial-up and wanted to listen to again.

I feel hard disks are much safer, but I don’t know if they will last longer than 10 years or so.

For text-based data I always keep in mind RTF files. I have some papers from 20 years ago that I can open in OpenOffice. I like XML files for data too, since you can always create a script to parse XML elements regardless of whether a utility exists. Binary data is where you can get in trouble. I have an old, old Photoshop version 1 file that I have trouble opening. Using open standards like PNG or ODF at least allows you to write an application to read your files. MS Word 97 format is so convoluted and difficult to read that its only benefit is that Office is still around to read it.
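The point about XML being parseable without the original application can be illustrated with nothing but Python’s standard library; this walks an arbitrary document and recovers its text content without knowing anything about the schema:

```python
import xml.etree.ElementTree as ET

def extract_text(xml_string: str) -> list[str]:
    """Walk every element of an arbitrary XML document and collect
    its text content; no knowledge of the schema is required."""
    root = ET.fromstring(xml_string)
    return [elem.text.strip() for elem in root.iter()
            if elem.text and elem.text.strip()]
```

Even if all the tag names were meaningless to a future reader, the human-readable payload survives, which is exactly what a binary format doesn’t give you.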

And how is a file system supposed to solve ANY of the problems discussed…? ZFS doesn’t magically make old, proprietary, undocumented file formats readable, nor does it make worn out media readable.

If you store your data on multiple hard drives with a suitable redundancy scheme, the risk of losing data due to media corruption is dramatically reduced. One straightforward way of doing this is, I think, RAID-Z on ZFS. My comment was in response to worries about media corruption (as in termites eating punch cards). I was not addressing other concerns, like the use of proprietary formats and protocols.
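RAID-Z’s actual on-disk layout is more sophisticated than this, but the core idea of surviving a dead drive via parity can be shown with a toy XOR scheme (purely illustrative, not how ZFS implements it):

```python
def make_parity(blocks):
    """XOR equal-sized data blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover(blocks, parity):
    """Rebuild the single missing block (marked None) from the survivors,
    by XOR-ing the parity with every block that is still readable."""
    missing = bytearray(parity)
    for block in blocks:
        if block is not None:
            for i, byte in enumerate(block):
                missing[i] ^= byte
    return bytes(missing)
```

Lose any one block and the XOR of the parity with the survivors gives it back; lose two and you’re out of luck, which is why real schemes add more parity.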

I think this is probably going to be solved in one way with the rise of cloud services, as more and more of our data will be stored remotely on larger servers.

However, this is a worry: is there any digital medium which can withstand the effects of ageing? HDDs are incredibly volatile for archive purposes, being affected by magnetic fields and by plain decay.

CDs also lose their readability after only a few years. This can of course be lengthened if the CDs are kept in a dark, stable environment.

And as mentioned by the article, there is the big problem of legacy software. Concepts such as Adobe’s DNG (digital negative) are a great idea, but I’ve found very few manufacturers willing to adopt the format, thus leaving you to do a manual conversion. This of course leads us to the RAW format, which has 1001 variations depending on manufacturer and camera. It is quite worrying.

So far the only formats I can see sticking around for a while longer are PDF, ODF, DOC/XLS/PPT (I’ve not included the new Office 2007 formats, as I still think they have to prove themselves in the workplace; so currently we are left with millions of MS Office document files, which of course suffer compatibility problems between versions, i.e. tables misalign every now and again), JPG and BMP. Of course there are many others, but these are the first to come to mind.

I suppose we will just have to wait for a really reliable method of data storage and archiving. In the meantime I use the method of scattering data over a variety of devices and media. So for my photos, I have them stored on my main server; every time a set of photos is uploaded, I back them up to CD-R/DVD-R and then take the backup to a remote disk-based server.

I don’t think that data will become inaccessible. Recently I bought a transfer cable which allows me to transfer data from my C64 to my PC. So I can still reuse everything that I created a long time ago on that machine. The formats may be entirely different, but I can still use the old software to make the data readable. And I could also write some little converter that converts it into something modern.

As long as there are enough hobbyists who create cables or solutions like that, everything will be completely fine. There are converters, adapters, cables, emulators, and with that, everything should be recoverable.

And if it’s done soon enough, it even solves the problem of unreliable storage media.

And if it’s done soon enough, it even solves the problem of unreliable storage media.

The problem is active maintenance of data. Keeping data I consider important around isn’t too hard; the problem is the stuff I consider unimportant. If I chuck a couple of CDs with photos on them in a box and forget about them, will my grandchildren be able to view those photos in 70 years’ time when they find that box in my attic, much like I can view photos taken of my grandparents when they were kids? Stuff that I don’t consider worth backing up may be of great historic value to people a couple of generations down the line.

Well okay, that’s an argument, and in that case it’s of course the “unreliable media” problem. But if the CDs are still in good physical condition, it wouldn’t be a problem to convert the data into the “JPG of the 2070s” format or something.

But I agree, in that case unreliable media would be a problem. Maybe, though, in 70 years, there might exist some good CD drives that are ten times better at recovering data (if it’s considered a common problem then).

By that I mean that even if devices are no longer available in working condition, or software that can natively handle some archive format isn’t at hand, as long as the specifications and plans of those devices and the specifications of those formats are available, a new device can be built to support the old technology anytime in the future. And yes, you’re right to sense a catch in that sentence if you’re thinking: OK, but how will we store those specs? I’d say we need to establish something to address that issue. Still, it’s easier to store the specs for a machine than the machine itself, or to store the specs of a format than a proprietary app which handles the data.

I have not had a problem in around 15 years because when new releases of things come up, you import into a new file format. Seriously, thus far from Aldus Freehand files to Adobe Illustrator CS…not a problem.

This is just another geek urge to gather things: manuals, books, blah blah, clinging to that MS-DOS floppy for no reason.

If it doesn’t fit in my laptop bag, I don’t need it. My books are on PDF, music on MP3s and movies on .AVI or .mp4. Now I have to relocate for work and guess what… it will be soooo easy! Once a new “hot, geek-ass” file format comes along and becomes universal, I will convert and move on with other, cooler aspects of life.

I think you are considering the issue too much from your personal point of view, although that is not what the subject is about. Did you read the article, and not only the teaser? The article is not about the personal information needs people may have at home.

The problem is in big archives (like a national history archive) and in accessing historical resources for research after a few decades. More and more documents and archives are moved into electronic format. What will happen if those original materials are not usable anymore after, say, 50 years? The article mentions several examples too, like:

Magnetic tape, which stores most of the world’s computer backups, can degrade within a decade. According to the National Archives Web site, by the mid-1970s only two machines could read the data from the 1960 U.S. Census: one was in Japan, the other in the Smithsonian Institution. Some of the data collected from NASA’s 1976 Viking landing on Mars is unreadable and lost forever.

These kinds of problems can be quite serious for future generations, the more we move information to electronic formats, and especially if those formats are proprietary and can be accessed only with some commercial software that may not even be available anymore.

Many societies deliberately planned tombs and temples to last for an eternity. They knew that texts carved in stone or rock paintings protected from the elements would remain intact for future generations. They anticipated that their societies would be ongoing for ever and wanted to maintain records.

Kodachrome slide (and movie) film is extremely archival. Slides taken in the 1930s and 1940s look like they could have been taken yesterday, they’re in such good shape. Thus if someone has important digital photos, they could “write” them to the slides. The one thing, though, is that Kodachrome is made in extremely limited quantities these days and is only processed by one remaining lab, so the ability to save our photos may be limited.

Come on! Did you even glimpse the actual article, or only its title? The article isn’t painting any sort of apocalyptic scenario, let alone opposing technical progress (where on earth did you get that idea??), but simply trying to find smart new technical solutions to a real, existing problem. The article also mentions several real-life examples of the problem.

Oh, and the article isn’t talking about your digital library, nor about your neighbours’ digital libraries, but about archives and problems on a much bigger level, like national archives, university research etc.

Do you really think the stuff related in the article are actually problems?

Ok, probably I will not be able to find the e-mail I received ten years ago, but… who cares?

The important information is stored in huge databases, and that kind of information is what really matters… The article mentions some storage problems found in real life… but what about the huge amount of information stored in YouTube, Google or Wikipedia? They store lots and lots and lots of information and they do not seem to have such problems.

Do you really think the stuff related in the article are actually problems?

Yes. Maybe not to you, but to many others, yes. If you worked at a big archive with lots of different kinds of electronic documents in various formats, maybe you would understand better.

the huge amount of information stored in YouTube, Google or Wikipedia? They store lots and lots and lots of information and they do not seem to have such problems.

Not now, but what if you wanted to see some important YouTube video, related to, say, Obama’s election campaign, after a hundred years? Would it still be available anywhere? Could your software read it?

Wikipedia is just plain HTML with images. Thus it is based on open standards and data formats easily readable by almost any software. Google is mostly just a search engine, so it’s less related to the issues discussed here. YouTube uses Flash, a partly proprietary format, which is a bit more of a problem, although many programs besides Adobe’s own Flash tools can read Flash files too, which makes the problem smaller.
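To illustrate the point about plain HTML: even with no browser at hand, a short stdlib-only script can recover the readable text from an archived page, precisely because the format is open and text-based:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, ignoring all markup."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

A Flash binary offers no such easy way out, which is the asymmetry being described above.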

And about legacy software… the VMs will have a lot of utility there.

A VM by itself doesn’t make proprietary data formats understandable. Besides, it should be easier for us common people to read older electronic documents too. Probably every one of us reading OSNews has had irritating problems trying to read some document saved in a different program, or version of a program, than what we have, making reading the file practically impossible.

Now, how many different old versions of Word / Works / WordPerfect should and could a big archive keep, not to mention all the other dozens of different data formats and programs? If all that data had been saved using open document formats (like HTML) that other programs could read and support, it would help a lot.

Something that isn’t directly related to this article, but is certainly related to digital archiving.

Printed pages are immutable. You cannot change them, and if you try, it is obvious that editing has been done. Can we, in good conscience, trust what will be a digital representation of things such as world history? How do we know that it will not be edited to suit the time in which it is being viewed? Editing electronic data can be done seamlessly and simply. This is what concerns me about digital archiving, rather than the unreliability of the media and/or file formats; those have solutions. But the all-too-human temptation to create revisionist history has no solution to prevent it. In many ways we already have revisionist history, even with printed materials; it just takes more effort. How easy will this be when everything’s digital?

Of course I have heard of forgery. But forgery often takes a decent amount of effort, at least in situations where official documents have been watermarked or have other means of being positively identified. Further, forgery involves creating a look-alike sufficiently good to fool someone into thinking it is the original. Unless every copy of the original is destroyed, the forgery can eventually be revealed as such, or at the very least as questionable. It involves a decent amount of effort: not as much as it used to, perhaps, but still a bit more than opening a document and saving your changes. This depends, of course, on what is being copied.

This is not the case for electronic data. One change in the right file, on the right server or disk, is all it can take to propagate the change to the entire set of mirrors, and to all backups from that point on; indeed, this is one of many reasons for regular backups. But not all backups are kept forever, and as has already been pointed out many times here, no electronic media will last forever, at least none we have currently.

Does the world care about TOPS-20, how to log onto it, or anything related to it? I mean sure, it was on the first machines on the internet, but are such things preserved?

32V and 4BSD, while critical and important OSes, have only been able to run under emulation in the last few years… And sure, there is information available *now*, but what about 10 years from now? 100? It’s a shame that more isn’t done now to save/preserve the old stuff. It’s all a mess, much like OS/2 and AmigaDOS: the lawyers have made certain that they’ll never be released, just lost to the winds of time.

An article like this has been posted on OSNews before. I cannot find it right now, but it should be around here somewhere.

But let’s try it. Write a document in MS Word 2.0 and try to open it with a modern word processor. I suppose this is already happening: data becomes unusable. And in a hundred years scientists will be analysing data bit by bit, trying to figure out what it meant.

Not only the data, but also the data carrier. Remember the Commodore 64 and the way it formatted its floppy disks: incompatible with PC floppy disk controllers. Some newer machines don’t even have such a controller anymore.

It should be a legal requirement that all data is stored in fully documented open formats. All formats should also be freely available as ISO or IEEE Standards. There should be no exceptions allowed at all.

All existing formats and software code including documentation should also pass into the public domain 20 years after being released. This can be achieved by all software being made available at the time of release to an archive such as the Library of Congress. The software (including all source code and documentation) would then be automatically made public domain either after 20 years or immediately when official support ceases.

In Australia a copy of all physical publications (books, magazines, videos, newspapers etc) must be provided free of charge to the National Library of Australia for archival purposes.

First and foremost, the loss of data accessibility will never occur without something very catastrophic happening.

You see, for every file format ever devised, closed or open source, that is “extinct,” there is some non-legacy piece of software to open or convert that data to something usable today, so long as the format held data important enough to care about.

For instance, what was the first image file format?

Well, scouring the web I can’t find that out, so I don’t know, but it is probably TGA or RAW (which TGA basically is).

Viewing that data is simple today: even if a program doesn’t support opening those file types directly, the contents are likely close to the raw pixel data that programs already hold in memory to display images.
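As an illustration of how simple raw pixel data can be: the plain PPM format is little more than a header plus the pixels written out as numbers, simple enough that a future reader could re-derive a decoder from one sample file. A minimal writer/reader sketch:

```python
def write_ppm(path, width, height, pixels):
    """Write RGB pixels as a plain-text PPM (P3) image,
    a format simple enough to decode by eye."""
    with open(path, "w") as f:
        f.write(f"P3\n{width} {height}\n255\n")
        for r, g, b in pixels:
            f.write(f"{r} {g} {b}\n")

def read_ppm(path):
    """Parse a plain PPM back into (width, height, pixel list)."""
    with open(path) as f:
        tokens = f.read().split()
    assert tokens[0] == "P3"
    width, height = int(tokens[1]), int(tokens[2])
    values = list(map(int, tokens[4:]))  # tokens[3] is the max value, 255
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return width, height, pixels
```

A format this transparent is effectively self-documenting, which is the property the comment above is gesturing at.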

Text is stored in ASCII, so that should never be lost; most file formats will never die so long as someone needs the data contained therein and hasn’t yet converted it.

So, let us journey into the worst-case situation: a file in a closed-source format, extinct for 50 years, is discovered in the old digital archives, containing some data you need (or want to investigate).

Now, you have to consider what may or may not be in the file; if you know, your job is a lot easier. If you don’t, then you will have more trouble.

If you don’t know what is in the file (the type of data), then you need to look for clues: check for all known compression techniques, look for the tell-tale signs of encryption algorithms, and look for any standard markers in the file.
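“Standard markers” usually means magic numbers at the start of the file. A toy sniffer covering a handful of well-known signatures (the list is far from exhaustive, and real tools like `file` check hundreds of these):

```python
# A few well-known file signatures ("magic numbers").
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP container (also ODF/OOXML)",
    b"%PDF-": "PDF document",
    b"SQLite format 3\x00": "SQLite database",
}

def sniff(data: bytes) -> str:
    """Guess a file's format from its leading bytes."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"
```

Notice that a ZIP signature on an “unknown” file immediately tells you to unzip it first; plenty of modern formats are just ZIP archives of XML, which is good news for future archaeologists.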

Now, this being 50 years in the future, the machine working on a solution can test thousands of candidates at once, following various possible solution paths (quantum CPUs, AI algorithms). There is a finite number of possible tests and solutions, regardless of the file format or how much we know about it.

So, all you do is grab the data and tell the computer to find out what it is and how to read it, and in 15 ms or less you have the data. Possibly 60 ms for very complex files with large amounts of data; I think I can tolerate the wait 😉 Gotta love them qubits!

Of course, someone will need to write the software, but by this time we will not be writing programs the way we are today, by any means. In fact, the software will begin to write itself from just a few instructions as to what you want it to do; this is the future role of the operating system. The best AI wins.

Custom software will very much remain due to market forces & technical reasons; it will just evolve into problem declarations with some of the possible solution paths plotted, allowing the AI to find new paths and optimize everything until the optimized version becomes the “software” to execute. ( You should already know how qubits work and what they are; if not, what are you doing here? Google it! )

Of course, that is the worst-case scenario. Here is the likely scenario:

Most data not needed after a format transition will likely never be needed again, and will likely hold no value in 50 years. No loss if the data is lost.

Most data with enough value to be worth preserving will be migrated into future formats by those who care to keep it; otherwise they will just upgrade the data to a newer format ( which plenty of software does automatically today already ).

If a file format is lost completely, and you’re very interested in the file named “mysecrets.???,” then all you need to do is what I already mentioned. There is already software out there which will check files to see what kind of data they may contain; this will evolve, naturally.

Of course, there are formats out there that make very little sense. But if the data is needed or desired strongly enough, there will be a way to get that data. Period.

After all, in 10 years the 256-bit encryption algorithms that are currently unbreakable by any computer in existence will be easy to decipher, should one try. 10 years should mark the actual deployment of truly usable quantum processors – if it doesn’t happen sooner. Self-assembling computers are on the way. Can you even imagine taking a block the size of a deck of cards, putting it into a “microwave” and pulling out a laptop a couple of hours ( or less ) later?

This is becoming reality fast, people!

And yet, somehow, people still think that we will care about the file formats of yore ( or think of computers/advanced tech as anything other than extensions of ourselves ), or that 50 years of progress will somehow make it impossible to decode data if you don’t know the file format – we can do that today with enough processing power. All data is organized within rather simple paradigms, because formats are human-created and must be somewhat comprehensible. Even the most advanced encryption routines will be completely crackable.

We will eventually hit a point in our progress where, even if the most advanced encryption routine available today were started on the world’s fastest supercomputer, encrypting an 8 GB HD movie and then re-encrypting the output in an endless cycle until the file grew beyond the petabyte range, the then-current hardware ( in the future ) would be able to undo the job in a reasonable amount of time, such as overnight.

Much of your proposal is still sci-fi. We need reliable methods to preserve data now and cannot count only on some expensive and maybe time-consuming future technology. What reliable technology do we have now to help solve the situation?

Did you also consider the problem of media corruption? Hard drives, CDs, DVDs etc. can get corrupted over time, while paper often does not.

In an archive, customers expect to get the information quickly. A reply that a super computer and AI might be able to help them a few years from now will not make them happy.

Otherwise, data stored in archives will generally be migrated to storage devices as the systems receive upgrades, provided that data is important.

The only data to be lost is data that no longer serves any purpose – so where is the disaster?

Yes, CDs/VHS/Beta/8-tracks etc. all degrade, but it is still possible today to take an 8-track and convert it to a digital format.

The only things lost are those which are not worth the investment to recover.

Indeed, even the oldest formats can be read today by one means or another, which isn’t to say everyone is capable of doing it.

BTW, quantum computing isn’t sci-fi, nor is quantum storage or information transmission. Sure, it is mostly done within labs, but there are commercial products already available, and more being developed. Ten years will see some of this entering the highest-end markets.

Now, as far as keeping data today, you have to keep backups & keep them up to date. Data lost to stupidity is NOT due to format issues, it is due to stupidity – which the debate is not about.
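The “keep backups & keep them up to date” advice can be made concrete with checksums: record a digest for each file so a later copy can be verified bit-for-bit. A minimal sketch ( the function names are my own, not from any particular backup tool ):

```python
# Sketch of backup verification: a backup only counts if it matches the
# original exactly, which a cryptographic digest can confirm cheaply.

import hashlib

def digest(data: bytes) -> str:
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def verify(original: bytes, backup: bytes) -> bool:
    """True only when the backup is a bit-for-bit copy of the original."""
    return digest(original) == digest(backup)
```

Storing the digests alongside the archive also lets you detect silent medium degradation ( bit rot ) long before the data is needed.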

So, if any data is lost, it is no real loss. Seriously, find me one extinct format that holds valuable data which cannot be, in some manner, recovered today. Of course, this doesn’t address degrading mediums. But if that data hasn’t been needed in so long that it has degraded on its medium, then the data probably has no real value.

If the data has value, and is being stored on a degradable medium it is the lack of intelligence that causes the data loss, not the medium on which it is stored, and not the format. If the data is degrading, and humans know it is degrading, and no one is willing to invest the effort to recover the data, then the data mustn’t be worth anything – so no loss.

The only thing that would cause the problems in the debate would be a global nuclear war – and then we have larger problems.

Mind you, I’m no stranger to losing irreplaceable – and valuable – data. And I still haven’t learned my lesson, I have AT LEAST 250GB of irreplaceable data that is in no way backed up. Losing it would be the result of my own stupidity, nothing more.

It is up to me to take care of my data, after all, as it is with everyone.

Indeed, if we had a universal storage cloud, the problem scenario would be of concern, but we don’t. We will lose small bits of useless data – even if that data once had value, it no longer does. If it did, the storage medium would be sent to a data recovery center to retain that value ( something I couldn’t afford ).

Anyone who archives data on corruptible media deserves their data loss. Otherwise, data stored in archives will generally be migrated to storage devices as the systems receive upgrades, provided that data is important.

Have you visited a big national library or archive? The person interviewed in the article is talking about such environments: endless shelves containing not only books and other paper documents but also photos, films, recordings, tapes, CDs, CD-ROMs, DVDs, data saved in various kinds of databases, etc.

Do you have any clue how much money and how many man-hours it would cost to convert all that stuff located at a big national library into newer formats? Besides, archives and libraries also aim at preserving material in its original format if possible. The original medium is important too, not only the content.

It would therefore be ideal if – from the start, when the information, like a movie, is first saved to some medium – the formats and media used were as durable and open as possible, so that future generations too can use those same documents and media as easily as possible.

If the desire is to preserve the original format in addition to the data, then that work should have started a long time ago.

Otherwise, a simple plan for data safety can be pursued.

Items must be categorized by the vulnerability of their storage medium versus their value. Priorities are made.

Then, a list of requirements for safer, more future-proof & resilient archiving must be generated: what formats must be read, how those formats would be transferred, and finally how to maintain data integrity during the transfer. At this point, you would likely have the equipment needed to make an original-medium duplicate, then work on protecting and preserving the original.
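The triage described above can be sketched as a simple scoring sort – the items, the scores, and the value × vulnerability weighting are all made up for illustration:

```python
# Sketch of archive triage: rank holdings by how fragile their medium is
# and how valuable they are, so the most at-risk valuable items are
# migrated first. Scores here are invented 1-10 ratings.

def migration_priority(items):
    """Sort (name, value, vulnerability) tuples, most urgent first."""
    return sorted(items, key=lambda item: item[1] * item[2], reverse=True)

holdings = [
    ("census microfilm", 9, 3),
    ("news footage on magnetic tape", 8, 9),
    ("printed ledgers", 6, 1),
]
queue = migration_priority(holdings)
```

A real archive would use a richer model ( access frequency, legal mandates, cost to migrate ), but the principle – priorities first, then migrate down the queue – is the same.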

The originals will be lost, little doubt about that.

There will be data loss, and already has been. Oh well, “we ain’t perfect” 🙂

However, none of that changes the fact that most data of significant value has been migrated from one medium to the next with little to no degradation.

Sure, we will lose (& have lost) some data which holds (mostly sentimental) value, such as the earliest movies & photographs. Paintings of such glory that it changes the onlooker, music so horrid it makes you commit suicide on the third measure, and so on.

I’m not saddened by this, and I do not consider it dire. It could present problems in 100+ years for historians when they are trying to find out who really killed JFK, and what was all this UFO crap? But I suspect the possibly up-and-coming nuclear holocaust could be even more devastating.

Indeed, a good solar flare could wipe out all magnetically stored data on the planet in a single swipe. That is a more grave concern – protecting what we have now. Solid state storage will go a long way to helping, but there seem to be limits with that paradigm which can only be overcome with quantum storage devices.

So the next step in storage is solid state (MLC SSD), which will undergo revision upon revision. But we have already made inroads into quantum storage, and that will ultimately take the place of everything.

Indeed, we can already send data from one point to another without interacting with a single point in between. After all, it is actually possible to be in two (or more) places at once. Data will be stored as a universe state, within the fabric of the universe itself.

Then we have to figure out what could destroy the data. What will cause those states to change without us saying so?

Mind you, I’m no stranger to losing irreplaceable – and valuable – data. And I still haven’t learned my lesson, I have AT LEAST 250GB of irreplaceable data that is in no way backed up.

It is up to me to take care of my data, after all, as it is with everyone.

But your personal data and personal needs at home have little, if anything, to do with the subject of the article. The article is not really about your or my personal data saved at our homes. We may very well live happily – or maybe even happier – even after our kid destroys our precious Britney Spears MP3 collection while playing Quake on our PC.

The article is really about big archives and libraries on a national, and even bigger, level and scale, and about the interests of those institutions and their customers. Such institutions couldn’t afford to lose access to a lot of their documents from recent decades just because of media corruption and/or old locked proprietary data formats no longer supported. Converting all their stuff into other data formats is usually out of the question too; and actually it might even be illegal if we are talking about locked and encrypted commercial multimedia formats.

They, at national archives and libraries, at least have to carefully consider in advance what they can do to prevent possible problems related to corrupting media and proprietary data formats not supported any more. That is the whole point of the article.

As someone who has 24 years of documents spread across a half dozen OSes, I can say it’s too expensive to keep converting data to newer formats. ODF is a lifeline in this case. But as far back as 1992 I knew this would be a problem, and I have since either written or saved every document in simple text format.

Photo formats I don’t bother with. But as others have said, secure your own work by using ONLY open formats built with Open Standards.

In yesterday’s ‘Australian’ newspaper there was an article on recovering data from some old Apollo 11 mission data tapes. It requires a special tape reader, of which only one still exists. The tape reader needs to be rebuilt before transcription can occur. This data is only 39 years old.