Scientists may have found the formula for preserving data for future generations in the face of planned technological obsolescence

Are we sliding into a dark age of information? True, there are some three thousand billion billion bytes of recorded digital information on the planet, and the figure is rapidly rising.

However, most of this will be lost to future generations because we use ephemeral recording media and soon-to-be-obsolete storage devices, and rely on software from companies whose business models depend on planned obsolescence and compulsory upgrades. If we’re not careful, historians will know more about the beginning of the past century than the start of this one.

The contrast between today and the pre-digital era is dramatic. A 7th-century BC clay tablet recording Babylonian observations of Venus made some 3,500 years ago can still be viewed at the British Museum. But, as my colleagues at the Science Museum carry out research for our forthcoming Information Age gallery, they have found that the zeroes and ones of digitised information easily succumb to technological change.

William the Conqueror’s Domesday Book is still available for inspection at the National Archives in London. The same cannot be said of the original version of a survey commissioned to mark the book’s 900th anniversary: it was recorded on now-obsolete 12in laser discs and had to be rescued by a digital preservation project. As for the first email, sent in 1971 by Ray Tomlinson, all he could recall was that it was something like the first line of letters on a keyboard: “qwertyuiop”.

Data that once were held on magnetic tape or floppy disks are unreadable on today’s equipment and, no doubt, the same fate will befall CDs and other media. Computer files are worth nothing without the software to open them. We have all had problems with TIFF images that can’t be interpreted, emails that can’t be read, or documents that will not open because the latest upgrade does not recognise them.

Even scientists, who put a particular premium on data, are proving unreliable custodians. Tim Vines, of the University of British Columbia in Vancouver, sought the data behind 516 ecology studies published between 1991 and 2011. He reports in Current Biology that while the data for almost all studies published two years earlier were still accessible, the odds of the data remaining accessible fell by 17 per cent with each additional year.
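To see what a 17 per cent yearly fall in the odds implies, here is a rough sketch in Python. It treats "odds" in the statistical sense (probability divided by one minus probability) and assumes a starting availability of 95 per cent as a stand-in for "almost all"; that figure and the function name `availability_after` are illustrative assumptions, not numbers or terms from the study.

```python
def availability_after(years, start_probability=0.95, yearly_odds_drop=0.17):
    """Probability that a data set is still accessible after `years` more years.

    start_probability is an assumed stand-in for "almost all"; it is not a
    figure taken from the study.
    """
    # Convert the starting probability to odds: p / (1 - p).
    odds = start_probability / (1.0 - start_probability)
    # Odds falling 17 per cent a year means multiplying by 0.83 each year.
    odds *= (1.0 - yearly_odds_drop) ** years
    # Convert the odds back to a probability.
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for years in (0, 5, 10, 20):
        print(years, round(availability_after(years), 2))
```

With those assumed inputs, the chance of still reaching the data slips to a little under a third after 20 years.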

No doubt future generations will manage without those selfies, pictures and videos you lost when your hard drive died (we’re unsure how long USB memory sticks will last). But historians will mourn their passing. And the results of particle accelerators, medical records and economic data are crucial for rerunning experiments, carrying out new analyses and checking for traces of error or fraud.

One solution is to demand that researchers submit their data to a public archive as a condition of publication in a journal. Even so, many digital formats cannot be trusted to last more than a decade. This is a particular problem in particle physics (the Large Hadron Collider in Geneva generates 15 petabytes per year, where one petabyte is the equivalent of 210,000 DVDs) and in the life sciences, where volumes of data, notably DNA sequences, are growing massively. Even though DNA sequences have been diligently archived since the 1980s, other storage ideas are being pursued, such as microfilm and acid-free paper, along with clever ways to “compress” data.

But perhaps one solution is the one used by nature. After all, we already know that DNA is a robust way to store information, because we can extract it from the bones of extinct humans dating back tens of thousands of years.

Ewan Birney and Nick Goldman, at the EMBL-European Bioinformatics Institute near Cambridge, have created a way to store two thousand million million bytes per gram of DNA. Birney has speculated about how DNA records could be used in 10 millennia. “I’d say that should take care of most of our archiving needs since 10,000 years takes us back into human prehistory, before the earliest recorded writing appears on the scene (around 6,000 BC to 3,000 BC).” Combined with diligent archiving and preservation, DNA stores could signal the beginning of the end of our digital doomsday.
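For a sense of scale, the rounded figures quoted in this article can be put side by side in a few lines of Python. This is a back-of-the-envelope sketch only: it assumes a petabyte is 10**15 bytes, takes the quoted density at face value and ignores any real-world overheads of DNA storage, and the helper name `grams_of_dna` is made up for illustration.

```python
# Rounded figures as quoted in the article.
WORLD_DATA_BYTES = 3_000 * 10**9 * 10**9    # "three thousand billion billion bytes"
DNA_BYTES_PER_GRAM = 2_000 * 10**6 * 10**6  # "two thousand million million bytes per gram"
LHC_BYTES_PER_YEAR = 15 * 10**15            # 15 petabytes, assuming 1 PB = 10**15 bytes


def grams_of_dna(num_bytes):
    """Grams of DNA needed to hold num_bytes at the quoted density."""
    return num_bytes / DNA_BYTES_PER_GRAM


if __name__ == "__main__":
    # One tonne is a million grams.
    print("World's data:", grams_of_dna(WORLD_DATA_BYTES) / 1e6, "tonnes of DNA")
    print("LHC, one year:", grams_of_dna(LHC_BYTES_PER_YEAR), "grams of DNA")
```

On those numbers, a year of Large Hadron Collider output would fit in 7.5 grams of DNA, and all the world's recorded digital information in about one and a half tonnes.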

Roger Highfield is the director of external affairs at the Science Museum