How Digital Data Dies

This poses a problem, because it makes data storage take effort. Anything that’s not interesting enough to be actively copied from hard drive to hard drive, cloud service to cloud service, simply ceases to exist. 99% of our data is being thrown away, into landfills and failed Internet companies. Even for the data we do care about, the prognosis isn’t good.

Consider the problems posed by data compression. In order to save storage space and bandwidth, we often use file formats (like .jpg and .mp4) which compress their contents in some way. The compression algorithms used come in two general types: lossless and lossy.

Lossless formats eliminate redundancy, identifying chunks of the file that repeat and replacing them with shorter descriptions. This allows you to reconstruct the original file perfectly later, but can only compress the data so much (check out the link above for a visual metaphor of how these algorithms work).
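To make the “replace repeats with shorter descriptions” idea concrete, here is run-length encoding, one of the simplest lossless schemes. It is only an illustration: formats like PNG and ZIP use more sophisticated methods (DEFLATE), but the key property is the same, namely that decoding gives back exactly the original.

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Run-length encoding: replace each run of a repeated character
    with a (character, count) pair."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand each (character, count) pair back into the original run."""
    return "".join(ch * count for ch, count in runs)


original = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWW"
encoded = rle_encode(original)
print(encoded)
assert rle_decode(encoded) == original  # lossless: exact reconstruction
```

Long runs shrink to a single pair, but data with few repeats (like already-compressed files) barely shrinks at all, which is why lossless compression “can only compress the data so much.”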

Lossy formats are much more powerful, but come with major tradeoffs. They work by discarding some of the information in the original file so that what remains can be encoded in less space. These algorithms can’t precisely reconstruct the original file, but they’re tuned so that the information that gets dropped tends to be information people don’t notice. They can achieve a spectacular reduction in file size with only a small drop in perceived quality, and are used for nearly all the audio, video, and pictures we share.
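The core trick behind most lossy formats is quantization: storing each value only approximately. Real codecs such as JPEG or MP3 first apply a transform (the DCT in JPEG) and use perceptual models to decide where precision can be dropped, so this is a deliberately stripped-down sketch of the idea, with made-up sample values and step size.

```python
def quantize(samples: list[float], step: float) -> list[int]:
    """Lossy step: keep only which 'bucket' of width `step` each sample falls into."""
    return [round(s / step) for s in samples]


def dequantize(codes: list[int], step: float) -> list[float]:
    """Reconstruct an approximation: the centre of each bucket."""
    return [c * step for c in codes]


samples = [0.12, 0.57, 0.93, 0.41, 0.08]
step = 0.25
codes = quantize(samples, step)    # small integers are cheap to store
restored = dequantize(codes, step)
# Every restored value is within step/2 of the original, but the exact value is gone.
worst = max(abs(a - b) for a, b in zip(samples, restored))
print(codes, restored, worst)
```

Note that 0.57 and 0.41 both come back as 0.5: once two different inputs share a code, no decoder can tell them apart again. A larger `step` means smaller files and bigger errors.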

This is generally a good thing: it allows us to download much higher-quality content much faster than would be possible if we were stuck using lossless formats. However, there’s a dark side to lossy formats, and it looks like this:

When you re-encode a file in a lossy format, more data is lost. Converting from one lossy format to another compounds the damage, because each encoder throws away information in its own way. The above video was generated by repeatedly converting between two lossy formats many hundreds of times. By the end, the man speaking has degraded into a nightmarish mess of color and noise. This process is called generation loss.
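The effect is easy to reproduce in miniature. The sketch below is a toy model, not a real codec: it assumes each re-encode slightly blurs the signal (as resampling does) and then rounds it to whole numbers (quantization), and it measures how far each generation has drifted from the original.

```python
import math


def reencode(signal: list[int]) -> list[int]:
    """One hypothetical lossy generation: a slight blur followed by rounding."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]       # clamp at the edges
        right = signal[min(i + 1, n - 1)]
        blurred = 0.25 * left + 0.5 * signal[i] + 0.25 * right
        out.append(round(blurred))         # quantize back to integers
    return out


def mean_abs_error(a: list[int], b: list[int]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# A simple test signal: four periods of a sine wave with amplitude 100.
original = [round(100 * math.sin(2 * math.pi * i / 16)) for i in range(64)]

copy = original
errors = {}
for generation in range(1, 101):
    copy = reencode(copy)                  # each pass starts from the last copy
    if generation in (1, 10, 100):
        errors[generation] = mean_abs_error(original, copy)

for generation, err in errors.items():
    print(f"generation {generation:3d}: mean error {err:.1f}")
```

A single pass barely changes the signal, but because every generation works from the previous, already-degraded copy, the small losses accumulate until most of the original detail is gone.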

As files travel around the internet, being copied and backed up and remixed and re-encoded, this data loss adds up, and files can become heavily degraded. As we get better at lossy encodings and less-efficient file formats fall out of favor, original versions can be lost forever.

Hopefully, movie studios care enough to keep a losslessly encoded version of Cool Hand Luke and Twelve Angry Men safe somewhere, so that we’ll always have high-quality versions of those files. However, this certainly isn’t true of most media. Your digital baby photos and home videos will slowly decay as you transcode them from obsolete formats into new ones.

The same goes double for online content. The originals of most YouTube videos likely no longer exist. When YouTube ceases to exist and those videos are migrated to a new platform, all of them will take a quality hit from the re-encoding process. A few generations of video-sharing platforms down the road, and even those videos that remain popular enough to be copied from platform to platform will be unacceptably degraded.

Vint Cerf, Google’s Chief Internet Evangelist, has talked at length about the dangers of throwing away all this information as cavalierly as we do. In one interview, Cerf described how historian Doris Kearns Goodwin, for her 2005 book on Abraham Lincoln, visited libraries across the country, dug up old letters written by Lincoln and his contemporaries, and reconstructed the conversations they embody. Cerf notes that today, “those letters would be emails and the chances of finding them will be vanishingly small 100 years from now.”

This kind of data decay will pose a huge problem for future historians. The twenty-first century may well become a gaping hole in the historical record — a digital dark age.

Can We Do Better?

One solution to this problem is to develop archival storage that can last for much longer with less maintenance, so that it’s easier to archive information for the very long term. A number of smart people are working on this problem, and we’ve rounded up the best available data on their technologies.

So let’s say you want to back up a file for a really long time. How should you do it?

~50 years

Solution: Magnetic Tape

If you only need to store your data for a few decades at a time, your best bet is probably good, old-fashioned magnetic tape (of the kind used by IT departments all over the world). Stored underground in a cold, dry, magnetically shielded environment, with a healthy degree of redundancy, magnetic tape is more stable than conventional CDs or hard drives. The cartridges themselves cost less per gigabyte than even low-end hard drives, although the tape drive needed to read and write them is a substantial up-front expense.
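Redundancy only helps if you notice when a copy has gone bad. A common archival habit is to record a cryptographic checksum for every copy and compare them periodically. Here is a minimal sketch using SHA-256 from Python’s `hashlib`, assuming the copies can be read back into memory as bytes; the copy names are hypothetical.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority digest and the names of copies that disagree with it."""
    digests = {name: sha256_hex(data) for name, data in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    bad = [name for name, d in digests.items() if d != majority]
    return majority, bad


copies = {
    "tape_a": b"family photos, 1998",
    "tape_b": b"family photos, 1998",
    "tape_c": b"family photos, 1988",  # simulated bit rot on one copy
}
majority, bad = check_copies(copies)
print(bad)  # ['tape_c']
```

With only two copies, a mismatch tells you that something is wrong but not which copy to trust, which is one reason archivists keep at least three.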

~100 years

Solution: Archive-quality optical disks

Conventional CDs are a terrible way to store data: the aluminum or silver backing starts to oxidize as soon as you open the package, and low build quality can cause other issues. Don’t expect them to last longer than a few years, and much less if you accidentally leave them in the sun. However, some CDs and DVDs are made with a gold backing and a much higher build quality. Gold doesn’t oxidize, which means that these disks can last a long, long time. It’s hard to know exactly how long, because we haven’t had them for very long, but we can get a good estimate by taking the disks, being really mean to them, and then trying to recover the data: this is called an accelerated aging test.
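Accelerated aging results are usually translated back to normal storage conditions with a temperature model, most commonly the Arrhenius relation (sometimes extended with a humidity term). The sketch below computes the acceleration factor; the activation energy, stress temperature, and test duration are purely hypothetical numbers, and real lifetimes depend heavily on the activation energy actually measured for a given disc.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor: how much faster degradation runs
    at the stress temperature than at the use temperature."""
    t_use = t_use_c + 273.15        # convert Celsius to kelvin
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1 / t_use - 1 / t_stress))


# Hypothetical test: 0.7 eV activation energy, 1,000 hours at 85 C, storage at 25 C.
af = acceleration_factor(0.7, 25.0, 85.0)
hours_at_stress = 1000
years_at_use = hours_at_stress * af / (24 * 365)
print(f"acceleration factor ~{af:.0f}, about {years_at_use:.1f} years at 25 C")
```

The exponential is why these tests are useful and also why they “aren’t a sure thing”: a small error in the assumed activation energy changes the extrapolated lifetime enormously.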

Based on these tests, manufacturers claim lifespans in the 1-3 century range. For maximum data density, you can pick up archival Blu-ray discs at about 2.5 gigabytes per dollar, with a projected lifespan of 200 years. Accelerated aging tests aren’t a sure thing, but it’s probably safe to count on them for a century or so. As a bonus, unlike magnetic tape, they can be read and written with ordinary optical drives, so startup costs are minimal.

~1000 years

Solution: M-Discs

Okay, forget that “century” nonsense; let’s get serious. To give you an idea of the timescale, about one thousand years ago, Earl Eric Haakonsson outlawed berserkers in Norway for the first time. Here they are, etched on a bronze plate discovered in the 20th century:

Until recently, there weren’t many good industrial options for this kind of timescale. However, an exciting option has emerged: the ‘M-disc.’ These are archival DVDs whose data layer is a “stone-like” inorganic composite, designed to be etched by special burners (though the finished discs can be read by normal DVD drives). They are absurdly robust, and expected to survive for at least a thousand years. That’s an ambitious claim, but the company has some solid research (including a study by the US Department of Defense) to back it up.

These discs are even reasonably cheap, at 5.7 gigabytes per dollar, though you’ll also need a special burner. If you’re seriously interested in storing a lot of data for a long time, M-discs are the clear winner.
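The optical media prices above are quoted as gigabytes per dollar; flipping them into dollars per terabyte makes them easier to compare with other storage. A small sketch using this article’s figures, counting media only (it ignores the cost of the burner) and assuming 1 TB = 1,000 GB.

```python
def dollars_per_terabyte(gb_per_dollar: float) -> float:
    """Convert a 'gigabytes per dollar' figure into dollars per terabyte (1 TB = 1,000 GB)."""
    return 1000 / gb_per_dollar


media = {"archival Blu-ray": 2.5, "M-disc": 5.7}  # GB per dollar, as quoted above
for name, gb_per_dollar in media.items():
    print(f"{name}: about ${dollars_per_terabyte(gb_per_dollar):.0f} per TB of media")
```

At these prices a terabyte of M-discs costs roughly $175 in media, versus about $400 for archival Blu-ray.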

~10,000 years

Solution: Engraving extremely stable metals

This is where we start to stray from the beaten path a little. At present, there are no digitally readable formats that can survive anywhere near ten thousand years. That means that any data archived for this duration is going to be very difficult to recover. In some ways, this is okay: it’s not like DVD readers are going to be around in ten thousand years anyway.

So how do you store data for that long? The answer is that the only materials that can survive those kinds of timescales are chemically stable metals and gemstones. This approach has already been used in practice for the Voyager Golden Records: gold-plated copper records, engraved with information representing audio and images, which were launched aboard the two Voyager probes. The probes are on their way out of the solar system, carrying a lasting record of humanity for aliens to someday find.

A modern take on the issue is nano-lithography. A company called Norsam has adapted lithography techniques originally developed for engraving semiconductors, and can use them to etch fine patterns onto surfaces like diamond or nickel. The resolution is decent (about 165 gigabytes per 12-centimeter disk), and the disks are practically indestructible. Stored safely, they should last for many thousands of years, and can survive EMPs, most fires, and the collapse of human civilization. Pricing information isn’t easily available, but “expensive” is a really good guess.

One early application of this technology has been the creation of modern “Rosetta Stone” plates, made out of titanium, to be stored in safe places around the world, containing thousands of pages of text, translated between many languages, to provide a reference for future historians if some modern languages are lost. As a side benefit, the disks also look incredibly cool:

More Than 100,000 Years

Let’s be clear here: if you’re shopping for computer storage and nano-engraved titanium is just too short-lived for you, then your planning horizon terrifies me. One hundred thousand years ago, our own species was only beginning to spread beyond the African continent. If you really care about making sure that your digital data survives that far into the future, then you have departed the ken of mere mortals, and probably also sanity and good sense.

Which is not to say that you don’t have options.

Solution: Fossilized DNA

One of the perks of the biotech revolution is that plenty of companies will synthesize custom DNA for you, online and for a modest fee, from a string of bases that you provide. Each position in a DNA strand holds one of four bases (A, C, G, or T), so each base can store two bits. The data can then be read back at a later date by sequencing the synthesized strands, using a variety of techniques. This allows DNA to serve as a kind of exotic data storage. On their own, though, custom DNA strands are fairly short-lived, and will chemically break down at room temperature within a few years. There are a few ways to extend their lifespan.
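The two-bits-per-base arithmetic translates directly into code. This sketch uses a hypothetical, naive mapping (00 to A, 01 to C, 10 to G, 11 to T); practical DNA storage schemes avoid long runs of the same base, which are hard to synthesize and sequence, and add addressing and error correction on top.

```python
BASES = "ACGT"  # hypothetical mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Reverse the mapping: every four bases rebuild one byte."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # raises ValueError on unknown bases
        out.append(byte)
    return bytes(out)


seq = bytes_to_dna(b"hi")
print(seq)  # CGGACGGC
assert dna_to_bytes(seq) == b"hi"
```

At two bits per base, one byte needs four bases, so a gigabyte of data needs roughly four billion bases before any redundancy is added.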

You could splice your data into the DNA of a long-lived organism, like the Great Basin bristlecone pine (individual trees are known to have lived for nearly five thousand years). Because these trees can reproduce, your primary concern then becomes keeping them safe from the numerous large-scale fires, meteor impacts, and volcanic eruptions that are going to happen in the future. You might be able to get your data to survive for a few tens of thousands of years by planting several forests of archival trees in safe, remote places; but, of course, you’re not interested in such small potatoes.

In order to really get your money’s worth out of DNA storage, you need to protect the DNA against chemical change and radiation damage. Researchers have found a way to encapsulate DNA in tiny spheres of silica (essentially glass) to create a “synthetic fossil” that can protect it for extremely long periods of time. The approach mimics natural fossilization, and was inspired by the fact that intact DNA can sometimes be extracted from fossils hundreds of thousands of years old. With proper use of error-correcting codes and redundancy, there’s no reason in principle that you couldn’t preserve many gigabytes of information for single-digit millions of years.
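The simplest form of the redundancy mentioned above is to store several copies and take a per-position majority vote when reading them back. This sketch is that basic repetition code, not what researchers actually use (real systems rely on much stronger codes such as Reed-Solomon); it also assumes equal-length reads, so it only handles substitution errors, not the insertions and deletions that sequencing can also introduce.

```python
from collections import Counter


def majority_consensus(reads: list[str]) -> str:
    """Rebuild a sequence from several equal-length noisy copies by taking
    the most common base at each position (a simple repetition code)."""
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need one or more reads of equal length")
    # zip(*reads) walks the copies column by column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


reads = [
    "CGGACGGC",
    "CGTACGGC",  # one substitution error
    "CGGACGAC",  # a different substitution error
]
print(majority_consensus(reads))  # CGGACGGC
```

The vote recovers the original as long as errors in different copies land at different positions; if a position ties, `Counter.most_common` simply returns the base it saw first, which may be wrong.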

In terms of cost-effectiveness: if you’re worried about price, this storage method isn’t for you. This is not a commercial process by any stretch. You are going to be spending at least hundreds of thousands of dollars to have the DNA fabricated and preserved. This is not an undertaking for the faint of heart. Still, it’s an option, and if you really want to make sure that the most important data on the Internet is still available long after humanity is dead and gone, it is within your power to do so.

Are you concerned about the digital dark age? What data do you want to preserve for future generations? The discussion starts in the comments!

My greatest concern is not so much the media. I use gold CDs, but in 30 or 50 years will there be anything capable of reading them? Microfilm lasts a long time, but they no longer make replacement bulbs and it is difficult to acquire a reader. No media will work unless there is some commitment to preserving the ability to read that media.

Blu-ray drives are not going to be around forever. Just like 8-tracks, cassettes, 78, 45, and 33 rpm records, magnetic tape, 3.5" and 5.25" diskettes, and Zip drives, Blu-ray is going to be declared obsolete and replaced by newer and supposedly better technology.

You've spent hundreds of hours scanning your Dad's old 35mm slides, endless afternoons scanning family pictures that go several generations back, and you have religiously edited countless videos of family gatherings -- trusting that your efforts will be passed down through more generations. Wrong.

I find it disconcerting that, for all of our vaunted technology, we cannot assure the long-term viability of digital data. After all, it's just ones & zeros, right?

Sure, we can upload to Google Drive or Amazon Cloud or other clouds, but who can say what happens to that data? How is it compressed and handled in the long term, especially when the cloud firms keep redundant copies of your files on different servers? Were those copies all made from one original, or is each one yet another new iteration, already subject to degradation?

Maybe it's time to make up small packages of all the slides & photos, seal them up using one of those vacuum food sealers, and bury it all in a mayonnaise jar under Funk & Wagnalls' porch. The videos? They might not survive for the grandkids to see.

Aside from the required hardware (e.g., 8-track tape players), data should be stored with the programs needed to display it and maybe edit it as well. Maybe a miniaturized Word would help.
As someone who has files created with proprietary software that no longer exists, I've had to resort to scanning them and converting them to readable text or, in some cases, Word.

SSD manufacturers rate their drives for only 1 year of data retention with the power off, although data may last up to 10 years in practice. If the drive is powered up periodically for a few hours, its firmware will refresh the stored data and reset the clock.

In the article, in reference to magnetic tape, the following is stated... "only about three times as expensive as low-end hard drives (about $3.0 per gigabyte)."

Where did you come up with that statement and that figure?!

Tape is WAYYYY less expensive than hard drives. Everyone knows that. The article has a hyperlink to a website selling FujiFilm LTO-5 Tape Cartridges. Using the pricing on that page, I get $0.0146 per GB for LTO Tape -- or $14.6 per TB!

Now, if you want to talk TCO of a storage system, the analysis would be different but tape would still come out on top by orders of magnitude over disk.

The bigger problem with electronic preservation is how to read it. When talking about gold discs you say "unlike magnetic tape, they require no special equipment to read and write" which is false. The difference is that CD/DVD equipment is common now. It won't necessarily be common or even available to your children in a few decades. I have 5.25" and 3.5" floppy disks in my files (what can I say, I'm a packrat) but nothing set up to read them. And don't get me started on all the different formats of video from Beta to LaserDisc.

If we put it at the personal level and you want to preserve something for generations, write in India ink on acid-free paper (or acid-free rag bond if you can find it, or vellum if anyone still makes it). For those who do not relish writing with an inkwell (India ink does not work well with fountain pens), a typewriter, impact printer, or, in a pinch, an inkjet printer would be your best bet. Ink on good paper lasts for centuries. I have 500-year-old books and every word is legible (although the bindings are crap or I never would have been able to afford them).

The good old printed book wasn't exactly built for immortality, but quite a few have survived for years, decades, and even centuries. Everyday digital media can save vastly more data but have much shorter lifespans.
I sense the cosmic sense of humour at work.