Oracle Blog

Technology with a focus on storage, sometimes.

Thursday Mar 27, 2008

My family and I took a brief vacation this weekend and made our way to Washington D.C. for a little R & R. We enjoyed 2 and a half days of sights, tours, history and we even squeezed in a little time for the pool. For those of you that have been to D.C. (or live there), you know that 2 and a half days only allowed us to scratch the surface of the United States cultural base that is alive as well as preserved in the city (and often within a few blocks of the National Mall).

There are so many thought provoking and emotional moments as you move around that after two and a half days I found myself almost completely wrung out. We saved the Congressional Gardens with the Vietnam Memorial, World War II Memorial, Lincoln Memorial, Korean Memorial, and the others for the last day. The artistry and the thought that went into these memorials is astounding and the emotions that they pull out of you put you into knots.

I won't list everything we did on the whole journey over the weekend. For my youngest son, going up the Washington Memorial (and our need to start standing in line for tickets at 6:30am) will probably be the most impacting moments. For Shaun, hopefully the Vietnam Memorial and the Pederson House. For me, who knows, the Bill of Rights, the Constitution, the Declaration of Independence, the Magna Carta...simply amazing, but overall I can't name a single moment that wasn't worth its weight in gold.

Professionally though, the National Archives had to be one of the most thought provoking of our stops.

Here is this large building, with all of these physical manifestations of our history on display and in vaults around the building. The Constitution, the Declaration of Independence, the Bill of Rights, the Emancipation Proclamation and more than I could ever list here. It was "Magna Carta Days" at the National Archives as well and one of the four remaining copies of the 1297 Magna Carta from King Edward I was on display.

Here are a few thoughts that went through my head:

With infinite, perfect copies of digital content, what makes a digital entity "unique" and "awe inspiring"?

How is our country going to preserve and revere a digital creation 710 years from now?

How do you know what digital creations are worth preserving since you can hit a button and destroy them so easily?

The first of these seems entirely out of place with storage technology, but when you stand in front of the Declaration of Independence it makes you wonder what digital content could actually have this impact on a person and how you would embody that digital content. The record companies are struggling with it as well. In addition to digital download content, the companies are trying out releases on USB thumb drives as well as larger packages and "Deluxe" sets. Books are going to struggle through the same revolution (as magazines already have) of being bundled as bits with little or no "branding" or "artistry" about the packaging. How does one recreate that sense of uniqueness when the content is merely a bunch of bits that gets flattened into 10 songs amongst 8,000 on an iPod? Really, what "value" do those songs and books have anymore when they can be passed around at will and are part of a great "torrent" of traffic into and out of our computers? Something really has to "stick" to remain on "top of our stereo" these days.

And as for the Declaration of Independence...what a clean and simple document. The document itself hung in windows and is incredibly faded and worn down. Only after time passed did our country seek to formally preserve it for posterity. Perhaps we caught it in time to save it from deteriorating any more. But one still has to ask, is it the single original document that retains the significance or is it the content that remains significant. If it is the content, we wouldn't store the original in a huge underground vault and protect it as well as Vice President Cheney, would we?

Having seen the original, I would have to argue that there is something incredibly unique about it, it actually holds more reverence (for lack of a better word) than one of the many copies of it. So how does one reproduce that "reverence" in a digital world?

If that is not enough to think about, we have to think about digital preservation. The Magna Carta of 1297 has withstood time for 710 years and is in wonderful shape. What digital storage technology today do we have that can withstand decay for that length of time (of course, one could argue that some rock etchings have withstood time for 1,000s of years). Let's put this in perspective, today's disk drives and SSDs are generally spec'd for 5 years. If I want to preserve my family's pictures for 710 years, I would have to ensure the data was migrated 142 times. Hmmm, I'm not sure if my kids and their kids and their kids are up for that.

It appears that CDs and DVDs may have a lifespan of around 50-200 years if you preserve them properly. That is getting pretty reasonable...of course, they haven't been around for 50 to 200 years so they are certainly not battle tested like carving on a good rock. The National Institute of Standards and Technology appears to be looking heavily into the longevity of optical recording media. DLT appears to have a shelf-life of around 30 years if preserved properly.

Let's say, hypothetically, that you solve the problem of the storage media (perhaps a self migrating technology in a box that guarantees infinite lifespan and that, itself, produces the new disks and technology to ensure fresh DVDs are always built). Now you have two additional challenges (at least):

Maintaining the integrity of the data (how do I ensure) that the data that is NOW on the DVD is the original data

Maintaining the ability for outsiders to inspect and recall the stored information

The first of these seems obvious, but is actually quite difficult. Checksums can be overcome with time (imagine compute power in 700 years!) and we can't guarantee that the keepers of the information will not have a vested interest in changing the contents of the information. We see governments attempting to re-write history all the time, don't we?

Let's take a simpler example of what happens when "a" byte disappears. Recall Neil Armstrong's famous quote: "One small step for man ... ". Well, after a lot of CPU cycles and speculation and conspiracy theories, it turns out that we now believe that Neil Armstrong said: "One small step for a man...". It is a fundamentally different statement (though there is no less historical impact). This data is only 40 years old, but consider the angst in trying to prove whether or not "a" was a part of the quote. What happens when a government deliberately alters, say, the digital equivalent of an "original" 2nd Bill of Rights written in 2030?

One more thought for the day, since I really do have to work and if you have made it this far, it is my duty to you to free you of my ramblings.

We know for a fact that the English (dare I say...United States dialect) language is evolving. Even after 200 years there are phrases and semantics and constructs in the Declaration of Independence that require quite a bit of research for the common US citizen. Take the following paragraph:

He is at this time transporting large Armies of foreign Mercenaries to compleat the works of death, desolation and tyranny, already begun with circumstances of Cruelty & perfidy scarcely paralleled in the most barbarous ages, and totally unworthy the Head of a civilized nation.

There is the obvious use of the word perfidy, a word that has since all but disappeared from common speech in the United States.

Looking deeper at the paragraph we see evolution in spelling (compleat). There is also a fascinating use of capitalization throughout the Declaration of Independence. The study and usage of capitalization alone could be worth the creation of long research papers.

What does this tell us? The content and meaning of a work lies often with the context and times in which the work was created. How does one retain this context, language, and ability to read the content over 700 years? This is not a small problem at all. There are entire cultures lost or in the process of being lost as the language and the context is lost, consider the United States own Anasazi culture as an example.

A computer dialect (protocol, standard, information model, etc...) are themselves subject to evolution and are even more fragile than spoken language itself. A change in a capitalization in an XML model may break the ability of pre-existing programs for reading and migrating information, resulting in lost information. Once you break a program from 200 years prior, how much expertise will still exist to maintain and fix that program?

Crazy things to think about. Personally, I believe we are in a fragile place in our history where we could lose decades of historical information as we transition between written works and digital works. As part of my night job I'm trying to get more involved in the Sun Preservation and Archiving Special Interest Group (PASIG) to learn more about what our customers are doing in this area. I'm also trying to reorganize my own home "infrastructure" to be more resilient for the long run to ensure that my family's history does not disappear with my computers.

There are significant challenges in the computer industry all over, but preservation of history is one that our children and our children's children will judge us with. USB thumb drives will come and go, but hopefully our generation's digital treasures will not go to the grave with us.