A Black Hole of Information?

by Paul Gilster on February 17, 2015

A couple of interesting posts to talk about in relation to yesterday’s essay on the Encyclopedia Galactica. At UC-Santa Cruz, Greg Laughlin writes entertainingly about The Machine Epoch, an idea that suggested itself because of the spam his systemic site continually draws from “robots, harvesters, spamdexing scripts, and viral entities,” all of which continually fill up his site’s activity logs as they try to insert links.

Anyone who attempts any kind of online publishing knows exactly what Laughlin is talking about, and while I hate to see his attention drawn even momentarily from his ongoing work, I always appreciate his insights on systemic, a blog whose range includes his exoplanet analyses as well as his speculations on the far future (as I mentioned yesterday, Laughlin and Fred Adams are the authors behind the 1999 title The Five Ages of the Universe, as mind-bending an exercise in extrapolating the future as anything I have ever read). I’ve learned on systemic that he can take something seemingly mundane and open it into a rich venue for speculation.

So naturally when I see Laughlin dealing with blog spam, I realize he’s got much bigger game in mind. In his latest, spam triggers his recall of a conversation with John McCarthy, whose credentials in artificial intelligence are of the highest order. McCarthy was deeply optimistic about the human future, believing that there are sustainable paths forward. He went so far as to say “There are no apparent obstacles even to billion year sustainability.” Laughlin comments:

Optimistic is definitely the operative word. It’s also possible that the computational innovations that McCarthy had a hand in ushering in will consign the Anthropocene epoch to be the shortest — rather than one of the longest — periods in Earth’s geological history. Hazarding a guess, the Anthropocene might end not with the bang with which it began, but rather with the seemingly far more mundane moment when it is no longer possible to draw a distinction between the real visitors and the machine visitors to a web site.

Long or short-term? A billion years or a human future that more or less peters out as we yield to increasingly powerful artificial intelligence? We can’t know, though I admit that by temperament, I’m in the McCarthy camp. Either way, an Encyclopedia Galactica could get compiled by whatever kind of intelligences arise and communicate with each other in the galaxy.

Into the Rift

Of course, one problem is that any encyclopedia needs material to work with, and we are seeing signs that huge amounts of information may now be falling into what Google’s Vinton Cerf, co-designer of the TCP/IP protocols that drive the Internet, calls “an information black hole.”

Speaking at the American Association for the Advancement of Science’s annual meeting in San Jose (CA), Cerf asked whether we are facing a century’s worth of ‘forgotten data,’ noting how much of our lives — our letters, music, family photos — are now locked into digital formats. Cerf’s warning, reported by The Guardian in an article called Google boss warns of ‘forgotten century’ with email and photos at risk, is stark:

“We are nonchalantly throwing all of our data into what could become an information black hole without realising it. We digitise things because we think we will preserve them, but what we don’t understand is that unless we take other steps, those digital versions may not be any better, and may even be worse, than the artefacts that we digitised… If there are photos you really care about, print them out.”

If that seems extreme, consider that no special technology was needed to read stored information from our past, from the clay tablets of ancient Mesopotamia to the scrolls and codices on which early historians and later medieval monks recorded their civilization. A great library is a grand thing that does not demand dedicated hardware and software to use. ‘Bit rot’ occurs when the equipment on which we record our data becomes obsolete — a box of floppy disks sitting on my file cabinet mutely reminds me that I currently have no way to read them.

Cerf’s point isn’t that information can’t be migrated from one format to another. We’ll surely preserve documents, photos, audio and video considered to be significant in many formats. But what’s surprising when you start prowling a big academic library is how often the most insignificant things can act as pointers to useful information. A bus schedule, a note dashed off by the acquaintance of an artist, a manuscript filed under the wrong heading — all these can unlock parts of our history, and we can assume the same about many a casual email.

Historians have learned how the greatest mathematician of antiquity considered the concept of infinity and anticipated calculus in the third century BC, after the Archimedes palimpsest was found hidden under the words of a Byzantine prayer book from the 13th century. “We’ve been surprised by what we’ve learned from objects that have been preserved purely by happenstance that give us insights into an earlier civilisation,” [Cerf] said.

Will we preserve what we need to so that we have the kind of record of our time that is so tightly locked up in the books and artifacts of previous eras? The article mentions work at Carnegie Mellon University in Pittsburgh, where a project called Olive is archiving early versions of software. The project allows a computer to mimic the device the original software ran on, a technology that seems a natural solution as we try to record the early part of the desktop computer era. We’ll be refining such solutions as we continue to address this problem.

Cerf talks about preserving information for hundreds or thousands of years, but of course what we’d like to see is a seamless way to migrate our cultural output through advancing levels of technology so that if John McCarthy’s instincts are right, our remote descendants will still have access to the things we did and said, both meaningful and seemingly inconsequential. That’s a long way from an Encyclopedia Galactica, but learning how to do this could teach us principles of archiving and preservation that could eventually feed such a compilation.

A great post… Our American underground atomic missile stations use 8-inch floppies in their systems to keep out unwanted intruders… I should have kept that IBM PCjr 128k circa 1984… Will keep the old 2002 XP machine just in case…

Charles Stross’s novel Glasshouse also assumes an information dark age as part of its setup.

Vernor Vinge also had a solution for obsolete languages: layers and layers of newer languages sitting on top of older ones. This was mentioned in his novel “A Fire Upon the Deep”.

It seems to me that digital media of any sort are a two-edged sword. Their power requires that the decoding mechanism be known, and there just doesn’t seem to be any way around it. We really need to archive important things on media that require only visual (or other sensory) decoding. In SF movies, unknown informational artifacts such as records are always self-contained, so that a simple action will expose the content to the senses. Can we build such artifacts today that would have the longevity we need?

Very interesting topic. It is true that we believe ourselves invulnerable to information loss because bits and bytes can be so easily copied, but that’s far from the complete truth. The places we put information into are themselves very fragile, and their complexity makes them less dependable than older storage media (like paper, ceramic and stone).

We are a single drive failure away from losing documents, and even moments from our lives, forever. The only safeguard is backups, and even those have an expiry date much shorter than stone, papyrus or even paper books.

From my personal experience, Google+ and other cloud services have already saved me from a couple of digital information collapses.

The first time it was a personal computer whose disk simply died completely. Yes, I know I need to do physical backups, but I contented myself with having the Google Drive app on it and placing my valuable stuff there.

The second was a botched iPhone factory restore, which left me without access to my iCloud photos and videos. But Google+ ensured I had them all, somewhere.

Thus my docs, videos and photos lived on.

Nevertheless, if Google goes belly up one day, I might end up without any other backup whatsoever.

Besides that, and even if Google or the Cloud live for a long while, I think most personal files stored in the Cloud will become worthless once we are gone. I mean, who would want to keep a hoard of personal files from a dead guy? Most will simply be deleted once the owner is no longer there to access them or to complain when they are finally deleted.

The problem, as the article above says, is that the distinction between what’s trivial and what’s historically important is a very subtle one. Silly photos and personal messages with merely sentimental value can become crucial elements for untangling a historical mystery, some centuries later.

And it’s doubtful we will have the privilege of leaving so much of a trace; perhaps we will leave even less than our ancestors, who used ceramic, stone and paper to record the information of their lives.

Unless, that is, there is a consistent, almost pharaoh-burial-like effort to keep a register of the relevant digital information of our age for posterity.

I am not sure I am buying this. It could be said that a single Hollywood movie contains more information than all Sumerian stone tablets taken together, even when including all the ones that have been lost. The average Sumerian didn’t have even a single stone tablet to their name, I would bet. The average person today leaves countless gigabytes of data in their wake. If there is any danger, it seems, it is from too much information being preserved.

Another thing to consider: Does anyone really think it is more difficult to decipher an unknown image format from binary data than it is to decipher an ancient script? Particularly when there is tons of information about old image formats available, so that they really aren’t unknown. Once its bits are correctly arranged into pixels, an image is immediately accessible to anyone. The same cannot be said of ancient writing.
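The claim that pixel data is nearly self-describing can be illustrated with a minimal Python sketch. The format assumed here — a headerless 8-bit grayscale raster — is a hypothetical stand-in for a forgotten image format, not any real one: given only the raw bytes, the plausible image widths are just the divisors of the buffer length, and trying each one arranges the bits into candidate pictures.

```python
# Sketch: recovering an image from raw bytes when the format is forgotten.
# Assumes a hypothetical headerless 8-bit grayscale raster, so the only
# unknown is the image width; real format archaeology is messier.

def bytes_to_pixels(raw: bytes, width: int) -> list[list[int]]:
    """Arrange a flat byte buffer into rows of pixel intensities."""
    if len(raw) % width != 0:
        raise ValueError("buffer length is not a multiple of the guessed width")
    return [list(raw[i:i + width]) for i in range(0, len(raw), width)]

def plausible_widths(n: int) -> list[int]:
    """Candidate widths: every divisor of the buffer length."""
    return [w for w in range(1, n + 1) if n % w == 0]

if __name__ == "__main__":
    raw = bytes(range(12))               # 12 bytes of "image" data
    print(plausible_widths(len(raw)))    # [1, 2, 3, 4, 6, 12]
    for row in bytes_to_pixels(raw, 4):  # try width 4: a 3-row, 4-column image
        print(row)
```

A human (or machine) eyeballing the candidate rasters for coherent structure is doing essentially what the commenter describes: once the bits fall into pixels, the image speaks for itself in a way ancient scripts do not.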

Yet another thing to consider: In addition to the internet and all its archiving projects, we also still print and distribute more actual paper books than ever before in history. I am pretty sure that any one Barnes & Noble would put the Library of Alexandria to shame in terms of volume. Plus, the coffee is better.

Changes in language have been hard enough for historians to crack (try reading Beowulf or Le Morte d’Arthur in their original languages for a simple example), so attempting to read totally alien languages will be pretty much impossible. Until the Rosetta Stone was discovered, ancient Egyptian was indecipherable, and even now no one can read Linear A script. Even languages like Sanskrit challenge the investigator, since much of it seems to be written in elaborate puns or word games which defy modern scholarship. And of course we also have elaborate visual languages for motion pictures, and musical “languages” as well; huge troves of information which may become impossible to decipher as cultural references die off.

So the meaning of long term archived materials may be lost even if the means of actually reading it may be preserved.

An exceptionally difficult problem. There are no neat solutions, and probably never will be, unless we consciously strive to preserve our cultural heritage, both for the short term (1–3 centuries) and for longer (eons?) ones. We don’t know how information would be retrieved 400 years in the future from decayed hard drives (HDD/SSD) or optical/tape media.

The only way to communicate with the deep future is to make a strong, conscious choice to preserve information in long-duration physical media, in forms easily consumed by future entities. How does one preserve digital content in a physical medium without the need for an extra playback device? I don’t see a solution for that problem anywhere near, or even in this millennium. We already have a lost digital decade. Think back to all those games people had in the 70s/80s. The only reason people remember them is that some have fond memories of seeing them run in emulators.

The data format is not such a big problem, as long as the metadata that defines the content and the means of decoding it is attached. Preserving that metadata and passing it forward to the future is the far more important problem.
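The commenter’s idea — keeping the decoding instructions attached to the data itself — can be sketched as a self-describing container. This is a toy illustration, not any real archive standard; the field names (`media_type`, `encoding`, `length`) are hypothetical. A plain-text header states what the payload is and how to read it, so a future reader needs no separate documentation.

```python
import json

# Sketch of a hypothetical self-describing container: a plain-text JSON
# header recording what the payload is and how to decode it, followed by
# a newline and the payload itself. Not any real archive standard.

def pack(payload: bytes, media_type: str, encoding: str) -> bytes:
    """Prepend a human-readable header describing the payload."""
    header = json.dumps({
        "media_type": media_type,   # e.g. "text/plain"
        "encoding": encoding,       # e.g. "utf-8"
        "length": len(payload),
    }).encode("ascii")
    return header + b"\n" + payload

def unpack(blob: bytes) -> tuple[dict, bytes]:
    """Split the header from the payload and verify the recorded length."""
    header_line, payload = blob.split(b"\n", 1)
    header = json.loads(header_line)
    if len(payload) != header["length"]:
        raise ValueError("payload length does not match header")
    return header, payload

if __name__ == "__main__":
    blob = pack("hello, future".encode("utf-8"), "text/plain", "utf-8")
    header, payload = unpack(blob)
    print(header["media_type"], payload.decode(header["encoding"]))
```

The design choice is the commenter’s point in miniature: the header is deliberately readable without special tooling, so even if the payload format is forgotten, the instructions for decoding it survive alongside the bits.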

This is, and will remain, a fundamental issue no matter how well or badly we fare in the quest, if only so that a future civilization can look back and acknowledge how the problem was solved back then. Consider, for instance, how a far-future civilization would store information for its own deep future, when everything is stored in quantum states in crystallized materials manipulated by methods unknown to us (laser, quantum beam, axion/neutrino read-write devices).

We can’t control, nor should we worry about, how future generations will perceive us or whether they get it right. The only means of somehow influencing that is to provide tangible material carrying information.

A good example highlighting the information decay of the past, and its rediscovery: digitized high-resolution photos of America made between 1862 and 1915, which are mesmerizing because their resolution and clarity are on par with, or exceed, contemporary results.

Of all the pictures, I find #084, from 1911, the most brilliant: one can see RMS Olympic (Titanic’s sister ship) with RMS Lusitania in the background (easily mistaken for Titanic by a careless viewer, though Lusitania has nothing to do with the Olympic-class ships). People might not realize that this is one year before the Titanic disaster (Titanic was then still being built) and four years before Lusitania was torpedoed by a German U-boat. By all means, this photo was taken long before the history as we know it today.

Dmitri gives several good examples of information having NOT been lost, e.g. old games on emulators (I doubt there are any that are no longer available in some ROM-pack). Also, the photographs he mentions and then posts links to.

In Centauri Dreams, Paul Gilster looks at peer-reviewed research on deep space exploration, with an eye toward interstellar possibilities. For the last eleven years, this site has coordinated its efforts with the Tau Zero Foundation, and now serves as the Foundation's news forum. In the logo above, the leftmost star is Alpha Centauri, a triple system closer than any other star, and a primary target for early interstellar probes. To its right is Beta Centauri (not a part of the Alpha Centauri system), with Beta, Gamma, Delta and Epsilon Crucis, stars in the Southern Cross, visible at the far right (image: Marco Lorenzi).
