Vinton Cerf’s talk discusses the desirable properties of a "Digital
Vellum" — a system that is capable of preserving the meaning of the
digital objects we create over periods of hundreds to thousands of
years. This is not about preserving bits; it is about preserving
meaning, much like the Rosetta Stone. Information Centric Networking may
provide an essential element to implement a Digital Vellum. This
long-term thinking will serve as a foundation and context for exploring
ICNs in more detail.

ICN is a generalization of the Content-Centric Networks about which I blogged two years ago. I agree with Cerf that these concepts are probably very important for long-term digital preservation, but I disagree about why. ICNs make it easy for Lots Of Copies to Keep Stuff Safe, and thus make preserving the bits easier, but I don't see that they affect the interpretation of the bits.
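The reason copies make bit preservation easier is that independent replicas can be audited against each other: a copy whose checksum disagrees with the majority can be detected and repaired from a good copy. A minimal sketch of that audit loop in Python (the `audit_and_repair` function is hypothetical; real systems such as LOCKSS use nonced polling among independent peers rather than a naive majority vote):

```python
import hashlib
from collections import Counter

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def audit_and_repair(replicas: list[bytes]) -> list[bytes]:
    # Tally replicas by content hash and treat the majority hash as authoritative.
    counts = Counter(sha256(r) for r in replicas)
    winner, _ = counts.most_common(1)[0]
    # Keep one known-good copy and overwrite any replica that disagrees with it.
    good = next(r for r in replicas if sha256(r) == winner)
    return [r if sha256(r) == winner else good for r in replicas]

# One of three copies has suffered bit rot; the audit detects and repairs it.
copies = [b"important document", b"important document", b"imp0rtant document"]
assert audit_and_repair(copies) == [b"important document"] * 3
```

Note that this protects only the bits; nothing in the audit knows or cares whether any software can still render them.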

There's more to disagree with Cerf about. What he calls "bit rot" is not what the digital preservation field means by the term. In his 1995 Scientific American article Jeff Rothenberg
analyzed the reasons digital information might not reach future
readers:

Media Obsolescence - you might not be able to read bits from the
storage medium, for example because a reader for that medium might no
longer be available.

Bit Rot - you might be able to read bits from the medium, but they might be corrupt.

Format Obsolescence - you might be able to read the correct bits from
the medium but they might no longer be useful because software to render them into an intelligible form might no longer be available.

Media Obsolescence was a big problem but, as Patrick Gilmore pointed out
on Farber's list, it is much less of a problem now that most data is on-line and thus
easily copied to replacement media.

Format Obsolescence is what Cerf was discussing. There is no doubt
that it is a real problem, and that in the days before the Web it was
rampant. However, the advent of the Web forced a change. Pre-Web, most
formats were the property of a single application that both wrote and read
the data. In the Web world, the software that writes the data and the software that reads it are different and unrelated.

Google is famously data-driven, and there is data about the incidence of format obsolescence. For example, the Institut National de
l'Audiovisuel surveyed their collection of audiovisual content from the
early Web, which would be expected to have been very vulnerable to
format obsolescence. They found an insignificant amount of it. I predicted this finding, on twofold theoretical grounds, three years before their study:

First, the Web is a publishing
medium. The effect is that formats in the Web world are effectively
network protocols; the writer has no control over the reader. Experience shows that protocols are the hardest things to
change in incompatible ways (cf. Postel's Law, "no flag day on the
Internet", IPv6, etc.).

Second, almost all widely used formats have
open source renderers, preserved in source code repositories. It
is very difficult to construct a plausible scenario by which a
format with an open source renderer could become uninterpretable.

Closed, proprietary formats are the danger here, and something
consumers should be aware of. However, it is much less of an issue for
most people, because the majority of the content they collect as they
move through life will be in widely supported, more open
formats.

While format obsolescence is a problem, it is neither significant nor pressing for most digital resources.

However, there is a problem that is both significant and pressing that affects the majority of digital resources. By far the most important
reason that digital information will fail to reach future readers
is not technical, or even the very real legal issues that Cerf
points to. It is economic. Every study of the proportion of content that is being preserved comes up with numbers of 50% or less.
The institutions tasked with preserving our digital heritage, the
Internet Archive and national libraries and archives, have nowhere
close to the budget they would need to get that number even up to 90%.

Thank you, Vint. I am not claiming that any technology will save the interpretability or even the survival of bits for 1000 years. I would be interested in a plausible scenario by which a format with an open source renderer would be uninterpretable in a less speculative time-frame such as 100 years.

I'm quite familiar with the Olive project and have pointed to it from this blog. I've blogged about emulation as a preservation strategy since 2009, and in particular about the encouraging recent progress in delivering emulation in a user-friendly way at CMU, Freiburg, the Internet Archive and elsewhere. But I'm very skeptical that these efforts will have a detectable impact on the survival of usable information for 1000 years.

It's a sad fact that very little information survives from 1000 years ago. The same will be said 1000 years from now. To survive 1000 years, information has to survive the first 100. The vast majority of information generated today will not survive 100 years, for reasons that have nothing to do with the interpretability of the bits.

We have a choice. We can deploy the limited resources society makes available for preservation to efforts that might or might not have an impact 1000 years from now, or to efforts that have a high probability of increasing the resources available to readers in the next few decades. I made my choice more than a decade and a half ago.