OpenTimestamps Has Timestamped the Entire Internet Archive — Here’s How

Well, sort of. In a blog post published last week, developer and consultant Peter Todd explained how he used his OpenTimestamps project to timestamp 750,000,000 of the Internet Archive’s files onto Bitcoin’s blockchain. This means that no one, not even the Internet Archive itself, can modify this collection of books, videos, images and other records without the change being detectable.

Here’s how he did it.

Merkle Trees

The effort is an interesting showcase of the OpenTimestamps project, an open-source and freely available timestamping service.

OpenTimestamps works by combining two cryptographic tools.

The first of these is a Merkle Tree, a cryptographic structure of hashes.

Any piece of data can be hashed, which means it’s scrambled and condensed into a short string of numbers: a hash. This string of numbers is seemingly random; it can’t really be used for anything in itself, not even to reconstruct the original data.

But it can be used as a sort of check. Anyone who has access to the original data can hash this data once again, and will get the exact same hash. Meanwhile, even if the original data is altered minimally (perhaps a picture includes an extra pixel), the resulting hash turns out completely different. A hash proves that the data you have is the exact same data used to create a hash.
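As a small illustration of that check, here is a sketch in Python. It assumes SHA-256 as the hash function (the one Bitcoin relies on); the sample data and the helper name sha256_hex are made up for the example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

original = b"An archived record"
altered = b"An archived record."  # a single extra byte

# Hashing the same data again always yields the same hash.
print(sha256_hex(original) == sha256_hex(b"An archived record"))

# A minimal change to the data yields a completely different hash.
print(sha256_hex(original))
print(sha256_hex(altered))
```

Whatever the size of the input, the resulting hash is always 64 hexadecimal characters long.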

A Merkle Tree, then, hashes multiple hashes together. Two hashes become one hash. And another two hashes also become one hash. Then these two resulting hashes become a new hash, so all four hashes are now represented by a single hash. And these four combined hashes can perhaps be hashed together with the hash of four other combined hashes, to once again conclude into a single hash. Etcetera. Because this hashing of hashes can continue in perpetuity, a Merkle Tree can ultimately “store” virtually unlimited amounts of data.
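The pairing described above can be sketched in a few lines of Python. This is a simplified illustration, not the OpenTimestamps implementation: it assumes SHA-256, and it carries an unpaired hash up to the next level unchanged, which is only one of several possible conventions. The function names and sample files are hypothetical.

```python
import hashlib
from typing import List

def h(data: bytes) -> bytes:
    """SHA-256 of the data, as raw bytes."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash pairs of hashes together, level by level, until one hash remains."""
    if not leaves:
        raise ValueError("need at least one leaf hash")
    level = list(leaves)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Two hashes become one hash.
                next_level.append(h(level[i] + level[i + 1]))
            else:
                # Assumption: an unpaired hash moves up unchanged.
                next_level.append(level[i])
        level = next_level
    return level[0]

files = [b"book", b"video", b"image", b"record"]
leaves = [h(f) for f in files]
root = merkle_root(leaves)
print(root.hex())
```

With four files, the root is simply the hash of the two pair-hashes; with 750,000,000 files the same loop just runs for about 30 levels instead of two.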

The real magic of a Merkle Tree is that any of the original data included in the tree can be checked against the single remaining hash of the tree, the “Merkle Root,” without requiring any of the other data hashed into the tree. You just need to know where in the tree your hash sits, plus the handful of sibling hashes along the path from that position up to the root.
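To make that check concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the format OpenTimestamps actually uses: SHA-256 is assumed, the number of leaves must be a power of two to keep the code short, and the names build_proof and verify are invented for the example.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling sits on the left?)

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_proof(leaves: List[bytes], index: int) -> Tuple[bytes, Proof]:
    """Return the Merkle root and the sibling hashes on the path from leaf to root."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("this sketch assumes a power-of-two number of leaves")
    level, proof = list(leaves), []
    while len(level) > 1:
        sibling = index ^ 1  # the neighbour this node is paired with
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify(leaf: bytes, proof: Proof, root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its sibling hashes."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

files = [b"book", b"video", b"image", b"record"]
leaves = [h(f) for f in files]
root, proof = build_proof(leaves, 2)

print(verify(h(b"image"), proof, root))     # True
print(verify(h(b"image v2"), proof, root))  # False
print(len(proof))                           # 2
```

The verifier never sees the other files. For a tree of four leaves it needs two sibling hashes; the number of sibling hashes grows only with the depth of the tree, not with the number of files.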

The second piece of the puzzle is establishing when that Merkle Tree was created.

Bitcoin’s Blockchain

Establishing when a Merkle Tree came into existence is done by utilizing the power of Bitcoin’s blockchain. Literally, Bitcoin’s power-consuming, proof-of-work system guarantees that data must have existed at a certain point in time.

The Bitcoin blockchain is essentially a cryptographic structure, just like a Merkle Tree. But while a Merkle Tree merges hashes into a single compact hash, the blockchain merges them into a timeline. Each Bitcoin block is hashed and included in the next block. That block is hashed too and included in the block after that.

Meanwhile, Bitcoin’s proof of work makes it so that each of these blocks requires real resources to mine. Right now, this already costs thousands, perhaps even tens of thousands, of dollars per block.
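The idea behind proof of work can be shown with a toy example in Python. This is a deliberately simplified sketch: it uses a single SHA-256 over a made-up header and a tiny difficulty, whereas Bitcoin hashes a structured block header and uses a vastly higher difficulty. The names mine and is_valid are hypothetical.

```python
import hashlib

def pow_hash(header: bytes, nonce: int) -> int:
    """Hash the header together with a nonce and read the result as a number."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def mine(header: bytes, difficulty_bits: int) -> int:
    """Try nonces one by one until the hash falls below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while pow_hash(header, nonce) >= target:
        nonce += 1
    return nonce

def is_valid(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution takes one hash, however long it took to find."""
    return pow_hash(header, nonce) < (1 << (256 - difficulty_bits))

header = b"toy block header"
nonce = mine(header, 12)  # about 4,096 attempts on average
print(nonce, is_valid(header, nonce, 12))
```

Each extra bit of difficulty doubles the expected number of attempts, while verification stays a single hash. That asymmetry is what makes a block expensive to produce and cheap to check.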

This is what makes Bitcoin’s history immutable.

“Changing history,” for example, by removing a transaction from an old block, cannot be done by simply removing that transaction. That would entirely change the hash of the block that included the transaction, invalidating that block. That would in turn invalidate the subsequent block as well, as it doesn’t include the valid hash from the previous block, and as such it would invalidate all blocks that came after it.
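That cascade can be sketched in Python with a toy chain. This is an illustration only: real Bitcoin blocks carry far more than a single payload, and the tuple layout and function names here are assumptions made for the example.

```python
import hashlib
from typing import List, Tuple

Block = Tuple[bytes, bytes, bytes]  # (previous block hash, payload, own hash)
GENESIS_PREV = b"\x00" * 32

def block_hash(prev_hash: bytes, payload: bytes) -> bytes:
    return hashlib.sha256(prev_hash + payload).digest()

def build_chain(payloads: List[bytes]) -> List[Block]:
    """Each block includes the hash of the block before it."""
    chain, prev = [], GENESIS_PREV
    for payload in payloads:
        current = block_hash(prev, payload)
        chain.append((prev, payload, current))
        prev = current
    return chain

def first_invalid(chain: List[Block]) -> int:
    """Return the index of the first block that fails validation, or -1."""
    prev = GENESIS_PREV
    for i, (stored_prev, payload, stored_hash) in enumerate(chain):
        if stored_prev != prev or block_hash(stored_prev, payload) != stored_hash:
            return i
        prev = stored_hash
    return -1

chain = build_chain([b"tx A", b"tx B", b"tx C", b"tx D"])
print(first_invalid(chain))  # -1: the chain is valid

# Edit a transaction in block 1 without touching anything else.
prev, _, old_hash = chain[1]
chain[1] = (prev, b"tx B removed", old_hash)
print(first_invalid(chain))  # 1: the block no longer matches its own hash

# Recompute block 1's hash; now block 2 points at a hash that no longer exists.
chain[1] = (prev, b"tx B removed", block_hash(prev, b"tx B removed"))
print(first_invalid(chain))  # 2: the break moves to the next block
```

Fixing block 2 would push the break to block 3, and so on to the tip of the chain. In Bitcoin, each of those fixes also requires redoing the block's proof of work.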

Instead, the only way to change Bitcoin’s history is to completely re-mine it. An old transaction can only be “removed” from a block by mining that same block again, without the transaction. And then you’d need to mine the next block, and the block after that … all the way until you’ve mined the longest chain. (Technically, the chain with the most accumulated proof of work.)

This becomes very expensive very quickly. Even without any competing miners, re-mining a single day of Bitcoin’s history would cost hundreds of thousands of dollars’ worth of energy. With competing miners to catch up with, you need at least a majority of hash power.

Especially with competing miners, rewriting even a couple of weeks of Bitcoin history is practically unaffordable for anyone … never mind rewriting a couple of years.

And to top it off, rewriting this much history would be very obvious too. Many Bitcoin users would notice and would possibly take precautions to make it impossible.

OpenTimestamps

OpenTimestamps combines the magic of Merkle Trees with the immutability of Bitcoin’s blockchain.

To showcase it, Todd took the hashes of 750,000,000 files from the Internet Archive and combined them all into one Merkle Tree. The “root” of that tree was then placed into a Bitcoin transaction, which he sent over the Bitcoin network to have it included in the Bitcoin blockchain. That was a couple of weeks ago, and the transaction is now virtually impossible to ever revert.

As a result, almost the entire Internet Archive is now hashed into Bitcoin’s blockchain. Anyone can take any timestamped document from the Internet Archive and verify that it existed in its current form when the timestamp was made. If the hash checks out, the document has not been altered since, nor could it have been created later.

Finally, to make this timestamp actually useful, the OpenTimestamps team — specifically Riccardo Casatta, Luca Vaccaro and Igor Barinov — created an accessible search interface and an in-browser timestamp verifier. With it, anyone can easily browse through the Internet Archive’s database and immediately see whether the records check out with the corresponding hash, as embedded in Bitcoin’s blockchain.

For the first time in history, historical archived data cannot be altered without being noticed.