The problem with data at this scale is that, even with the fattest pipes, the fastest transit method is still overnighting HDDs via FedEx. When I was at CHOP, we got DNA sequences from Tufts, Johns Hopkins, etc. this way. Transfer time over Internet2 connections: ~1 month. Via FedEx: 2 days.
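To make the FedEx comparison concrete, here is a back-of-the-envelope
sketch in Python. The dataset size (1 PB), link speed (10 Gb/s) and
50% end-to-end efficiency are hypothetical inputs, not numbers from the
CHOP transfers. Plug in your own.

```python
# Back-of-the-envelope: pushing data over a network vs. shipping drives.
# All inputs in the demo are hypothetical, not measured values.

SECONDS_PER_DAY = 86_400


def network_transfer_days(size_bytes: float, link_bits_per_s: float,
                          efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link.

    efficiency (0 < efficiency <= 1) is an assumed fraction of the nominal
    link rate that is actually achieved end to end.
    """
    effective_bps = link_bits_per_s * efficiency
    return size_bytes * 8 / effective_bps / SECONDS_PER_DAY


def shipping_bits_per_s(size_bytes: float, transit_days: float) -> float:
    """Effective bandwidth of physically shipping size_bytes on drives."""
    return size_bytes * 8 / (transit_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    PB = 10**15
    payload = 1 * PB  # hypothetical dataset size
    days = network_transfer_days(payload, 10e9, efficiency=0.5)
    print(f"10 Gb/s link at 50% efficiency: {days:.1f} days")
    gbps = shipping_bits_per_s(payload, transit_days=2) / 1e9
    print(f"Shipping over 2 days: {gbps:.1f} Gb/s effective")
```

With those made-up inputs the link needs roughly 18.5 days, while a
2-day shipment works out to about 46 Gb/s of effective bandwidth. The
transit time stays the same as long as the drives fit in the box.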

While blockchains are interesting, I think your first problem is simply
finding somewhere to store an exabyte of data. At best, a blockchain is
one component of the solution.

I'm also not sure a blockchain fits the problem. My sense is that
blockchains are used when you have a large number of data sources
that don't trust each other. In this case we have only a few
authoritative sources of the data, and we just want to be able to trace
it back to them. Simply having the administrator of each of these data
sources publish a GPG signature for the store solves the integrity
problem.
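A minimal sketch of what "publish a signature for the store" could look
like, assuming the store is a directory tree. The idea is to hash every
file with SHA-256 into a manifest, using the same "digest  path" layout
that `sha256sum` prints. The administrator then signs only the manifest,
with something like `gpg --detach-sign --armor MANIFEST`. The function
names and manifest layout are my own illustrative choices, not an
existing tool.

```python
# Sketch: a SHA-256 manifest of a data store, meant to be GPG-signed.
# Function names and the manifest layout are illustrative assumptions.
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never sit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> str:
    """One 'digest  relative/path' line per file, sorted for reproducibility."""
    lines = [
        f"{sha256_file(p)}  {p.relative_to(root).as_posix()}\n"
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    return "".join(lines)


def verify_manifest(root: Path, manifest: str) -> list:
    """Return the paths whose current digest no longer matches the manifest."""
    bad = []
    for line in manifest.splitlines():
        if not line:
            continue
        digest, rel = line.split("  ", 1)  # digests contain no spaces
        target = root / rel
        if not target.is_file() or sha256_file(target) != digest:
            bad.append(rel)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "reads.fastq").write_bytes(b"@seq1\nACGT\n+\n!!!!\n")
        manifest = build_manifest(root)
        print(manifest, end="")
        print("modified:", verify_manifest(root, manifest))
```

Consumers check the GPG signature on the manifest once, then re-hash
their copy. Any path returned by `verify_manifest` has changed since the
manifest was signed. No blockchain is involved.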

Blockchains might be useful for more distributed work, like random
people contributing random observations. However, you would still need
some mechanism to sort out reputation and the like.

Bitcoin works because it is very simple and not really tied to the
outside world. All it does is create bitcoins and let people move
them around. Moving them around only involves determining whether
the person trying to move them holds the wallet's private key, not
whether they're supposed to be able to move them under any
outside-world rules.

As far as storage goes, Bitcoin at least is incredibly inefficient.
It stores a copy of the entire blockchain on every full node, i.e. on
every system that independently validates transactions. That only
works because the transaction rate is capped at an absurdly low limit
for anything resembling real-world commerce. I suspect there are ways
to distribute it better, but I'm not sure how well those hold up at
exabyte scale.
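A rough illustration of why full replication hurts at this scale. The
sketch compares keeping the whole dataset on every node with a sharded
layout where each piece lives on only a few nodes. Sharding is my
stand-in for "distributing it better," not something Bitcoin does, and
the node count and replication factor are hypothetical.

```python
# Raw storage under full replication vs. a sharded layout.
# Hypothetical numbers; decimal units (1 EB = 10**18 bytes).
EB = 10**18
TB = 10**12


def full_replication_bytes(data_bytes: int, nodes: int) -> int:
    """Raw storage if every node keeps the whole dataset (like a full node)."""
    return data_bytes * nodes


def sharded_bytes(data_bytes: int, replication_factor: int) -> int:
    """Raw storage if each shard is kept on only `replication_factor` nodes."""
    return data_bytes * replication_factor


def per_node_bytes_sharded(data_bytes: int, nodes: int,
                           replication_factor: int) -> float:
    """Average load per node, assuming shards are spread evenly."""
    return sharded_bytes(data_bytes, replication_factor) / nodes


if __name__ == "__main__":
    nodes, r = 10_000, 3
    print("full replication:", full_replication_bytes(EB, nodes) // EB, "EB")
    print("sharded:", sharded_bytes(EB, r) // EB, "EB")
    print("per node (sharded):", per_node_bytes_sharded(EB, nodes, r) / TB, "TB")
```

With 1 EB, 10,000 nodes and 3 replicas, full replication needs
10,000 EB of raw disk. Sharding needs 3 EB, about 300 TB per node.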

If anybody is more up on blockchains, I'd be interested in their take on this.