Blockchains vs centralized databases

Four key differences between blockchains and regular databases

If you’ve been reading my previous posts, you will know by now that blockchains are simply a new type of database. Specifically, a blockchain is a database that a group of non-trusting parties can share directly, each with write access, without requiring a central administrator. This contrasts with traditional (SQL or NoSQL) databases that are controlled by a single entity, even if some kind of distributed architecture is used within its walls.

I recently gave a talk about blockchains from the perspective of information security, in which I concluded that blockchains are more secure than regular databases in some ways, and less secure in others. Considering the leading role that centralized databases play in today’s technology stack, this got me thinking more broadly about the trade-offs between these two technologies. Indeed, whenever someone asks me if MultiChain can be used for a particular purpose, my first response is always: “Could you do that with a regular database?” In more cases than you might think, the answer is yes, for the following simple reason:

If trust and robustness aren’t an issue, there’s nothing a blockchain can do that a regular database cannot.

This is a key point, and one on which there is a great deal of misunderstanding. In terms of the types of data that can be stored, and the transactions that can be performed on that data, blockchains don’t do anything new. And just to be clear, this observation extends to “smart contracts” as well, despite their sexy name and image. A smart contract is nothing more than a piece of computer code which runs on every node in a blockchain – a decades-old technology called stored procedures does the same for centralized databases. (You also cannot use a blockchain if this code needs to initiate interactions with the outside world.)

The truth about blockchains is that, while they have some advantages, they also have their downsides. In other words, like most technology decisions, the choice between a blockchain and a regular database comes down to a series of trade-offs. If you’re blinded by the hype and deafened by the noise, you’re unlikely to make that choice objectively. So I hope the following guide might help.

Disintermediation: advantage blockchains

The core value of a blockchain is enabling a database to be directly shared across boundaries of trust, without requiring a central administrator. This is possible because blockchain transactions contain their own proof of validity and their own proof of authorization, instead of requiring some centralized application logic to enforce those constraints. Transactions can therefore be verified and processed independently by multiple “nodes”, with the blockchain acting as a consensus mechanism to ensure those nodes stay in sync.
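
To make this concrete, here is a minimal Python sketch of nodes verifying transactions independently. Everything in it is an illustrative assumption rather than a description of MultiChain or any real product: the names `apply_transaction`, `state_hash` and `ValidatingNode` are invented, simple account balances stand in for the database state, and signature checking (the proof of authorization) is deliberately left out so that the validity rule stays in focus.

```python
import hashlib
import json


def apply_transaction(balances, tx):
    """Return updated balances if tx is valid; raise ValueError otherwise.

    The validity rule is an assumption for illustration: the amount must be
    positive and the sender must hold enough funds.
    """
    sender, receiver, amount = tx["from"], tx["to"], tx["amount"]
    if amount <= 0 or balances.get(sender, 0) < amount:
        raise ValueError("invalid transaction")
    updated = dict(balances)
    updated[sender] -= amount
    updated[receiver] = updated.get(receiver, 0) + amount
    return updated


def state_hash(balances):
    """Fingerprint of the state, so nodes can cheaply compare results."""
    encoded = json.dumps(balances, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


class ValidatingNode:
    """Hypothetical node that checks every transaction itself."""

    def __init__(self, genesis):
        self.balances = dict(genesis)

    def process(self, transactions):
        for tx in transactions:
            try:
                self.balances = apply_transaction(self.balances, tx)
            except ValueError:
                pass  # every honest node rejects the same transaction
        return state_hash(self.balances)
```

Two nodes that start from the same state and see the same ordered transactions end up with the same `state_hash`, with no central application deciding what is valid. Agreeing on that order is the part the blockchain itself provides, and it is not modeled here.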

Why is there value in this disintermediation? Because even though a database is just bits and bytes, it is also a tangible thing. The contents of a database are stored in the memory and disk of a particular computer system, and anybody with sufficient access to that system can destroy or corrupt the data within. As a result, the moment you entrust your data to a regular database, you also become dependent on the human organization in which that database resides.

Now, the world is filled with organizations which have earned this trust – governments and banks (mostly), universities, trade associations, and even private companies like Google and Facebook. In most cases, especially in the developed world, these work extremely well. I believe my vote has always been counted, no bank has ever stolen my money, and I have yet to find a way to pay for better grades. So what’s the problem? If an organization controls an important database, it also needs a bunch of people and processes in place to prevent that database from being tampered with. People need hiring, processes need to be designed, and all this takes a great deal of time and money.

So blockchains offer a way to replace these organizations with a distributed database, locked down by clever cryptography. Like so much that has come before, they leverage the ever-increasing capacity of computer systems to provide a new way of replacing humans with code. And once it’s been written and debugged, code tends to be an awful lot cheaper.

Confidentiality: advantage centralized databases

As I mentioned, every node in a blockchain independently verifies and processes every transaction. A node can do this because it has full visibility into: (a) the database’s current state, (b) the modification requested by a transaction, and (c) a digital signature which proves the transaction’s origin. This is undoubtedly a clever new way to architect a database, and it really works. So where’s the catch? For many applications, especially financial, the full transparency enjoyed by every node is an absolute deal-killer.

How do systems built on regular databases avoid this problem? Just like blockchains, they restrict the transactions that particular users can perform, but these restrictions are imposed in one central location. As a result, the full database contents need only be visible at that location, rather than in multiple nodes. Requests to read data also go through this central authority, which can accept or reject those requests as it sees fit. In other words, if a regular database is read-controlled and write-controlled, a blockchain can be write-controlled only.
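
The difference can be sketched in a few lines of Python. This is a toy model under my own assumptions, not any real database or blockchain API: `CentralDatabase` and `BlockchainNode` are hypothetical classes, and permissions are plain dictionaries mapping a user to the set of keys they may touch.

```python
class CentralDatabase:
    """Toy central store: reads and writes both pass through one gatekeeper."""

    def __init__(self, read_acl, write_acl):
        self._rows = {}              # full contents live in one place only
        self._read_acl = read_acl    # user -> set of keys they may read
        self._write_acl = write_acl  # user -> set of keys they may write

    def write(self, user, key, value):
        if key not in self._write_acl.get(user, set()):
            raise PermissionError(f"{user} may not write {key}")
        self._rows[key] = value

    def read(self, user, key):
        if key not in self._read_acl.get(user, set()):
            raise PermissionError(f"{user} may not read {key}")
        return self._rows[key]


class BlockchainNode:
    """Toy node: writes are restricted, but the whole state is held locally."""

    def __init__(self, write_acl):
        self.state = {}              # every node operator can inspect this
        self._write_acl = write_acl

    def apply(self, user, key, value):
        if key not in self._write_acl.get(user, set()):
            raise PermissionError(f"{user} may not write {key}")
        self.state[key] = value
```

In the first class a request from an unauthorized user for someone else's row is simply refused. In the second there is no equivalent refusal to make, because whoever runs the node already holds `state` in full in order to verify transactions.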

To be fair, many strategies are available for mitigating this problem. These range from simple ideas like transacting under multiple blockchain addresses, to advanced cryptographic techniques such as confidential transactions and zero-knowledge proofs (now being developed). Nonetheless, the more information you want to hide on a blockchain, the heavier the computational burden of generating and verifying transactions. And no matter how these techniques develop, they will never beat the simple and straightforward method of hiding data completely.

Robustness: advantage blockchains

A second benefit of blockchain-powered databases is extreme fault tolerance, which stems from their built-in redundancy. Every node processes every transaction, so no individual node is crucial to the database as a whole. Similarly, nodes connect to each other in a dense peer-to-peer fashion, so many communication links can fail before things grind to a halt. The blockchain ensures that nodes which went down can always catch up on transactions they missed.
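
Here is a rough Python sketch of that catch-up step. It makes several simplifying assumptions of my own: `SyncingNode` and `block_hash` are invented names, each block just records the hash of its predecessor plus a list of transactions, and forks are ignored entirely, so a returning node only ever needs to append what it missed.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 fingerprint of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


class SyncingNode:
    """Hypothetical node holding a hash-linked list of blocks."""

    def __init__(self):
        self.chain = []

    def tip(self):
        # Hash of the latest block, or a fixed marker for an empty chain.
        return block_hash(self.chain[-1]) if self.chain else "genesis"

    def append(self, transactions):
        self.chain.append({"prev": self.tip(), "transactions": list(transactions)})

    def catch_up(self, peer):
        """Copy the blocks we missed from a peer, checking each hash link."""
        added = 0
        for block in peer.chain[len(self.chain):]:
            if block["prev"] != self.tip():
                raise ValueError("peer chain does not extend ours")
            self.chain.append(block)
            added += 1
        return added
```

Because every block names its predecessor, a node coming back online knows exactly where it left off and can confirm that what a peer sends really does follow on from its own copy.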

So while it’s true that regular databases offer many techniques for replication, blockchains take this to a whole new level. For a start, no configuration is required – simply connect some blockchain nodes together, and they automatically keep themselves in sync. In addition, nodes can be freely added to or removed from a network, without any preparation or consequences. Lastly, external users can send their transactions to any node, or to multiple nodes simultaneously, and these transactions propagate automatically and seamlessly to everyone else.

This robustness transforms the economics of database availability. With regular databases, high availability is achieved through a combination of expensive infrastructure and disaster recovery. A primary database runs on high-end hardware which is monitored closely for problems, with transactions replicated to a backup system in a different physical location. If the primary database fails (e.g. due to a power cut or catastrophic hardware failure), activity is automatically moved over to the backup, which becomes the new primary. Once the failed system is fixed, it’s lined up to act as the new backup if and when necessary. While all this is doable, it’s expensive and notoriously difficult to get right.

Instead, what if we had 10 blockchain nodes running in different parts of the world, all on commodity hardware? These nodes would be densely connected to each other, sharing transactions on a peer-to-peer basis and using a blockchain to ensure consensus. End users generating the transactions connect to (say) 5 of these nodes, so it doesn’t matter if a few communication links go down. And if one or two nodes fail completely on any given day, nobody feels a thing, because there are still more than enough copies to go round. As it happens, this combination of low cost systems and high redundancy is exactly how Google built its search engine so cheaply. Blockchains can do the same thing for databases.
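
A back-of-the-envelope calculation shows why this works. The Python sketch below rests on assumptions that are mine and not measurements: each node is unavailable independently of the others with the same probability `p_down`, and the function names are invented for illustration. Real outages are often correlated, which makes these figures optimistic.

```python
from math import comb


def prob_all_down(p_down, connected):
    """Chance that every node a user connects to is down at the same time."""
    return p_down ** connected


def prob_at_least_up(total, needed, p_down):
    """Chance that at least `needed` of `total` nodes are up (binomial sum)."""
    p_up = 1 - p_down
    return sum(
        comb(total, up) * p_up ** up * p_down ** (total - up)
        for up in range(needed, total + 1)
    )


# Illustrative figure only: assume a commodity node is down 1% of the time.
user_cut_off = prob_all_down(0.01, 5)
network_healthy = prob_at_least_up(10, 8, 0.01)
```

Under those assumptions, a user connected to 5 nodes is cut off only when all 5 are down at once, which has probability 0.01 raised to the fifth power. Cheap nodes with mediocre individual uptime can therefore add up to a very dependable whole.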

Performance: advantage centralized databases

Blockchains will always be slower than centralized databases. This is not merely because today’s blockchains are new and unoptimized; it follows from the nature of blockchains themselves. You see, when processing transactions, a blockchain has to do all the same things as a regular database, but it carries three additional burdens:

Signature verification. Every blockchain transaction must be digitally signed using a public-key cryptography scheme such as ECDSA. This is necessary because transactions propagate between nodes in a peer-to-peer fashion, so their source cannot otherwise be proven. The generation and verification of these signatures is computationally complex, and constitutes the primary bottleneck in products like ours. By contrast, in centralized databases, once a connection has been established, there is no need to individually verify every request that comes over it.

Consensus mechanisms. In a distributed database such as a blockchain, effort must be expended in ensuring that nodes in the network reach consensus. Depending on the consensus mechanism used, this might involve significant back-and-forth communication and/or dealing with forks and their consequent rollbacks. While it’s true that centralized databases must also contend with conflicting and aborted transactions, these are far less likely where transactions are queued and processed in a single location.

Redundancy. This isn’t about the performance of an individual node, but the total amount of computation that a blockchain requires. Whereas centralized databases process transactions once (or twice), in a blockchain they must be processed independently by every node in the network. So lots more work is being done for the same end result.
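
The first and third burdens compound each other, as the following Python sketch tries to show. Note the substitution: Python's standard library has no ECDSA, so I use a Lamport one-time signature built on SHA-256 as a stand-in. It is a genuine public-key signature scheme, but it is not what blockchains typically use and it does not reproduce ECDSA's computational cost. All function names are my own, and each key pair must sign only one message.

```python
import hashlib
import secrets


def _h(data):
    return hashlib.sha256(data).digest()


def _bits(message):
    """The 256 bits of the message's SHA-256 digest, most significant first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def sign(private, message):
    """Reveal one secret per digest bit. Never reuse a key pair."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message, signature):
    """Check that each revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    return all(
        _h(revealed) == public[i][bit]
        for i, (revealed, bit) in enumerate(zip(signature, _bits(message)))
    )


def count_verifications(node_count, transactions):
    """Every node checks every signature itself.

    `transactions` is a list of (public_key, message, signature) tuples.
    Returns (transactions accepted by each node, total checks performed).
    """
    total_checks = 0
    accepted = 0
    for _ in range(node_count):
        accepted = 0
        for public, message, signature in transactions:
            total_checks += 1
            if verify(public, message, signature):
                accepted += 1
    return accepted, total_checks
```

With 10 nodes and 1,000 transactions, `count_verifications` performs 10,000 signature checks to arrive at the same answer on every node. A centralized database that trusts an already authenticated connection performs none.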

The bottom line

Naturally there are other ways in which blockchains and regular databases can be compared. We could talk about codebase maturity, developer attractiveness, ecosystem breadth and more. But none of these issues are inherent to the technology itself. So when it comes to a long-term decision on using a blockchain, the question to ask is this: What’s more important for my use case? Disintermediation and robustness? Or confidentiality and performance?

When examined in this simple light, many of the use cases currently under discussion do not make sense. The biggest problem tends to be confidentiality. The participants in a fiercely competitive marketplace will naturally prefer the privacy of a centralized database, rather than reveal their activities to each other. This is especially true if a trusted central party already exists and can provide the neutral territory in which that database can reside. Even though there may be some cost associated with this central provider, this is more than justified by the value of the privacy retained. The only motivation for a shift to blockchains would be aggressive new regulation.

Nonetheless, blockchains do have strong use cases, where disintermediation and robustness are more important than confidentiality and performance. I’ll write more about these in a subsequent post, but the most promising areas we’ve seen so far are: (a) inter-company audit trails, (b) provenance tracking, and (c) lightweight financial systems. In all three cases, we’ve found people building on MultiChain with a clear view to deployment, rather than just curiosity and experimentation. So if you’re looking for ways in which blockchains can add genuine value to your business, these three areas might be a good place to start.