Spot the problem? Of course, max2 might be null, which means our code will throw an exception in the innocent-looking second for loop. To get the code right, we have to protect that section with an if-then-else that checks for null first.

Of course, such a protection is often the right thing to do. But in this example, a “max” should hardly ever be null, so using an “int” as the data type, as for max1 (which can’t be null), is much better than using an “Integer”, as for max2 (which may be null).
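To make the difference concrete, here is a minimal sketch. The class and method names (MaxExample, maxPrimitive, maxBoxed, describeMax) are made up for this illustration and are not the listing discussed above; only the name max2 is borrowed from it. The sketch shows the null check that a boxed “Integer” forces on the caller and that a primitive “int” avoids.

```java
import java.util.List;

// Hypothetical names throughout; a sketch, not the original listing.
final class MaxExample {

    // Primitive result: can never be null, so callers need no null check.
    // Assumption: an empty list is treated as a caller error.
    static int maxPrimitive(List<Integer> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("need at least one value");
        }
        int max = values.get(0);
        for (int v : values) {
            if (v > max) {
                max = v;
            }
        }
        return max;
    }

    // Boxed result: null when the list is empty, so every caller must check.
    static Integer maxBoxed(List<Integer> values) {
        Integer max = null;
        for (Integer v : values) {
            if (max == null || v > max) {
                max = v;
            }
        }
        return max;
    }

    // The if-then-else guard: without it, unboxing a null max2 in the
    // comparison below would throw a NullPointerException.
    static String describeMax(List<Integer> values) {
        Integer max2 = maxBoxed(values);
        if (max2 == null) {
            return "no maximum (empty list)";
        } else {
            int hits = 0;
            for (int v : values) {
                if (v == max2) { // max2 is unboxed here; safe only after the check
                    hits++;
                }
            }
            return "max " + max2 + " occurs " + hits + " time(s)";
        }
    }
}
```

With the primitive version, the guard in describeMax is simply not needed: there is no null to check for.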

It’s the same thing for properties in InfoGrid models. Some properties simply should never be null. For example, consider a time stamp indicating when a MeshObject was created. Given that the MeshObject was created, the time stamp must exist, and therefore a null value makes no sense. Such a property would be specified as “mandatory”. In contrast, a time stamp indicating when a MeshObject is likely to become obsolete is very likely optional: we might not know that time (yet), or the object might never become obsolete, so null values are fine.
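In plain Java terms, the distinction might look like the following sketch. TimestampedObject and its accessors are hypothetical names invented for this illustration; they are not InfoGrid’s API, and an InfoGrid model declares this on the property rather than in hand-written code.

```java
import java.util.OptionalLong;

// Hypothetical class for illustration only; this is not InfoGrid's API.
final class TimestampedObject {

    // Mandatory: a primitive long, so it can never be null.
    private final long timeCreated;

    // Optional: null means "not known (yet)" or "never becomes obsolete".
    private Long timeObsolete;

    TimestampedObject(long timeCreated) {
        this.timeCreated = timeCreated;
    }

    // Callers can use the value directly; no null test needed.
    long getTimeCreated() {
        return timeCreated;
    }

    // The return type makes it explicit that the value may be absent.
    OptionalLong getTimeObsolete() {
        return timeObsolete == null
                ? OptionalLong.empty()
                : OptionalLong.of(timeObsolete);
    }

    void setTimeObsolete(Long timeObsolete) {
        this.timeObsolete = timeObsolete;
    }
}
```

Code that reads the creation time needs no null test at all; code that reads the obsolescence time is forced by the type to handle the missing case.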

If InfoGrid did not distinguish between required and optional values, application code would be littered with unnecessary tests for null values (or, failing that, unexpected NullPointerExceptions). We think being specific is better when creating the model; higher-quality and less cluttered application code is the reward.

For many years, the canonical example of why we need database transactions has been banking. If you move $100, you don’t really want the money to be subtracted from the first account but never added to the second because of some problem in between. You want both the subtraction and the addition to happen, or neither.

Sounds good so far. Apparently, though, that is not how banks work in the real world, even though they certainly use plenty of database systems that have ACID transactions. The Economist (July 24, 2010, “Computer says no”) quotes a former executive of the Royal Bank of Scotland as saying: “The reality was you could never be certain that anything was correct.” Continuing: “Reported numbers for the bank’s exposure were regularly billions of dollars adrift of reality.” The article offers an explanation: “banks tend to operate lots of different databases producing conflicting numbers.” HSBC is quoted as having 55 separate systems for core banking, 24 for credit cards, and 41 for internet banking.

According to traditional transaction wisdom, if a customer makes an internet transaction to pay off his credit card, it should be a single transaction: start transaction, take money from checking account, put it into credit card account, commit. But transactions generally cannot span systems. Because the system responsible for internet banking is separate in this real-world example from core banking and from credit card systems, no such single ACID transaction is possible. Given the numbers above, it looks very much like those money transfers that actually can follow the canonical ACID transaction pattern constitute only a very small fraction of all transactions (transferring from checking to savings, for example).
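For reference, the canonical pattern can be sketched in a few lines. This is a deliberately simplified in-memory stand-in, not a real database: the Ledger class, its method names, and the account names are all assumptions made for illustration. It only works because both accounts live inside the same object, which is exactly the condition that fails once the accounts live in separate systems.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory ledger; a sketch of the all-or-nothing pattern,
// not a real transactional database. Amounts are in cents.
final class Ledger {

    private final Map<String, Long> balances = new HashMap<>();

    void open(String account, long cents) {
        balances.put(account, cents);
    }

    // Only call this for accounts that have been opened.
    long balance(String account) {
        return balances.get(account);
    }

    // Either both updates happen, or neither does.
    synchronized boolean transfer(String from, String to, long cents) {
        Map<String, Long> snapshot = new HashMap<>(balances); // "start transaction"
        try {
            withdraw(from, cents);
            deposit(to, cents);
            return true;                                      // "commit"
        } catch (RuntimeException e) {
            balances.clear();
            balances.putAll(snapshot);                        // "rollback"
            return false;
        }
    }

    private void withdraw(String account, long cents) {
        long current = require(account);
        if (current < cents) {
            throw new IllegalStateException("insufficient funds");
        }
        balances.put(account, current - cents);
    }

    private void deposit(String account, long cents) {
        balances.put(account, require(account) + cents);
    }

    private long require(String account) {
        Long current = balances.get(account);
        if (current == null) {
            throw new IllegalArgumentException("unknown account: " + account);
        }
        return current;
    }
}
```

If the deposit fails after the withdrawal has already been applied, the snapshot is restored and the money is back in the first account. Nothing comparable is available when the withdrawal and the deposit happen in two systems that share no transaction.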

If I look at my own banking, the vast majority of my banking activity isn’t even within the same bank, but with other banks: bills to pay usually have to be paid at other banks. No cross-bank ACID database transactions that I’ve ever heard of.

So banking software necessarily has to have functionality that prevents money from being deducted but never arriving, all without depending on database transactions. If we have to have this functionality anyway, why then are transactions “indispensable”, as some people still want to make us believe?

This pattern can be generalized: the more distributed and decentralized a system is, the less likely it is that we can use transactions that span the entire system. That is certainly true for the banking system, apparently also true for systems inside banks, and in many other places. ACID transactions were invented for the mainframe, the world’s most centralized computing construct. But I’m afraid computing is no longer “one mainframe” as it was in the sixties.

Instead of trotting out transactions as the answer, what we need for NoSQL databases is the ability to get the same benefits in a distributed system (“nothing is lost”) without relying on transactions. That’s where all of our efforts should be.

In the InfoGrid graph database, we have a weaker form of transactions for individual MeshBases, but then synchronize with the rest of the world by passing XPRISO messages. I’d be surprised if the eventual “transactional” architecture for large-scale distributed and decentralized systems looked very different.