May 14, 2010

SAP recovers a secret for keeping data safer than the standard relational database

Sometime in 1995 I needed a high performance database to handle transactions. Don't we all have these moments?

Like everyone, I looked at the market, which was then Oracle, Informix and so forth. But unlike most, I had one advantage. I had seen the dirty side of that business because I'd been called in several times to clean up clagged & scrunted databases for clients who really truly needed their data back up. From that repetitive work came an sad realisation that no database is safe, and the only safe design is one that records in good time a clean, unchangeable and easily repairable record of what was happening, for the inevitable rebuild. At any one time.

Recovery is the key, indeed, it is the primary principle of database design. So, for my transaction database, I knew I'd have to do that with Oracle, or the rest, and that's where the lightbulb came on. The work required to wrap a reliability layer around a commercial database is approximately as much as the work required to write a small, purpose-limited database. I gambled on that inspiration, and it proved profitable. In one month, I wrote a transaction engine that did the work for 10 years, never losing a single transaction (it came close once though!).

My design process also led me to ponder the truism that all fast stuff happens in memory, and also, that all reliance stuff happens at the point of logging the transaction request. Between these two points is the answer, which SAP seems to have stumbled on:

... As memory chips get cheaper, more and more of them are being packed into servers. This means that firms, instead of having to store their data on separate disks, can put most of them into their servers’ short-term memory, where they can be accessed and manipulated faster and more easily. The software SAP is releasing next week, a new version of Business ByDesign, its suite of online services for small companies, aims to capitalise on this trend, dubbed “in-memory”. SAP also plans to rewrite other programs along similar lines. ...

The answer is something almost akin to a taboo: the database is only in memory, and the writes to slow storage are only transaction logging, not database actions. Which leads to the conclusion that when it crashes, all startups are recoveries, from the disk-based transaction log. If this were an aphorism, it would be like this: There is only one startup, and it is a recovery.

In-memory technology is already widespread in systems that simply analyse data, but using it to help process transactions is a bigger step. SAP’s software dispenses with the separate “relational databases” where the data behind such transactions are typically stored, and instead retains the data within the server’s memory. This, says Vishal Sikka, the firm’s chief technologist, not only speeds up existing programs—it also makes them cheaper to run and easier to upgrade, and makes possible real-time monitoring of a firm’s performance.

In its space, an in-memory database will whip the standard SQL-based database in read-times, which is the majority usage, and it doesn't have to be a slouch in write times either, because a careful design can deliver writes-per-transaction on par with the classical designs. Not only in performance but in ROI, because the design concept forces it into a highly-reliable, highly-maintainable posture which reduces on-going costs.

But this inversion of classical design is seen as scary by those who are committed to the old ways. Why such a taboo? Partly because, in contrast to my claim that recovery is the primary principle of database design, it has always been seen as an admission of failure, as very slow, as fraught with danger, in essence, something to be discriminated against. And, it is this discrimination that I've seen time and time again: nobody bothers to prove their recovery, because "it never happens to them." Recovery is insurance for databases, and is not necessary except to give your bosses a good feeling.

But that's perception. Reality is different. Recovery can be very fast for all the normal reasons, the processing time for recovering each individual record is about the same as reading in the record off-disk anyway. And, if you really need your data, you really need your recovery. The failure and fall-back to recovery needs to be seen in balance: you have to prove your recovery, so you may as well make it the essence not the fallback.

That said, there are of course limitations to what SAP calls the in-memory approach. This works when you don't mind the occasional recovery, in that always-on performance isn't really possible. (Which is just another way of re-stating the principle that data never fails, because the transaction integrity takes priority over other desires like speed). Also, complexity and flexibility. It is relatively easy to create a simple database, and it is relatively easy to store a million records in the memory available to standard machines. But this only works if you can architecturally carve out that particular area out of your business and get it to stand alone. If you are more used to the monolithic silos with huge databases, datamining, data-ownership fights and so forth, this will be as irrelevant to you as a McDonalds on the Moon.

Some observers are not convinced. They have not forgotten that many of SAP’s new products in the past decade have not been big successes, not least Business ByDesign. “There is healthy scepticism as to whether all this will work,” says Brent Thill of UBS, an investment bank. Existing customers may prefer not to risk disrupting their customised computer systems by adopting the new software.

And don't forget that 3 entire generations of programmers are going to be bemused, at sea, when they ask for the database schema and are told there isn't one. For most of them, there is no difference between SQL and database.

On a closing note, my hat's off to the Economist for picking up this issue, and recognising the rather deeper questions being asked here. It is rare for anyone in the media to question the dogma of computing architecture, let alone a tea-room full of economists. Another gem:

These efforts suggest that in-memory will proliferate, regardless of how SAP will fare. That could change the way many firms do business. Why, for example, keep a general ledger, if financial reports can be compiled on the fly?

Swoon! If they keep this up, they'll be announcing the invention of triple entry bookkeeping in a decade, as that is really what they're getting at. I agree, there are definitely many other innovations out there, waiting to be mined. But that depends on somewhat adroit decision-making, which is not necessarily in evidence. Unfortunately, this in-memory concept is too new idea to many, so SAP will need to plough the ground for a while.

Larry Ellison, the boss of Oracle, which makes most of its money from such software, has ridiculed SAP’s idea of an in-memory database, calling it “wacko” and asking for “the name of their pharmacist”.

But, after they've done that, after this idea is widely adopted, we'll all be looking back at the buggy whip crowd, and voting who gets the "Ken Olsen of the 21st Century Award."

In the mid-90s, there were a lot of predictions that the telco industry was going to take over payments business. The issue was that telco had done a lot of work for high-volume call-record processing ... helping a number of "in-memory" DBMS operations. These defaulted to the data being in memory with periodic checkpoints as opposed data being on disk with in-memory caches (in-memory DBMS claims of ten times performance when compared to traditional RDBMS even when all data was also "cached" in memory).

The prospect was that the looming micropayments volumes could only be addressed by the efficiencies of the telco call-record processing. Then the telcos would leverage that to move up stream and take-over the remaining parts of the payment industry.

Micropayments has been a long time taking off. Also the foreys that some of the telcos had into payments floundered ... frequently because they had a different business model for dealing with fraud.

The intervening years has seen some of those in the payment industry installing "in-memory", ten-times DBMS ... starting to position for much higher payment transaction volumes.

So what's the difference between a hard disk and a memory module... Is it just
- Write to Hard disk = Slower. Can't lose written data on uncontrolled power loss or process termination. Wide ability to share data with different processes/ servers
- Write to memory = Faster. Can lose written data on uncontrolled power loss or process termination. Limited ability to share data with different processes/ servers

So should the questions be
- How fast do you need your read/ writes to be
- How often do you get uncontrolled power loss
- How often do you get uncontrolled process termination
- How widely do you need to share your data with other processes/ servers