Let’s get relational!

An old friend and ex-colleague of mine Tony Bain (we were building SaaS applications together before they were called SaaS applications), has written an interesting article over at ReadWriteWeb (Is the relational database doomed?) regarding the ever increasing popularity of so-called key/value databases and their relevance as compared to traditional relational databases.

I’m not going to explain either here (you can read Tony’s take on it for an overview of both technologies) but I thought I would add my thoughts with regards to decisions we’ve made at Xero on this very subject.

On this website it states quite clearly that we’re the world’s easiest accounting system and that’s what we want our customers to experience. What we’ve built (and are still building) though is one of the world’s most sophisticated accounting engines – one that will hopefully become the number 1 accounting engine in the cloud.

One of the reasons we’ve been able to achieve this is our choice of storing data in a relational database (in our case utilizing Microsoft SQL Server). The idea of key/value stores is essentially a hark back to the old days of hierarchical databases, a technology which ironically underpins most of the current desktop accounting systems in the world. We believe that by being able to store accounting information in a logical and structured relational form has allowed us to build a powerful platform to evolve our accounting engine over a relatively short period of time.

Let’s take a peek behind the curtain:

Xero’s accounting engine combines a multi-perspective general ledger with an event-driven transaction engine. I’ll try to explain what that means:

The transaction engine stores documents such as invoices and bank transactions and contains all of the logic necessary to process those documents based on actions by agents (users or contacts). The outcome of these actions is to produce journals in our general ledger and because each document keeps a record of its actions, they can all be replayed or rewound when something changes and new journals can be created as and when required.

The general ledger resembles a standard journal-based double-entry accounting system, however we’ve built in the capability to model different types of transactions in the same journal. To begin with we modeled both accrual transactions (what happens when you raise an accounts payable invoice) and cash transactions (what happens when you pay that invoice) in the same journal – allowing us at any time to look at the GL in both a cash or accrual perspective. As we’ve added complexity to the system (such as tracking and multi-currency – coming soon I promise) more perspectives are added to the GL. Think of it as a dice – depending on how you look at the dice you get a different number – that’s the power of our GL.

The diagram below shows how the two engines fit into our architecture:

The transaction engine and general ledger operate like two accounting engines in one because they both contain copies of the data, one from a business perspective and one from an accounting perspective. This is beneficial because typically in a small business data is fragmented which leads to poor communication, poor visibility, high costs and unnecessary delays. The power of Xero is that it is at once both a business and an accounting solution, so its important that our business and accounting layers are self-contained yet fully integrated and that we’re able to connect our business and accounting layers with the outside world.

Why use a relational database for what appears to be inherently hierarchical data? Because it is the simplicity of our data design due to the relational model that makes Xero so powerful and allows us to achieve seamless integration between our transaction and general ledger layers.

So am I advocating relational databases in all scenarios? Absolutely not! You should always go with a “horses for courses” strategy when choosing technology. But for accounting data relational data structures offer huge benefits, both in simplifying application architecture and delivering a more integrated experience. The ideas and strategies behind key/value stores are sound and there are some very smart people behind them: my point is, though, that relational databases are not dying, let alone dead and the use of relational data design can be extremely powerful.

You can learn a lot from what others are doing, especially the big players like Google, eBay, Amazon and Facebook. In fact Facebook is a great example of an organization that has applied different technologies to solve specific problems: a MySQL relational backend, memcached (essentially an in-memory key/value store that sits higher in the application stack) for caching almost everything and key/value stores using Cassandra for email searching. This is where I see massively scalable architectures moving: choosing specific technologies to solve specific problems – pragmatism over ideology.

I have more to say on this topic, especially around what you can learn from scaling key/value stores and how can this be applied to relational databases – will leave that stuff for next time.

One comment

Adam

February 19, 2009 at 9.40 am

I’m currently working on implementing a JSON DB (CouchDB) for a media company in the UK.

There are some real benefits to the KVP/document approach. The description of data in a nice, readable format such as JSON is a major benefit.
In my case a JSON document might describe a page layout, so to add a widget to the page you simply add on another ID string into an array.

The cool stuff comes with the views, however. My design splits the customisation of a page into a number of levels (user customisations, layout defaults, widget defaults, and the sub-parts of widgets defaults)… the final JSON description of that page is effectively composed through a cascading inheritance of all those documents, all composed by a view.
Under couchDB a view is a Map Reduce.

But, I haven’t gone this route so far, implementing the composition in PHP instead, as it appears it’s still too early to be using CouchDB in anger.
But the promise is there, and while it takes a different mindset to the Relational approach, I think it could prove to be much more powerful and flexible for certain domains (maybe not accounting though).