Friday, August 19, 2011

Multi-master database systems that span sites are an increasingly common requirement in business applications. Yet the way such applications work in practice is not quite what you would think from accounts of NoSQL systems like Cassandra or SQL-based systems like Oracle RAC. In this article I would like to introduce a versatile design pattern for multi-master SQL applications in which individual schemas are updated in a single location only but may have many copies elsewhere both locally as well as on other sites. This pattern is known as a system of record architecture. You can build it with off-the-shelf MySQL and master/slave replication.

Let's start by picking a representative software-as-a-service (SaaS) application: call center automation. Call center software integrates with a local PBX or VOIP to allow agents to answer and make phone calls for telemarketing campaigns in a systematic and automated way using standard procedures known as "agent scripts." Admins set up agent scripts and define lists of people to call as well as marketing campaigns. Finally and perhaps most importantly, managers receive a wide variety of detailed reports that allow them to optimize current work, examine historical performance, and make predictions about the future for planning purposes. Here is a typical application architecture.

Figure 1: Call Center Application Architecture

Bear in mind that this is a greatly simplified view. Like most business applications, call center automation systems may contain hundreds of database tables and many types of user services. There are also practical complications that go beyond the application itself. Call center automation is vital to the businesses that use it. Customers want assurance they can continue processing on another site if a SaaS vendor site goes dark. This means we have to think about maintaining applications and data on multiple sites.

The ideal solution for most SaaS vendors would be to have call center data and applications for all customers live on multiple sites at all times. Multiple live sites mean that failover is instantaneous since both applications and database servers are already up and running. Constant update means there is little or no data loss on failure. Customers could connect to the nearest site. Here is a picture of that dream that includes two sites and two customers, Acme Inc. and Pinnacle, Ltd.

Figure 2: Dream Architecture for Call Center Automation

This solution has only one problem. It is impossible to build. Readers may nod wisely and say this is because of CAP Theorem limitations, but that somewhat misses the point. Let's say we use a NoSQL DBMS like Cassandra that permits updates in multiple locations and reconciles the data using eventual consistency. However, there's a catch: as we saw above, much of the value of call center automation is in operational queries and reports. That drives you back to an RDBMS with cross-table joins, aggregation functions, referential integrity, and convenient SQL-based report writing tools. For this reason alone, Cassandra is a non-starter for call center automation.

What about a SQL DBMS? MySQL obviously has all the features you need for query-intensive solutions on smallish data sets (e.g. hundreds of millions of rows, not many billions or trillions). The problem is multi-master replication. Updating the same table from two or more places on a LAN is already quite difficult: witness the complexity of Oracle RAC or MySQL Cluster. The problem becomes intractable when you combine complex SQL transactions, referential integrity, and high-latency WAN connections. If you want full SQL semantics you cannot have updates on multiple sites. This is a serious dilemma and not just for call center automation. The same problems or worse affect a multitude of valuable business applications including market automation, credit card processing, customer relationship management (CRM), time/expense tracking, accounting, and many others.

Fortunately we are not really stuck. If we give up some requirements customers do not really want anyway, there is a perfectly good solution that will work for a wide range of problems. Data warehousing architects long ago developed the notion of a system of record. Bill Inmon's classic Building the Data Warehouse defined system of record as follows:

The definitive and singular source of operational data. If data element abc has a value of 25 in a database record but a value of 45 in the system of record, by definition the first value is incorrect and must be reconciled.

System of record applies to multi-master systems in the form of a simple rule. We just assert that every customer has master data in one and only one location and copies everywhere else. When particular customers update information they do so on their own master. Customers can have masters on different hosts or sites, but the system of record rule says that no customer has one in two places. This eliminates conflicts between masters, and multi-master replication now works without a lot of difficulty.

System of record thus meets the original requirement of having data on multiple sites, which was to handle a site failure. We can store data economically using off-the-shelf MySQL. We can update copies within and across sites using master/slave replication. We can shard customer data into independent schemas. The result looks like the following. Acme has a master in San Jose, whereas Pinnacle has its master in New York.

System of Record Architecture for Call Center Automation

Using the system of record approach simplifies other problems as well. Standard backup and restore techniques still work. If you mess up a customer copy, you re-provision from the master shard. You can implement failover across sites and also fail over locally onto slave copies, which can be complete copies containing data for all customers.

Meanwhile, most users are fine with a single site. Pinnacle is close to New York, which is why the SaaS vendor puts Pinnacle's data there and gives them the New York site DNS for login. It is also possible to run reports on the cross-site copies as well. You can even run full applications provided you forward writes to the system of record, as shown above for Acme.

The real issue in implementing system of record architectures is that existing replication and clustering tools are not quite up to the job of handling cross-site applications build on system of record. We are extending Tungsten to handle some of the obvious problems in building these types of systems using MySQL.

Locating the customer master and connecting applications to it.

Moving the customer master from one location to another. This happens more often than you would think, for example to minimize multi-master replication which can introduce problems beyond conflicts.

Detecting accidental updates to copies and preventing them from either reaching the DBMS and/or preventing them from propagating to other locations.

Proving a clean failover model that works on both cross-site as well as local copies of data.

Recovering corrupted copies of customer data from the master.

I will discuss two of the upcoming Tungsten features in follow-up articles. The first is assigning a shard master in the Tungsten Replicator using the new shard API. The shard API enables multi-master but enforces system of record constraints to avoid messing up data should you accidentally update in the wrong location. The second feature is cross-site management and connectivity using Tungsten Enterprise. This handles failovers within and between sites and automatically connects applications to the active master regardless of which site or DBMS it lives on.

The need for availability is pushing an increasing number of SaaS vendors and other application providers to operate systems across multiple sites. Applications like call center automation depend on the features of SQL and cannot be implemented using NoSQL DBMS's like Cassandra. The system of record architecture eliminates replication conflicts and enables multi-master updates to work on ordinary SQL databases between sites. If you are building complex SQL applications and thinking about going multi-site, this design pattern should be in your toolbox.

2 comments:

Anonymous
said...

Your solution is not really a multi-master solution.

Your solution does not allow for the modification of the same row of data from multiple sites. Many applications, particularly as companies become more global, have cross-geogrpahic collaboration on the same customer, vendors, etc. Most applications would need to be modified to understand where the master data resides and to perform operations accordingly.

Also, you do not detail how the data in the system of record makes its way to other tables. Given that you cant use master/slave replication on subsets of tables, I dont see how you do this.

I also disagree that problems with true multi-master become intractible. These challenges are difficult, but have been overcome by most major MMR applications (see Oracle Advanced Replication).

1.) Multi-master. There are several approaches to multi-master, of which this article describes one. The system-of-record approach makes the same kind of trade-offs as Yahoo's PNUTS DBMS (see a nice write-up on the issues by Daniel Abadi at http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html). It's a very scalable design choice, especially when you start to think about failures. Also, the code modifications are relatively simple in many cases. For many SaaS applications they reduce to changing connection strings. Changes for true multi-master SQL applications as you describe are often much greater and reach into the app logic itself. This is a problem that SQL makes harder because it allows table-spanning transactions with constraints.

2.) I may be misunderstanding your point but schemas are the basic subsetting mechanism. We replicate schemas bi-directionally. You can replicate table sets as well; Tungsten does that but somewhat awkwardly at this time.

3.) Updating shared tables is worth the other costs (app changes, failure complexity, etc.) and that's why Oracle, Microsoft, and others address them. We have work in progress on that problem as well.