There are many ways you can commit commercial suicide, but there is possibly no slower and more agonising death than that produced by attempting that great architectural objective, the single authoritative database to which all applications talk.

The theory is good, if we have a single database then we have all our business information in one place, accessible to all, easy to report against, reduced maintenance costs, consistency across all applications, and a host of other good objectives.

However all these noble ideals hide a more fundamental problem, that the single database does not solve any of them, and makes most of them into far bigger problems.

All Information In One Place - The Single Authority

On the face of it this sounds like a great objective - after all developers try to live by the maxim of Don't Repeat Yourself - and data in many place is a clear violation of that principle.

Data that appears across many applications and across many storage mechanisms leads to all sorts of massive problems; inconsistency, duplication, replication, duplicated business logic and code, essentially all boiling down to - you end up with spaghetti data. Spaghetti data is much like spaghetti code - it sprawls, gets tangled up, and becomes hard to pull apart without covering yourself in pasta sauce. This is obviously a "bad thing".

So what is wrong?

Well the first and most obvious thing that is wrong is that all applications have different requirements, and different "world views". Although there may in theory be some concept of a "Customer" for the organisation as a whole, even this most basic of data items varies widely between individual departments and even individual applications within a single department.

You could approach this problem, as many database centric people would do, and say "that is the problem right there, we need to standardise all these applications to use the One True Customer". But that is missing the really important word in the definition of the problem ... "requirements" ... this is not accidental that the Customer is different to different parts of the business, it really *is* different.

Your database guys could say "well your Customer in your application may be different as you have different requirements, but you will just have to fit it into our One True Customer", but this is then like trying to put snakes into a plastic bag - they really don't want to be in the bag, they don't fit too well in the bag, and sooner or later while putting one in some others are going to escape or bite your hand. And when your other department starts putting his snakes in the bag too you will be fighting for who gets to not be bitten.

And worse still - now you are trying to map from your requirements based Customer to the One True Customer, and spend an inordinate amount of time maintaining this translation layer in your application. When that One True Customer changes, as he undoubtedly will as new applications require he is expanded to deal with more data they require, every previous application needs to be revisited, large parts of it need ot be re-written, and the whole application needs to be regression tested again. And you have to do this for every single application talking to your single authority database.

You could just skip this stuff, and rely on your applications ignoring this new data, and rely on the database not caring if they correctly updated new data - but this really will come back to bite you - when that bag of snakes starts getting really large and really full - you really don't want to be the one trying to get new snakes into it.

The Truth About the One True Authority

There isn't one.

There - I've said it - I have upset all those database guys, probably upset a large number of SOA guys (I'll cover Commercial Suicide - Integration at Service Level in a later post), and have totally disagreed with noble business objectives.

The truth is, data must have context - without context, data is worthless, absolutely and totally worthless. Data stored in a database has no context, and therefore has no value. Context is provided by the applications that read and write that data, and therefore they are the only thing that matters, and their requirements are the only thing that matters. That means, they need data that is specific to their application, structured in a way that makes that application meet the business objectives, and in a way that makes that application meet non-functional requirements like resilience, reliability and consistency.

So Why Does Anyone Want the One True Authority Database?

Well, in legacy terms it is easy to understand why database admins and database developers want it - it is their lifeblood, their whole raison d'etre. More importantly, it is the culture in which they were brought up - the data is the important thing, the data is the centre of the universe, the data must be consistent, uniform and pure.

But leaving database developers aside, more importantly why would a corporation want the One True Authority Database (OTADB)? After all, the title of this post says this is "Commercial Suicide", so why hasn't this got through to management?

Well - the promise of OTADB is that it will reduce errors in duplication, reduce waste, reduce duplicated effort and reduce maintenance costs - all highly desirable business objectives. And indeed, from those that advocate this approach (those database admins and developers again), the OTADB sounds mighty attractive. On the face of it, it achieves all of these objectives.

Where it falls down is that this holy grail of software development is always just out of reach, they never quite manage to achieve it. Each application that is developed starts to make the OTADB worse, people start to hack things into it to get it to meet business requirements, not because developers want to hack things together, but precisely becasue the restriction of the OTADB *forces* them to do it that way if they are to deliver any kind of functionality at all.

They blame these hacks for ruining their vision of the One True Authority Database, the database admins tell them that they have to fight the application teams to stop them messing up their nice database, but that the OTADB no longer meets the noble objectives as those pesky development teams have messed it up for everyone.

Wait a Minute - What is the BUSINESS Objective Behind the One True Authority Database?

If anyone was to step back from those noble objectives and ask a far more fundamental question, the solution might actually be a lot more obvious than it may seem. While they are all noble objectives, largely actually made worse specifically by the OTADB approach, they are not the real business objective.

Underlying all the other requirements, the ultimate business requirement that drives people (in particular database admins) to want a single database is so that they can see what their company looks like, in other words - so they can produce Management Information - reports to you and me.

This is the single and most fundamental requirement for a business - to have a clear, consistent, accurate and up to date picture of what their company looks. This is what management needs, it is what allows them to make decisions, allows them to identify problems and allows them to spot opportunities.

So, we are going to all this effort, and believe me it is extensive and significant effort, all to support some reporting tools at the end of the day. Reporting tools have problems with data in different formats, with data that is inconsistent, with data that is disparate and distributed. So at some point in the past, the "accepted truth" became "we need one true authority database to be able to produce good reports"

Reporting is a Context - and data only has purpose and relevance in context

If the Elephant in the Room is Actually Reporting, How Do We Solve The Elephant Problem?

This is almost so easy to deal with, it is silly. Perhaps it is because it is so obviously simple that is has been overlooked by many and rejected by others. Especially as it violates another one of those noble objectives ... to provide quality reporting information, we duplicate more of our data.

Yep - we duplicate it - after all reporting data is read only, so it doesn't matter if it is just a copy of other data. Reporting requirements are also very different to transactional requirements too, so we get the added benefit of being able to optimise that duplicated data for the reporting functions.

Data in relational databases is actually very poor for query and reporting purposes, and there is a constant compromise to make it fast for all those applications to write to, that makes it poor to report on - and vice-versa.

How this data gets into the reporting database isn't my direct concern in this blog post, suffice to say the "easy" way is to publish messages with data changes, and have a reporting application pick those up and persist them. My point here - is that splitting the reporting functions from the day-to-day business functions pays massive dividends.

Now We Have a New Problem

That still leaves us with one problem - what happens when disparate applications really do need to know about data in other applications? What happens when my call centre operatives are asked to update the address for one of the Customers. Now as each application has it's own view of the world, and it's own data stores, my accounting applciation does not have access to that change.

Well, the solution to the "how does data get in the reporting databases" question is exactly the same one here - you published messages from your application when you have changes that the rest of the corporation may be interested in. Fire off a message saying "CustomerAddressUpdated" and any other applciation that is concerned can now listen for that message and deal with it as it sees fit.

As It Sees Fit

And this is the real business objective we were trying to achieve in the beginning ... avoiding Corporate Suicide.

When applications are each responsible for their own data, their own actions, and are only responsible for letting the "enterprise" know they have made some changes that other things may need to know about - then you have your solution.

In good development terms, we have proper separation of concerns ... applications are responsible for their data, and their data only. They decide if they care about data from other applications - they are not forced to use it, nor to work around it.

Comments

I've been talking about it recently with other people, and I could see how disturbing it is to talk about data duplication in enterprise context.

The project I'm working on currenty have several applications around a really large legacy database.

With this kind of omniscient database you usually try to create an all purpose business object layer. And the objects become overloaded with conflicting concerns:

should this collection contain delete items ? -> yes because we need it there, but then other parts of the system have to filter it...

You also have objects that load to much information when retrieved from database... the you implement lazy loading that is ineffective when you really need the information, the you implement complex fetching strategies...

It becomes a real nightmare.

I'm moving toward several small applications connected by a service bus. It seem far more effective. I'll tell more about this on my blog as I proceed.

As this was a kind of a big article I tried Ket Becks recommended solution - the important stuff is always stated in the second to last paragraph. Worked like a charm :)

KG2V
wrote
re: Commercial Suicide - Integration at the Database Level

on 05-14-2009 8:10 AM

I think there ARE places for SOA databases - but we should think of them as "classes to be aggregated" - for instance, one we have been asking to do here at my job for a few years - Employee - we don't want a single authority for everything, BUT it would be good if we just had "Last Name, First Name, Employee ID, and "active (boolean)"

We have a HUGE problem - let's say employee X is terminated - we have 5-10 apps just in out division to go flag the guy as inactive - and when a person is hired, we have to add the person to the same 5-10 application (these are just the applications that EVERY employee in the division has a default access to)

Putting this "list of employees" in ONE place would be great

Another good one - locations. We have have multiple "Master lists of locations" - right now, I think there are 3 of them running around, each with 2-3 maintainers, who can change/add locations, and delete unused locations. We had to do this, because do you have any idea how many different ways there are to missspell Albuquerque (as an example) ? Having to maintain the list in 3 places is a problem, that costs us significant hours every year. A simple one table database of "State, City, Location" would save us a LOT of work

Well master list of locations is reference data, essentially read only, so that can be shared (it is probably part of reporting)

The employee example is bad as an employee to one application is almost certainly very different to another employee.

And as I mention, integration at service level has exactly the same problems, as you allude to

KG2V
wrote
re: Commercial Suicide - Integration at the Database Level

on 05-14-2009 2:22 PM

Well, that's why I was saying we (or "they") should use composition - like with classes

I know that Last Name, First Name, EmpID, and Active are used throughout - after that, you just let each application/database do a 1:1 and add it's own information. I totally agree that you don't try and force "the one big table" - just dumb

In fact, what we really really wanted to do was just grab the list from Active Directory. All the application are used for one "class" of applications inside the company (can't go into too much detail, obviously) , but all 5-10 applications already have an employee table that is at least 80% in common (way beyond what I listed), and in fact, we have a utility that will log in, and add/flag inactive employees (we don't delete them). All the project managers agree that these applications should share a common employee table, but as is trypical, it's a bit of a "technical debt" problem. The table HAS been built, and in fact, is already in use for 2-3 of the applications, but management here is like "well, the other apps are working, don't mess with them" - when ground up re-writes happen, they are used

BTW this isn't company wide - it's one small subset (say 200 people) that all have a releated set of job responsibilities - 2-3 highly releated departments, on the rough order of "Blue Car Employees" and "Red Car Employees" - but they both share a common "Tires" database

I am a "SOA guy" and I've been pushing for a while now that Domain Driven Design is a much better option. Definitions of "Customer", "Order", "Product" etc change between each business unit (or Domain).

I find it very weird when I discover (business) programmers who don't know much about databases. (Yeah, game or firmware programmers are probably fine without DB knowledge.)

A DBMS knows how to handle data. It's sole purpose is to handle data. What you are saying is that manually splitting that data into many pieces, and then writing buggy code to try to coordinate changes to those pieces, is better than a dedicated DBMS.

Did you hear of views? The sole purpose of views is to present different... well, views of a table to diferent applications. That way, the app that only cares about first and last names and phone number only sees that much - but the data is still kept consistent by - wait for it - the DBMS. (Weird how that works, huh?)

Firstly you make a foolish assumption that I don't know much about databases. I am the first to say that DB level development should be done by specialist DB developers. I am also fully aware that most DB developers suck very badly and most application programmers can run rings around them.

"What you are saying is that manually splitting that data into many pieces, and then writing buggy code to try to coordinate changes to those pieces, is better than a dedicated DBMS"

What I am saying is that splitting data into application specific data, is better for each and every application, and that the one true authority database will never be achieved unless you cripple all of those applications.

And what DBMS are good at is nothing to do with what you are advocating - they are good at storing vast quantities of data, querying across that data, and heavy lifting - what they really suck at is maintaining data integrity, consistency and doing business logic.

Use the right tool for the right job - databases are for storing and retrieving data - we are talking about application logic here.