
Platypus Innovation Blog

17 February 2014

There are no AAA databases

It's a mistake to believe absolutely in uncertain things. That's one of the lessons of the financial crisis. Uncertain loans were dressed up as triple-A reliable assets, but it turned out to be wishful thinking.

[Photo: dice bag. (cc) KaptainKobold@Flickr]

I see similar practices in databases and business intelligence.

We all know that databases contain errors. The errors come from many sources. Data is mis-entered. Or it was accurate, but people move on. Or the database schema was changed, but not all the data was correctly updated. Or two databases are merged, but the join is dodgy: the same name doesn't always mean the same person. I've yet to encounter a database that didn't contain errors.

Everyone knows this. And yet people build business processes that assume the database is 100% correct. Even best practice in data analysis only tries to limit the errors entering the system. Once they're in, the mistakes can run free.

In business intelligence, we see claims that everything can be measured. Such claims are plausible, and we'd like to believe them. All too often, though, they are over-confidence and over-selling.

Accepting uncertainty does not mean giving up on measurement. It just means accepting that errors are part of measurement. Once we accept that, we can deal with it. We should estimate the things we cannot measure directly and accurately. But remember that an estimate is an estimate. Know how good it is, and how much its errors affect your decisions. There are cases where getting the right order of magnitude is fine, and others where even 99% accuracy isn't good enough.
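One way to "know how good that estimate is" is to report a range instead of a single number. Here is a minimal sketch, assuming you can hand-check a random sample of records. The audit figures (200 records checked, 7 wrong) are purely hypothetical, and the Wilson score interval is just one common choice of method for estimating the error rate.

```python
import math


def wilson_interval(errors: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for an error rate (Wilson score interval).

    z = 1.96 corresponds to roughly 95% coverage under a normal approximation.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = errors / sample_size  # observed error rate in the sample
    denom = 1 + z**2 / sample_size
    centre = (p + z**2 / (2 * sample_size)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / sample_size + z**2 / (4 * sample_size**2)
    )
    # Clamp to [0, 1], since an error rate cannot fall outside that range.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical audit: 200 randomly chosen records checked by hand, 7 found wrong.
errors, checked = 7, 200
low, high = wilson_interval(errors, checked)
print(f"Point estimate: {errors / checked:.1%}")
print(f"95% interval:   {low:.1%} to {high:.1%}")
```

With these made-up numbers, the point estimate is 3.5%, but the interval runs from roughly 1.7% to 7.0%. Whether that spread matters depends on the decision the data feeds into.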

It's especially important to know the blind spots in your KPIs: the things you can't properly measure. And there are always blind spots.

Anyone who promotes KPIs and ROIs without talking about errors is selling something unreliable. It's easy enough to hide uncertainty and inaccuracy, but you pay the cost down the line, with interest. Remember the AAA-rated subprime loans: all that glitters is not gold. Ignore uncertainty at your peril.

The salesmen of over-confidence cannot have it both ways: if data is important, you'd better be honest about its quality.

About

The platypus caused consternation and shattered existing categories. Its existence was undeniable, but how should taxonomic theory be adapted to accommodate this uncomfortable fact? This blog is also hard to classify. It loosely follows the professional interests and activities of Daniel Winterstein. Topics are likely to range from business affairs to new media, via data science and abstract mathematics.