Replicated DBs as Data Sources: Are you sure?

Let us assume you have multiple OLTP databases catering to various kinds of applications and you are provided replicated databases for each of these OLTP databases. First order thinking makes you feel good about it. You have real-time data to tap into; based on your ETL system’s capabilities, you may be able to provide near real-time solutions. You are right if you are not looking for 100% accuracy at all times.

The problem with working on replications is that the data changes every instant. Few scenarios:
1. If you tapped an OLTP table for customer records to create your customer dimension, by the time you are filling in data into a fact-table with customer dimension key, the fact-table may be referring to a new customer that you didn’t have at the time you were populating your customer dimension.
2. You got the list of customers with certain statuses. By the time, fact-table is getting populated; the statuses for some of the customers have changed. Your report will not reflect the true picture at the time of processing the fact-table.

The root cause of all these scenarios is that we are not working with one particular state of source DBs at some particular point in time. OLTPs by themselves are okay to work on these DBs, but Data warehouses tap various unrelated domain areas in the OLTP systems and depend on a synchronized state.

Replications are also prone to lags and while you will assume that you processed all the data for a day or an hour, you actually may have got only a subset of data.

For mission critical data warehouses I don’t recommend to use replicated DBs are data sources (of course there are always exceptions if you know exactly what you are doing). Backup-restores or snapshots with properly defined backup/shotshot datetime that are clearly documented and agreed-upon in formal SLAs are the good solutions. It comes at the cost of faster data availability, but reduces the pains of inaccuracy and potential law suits and painful justifications to auditors.