Mistake #3: Not Adequately Addressing Data Quality

By: Bryn Davies

There is often a lot of hype and expectation generated in the process of selling the “new system”, from both external and internal parties, as the motivations and business cases run their merry courses. Consequently, prior to every new system implementation we hear common expectations such as “the data quality will be better” and “we will have a 360° view of our customer”. Well, this does not just happen, and the new technology never magically “sorts the data out”: left to its own devices, the data in the target will be as good, as bad and as fragmented as it is in the source! Complicating this situation is the fact that the System Integrator (SI) selected for the implementation will, even though they may take on the actual data migration, explicitly exclude the resolution of data quality problems from their project charter. They will leave it up to you, the client, to sort out (as if you don’t already have enough to do!).

What we have seen work best is to outsource key aspects of the data migration and cleansing to professionals, and to contract with them directly rather than through the SI, so that they work directly with you, the client, to address what are generally very complex issues. This also means profiling the data early and often, and putting in place systems, technology and processes that regularly validate data integrity and identify the data quality problems that will cause the new system to fall short of its objectives, as well as those likely to cause the data load into the target simply to fail. This requires careful articulation, management and alignment of business and data rules (to be covered in the fifth article in this series) across all activity, including extraction, exclusion, validation, cleansing, mapping and target front-end validation rules. These rules and their findings also need to be made regularly visible to the business via the Data Migration Working Group (see previous article: Mistake #2, “Not Involving Business Early Enough”) and then dealt with decisively through each iteration as the target evolves.
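To make “profiling the data early and often” concrete, here is a minimal sketch of the kind of column-level profile a migration team might run against a source extract. The record layout, field names and the treatment of "NULL"-style placeholder values are illustrative assumptions, not part of the original article:

```python
from collections import Counter

def profile_column(rows, column):
    """Profile one column of a source extract: null rate, distinct count,
    and the most common values. `rows` is a list of dicts (e.g. from
    csv.DictReader); `column` is the field name to profile."""
    values = [r.get(column) for r in rows]
    total = len(values)
    # Treat empty strings and literal "NULL" placeholders as missing --
    # an assumption about how this hypothetical source encodes nulls.
    nulls = sum(1 for v in values if v in (None, "", "NULL"))
    counts = Counter(v for v in values if v not in (None, "", "NULL"))
    return {
        "total": total,
        "null_rate": nulls / total if total else 0.0,
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }

# Hypothetical customer extract: note the inconsistent/missing emails.
customers = [
    {"id": "1", "email": "a@example.com"},
    {"id": "2", "email": ""},
    {"id": "3", "email": "a@example.com"},
    {"id": "4", "email": "NULL"},
]
print(profile_column(customers, "email"))
```

Run against each candidate source table in every iteration, even a crude profile like this surfaces the null rates and duplicate values that would otherwise only appear as load failures late in the project.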

Another problem that regularly crops up is that most organisations expect to deal with data non-quality without specialised data quality tools. Whilst manual data quality resolution is almost always part of a data migration, its extent can be minimised by following a programmatic cleansing approach where possible. Especially with high data volumes or complex data problems (typically a combination of both!), it is impossible to deal adequately, predictably and consistently with data quality issues using home-grown SQL/Excel/Access-type solutions. And often the objective of building a “single view” for the new system requires sophisticated match/merge algorithms that are generally found only in specialised data quality tools.
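To illustrate why match/merge is harder than it looks, here is a deliberately naive duplicate-detection sketch using only Python’s standard library. The records, field names and similarity threshold are illustrative assumptions; real data quality tools replace the brute-force comparison with blocking, phonetic keys and tuned scoring:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalised string similarity in [0, 1], using difflib's ratio
    on trimmed, lowercased input."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def find_duplicate_pairs(records, key, threshold=0.9):
    """Flag candidate duplicate pairs whose `key` field similarity meets
    `threshold`. O(n^2) pairwise comparison -- fine for a sketch, but it is
    exactly why high-volume matching needs specialised tooling."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i][key], records[j][key])
            if score >= threshold:
                pairs.append((records[i]["id"], records[j]["id"], round(score, 2)))
    return pairs

# Hypothetical customer extract with a near-duplicate trading name.
customers = [
    {"id": 1, "name": "Acme Trading Ltd"},
    {"id": 2, "name": "ACME Trading Ltd."},
    {"id": 3, "name": "Bryn Davies Consulting"},
]
print(find_duplicate_pairs(customers, "name"))
```

Even this toy version shows the core trade-off: lower the threshold and false matches merge distinct customers; raise it and genuine duplicates survive into the “single view”.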

The bottom line is that data migration is not simply a “source to target mapping” exercise: it is not just Extract, Transform and Load (ETL), but Extract, Cleanse, Transform and Load (ECTL). Ultimately, a data migration sub-project needs to take a holistic approach, one that takes account of the organisation’s high expectations of “better data” in the new application. Finally, don’t forget to prevent the data quality problems from happening all over again in the new system: ensure that preventative measures, matching the corrective ones taken during cleansing, protect the new database(s) from re-contamination.
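The ECTL idea can be sketched as a pipeline in which cleansing is an explicit stage and the load step rejects rows that would contaminate the target, rather than letting them in. The field names, the email-based cleansing rule and the rejection rule are all illustrative assumptions, not a prescribed design:

```python
def extract(source):
    """Extract: pull raw rows from the legacy source (here, just a list)."""
    return list(source)

def cleanse(rows):
    """Cleanse: fix quality problems *before* transformation -- the extra
    C in ECTL. Here: normalise emails, null out obviously invalid ones."""
    cleaned = []
    for row in rows:
        email = row.get("email", "").strip().lower()
        cleaned.append({**row, "email": email if "@" in email else None})
    return cleaned

def transform(rows):
    """Transform: map cleansed rows onto the target schema
    (hypothetical target field names)."""
    return [{"legacy_id": r["id"], "customer_email": r["email"]} for r in rows]

def load(rows, target):
    """Load: reject rows that would violate target rules instead of
    loading them; rejects go back to the working group for resolution."""
    rejected = []
    for r in rows:
        if r["customer_email"] is None:
            rejected.append(r)
        else:
            target.append(r)
    return rejected

source = [{"id": 1, "email": " A@Example.com "}, {"id": 2, "email": "not-an-email"}]
target = []
rejects = load(transform(cleanse(extract(source))), target)
```

The same validation rules used in the `load` stage can then be enforced at the target’s front end, which is exactly the “preventative measures matching the corrective ones” that keep the new database from being re-contaminated.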