CDI is a Three Letter Acronym which in the data management world stands for Customer Data Integration.

Today CDI is usually wrapped into Master Data Management (MDM), as examined in the post CDI, PIM, MDM and Beyond. As mentioned in that post, a well-known analyst, Aaron Zornes, runs a business called the MDM Institute, which was originally called The Customer Data Integration Institute and still has this website: http://www.tcdii.com/.

Many Master Data Management (MDM) vendors today emphasize being multidomain, meaning their solutions can manage customer, supplier, employee and other party master data as well as product, asset, location and other core business entity types.

However, some vendors still focus on customer master data and the topic of integrating customer data by excelling at the special pain points here, not least identity resolution and sustainable merge/purge of duplicates. One example is Uniserv Smart Customer MDM.

In my recent little venture called The Disruptive Master Data Management Solution List the aim is to cover all kinds of MDM solutions: Small or big. New (start-up) or old. Multidomain MDM, Customer Data Integration (CDI), Product Information Management (PIM) or even Digital Asset Management (DAM). As a potential buyer, you can browse all these solutions and select your choice of one-stop-shopping candidates or combine best-of-breed solution candidates that match your requirements in your industry and geography.

The first thing that must happen is that vendors register their solutions on the site here.

3 thoughts on “What Happened to CDI?”

This is such a relevant topic Henrik. I’ve been scratching my head over this recently as I’ve struggled to understand the workflows built into the leading MDM tools on the market.

It seems the industry wants to go from Source to Match/Merge, instead of Source to Match/Identify and finally to Merge. I’ve had these debates with consultants from top vendors and they just don’t understand why Merge isn’t the next logical step after Match. In my opinion, Merging in the hub is purely a golden record task; it’s not an identification task. You identify based on similarities; you Merge to create a representation of those similarities. So, it seems we’ve lost sight of CDI as the decision workhorse underpinning our “mastering” strategies.

Bypassing or reducing the role of CDI creates risks for data management teams who don’t understand that artifacting and matching to a golden record reference domain means you never learn from the previous decisions you’ve made (which is what CDI would do). Thus you’ll manage as many manual decisions tomorrow as you do today. At scale and with each new data source, this can lead to tremendous cost challenges. CDI is where you should build intelligence into automating your identification decisions.

I’ve not seen many vendors focus solely on CDI, so the design of these systems typically falls on in-house IT teams. That becomes a struggle for sustainability (think Cloud shift) coupled with a shrinking landscape of match technology vendors (think MDM vendor consolidation). So the toolkit and success potential for building CDI is not what it once was, in my view.
In a perfect world (mine at least!) I’d suggest CDI is a modular service that sits between your DQ process and your MDM hub. CDI is the master of every operational view (CRM, Ordering, Support, etc.) of the customer record affiliated with an enterprise identifier. It ingests 100% of records used in operations and runs the intelligence model to group records that meet a statistically significant level of similarity under an appropriate identifier (or identifiers depending on how many levels of similarity you want to manage). Once grouped, using a combination of automated match and manual resolution, these records are now assisting in future match automation as new aliases subscribe to that identifier. Operations sees the outcome of the CDI identification service and makes decisions about whether to create or not create a new record in their functional customer reference system based on the presence of the master identifier(s) already in their system.
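
The identification flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of any vendor's product: the `CdiService` class, the two thresholds and the `M1`-style identifiers are all hypothetical, and Python's standard-library `difflib.SequenceMatcher` stands in for a real match engine.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from itertools import count


def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a stand-in for a real match engine."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass
class CdiService:
    # Hypothetical thresholds; real values would come from tuning and business rules.
    auto_threshold: float = 0.90    # at or above: attach to the identifier automatically
    review_threshold: float = 0.70  # at or above: queue for manual resolution
    aliases: dict = field(default_factory=dict)       # master id -> list of known aliases
    review_queue: list = field(default_factory=list)  # (record, candidate id, score)
    _ids: count = field(default_factory=lambda: count(1))

    def identify(self, record: str):
        """Return (master_id, decision) for an incoming operational record."""
        best_id, best_score = None, 0.0
        for master_id, known in self.aliases.items():
            score = max(similarity(record, alias) for alias in known)
            if score > best_score:
                best_id, best_score = master_id, score

        if best_id is not None and best_score >= self.auto_threshold:
            # A new spelling becomes an alias, so it helps future automated matches.
            if record not in self.aliases[best_id]:
                self.aliases[best_id].append(record)
            return best_id, "auto"
        if best_id is not None and best_score >= self.review_threshold:
            self.review_queue.append((record, best_id, best_score))
            return None, "review"

        # Nothing similar enough: issue a new master identifier.
        new_id = f"M{next(self._ids)}"
        self.aliases[new_id] = [record]
        return new_id, "new"

    def resolve(self, record: str, master_id: str) -> None:
        """Record a steward's decision so the same question is not asked again."""
        if record not in self.aliases[master_id]:
            self.aliases[master_id].append(record)
        self.review_queue = [item for item in self.review_queue if item[0] != record]
```

In this sketch, a record resolved manually once is matched automatically the next time it arrives, which is the "learning from previous decisions" behaviour argued for above. An operational system would call `identify` before creating a customer record and skip creation if the returned identifier is already present in its own reference data.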

MDM takes these outcomes and determines the best-version, or “golden”, record to represent that identifier based on all contributing sources. This version is an uncontested layer in the identification of the customer: it doesn’t compete with the operational view, it interprets it for enterprise consistency. The CDI service protects the stability of this golden record by managing traffic inbound to that ID and associated Golden Record. Every record that matches with high confidence to an existing alias doesn’t trigger a review of the golden record. New aliases to that ID could trigger a review, if the business rules require. The golden record is the bottom floor into whatever hierarchy or householding models the enterprise wants to adopt. You can build multiple aggregation approaches off of the same golden record.
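
As a rough illustration of that division of labour, the sketch below builds a golden record from the records grouped under one identifier and decides whether an inbound record should trigger a review. The survivorship rule used here (the most recently updated non-empty value wins per attribute) is purely an assumption for the example, as are the function names and the record layout; real hubs typically allow far richer rules.

```python
from datetime import date


def build_golden_record(contributions):
    """Per attribute, keep the most recently updated non-empty value.

    `contributions` is a list of dicts with hypothetical keys
    'source', 'updated' (a date) and 'attributes' (a dict).
    """
    golden = {}
    # Oldest first, so later (newer) non-empty values overwrite earlier ones.
    for rec in sorted(contributions, key=lambda r: r["updated"]):
        for attr, value in rec["attributes"].items():
            if value:
                golden[attr] = value
    return golden


def triggers_review(alias, known_aliases, review_new_aliases=True):
    """Only a new alias can trigger a golden record review, and only if
    the (hypothetical) business rule flag asks for it."""
    return review_new_aliases and alias not in known_aliases


contributions = [
    {"source": "CRM", "updated": date(2023, 1, 5),
     "attributes": {"name": "Acme Corp", "phone": "555-0100"}},
    {"source": "Ordering", "updated": date(2024, 3, 1),
     "attributes": {"name": "Acme Corporation", "phone": ""}},
]
golden = build_golden_record(contributions)
```

Here `golden` takes the name from the newer Ordering record but keeps the phone number from CRM, because the newer record has no phone value. A repeat of a known alias leaves the golden record alone, while a previously unseen alias is flagged only when the rule is switched on.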

There’s so much wrapped up in Merge/Purge that I won’t get involved in that here, other than to say many enterprises can’t afford to make the system and integration investments needed to build this level of cross-enterprise automation. Instead, they’ll want to find ways to improve the quality of their functional data using business processes and stewardship rules which, again, would benefit greatly from CDI being a separate step between source and golden record.

These are thoughts from my experience over 20 years in master and reference data. I’m excited to see what others with CDI experience bring to bear in your discussion! Thanks again for raising this topic.

Interesting comments, Jeff. On the whole I agree with you, but perhaps your thoughts rest on two implicit assumptions: 1) that the volume of data that needs manual intervention will actually be reduced tomorrow and 2) that customers have realistic assumptions about what matching capabilities can do. While machine learning techniques should certainly be used to improve matching, a change in business conditions, M&A activity, new areas of fraud detection, etc. may make the learning to date irrelevant. Among consultants and others I work with, I hear many stories of customers who expect that the matching engine is going to solve all their problems and there will be no need for manual intervention. I see this more in the large volume shops (big retail) where the value of any single customer is tiny compared to the whole and the loss in a mistake does not have dire consequences.

Architecturally, CDI is the place to have this learning. But depending on the use case, some customers feel that the overall value of getting some of the data in better shape is more important than getting more of it right. I think this drives vendors to support the Match -> Merge workflow because it seems to be more in demand.