
Q&A: The Ins and Outs of Enterprise Master Data Management

Will MDM succeed where past standardization efforts have failed?

By Stephen Swoyer

October 2, 2007

When it comes to master data, Rick Sherman, founder and team leader with business intelligence (BI) and data warehousing (DW) specialist Athena IT Solutions Inc., is something of a grizzled veteran. Sherman first honed his relational database chops in the early 1980’s; became serious about data warehousing during the last years of the Reagan presidency; and—at about the same time—started wrestling with the problems of master or reference data.

As Sherman sees it, master data as a problem is more or less a byproduct of data warehousing. Master data management (MDM)—at least as a discrete discipline—isn’t an entirely new idea. Nevertheless, Sherman says, enterprises (and, like it or not, software vendors) are as serious about master data as they’ve ever been. Thanks to a perfect storm of drivers, including regulatory compliance and the performance management wave, there’s reason to believe that MDM will succeed where past standardization efforts have failed.

Master data management isn’t exactly a new idea, is it? Many companies have set up internal practices or strategies to deal with customer data or product data issues, after all, so what’s different this time around?

I've been dealing with reference data probably from the late 80's when I first started working with data warehousing. During most of that time, the debate we had with anybody trying to do any kind of enterprise data warehousing was always “How do we get the data consistent?”

The bigger issue, from an organizational side, is that nobody owns the reference data, so how do we get the business to manage it—to agree that it needs to be managed—and how and where do we house it? Twenty years after the whole question was first raised with MDM, CDI [customer data integration], and PIM [product information management], we finally have formal solutions to this problem.

How and where have organizations traditionally been consolidating their master or reference data?

In the past, the solution was to house it in the warehouse, because the operational systems—the OLTP transactional systems—weren't there or weren't up to the task. Certainly, SAP and Oracle are tackling that now, and have some of their own initiatives, but that wasn’t the case at the time.

Whether it was the best approach depended on the circumstances. Before MDM became a buzzword, you housed it in the warehouse. The warehouse was sort of read-only. You really wouldn't update [the reference data]; updating it was a non-trivial task.

With [the advent of] MDM, one of the biggest changes has really been getting [organizations] to think about [managing reference data] differently. It isn’t just a technology problem, for example, and [master data] isn’t just something you store somewhere. You really have to think of it as an application.

What do you mean?

Some business rules, some business logic has to be put on top of [the data]. It’s still an application for this reason, even if it’s being housed in a database. That's why it's been good to see MDM being thought of as a set of different IT and business processes over the last few years, although it's been disappointing to see that as much as things have changed, some of the old problems—with the siloing of [reference data], in this case, into SOA, MDM, and PIM—remain.

Even as organizations pursue a strategy that, in a sense, is supposed to help eliminate siloing or isolation of the data, they’re producing even more silos or isolation? Is that what you mean?

Yes. It is unfortunate that a lot of these efforts get to be their own practices. We have SOA, which should be a great glue, but I've seen too many companies set it up as its own separate kingdom. So—almost paradoxically—it becomes a silo. Unfortunately, it looks like that's what's happening with some of the MDM initiatives. History repeats itself, and it's tough to get an enterprise moving in the direction of data governance. It's the people and the organizational issues, especially with reference data, that contribute to the siloing. It becomes a political issue, an organizational issue.

This is something I’ve seen a lot of with SOA, where you have this kind of Utopian promise of a plug-and-play—and extremely agile—technology infrastructure, but it ends up getting snagged on the same people and process issues that have bedeviled organizations for decades. Is the same thing happening with the MDM-ification of the enterprise, or are there reasons to be optimistic that MDM will succeed where other such efforts have failed?

I think that with data governance, financial governance, and performance management, you have sort of this perfect storm that has given companies their best shot at dealing with the people and process issues. They’re under pressure [from these and other drivers] to get this right. That doesn't mean they're going to, but if you're just trying to justify data governance, the perfect storm has never been set up better than it has right now. There are certain companies that are doing very well at this. It's not like every company is falling into the silo trap.

Talking Technology

Let’s talk technology for a moment. Obviously, MDM isn’t just a technology issue, but—over the last 24 months, especially—a lot of business intelligence, data integration, and ERP vendors have introduced MDM-oriented solutions. If you’re an organization and you want to get serious about your master data—you finally want to start tackling and reconciling your reference data issues—what do you need? Can you do it on your own, sort of with homegrown tools, a data quality tool, some data integration technology, and maybe some outside consulting, or would you be better served investing in a dedicated MDM offering from IBM or Informatica or Business Objects, for example?

I was involved this year with one retailer whom I can't name who was dealing with [an MDM product from a prominent data integration middleware vendor]. They spent a whole lot of money last year on [this product] with a lot of expectations. It was sort of a classic example of what not to do: they sort of ended up with silos. Like any large retailer, they had a number of campaign management tools, a number of customer data warehouses, data marts, a number of operational systems, as opposed to Oracle and SAP, and they had a number of ETL tools, including Informatica, and several data quality tools, including Firstlogic. But they really didn't think of an overall workflow or architecture [for their MDM practice].

They spent a lot of money on this, the big tool, and they thought that was going to solve the problem, but they were still using all of these other tools, [along with] their upstream and downstream data warehouses and data marts and their campaign tools. It was a mess. Most companies have already had to deal with some kind of reference data in this space: if they're a retailer, they deal with customers; if they're a manufacturer, maybe it's more on the product side. It depends on the industry and what they try to do, but they all have some initiatives in that space. If their solution hasn't really produced the level of consistency or the ability to integrate with their other systems very well, bringing in a newer solution from a vendor makes a lot of sense. If they're dealing with a lot of operational data, certainly bringing in SAP's approach makes a lot of sense.

No matter what, even if you bring those tools in, you have to identify your overall architecture, your overall business processes, [and decide what] you want to change. Identify what systems you need this consistent master data in. Figure out what are the key tools you need. Figure out if you can incrementally build it through better use of the data quality tools, or would you do a much better job using an SOA framework and try to use data integration services to move data between them. The issue is that you really need to figure out where your gaps are. I think a lot of people incrementally would be able to build out their infrastructure and don't necessarily need another set of tools out there.

The problem with the MDM/CDI solutions is they sort of get you on one stack of tools. If you already have data integration and data quality tools out there, and obviously there are other tools, too, [but] if that stack meshes with what you have, maybe it makes more sense to bring that tool in. If it's another set of tools that overlaps with what you have, then maybe you've bought into an application data migration issue on top of your existing problems.

I think a lot of people can go with systems integration [tools], assessing where they are, and incrementally build out solutions, probably with a heavy dose of data governance. The key is anticipating and identifying overlap: if these pre-built solutions mesh with what you have, there's a lot of overlap, and you probably don’t want that.

MDM Overlap and Homegrown Solutions

You mention the issue of overlap, and I think it’s a very real danger. I’m not just talking about overlap between homegrown and third-party tools, but also [overlap between] third-party and third-party tools. Everyone has an MDM solution now, after all. Isn’t there a sense where some of the BI, PM, ERP, and data integration vendors are trying to control the MDM tier, so to speak? In other words, they want to be the “site” for MDM data in any given enterprise?

I think that's an issue that even presents itself with performance management solutions, too. [In the performance management space] we have a big battle looming between the DI specialists and ERP titans, with more and more of their stack and their solutions incorporating performance management features.

Five or ten years ago, we had more cooperation and interchange between some of these vendors. There were also a lot more data integration and ETL vendors out there, as well as BI vendors. Since then, the BI, ERP, and data integration vendors have all picked up and expanded their stacks.

The issue with the MDM and CDI stacks is that if you buy into any one of these [vendor] visions, you can get locked in. They're not interoperable with anybody. You pick up an MDM stack, you pick up their MDM tools, whatever they happen to be using on top of their stack. You also buy into their schema and, of course, into how they load the data (ETL, EDI, whatever they use) and how they report on and analyze it; it’s all built onto that stack. If you have nothing, it's a great jump start. If you have something, then the issue is how well that stack meshes with it.

What would a homegrown MDM stack look like?

No matter what solution you have, homegrown or third-party, you still need data governance. Of course, if you have the homegrown solution, the level of commitment involved probably isn't as much. On the technology side, most likely some set of data quality tools, certainly ETL or a data integration suite. Data quality tools probably best acquit themselves for customer data integration scenarios; I'm not sure if they lend themselves as well to product information or product hierarchies.

If you do have a number of systems out there, you probably need to move into somewhat services [orientation] in order to interact between them, because you're not going to get by on an entirely ETL or batch-driven approach. As far as where it's housed, if it's housed in the warehouse or a separate database, that really depends on where the efficiency is for that particular company. That really depends on whether a company has everything on one particular platform or wants to split it off.
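As a rough illustration of the homegrown approach Sherman describes (a data quality step plus some business logic on top of the data), the sketch below consolidates customer records from several systems into one master record per customer. Everything in it is hypothetical: the `CustomerRecord` fields, the source system names, the choice of email as the match key, and the "most recently updated record wins" rule are assumptions made for the example, not anything Sherman or a particular vendor prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    source: str   # hypothetical source system name
    name: str
    email: str
    updated: str  # ISO date, so plain string comparison orders correctly


def match_key(rec: CustomerRecord) -> str:
    """Data quality step: normalize the email so trivial variants match."""
    return rec.email.strip().lower()


def consolidate(records: list[CustomerRecord]) -> dict[str, CustomerRecord]:
    """Business rule (an assumption): the most recently updated record wins."""
    master: dict[str, CustomerRecord] = {}
    for rec in records:
        key = match_key(rec)
        current = master.get(key)
        if current is None or rec.updated > current.updated:
            master[key] = rec
    return master


# Three records from three hypothetical systems; two describe the same customer.
records = [
    CustomerRecord("crm", "Pat Lee", "Pat.Lee@example.com ", "2007-03-01"),
    CustomerRecord("billing", "Patricia Lee", "pat.lee@example.com", "2007-08-15"),
    CustomerRecord("warehouse", "Sam Roy", "sam.roy@example.com", "2006-11-30"),
]
master = consolidate(records)
```

The technology here is trivial. The hard part is the rule inside `consolidate`: someone in the business has to own and agree on it, which is the data governance point made above.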

If you’re an enterprise and you’re looking to get serious about MDM, to do a bit more than just test the water, what do you do? Do you do homegrown or do you jump on one of these company's stacks?

I think it really depends. The most dangerous place to jump on is a specialized vendor's stack, because most likely if they get any momentum, they're going to be bought out. If you’re a big SAP shop, it might make a lot of sense to buy into SAP’s stack instead of building your own, just because SAP’s stack [i.e., MDM technology] is optimized for its ERP [operational systems]. A lot of Oracle’s customers have bought into Oracle’s stack, too, don’t get me wrong, but I think SAP has a firm lock on its customer base, for the simple reason that they all chose to be with SAP from the beginning.

We both sketched this scenario of a looming battle between BI/PM vendors and ERP vendors for control of the MDM tier, but when you think about it, doesn’t it really come down to an issue of market efficiency? The market will choose and it will choose the most logical technology (in this case, the ERP vendors) because, on the operational side, they’re generating most of this [reference] data to begin with.

I think it is an issue of market efficiency. If a lot of the issues happen to be operational in nature, if it's within your operational systems and you need to get the product to customers consistently and fed across these operational processes, then I think you've got to tilt toward the ERP vendors.

If it's more reporting and analytics, it's on the CPM side, you're not really dealing with operational issues. I'm not sure if that really pushes it toward the BI vendors, or if it's more likely for someone to have homegrown, with different parts of people. My gut feel is that the operational side goes to the ERP vendors, definitely, but if it's PM, then that opens the doors to homegrown systems or a smattering of BI or specialty vendors.