Tag Archives: Metadata

Of all the IT disciplines I’ve worked in over the past 30 years, nothing seems harder to stand up than a metadata program that delivers business value day to day. When you turn on the light in a dirty closet, not too many folks want to dive in and clean it up. They would rather shut the door and deal with it later, or not at all. Much the same thing happens when a metadata team comes along and tells you that you can find out where your data is by importing your business glossaries and application data models into the repository. The team then attempts to connect them together, only to find that inconsistent practices across the organization limit its ability to do so.

Glossaries and data models had not been used this way before, and connecting them exposes numerous inconsistencies:

Terms defined differently in parts of the organization

Valid values defined inconsistently

Multiple naming conventions for the same object

Data types that differ for like objects

Use of logical names with syntactic structure instead of business terms

Inability to connect precise business terms to high level concepts found in enterprise data models

Business terms that do not align with the data model

Logical side of the data model that does not align with the physical side
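
To make a few of these issues concrete, here is a minimal Python sketch of the kind of check a metadata team might run once glossaries and models sit in one repository. The model names, attributes and glossary entry are hypothetical, and matching attributes to terms by exact name is a simplifying assumption; real repositories usually rely on explicit term-to-attribute assignments.

```python
from collections import defaultdict

# Hypothetical inputs: each model lists (entity, attribute, data_type) tuples,
# and the glossary maps business term names to definitions.
models = {
    "billing": [("Customer", "customer_id", "INTEGER"), ("Invoice", "invoice_amt", "DECIMAL(12,2)")],
    "crm": [("Client", "customer_id", "VARCHAR(20)"), ("Client", "cust_nm", "VARCHAR(100)")],
}
glossary = {"customer_id": "Unique identifier assigned to a customer"}

def find_type_conflicts(models):
    """Return attributes whose data type differs across models."""
    types = defaultdict(set)
    for _model, attrs in models.items():
        for _entity, attr, dtype in attrs:
            types[attr].add(dtype)
    return {attr: sorted(found) for attr, found in types.items() if len(found) > 1}

def find_unmapped_attributes(models, glossary):
    """Return (model, attribute) pairs with no glossary term of the same name."""
    return sorted(
        (model, attr)
        for model, attrs in models.items()
        for _entity, attr, _dtype in attrs
        if attr not in glossary
    )
```

Here `find_type_conflicts(models)` flags `customer_id` as `INTEGER` in one model and `VARCHAR(20)` in the other, and `find_unmapped_attributes(models, glossary)` lists `invoice_amt` and `cust_nm` as having no business term.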

How do you change this? A self-sustaining metadata team, one where the business asks to get its metadata into the repository and asks for guidance on how to improve its quality, requires collaboration with several groups:

Governance’s ability to define standards and monitor metrics for adherence is invaluable. By working with governance, the metadata team can communicate the factors behind each of the issues described earlier more effectively through standards and best practices, and collect metrics to measure progress.

Business units’ ability to see the benefit of the cleanup work and to provide the resources, not only to learn to look at their processes a bit differently but also to make the investment needed to improve metadata quality.

Management support is crucial, especially during the initial phases, when it takes time to configure a tool to work with design-time artifacts the way the organization sees them, not just the way the tool works out of the box.

Every company implements its development process in slightly different ways, so a tool that helps manage your metadata needs to be configured, and enhancements need to be made to fill the gaps. That can only happen when you have a good working relationship with the tool vendor.
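
Governance metrics for adherence can start very small. The sketch below assumes a hypothetical naming standard (lowercase snake_case ending in an approved class word such as `id` or `amt`) and reports the share of physical column names that comply; the rule and the class words are illustrations, not a published standard.

```python
import re

# Hypothetical naming standard: lowercase snake_case ending in an approved class word.
CLASS_WORDS = ("id", "nm", "amt", "dt", "cd")
NAMING_RULE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*_(?:" + "|".join(CLASS_WORDS) + r")")

def adherence_rate(column_names):
    """Share of column names that fully match the naming rule, from 0.0 to 1.0."""
    names = list(column_names)
    if not names:
        return 0.0
    compliant = sum(1 for name in names if NAMING_RULE.fullmatch(name))
    return compliant / len(names)
```

For `["customer_id", "invoice_amt", "CustName", "order_date"]` it returns `0.5`, a number that could be tracked release over release to show progress.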

Only when the collective efforts of these groups are brought to bear on the quality of the metadata and the capabilities of the metadata tool can the organization start to see the benefits of connecting its glossaries and data models, benefits that the evangelists and visionaries tell us lie ahead.

In many ways, a data warehouse resembles the children’s game broken telephone, where a message is distributed across a group by being whispered into the ear of one player after another, until the last player announces the message to the entire group. Since errors typically accumulate in the retellings, the final version of the message often differs significantly from its source. Some players appear to deliberately alter what’s being said, guaranteeing a garbled message by the end of the game.

As data journeys from operational sources through the staging area, the data warehouse, the data marts, and finally into dashboards and reports, a lot can be lost in translation. As it is processed, data is often deliberately altered to fit the structure of its next target. Every time data moves from one point to the next, there is a chance for semantic inconsistencies to be introduced.
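
The telephone effect can be made visible by following one value hop by hop. In this Python sketch the three hops and their rounding rules are hypothetical; the point is only that each reshaping for the next target can quietly change a value, and that recording the before and after at every hop shows where it happened.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical hops; each reshapes the record for its next target.
def to_staging(rec):
    return {**rec}  # straight copy, nothing altered

def to_warehouse(rec):
    # Assumed warehouse column holds two decimal places.
    return {**rec, "amount": rec["amount"].quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)}

def to_mart(rec):
    # Assumed mart stores whole currency units for a summary dashboard.
    return {**rec, "amount": rec["amount"].quantize(Decimal("1"), rounding=ROUND_HALF_UP)}

HOPS = [("staging", to_staging), ("warehouse", to_warehouse), ("mart", to_mart)]

def trace(record, field, hops=HOPS):
    """Pass a record through each hop and report (hop, before, after) wherever `field` changes."""
    changes = []
    current = record
    for name, transform in hops:
        nxt = transform(current)
        if nxt[field] != current[field]:
            changes.append((name, current[field], nxt[field]))
        current = nxt
    return changes
```

Tracing `{"order_id": 1, "amount": Decimal("1234.567")}` on `"amount"` reports a change at the warehouse hop (to `1234.57`) and again at the mart (to `1235`); staging passes the value through untouched.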

So the rest of you don’t have to read the details, here is what happened: McAfee, yes, that McAfee of antivirus software fame, fled his home in Belize after his next-door neighbor was found murdered. He went into hiding, blogging and tweeting along the way, which I am sure the authorities found quite amusing. In the end, he was caught because he gave an interview to a reporter who photographed him with an iPhone. The picture was posted on the Internet, and a hacker got hold of the photo file, which had the GPS coordinates of the phone’s location embedded in it.
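
The detail that gave him away is ordinary photo metadata. EXIF stores GPS latitude and longitude as three rational numbers (degrees, minutes, seconds) plus a hemisphere reference, and converting them to the decimal degrees a map expects is simple arithmetic. The function below is a minimal sketch of that conversion; reading the tags out of an actual JPEG needs an EXIF library, which is left out here, and the sample coordinates are made up rather than taken from the photo in question.

```python
from fractions import Fraction

def _rational(value):
    """EXIF RATIONAL values arrive as (numerator, denominator) pairs; plain numbers pass through."""
    if isinstance(value, tuple):
        numerator, denominator = value
        return Fraction(numerator, denominator)
    return Fraction(value)

def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere reference
    ('N', 'S', 'E' or 'W') into signed decimal degrees."""
    degrees, minutes, seconds = (_rational(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        value = -value  # southern and western hemispheres are negative
    return float(value)
```

For example, `((90, 1), (15, 1), (36, 1))` with reference `"W"` converts to `-90.26`.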

Big data and related technologies such as Hadoop present significant opportunities and challenges to businesses. Nearly everybody in IT reports that they are actively evaluating big data technologies. And, just as you would expect, they are at various stages of implementation. So, who has time to think about data governance when dealing with a massive change like this?

First, you have to get your arms around the new technology, right? Actually, this is exactly the right time to think about data governance for big data: before the wild, untamed data from outside the company starts getting mixed with your potentially more trustworthy, tamed, internal data.
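
One way to act on this early, sketched here in Python under stated assumptions (the source names and the simple trusted/untrusted flag are hypothetical), is to attach provenance to every record at ingestion, so that external data remains identifiable after it lands next to internal data.

```python
from dataclasses import dataclass

# Hypothetical set of systems the company itself controls.
INTERNAL_SOURCES = {"crm", "erp"}

@dataclass
class TaggedRecord:
    payload: dict
    source: str    # where the record came from
    trusted: bool  # external data starts untrusted until it has been reviewed

def ingest(payloads, source, internal_sources=INTERNAL_SOURCES):
    """Attach provenance at load time so records stay identifiable after mixing."""
    trusted = source in internal_sources
    return [TaggedRecord(p, source, trusted) for p in payloads]

def trusted_only(records):
    """Filter a mixed collection back down to internally sourced records."""
    return [r for r in records if r.trusted]
```

The design choice is simply to tag at the door: once internal and external records share a store, the tag is the only reliable way to tell them apart.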

Does your organization have a structured repository of metadata that can help a data center operator (whether on-site or offshore) quickly troubleshoot a production incident related to a data integration job at 2:00 am? Or at any time of day, for that matter? This is just one use of metadata. A new Metadata Management whitepaper has just been published that describes the wide range of metadata types and uses, and the business value derived from them.
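
As a rough illustration of that 2:00 am use case, the sketch below models a tiny metadata repository as a Python dictionary and assembles what an operator would want first: who owns the failed job, which targets it feeds, which upstream jobs to check, and the runbook note. The job names, teams and fields are hypothetical.

```python
from collections import deque

# Hypothetical repository entries; names, teams and fields are illustrative only.
JOBS = {
    "load_orders": {
        "owner": "DW operations",
        "targets": ["dw.fact_orders"],
        "upstream": ["extract_erp"],
        "runbook": "Rerun after extract_erp completes",
    },
    "extract_erp": {
        "owner": "ERP integration team",
        "targets": ["stage.erp_orders"],
        "upstream": [],
        "runbook": "Check connectivity to the ERP database",
    },
}

def upstream_chain(job, repo=JOBS):
    """All jobs that `job` depends on, nearest first (breadth-first walk)."""
    seen, order = set(), []
    queue = deque(repo[job]["upstream"])
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        order.append(dep)
        queue.extend(repo[dep]["upstream"])
    return order

def incident_brief(job, repo=JOBS):
    """Collect what an operator needs first when `job` fails."""
    meta = repo[job]
    return {
        "job": job,
        "owner": meta["owner"],
        "affected_targets": meta["targets"],
        "check_upstream": upstream_chain(job, repo),
        "runbook": meta["runbook"],
    }
```

Calling `incident_brief("load_orders")` points the operator at the owning team, the affected fact table, and `extract_erp` as the first upstream job to check.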

A number of customers have asked me recently about the benefits of using a business glossary product over using a spreadsheet or Sharepoint. The discussion is worth sharing.

If you are a smaller company and all you need is a list of standard business terms to provide a common business vocabulary across the company, a spreadsheet or Sharepoint can work, …up to a point. The problem is that once your organization reaches a certain size, you are going to have trouble scaling the management of the business terms, making them available across a larger organization, and fostering collaboration based on the agreed-upon business terms.
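
Part of what a glossary product adds over a spreadsheet row is workflow around each term. The Python sketch below is a hypothetical, minimal model of that idea: a term carries a steward, a status, and an audit trail of who moved it through an assumed candidate, under review, approved lifecycle. The statuses and allowed transitions are illustrative assumptions, not those of any particular product.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    CANDIDATE = "candidate"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"

# Assumed lifecycle: which status changes are allowed from each status.
TRANSITIONS = {
    Status.CANDIDATE: {Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {Status.APPROVED, Status.CANDIDATE},
    Status.APPROVED: {Status.UNDER_REVIEW},
}

@dataclass
class Term:
    name: str
    definition: str
    steward: str
    status: Status = Status.CANDIDATE
    history: list = field(default_factory=list)  # (from, to, changed_by) entries

    def move_to(self, new_status, by):
        """Change status if the lifecycle allows it, recording who made the change."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.history.append((self.status, new_status, by))
        self.status = new_status
```

A spreadsheet can hold the same columns, but enforcing the transitions and keeping the history across many contributors is where it starts to strain.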

Data federation techniques help mitigate both the accessibility and the latency issues, but we still need to ensure the quality of the content when employing a data virtualization approach. Within the ETL world, data inconsistencies and inaccuracies are dealt with through a separate data profiling and data quality phase. This works nicely when the data has already been dumped into a separate staging area. But with federation, not all the data sits within a segregated staging area. Loosely coupled integration with data quality tools may address some of the problem, but loosely coupled data quality checks cannot eliminate the situations in which inconsistencies are not immediately visible.
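
To illustrate the limitation, here is a minimal Python sketch of row-level quality rules applied inline as rows stream from two federated sources, since there is no staging area to profile first. The sources, fields and rules are hypothetical.

```python
# Hypothetical row sources standing in for two federated systems.
def crm_rows():
    yield {"customer_id": "C1", "country": "US", "email": "a@example.com"}
    yield {"customer_id": "C2", "country": "usa", "email": ""}

def billing_rows():
    yield {"customer_id": "C1", "country": "US", "email": "a@example.com"}

# Row-level validation rules, checked as each row streams through.
RULES = {
    "country_is_iso2": lambda r: len(r["country"]) == 2 and r["country"].isupper(),
    "email_present": lambda r: bool(r["email"]),
}

def federated_scan(sources, rules=RULES):
    """Yield (source_name, row, failed_rule_names) for every row across all sources."""
    for name, rows in sources.items():
        for row in rows():
            failed = [rule for rule, check in rules.items() if not check(row)]
            yield name, row, failed
```

Scanning both sources flags crm row `C2` for a non-ISO country code and a missing email. Because each rule sees one row at a time, though, it would not notice if the same customer carried different values in the two systems, which is exactly the kind of inconsistency that is not immediately visible.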

The conventional wisdom around master data management suggests that there are a few “core” master data domains that must be handled by the master data systems, namely customer and product. Of course there are a few other key ones as well, along with some critical master reference domains. But the suggestion that master data management (MDM) systems must provide a model for customer and a model for product implies some degree of simplicity, both in reconciling the potentially conflicting definitions that can be applied to either of these two concepts and in managing the mapping between what should be the unified representation of those concepts and their actual representations within existing application systems.
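
The mapping problem in the last sentence is often handled with a cross-reference between each application’s local key and a master identifier. A minimal Python sketch, with hypothetical systems and keys, shows both directions of that lookup. Notice that the web application identifies a customer by email while billing uses an account number, a small example of how the “same” customer concept is represented differently in each system.

```python
# Hypothetical cross-reference from (system, local_key) to a master customer ID.
XREF = {
    ("crm", "C-001"): "M100",
    ("billing", "88412"): "M100",
    ("web", "alice@example.com"): "M100",
    ("crm", "C-002"): "M101",
}

def master_id(system, local_key, xref=XREF):
    """Resolve an application-level key to its master identifier, or None if unmapped."""
    return xref.get((system, local_key))

def local_keys(master, xref=XREF):
    """List every (system, local_key) that represents the given master record."""
    return sorted(key for key, m in xref.items() if m == master)
```

`master_id("billing", "88412")` resolves to `"M100"`, and `local_keys("M100")` lists the three application records behind it.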