We’ve all seen examples of metadata management and data governance initiatives that have failed. A number of customers have asked me about best practices for driving successful metadata initiatives.

Some people are looking to grow their use of metadata beyond an IT productivity project into something larger such as a data stewardship or data governance initiative. Some are just getting started. In either event, here are my top ten recommendations based on conversations with customers and some of the presentations and panel discussions I ran at Informatica World.

1. Show up, start small and execute. Honestly, showing up is half the battle. Pick a small project that you can be successful at and show some tangible results quickly. Then, grow from there.

2. Quantify everything. Be ready to quantify your results at the drop of a hat. You never know when you will be asked to justify the existence of your metadata project. One financial services company at Informatica World showed how they saved over 95% on the analysis phase when planning data integration changes. Another company, in health care, showed how they save a minimum of two weeks on every change analysis using search, data lineage, and impact analysis. Have these metrics ready. Because the big benefit, good business decisions based on good data, is so many steps removed from IT, it is often hard to quantify. Quantify what you can and keep looking for other benefits to quantify. Example: What would be the cost of a bad investment decision caused by bad data in a data warehouse?

3. Get executive sponsorship. Your project won’t succeed without the right executive sponsorship. Period. Don’t shortchange the importance of getting the right sponsorship from both the business and IT sides of the house. Several people have told me that this is what they did right after their first failure. This is also critical for getting other groups to contribute to the overall cause.

4. “No metadata project is likely to succeed without a data governance initiative”. With a data governance initiative, a company moves beyond metadata as an IT productivity tool and into use cases that have much broader business benefit across the organization. The most important thing is that you have a data governance council to set the overall direction and priorities for data-driven projects. They also need to design an overall framework for business users to collaborate with IT on these projects.

5. Pick a high-value target. Pick a specific problem and solve it before expanding into other new areas. For example: provide complete data lineage and full business terms and definitions around a new data warehouse or master data management (MDM) implementation. The important thing is to pick a project that has high value and high strategic importance to the overall business.

That first win is critical. (See also #2, Quantify everything)

6. Scope for success. I have seen a number of metadata / data governance projects fail because they tried to do it all at once. The projects quickly bogged down and then got cancelled when they failed to produce any meaningful results. Nobody has time or budgets for endless meetings that produce no tangible results for the business.

Often the failure comes from trying to resolve common business vocabulary across divergent business units. Start with a single project in a single business unit if possible and grow from there.

7. Get your business users involved. Great quote: “Data without business context has no value.” A table with a column name of NW_Net_Revenue has no particular meaning unless it is attached to:

A business term

A term definition

A term owner

Reference data: Does “NW” refer to a geographic region? What is in that region?

Other documentation and comments.

Once you have this business context, you can link the business terms and definitions to the underlying technical metadata, creating a common lingua franca between business and IT that will improve communication and collaboration.
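
To make the example concrete, here is a toy sketch of that business context as a simple data structure. It is not any particular product's model; the definition, owner, and reference data values below are hypothetical, and only the column name comes from the example above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BusinessTerm:
    """Business context attached to a piece of technical metadata."""
    name: str                                              # the business term
    definition: str                                        # agreed term definition
    owner: str                                             # accountable term owner
    reference_data: List[str] = field(default_factory=list)  # e.g. what "NW" covers
    notes: str = ""                                        # other documentation and comments

@dataclass
class TechnicalAsset:
    """A physical column, report field, etc., linked to a business term."""
    location: str
    linked_term: BusinessTerm

# Hypothetical values for illustration only.
nw_net_revenue = BusinessTerm(
    name="Northwest Net Revenue",
    definition="Net revenue booked in the Northwest region after returns and discounts.",
    owner="VP Sales Operations, Northwest",
    reference_data=["WA", "OR", "ID", "MT"],
    notes="Excludes intercompany transfers.",
)

column = TechnicalAsset(location="DW.SALES_FACT.NW_NET_REVENUE", linked_term=nw_net_revenue)
print(f"{column.location} -> {column.linked_term.name}, owned by {column.linked_term.owner}")
```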

8. Use both the carrot and the stick. Think about incentives such as better access to business term definitions, contests for contributions, visibility, and so on. It is also good to think about the stick: the backing of executive management and a strong data governance council can go a long way toward ensuring support and buy-in. The important thing is to use both approaches. All “stick” can feel heavy-handed and result in only grudging support.

The challenge is to get the business community to want to participate. In almost all cases, it is not the full time job of the business side to provide this context. They all have full time jobs and are doing this “on the side.” The question is: What’s in it for them? You will have to find ways to incent them.

9. Tie metadata management to a business initiative. I picked this up after talking to two companies that had both tried and failed at data governance initiatives twice each. They both told me that their new approach was not a top-down data governance initiative, but to attach metadata management and data stewardship best practices to important new projects as they came through IT. This approach is much more pragmatic, scoped for success and likely to succeed. It is also more likely to show quick and measurable results.

10. Look for a data crisis … and be ready for when it inevitably happens. What if your management just won’t fund a metadata management / data stewardship initiative? A senior manager at a financial services company told me to “Just watch for a data crisis” and be ready with your proposal in hand. A crisis will happen. What matters is that you are ready with a proposal that shows how to prevent this type of crisis in the future.

These ten things won’t guarantee success but they will go a long way towards improving your odds of succeeding.

Big data and related technologies such as Hadoop present significant opportunities and challenges to businesses. Nearly everybody in IT reports that they are actively evaluating big data technologies. And, just as you would expect, they are in a variety of stages of implementation. So, who has time to think about data governance when dealing with a massive change like this?

First, you have to get your hands around the new technology, right? Actually, this is exactly the right time to think about data governance for big data: before the wild, untamed data from outside the company starts getting mixed with your potentially more trustworthy, tamed, internal data.

Consider this example: A leading pharmaceutical company told me that in the past all of their research data had been internally generated and, as a result, was trusted. Their internal research was the proprietary edge the company had over its competitors. Times have changed. Now there is a massive amount of research available through public sources on the Internet. Any company trying to compete in pharmaceuticals would be crazy not to take advantage of the wealth of outside research, if for no other reason than to avoid repeating other people’s mistakes. But “free” third-party data (like free puppies) brings issues with it:

How trusted is the source of this information?

How current is the information?

Is there other research related to this information? Internal or external?

Was the information correctly moved into the company? For example: Was it loaded correctly? Were the data transformations done correctly?

Can the people who need to use this data find it?

Everybody working with data from sources outside their company’s firewall (examples include social interaction data, mobile data, and data from B2B partners) will soon be dealing with the same issues, plus a few new ones. These include:

What is the structure of the external data? A technology like Hadoop, for example, assumes no structure up front; it is schema-on-read. If you are going to combine Hadoop data with other internal data, as many people are, you will need to apply some metadata with technologies such as Hive or HCatalog, or store it in a NoSQL datastore such as HBase. (A small schema-on-read sketch appears after this list.)

Will the data structure enable you to create data lineage: visual diagrams of the flow of data through your organization so that you can understand and manage it?

Is the data correct? Can it be cleansed of errors or ambiguities?

Where is the best place to deal with data cleansing: at the source (such as the Hadoop Distributed File System) or at the application level, where the application data experts are matching it with their internal transactional data? There are good arguments for both approaches.

How do we clearly label the owner and source of this data?

How do we put a ranking on the data that represents how trusted the source is?

Who is moving and transforming the data? Is this somebody in IT or somebody in another organization? Do we want to treat the data differently if it is not from an IT source?

How is the data matched, related and linked to enrich customer, product and other master data?

Does the data contain any sensitive data, such as personally identifiable information (PII), that needs to be masked (e.g., Social Security and credit card numbers) for regulatory compliance? (A small masking sketch also appears after this list.)

How do you search for and find data that may already exist and may already have been curated, so that you avoid creating endless copies of data and adding to the cost of managing big data?
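
On the structure question, here is a minimal sketch of schema-on-read, using PySpark and a hypothetical HDFS landing path for the pharmaceutical research example above. The field names and path are assumptions for illustration; in a Hive or HCatalog shop, the same schema would typically be registered in the metastore as an external table rather than declared in application code.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, DateType

spark = SparkSession.builder.appName("schema-on-read-example").getOrCreate()

# The raw files carry no enforced structure; the schema is applied at read time.
# The HDFS path and field names below are hypothetical.
research_schema = StructType([
    StructField("study_id",     StringType(), nullable=False),
    StructField("compound",     StringType(), nullable=True),
    StructField("result",       DoubleType(), nullable=True),
    StructField("published_on", DateType(),   nullable=True),
    StructField("source_url",   StringType(), nullable=True),  # keep provenance with each record
])

external_research = (
    spark.read
         .schema(research_schema)                      # schema-on-read: applied now, not at load time
         .json("hdfs:///landing/external_research/")   # hypothetical landing zone for third-party data
)

# Registering a view (or an external Hive table) is what makes this data visible,
# documentable, and joinable with trusted internal data.
external_research.createOrReplaceTempView("external_research")
spark.sql("SELECT compound, COUNT(*) AS studies FROM external_research GROUP BY compound").show()
```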
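
And on the PII question, here is a toy sketch of what masking might look like, assuming simple regular expressions for SSN- and card-shaped values. Real compliance work would rely on proper data-masking tooling and far more robust detection; this only illustrates the idea of masking sensitive values while keeping the record usable.

```python
import re

# Illustrative patterns only: real PII detection needs far more than two regexes.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def mask_pii(text: str) -> str:
    """Replace SSN- and card-shaped values, keeping only the last four digits."""
    def keep_last_four(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group(0))
        return "*" * (len(digits) - 4) + digits[-4:]
    masked = SSN_PATTERN.sub(keep_last_four, text)
    masked = CARD_PATTERN.sub(keep_last_four, masked)
    return masked

print(mask_pii("Customer 123-45-6789 paid with 4111 1111 1111 1111"))
# -> "Customer *****6789 paid with ************1111"
```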

The time to think about data governance for big data is up-front when the big data projects are being architected. Trying to catch up and govern the projects later will be much harder to accomplish. Here are a few recommendations:

Think about your data governance processes up front, before the data starts flowing.

Do not attempt to govern everything. Think about where the high-value and high-risk parts of these initiatives are and focus there. Ask: what has the highest business value to the organization and what would be the impact of bad data in this business initiative?

Assign data stewards for these new sources of data and hold them accountable for the quality of their data.

Make sure that the data stewards assign business terms and clear, unambiguous definitions, classifications, and taxonomies to better manage and standardize use of the new sources of data.

Start to define a way to rate the quality of the data sources in a way that everybody can agree upon.

Take a serious look at where to cleanse and where to match new data as it’s captured and starts flowing through the environment.

Don’t think of big data as something separate. It is all part of your overall data governance process and should be treated that way.

Think about your data governance plans during the architecture stage, before the initiative is implemented. Being proactive will be much more effective than trying to retrofit data governance onto big data after the fact, like closing the barn door after the horse has already left. And, just like data governance in general, the key to success will be to manage only what is really important to the company and to be able to show a clear ROI for your efforts. Not to be sensational here, but the takeaway I want to share is this: If you are not designing data governance into your big data initiatives, you are going to be competing with companies that are.

I often ask people this question: What is the cost of bad data in your organization? The answers I get vary quite a bit. For example, a bad address for a marketing campaign can mean the loss of hundreds of dollars per mailing. On the other hand, other customers have told me that bad data in an investment data warehouse can lead to bad decisions that could result in losses in the millions, and even put those companies out of business.

When I raise this question in the context of metadata management and data governance, I often get the question: “Isn’t that a data quality issue?” And the answer is yes. But a great data quality program alone is not enough to guarantee trustworthy data.

There are lots of other ways that data can get damaged in your Data Integration / Data Warehouse / Business Intelligence environment. Here are two quick examples (and there are many more):

Changes to the environment can cause damage to data due to unseen cross-dependencies.

Communication between business and IT around new projects or changes can cause damage to data if it is not based on a common business vocabulary that is available to everybody.

Last week, a long-time Informatica customer told me how he is managing change in his data integration environment on a truly massive scale. The story he told reminded me of someone performing a heart transplant on themselves while running a marathon … blindfolded.

First, he is delivering several innovative new applications. One improves customer service. Another proactively monitors operational performance and suggests where the business needs to invest more to improve its capacity and service levels.

Next, he is rolling out new systems that will collect customer sentiment analysis from social media sources (Big Data) and integrate that with ongoing campaigns and company planning.

On top of that, he is managing the data integration aspects of a galactic merger between his company and another large company. In this effort, he has to manage application consolidation, application retirement, data migration, and data integration projects between the two companies. The merger has to be accomplished while keeping everything up and running.

Finally, he also has to deal with a complex and changing regulatory compliance environment that makes demands in terms of security, privacy, and financial reporting and auditing.

It’s a challenge to manage change on that scale and, at the same time, to maintain the data integrity of all of the enterprise software applications. But most Data Integration managers I know are doing exactly that without the visibility necessary to effectively manage change on that scale. Managers worldwide are struggling to manage change in complex Data Integration environments where a seemingly simple change carries huge business risk and can result in material negative impacts. The problem is that multiple systems may have cross-dependencies on the data objects that are being changed. A change to one data object can cause ripple effects across the system, impacting multiple applications, reports, and dashboards. A loss of data integrity in any of these could lead to bad business decisions costing millions.

Examples of Data Integration Environment Changes

Common examples of changes to the data integration environment that can negatively impact data integrity include changes to shared data objects with unseen cross-dependencies, application consolidations and retirements, and data migrations, all of which can ripple through downstream applications, reports, and dashboards.

One question I hear a lot is: “Is this the right time to make the investment in a technical metadata management project?” Before responding, there are a couple of questions to consider:

What is the likelihood of a change to your Data Integration environment causing bad data?

In your environment, what is the cost of a bad business decision?

How important is it that you are able to “prove” your numbers to auditors?

Increasingly, the data is the business. Your business is only as good (and as competitive) as its ability to manage the integrity of the data it uses or provides. Nobody should perform heart surgery blindfolded, and nobody should make changes to their data integration environment without the tools to give them proper visibility either.

Who Needs Metadata?

Consider this situation: Would you try to ride a bicycle blindfolded? You could probably pump the pedals and steer without trouble, but you would be lacking the visual feedback that the changes you are making in direction and velocity will keep you on your intended course and avoid harm.

This question undoubtedly sounds crazy, but people are making changes to their data integration environments every day without the tools in place to visualize the environment and to tell them the impact of proposed changes.

There are good tools available today to help with this problem.

Metadata Management

The first thing you need to manage a complex data integration environment is a visual map of where the data is coming from, how it is flowing through the system, and what targets are using that data. Metadata management tools, such as Informatica Metadata Manager, provide connectors to automatically collect metadata from repositories, Business Intelligence tools, data modeling tools, and Informatica PowerCenter. That metadata is then presented as a visual map of the data integration environment. Having a good map is an important starting point.

Good metadata management tools can also tell you the impact of proposed changes to your data integration environment before you implement them.

That’s the technical side of metadata. It’s all about gaining the visibility to see and manage what is going on in your data integration environment.
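
To make the idea of impact analysis concrete, here is a toy sketch that walks a lineage graph downstream from a changed asset to find every report or dashboard it could affect. This is an illustration of the concept only, not how Metadata Manager works internally, and the asset names are hypothetical.

```python
from collections import defaultdict, deque

# Toy lineage graph: each asset maps to the assets that consume it downstream.
# The asset names are hypothetical.
lineage = defaultdict(list, {
    "CRM.CUSTOMER":         ["STAGE.CUSTOMER_CLEAN"],
    "STAGE.CUSTOMER_CLEAN": ["DW.DIM_CUSTOMER"],
    "DW.DIM_CUSTOMER":      ["BI.REVENUE_DASHBOARD", "BI.CHURN_REPORT"],
    "ERP.ORDERS":           ["DW.FACT_ORDERS"],
    "DW.FACT_ORDERS":       ["BI.REVENUE_DASHBOARD"],
})

def impact_of_change(asset: str) -> list[str]:
    """Breadth-first walk downstream from a changed asset to find everything it can break."""
    impacted, queue, seen = [], deque([asset]), {asset}
    while queue:
        for downstream in lineage[queue.popleft()]:
            if downstream not in seen:
                seen.add(downstream)
                impacted.append(downstream)
                queue.append(downstream)
    return impacted

# A proposed change to the CRM customer table touches the warehouse and two BI assets.
print(impact_of_change("CRM.CUSTOMER"))
# -> ['STAGE.CUSTOMER_CLEAN', 'DW.DIM_CUSTOMER', 'BI.REVENUE_DASHBOARD', 'BI.CHURN_REPORT']
```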

Business Glossary

On the business side of metadata, there are products such as Informatica Business Glossary, which business term owners use to create, document, and publish business terms. These terms become the common vocabulary of the business.

The business vocabulary can then be linked with the underlying technical metadata to enhance the quality of communication between the business and IT sides of the enterprise.

How many of your projects have suffered delays as your staff struggled to simply agree on the definition of a business term?

How many projects have failed because of a lack of a common vocabulary between business and IT? Many studies (including Gartner’s) have shown this to be a major cause of IT project failure.

The business side of metadata is all about building a common vocabulary to improve collaboration, reduce miscommunication, and speed up project delivery.

For more of an introduction to metadata, see the Informatica Chalk Talk: Getting the Most out of Your Data Integration with Metadata Manager & Business Glossary.

For many people, metadata can sound a bit esoteric and complicated. But ask yourself this: Would you get on that bike blindfolded?

For those of you who have been reading this blog for a while, I am about to make a change in focus. You might have noticed that I have not written a lot in the past couple of months. A few months back I took a Product Marketing job at Informatica (Data Integration). There, I’m responsible for the product marketing of two fairly technical products aimed at a technical audience of data integration experts:

Informatica Metadata Manager & Business Glossary. I will write a lot more about this subject, but for now let’s say that MM&BG gives you the visibility and control needed to manage a complex data integration environment. In particular, it is very useful in helping to manage change in that environment. Simple changes often run afoul of unseen cross-dependencies. A simple change can cause a ripple effect of unintended consequences across your data integration environment. The results can be bad data and bad business decisions.

The interesting thing is that the blog has kept up its readership very well without any new content!

This blog has always focused a great deal on social media and the application of social media to the marketing of technology products. That will continue. Only now, it will be a lot more personal as I apply social media best practices to my two products at Informatica.

I have had some conversations with people recently who do not see how social media applies in this space. A lot of people see social media as a way to reach vast audiences. But social media marketing is also really good at “Long Tail” marketing: reaching small, scattered, but well-defined markets. This is also a topic that I will be exploring in upcoming blogs.

Up next, a few blogs on metadata. Who needs it and what does it do for me?

You have the beginnings of your social media marketing strategy going for your business and now it is time to hire a Community Manager to run it all. That all sounds pretty straightforward.

The problem is, most of the job postings I see for these positions say something like: “We are looking for a social media God who knows all about Twitter, Facebook, LinkedIn, Digg, etc.” The more advanced postings say something like: “Proven track record of delivering measurable marketing results through social media-driven campaigns.”

Being a good Community Manager is far more than having 500 Twitter followers, a deep knowledge of the latest “bright, shiny thing” in social media, or even a track record of running some successful campaigns. Those are a good start, but not nearly enough to ensure long-term success.

Your Community Manager is an extension of your company brand. This person is the human face that will interact with your customers, probably more than anybody else in your company. Think of them as your company’s Colonel Sanders, only one who actually talks to your customers on a daily basis.

Sticking with the Colonel Sanders / fried chicken business analogy for a minute, picture what happens to your business in these scenarios:

Your Community Manager decides to leave for a more interesting (to them) job at another company. It is their identity that your customers associate with your brand.

It becomes obvious over time that your Community Manager, while good at social media marketing, never had any real passion for the fried chicken business and doesn’t actually know much about fried chicken.

Or, worst of all, your Community Manager likes your fried chicken well enough, but has no empathy for your customers, their issues, or their concerns.

It’s time to stop thinking of Community Managers as social media Gods and think hard about them as extensions of your brand. Sure, you want your Community Manager to know social media marketing, to know how to run campaigns that deliver measurable results, but there is more. It is time to start thinking of Community Managers in terms of:

How long are they likely to stay in the job? Or, with the company?

Do they really have a passion for your product or service?

Are they effective evangelists for your product or service? (Or, do they just pass product questions off to other people?)

Do they have a deep and genuine connection with your customers?

Do they provide real value to your customers in their interactions?

Are they going to stay around long enough to be measured on the effectiveness of the campaigns they are proposing or are they going to be off to the next cool thing?

There is always a lot of hype around any new technology and social media is no exception. For serious business users of social media marketing and campaigns, the core values still apply. Social Media gives us great new tools for creating genuine two-way relationships with our communities, but we still have to do the hard work of building and growing the relationships.