Metadata Repository Essential Use Cases

“Even though we’re working with very sophisticated Data Science resources, we still hear over and over, ‘I don’t even know what data is available,’” said Susan Swanson, the Senior Manager of Data Modeling and Architecture at Health Care Service Corporation (HCSC), an organization that offers regional health care coverage and services.

Swanson presented three areas of focus for confidence in decision-making: Data Governance, Metadata Management, and Data Quality. “We want to be confident in the decisions that we make with our data. That is the message that resonates collectively within our organization.” Data Governance lends controls, Metadata Management curates and maintains the information governance catalog, and Data Quality provides measurement, monitoring, and improvement of the data, she said.

Users want to know what data is available, particularly in the Data Lake; where data is located among multiple data locations; how data is classified for security; and whether it can be shared. The Enterprise Metadata Repository is an organized catalog of descriptive data references and their relationships to each other. It offers a framework for implementing Data Governance controls and enables effective Data Quality monitoring, so that users can correctly interpret the data and have confidence in its use.

Use Case 1: Business Glossary

“The most important word here is ‘business.’ It comes from the business. It’s defined by the business; it’s not defined by IT.” The Business Glossary is an entry point for navigation and can offer a logical place to start if a user is not quite sure what they’re looking for, Swanson said. It provides a framework for users to see how things are grouped and collected, and how they are tied to other Metadata Repository components, with descriptors that enrich the contents. The workflow for the Business Glossary is an iterative process where important concepts and related terminology are identified, defined, reviewed, and standardized before being approved and published, she said.

HCSC gives read-only access via a series of SharePoint sites, with instructions for quick access and feedback on proposed terms and definitions. Once approval is secured, terms are linked or “stitched” to other related terms and documents to establish traceability, she said. The stitching process happens behind the scenes and varies from a semi-automated process to a manual process. “Everything really aligns to the Business Glossary. It’s still that high logical level,” she said. “As other metadata contents come in, they all come to their proper point within the Business Glossary where they’re associated.”
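The glossary workflow and the stitching step described above can be sketched as a small state model. This is a minimal illustration, not HCSC's implementation: the `Stage` enum, the `GlossaryTerm` class, and the example terms "Member" and "Subscriber" are all hypothetical, and the only things taken from the text are the order of the stages and the rule that stitching follows approval.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Workflow stages named in the text, in order (the enum itself is hypothetical)."""
    IDENTIFIED = 1
    DEFINED = 2
    REVIEWED = 3
    STANDARDIZED = 4
    APPROVED = 5
    PUBLISHED = 6


@dataclass
class GlossaryTerm:
    name: str
    definition: str = ""
    stage: Stage = Stage.IDENTIFIED
    # Names of related terms or documents this term has been "stitched" to.
    linked_to: set = field(default_factory=set)

    def advance(self) -> Stage:
        """Move to the next stage; a published term stays published."""
        if self.stage < Stage.PUBLISHED:
            self.stage = Stage(self.stage + 1)
        return self.stage

    def revise(self, new_definition: str) -> None:
        """Iterate: feedback sends the term back to DEFINED with new wording."""
        self.definition = new_definition
        self.stage = Stage.DEFINED

    def stitch(self, other: str) -> None:
        """Link to a related item, allowed only once approval is secured."""
        if self.stage < Stage.APPROVED:
            raise ValueError(f"{self.name!r} must be approved before stitching")
        self.linked_to.add(other)


# Hypothetical usage: the term names are invented for illustration.
term = GlossaryTerm("Member", "A person enrolled in a plan")
while term.stage < Stage.APPROVED:
    term.advance()
term.stitch("Subscriber")
```

Keeping the stage as an ordered value makes the "approved before stitched" rule a single comparison, which is one simple way to enforce the traceability step regardless of whether the stitching itself is manual or semi-automated.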

Use Case 2: Data Policies, Rules, and Security Classification

Data Governance policies and rules are typically established outside the Metadata Repository tool. Those policies set desired behaviors, establish boundaries, and clarify the process and management of data in detail. Security classification sets up pathways to compliance, and governance rules connect to operational rules to deliver policy. “This is really the Data Governance component of our Metadata Repository,” she said.

The associations and linkages are very orderly and created in a step-by-step fashion. “It’s only when you get all of these components in line and associated with each other that you can actually execute on the operational rules,” and implement policy, she said. Compliance is validated and impact is calculated through reporting, with the understanding that security classification of critical and sensitive data requires greater oversight.
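The point that operational rules can only be executed once every component is associated can be illustrated with a completeness check. This is a sketch under stated assumptions: the `OperationalRule` class, its field names, and the example values are hypothetical, and the exact set of links a real repository requires may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OperationalRule:
    """A rule plus the associations it needs (all field names are assumptions)."""
    name: str
    governance_rule: Optional[str] = None
    policy: Optional[str] = None
    security_classification: Optional[str] = None

    def missing_links(self) -> List[str]:
        """List the associations that have not been made yet."""
        required = {
            "governance_rule": self.governance_rule,
            "policy": self.policy,
            "security_classification": self.security_classification,
        }
        return [label for label, value in required.items() if not value]

    def is_executable(self) -> bool:
        """Executable only when every component is in line and associated."""
        return not self.missing_links()


# Hypothetical usage: rule, policy, and classification names are invented.
rule = OperationalRule("mask_member_id", governance_rule="Mask identifiers")
print(rule.missing_links())  # the two associations still to be made
rule.policy = "Data Privacy Policy"
rule.security_classification = "Sensitive"
print(rule.is_executable())
```

Reporting the missing links, rather than returning only a yes or no, mirrors the step-by-step way the associations are built up.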

Use Case 3: Data Profiling and Data Quality

In a 2017 capability maturity model assessment, HCSC respondents in eight out of 10 areas ranked Data Quality as the most important focus for 2018. In the Data Profiling and Data Quality workflow, stewards and subject matter experts (SMEs) define rules and expectations for quality, with guidance from the governance group and support from the application development team within IT.

“It really is around validation for us,” and around understanding the level of quality in the data used in their solutions. The Data Quality workflow varies based on workload in other areas. During open enrollment periods, multiple checks are performed daily; at other times, when the flow of sensitive personal information is slower, those checks are done once a week. “We rely on our business resources to guide and direct what it is that we are going to monitor from a quality perspective,” rather than having the process managed by IT, Swanson said.
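The variable check frequency described above can be sketched as a small scheduling helper. The function names and the default of three checks per day are assumptions made for illustration; the text says only that checks run multiple times daily during open enrollment and weekly otherwise.

```python
from datetime import datetime, timedelta


def check_interval(open_enrollment: bool, checks_per_day: int = 3) -> timedelta:
    """Return the time between quality checks.

    checks_per_day is a hypothetical value; the source says only "multiple".
    """
    if checks_per_day < 1:
        raise ValueError("checks_per_day must be at least 1")
    if open_enrollment:
        return timedelta(days=1) / checks_per_day
    return timedelta(weeks=1)


def is_check_due(last_check: datetime, now: datetime,
                 open_enrollment: bool, checks_per_day: int = 3) -> bool:
    """True when enough time has passed since the last check."""
    return now - last_check >= check_interval(open_enrollment, checks_per_day)


# Hypothetical usage: nine hours after the last check.
last = datetime(2018, 1, 1, 0, 0)
now = datetime(2018, 1, 1, 9, 0)
print(is_check_due(last, now, open_enrollment=True))
print(is_check_due(last, now, open_enrollment=False))
```

Which data gets checked is left out on purpose, since that decision is guided by the business resources rather than encoded by IT.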

Use Case 4: Data Lineage and Technical Metadata

For convenience, Swanson combines business Data Lineage, or “designed Data Lineage” (essentially source-to-target mapping), with traditional technical Data Lineage (which for their purposes is ETL). The business requested that they trace “every hop along the way,” she said, “but we are keeping it very focused. We can at least provide lineage from an ETL perspective for what has been implemented at this point in time.”

An audit trail that includes filename and processing date is used for analysis, tracking, and compliance, and business data teams proactively review Data Lineage to identify potential issues. In HCSC’s 2017 capability maturity model assessment, respondents chose Data Lineage as the #2 area for focus in 2018. “It was a strong affirmation that [lineage] is the place to be focusing our resources and attention,” she said.
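An ETL lineage record that carries the audit fields mentioned above (filename and processing date) might look like the following sketch. The `Hop` class, the `trace_upstream` function, and the dataset and file names are hypothetical, and the model assumes each target is fed by a single hop, which real pipelines often are not.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Hop:
    """One ETL step from source to target, with its audit-trail fields."""
    source: str
    target: str
    filename: str
    processed_on: date


def trace_upstream(hops: Iterable[Hop], target: str) -> List[Hop]:
    """Return the hops leading to `target`, ordered from the earliest source."""
    by_target = {hop.target: hop for hop in hops}  # assumes one hop per target
    path: List[Hop] = []
    seen = set()
    current = target
    # Walk backward until no feeding hop is recorded, or a loop is detected.
    while current in by_target and current not in seen:
        seen.add(current)
        hop = by_target[current]
        path.append(hop)
        current = hop.source
    path.reverse()
    return path


# Hypothetical usage: dataset and file names are invented for illustration.
hops = [
    Hop("staging", "warehouse", "load_claims.sql", date(2018, 1, 2)),
    Hop("claims_feed", "staging", "claims_20180101.csv", date(2018, 1, 1)),
]
for hop in trace_upstream(hops, "warehouse"):
    print(hop.source, "->", hop.target, hop.filename, hop.processed_on)
```

Because lineage is recorded only for what has been implemented, the trace simply stops at the first dataset with no recorded feeding hop.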

Summary

The Business Glossary provides a common vocabulary and offers navigation to other metadata. Data Governance policies and rules establish controls and oversight, and business resources ensure that quality improvements align with business value. Exposing Data Lineage creates trust in the data.

“The Metadata Repository really is the linchpin that’s holding all of the metadata content. We make it centralized. We make it a one-stop shop. Everybody knows where to go.”

The Metadata Repository provides standardization and a shared business vocabulary, and it creates opportunities for business users to talk together, Swanson said. It is the framework that supports governance policy and rule definition, and where policies and rules get associated with data. “You’ve got that common connectivity and you’ve got the integration that the Metadata Repository can provide.”

Swanson said that development of the Metadata Repository is continuous, and that as strides are made going forward, there will also be steps going backward. Focus will change from year to year, but her recommendation is to prioritize the needs of the business. “It’s an exciting journey. It has starts and stops, it has struggles, but it’s a very rewarding journey.”

About the author

Amber Lee Dennis is a freelance writer, web geek and proprietor of Chicken Little Ink, a company that helps teeny tiny companies make friends with their marketing. She has a BA in English, an MA in Arts Administration and has been getting geeky with computers in some capacity since 1985.