The Role of Data Quality Monitoring in Data Governance

Aligning Data Quality Metrics with Business Insight

Completeness is generally a measure of the presence of an actual data value within a field or column, excluding NULL values and any non-NULL values indicating missing data (e.g., character spaces). Completeness can also be used as a measure of the absence of some of the sub-values that would make a data value complete (e.g., a telephone number in the United States missing the area code). Either way, completeness is not a measure of the validity or accuracy of the values present within a field or column.
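Here is a minimal Python sketch of both senses of completeness. It assumes a hypothetical set of placeholder tokens (`MISSING_TOKENS`) that the organization treats as missing data. The first check measures field-level presence. The second measures sub-value presence for a United States telephone number, defined here as 10 digits including the area code, optionally preceded by the country code 1. Neither check says anything about whether a value is valid or accurate.

```python
import re
from typing import Iterable, Optional

# Hypothetical placeholder values treated as missing data even though they are non-NULL.
MISSING_TOKENS = {"", "n/a", "na", "unknown"}


def is_present(value: Optional[str]) -> bool:
    """True if the field holds an actual value: not NULL and not a missing-data placeholder."""
    if value is None:
        return False
    return value.strip().lower() not in MISSING_TOKENS


def field_completeness(values: Iterable[Optional[str]]) -> float:
    """Share of values (0.0 to 1.0) that are present in the field or column."""
    values = list(values)
    if not values:
        return 0.0
    return sum(is_present(v) for v in values) / len(values)


def us_phone_is_complete(value: Optional[str]) -> bool:
    """Sub-value completeness: area code plus 7-digit number, optionally with country code 1.
    Says nothing about whether the number is valid or accurate."""
    if value is None or not is_present(value):
        return False
    digits = re.sub(r"\D", "", value)
    return len(digits) == 10 or (len(digits) == 11 and digits.startswith("1"))


phones = ["(312) 555-0142", "555-0199", None, "   ", "+1 212 555 0100"]
print(field_completeness(phones))                    # 0.6
print(sum(us_phone_is_complete(p) for p in phones))  # 2
```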

There is a subtle but important distinction between the closely related notions of validity and accuracy. Validity is the correctness of a data value within a limited context, such as verification by an authoritative reference. Accuracy is the correctness of a valid data value within an extensive context that includes other data as well as business processes. Validity focuses on measuring the real-world alignment of data in isolation from its use. Accuracy focuses on the combination of the real-world alignment of data and its use.

A good example of the distinction between validity and accuracy is postal address validation. Data quality processes certified by the United States Postal Service (USPS) serve as an authoritative reference for the validity of a United States postal address. Correspondence mailed to a validated postal address is deliverable to that location. However, postal address validation doesn’t verify the accuracy of the relationship between the customer and the location, meaning whether or not it is an accurate home or work postal address for the customer. An important question is how concerned the organization is about the accuracy of postal addresses, since that concern may vary with business use. For example, accuracy is more important when mailing customer bills than when mailing customer marketing collateral.
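The distinction can be sketched in Python under stated assumptions. `VALID_ADDRESSES` is a hypothetical stand-in for a USPS-certified validation service. `address_confirmed` is a hypothetical proxy for an accurate customer-to-location relationship. The per-use requirements in `REQUIRES_CONFIRMATION` are illustrative, not prescribed. Validity is checked in isolation, while fitness for use adds the business context.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an authoritative reference such as a USPS-certified
# address validation service; a real system would call that service instead.
VALID_ADDRESSES = {
    "100 MAIN ST, SPRINGFIELD, IL 62701",
    "200 OAK AVE, SPRINGFIELD, IL 62704",
}

# Hypothetical business-use requirements: billing needs a confirmed customer-to-address
# relationship, marketing tolerates an unconfirmed one.
REQUIRES_CONFIRMATION = {"billing": True, "marketing": False}


@dataclass
class Customer:
    customer_id: str
    mailing_address: str
    address_confirmed: bool  # hypothetical flag: customer confirmed this is their address


def is_valid(address: str) -> bool:
    """Validity: the address exists according to the reference, independent of any use."""
    return address.upper() in VALID_ADDRESSES


def is_fit_for_use(customer: Customer, use: str) -> bool:
    """Accuracy in context: the address is valid AND acceptable for this business use."""
    if not is_valid(customer.mailing_address):
        return False
    return customer.address_confirmed or not REQUIRES_CONFIRMATION[use]


c = Customer("C-001", "100 Main St, Springfield, IL 62701", address_confirmed=False)
print(is_valid(c.mailing_address))     # True
print(is_fit_for_use(c, "marketing"))  # True
print(is_fit_for_use(c, "billing"))    # False
```

The same valid address is acceptable for marketing collateral but not for bills, mirroring the example above.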

Most data quality dashboards create and monitor metrics based on the summary statistics provided by data profiling tools, attempting to elevate low-level, data-myopic metrics to the level of business relevance. However, these disconnected summaries at best establish a correlation with business performance; they do not establish data quality metrics that drive, or should drive, the organization.

When data profiling is performed during data migration, data integration, or ETL (extract-transform-load) activities, the focus is mainly on conformance to target expectations, meaning that the data is being profiled to determine what data transformations might be necessary to prepare the source data to be successfully loaded into the target database. These aspects of domain and structural integrity analysis have an important technical context, but lack any relative context regarding the business uses of the data.
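A sketch of this kind of conformance profiling might count the source values that would break the load. It assumes hypothetical target constraints (`TargetColumn`) taken from the target database's schema. Nothing in it refers to how the business uses the data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TargetColumn:
    """Hypothetical target-column constraints, e.g. read from the target table's DDL."""
    name: str
    max_length: int
    nullable: bool
    allowed_values: Optional[Set[str]] = None  # domain constraint, if any


@dataclass
class ConformanceProfile:
    null_violations: int = 0
    too_long: int = 0
    out_of_domain: Counter = field(default_factory=Counter)


def profile_for_target(values: List[Optional[str]], target: TargetColumn) -> ConformanceProfile:
    """Count source values that would violate the target's structural or domain rules."""
    profile = ConformanceProfile()
    for v in values:
        if v is None:
            if not target.nullable:
                profile.null_violations += 1
            continue
        if len(v) > target.max_length:
            profile.too_long += 1
        if target.allowed_values is not None and v not in target.allowed_values:
            profile.out_of_domain[v] += 1
    return profile


state = TargetColumn("state_code", max_length=2, nullable=False, allowed_values={"IL", "NY", "CA"})
prof = profile_for_target(["IL", "Illinois", None, "ny", "CA"], state)
print(prof.null_violations, prof.too_long, dict(prof.out_of_domain))
# 1 1 {'Illinois': 1, 'ny': 1}
```

Results like these point to transformations, such as upper-casing codes or mapping state names to codes. They say nothing about business impact.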

A common mistake made by those advocating that data be viewed as a corporate asset is measuring data quality independently of its business use and business relevance, which is why most data quality metrics do a poor job of conveying the business value of data quality. Without data quality metrics that meaningfully represent tangible business relevance, you should neither expect anyone to feel accountable for providing high-quality data nor expect anyone to view data as a corporate asset.

Measuring data quality using the real-world alignment definition establishes a theoretical measurement of potential business impact. Measuring data quality using the fitness for the purpose of use definition establishes a practical measurement of actual business impact.
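A small Python sketch can illustrate the difference between the two measurements. The customer IDs are hypothetical, as is the assumption that only customers billed this cycle matter to the billing process. The point is that the same set of invalid records yields one number in isolation and another within a specific business use.

```python
from typing import Set


def potential_impact(invalid_ids: Set[str], all_ids: Set[str]) -> float:
    """Real-world alignment: share of all records that are invalid, regardless of use."""
    return len(invalid_ids & all_ids) / len(all_ids) if all_ids else 0.0


def actual_impact(invalid_ids: Set[str], used_ids: Set[str]) -> float:
    """Fitness for use: share of the records a business process consumes that are invalid."""
    return len(invalid_ids & used_ids) / len(used_ids) if used_ids else 0.0


all_customers = {f"c{i}" for i in range(1, 11)}
invalid_addresses = {"c2", "c5", "c9"}
billed_this_cycle = {"c1", "c2", "c3", "c4"}  # hypothetical consumers of the billing process

print(potential_impact(invalid_addresses, all_customers))   # 0.3
print(actual_impact(invalid_addresses, billed_this_cycle))  # 0.25
```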

Therefore, every data quality metric you create must be able to answer two questions:

How does this data quality metric relate to a specific business context?

How does this data quality metric provide business insight?
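One way to enforce the two questions is to make business context and business insight required fields of every metric definition. A metric lacking either then cannot be created. This is sketched here as an assumption rather than a prescribed design, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataQualityMetric:
    name: str
    business_context: str  # How does this metric relate to a specific business context?
    business_insight: str  # How does this metric provide business insight?

    def __post_init__(self) -> None:
        # A metric that cannot answer both questions is rejected as meaningless.
        for attr in ("business_context", "business_insight"):
            if not getattr(self, attr).strip():
                raise ValueError(f"{self.name}: missing {attr}")


billing_validity = DataQualityMetric(
    name="billing_address_validity",
    business_context="Monthly customer billing run",
    business_insight="Share of invoices at risk of non-delivery, and so of delayed payment",
)

try:
    DataQualityMetric("postal_code_null_rate", business_context="", business_insight="")
except ValueError as err:
    print(err)  # postal_code_null_rate: missing business_context
```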

If a data quality metric cannot answer these questions, then it is meaningless. Meaningful metrics provide business insight when they are created in relation to a specific business context. Instead of beginning with this relative business context in mind, many organizations begin with only the data in mind, which results in creating and monitoring data quality metrics that provide little, if any, business insight.

Data governance policies are the corrective lenses that resolve the organization’s data myopia, bringing its data quality metrics into focus with clearly defined and measurable business context.

The compliance metrics associated with data governance policies align data quality with business insight, providing the historically missing link between data quality and business performance.

Before we examine some examples of how data governance policies improve data quality metrics, let’s first examine how data governance provides the framework for a proactive data quality program.

The Role of Data Governance in Data Quality

When the correlation between data quality and business performance isn’t measured in a tangible way, the organization is blindsided by an event that makes it painfully aware of the negative business impacts of poor data quality, such as a customer service nightmare, a regulatory compliance failure, or a financial reporting scandal. These events typically trigger a reactive data cleansing project whose only remediation is finding and fixing the critical data problems, without taking corrective action to resolve the root cause and, in some cases, without even identifying the root cause.

Often the root cause of poor data quality can be traced to the lack of a shared understanding of the roles and responsibilities involved in how the organization is using its data to support its business activities.

Data governance provides the framework for a proactive approach to data quality, which requires going beyond reactive data cleansing projects to establish a pervasive program for ensuring that data is of sufficient quality to meet the current and evolving business needs of the organization.

Policy is the Central Concept of Data Governance

The central concept of data governance is its definition, implementation, and enforcement of policies, which govern the interactions among business processes, data, technology and, most important, people. It is the organization’s people, empowered by high quality data and enabled by technology, who optimize business processes for superior corporate performance.

Data governance policies for data quality clearly define the business, data, and technical requirements that must be satisfied in order to make data fit for the purposes of its operational, tactical, and strategic uses.

Data governance policies define and document all of these requirements in straightforward, natural language that everyone can understand, with unambiguous definitions of the business, data, and technical terminology. Although documentation is crucial, it is just the beginning. The data governance policies must also be implemented as executable processes that are embedded directly within the daily activities of the organization.
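The following Python sketch illustrates a policy that serves as both documentation and executable process. It pairs each plain-language rule with an executable check. `Policy`, `Rule`, and the example rules are hypothetical names, not part of any governance product.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    description: str                 # plain-language requirement everyone can read
    check: Callable[[Record], bool]  # executable form of the same requirement


@dataclass
class Policy:
    name: str
    purpose: str
    rules: List[Rule]

    def document(self) -> str:
        """Render the policy as plain-language documentation."""
        lines = [f"Policy: {self.name}", f"Purpose: {self.purpose}"]
        lines += [f"- {rule.description}" for rule in self.rules]
        return "\n".join(lines)

    def violations(self, record: Record) -> List[str]:
        """Execute the policy against one record, e.g. inside a data-entry or load step."""
        return [rule.description for rule in self.rules if not rule.check(record)]


policy = Policy(
    name="Customer onboarding",
    purpose="Customer records must be fit for invoicing and account communication.",
    rules=[
        Rule("Every customer must have an email address.", lambda r: bool(r.get("email"))),
        Rule("Credit limit must not be negative.", lambda r: r.get("credit_limit", 0) >= 0),
    ],
)
print(policy.document())
print(policy.violations({"email": "", "credit_limit": -5}))
```

The same `Rule` objects produce the published documentation and run during daily processing. This helps keep the two from drifting apart.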

Ownership, Responsibility, and Accountability

Data governance policies clearly illustrate the intersection of business, data, and technical knowledge spread throughout the enterprise, revealing how interconnected and interdependent the organization is, and promoting awareness of the end-to-end process of how data is being used across the enterprise.

A data quality program within a data governance framework is a cross-functional, enterprise-wide initiative requiring that everyone, regardless of their primary role or job function, accept a shared responsibility for preventing data quality lapses, and for responding appropriately to mitigate the associated business risks when issues do occur.

Data governance not only reveals the business value of the organization’s data but also reveals the communication and collaboration necessary to materialize that value as positive business impacts.

Data governance enables the organization to manage its data as a corporate asset, for which the entire enterprise has collective ownership and shared responsibility, while also requiring individual accountability for the specific roles associated with the data, business process, and technology aspects of data quality.

Transparency and Throwing Stones at Glass Houses

Data governance provides the organization with a substantially improved view of how it is using its data, allowing data consumers to clearly see the data providers servicing their business needs, and allowing data providers to better align themselves with those business needs.

Data governance policies provide the framework for the communication and collaboration of business, data, and technical stakeholders, aligning data quality with business processes through relevant metrics, and establishing an enterprise-wide understanding of the roles and responsibilities involved, and the accountability required to support the organization’s business activities.

Even prior to their implementation as executable processes, the definition, documentation, and publication of data governance policies are a significant deliverable, because they often help the organization catalog existing data sources, build a matrix of data usage and related business processes and technology, identify potential external reference sources to use for data enrichment, and define the metrics that meaningfully measure data quality in business-relevant terminology.
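A data usage matrix can start as simply as the following Python sketch. It assumes a hypothetical inventory of (data source, technology, business process) rows gathered while documenting policies. The source and system names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory gathered while documenting policies:
# (data source, technology, business process that uses the source).
inventory: List[Tuple[str, str, str]] = [
    ("CRM customers", "Oracle", "Billing"),
    ("CRM customers", "Oracle", "Marketing campaigns"),
    ("ERP orders", "SAP", "Billing"),
    ("Web signups", "PostgreSQL", "Marketing campaigns"),
]


def usage_matrix(rows: List[Tuple[str, str, str]]) -> Tuple[List[str], Dict[str, List[bool]]]:
    """Return the business processes (columns) and, per source, which processes use it."""
    processes = sorted({process for _, _, process in rows})
    used = {(source, process) for source, _, process in rows}
    sources = sorted({(source, tech) for source, tech, _ in rows})
    matrix = {f"{s} ({t})": [(s, p) in used for p in processes] for s, t in sources}
    return processes, matrix


processes, matrix = usage_matrix(inventory)
print("source".ljust(26) + " | ".join(processes))
for label, flags in matrix.items():
    cells = ("X" if flag else "-" for flag in flags)
    print(label.ljust(26) + " | ".join(cells))
```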

The transparency provided by this combined analysis of the existing data, business, and technology landscape gives a more comprehensive overview of the enterprise data management problems, which helps the organization better evaluate possible solutions.

The transparency of data governance policies also provides an excellent basis for building strong business cases for continuous data quality improvements and for prioritizing critical business needs.

Additionally, data governance policies express the organization’s business needs in a way that often reveals existing data and technology resources, never previously considered, that are capable of meeting those needs.

Impact analysis can then be performed to evaluate opportunities for re-using existing data and technology, to identify redundancies, and to determine whether investing in new technology or new external reference data will be necessary.

Data governance can help topple data silos by first turning them into glass houses through transparency, empowering the organization to start throwing stones at the glass houses that must be eliminated. And even if data silos persist, they remain glass houses, clearly illustrating whether or not they have business-justified reasons for remaining, i.e., whether they service truly unique business needs.
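The following Python sketch offers a rough heuristic only. It assumes each silo has been documented with the business needs it services; the silo names and needs below are hypothetical. Silos whose needs are all serviced elsewhere are the first candidates for scrutiny. Overlapping labels do not prove redundancy; they flag where impact analysis should look.

```python
from typing import Dict, List, Set


def non_unique_silos(needs_by_silo: Dict[str, Set[str]]) -> List[str]:
    """Return silos whose documented business needs are all serviced by other silos."""
    flagged = []
    for silo, needs in needs_by_silo.items():
        # Union of the needs serviced by every other silo.
        others = set().union(*(n for s, n in needs_by_silo.items() if s != silo))
        if needs and needs <= others:
            flagged.append(silo)
    return flagged


needs = {
    "Sales DB": {"customer contact", "order history"},
    "Marketing list": {"customer contact"},
    "Support DB": {"customer contact", "case history"},
}
print(non_unique_silos(needs))  # ['Marketing list']
```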

The Role of Data Quality Monitoring in Data Governance

The role of data quality monitoring in data governance is not to measure the quality of data in isolation, but to measure the quality of data within the relative context of a specific business use, or in other words, to measure the ability of a data provider to service the needs of a specific data consumer.

A data governance policy defines a relative business context for data quality. This business context has business, data, and technical requirements that must be satisfied in order to make data fit for a specific business purpose. These requirements identify a data consumer, and the policy establishes a service level agreement (SLA) with a data provider. The data governance policy specifies the business rules and data rules to be executed. The associated metrics provide summary-level and detail-level measurements and monitor the ability of the data provider to comply with the policy. Compliance metrics also help identify, assess, and prioritize data quality issues for remediation, alerting the appropriate people accountable for the business process, data, and technology aspects of the data governance policy.
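The pieces of this paragraph can be tied together in a minimal Python sketch. Everything in it is an assumption for illustration: the `GovernancePolicy` structure, the 98% SLA threshold, the rules, and the accountable roles. It computes a summary-level compliance rate and detail-level failures per rule. When the provider misses the SLA, it produces alerts for the accountable people.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class GovernancePolicy:
    name: str
    data_consumer: str
    data_provider: str
    sla_threshold: float                        # minimum compliance rate agreed in the SLA
    rules: Dict[str, Callable[[Record], bool]]  # business and data rules, by name
    accountable: Dict[str, str]                 # aspect -> person to alert


@dataclass
class ComplianceReport:
    compliance_rate: float          # summary-level measurement
    failures: Dict[str, List[Any]]  # detail-level measurement: rule -> failing record keys
    meets_sla: bool


def evaluate(policy: GovernancePolicy, records: List[Record], key: str = "id") -> ComplianceReport:
    """Run every rule against every record supplied by the data provider."""
    failures: Dict[str, List[Any]] = {name: [] for name in policy.rules}
    compliant = 0
    for record in records:
        passed = True
        for name, rule in policy.rules.items():
            if not rule(record):
                failures[name].append(record[key])
                passed = False
        compliant += int(passed)
    # No records: treated as compliant (an assumption of this sketch).
    rate = compliant / len(records) if records else 1.0
    return ComplianceReport(rate, failures, rate >= policy.sla_threshold)


def alerts(policy: GovernancePolicy, report: ComplianceReport) -> List[str]:
    """One alert per accountable role when the provider misses the SLA."""
    if report.meets_sla:
        return []
    return [
        f"Notify {person} ({aspect}): '{policy.name}' at {report.compliance_rate:.0%}, "
        f"SLA {policy.sla_threshold:.0%}"
        for aspect, person in policy.accountable.items()
    ]


policy = GovernancePolicy(
    name="Billing address quality",
    data_consumer="Accounts Receivable",
    data_provider="Customer Master Data",
    sla_threshold=0.98,
    rules={
        "address present": lambda r: bool(r.get("address")),
        "address validated": lambda r: r.get("address_validated") is True,
    },
    accountable={
        "business process": "AR manager",
        "data": "customer data steward",
        "technology": "MDM platform owner",
    },
)
records = [
    {"id": 1, "address": "100 Main St", "address_validated": True},
    {"id": 2, "address": "", "address_validated": False},
    {"id": 3, "address": "200 Oak Ave", "address_validated": False},
    {"id": 4, "address": "300 Elm St", "address_validated": True},
]
report = evaluate(policy, records)
print(report.compliance_rate, report.failures)
# 0.5 {'address present': [2], 'address validated': [2, 3]}
for message in alerts(policy, report):
    print(message)
```

Keeping the consumer, provider, and SLA on the policy itself means the metric is only meaningful relative to that business context. This is the point the paragraph above makes.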

The compliance metrics associated with data governance policies align data quality with business insight, providing the historically missing link between data quality and business performance.