The time has come to acknowledge that organizations can no longer treat data as a byproduct of their systems. To be an effective enterprise, your organization must learn how to optimize the creation and storage of data. This chapter will help you understand where data can go wrong and how to fix problems when they occur.

"Virtually everything in business today is an undifferentiated
commodity, except how a company manages its information. How you manage
information determines whether you win or lose."

—Bill Gates

Everybody wants better data quality. Some organizations hope to improve
data quality by moving data from legacy systems to enterprise resource planning
(ERP) and customer relationship management (CRM) packages. Other organizations
use data profiling or data cleansing tools to unearth dirty data, and then
cleanse it with an extract/transform/load (ETL) tool for data warehouse (DW)
applications. All of these technology-oriented data quality improvement efforts
are commendable and definitely a step in the right direction. However,
technology solutions alone cannot eradicate the root causes of poor-quality data,
because poor-quality data is not so much an IT problem as it is a business problem.
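To make the profiling and cleansing steps concrete, here is a minimal sketch in Python with pandas. The customer table, column names, and cleansing rules are hypothetical assumptions chosen for illustration; they are not the output or interface of any particular profiling or ETL tool.

    # A minimal data profiling and cleansing sketch; column names and rules are hypothetical.
    import pandas as pd

    # Sample "dirty" customer records standing in for a legacy extract.
    customers = pd.DataFrame({
        "account_id": ["1001", "1002", "1002", None, "1005"],
        "name":       ["Alice Smith", "  bob jones ", "Bob Jones", "Carol Li", ""],
        "zip_code":   ["60614", "6061", "60614", "ABCDE", "30301"],
    })

    # Profiling: count null/blank values and distinct values per column,
    # and check for duplicate account numbers.
    profile = pd.DataFrame({
        "null_or_blank": customers.apply(
            lambda col: (col.isna() | (col.astype(str).str.strip() == "")).sum()),
        "distinct": customers.nunique(dropna=True),
    })
    print(profile)
    print("duplicate account_ids:", customers.duplicated(subset=["account_id"]).sum())

    # Cleansing: apply simple standardization rules suggested by the profile.
    cleaned = customers.copy()
    cleaned["name"] = cleaned["name"].str.strip().str.title()                 # normalize case and whitespace
    cleaned = cleaned[cleaned["zip_code"].str.fullmatch(r"\d{5}", na=False)]  # keep valid 5-digit ZIP codes
    cleaned = cleaned.dropna(subset=["account_id"]).drop_duplicates("account_id")
    print(cleaned)

Profiling first measures how dirty each column actually is; the cleansing rules are then written against the specific defects the profile reveals, which is the order of work the tools mentioned above are meant to support.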

Other enterprise-wide disciplines must be developed, taught, implemented, and
enforced to improve data quality in a holistic, cross-organizational way.
Because data quality improvement is a process and not an event, the following
enterprise-wide disciplines should be phased in and improved upon over time:

Stronger personal involvement by management

High-level leadership for data quality

New incentives

New performance evaluation measures

Data quality enforcement policies

Data quality audits

Additional training for data owners and data stewards about their responsibilities

Data standardization rules

Metadata and data inventory management techniques

A common data-driven methodology

Current State of Data Quality

We repeatedly run into a common example of data quality problems when trying
to speak with a customer service representative (CSR) at a bank, credit card
company, or telephone company. An automated voice response system prompts us to
key in our account number before passing the call to a CSR. When a person
finally answers the call, we are asked to repeat the account number because
the system did not pass it along. Where did the keyed-in data go?

Another, more serious, data quality problem surfaced in a 2003 report: the
federal General Accounting Office (GAO) could not tell how many H-1B visa
holders were working in the U.S. because it was missing key data and its systems
were not integrated. This presented a major challenge to the Department of
Homeland Security, which was trying to track all visa holders in the U.S.

According to Gartner, Inc., Fortune 1000 enterprises may lose more money through
operational inefficiency caused by data quality issues than they spend on data
warehouse and CRM initiatives. In 2003, The Data Warehousing Institute (TDWI)
estimated that data quality problems cost U.S. businesses $600 billion each year.

At an Information Quality Conference in 2002, a telecom company revealed that
it recovered over $100 million in "scrap and rework" costs, a bank
claimed to have recovered $60 million, and a government agency recovered $28.8
million on an initial investment of $3.75 million. Clearly, organizations and
government agencies are slowly realizing that data quality is not optional.

Many companies realize that they did not pay sufficient attention to data
while developing systems during the last few decades. Delivery schedules have
been shrinking, project scopes have been increasing, and companies have been
struggling to implement applications in a timeframe acceptable to their
business community. Because a day has only 24 hours, something has to give,
and what usually gives is quality, especially data quality.