Data Management – Terminologies & Definitions

As the third step in my Data Management article series, let's look at commonly used terminology in the domain. These are standard definitions I am quoting from a standard glossary. The next article will explain the relevance and usage of this terminology in the business world, e.g. how to look at data standardization in a supplier data or material data context when optimizing your procurement processes. That's next.

Data Analysis: The process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.
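A minimal sketch of these four steps in Python, using a hypothetical column of monthly sales figures (the numbers and the min-max scaling choice are illustrative assumptions, not from the glossary):

```python
import statistics

# Hypothetical monthly sales figures; None marks a missing reading.
raw = [120, 135, None, 150, 128, None, 160]

# Inspect: count how many values are missing.
missing = sum(1 for v in raw if v is None)

# Clean: drop the missing readings.
clean = [v for v in raw if v is not None]

# Transform: rescale to a 0-1 range (min-max scaling).
lo, hi = min(clean), max(clean)
scaled = [(v - lo) / (hi - lo) for v in clean]

# Model: summarize with mean and standard deviation.
mean = statistics.mean(clean)
spread = statistics.stdev(clean)
```

In practice each step would be far richer, but the sequence — inspect, clean, transform, model — is the same.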

Data Governance: The exercise of decision-making and authority for data-related matters. The organizational bodies, rules, decision rights, and accountability of people and information systems as they perform information-related processes. Data Governance determines how an organization makes decisions, how we "decide how to decide."

Data Governance Framework: A logical structure for organizing how we think about and communicate Data Governance concepts.

Data Modeling: The discipline, process, and organizational group that conducts analysis of data objects used in a business or other context, identifies the relationships among these data objects, and creates models that depict those relationships.
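As a toy illustration, two entities from a procurement context and the one-to-many relationship between them can be sketched with dataclasses (the entity names and fields are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Material:
    number: str
    description: str

@dataclass
class Supplier:
    supplier_id: str
    name: str
    # One supplier provides many materials: a one-to-many relationship.
    materials: list = field(default_factory=list)

acme = Supplier("S-001", "Acme Corp")
acme.materials.append(Material("M-100", "Steel bolt, M8"))
```

A formal data model would capture the same entities and relationships in a diagram or DDL rather than code, but the structure is the same.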

Data Classification: The categorization of data, following various schemas, to support business or technology goals.
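A tiny sketch of classification against a schema — here a hypothetical sensitivity scheme; the field names and categories are illustrative assumptions:

```python
# Hypothetical classification schema: field name -> sensitivity category.
SCHEMA = {
    "email": "confidential",
    "tax_id": "restricted",
    "country": "public",
}

def classify(record):
    """Tag each field in a record with its category (default: internal)."""
    return {k: SCHEMA.get(k, "internal") for k in record}

tags = classify({"email": "a@b.com", "country": "DE", "notes": "call back"})
```

The same pattern applies whether the schema classifies by sensitivity, retention period, or business domain.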

Data Cleansing: Also referred to as data scrubbing. Data Cleansing is the process of detecting dirty data in a database (data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly) and then removing and/or correcting the data. Data cleansing is often necessary to bring consistency to different sets of data that have been merged from separate databases. Cleansing data involves consolidating data within a database by removing inconsistent data, removing duplicates, and re-indexing existing data in order to achieve the most accurate and concise database. It can involve manual tasks or processes automated by special Data Quality tools. A particular type of Data Cleansing is Address Cleansing, in which street addresses are converted to a standard format as set forth by the U.S. Postal Service master database. For example, standard abbreviations are utilized, typos are corrected, and ZIP codes are converted to 9-digit format. Address cleansing is usually done in conjunction with address matching, a process that validates an address against one of the 57 million addresses in the USPS database.
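A minimal sketch of two of the steps above — correcting formatting and removing duplicates — on a hypothetical list of supplier names (real Data Quality tools do this with fuzzy matching and reference data, not simple string rules):

```python
def cleanse(names):
    """Trim stray whitespace, normalize casing, and drop duplicates."""
    seen, out = set(), []
    for name in names:
        norm = " ".join(name.split()).title()  # collapse spaces, fix case
        if norm not in seen:                   # keep first occurrence only
            seen.add(norm)
            out.append(norm)
    return out

cleaned = cleanse(["acme corp", " ACME  Corp ", "Beta Ltd"])
```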

Data Conversion: The manipulation of information sets from one format or structure to another. Data Conversion is often required when acquiring sets of information from outside sources.
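A small example of format conversion using only the standard library — converting a CSV extract (as an outside source might deliver it) into JSON; the column names are illustrative:

```python
import csv
import io
import json

# A hypothetical CSV extract received from an outside source.
csv_text = "material,qty\nM-100,25\nM-200,7\n"

# Parse the CSV into dictionaries, then serialize as JSON.
rows = list(csv.DictReader(io.StringIO(csv_text)))
converted = json.dumps(rows)
```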

Data Mart: A repository of data gathered from operational data and other sources. The data may derive from an enterprise-wide database or data warehouse or from more specialized sources. The emphasis of a data mart is on meeting the expectations and needs of a particular group of users, so it may be designed to assist them in performing analysis and understanding the content.

Data Mining: The analysis of data for relationships not previously discovered. Data Mining (DM) is also known as Knowledge Discovery. It is the process of automatically searching large volumes of data for patterns that may be used to predict future behavior.
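A toy version of this pattern search, assuming some hypothetical purchase-order "baskets": counting which item pairs occur together most often, a much-simplified cousin of association-rule mining:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase-order baskets (sets of items bought together).
baskets = [
    {"bolt", "nut", "washer"},
    {"bolt", "nut"},
    {"nut", "washer"},
    {"bolt", "nut", "grease"},
]

# Count every pair of items that co-occurs within a basket.
pairs = Counter()
for basket in baskets:
    pairs.update(combinations(sorted(basket), 2))

top_pair, top_count = pairs.most_common(1)[0]
```

Real data-mining tools add support/confidence thresholds and scale to millions of records, but the underlying idea — searching for recurring patterns — is the same.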

Data Profiling: The process of examining data in an existing database and collecting statistics and information about that data. The information collected may be used to collect metrics on data quality, assess whether metadata accurately describes the actual values in the source database, determine if existing data can be re-purposed, or understand risks and challenges in using the data.
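A minimal profile of a single column, computing the kind of statistics described above — row count, null count, distinct values, and maximum length (the sample values are made up):

```python
# A hypothetical country-code column pulled from a source table.
values = ["DE", "FR", None, "DE", "usa", None]

profile = {
    "rows": len(values),
    "nulls": sum(1 for v in values if v is None),
    "distinct": len({v for v in values if v is not None}),
    "max_len": max(len(v) for v in values if v is not None),
}
```

Even this tiny profile surfaces issues: nulls exist, and a max length of 3 hints that the column mixes 2-letter and 3-letter codes.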

Data Quality: The practice of correcting, standardizing, and verifying data.

Data Standardization: The transformation of data into consistent formats.
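A small sketch of this transformation: dates arriving in mixed formats are standardized to ISO 8601. The set of accepted input formats here is an illustrative assumption:

```python
from datetime import datetime

# Hypothetical input formats seen in source systems.
FORMATS = ["%d.%m.%Y", "%m/%d/%Y", "%Y-%m-%d"]

def standardize_date(text):
    """Parse a date in any known format and emit it as ISO 8601."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

iso = standardize_date("31.01.2024")
```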

Data Validation: As a broad concept, Data Validation refers to the confirmation of the reliability of data through a checking process. As a set of processes, Data Validation refers to a systematic review of a data set to identify outliers or suspect values. More specifically, data validation refers to the systematic process of independently reviewing a body of analytical data against established criteria to provide assurance that the data are acceptable for their intended use. Within databases, Data Validation refers to procedures built into databases to define and check acceptable input for fields, and to accept or reject the data.
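A sketch of the database sense of the term: per-field rules that accept or reject input. The field names and rules are illustrative assumptions, not a real database's checks:

```python
import re

# Hypothetical field-level rules, as a database might enforce on input.
RULES = {
    "qty": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}

def validate(record):
    """Return the names of fields that fail their rule."""
    return [f for f, rule in RULES.items() if f in record and not rule(record[f])]

errors = validate({"qty": -3, "email": "buyer@example.com"})
```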