As mentioned in my post describing the major business processes that comprise a data governance function, the Define process stage documents the data definitions and business context associated with business terminology, taxonomies, and relationships, as well as the policies, rules, standards, processes, and measurement strategy that must be defined to operationalize data governance efforts. This stage runs in parallel with the Discover process stage, and the two are iterative: Discovery drives Definition, and Definition drives more targeted focus for Discovery.

The most relevant processes that comprise the Define stage include:

Business Glossary Creation. A collaborative process to capture and share the full business context around critical data. In addition to the expected definitions of core data entities and attributes, context can also include rules, policies, reference data, free-form annotations, links, and data owners, to name a few.

A shared glossary ensures everyone is on the same page: data architects, modelers, developers, stewards, and data consumers, including business process owners as well as operational and strategic decision-makers.
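To make the idea of "full business context" concrete, here is a minimal sketch of what a single glossary entry might hold. The class name, field names, and the choice of which fields count as required are my own illustrative assumptions, not the schema of any particular glossary tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GlossaryTerm:
    """One business glossary entry (hypothetical structure for illustration)."""
    name: str
    definition: str
    owner: str
    rules: List[str] = field(default_factory=list)             # business rules that apply
    policies: List[str] = field(default_factory=list)          # governing policies
    reference_values: List[str] = field(default_factory=list)  # allowed reference data
    annotations: List[str] = field(default_factory=list)       # free-form notes
    links: List[str] = field(default_factory=list)             # related documents


def missing_context(term: GlossaryTerm) -> List[str]:
    """Return the names of the (assumed) required context fields that are empty."""
    required = {
        "definition": term.definition.strip(),
        "owner": term.owner.strip(),
        "rules": term.rules,
    }
    return [name for name, value in required.items() if not value]
```

A steward could run something like `missing_context` over every entry to see which critical terms still lack an owner or documented rules before the glossary is shared widely.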

Data Classification. In the world of structured data, this process is often referred to as metadata management – the act of capturing relevant supporting business and IT context about data in the form of metadata. For unstructured content, classification plays a critical role in tagging and categorizing content with appropriate context to deliver relevant search results.

Effective data classification delivers context and fast-tracks information access for business users, allowing them to respond quickly to regulatory and compliance requirements, reduce costs and inefficiencies, and gain improved insights into the business and its customers. Trusted data classification benefits IT by reducing integration complexity, providing the transparency often missing from black-box or custom coding, and ultimately improving collaboration, agility, and time to value.

A data model without relationships is just a data inventory. Defining the expected relationships between master data, transactional data and reference data – and the applications and processes that depend on them – ultimately defines an organization’s business model. Data hierarchies (e.g., organizational, bill of materials, customer, product, sales, and channel) represent the foundation for an organization’s planning, decision-making, and customer engagement.

Business Rules Definition. The process of creating and documenting logical business requirements to build the rules and policies for data validation, cleansing, enrichment, matching, merging, masking, archiving, standardization, etc. These rules define both the automated machine-supported and manual human-centric processes.

When operationally implemented in the Apply process stage, business rules are the key to ensuring data is trusted, secure, and ultimately fit for business usage.
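As a rough illustration of how a documented business rule can carry both its plain-language requirement and an automated check, consider the sketch below. The rule names, the customer fields (`email`, `country`), and the two-letter country convention are hypothetical examples chosen for illustration, not rules taken from any specific governance program.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass(frozen=True)
class BusinessRule:
    """A documented rule: a name, a plain-language description, and a check."""
    name: str
    description: str
    check: Callable[[Record], bool]


def _is_two_letter_upper(value: str) -> bool:
    # Assumed convention: two alphabetic, uppercase characters.
    return len(value) == 2 and value.isalpha() and value.isupper()


# Hypothetical validation rules for a customer record.
RULES: List[BusinessRule] = [
    BusinessRule(
        name="email_present",
        description="Every customer record must carry a non-empty email.",
        check=lambda r: bool(r.get("email", "").strip()),
    ),
    BusinessRule(
        name="country_is_two_letter_code",
        description="Country must be a two-letter uppercase code.",
        check=lambda r: _is_two_letter_upper(r.get("country", "")),
    ),
]


def violations(record: Record, rules: List[BusinessRule] = RULES) -> List[str]:
    """Return the names of the rules the record fails, in rule order."""
    return [rule.name for rule in rules if not rule.check(record)]
```

Keeping the description next to the check means the same definition can feed the automated validation and the manual, steward-driven review of records that fail it.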

When defined, approved, evangelized and enforced appropriately, these policies have the power to evolve your corporate culture to one that manages data as an asset. Without these top-down, executive-driven policy mandates, it will be difficult to change past behavior impacting data quality and security.

Other Dependent Policies Alignment. It's likely that other efforts within an organization already document business- or IT-driven policies that set parameters on how enterprise data should be managed and used, even if they are not currently labeled "data governance" policies. These include policies for information security, data privacy, GRC, IT governance, and others.

When a data governance effort is ready to scope and define what policies it must document and implement, it can start with the work that’s already been done and reconcile which policies should be owned by the data governance effort, which should simply be recognized and complied with, and which should be replaced or improved.

Key Performance Indicator (KPI) Definition. Processes that define service level agreements (SLAs), operational baseline metrics for data quality (DQ) and policy compliance, return on investment (ROI) and total cost of ownership (TCO) measures, and other measures used to gauge the effectiveness of, and value delivered by, the data governance efforts. (For more on measuring data governance effectiveness, see my post “Measuring Data Governance: Lies, Damned Lies, and ROI.”)

Data governance efforts will not receive sponsorship, resources, funding or prioritization without a means to measure the value and effectiveness of the efforts.
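As one hedged example of what a defined measure could look like, the sketch below computes a data quality pass rate and compares it with an SLA target and an operational baseline. The function name, the parameters, and the sample figures in the usage note are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiResult:
    pass_rate: float           # share of records passing all DQ rules
    meets_sla: bool            # True when pass_rate reaches the SLA target
    change_vs_baseline: float  # improvement (+) or decline (-) against the baseline


def dq_kpi(passed: int, total: int, sla_target: float, baseline: float) -> KpiResult:
    """Compare an observed DQ pass rate with an SLA target and a baseline rate."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    rate = passed / total
    return KpiResult(
        pass_rate=rate,
        meets_sla=rate >= sla_target,
        change_vs_baseline=rate - baseline,
    )
```

For instance, `dq_kpi(950, 1000, sla_target=0.98, baseline=0.90)` describes a data set that has improved by about five points over its baseline but still falls short of a 98% SLA, which is the kind of evidence sponsors tend to ask for.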