Metathink: An Enterprise-Wide Single Version of the Truth, and Beyond

What Does it Require?

Enabling an enterprise-wide single version of the truth is an enterprise infrastructural matter. It requires an infrastructure composed of two major related components:

Enterprise data warehouse. One of the functions of the enterprise data warehouse is to provide enterprise-wide data integration. It integrates and consolidates operational data generated by diverse operational applications inside and outside the enterprise, not only structurally but also semantically. The data in the enterprise data warehouse forms the material foundation for enabling an enterprise-wide single version of the truth. From an evolutionary perspective, the enterprise data warehouse is usually the result of one or more initiatives to consolidate the departmental data warehouses and independent data marts existing in the enterprise. The challenges in constructing an enterprise data warehouse are primarily technical.

Enterprise business glossary. The enterprise business glossary is an enterprise-wide authoritative provider of meaning. Not only does it provide a unique and unambiguous definition for each data item stored in the enterprise data warehouse, but it also defines the meaning of derived business terms such as complex key performance indicators, for instance by means of calculation formulas based on more granular, semantically defined data items. Furthermore, it authoritatively defines the relationships among all these terms. From an evolutionary point of view, the enterprise business glossary is usually the result of consolidating the various departmental business glossaries existing in the enterprise. The challenges in establishing an enterprise business glossary are quite often of a political nature.
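As a minimal sketch of how a glossary can define a derived term by a calculation formula over more granular terms, consider the following Python fragment. The class names, the term names (net_revenue, cost_of_goods_sold, gross_margin), and the formula are hypothetical illustrations, not part of any particular glossary product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Term:
    """One glossary entry: a name, a prose definition, and an optional derivation."""
    name: str
    definition: str
    # Names of the more granular terms this term is calculated from (empty for base terms).
    inputs: Tuple[str, ...] = ()
    # Calculation formula over the input values; None for base terms.
    formula: Optional[Callable[..., float]] = None


class Glossary:
    """Hypothetical glossary that allows exactly one definition per term."""

    def __init__(self) -> None:
        self._terms: Dict[str, Term] = {}

    def define(self, term: Term) -> None:
        # A second definition of the same term would break the single version.
        if term.name in self._terms:
            raise ValueError(f"term already defined: {term.name}")
        # A derived term may only refer to terms that are already defined.
        missing = [name for name in term.inputs if name not in self._terms]
        if missing:
            raise KeyError(f"undefined input terms: {missing}")
        self._terms[term.name] = term

    def evaluate(self, name: str, values: Dict[str, float]) -> float:
        """Compute a term from the values of the base data items."""
        term = self._terms[name]
        if term.formula is None:
            return values[name]
        args = [self.evaluate(input_name, values) for input_name in term.inputs]
        return term.formula(*args)


glossary = Glossary()
glossary.define(Term("net_revenue", "Invoiced revenue after discounts and returns."))
glossary.define(Term("cost_of_goods_sold", "Direct cost of the goods that were sold."))
glossary.define(Term(
    "gross_margin",
    "Share of net revenue remaining after cost of goods sold.",
    inputs=("net_revenue", "cost_of_goods_sold"),
    formula=lambda revenue, cogs: (revenue - cogs) / revenue,
))

margin = glossary.evaluate(
    "gross_margin", {"net_revenue": 200.0, "cost_of_goods_sold": 150.0}
)  # 0.25
```

Rejecting a second definition of an existing term is the design choice that matters here: it is a small-scale analogue of the glossary being the single authoritative source of meaning.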

What is Essential?

For the realization of the above two components, metadata is essential. Abstractly, in this context we regard metadata as that which confers meaning on the actual data (the bits and bytes) and/or affects the behavior or state of the systems. In other words, without metadata there exist only bits and bytes, in a computer or elsewhere, that no person, no software and no hardware can understand. In our context, there are two classes of metadata:

Operative metadata. Operative metadata defines operative/system objects and the relationships among them in the system, and thereby determines the system's behavior or state. It is thus indispensable for the proper functioning of the system under consideration. Operative metadata is almost always stored in a "structured" form in the system for performance reasons. Examples of operative metadata are the column list of a table, or the column mappings from a source table to a target table. The former is usually stored in the system catalog and maintained by the system automatically, whereas the latter is stored in a user- or tool-defined catalog and maintained manually by the system constructors. To business users, operative metadata is something stored somewhere in the systems that they do not always understand and are therefore not interested in. Quite frequently, operative metadata cannot be read properly without appropriate tools. In short, it is generally not easily approachable. This is mostly an artificial "feature" that tool vendors introduced in order to prevent metadata exchange. As a matter of fact, a substantial portion of the activities in constructing data warehouses, regardless of which tools, aids, or approaches are employed, is concerned with handling operative metadata.
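The two examples above, a column list read from the system catalog and a manually maintained column mapping, can be sketched with Python's built-in sqlite3 module. The table names, the column names, and the load helper are hypothetical; the sketch only illustrates how the mapping, as operative metadata, determines what the load step actually does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (cust_no INTEGER, cust_nm TEXT)")
conn.execute("CREATE TABLE dw_customer (customer_id INTEGER, customer_name TEXT)")
conn.execute("INSERT INTO src_customer VALUES (1, 'Acme')")


def column_list(connection, table):
    """Read a table's column list from the system catalog.

    SQLite maintains this metadata automatically; each row returned by
    PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
    """
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


# User-defined operative metadata: source column -> target column.
# In a real warehouse this would live in a tool- or user-defined catalog.
COLUMN_MAPPING = {"cust_no": "customer_id", "cust_nm": "customer_name"}


def load(connection, source, target, mapping):
    """Copy rows from source to target as dictated by the column mapping."""
    source_columns = set(column_list(connection, source))
    target_columns = set(column_list(connection, target))
    unknown = [s for s in mapping if s not in source_columns]
    unknown += [t for t in mapping.values() if t not in target_columns]
    if unknown:
        raise ValueError(f"mapping refers to unknown columns: {unknown}")
    select_list = ", ".join(mapping)          # source columns, in mapping order
    insert_list = ", ".join(mapping.values())  # matching target columns
    connection.execute(
        f"INSERT INTO {target} ({insert_list}) SELECT {select_list} FROM {source}"
    )


load(conn, "src_customer", "dw_customer", COLUMN_MAPPING)
rows = conn.execute("SELECT customer_id, customer_name FROM dw_customer").fetchall()
```

Changing an entry in COLUMN_MAPPING changes the behavior of the load without touching its code, which is the sense in which operative metadata determines system behavior.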

Descriptive metadata. Descriptive metadata describes subjects in order to define their meaning. In most cases, it is held in suitably "unstructured" forms such as prose, links, videos, etc., for reasons of expressiveness. The unavailability of descriptive metadata does not directly affect the functioning of the systems involved, although it is vital for effective understanding and communication. Precisely for these purposes (ensuring effective understanding and communication), descriptive metadata is generally easily approachable, i.e., we can get it without employing special tools or aids. Obviously, an enterprise business glossary is nothing but an enterprise-wide authoritative collection of descriptive metadata.

It is worth pointing out that the operative metadata mentioned above usually also contains some descriptive information and can be exploited as descriptive metadata, although this is only a side effect.

If we regard a person as a special kind of system, then descriptive metadata, which determines the meaning of normal data, affects the behavior or state of such a system as well, just as operative metadata does for ordinary systems. This affected system behavior or state is actually the ultimate meaning of the data defined by the metadata. In short, metadata affects system behavior or state, directly (with operative metadata) or indirectly (with descriptive metadata via normal data), immediately (e.g., with operative metadata) or after some time (e.g., with descriptive metadata).

A Closed-Loop of Single Versions

With the enterprise data warehouse, we get a single version of the reality, represented by the integrated data stored there. With the enterprise business glossary, we get a single version of the semantic definition of the terms used to denote the data stored in the enterprise data warehouse, which makes the information carried by this data perceivable. Information gained this way forms a single version of the "truth." Unfortunately, a single version of the truth by no means guarantees a single version of the interpretation. Multiple versions of the interpretation of the same truth, quite frequently induced by political factors, are one of the major causes of ineffectiveness in general. A single version of the interpretation of the truth should lead to a single version of the decision, which in turn equals a single version of the action or no action. Recording such an action results in new data representing the effect of the action on the reality. This effect of the action is the ultimate meaning of the information carried by the data mentioned at the beginning of this section. This way, we close the loop. Note that if the data never leads to an action, it can actually be considered meaningless.

Summary

An enterprise-wide single version of the truth can be enabled by providing an infrastructure composed of two related components: the enterprise data warehouse and the enterprise business glossary.

The enterprise business glossary provides uniquely and unambiguously defined meaning to the data stored in the enterprise data warehouse.

For the construction of the enterprise data warehouse, dealing with operative metadata is essential.

The enterprise business glossary is nothing but an enterprise-wide authoritative collection of descriptive metadata.

Data represents reality. Meaning transforms data into information, alias truth. Without interpretation, there is no decision. Decisions eventually equal action or no action. The effect of the action, i.e., the changed reality, can be recorded as data again.

Metadata is data about data. Meta-x is x about x. Metathink is think about think. Think about think is philosophizing.

Bin Jiang, Ph.D.

Dr. Bin Jiang received his master's degree in Computer Science from the University of Dortmund, Germany, in 1986. In 1992, he received his doctorate in Computer Science from ETH Zurich, Switzerland. During his research period, two of his publications in the field of database management systems were named best student papers at the IEEE Conference on Data Engineering, in 1990 and 1992.

Afterward, he worked for several major Swiss banks, insurance companies, retailers, and with one of the largest international data warehousing consulting firms as a system engineer, software developer, and application analyst in the early years, and then as a senior data warehouse consultant and architect for almost twenty years.