Concepts in Information Management – blog by Ronald Fuller

OLAP cubes are among the most powerful resources available for business intelligence and analytics. Here's what the product manager for Microsoft Analysis Services said about OLAP back in 2002:

OLAP multidimensional databases combine incredible performance with unsurpassed analytical power and, in my opinion, are the foundation of the BI platform.

The multidimensional data model is vastly superior to the relational data model when it comes to the expressiveness of analytical operations. The ability to have random access to any point in space, both detailed data and aggregates, makes it a breeze to express calculations that would otherwise take pages of SQL statements using a relational database.

This remains true about OLAP today, and it is likely to remain true for a long time. Because while technologies change, the underlying concepts remain the same. But OLAP seems to be falling out of favor recently. Why would that be?

The top Google search result for OLAP proclaims that "OLAP Is Becoming Obsolete". It is actually a paid advertisement that invites users to download a paper titled "Selecting the Right Database Technology for Your Business Analytics Project." Interestingly, this paper says very little about OLAP and does not even use the word 'obsolete' anywhere in the paper. The only negative thing it says about OLAP is the single sentence below:

Storing the results of these pre-calculations takes exponentially more storage resources than the actual raw data does, limiting the size of raw data that can make up a cube to gigabyte scale.

But that is actually false, because the most popular storage format for OLAP cubes is a multidimensional structure that requires far LESS storage than the original source data. Production OLAP cubes have exceeded 20 terabytes of raw data, and their continued growth is limited only by computing power, memory and storage.

In the real world there is no reason to believe that OLAP is becoming obsolete. But there are even more signs that people think it is, notwithstanding the facts. The statement below is from the Microsoft website:

For new projects, we generally recommend tabular models. (rather than OLAP cubes)

Microsoft has been a leader in promoting OLAP, so why are they now downplaying it and steering people towards other approaches? The reason they give is that "tabular models are faster to design, test, and deploy..." But that is true only when the business requirements and data are very simple. When the requirements or the data become more complex, even a little bit, the complexity of developing a tabular model explodes and quickly becomes far, far more complex than a multidimensional OLAP project with exactly the same requirements and data. And the end products are less flexible and less able to adapt to changing needs.

So what gives? Why would Microsoft make a claim that relies on an implausible assumption of simplicity which does not exist in most enterprise environments? And why would another company claim that OLAP is becoming obsolete in a paid advertisement with only a single dubious claim to back it up?

Here's what I think is going on: OLAP is most valuable when the source data is clean and well integrated. And clean, well-integrated data is difficult to achieve for many organizations. Claiming that OLAP is obsolete is a marketing ploy to promote products that work well with poor-quality data and data that is not well integrated.

There is a legitimate need for such products, and they have a huge market potential – but vendors should make that case and sell those products without claiming that OLAP is becoming obsolete, because it is not. For organizations with clean, well-integrated data, OLAP is far and away the best choice for business intelligence and analytical applications.

The idea that managing information involves cost-benefit tradeoffs is not new, but unfortunately those decisions are treated as engineering issues focused on implementation concerns like performance and storage, while business utility is ignored. By business utility I mean the capacity of an information resource to meet expressed and unexpressed needs and adapt to changing requirements.

The statements below were made by Don Chamberlin, one of the co-designers of SQL, the world's most widely used database query language. He is describing decisions made in the mid-1970s to give users more flexibility with cost-benefit tradeoffs:

"When the original SQL designers decided to allow users the options of handling nulls and duplicates, they viewed these features as minor conveniences, not as major departures from orthodoxy, taken at the risk of excommunication."

"SQL trusts the database designer to decide whether the costs ... are justified. To impose these costs on all applications ... seems a little heavy-handed, and seemed even more so in 1975 given the costs of storage and processing at that time."

Even today (in 2015) Dr. Chamberlin and his former colleagues continue to bear harsh criticism for those decisions. This is completely unfair because they did not force anyone to do or ignore anything. Rather, they gave users the freedom to make their own cost-benefit decisions.
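The freedom Dr. Chamberlin describes can be seen in a small sketch. It uses Python's built-in sqlite3 module, and the table and column names (contacts_loose, contacts_strict, name, phone) are hypothetical, chosen only for illustration. The first table declares no constraints, so duplicates and nulls are accepted; the second opts in to NOT NULL and UNIQUE, and the database then pays the cost of enforcing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Illustrative rows: one exact duplicate and one missing phone number.
rows = [("Ann", "555-0100"), ("Ann", "555-0100"), ("Bob", None)]

# Permissive design: no constraints declared, so SQL accepts every row.
conn.execute("CREATE TABLE contacts_loose (name TEXT, phone TEXT)")
conn.executemany("INSERT INTO contacts_loose VALUES (?, ?)", rows)
loose_count = conn.execute("SELECT COUNT(*) FROM contacts_loose").fetchone()[0]

# Strict design: the designer chooses to pay for enforcement.
conn.execute(
    "CREATE TABLE contacts_strict ("
    " name TEXT NOT NULL,"
    " phone TEXT NOT NULL,"
    " UNIQUE (name, phone))"
)
rejected = []
for row in rows:
    try:
        conn.execute("INSERT INTO contacts_strict VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        # The duplicate and the row with a null phone both end up here.
        rejected.append((row, str(exc)))
strict_count = conn.execute("SELECT COUNT(*) FROM contacts_strict").fetchone()[0]

print(loose_count, strict_count, len(rejected))
```

Neither design is right in general. Which one fits depends on what the information is for, which is exactly the kind of decision the language leaves open.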

But unfortunately those decisions, and others of equal importance, do not fall to the stakeholders who pay the price of poor choices or who are in any position to make good ones. Instead they are decided behind the scenes by technical professionals, while information owners remain unaware that such tradeoff opportunities even exist. The result is that information systems are too expensive, do not adapt well to changing requirements, and do not meet the expectations of their owners.

Information owners do not need to become technical experts to make good decisions about organizing information, and it is absolutely essential that business experts, rather than architects or engineers, have final decision authority. The role of IT departments in these matters should be strictly limited to cost-benefit advisement.

"the determination of appropriate normal forms frustrates many systems analysts ... a trade-off exists among system performance, storage, and costs."

But the author omits a critical factor: the primary overall consideration in any cost-benefit equation should always be business utility. Technology professionals cannot be expected to recognize the full range of implications for business utility in any given tradeoff scenario. They must instead rely on a set of requirements specifically spelled out in advance by the business customer. That is why scope creep is such a huge problem in most IT projects, and why most information systems cannot adapt well to changing requirements.

It would probably be impossible for a business owner to make a complete list of every possible use case for a given resource. But a business owner can easily determine whether a resource can satisfy their foreseeable needs even before any requirement has been expressed. What an owner considers to be foreseeable can change over time, but it will always change more slowly than the perceived requirements when technology professionals have to guess or fill in the blanks. New technologies can expand the range of use cases for information, but for information to be useful it must still be organized in a way that allows the desired use – technology cannot change that.

Dr. Thomas Haigh tells a fascinating story in The Business History Review that explains why information management is seen as a technical discipline rather than a management one: The systems men were members of the Systems and Procedures Association during the 1950s and 60s. The purpose of this association was not to promote research or continuing education, but rather to seek increased status and management authority for its members within their employing organizations. They offered an implicit bargain to corporate executives:

You put us in charge and we’ll deliver to you more power over your firms than you’ve ever dreamed of

Executives for the most part were not convinced. They understood that technical expertise does not translate into management ability. By the 1970s the Systems and Procedures Association was defunct, and the various roles of the systems men merged into corporate IT departments. But they left a stubborn cultural legacy that still persists today, the idea that managing information is a job for architects and engineers rather than business experts:

For better or worse, to speak of something as an information system continues to imply that it should be engineered by an information specialist and built using information technology

It seems unlikely that the idea of information can ever truly be separated from these roots: it is just too historically and culturally charged.

1. The cost-benefit value of any normal form can only be determined by an information owner

2. Only a subject matter expert (SME) can identify the normal form of any set of tables

As an example of reason 1, the table below contains an error which you can see in the last row: an address in Grand Junction, Tennessee has the same ZIP code as the address above it in Grand Junction, Colorado. This is clearly a mistake, but without investigating further we cannot know whether the error is in the state or the ZIP code.

Breaking City and State off into two separate tables as shown here eliminates any potential for this kind of inconsistency. However, the advantage might come at a cost, because performance can suffer when applications and reports have to query two joined tables instead of one.
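A minimal sketch of this tradeoff, using Python's built-in sqlite3 module, is given here. The table names (addresses_flat, zip_codes, addresses), the column layout, and the sample rows are assumptions made for illustration; they are not taken from the original tables. The flat table accepts the inconsistent row, while the split design stores city and state once per ZIP code and pays for that with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One wide table: nothing stops one ZIP code from appearing with two states.
conn.execute(
    "CREATE TABLE addresses_flat (street TEXT, city TEXT, state TEXT, zip TEXT)"
)
conn.executemany(
    "INSERT INTO addresses_flat VALUES (?, ?, ?, ?)",
    [
        ("1 Main St", "Grand Junction", "CO", "81501"),
        ("2 Oak Ave", "Grand Junction", "TN", "81501"),  # inconsistent, yet accepted
    ],
)
conflicts = conn.execute(
    "SELECT zip FROM addresses_flat GROUP BY zip HAVING COUNT(DISTINCT state) > 1"
).fetchall()

# Split design: city and state are recorded exactly once per ZIP code.
conn.execute(
    "CREATE TABLE zip_codes ("
    " zip TEXT PRIMARY KEY, city TEXT NOT NULL, state TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE addresses ("
    " street TEXT NOT NULL, zip TEXT NOT NULL REFERENCES zip_codes (zip))"
)
conn.execute("INSERT INTO zip_codes VALUES ('81501', 'Grand Junction', 'CO')")
try:
    conn.execute("INSERT INTO zip_codes VALUES ('81501', 'Grand Junction', 'TN')")
    second_state_accepted = True
except sqlite3.IntegrityError:
    # The primary key rejects a second city/state for the same ZIP code.
    second_state_accepted = False

conn.executemany(
    "INSERT INTO addresses VALUES (?, ?)",
    [("1 Main St", "81501"), ("2 Oak Ave", "81501")],
)

# The cost of the split: reading a full address now requires a join.
joined = conn.execute(
    "SELECT a.street, z.city, z.state, a.zip "
    "FROM addresses AS a JOIN zip_codes AS z ON z.zip = a.zip "
    "ORDER BY a.street"
).fetchall()

print(conflicts, second_state_accepted, joined)
```

The sketch shows what each design permits, not which one is better. How much the join costs depends on data volumes and workload.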

So what is more important, better performance or lower risk of error? Who should decide? Only information owners have the appropriate perspective and incentive to make that kind of decision wisely (see Sophotaxis). Architects and engineers can provide valuable cost-benefit advisement, but final decisions should be made by owners.

The example above is pretty simple, but large business databases can have thousands of similar cost-benefit scenarios that can be far more complex. The more complex the situation, the greater the need for ownership perspective and expertise in the specific business issues at hand.

For an example of reason 2, consider a simple table containing only one column with phone numbers. Many engineers would agree that this table is in first normal form. In most situations it would work fine just as it is, or as a column in a larger table. But for a company such as a telephone service provider, where users would want to group or sort phone numbers by area code or exchange, this table would not be in first normal form. In other words, it would not satisfy even the minimum theoretical standard for use in a modern database system. Instead, in this particular case, the numbers should be broken out into each meaningful component as shown in the lower image.
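The difference can be sketched in a few lines of Python. The sketch assumes ten-digit North American numbers, and the names PhoneNumber, split_phone, and the sample values are hypothetical. Once each number is stored as its meaningful components, grouping by area code is a direct lookup rather than a string-parsing exercise repeated in every report.

```python
from collections import defaultdict
from typing import NamedTuple


class PhoneNumber(NamedTuple):
    area_code: str
    exchange: str
    line_number: str


def split_phone(raw: str) -> PhoneNumber:
    """Break a ten-digit number into area code, exchange, and line number."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {raw!r}")
    return PhoneNumber(digits[:3], digits[3:6], digits[6:])


# Illustrative values only, written in three different formats.
raw_numbers = ["(303) 555-0142", "303-555-0199", "615 555 0117"]

by_area_code = defaultdict(list)
for raw in raw_numbers:
    phone = split_phone(raw)
    by_area_code[phone.area_code].append(phone)

for area_code in sorted(by_area_code):
    print(area_code, len(by_area_code[area_code]))
```

Whether this decomposition is worth doing depends entirely on whether anyone needs the components, which is a question about the business rather than about the database.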

This shows that even in very simple cases the normal form of a table can only be determined by someone who understands the full range of potential uses for the information. Business databases are filled with situations like this, but unfortunately most decisions about organizing information are made by technologists rather than business experts. That is why businesses struggle with databases that cannot adapt well to changing requirements.

Information should be organized in a way that makes the most sense to its owner, not according to some predetermined normal form which cannot even be reliably achieved. The normal forms are guidelines to improve performance and protect consistency. But any decision made for performance can potentially degrade the usefulness of the information, and the range of decisions needed to protect consistency can only be determined by a business expert.

A widely held assumption in the academic field of business information systems is that technology and behavior are inseparable, but this is false. One widely cited source that promotes this idea is Design Science in Information Systems Research, where the authors state:

"Technology and behavior are not dichotomous in an information system. They are inseparable (Lee 2000)"

"The problem of 'technology vs. behavior' is a dilemma in the following way: If we take a technology approach to IS, then how would we be different from engineering and computer science? But if we take a behavioral approach to IS, then how would we be doing research that any behavioral field could not already do?"

"Just as a physician cannot design a remedy for a patient’s body and emotions separately, and just as an architect cannot design “form and function” independently, the IS field similarly does not have the option of designing the technology subsystem alone or the behavioral subsystem alone – we have to design both together."

My response to the first paragraph is this: Deciding how business information is organized has nothing to do with engineering and computer science and everything to do with management priorities and objectives. The role of technology professionals should be limited to implementation and cost-benefit advisement. This should be an easy distinction to make, but the conventional wisdom is clouded by historical and cultural biases. For example, technology workers in the 1950s persuaded business leaders to think of information management as an engineering discipline instead of a management prerogative. This is discussed further in The Legacy of the Systems Men. There are plenty of purely 'behavior' related issues that the academic IS community could focus on which have nothing to do with technology, such as the problem of poorly organized information, which I describe further in Sophotaxis.

My response to the second paragraph is that it is false. Decisions about how information is organized can and should be made before any automated system is created. The design of the automated system might raise cost-benefit issues that will impact the information decisions, but those cost-benefit trade-offs cannot be well understood until decision makers know what the trade-offs will entail, and that is possible only when the information is defined first (see Cost-Benefit Value is Ignored).

When business-oriented priorities are subordinated to technology-oriented factors without a deliberate cost-benefit analysis, operational and analytical capabilities suffer. Organizing information is an act of business administration, which is a role IT departments are not intended for. The role of IT should be limited to implementation and cost-benefit advisement.

The following statement is evidence of how deeply the issue of information management is misunderstood. It has been repeated in various forms more than ten thousand times by authors at universities, technology companies and government organizations:

"With the proliferation of information technology starting in the 1970s, the job of information management had taken a new light, and also began to include the field of data maintenance. No longer was information management a simple job that could be performed by almost anyone. An understanding of the technology involved, and the theory behind it became necessary. As information storage shifted to electronic means, this became more and more difficult."

This statement is false; information management has never been a simple job that could be performed by almost anyone – at least not since the birth of modern accounting in the late 13th century. The techniques used in manual accounting systems rely on a set of cross-referenced and interconnected books which use structures and rules that are precisely consistent with the theory behind modern relational database systems, which I explain further in The Ancient Secrets of Information Management.

Accounting is the discipline of managing information about money. The same logic-based techniques can be used to manage other kinds of information as well, but manual accounting is so painstaking and time consuming that it is easy to understand why early merchants only made the effort with the kind of information they considered to be most important. When relational database software was introduced in the 1970s, business leaders and scholars mistakenly assumed that it had created an entirely new computer-based method to organize information. But in reality it created a new computer-based way to automate old logic-based techniques that had been used with success for 700 years. If this had been understood, the shift to electronic automation would have made things far easier rather than more difficult.
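One way to picture the correspondence is a very small set of books written as relational tables. This is only a sketch under stated assumptions: the table names (accounts, journal, postings), the signed-amount convention, and the sample entry are all hypothetical, and real bookkeeping involves far more than this. It uses Python's built-in sqlite3 module. The cross-references between books become foreign keys, and the rule that every entry must balance becomes a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# The chart of accounts, the journal, and the postings that link the two.
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE journal ("
    " entry_id INTEGER PRIMARY KEY, entry_date TEXT NOT NULL, memo TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE postings ("
    " entry_id INTEGER NOT NULL REFERENCES journal (entry_id),"
    " account_id INTEGER NOT NULL REFERENCES accounts (account_id),"
    " amount_cents INTEGER NOT NULL)"  # assumed convention: debit > 0, credit < 0
)

conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, "Cash"), (2, "Sales")])
conn.execute("INSERT INTO journal VALUES (1, '2015-01-05', 'Cash sale')")
conn.executemany(
    "INSERT INTO postings VALUES (?, ?, ?)",
    [(1, 1, 5000), (1, 2, -5000)],  # debit Cash, credit Sales
)

# Double-entry rule: the postings of every journal entry must sum to zero.
unbalanced = conn.execute(
    "SELECT entry_id FROM postings GROUP BY entry_id HAVING SUM(amount_cents) <> 0"
).fetchall()

# Cross-referencing rule: a posting cannot name an account that is not in the books.
try:
    conn.execute("INSERT INTO postings VALUES (1, 99, 100)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

print(unbalanced, orphan_accepted)
```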

Early merchants certainly did not look to the craftsmen who made their tools to also define their accounts. But that is exactly what modern organizations do. It makes no difference that the old tools were made from paper, feathers and dye, and the new ones from computers, software, and networks. The old tools served exactly the same purpose as the new with respect to the organization of information. The new tools serve an additional purpose of automating processes and workflows, but that is no reason to believe that the engineers who create the tools should also be responsible for organizing information. And there are important reasons to understand why they should not.

For most of modern history formal logic was a core requirement for every educated person. In fact, teaching logic was one of the main reasons universities were created in the first place. Logic was de-emphasized as a required subject with the education reforms of the early twentieth century. Computers were developed in the latter part of that century. But computers do not decrease the value of logic education; they increase it.

Logic could, and should, be used as a common language between business experts and IT professionals to communicate requirements for information systems. That is not possible without logic education, because knowledge of formal logic does not come naturally or even from experience; it must be taught. It has become possible to earn a degree in almost any subject, including an MBA or a PhD in computer science, without taking a single introductory course in formal logic.

Without training in formal logic, business and IT professionals are unable to communicate with the absolute precision that is only possible with logic.