Today's next-generation data warehouses are being built with a clear goal: to maximize the power of Customer Relationship Management. To make CRM-focused data warehousing work, you need new techniques and new methodologies. In this book, Dr. Chris Todman, one of the world's leading data warehouse consultants, delivers the first start-to-finish methodology for defining, designing, and implementing CRM-focused data warehouses. Todman covers all this, and more:

Critical design challenges unique to CRM-focused data warehousing

A new look at data warehouse conceptual models, logical models, and physical implementation

The crucial implications of time in data warehouse modeling and querying

Project management: deliverables, assumptions, risks, and team-building, including a full breakdown of work

If you want to leverage the full power of your CRM system, you need a data warehouse designed for the purpose. One book shows you exactly how to build one: Designing Data Warehouses by Dr. Chris Todman.

Logical Modeling: The Implementation of Retrospection; The Use of the Time Dimension; Logical Schema; Performance Considerations; Choosing a Solution; Frequency of Changed Data Capture; Constraints; Evaluation and Summary of the Logical Model.

Preface

The main subject of this book is data warehousing. A data warehouse is a special kind of database that, in recent years, has attracted a great deal of interest in the information technology industry. Quite a few books have been published about data warehousing generally, but very few have focused on the design of data warehouses. There are some notable exceptions, and these will be cited in this book, which concentrates, principally, on the design aspects of data warehousing.

Data warehousing is all about making information available. No one doubts the value of information, and everyone agrees that most organizations have a potential "Aladdin's Cave" of information that is locked away within their operational systems. A data warehouse can be the key that opens the door to this information.

There is strong evidence to suggest that our early foray into the field of data warehousing, what I refer to as first-generation data warehouses, has not been entirely successful. As is often the case with new ideas, especially in the information technology (IT) industry, IT practitioners were quick to spot the potential, and they tried hard to secure for their organizations the competitive advantage that the data warehouse promised. In doing so, I believe, two points were overlooked. The first point is that, at first sight, a data warehouse can appear to be quite a simple application. In reality it is anything but simple. Quite apart from the basic issue of sheer scale (data warehouse databases are amongst the largest on earth) and the performance difficulties this presents, the data structures are inherently more complex than the early pioneers of these systems realized. As a result, there was a tendency to oversimplify the design so that, although the database was simple to understand and use, many important questions could not be asked.

The second point is that data warehouses are unlike operational systems in that it is not possible to define their requirements precisely. This is at odds with conventional systems, where the specification of requirements drives the whole development lifecycle. Our approach to systems design is still largely founded on a thorough understanding of requirements: the "hard" systems approach. In data warehousing we often don't know what problems we are trying to solve. Part of the role of the data warehouse should be to help organizations understand what their problems are.

Ultimately it comes down to design and, again, there are two main points to consider. The first concerns the data warehouse itself: just how do we ensure that the data structures will enable us to ask the difficult questions? The second is that the hard systems approach has been shown to be too restrictive, and a softer technique is required. So not only do we need to improve our design of data warehouses, we also need to improve the way in which we approach the design.

It is in response to these two needs that this book has been written.

First-generation data warehouses

Historically, the first-generation data warehouses were built on certain principles that were laid down by gurus in the industry. This author recognizes two great pioneers in data warehousing: Bill Inmon and Ralph Kimball. These two chaps, in my view, have done more to advance the development of data warehousing than any others. Although many claim to have been "doing data warehousing long before it was ever called data warehousing," Inmon and Kimball can realistically claim to be the founders because they alone laid down the definitions and design principles that most practitioners are aware of today. Even if their guidelines are not followed precisely, it is still common to refer to Inmon's definition of a data warehouse and Kimball's rules on slowly changing dimensions.

Chapter 2 of this book is an introduction to data warehousing. In some respects it should be regarded as a scene-setting chapter, as it introduces data warehouses from first principles by describing the following:

Need for decision support

How data warehouses can help

Differences between operational systems and data warehouses

Dimensional models

Main components of a data warehouse

Chapter 2 lays the foundation for the evolution to the second-generation data warehouses.

Before the introduction to data warehousing, we take a look at the business issues in a kind of rough guide to customer relationship management (CRM). Data warehousing has been waiting for CRM to appear. Before CRM, data warehouses were still popular but, very often, the popularity was as much in the IT domain as anywhere else. IT management was quick to see the potential of data warehouses, but business justification was not always the main driver, and this has led to the failure of some data warehouse projects. Business executives were often reluctant to sponsor these large and expensive database development projects, and those that were sponsored by IT just didn't hit the spot. The advent of CRM changed all that. CRM cannot be practiced in business without a major source of information, which, of course, is the data warehouse's raison d'être. Interest in data warehousing has been revitalized, and this time it is the business people who are firmly in the driving seat.

Having introduced the concept of CRM and described its main components, we explore, with the benefit of hindsight, the flaws in the approach to designing first-generation data warehouses and propose a method for the next generation. We start by examining some of the design issues and pick our way carefully through the more sensitive areas in which the debate has smoldered, if not raged a little, over the past several years. One of the fundamental issues surrounds the representation of time in our design. There has been very little real support for this, which is a shame, since data warehouses are true temporal applications that have become pervasive in all kinds of businesses.

In formulating a solution, we reintroduce, from the mists of time, the old conceptual, logical, and physical approach to building data warehouses. There are good reasons why we should do this and, along the way, these reasons are aired.

We have a short chapter on the business justification. The message is clear. If you cannot justify the development of the data warehouse, then don't build it. No one will thank us for designing and developing a beautifully engineered, high-performing system if, ultimately, it cannot pay for itself within an appropriate time. Many data warehouses can justify themselves several times over, but some cannot. We do not want to add to the list of failed projects. Ultimately, no one benefits from this and we should be quite rigorous in the justification process.

Project management is a crucial part of a data warehouse development. The normal approach to project management doesn't work. There are many seasoned, top-drawer project managers who, in the beginning, are very uncomfortable with data warehouse projects. The uncertainty of the deliverables and the imprecise nature of the acceptance criteria send them howling for the safety net of the famous system specification. It is hoped that the chapter on project management will provide some guidance.

People who know me think I have a bit of a "down" on software products and, if I'm honest, I suppose I do. I get a little irritated when the same old query tools get dusted off and relaunched, whenever a new trend comes along, as though they were new products. Once upon a time a query tool was a query tool. Now it's a data mining product, a segmentation product, and a CRM product as well. OK, these vendors have to make a living but, as professional consultants, we have to protect our customers, particularly the gullible ones, from some of these vendors. Some of the products do add value and some, while being astronomically expensive, don't add much value at all. The chapter on software products sheds some light on the types of tools that are available, what they're good at, what they're not good at, and what the vendors won't tell you if you don't ask.

Who Should Read This Book

Although there is a significant amount of technical material in the book, the potential audience is quite wide:

For anyone wishing to learn the principles of data warehousing, Chapter 2, which has been adapted from undergraduate course material, explains in simple terms:

What data warehouses are

How they are used

The main components

The data warehouse "jargon"

There is also a description of some of the pitfalls and problems faced in the building of data warehouses.

For consultants, the book contains a method for ensuring that the business objectives will be met. The method is a top-down approach using proven workshop techniques. There is also a chapter devoted to assisting in the building of the business justification.

For developers of data warehouses, the book contains a massive amount of material about the design, especially in the area of the data model, the treatment of time, and the conceptual, logical, and physical layers of development. The book contains a complete methodology that provides assistance at all levels in the development. The focus is on the creation of a customer-centric model that is ideal for supporting the complex requirements of customer relationship management.

For project managers there is an entire chapter that provides guidelines on the approach together with: