The answer is that these storage areas do have data in them, and you should
have created their structures using the design information from the
requirements gathering phase.
But the data elements I'm talking about now are those that you will use in
a Data Warehouse, or the collection of Data Warehouses that comprise an
Enterprise Data Warehouse. The processes you use to design these elements are
different than those used in a transactional reporting system.

It may be helpful to describe exactly what the Data Warehouse should store.
There are two schools of thought on the purpose for the Data Warehouse. The
first is that it should be exactly what it sounds like – a single place to
house all of the data a company has.

The issue with this view is what to do with the system when you're done.
In the
last tutorial
I explained that you will need to combine data from various systems into the
Data Warehouse to be relevant. In a system that tracks clothing sales, your data
might have the value "Jacket, Men's" in one source and
"Men's Coats" in another. Often it's not a simple matter of
making the values equal, because in certain cases they aren't. The end
result of storing both descriptions is that your users would have to manually
tally each set to come up with a grand total, assuming they knew the two were
the same. That's just not possible, even though you've met the goal of
storing all the company's data in one place.

So in the first system, you have to begin the arduous task of mapping or
transforming the detailed data to come up with a single set of data that is
relevant to the users. Even then, storing so much detail makes the system very
large and slow to process queries. Not only that, this is a redundant use of
data. You already have all that detail out in the Data Marts and the Operational
Data Store. If you need a detailed report, you can always get it from there.

The second definition of a Data Warehouse is exemplified in the goal for a
Business Intelligence landscape that I've been repeating in the previous
tutorials. What we're trying to create is a set of consolidated,
aggregated, strategic data presented in an analytical format to upper
management. The Data Warehouse part of that system provides the
aggregated, strategic and analytical parts of the
definition. The sources, ODS and Data Marts are how we get the data staged in a
meaningful way.

So that is how we'll proceed with the data element design process for
the Data Warehouse. The outline for the process is to find out what aggregated
strategic data we need, and the best way to format it. From our definition, we
also have to think like upper-level management. What do they really want to see?
Do they need to know how many paperclips someone in a regional office bought, or
whether that office is adhering to the business supplies budget? Do they care
about what supplier we used for parts, or that we are getting a 10 per cent
reduction in costs for the same quality? And how are we measuring quality?

What this means is that we are really after only two types of data: Things
that we want to measure, and the measurements. I'll explain the star
schema that uses these concepts in the terms dimensions (for the
things we want to measure) and facts (for the numbers showing the
measurements) in the next tutorial, but for now, we need to concentrate on
getting this data from our Data Marts.

So to begin our design we first ask the managers the question, what do we
want to measure? I've given you a couple of examples already, and you can
probably think of more for your own organization. For a pharmaceutical firm you
might find dimensions such as chemicals, trials, marketing, efficacy, lifespan,
sales and so forth. For a marketing company dimensions might include region,
time, advertising method and the like. In all cases you care about time, since
strategic analysis is rarely useful without it.

For the data involving the facts, examples include the number of items sold,
moved, changed or produced. Other numbers might be customers contacted, amount
donated, total fees charged and so forth. Any numerical aggregate that applies
to the dimensions I mentioned a moment ago are applicable to the facts data.

Now that you know the type of data you're after, how do you find out
what it is at your organization? You can use the same process as you did when
you perform any requirements analysis, with a slight twist. In most requirements
gathering exercise, the form is quite rigid. You ask the users what kind of
reports they want, how they want them formatted and so on. You then decompose
those requirements into tables and views. This is the process you followed for
the ODS and perhaps even the Data Mart.

In the Data Warehouse, however, you're going to provide a tool to upper
management that is more freeform in nature. You may still produce reports, but
the main power of this type of system is in the ability to ask questions and
flip them around, on the fly. Interview your upper-level management and ask them
what they want to know. What are the reports that they ask their staff
to produce every month? Once they get those reports, what do they do with them?
You'll find that most of the time they are culling through the reports
trying to ferret out one or two answers, so that they can base their strategy on
sound numbers.

Rather than manually collating this data, wouldn't it be better for
managers to ask the question they are really after, and be able to look at the
number right away? That's what you need to design as the data element.

Once you have those dimensions and facts, you'll need to determine where
they are buried in all the data sources you have. If you designed your ODS and
Data Marts properly, you should be able to get the measurements from there,
after summing and grouping the data.

But what if the elements aren't easily discerned from the data you
already have? What if the manager is asking for something that you don't
store directly? In the
last tutorial,
I mentioned that when you begin to combine and transform data, you are assigning
meaning. That meaning should be defined by the business, not the technical
staff. This is the crux of the design effort. You'll need to involve
managers from the level just below the top and then work your way down to the
line managers. Along the way, each will give you a more detailed description of
the measures the top managers are looking for. From there, you can build your
model.

While this sounds simple, there are a few problems you'll encounter
along the way. The first is that aggregation and combination problem we
discussed earlier. In reality, the primary issue there is one of definition, and
you can overcome that by asking the managers to set a business owner for the
element you're designing. Only that manager can state what the element
means, which will give you the description of how to aggregate it.

The second problem comes when top management isn't aware that regions,
locations or plants are doing things differently than other places. As the
"data detective" you'll uncover all manner of unseemliness, and
the only thing you can do in that case is have your Business Analyst diagram the
business processes used at each location to vet the information from a business
perspective. Let the managers sort that out; that's what they get paid for.
Your job is to report the data, set up the elements in a cohesive fashion.
Present the business process analysis to the managers, and explain the
difficulty in combining disparate processes into a single meaning. Ask them how
to proceed, get a Business Owner to sign off, and design the element with that
definition.

In the next tutorial I'll show you a little more about dimensions and
facts, and how you can design your Data Warehouse.