Ready--Get Sets--Go!

Microsoft's entry into the OLAP market has significantly lowered the cost of implementing analysis solutions. OLAP products are less expensive and more widely available. And because the products are more accessible, companies are deploying analysis solutions instead of or alongside traditional reporting solutions.

Many experts in the OLAP marketplace believe that analysis solutions will continue to displace reporting solutions—not that reporting solutions will ever go away, but many customers need more information than the traditional reports provide. I think of a report as a still photograph that captures a snapshot of your business in time. OLAP analysis is more like a movie. But it's a special movie—one that lets you guide the camera. For some business decision makers, a couple of good still photographs from different angles are enough to help them understand a particular business problem. But for a decision maker who is responsible for solving the problem or communicating the details of the problem to others, there's no substitute for being able to explore the problem with guided moving pictures.

OLAP is most useful for online, interactive exploration of data. Depending on the application, you can use OLAP preanalysis or postanalysis exploration. You use preanalysis to discover particular trends, relationships, and exceptions in your data. Postanalysis is for exploring a particular trend, relationship, or exception after its discovery. Named sets are useful for creating OLAP reports. Named sets help you to identify a natural grouping of business entities (dimension members) with a name; you can use the name repeatedly to create many reports or as a shortcut during interactive analysis.

Discover OLAP Reports

If OLAP is about interactive exploration and reporting is a snapshot of data, then what is an OLAP report? Although it sounds like an oxymoron, an OLAP report is a view of data that you use as a starting point for analysis—typically, postanalysis. Imagine that a business analyst discovers that profit margins for a particular product line have disintegrated during the past three quarters. The analyst might communicate this problem by distributing an OLAP report that shows a trend line (line chart) of revenue and cost of goods for that product line during the past three quarters. Decision makers who receive this report can then use it to further analyze the problem. They can drill down into the cost of goods to see the product's constituent parts; or, they can compare volume and product pricing against their competitors' volume and pricing to learn more about the problem and identify potential solutions.

The key to making good OLAP reports is creating report entities and values that match the business terminology. For example, in a particular report, the entities might need to be sales regions and the values might need to be year-to-year percent changes in sales. You'll probably need to use MDX formulas to derive some of these entities and values from the entities in the OLAP cube. That task doesn't sound too hard; using formulas to derive values is commonplace in reporting. But a good OLAP report has an additional requirement: The entities and values need to be browsable, meaning that when you drill down, drill up, pivot, or otherwise navigate through the data, the report entities still calculate meaningful values.

For a simple example of browsability, look at the following formula for year-to-year change in sales:

\[Time\].\[99Q1 Year to Year\] = ( 99Q1 -- 98Q1 ) / 98Q1

This formula for a new time dimension member gives you the percent change for first-quarter (Q1) sales from 1998 to 1999. You could create equivalent formulas for each of the other three quarters and use all four formulas to create a static OLAP report. Now, compare that technique to the following formula:

This formula defines a measure as year-to-year percent change in sales for whatever time dimension member you've selected. If you use this formula to create an OLAP report that contains all four quarters of 1999, decision makers can drill down into any of the quarters and see the year-to-year percent change for the months in that quarter. Thus, because the second formula works at any level in the time dimension, it's a browsable formula. The first formula isn't browsable because when you drill down in the time dimension, the formula no longer applies. You would have to create a separate formula for each member in the time dimension. (I explored the concept of browsability in detail in "The Zen of MDX," April 2000.)

Defining Sets

Calculated members are crucial in modeling business entities and values, but another, lesser-known feature—named sets—rivals the power of calculated members. Named sets are available with SQL Server 7.0 OLAP Services, but they become more powerful in SQL Server 2000 Analysis Services. In OLAP Services, you can define a set in one of two ways, both of which control the set's lifetime. One technique defines the set for the life of an MDX query; the second defines it for the life of a client session. Analysis Services adds a third technique that defines the set on the server so that the set is available to all clients. Sets that you define this way persist beyond sessions until you delete them.

You need to know a couple of things about sets. First, the scope of a set is a cube rather than a dimension, so a set can include members from more than one dimension. Second, technically speaking, a set consists of tuples. A tuple can be one dimension member or several dimension members, with no more than one member from each dimension. In most situations, a set consists of several members from one dimension; each member represents a tuple.

Here's an example of a set that exists for the lifetime of an MDX query:

When you use the WITH keyword to define a set in a query, as you do with this technique, you don't need to specify which cube the set belongs to because the FROM clause contains that information. However, the second technique of using the CREATE SET command to create a set requires a cube name:

You can execute this statement from a client session with OLAP Services, and the definition will persist until you terminate the session (exit your application) or until you use the DROP SET command. More advanced OLAP applications will remember these set definitions for you and recreate them each time you start the application. If you're going to use a set multiple times in a session, using CREATE SET is better than using the WITH clause because CREATE SET causes OLAP Services to cache the results of the set definition. In contrast, SQL Server must parse and evaluate the WITH clause each time the application passes the clause to OLAP Services.

The third technique you can use to create a set definition, available only in SQL Server 2000 Analysis Services, uses the Analysis Manager. From the cube editor (right-click a cube, and select Edit), you can drop down the Insert menu and select Named Set to bring up the Named Set Builder, which Figure 1 shows.

Having named sets on the server gives you power. You can create a collection of named sets that represent your business entities. You can then base OLAP reports for distribution on those definitions. Later, you can change the sets' definitions without rebuilding the cube. For example, if you have a named set called Last 4 Quarters, you can roll the definition forward each time you load a new quarter's data in the cube.

If you're developing a client application and want to take advantage of server-defined sets (named sets that you can define on the server through Analysis Manager), you need to know about the set schema rowset. ActiveX Data Object, Multi-Dimensional (ADO MD) exposes all the OLAP meta data through schema rowsets. For example, each list of cubes, dimensions, levels, and other information has a schema rowset. Each server-defined set has a schema rowset that includes columns for the set name, set description, expression defining the list of tuples, and the dimensions that the set references. The dimensions column, which Microsoft included in later beta releases of SQL Server 2000, is important for front-end applications that use sets. Front-end applications that formulate MDX queries must not include members from the same dimension on different axes (rows or columns) of the query. The dimensions information tells the front end which dimensions the set uses so that the application won't use the same dimensions again on another query axis.

In a few scenarios, sets consist of a simple list of members (as in the Best Beer example I used earlier). The most common of these scenarios is alternative member groupings. For example, if you have a customer dimension that rolls up geographically into cities, states, and countries, you might want to group your customers in an alternative way, such as into sales regions. If sales regions are a collection of states, you could rebuild the cube with a new level (regions) between the country and state levels. If sales regions didn't land evenly on state boundaries, you could create a second hierarchy for the customer dimension that rolls up as city, region, and country. Creating named sets for the sales regions is easier than either of these two solutions. However, using sets isn't as flexible as restructuring a dimension because writing queries that span sets is difficult. For example, writing a query that returns the top three regions is much easier if the regions are dimension members.

Sets with Expressions

The power of sets becomes apparent when you use MDX expressions to define sets. Following are a few examples of the kinds of information you can find by using sets in MDX expressions.

Most recent 3 months' sales with data in the cube. The first of these two sets, NonEmpty Months, returns all the month dimension members that have data. The second set, Recent 3 Months, returns the most recent 3 months from the first set.

Three least profitable cities. This set, Bottom 3 Cities, is the inverse of the three most profitable cities. Here, you use BottomCount instead of TopCount. This formula also filters out the cities that have a NULL profit before it calculates the BottomCount.

1998 variance report, with fourth quarter broken down by month. This formula demonstrates the use of tuples in a set. The set Recent Periods includes tuples with Time and Category together. I used the Union and Crossjoin functions to display actual and budget values for only the most recent 3 months. The previous 9 months display actual values only.

Get Set—Go

You need to know one last thing about sets: If you nest definitions of sets and calculated members, you must drop (delete) them in the order opposite of the order in which you created them. For example, if you create set A based on calculated member B, you must drop A before you drop B. If you try to drop a set or member that has dependencies on it, OLAP Services sends you an error message.

Two key elements make good OLAP reports. First, use the right business terminology. Second, build the report entities in such a way that people can use them not just while viewing the report but also while exploring the data surrounding the report. Calculated members and sets are essential for achieving these objectives; they're the building blocks that raise the business terminology in OLAP reports to a level that business analysts understand. And they ease maintenance and enable reuse of OLAP reports. (For answers to the September MDX Puzzle, see "September MDX Puzzle Solution Revealed," page 71. For the October puzzle, see "October MDX Puzzle," page 71.)

From the Blogs

The quest for the Golden Record to achieve a single, accurate and complete version of a customer record is worth the pursuit to attain survivorship. Record matching and consolidation are only the beginning. Melissa Data takes a new approach. Learn how to apply intelligent rules based on reference data to make smarter and better decisions for data cleansing....More

On SQL Servers where Availability Groups (or Mirroring) isn’t in play, I typically recommend keeping a combination of on-box backups along with copying said backups off-box as well. Obviously, keeping databases AND backups on the SAME server is the metaphorical equivalent of putting all of your eggs in one basket – and therefore something you should avoid like the plague....More

One of the biggest strengths of AlwaysOn Availability Groups is that they allow DBAs to address both high availability and disaster recovery concerns from a single set of tooling or interfaces. But, this doesn’t mean that you won’t still need backups....More