MDX Guide for SQL Folks: Part II - Hierarchies and Functions

Introduction

This is a continuation of the series to introduce some core MDX and cube concepts by first drawing on the similarity and differences between MDX and SQL.

In Part I we looked at how tables translate directly into to cube dimensions. We also looked at how MDX is different or similar to SQL, by looking at both in the context of the basic SQL SELECT statement. For example we saw why all three queries in listing1 below yield the same result. All three queries display the Sales Territory Group(s) with more than 10 million in bike sales.

Please refer to Part I for the prerequisites needed to execute all SQL and MDX queries in this series.

By comparing the SQL and MDX queries in Listing 1, let's recap the similarities and differences that we established in Part I. We learned that:

In SQL the SELECT list is derived from tables and the FROM clause is also directly on tables. On the other hand in MDX the SELECT list derived from dimensions but unlike SQL the FROM clause is always on the cube not on dimensions.

By default SQL display results on COLUMNS and ROWS. MDX on the other hand can display results on axes from 0, 1, and 2 …up to 128. As a result MDX requires that you always explicitly specify a display axis for all elements in the SELECT list.

In SQL if you select from two or more tables you have to explicitly join the tables with a “JOIN” word. On the other hand in MDX if you require attributes from two or more dimensions you don’t need to explicitly join them in the statement with the JOIN word, the cube does all the joining for you.

MDX by default always implicitly aggregates the attributes in the Measures dimension and implicitly group the result by any other dimensional attribute in the SELECT list. The aggregation function applied is a default defined at design time. On the other hand SQL will return detail records unless you explicitly tell it to aggregate attributes by including the aggregate function and the Group By clause in the statement as shown in Listing1.

A “WHERE” clause acts as a filter in SQL but as a Slicer dimension in MDX. This means that, in MDX, the predicate can either be used with the WHERE clause as in MDX1 or in the SELECT list without the WHERE clause as in MDX2 as shown in Listing1 above.

Refer to Part 1 for more on the similarities and differences between MDX and SQL and also how to navigate the cube structure with MDX queries. In the next section we are going to see how to use brackets to format the MDX SELECT list and also look at a few nomenclature.

MDX Brackets ( { }, ( ), [ ] ) and some Cube Concepts

Because MDX does not require joins when you are querying more than one dimension there is often the edge to throw anything into the MDX SELECT list and on any axis and expect MDX to be able to handle it. I will advise you to think of MDX result sets as if you are generating reports and thus whatever you are trying to display on every axis and how you combine them should make logical sense. This means that you may have to move attributes around and on different axes till you get the result that you are looking for, and that’s OK.

MDX uses brackets to ensure that you generate the right report. In the rest of this section I will illustrate the concept of dimensionality and cube slicing by the correct use of brackets and commas in MDX.

The MDX query in listing 2 below displays Internet Bike sales for the "Europe" Sales Territory Group on the COLUMNS Axis.

Now, let’s say in addition to “Europe” we want to retrieve the same bike sales for “North America” and the “Pacific” Sales Territory Groups. There will be the propensity to add the rest of the groups namely “North America” and “Pacific” (shown in red) to the earlier select list as shown in listing 3 below.

Error: Query (2, 3) The 'SalesTerritoryGroup' hierarchy appears more than once in the tuple.

In reality, what the logic in Listing3 above tried to do is put all the other members onto one column as shown on figure2 below. The problem is, even if MDX returned the result set on one column as below, it wouldn’t make much sense to anyone looking at the report, more so if a lot more members are added onto that one column.

To display the result correctly, the logical thing to do is to separate “North America” and the “Pacific” Territory groups onto separate columns with the same "dimensionality" as the “Europe” territory group. To do this you separate the three Groups with parenthesis ( ) separated with commas as below.

As demonstrated in listing 4 above, using logical selection, dimensionality and the correct use of brackets enables us to appropriately display a complete set of result on an MDX Axis.

Before we move on, let me mention the fact that, for the most part you will not be generating convoluted tuples (explained below) in the MDX you write. The times that you will have to do that is when you personally want to display very specific items on different Axes. For instance the query MDX1 below is simpler but pretty much yields the same values as the MDX in listing 4 above, albeit the Sales Territory Groups are displayed on the ROWS axis and “Bikes” is not displayed as part of the results in the MDX query below.

MDX Nomenclature

I am going to use the logic in listing4 to explain a few MDX terminologies as illustrated on figure 4 below.

Figure 4: Showing MDX Tuples and Set

Tuple

As illustrated in the MDX on figure 4 above, a tuple is a way of referencing the value of any cell in the cube. It is represented by logical comma separated list of unique names of members from different hierarchies in the cube, surrounded by round brackets ( ).

If a tuple is composed of only one member from a single dimension hierarchy the ( ) are optional. As shown in listing5 below, both syntaxes in MDX1 and MDX2 are correct.

As illustrated on figure above, a set constitutes collection of zero, one or more tuples with the same dimensionality and explicitly enclosed in the set { } symbol.

Note that { } brackets are optional if you have only one tuple in your SELECT list, for instance from figure 4 if we want to select only one of the tuple then both MD1 and MD2 in listing6 below are correct.

Error: Query (2, 3) The 'Measures' hierarchy appears more than once in the tuple.

Notice that in the tuple in MDX2, the two members are from the same (Measures) dimension. Because they are of the same dimensionality, they must be separated into two tuples. When separated into two tuples you must explicitly enclosed them in the set curly brackets as in MDX3 below.

Cube Slicing and Dimensionality

To illustrate the concept of slicing and "dimensionality" of a cube, let's look at the listings and figures below.

In listing8 below, the two tuples on ROWS in MDX1 has the same number and ordering of the dimension members. On the other hand the number of dimension members in the two tuples on the ROWS axis in MDX2 does not conform. The second tuple has only two dimension members as oppose to three in the first tuple so it fails when you execute it.

Error: Query (2, 2) Two sets specified in the function have different dimensionality.

Note that in MDX2 the ordering of the members in the two tuples does not conform so it fails execution with the error shown above.

The dimensionality of MDX1 and MDX2 are illustrated in figure 6 below. MDX2 does not represent a properly form MDX syntax.

Figure 6: Showing dimensionality on COLUMNS axis

From the examples and illustrations above it is clear that to generate the right MDX logic, you always have to keep a consistent order and number of tuple members and also apply brackets correctly in other to achieve the right dimensionality on any axis you choose to display on.

As a note, the square bracket characters are used if the name of a member has a space or a number in it, the right bracket (]) can be used as an escape character in MDX if the member name or member key contains a right bracket. It mostly has the same use as in SQL.

The concepts in the last two sections may seem a little confusing initially but with little practice you should get a good handle on it.

Most of the syntax differences between MDX and SQL will also be due to hierarchies, the fact that cube dimensions have them but relational tables don’t have them. In the next section I am going to spend some time explaining how hierarchies determine the MDX you write.

Navigating Hierarchies

In Part I we learned how the cube turns each regular dimensional attributes into a two-Level hierarchy called Attribute Hierarchies. All Attribute Hierarchies have a top Level called "All" and an Attribute level and are created by default by the cube for every dimensional attribute as shown in figure 7 below.

User-Defined Hierarchies

Besides attributes hierarchies generated by the cube, cube developers are allowed to create their own hierarchies called user-definedhierarchies on any dimension and as many as they want. The best practice is to create user-defined hierarchies based on the natural relationships in the underlying Data. Some obvious candidate dimensions with natural such relationships are for e.g. Geography (Country, State, City and Zip) or Time (Year, Quarter, Month, week and day). At design time a cube developer will design hierarchy levels base on these naturally occurring hierarchies in the data. For instance in the cube a user defined hierarchy called “Sales Territory” has been defined for the Sales Territory dimension as shown in Figure8 below. Note that the “Sales Territory” user-defined hierarchy has been defined on natural relationships within the underlying data of the dimension i.e. Sales Territory Group, Sales Territory Country and Sales Territory Region renamed to Group, Country and Region respectively in the cube.

Remember, cube developer are allowed to design as many user defined hierarchies as they want on any dimension, so when you are writing MDX you have to know the hierarchies available, know their names and the levels in their definition before you can navigate them or include them in your MDX queries.

Referencing Hierarchies in MDX

Once you’ve identified and checked out the definitions of user-defined hierarchies you can referenced them in your MDX just as we have been referencing attribute hierarchies all this while. We have been referencing attribute hierarchies by the name of the dimension they belong to, the attribute hierarchy name and then the attribute level by name as below.

In the same way, you navigate a user-defined hierarchy by referencing the dimension it belongs to, the user-defined hierarchy Name and then the levels under the hierarchy by Name as below.

[<Dimension Name>].[<User-Defined Hierarchy Name>].[<Level Name>]

For instance MDXI and MDX2 in listing 10 below show how to retrieve the Sales Territory Group attribute level (on rows) through the Sales Territory GroupAttribute Hierarchyand the Sales TerritoryUser-Defined Hierarchyrespectively.

Note that we retrieved the same information from an attribute hierarchy and user-defined hierarchy. As a matter of fact any attribute that you can retrieve from a user-defined hierarchy would be available as an attribute hierarchy. So now the questions is, if you can obtain the same results using either hierarchy types then what is the need for going the extra step to create User-Defined Hierarchies?

The answer is in the way each type is designed and how you use each in your MDX queries. Let’s look at how each hierarchy type is designed in the next section.

Hierarchy Architecture

First let’s look at the raw data in the SalesTerritory table and how the data from this table is structured into attribute and user-defined hierarchies for the Sales Territory dimension in the cube. Table 1 below shows the detailed data in the SalesTerritory table.

Table 1: Showing the raw data SalesTerritorry table.

Now let’s look at how the cube models this data as attribute hierarchies. Figure11 below shows a representation of the 3 attribute hierarchies generated by the cube for each of the 3 dimension attributes, each of which represent a column in the underlying table.

Attribute hierarchies

Figure 1: Showing a dimensional attribute hierarchy architecture.

Notice that the overall attribute hierarchy architecture is not very different from the table architecture, the attribute level in each of the three attribute hierarchies contains the detail record from a table column, the only difference is the “All” level member (renamed “All Sales Territory” ) is added to the attribute hierarchies but not present in the table data.This can also be shown from the SQL and MDX queries and results shown below.

From the two results above we can see that the only difference is the "All Sales Territory" present in the MDX resultset but absent in the SQL resultset.

Now let’s look at how the cube developer modeled Sales Territory user-defined hierarchy. Figure 12 below shows a representation of the Sales Territory user-defined hierarchy in the cube.

Notice that there is only one user-defined hierarchy as oppose to three attribute hierarchies but this one user-defined hierarchy captures all the detail data in the dimension for that matter all the records in the underlying table in this one hierarchy. In this architecture the levels are designed to much the natural relationships in the underlying data.

User-defined hierarchies

Figure 12: Shows a User-Defined Sales Territory Hierarchy

As you can see from the MDX query below, the design is such that by referencing this one hierarchy you can have access to all or subsets of the detail level data in the dimension.

There are great performance benefits for designing and using user-defined hierarchies in your cube, a complete discussion of these benefits are out of scope in this series. By the nature of their internal architecture user-defined hierarchies results in faster MDX queries when referenced in MDX, so bottom line is, if they are available in your cube identify them and used them in your MDX queries.

Introduction to Hierarchical Functions

In part I I mentioned that creating hierarchical structures allows the cube to navigate to the different levels of the dimensions quickly, and by the use of functions like members, children, siblings, cousin, ancestors, descendants it know where to go and what to retrieve. From the previous sections it becomes quite obvious that these functions are more useful for user-defined hierarchies than attribute hierarchies because the later has only two levels.

The Descendants Function

The descendant hierarchical function is one of the very useful functions you will use in MDX. The function takes three parameters, one version is shown below.

Descendants (member, Level, Desc_Flag)

Remember that dimension have two types of hierarchies therefore hierarchical functions are applicable to both types.

Let’s see the use of the descendant hierarchical function with the user-defined hierarchy from Sales Territory dimension , as shown in MDX1 below.

Note that if you have several levels in a hierarchy, by using the upper member, the lower levels and the flag you can manipulate the syntax to display sets from different levels of user-defined hierarchies on the other hand some of these function are useless on attribute hierarchies since attribute hierarchy types have only two levels as shown in MDX3 and MDX4.

There are many argument and flag options for the Descendants function. For a complete list of MDX functions and how they are used check out the MDX function reference at (http://msdn.microsoft.com/en-us/library/ms146075.aspx)

Other Functions

Cross join function

The Cross join function automatically returns the cross-product of two sets and thus allows you to display the result on an axis.

The Syntax to display a cross join sets on an axis is as below;

Select Cross join ( { set1}, { set2 } ) on Axis(n)..
from Cube

another variant of this syntax is as below.

Select { set1}*{ set2 } on Axis(n)..
from Cube

Let’s say we want display the cross join sets of all Years from the Date Calendar hierarchy and a set of all Sales Territory Groups on the ROWS axis. MD1, MDX2 and the resultset in listing11 below shows both cross join syntax variants we saw above.

Note that unlike SQL “Cross Join” without a where clause which returns a full Cartesian product, MDX cross join limits returns to the set of tuples that actually exist in the cube. For example, examine the results from the execution of the query below.

Note that even though the second set has more than two members (all Sales Territory Group members) the cross join returned only two members because first set has two tuples in the set.

Another thing to note is that, I derived the first Sales Territory set from attribute hierarchies and the second set from user-defined hierarchy in the Sales Territory dimension. This is because MDX expects the sets you are cross joining to always come from different hierarchies.

Non Empty function

The Non Empty function is used to filter out nulls from a specified set. in the example below MDX1 displays nulls in the resultset. Applying the Non Empty function to the same query eliminates the nulls as shown in MDX2 below.

Summary

In this Part we've looked at some indispensable cube terminologies and definitions. We also looked at the use of brackets and concept of Dimensionality and Cube slicing. We finally looked at the differences between attribute hierarchies and user-defined hierarchies, and the importance of identifying user-defined hierarchies in your cubes and using them in your MDX queries.

Understanding the concepts introduced here is key to moving on to write advanced MDX queries which you will find out are just extensions of these concepts.

Next Steps

We will look at some other important functions and finally write some advanced MDX queries by combining everything that we've learned in these series.

Want more great articles like this? Sign up for fresh SQL Server knowledge delivered daily.