MDX Scope VS Pre Aggregated measures.

So I looked into the use of Scope in MDX and compared it to using pre aggregated measures in the cube.

I am a big fan of handling any type of row level business rules in SQL and any type of set level business rules in MDX calculated measures. Period to Date sales should definitely be coded up in MDX calculated measures whereas calculating discount for products based on what category they belong to should be done at row level in SQL before loading the data into the cube.

Scope seems to be quite good at handling the row level type calculations though, there are pros and cons to both approaches though.

The example I will use is not a good “real life” example, but it’s as good as any for a technical example.

On the “Internet order Quantity” measure in the AdventureWorks 2008 R2 cube database, we will “discount” the quantity based on which category the products belong to.

The Scope approach.

1.)On the Internet sales measure group, duplicate the “Internet order Quantity” measure and call it “Internet Order Quantity2”

I have included both variables (Original and discounted using scope) in the query to confirm that the results are as desired. The rows at the top have no difference, but that is because they belong to the Accessories department which wasn’t discounted, the rows further down show the clear difference.

What the scope code is doing: Quite simple really, inside each scope statement for each of the Poroducts derived by getting the Grand children of category (FirstChild.FirstChild to lastChild.LastChild) it is replacing the value of “Internet Order Quantity” with whatever code is specified in the piece of code followed by “This =”

The pre-aggregated approach.

1.)Create a view in the sql database, based on the FactInternetSales table using the same logic used in the cube, the code should look like this:

As you can see the pre aggregated approach returns exactly the same result as the scope version.

I tested the performance in MDX of both versions and i found that the pre aggregated version seems to perform a bit better. I did the tests with this code (I tried to make the query as granular as possible for a good test):

This query took 58 seconds on my pc. Now if i run the clear cache query again and change the “Internet Order Quantity2” measure to “Internet order Quantity3” the query completes in 47 seconds, not much of a saving, but it does make a differences.

Both approaches I believe are good, but I personally prefer the pre-aggregated in sql approach for a few reasons:

1.)Performance

2.)Deployment into production (a view in sql is deployed a lot easier than a cube), in both cases the cube needs to be processed though.

3.)The sql approach will not be affected by a design change to hierarchies on the dimensions.

The scope approach however can change logic at any level in the hierarchies ie different logic at category level to product level, this will be a lot more difficult to achieve in sql though (probably need to do a bit in both parts cube and sql).

SSAS is blessed with a variety of built in time calculations, as well as many methods of creating your own calculations, for when your time dimension is designed in such a way that the built in calculations don’t serve the exact purpose that you require. I find that these methods are flawed in a couple of ways though, as they rely on the “.CurrentMember” function. The problems that I experience with these are listed below:

1.) The “.CurrentMember” function does not behave as expected when multiple members are selected.

2.) The “.CurrentMember” function is a function of a dimension hierarchy which when referenced in a complex time dimension design, with a set of complex time calculations, can cause a maintenance nightmare if for whatever reason the hierarchy design changes.

Well I was forced to come up with a better solution a couple of years ago as my users were wanting to report on Last Year figures and Year on year growth figures while multiple time members were selected and on top of this, they added an extra hierarchy to the time dimension and I already had hundreds of time calculations coded into the cube which was done in many thousands of lines of MDX code as the solution was for a large corporate comprising of 5 companies, each sharing the same cube, each with 5 time hierarchies (same time hierarchies, but with a different make up of members).

Let me illustrate the problem below:

Using the Adventureworks Cube db that comes with a 2008 R2 installation – run the following code from an MDX query session (It returns the Reseller sales amount for the Calendar year of 2007 as well as the previous year’s sales in the LY measure:

By going down to the lowest level of granularity on the time dimension, this multi select solution works perfectly. The date attribute being referred to in this query is the primary key on the time dimension in the adventure works cube, but it is a hidden attribute, which is why you may not see it in management studio.

The only problem that we are left with is that we need this LY measure to be dynamic enough to reference the exact LY equivalent of what is being selected in the time filter of the query, also we will need this to be coded as a measure in the cube itself and not necessarily reference it as a “WITH MEMBER” at the start of an MDX query.

That’s when I had a “light bulb” moment, essentially we are trying to run a query to aggregate the sales value against a set spanning from the MIN member of the previous year’s equivalent of the selected range to the MAX member previous years equivalent of the selected range, now the current year’s sales works perfectly because the min and max member are being returned from the hierarchy structures available on the time dimension design, so we need to replicate this for our LY calculation and all other time calculations for that matter. So I decided to create another measure group referencing only 1 dimension being the time dimension, it also has a 1 to 1 relationship with the time dimension. It holds a “DateKey” plus an equivalent min and max “DateKey” for each attribute involved in all hierarchies on the time dimension, so that calculations can be coded up using the STRTOSET function in MDX referencing these measures on this new measure group.

I know that STRTOSET is often referred to as a “dirty word” in MDX as it often means a performance lag in queries, but in this case it seems quite the opposite. I have found that this method outperforms the built in time calculations available to Analysis Services. AdventureWorks is hardly the place to do the performance tests because of the fact that it is a tiny data set compared to what one would face at large corporate sites. I will post results of the performance tests in a later post.

The fact table feeding the measure group should look something like this:

DateKey

MinDayLY

MaxDayLY

MinMTD

MaxMTD

20060101

20050101

20050101

20060101

20060101

20060102

20050102

20050102

20060101

20060102

20060103

20050103

20050103

20060101

20060103

20060104

20050104

20050104

20060101

20060104

20060105

20050105

20050105

20060101

20060105

20060106

20050106

20050106

20060101

20060106

20060107

20050107

20050107

20060101

20060107

20060108

20050108

20050108

20060101

20060108

20060109

20050109

20050109

20060101

20060109

20060110

20050110

20050110

20060101

20060110

Imagine the measure group having a MIN aggregation set for each column prefixed with Min and a MAX aggregation for each column prefixed with a MAX.

Now if a user selected an attribute that spanned across the date range ‘2006-01-01’ to ‘2006-01-10’ in a query and we had a measure reporting on the LY Reseller Sales Amount for that period we would using this method need to have a piece of code looking something like this:

All columns are pretty self explanatory except for the last column “OverallMin” this will hold the lowest DateKey in the date Dimension against each record. We need this because we will create some LY records that are up to a year earlier than the lowest date member on the date dimension (we can’t have any nulls in these columns as Analysis Services treats them as zeros and that will mess us around especially when they are set to have a min aggregation).Then in our MDX code in the cube we also can’t make reference to a Date member that is lower than the lowest date member on the Date dimension, so we have to start at the lowest date member – hence the reason for this column.

Execute the code to create the table.

Next run this piece of code, to identify if there are any “holes” in your date dimension as the version of the AdventureWorks database that I had did have some holes, we can’t have holes in our dimension or date calcs measure group for that matter as that too will have a big effect on our MIN and MAX aggregated data:

<code>
select * from dbo.DimDate order by 1
--Hole in datekey from 20080901 to 20101031
</code>

I identified a large hole in my Date dimension, by manually checking the rows, see my comments above.

Next create the stored procedure to populate our date calc measure group’s fact table, I have pasted the code below and used comments in the code to explain as much as possible:

<code>
CREATE PROCEDURE [dbo].[PRC_FactDateCalcs_Populate]
AS
DECLARE @OverallMin AS INT
SELECT
@OverallMin = MIN(DateKey)
FROM
dbo.DimDate
SELECT
CalendarYear,
MonthNumberOfYear,
MIN(DateKey) AS MinMTD
INTO #MTD
FROM
dbo.DimDate
GROUP BY
CalendarYear,
MonthNumberOfYear
--****Add a year previous worth of months for LY calcs.
INSERT INTO #MTD
SELECT TOP 12
CalendarYear - 1,
MonthNumberOfYear,
MinMTD - 10000
FROM
#MTD
ORDER BY 1,2
SELECT
CalendarYear,
CalendarQuarter,
MIN(DateKey) AS MinQTD
INTO #QTD
FROM dbo.DimDate
GROUP BY
CalendarYear,
CalendarQuarter
--****Add a year previous worth of quarters for LY calcs.
INSERT INTO #QTD
SELECT TOP 4
CalendarYear - 1,
CalendarQuarter,
MinQTD - 10000
FROM
#QTD
ORDER BY 1,2
SELECT
CalendarYear,
MIN(DateKey) AS MinYTD
INTO #YTD
FROM
dbo.DimDate
GROUP BY
CalendarYear
--****Add a previous year for LY calcs.
INSERT INTO #YTD
SELECT TOP 1
CalendarYear - 1,
MinYTD - 10000
FROM
#YTD
ORDER BY 1
SELECT
CalendarYear,
CalendarSemester,
MIN(DateKey) AS MinSTD
INTO #STD
FROM
dbo.DimDate
GROUP BY
CalendarYear,
CalendarSemester
--****Add a year previous worth of semesters for LY calcs.
INSERT INTO #STD
SELECT TOP 2
CalendarYear - 1,
CalendarSemester,
MinSTD - 10000
FROM
#STD
ORDER BY 1,2
SELECT
FiscalYear,
FiscalQuarter,
MIN(DateKey) AS MinFQT
INTO #FQTD
FROM
dbo.DimDate
GROUP BY
FiscalYear,
FiscalQuarter
--****Add a year previous worth of fiscal quarters for LY calcs.
INSERT INTO #FQTD
SELECT TOP 4
FiscalYear - 1,
FiscalQuarter,
MinFQTD - 10000
FROM
#FQTD
ORDER BY 1,2
SELECT
FiscalYear,
MIN(DateKey) AS MinFYTD
INTO #FYTD
FROM
dbo.DimDate
GROUP BY
FiscalYear
--****Add a previous fiscal year for LY calcs.
INSERT INTO #FYTD
SELECT TOP 1
FiscalYear - 1,
MinFYTD - 10000
FROM
#FYTD
ORDER BY 1
SELECT
FiscalYear,
FiscalSemester,
MIN(DateKey) AS MinFSTD
INTO #FSTD
FROM
dbo.DimDate
GROUP BY
FiscalYear,
FiscalSemester
--****Add a year previous worth of Fiscal semesters for LY calcs.
INSERT INTO #FSTD
SELECT TOP 2
FiscalYear - 1,
FiscalSemester,
MinFSTD - 10000
FROM
#FSTD
ORDER BY 1,2

Now we have our Date calcs table and we are ready to add it to our cube and get cracking with a “better” way of performing time calculations in SSAS.

I’m assuming that everyone reading this article knows how to add this dbo.FactDatecalcs table as a measure group in a cube, so I won’t explain that, but just ask you to do it now.

Once you have added it and set up its relationship with the Date table in the Dimension Usage section of the cube we need to also set all the aggregation types, this too should be pretty self explanatory, all measures prefixed with “Min” and the “OverallMin” measure need to have a MIN aggregation and all those prefixed with “Max” need a MAX aggregation.

Now click on the Adventureworks cube in Visual studio and then on the “Calculations” tab, then select “Script View” and scroll down to the bottom of the page.

I have coded up the “.CURRENTMEMBER” style of a LY calculation and MTD calculation as well as my new style equivalents of each (using the “Fact Calendar Calcs” measure group method).

If you go through the above code, you will notice that the traditional style is full of “.CURRENTMEMBER” references and lots of references to hierarchy names etc, which means that more work will need to be done if Hierarchies are added or removed from the time dimension.

The method using the Fact Date Calcs measure group is referencing the “OverallMin” measure in the LY calculation to determine whether it must ignore the calculation, start it at “OverallMin” or perform it normally, it only needs to do this on the LY calculation as MTD should never reference a date member less than the lowest date member on the time dimension.

All seems fine with one date member selected, notice I included the Min and max day ly in the query, just for interest sake, if you put this solution into production you should hide the fact date calcs measures as users don’t need to see them.

Now let’s run the code with a multiple member selection of the Date dimension: