One of the most interesting functions in Denali’s DAX implementation is SUMMARIZE(). To see how it can be of great use, imagine the following table, called Payments:

Suppose we want to know the average monthly payment per employee. As a first step, we decide to calculate the monthly payment average. In order to do this, we must first add the values for each month and then compute the average. So we try the following DAX query in SSMS:
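The query itself was shown as a screenshot in the original post; a plausible reconstruction, assuming a table named Payments with [Date] and [PaidAmount] columns, is this sketch:

```dax
EVALUATE
ROW (
    "MonthlyAvg",
    AVERAGEX (
        VALUES ( Payments[Date] ),       -- one row per distinct date
        SUM ( Payments[PaidAmount] )     -- no CALCULATE: the row context is ignored
    )
)
```

Because SUM is not wrapped in CALCULATE, it is not constrained by each date row and simply adds the whole column, which is why the query returns 480 as described below.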

The number is obviously incorrect. Yes, it is true that Values() will return a table with only two rows (one representing each date). However, this one-column table has no knowledge of the [PaidAmount] column or how to relate it to date values, hence the SUM is executed over the entire table, resulting in the following…

… for a final average output of 480. Another option, if using PowerPivot to consume the tabular model, is to create an aggregate table and reimport it into PowerPivot. How? By creating a pivot table based on [Date] and [PaidAmount], the values are automatically aggregated:

Unfortunately, a pivot table cannot be directly re-imported into the PowerPivot window, but we can work around this by converting it to formulas (in the Excel ribbon, go to PivotTable Tools, OLAP Tools, Convert to Formulas). After the pivot table cells have been replaced with CUBE formula references, go to the PowerPivot menu and click on ‘Create Linked Table’. For some mysterious reason, doing this will result in the measure cells losing their references, as the header is now static text. This is easy to fix though: remap the formula reference to the appropriate tabular measure:

Once the table is imported into the PowerPivot model, we can issue a simple DAX calculation:

AVERAGE( [Sum of PaidAmount] )

The output is 240, which is correct: it worked because the table over which the calculation acts is now at the appropriate level of granularity.

There are a few disadvantages to this approach, however. First, it relies on the PowerPivot linked-table feature, so it is a client-side operation. Second, the re-import process required remapping CUBE functions. And third, what if the base table contained hundreds of millions of rows? The solution above would not scale.

In BISM Tabular, we can do all this in-memory as part of a calculation. Let’s try it with the same example illustrated above. Connect SSMS to the Tabular model and try the following DAX query:
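The query was pictured rather than quoted in the original; assuming the same Payments table, it would look something like this:

```dax
EVALUATE
SUMMARIZE (
    Payments,
    Payments[Date],
    "MonthlyPayment", SUM ( Payments[PaidAmount] )
)
```

SUMMARIZE groups the table by [Date] and adds an extension column holding each month’s total, all in memory.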

That’s all. We now have our calculated aggregate table. The monthly average is now quite simple:
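A hedged sketch of that calculation, wrapping the summarized table in AVERAGEX (table and column names are assumptions):

```dax
EVALUATE
ROW (
    "MonthlyAvg",
    AVERAGEX (
        SUMMARIZE (
            Payments,
            Payments[Date],
            "MonthlyPayment", SUM ( Payments[PaidAmount] )
        ),
        [MonthlyPayment]   -- AVERAGEX now iterates the aggregated rows
    )
)
```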

Unlike our first attempt using Values(), our new summarized table has a reference to the aggregated [PaidAmount] column.

Going back to our original attempt of calculating the employee monthly payment average, you may have already noticed there are two entries for Mark on 1/1/2011. In order to properly compute the average, we must take this into account and only count distinct employees by month. Here is the final calculation:
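The final expression is not reproduced here; one way to sketch it, assuming the Payments table has an [Employee] column, is to divide each month’s total by the distinct employee count before averaging:

```dax
EVALUATE
ROW (
    "EmployeeMonthlyAvg",
    AVERAGEX (
        SUMMARIZE (
            Payments,
            Payments[Date],
            "AvgPerEmployee",
            SUM ( Payments[PaidAmount] )
                / COUNTROWS ( DISTINCT ( Payments[Employee] ) )  -- duplicate rows per employee counted once
        ),
        [AvgPerEmployee]
    )
)
```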

Hopefully this post has demonstrated how powerful this DAX functionality can be, particularly when you require technical aggregate tables that are only an intermediate step in the process of arriving at a specific output.

Let’s examine the options for calculating Quartiles / Percentiles in PowerPivot. In my last post, I explained one way to calculate Median using pure DAX. Though the calculation works, don’t expect it to be speedy when dealing with large datasets; the reason for this is that it relies on row context iterations to rank values prior to selecting the median.

A similar technique can be used to calculate Quartile or Percentile, following this blog entry. However, the result is not always optimal: not only can the calculation be slow, but the results do not match Excel’s native functions. The reason for this is that Excel uses interpolation to calculate both Quartile and Percentile, whereas the DAX expression above picks one value out of the set (in other words, it’s just an approximation).

Can we use a technique to match Excel’s results within a PowerPivot model? The answer is yes, though the technique discussed here does not use DAX. What I want to share is another way of using the inherent capabilities of PowerPivot: being itself a version of SQL Server Analysis Services, it is actually able to understand MDX. It is quite interesting to notice that PowerPivot can behave in ways that reveal its SSAS nature: from the ability to respond to Excel cube functions to generating a trace file which can be read in SQL Profiler.

So what does this have to do with calculating Percentile? Well, even though DAX does not include any of those functions, you can leverage PowerPivot’s multidimensional interface to create an MDX calculated member that encapsulates each of them. Let’s see how.

In order to see PowerPivot queries to the Vertipaq engine in MDX format, we can use a free CodePlex add-in called ‘OLAP PivotTable Extensions’ (download it here, and be sure to select the correct bitness version). This add-in was created with SSAS in mind, but – without any further effort from the add-in itself – PowerPivot responds to it. Once installed, create a PowerPivot pivot table and then use the add-in to view the MDX query (select the ‘MDX’ tab on the window that appears when you click on the OLAP PivotTable Extensions menu item).

Let’s examine the actual query. MDX has a built-in ‘Median’ function (see definition here). As you can see, the expression uses parameters that look like regular SSAS dimensions. This is the PowerPivot data model though, and we didn’t have to use Visual Studio to create it: the PowerPivot engine automatically creates dimensions based on the tabular model loaded into Vertipaq.
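The screenshot of the query is omitted here; a minimal sketch of such a calculated member, assuming the model exposes a DimDate dimension and a [Sum of SalesAmount] measure as described below, could be:

```mdx
WITH MEMBER [Measures].[Median Sales] AS
    MEDIAN (
        [DimDate].[CalendarYear].[CalendarYear].MEMBERS,
        [Measures].[Sum of SalesAmount]
    )
SELECT { [Measures].[Median Sales] } ON COLUMNS
FROM [Sandbox]
```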

The measure [Sum of SalesAmount] is the aggregate PowerPivot created based on the Sales Amount column of the in-memory fact table. Because it is an actual measure in the PowerPivot model, we can use it as the measure expression of the MDX function.

Let’s now try creating Quartiles and Percentiles. It gets more interesting here: unlike the ‘Median’ function, MDX has no built-in Percentile or Quartile function. So there we have it: neither DAX nor MDX will help us here. However, we can leverage another feature of SSAS: the ability to perform COM interop calls through the ExcelMDX assembly. You can see this library on a regular SSAS installation by browsing the assemblies folder:

It is quite interesting when you think about it. We are asking an Excel pivot table to communicate with the PowerPivot engine through a calculated member which in turn invokes Excel functionality not present in the MDX language. Pretty cool indeed. So let’s try it: using the following expression we can compute Quartiles:
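A sketch of the calculated member follows; the set and measure names are assumptions, and the Excel! prefix is the conventional way to invoke the ExcelMDX COM assembly:

```mdx
WITH MEMBER [Measures].[Quartile] AS
    Excel!Quartile (
        SetToArray (
            [DimDate].[CalendarYear].[CalendarYear].MEMBERS,
            [Measures].[Sum of SalesAmount]
        ),
        1    -- 1 = first quartile, matching Excel's QUARTILE second argument
    )
SELECT { [Measures].[Quartile] } ON COLUMNS
FROM [Sandbox]
```

SetToArray flattens the set of measure values into the array the Excel function expects.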

As you can see, the output matches Excel’s native quartile function. The calculation is now built into the PowerPivot model, and hence is dynamic and can respond to slicers and filters, as well as cell context, by using MDX properties like ‘CurrentMember’. And though PowerPivot is performing a COM interop call to Excel (marshalling calls from one environment to the other), it should still perform better than a DAX calculation that relies heavily on row context.

If you examine the MDX that is sent to the Vertipaq engine, you will notice the calculated member is declared as a query scoped expression:

The dimension used (DimDate) as well as the measure (Sum of SalesAmount) were automatically created by the PowerPivot engine. This is what provides compatibility with OLAP interfaces: for each table in the PowerPivot model, a default ‘count’ measure has been created as well as a dimension, in which each column is an attribute. In other words, dimensions and measures are somewhat interchangeable in PowerPivot, following a more flexible model than with the traditional SSAS engine. Also, notice the query is executed against a cube called ‘Sandbox’. This is the default cube-interface name for Vertipaq PowerPivot databases. It is the same name you will see in the cubes generated in the dedicated SSAS instance used on PowerPivot for SharePoint (This is an instance of SQL Server Analysis Services 2008 R2 in Vertipaq mode).

One downside of this method is that the PowerPivot field list is not aware of the calculation. However, if you open the standard pivot table field list you will see it there. In other words, and as I said before, this calculation is now part of the data model, which means you can access it outside of the constraints of the pivot table: through Excel cube functions. As an example, using the following formula expression in a worksheet cell we can dynamically invoke the MDX calculated member against PowerPivot, with the ability to specify a query context for it:
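The formula is a CUBEVALUE call; a hedged reconstruction (the connection name and member path are assumptions based on PowerPivot for Excel 2010 defaults) might look like:

```text
=CUBEVALUE("PowerPivot Data",
           "[Measures].[Quartile]",
           "[DimDate].[CalendarYear].[All].Children.Item(0)")
```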

The result will be 925,015 again. In other words, we were able to dynamically invoke the first member of the CalendarYear attribute on the DimDate dimension and leverage the MDX calculated member, all in one cell – no need for pivot tables. Pretty powerful, right?

You can verify that our Quartile calculation is now part of the intrinsic model by browsing through the measure members when constructing the cube formula:

What I find pretty interesting is that the list of available measures on the PowerPivot model includes now regular DAX measures ([Measures].[Sum of Sales]) as well as MDX measures ([Measures].[Quartile]).

How about Percentile? Using the same method, it is pretty simple to calculate: just replace Quartile with Percentile in the MDX expression… that’s it!
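For example, a hedged 90th-percentile variant of the same calculated member:

```mdx
MEMBER [Measures].[Percentile90] AS
    Excel!Percentile (
        SetToArray (
            [DimDate].[CalendarYear].[CalendarYear].MEMBERS,
            [Measures].[Sum of SalesAmount]
        ),
        0.9   -- same fraction argument Excel's PERCENTILE function takes
    )
```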

Be careful not to define your underlying measure as the ‘currency’ type. Though I always thought data types in the PowerPivot window were purely decorative (as they don’t carry over to the pivot table), the MDX calculated member will not work on currency types. Instead, try the decimal type and you will be fine.

And one last thing… as you may have noticed, this technique opens the possibility of a huge range of calculated expressions. And though it is not the standard PowerPivot/DAX implementation, it still leverages PowerPivot technology by making use of its inherent data model. In other words, PowerPivot applications can not only benefit from the amazing power of DAX, but also that of MDX (which in turn opens the door to further functionality through interfaces with Excel and VBA). Things like Internal Rate of Return (IRR) are also possible using this method.

Although PowerPivot – in its current implementation – lacks a dedicated ‘median’ function, it is possible to recreate it using existing functionality. There is a great blog entry here by Marco Russo where he describes how to calculate the median over the values of a PowerPivot column.

Recently I tried to adapt this calculation to a PowerPivot application I was working on. I ran into two issues:

1) The calculation can be applied over a regular column on a table in the PowerPivot window or a DAX calculated column. However, what if you must calculate the median over the output of a DAX measure instead of a regular or calculated column? In this case, you won’t be able to use the DAX functions EARLIER and VALUES, as these only take columns as parameters. Due to this, the calculation above won’t work.

2) I realized there should be a separate logic depending on whether the set of values is even or odd. For even sets, there is no discrete middle value and one must calculate an average of both values in the middle.

Fortunately, it is fairly easy to deal with both issues. With Alberto Ferrari’s help, I implemented the calculation below. There are three steps to it, the first of which involves the creation of an index, which we will use to select the values in the middle. I then created an alternate execution path for even sets, which simply averages the two values in the middle.

To see how it works, copy and paste the data below and import it into PowerPivot (as a linked table):

Step 1

Create a measure. This is to test that we can calculate a median over a column or measure using the same method.

[Test Measure] =SUM([Amount]) / 2

Step 2

Create another measure, which will serve to rank the months by the value in [Test Measure]. We will use this ranking value to determine which value is the median (in the case of odd sets), or to average the two values in the middle (in the case of even sets).

The calculation first uses the MOD function to determine if the set is even (which is the case in our example). It then gets the two values in the middle by:

1) Filter the months for all that have an index less than or equal to the count of months divided by two, and take the maximum value of [Test Measure] for this set.

2) Filter the months for all that have an index greater than the count of months divided by two, and take the minimum value of [Test Measure] for this set.

3) Add these values and divide by two (a simple average).
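The actual measure is not reproduced in this version of the post; a rough sketch of the logic, assuming a linked table Table1 with a [Month] column and an [Index] calculated column produced in the first step, could look like this:

```dax
[Median] =
IF (
    MOD ( COUNTROWS ( ALL ( Table1[Month] ) ), 2 ) = 0,
    -- even set: average the two values in the middle
    (
        CALCULATE (
            MAXX ( VALUES ( Table1[Month] ), [Test Measure] ),
            FILTER ( ALL ( Table1 ),
                Table1[Index] <= COUNTROWS ( ALL ( Table1[Month] ) ) / 2 )
        )
            + CALCULATE (
                MINX ( VALUES ( Table1[Month] ), [Test Measure] ),
                FILTER ( ALL ( Table1 ),
                    Table1[Index] > COUNTROWS ( ALL ( Table1[Month] ) ) / 2 )
            )
    ) / 2,
    -- odd set: pick the discrete middle value
    CALCULATE (
        MAXX ( VALUES ( Table1[Month] ), [Test Measure] ),
        FILTER ( ALL ( Table1 ),
            Table1[Index] <= ( COUNTROWS ( ALL ( Table1[Month] ) ) + 1 ) / 2 )
    )
)
```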

The calculation is correct and matches Excel’s regular median function output. One word of caution: this expression relies heavily on row context, which can greatly impact performance over big datasets. In the following blog post, I will talk about a potential way around this issue, in which we arrive at the correct Median output by leveraging PowerPivot’s intrinsic compatibility with MDX.

A common calculation required in business is the moving average. How can we create it in DAX? Let’s walk through the calculation.

Imagine we have the following PowerPivot table:

In order to calculate a three month moving average, we first need to be able to grasp a reference to the last two months and the current month – for each of the months listed on the table. In other words, if I am in September of 2001, I should be able to grab July, August and September of 2001. How do we do this in DAX?

Using DAX filter context, you have the ability to make a reference to the in-memory date table and – for each month in the pivot table – filter the table to only the current month and the two prior months. Let’s see this in action:

Let’s start by trying to grab a reference to the current month. To test it, we can use DAX’s VALUES function, which returns a table with distinct values depending on the current context. As we want to “grab” the current month, we try creating a measure using this expression:

=VALUES(DimDate[EnglishMonthName])

DimDate is the in-memory table in the PowerPivot window that contains all dates, and “EnglishMonthName” is a column on that table. This is the column that has been placed on the pivot table.

The result we get is an error. It says:

ERROR – CALCULATION ABORTED: MdxScript(Sandbox) (5, 57) A table of multiple values was supplied where a single value was expected.

Hmm. The error says there is a table with multiple values. Clearly, each cell’s output should represent only one value. But why do we have multiple values when there is only one month in context? The reason is that there is a grand total row (not shown above). In the case of the grand total, there ARE multiple month values. The way to avoid that error is by ensuring only one distinct month value is in context. Try this now:
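The expression tried at this point is not shown in this version of the post; judging from the error that follows, it presumably guarded for a single month and then passed the month column to PARALLELPERIOD, something like:

```dax
= IF (
    COUNTROWS ( VALUES ( DimDate[EnglishMonthName] ) ) = 1,
    PARALLELPERIOD ( DimDate[EnglishMonthName], -2, MONTH )  -- wrong: a month column, not a date column
  )
```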

We get an error again. As you can see, the PARALLELPERIOD function takes a date for its first parameter, not a month. In the context of a month, there are multiple dates. When using PARALLELPERIOD, we get an output of multiple dates, and that cannot be represented in one cell unless we tell the function which date to use. So let’s try this now:
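A hedged reconstruction of the corrected attempt, assuming DimDate exposes a date column named [FullDateAlternateKey] (as in Adventure Works):

```dax
= IF (
    COUNTROWS ( VALUES ( DimDate[EnglishMonthName] ) ) = 1,
    FIRSTDATE (
        PARALLELPERIOD ( DimDate[FullDateAlternateKey], -2, MONTH )  -- pick one date out of the set
    )
  )
```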

Ok, so now we have the boundaries of dates we need. In DAX, a filter context can add or restrict the current context based on the expression used. Using CALCULATE we can test we are properly aggregating the months we need:
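A sketch of that test, with fact table and column names assumed from Adventure Works:

```dax
= CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    DATESBETWEEN (
        DimDate[FullDateAlternateKey],
        FIRSTDATE ( PARALLELPERIOD ( DimDate[FullDateAlternateKey], -2, MONTH ) ),
        LASTDATE ( DimDate[FullDateAlternateKey] )
    )
  )
```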

We use the DATESBETWEEN function to return a one column table starting two months ago and ending on the last date of the current month. DAX will then add all the sales amount data over that ‘calculated’ table. The result?

What? The formula gives the exact same result as the regular “Sales Amount” measure?

The reason is that the expression is evaluated in the current month’s context. In order to force the evaluation to happen across months, we must force the filter context to evaluate all dates (not just the current month). We do this by adding one more filter to the expression:
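That extra filter is an ALL over the date table; a hedged sketch:

```dax
= CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    DATESBETWEEN (
        DimDate[FullDateAlternateKey],
        FIRSTDATE ( PARALLELPERIOD ( DimDate[FullDateAlternateKey], -2, MONTH ) ),
        LASTDATE ( DimDate[FullDateAlternateKey] )
    ),
    ALL ( DimDate )   -- evaluate across all dates, not just the current month
  )
```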

Why? Each DAX calculation “activates” certain rows on the tables loaded into the PowerPivot window. For example, for July of 2001 the reported sales amount is 489,329. However, this is an aggregate number: the underlying data is at the date level. When we execute a DAX average function, we ask the PowerPivot engine to compute the average on the active rows, that is, all relevant rows at the level of granularity of the fact table. The output of the average calculation is, then, the correct daily average: not what we want.

Instead, we can add the sales amount for the period and then divide by the distinct count of months in the current context. Here is the complete expression:
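A sketch of the complete measure, dividing the three-month total by the number of months actually in the window (all table and column names are assumptions based on Adventure Works):

```dax
[3 Month Moving Avg] =
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    DATESBETWEEN (
        DimDate[FullDateAlternateKey],
        FIRSTDATE ( PARALLELPERIOD ( DimDate[FullDateAlternateKey], -2, MONTH ) ),
        LASTDATE ( DimDate[FullDateAlternateKey] )
    ),
    ALL ( DimDate )
)
    / CALCULATE (
        COUNTROWS ( DISTINCT ( DimDate[EnglishMonthName] ) ),  -- months in the window
        DATESBETWEEN (
            DimDate[FullDateAlternateKey],
            FIRSTDATE ( PARALLELPERIOD ( DimDate[FullDateAlternateKey], -2, MONTH ) ),
            LASTDATE ( DimDate[FullDateAlternateKey] )
        ),
        ALL ( DimDate )
    )
```

Dividing by the distinct month count (rather than using AVERAGE over the fact rows) yields a monthly, not daily, average.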

PowerPivot is generally marketed as a ‘Self-Service BI’ tool. However, it can alternatively be used as a way to enhance the design phases of traditional Business Intelligence projects, even if the final implementation does not use PowerPivot technology at all.

Why is that? Because it more closely aligns business experts with BI developers during architectural discussions. In order to explain this better, I think it’s relevant to first describe the process by which data analysts can provide relevant input to BI designers:

A lot of intermediate Excel users will probably never use VBA macros and will barely feel comfortable with Pivot Tables. For advanced Excel users, however, there is a thrill associated with getting a cool formula to work, or knowing the most interesting formatting shortcuts or even knowing how to add dropdowns to a spreadsheet to achieve dynamic interactivity.

These folks do not normally belong to the technology team; however, in addition to being savvy Excel users they are in the front lines of the decision making process which many times makes them subject matter experts (SMEs). It is likely they will eventually be motivated to start playing with the data import facility in PowerPivot and start developing DAX calculations.

Fast forward a little into the future, and imagine these users developing meaningful and insightful PowerPivot reports. These contain connections to a multitude of data sources and aggregate data in a way that provides enlightening information to decision makers.

Though not always aware of it, these users will have accomplished three things of tremendous value for Business Intelligence professionals:

Validated one or more sources of quality data

Validated a useful data model

Validated one or more business calculations

In other words, as they benefit and make use of ‘Self-Service BI’ technology they will unknowingly serve as ‘data explorers’ to the Business Intelligence and Data Warehousing team that benefits from their discoveries.

In the past, these ‘spreadsheet-based discoveries’ went unnoticed by members of the technology team (and became spreadmarts). However, Business Intelligence team members can now gain insight into popular spreadsheet reports across the enterprise (PowerPivot for SharePoint). In addition to that, they will be able to take these PowerPivot applications and open them in Visual Studio as SQL Server Analysis Services Business Intelligence Semantic Models (BISM) in SQL Server 2011. This is the server equivalent of a PowerPivot for Excel workbook: it is built using the same technology (the columnar storage engine "Vertipaq") but with a number of robust features appropriate to an enterprise solution.

One of the great things about this new approach to BI that Microsoft is proposing is that it will most likely help close the gap between Excel power users and Business Intelligence professionals by creating an analytical environment built on the same technical foundation on both ends: DAX. This will most certainly help avoid the communication challenges that normally occur when both sides speak totally different “languages”: Excel formulas \ business calculations on one side, and T-SQL/MDX and OLAP on the other.

So what about BI prototyping? And how can we benefit from it even before SQL Server 2011 is released? Well, imagine having the ability to engage subject matter experts in their most familiar environment (Excel) and with their help be able to rapidly test assumptions about BI project deliverables, before starting development. Rather than purely describing a calculation in words or spreadsheets disconnected from a database, we can instead quickly and iteratively build a sample of it tied to a data source and observe the calculation in action as it’s being sliced by multiple dimensions. This can generate the necessary early insights that help correct potential misunderstandings quickly and produce a more complete and accurate data model to guide actual development. And these insights have already been validated by the SMEs (in collaboration with the BI team).

As an example, imagine a dimension called “Orders”. During the initial design phase, it becomes clear it must be added to the cube in order to generate the expected reports. Fast forward a few months (and many hours of development). While building the final reports, the BI team becomes aware that something is missing: even though DimOrders is used in many reports (as business users stated at the beginning of the project), it was built at a different grain than what the reports require; conditional logic is discovered – yes, at the last stages of the project – that treats some Order Details differently from others. What to do? Different approaches are possible, from redesigning parts of the ETL to creating new measure groups based on named queries in order to include the data that is missing. Architectural re-arrangements will have to be made in order to complete it on time; and any ‘quick’ solution will risk compromising the ad hoc nature of the final product (which will in turn limit its usefulness for analysis).

How would this change if PowerPivot were used as a prototyping tool? Using an Agile approach, during early scrum meetings it would be possible to ‘sample’ the Order dimension against a copy of all relevant reports. By loading a relatively small set of data into PowerPivot and designing ad hoc reports to mirror actual views across any functional existing reports, one is able to quickly uncover the fact that a different grain is required for the dimension in question. This work would probably take a very small amount of time compared to what it would require to re-design and re-develop parts of the ETL and OLAP solution. Time – and money – can be saved while increasing the accuracy of the design.

How would it be possible to ‘mimic’ the ‘Orders’ dimension without building a real SSAS cube? This is the most interesting part: PowerPivot was built on SQL Server Analysis Services code. In other words, it is a version of SSAS. All rules expected by SSAS when building cubes apply to PowerPivot as well; for example, dimension tables in a star schema translate to ‘lookup’ tables in the PowerPivot window. PowerPivot data is consumed by Excel using ‘cube’ interfaces (the MS OLAP provider, Excel CUBE functions, etc.); however, PowerPivot was built for fast development, which makes it an ideal tool for ‘mimicking’ an SSAS cube without the complexity of advanced server features.

I want to emphasize that we are not trying to build the entire BI project or corporate data warehouse in PowerPivot. We are just ‘sampling’ the data model and ‘testing’ the different calculations in order to allow subject matter experts to help us gain much greater clarity regarding the quality of the data, the relationships among the different data entities, and the validity and relevance of the business calculations. It is often the case that business users will start ‘refining’ their idea of what they want once they see a working model.

The final result of a BI project that used PowerPivot for prototyping can very well be a traditional UDM/OLAP environment using MDX calculations. The discoveries gained during the prototyping phase have no direct relationship to the specific technology that will be implemented; they only offer pointers to a better architecture that will increase the chances of a successful implementation, one that will have a profound impact on the business.

A lot of Business Intelligence projects fail for two major reasons: lack of business sponsorship and poor data quality. There is a good chance that using PowerPivot as a prototyping tool can help tackle these issues, as subject matter experts use their familiar Excel environment while collaborating with BI professionals under the iterative Agile framework. Rapid dimension/measure sampling can quickly uncover data quality issues while establishing a solid data model that analysts can understand, which gradually builds support for the BI vision (in contrast to solutions they perceive as overly complex and many times inaccurate, which is the main reason why ‘spreadmarts’ exist in the first place).

When developing DAX calculations to create meaningful business metrics, you normally want to remove or refine an existing filter in the current filter context. For example, when calculating the sales contribution of a country to the sales across all countries in the Adventure Works sample database, one could use:
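A hedged version of such a measure (table and column names assumed from Adventure Works):

```dax
[Country Contribution] =
SUM ( FactResellerSales[SalesAmount] )
    / CALCULATE (
        SUM ( FactResellerSales[SalesAmount] ),
        ALL ( DimSalesTerritory[SalesTerritoryCountry] )  -- denominator ignores the country filter
    )
```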

So far so good. However, what if we want to pick only certain years and have the expression dynamically recalculate only for the selected years? We could add a slicer with the year values. This won’t work, though: the ALL function will force the filter context to evaluate across all years – regardless of slicer selection:

As you can see, DAX computes the contribution across the total for all years in the in-memory DimDate table. The result incorrectly adds up to only 54.82% as the denominator is still taking into account all years, not just the selected ones. How can we make this dynamic, to use only the selected values on the slicer?

The solution is to take advantage of the inherent flexibility offered by calculated columns. We can create another ‘Year’ column in our data model, and then add another context to the calculation that will use ALL years but only within the boundaries defined by the user defined filters. The nice thing about this solution is that the output only takes into account the years in the slicer (in other words, the query context), instead of either taking ALL years or just the CURRENT year.

So how to do this?

1) Add a calculated column called "QueryContextYear" to the in-memory FactResellerSales table, using this DAX expression:

RELATED(DimDate[CalendarYear])

2) Replace the slicers in the spreadsheet with this column: instead of using CalendarYear from DimDate in the slicer, use the new QueryContextYear from FactResellerSales.

That’s it! Now the result is dynamic to your slicer selection: the denominator will be calculated only using the selected years.

The reason why this works is that the ALL function in the DAX expression removes the filters only on the DimDate[CalendarYear] column, but not on FactResellerSales[QueryContextYear]; the output of a calculation in DAX is always the result of stacking all the filtered tables. So while FactResellerSales[QueryContextYear] is being filtered, DimDate[CalendarYear] is not; and the intersection of these filters determines the calculation output.

PowerPivot currently has no programmatic interface, so there is really no automated way to leverage its capabilities (other than those provided by PowerPivot for SharePoint). However, we must remember that one way to consume PowerPivot data is through an Excel pivot table (there are others, like Excel Cube functions). Pivot tables do have an API, which allows for a certain degree of programmatic interactivity with PowerPivot data.

A client recently asked me if a PowerPivot report could behave in this way: upon expansion of a row label, a graph should display only the ‘child’ members of the selected row. A screenshot will probably help in explaining the desired effect:

One may be inclined to think this functionality is not possible, but keeping in mind that pivot tables do have an API, it was worth giving it a try. As you may have seen in my prior blog entries, PowerPivot automatically generates an interface of measures and dimensions. We can use this interface to dynamically generate the desired effect.

The first thing we must do is generate a second pivot table with countries in the row labels, along with a slicer for the ‘Sales Territory’ (the one the user will be clicking on). I call this second pivot table a utility pivot table, as it is not meant to be shown to the user but will allow us to get the job done. It will not only be sliced by the same ‘Calendar Year’ as the original one, but will also serve as the source for the chart data.

In order to have both tables react to the slicer, we must configure the ‘Calendar Year’ slicer to connect to both. This can be easily done in the slicer ribbon tab:

Now all we have to do is write a VBA script to take the user selection of ‘SalesTerritoryGroup’ (in the slicer) and filter the utility table as well. First we must pick the right event handler to capture a user click. For this we can use the Worksheet_PivotTableUpdate event handler. Once the event is triggered, we can capture the text in the active cell and pass that to the slicer on the utility pivot table.
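The script itself appeared as a screenshot in the original post; a minimal sketch, assuming the utility slicer is named ‘Slicer_SalesTerritoryGroup’ and the clicked cell holds the territory name, could be:

```vba
Private Sub Worksheet_PivotTableUpdate(ByVal Target As PivotTable)
    ' Sketch only: pass the clicked member on to the utility pivot table's slicer
    On Error Resume Next
    ActiveWorkbook.SlicerCaches("Slicer_SalesTerritoryGroup") _
        .VisibleSlicerItemsList = Array( _
            "[DimSalesTerritory].[SalesTerritoryGroup].&[" & ActiveCell.Text & "]")
End Sub
```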

“Slicer_SalesTerritoryGroup” is the name of the slicer of the utility pivot table. Upon pivot table update, we pass to that slicer a reference to the fully qualified dimension member which the user has just clicked on. The result is a chart that dynamically shows the child countries of the territory the user just double clicked on. This happens at the same time the row expands to show those children in the pivot table itself:

Though the solution may not be perfect (the effect happens not only on expansion but also when collapsing a row), it is still an interesting way to add functionality to a PowerPivot report.