Microsoft Analysis Services, MDX, DAX, Power Pivot, Power Query and Power BI

Month: September 2016

Last week I was doing some performance tuning for a customer using Power BI on top of a large SSAS Multidimensional cube via a Live connection. Some of their reports were performing particularly badly and I uncovered a problem with the DAX generated by Power BI for tables with a large number of rows, fields from two or more large hierarchies and totals turned on.

The problem is very easy to reproduce; I’m going to do it using a simplified version of the Adventure Works cube that contains only the Date and Product dimensions. Take a look at the following table from a Power BI report:

It has the Date attribute from the Date dimension, and the Product attribute from the Product dimension, on rows and two measures on columns. The table has a large number of rows in it (both Date and Product are fairly large hierarchies) and if you look at the DAX generated by Power BI you can see that it only requests the first 501 rows. That’s fine – the real problem is that Power BI also generates a second DAX query to get the two values displayed in the Total line at the bottom of the table. The DAX looks something like this, and is easily identifiable because it uses the Row() function:

This query has something in it – I don’t know what – that means that it cannot make use of the Analysis Services Storage Engine cache. Every time you run it SSAS will go to disk, read the data that it needs and then aggregate it, which means you’ll get cold-cache performance all the time. On a big cube this can be a big problem. This is very similar to problems I’ve seen with MDX queries on Multidimensional and which I blogged about here; it’s the first time I’ve seen this happen with a DAX query though. I suspect a lot of people using Power BI on SSAS Multidimensional will have this problem without realising it.

This problem does not occur for all tables – as far as I can see it only happens with tables that have a large number of rows and two or more hierarchies in. The easy way to check whether you have this problem is to refresh your report, run a Profiler trace that includes the Progress Report Begin/End and Query Subcube Verbose events (and any others you find useful) and then refresh the report again by pressing the Refresh button in Power BI Desktop without changing it at all. In your trace, if you see any of the Progress Report events appear when that second refresh happens, as well as Query Subcube Verbose events with an Event Subclass of Non-cache data, then you know that the Storage Engine cache is not being used.

Also look at the Duration column in the trace for these events which shows the time in milliseconds that they took to execute. This will tell you how much of an impact this problem is having on your report refresh times.

The easy way to stop this happening is to turn off the totals row in the table:

Displaying the totals in a separate table also seems to avoid the problem, although of course it doesn’t look as good. Only putting one hierarchy in the table apart from your measures, also seems to solve the problem. You could also try all the usual methods to improve SSAS performance such as building aggregations on the cube.

The dev team is aware of this problem but it’s unlikely to be fixed in the short term.

I can hear you yawning already – yet another blog post on getting data from multiple Excel workbooks in Power Query and Power BI. Just about everyone who has ever written a blog post on Power BI has written about this subject, including me. However there’s a twist this time: what if your Excel workbooks are stored in SharePoint or OneDrive For Business? If they are, then your dataset may not refresh successfully after you have published unless you load your data in a particular way.

Some background first. A few weeks ago I was contacted by a reader who had seen my post on data refresh errors and the Web.Contents() function and was experiencing the same issue when using Web.Contents() to get data from multiple Excel workbooks stored in SharePoint. Up until recently the Web.Contents() function – which is used by the From Web option in Power BI’s Get Data experience – was the only way to get data reliably from single Excel files stored in SharePoint or OneDrive For Business. However the limitations on Web.Contents(), M functions and data refresh described in my blog post meant that it wasn’t possible to use Web.Contents() to get data from multiple Excel files stored in SharePoint or OneDrive For Business.

The April 2016 Power BI Desktop update introduced a new way of getting data from Excel files stored in SharePoint: the SharePoint Files data source, based on the SharePoint.Files() M function. Both Mike Carlo and Ken Puls have already blogged about this in depth and so I won’t repeat what they’ve written; their posts have a lot of good information on how to construct the URLs to find your files in SharePoint. How do you use it to combine data from multiple Excel workbooks though?

Imagine you have four identically-structure Excel workbooks stored in a folder in OneDrive For Business:

Each one has a table called Table1 with some sales data in it:

In Power BI Desktop, create a new query and click the Get Data button. In the Get Data dialog, go to Files and click on SharePoint Folder:

Enter the URL for your OneDrive For Business site:

You’ll then see all the files in all your folders in OneDrive For Business:

Filter the folders in the Folder Path column so you only see the files in the folder containing your Excel workbooks:

Right-click on the Content column and select Remove Other Columns to get rid of all but the Content column. Then click the Add Custom Column button and add the following expression:

Excel.Workbook([Content])

This tells Power BI to treat each file in this folder as an Excel Workbook. Click OK, and then right-click on the Content column and select Remove (you won’t need this any more). Next, click on the Expand icon in the top right-hand corner of the Custom column and click OK on the flyout:

This will give you a table listing all of the contents of each workbook:

Filter this table so you only have the four tables from each workbook:

Next, right-click on the Data column and select Remove Other Columns, then finally click OK on the Expand icon again:

You’ll now have a table containing all of the data from the tables called Table1 in each workbook in the folder:

Don’t forget to set appropriate data types for each column (columns will have the data type Any by default, which will be treated as text later on)! You can now build your report and publish it:

In order for refresh to work, you’ll need to configure the credentials used by the Power BI service to connect to your data. In the browser, click on the ellipses for the Dataset for your report and select Schedule Refresh:

This will take you to the Datasets tab on the Settings page. You can schedule refresh here if you want, but the important thing is to click the Edit Credentials link:

If like me you have an Office 365 subscription and use SharePoint Online, then the dialog choose oAuth2 in the Authentication Method dropdown box and click Sign In:

You’ll see the Windows Organizational Account sign-in page appear briefly while you are signed in but you shouldn’t need to do anything. And that’s it!

You can now edit the data in any of your Excel workbooks and, once the dataset has refreshed, those changes will show up in the report. If you want to do a manual refresh of the data, clicking the Refresh button at the top of the report won’t do any good; you have to click on the Refresh Now option on the dataset (seen in the screenshot above, just below the Schedule Refresh option).

Since my last post on the Excel cube functions proved to be very popular, I thought I’d write a follow-up that shows off some more little-known but interesting functionality: how to use the CubeSet() function to combine selections from multiple slicers.

The Problem

Consider the following Power Pivot model based on data from the Adventure Works DW database:

The Internet Sales table contains one row per sales transaction; Product and Date are dimension tables that provide more information on the product bought in the transaction and the date the order was taken on. There’s one measure called Distinct Customers that returns the distinct count of customers; its definition is as follows:

Distinct Customers:=DISTINCTCOUNT('Internet Sales'[CustomerKey])

With this model you could build the following simple report by converting a PivotTable to cube functions:

Obviously this will tell you how many distinct customers bought the products selected in the slicer each year. You could also duplicate the slicer and the cube functions and have two reports side-by-side so you could compare the number of customers that bought two different groups of products:

The question here is, though, how can you create a third report that shows the number of distinct customers that bought any of the products selected across both slicers? So, using the example above, the report on the left shows the number of customers that bought Mountain-200 Black 38, Mountain-200 Black 42 or Mountain-200 Black 46; the report on the right shows the number of customers that bought Mountain-200 Silver 38, Mountain-200 Silver 42 or Mountain-200 Silver 46; the aim is to create a third report that shows the number of distinct customers that bought Mountain-200 Black 38, Mountain-200 Black 42, Mountain-200 Black 46, Mountain-200 Silver 38, Mountain-200 Silver 42 or Mountain-200 Silver 46 without having to select all those products in a third slicer.

The Solution

Step 1

The first thing to point out is that you can use the CubeSet() function to return the set of items selected in an Excel slicer. This is a widely-known technique – Rob Collie has a great post describing it here. What I didn’t realise until recently though was that one CubeSet() function can point to other cells containing CubeSet() functions and union all the items returned by those CubeSet() functions. This allows you to combine the selections made in multiple slicers, assuming that those slicers refer to the same field in your Power Pivot model. Here are the formulas in three cells that do this:

In cell I2 the formula uses the CubeSet() function to capture the selection made in the left-hand slicer in the report above, captioned Product Selection A, which has the name Slicer_Product. In cell I3 there is a similar formula to capture the selection made in the slicer on the right in the report, captioned Product Selection B, which has the name Slicer_Product1. The formula in I4 then uses a third CubeSet() function to combine the sets returned by the last two formulas into a third set.

Step 2

Now that you have a single CubeSet() formula in I4 that combines the selections in both slicers, you can use the trick I showed in my last post and reference that cell in CubeValue() formulas to get the number of distinct customers that bought any of the products listed in either slicer:

Here’s what the report looks like at this point:

Step 3

Notice that, thanks to the power of Power Pivot, customers are not double-counted in the new combined selection report on the right-hand side. For example, in 2003 841 customers bought one or more products selected in the slicer Product Selection A while in 2003 796 customers bought one or more products listed in the slicer Product Selection B. 841+796=1637, but in the combined selection report correctly states that only 1588 distinct customers bought products listed in either Selection A or Selection B. We can use these numbers to do one final, useful calculation: 1637-1588=49, so there were 49 customers who must have bought products in both Selection A and Selection B. The formula to do this is pretty straightforward and looks like this:

The Excel Cube Functions are incredibly powerful, and I’m still amazed at the kind of problems they can solve. This post describes how they can be used to build a report with a complex OR filter using data from Power Pivot (it’s equally applicable to SSAS) that shows a number of advanced uses of these functions. No knowledge of MDX or DAX is needed but if you’re new to the Excel Cube Functions I recommend that you watch this video of a presentation by Peter Myers, which provides an excellent introduction to them.

The Problem

Imagine you’ve got a very simple Power Pivot model that looks like this:

There’s a Date table with dates, months and years in, and an Internet Sales table with sales data in and a measure called Sales Amount. Here’s what the data looks like in a PivotTable:

Now, imagine that you want a report with the Sales Amount measure on columns and Years on rows, and you want to filter the data so that you only see values for Mondays in July or Wednesdays in September. Using the Fields, Items and Sets functionality you could filter the data to only show the day/month combinations you need for each year, but since you can’t put a named set into the Filter area of a PivotTable you would have to use Excel formulas to sum up the combinations to get the totals you need:

Lukcily it is possible to build the report you need using the Cube Functions! Here’s how:

Step 1: Build Your Combinations Using CubeMember()

The first thing to point out is that the CubeMember() function does not have to just return a member, it can return a combination of members (in MDX this is known as a tuple). These combinations can be built in several ways, one of which is by using cell references to other cells that themselves contain CubeMember() functions. It’s probably easier to explain this by showing a worksheet that contains six cells with CubeMember() functions in. Here it is with the formulas visible:

Cells B5 and B10 contain references to days of the week; cells B6 and B11 contain references to months. Cells B7 and B12 contain CubeMember() functions that return the combinations we want to filter by: Mondays in July and Wednesdays in September respectively.

Here’s what the formulas above return:

You’ll notice that the ‘combination’ cells only show the month names, not the day/month combinations – this is just a feature of the CubeMember() function and can be a bit misleading, but rest assured they do return the combinations you need.

Step 2: Build A Set Using CubeSet()

The CubeSet() function is also able to build sets using cell references to cells containin CubeMember() functions. In this case I want a set containing the two ‘combination’ CubeMember() functions from B7 and B12. I can do this by using the formula:

=CUBESET($B$2,($B$7,$B$12), "The set of combinations")

Here are the formulas on the worksheet at this point:

And here’s the output:

Step 3: Reference The CubeSet() Function In Your Report

Now you have a CubeSet() function that returns the two day/month combinations, you can use this in a cube function report. When you reference a cell containing the CubeSet() function in a CubeValue() formula, the CubeValue() formula will return the aggregated value of all of the combinations in the CubeSet(). So for example, here’s a report with the Sales Amount measure on columns, Years on rows, and displaying the Sales Amount for each year filtered by the two day/month combinations:

And here’s the actual output:

Compare the numbers from the report at the bottom with the values calculated from the PivotTable in the screenshot earlier in this post, and you’ll see that we have indeed shown just the combined Sales Amount for Mondays in July and Wednesdays in September, broken down by Year.

Follow Blog via Email

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 7,729 other followers

Public Power BI and SSAS Training Courses

I'm running several SSAS, MDX and Power BI-related training courses through Technitrain in 2017. Check out the Technitrain course catalogue for full details, and to see other upcoming courses from the likes of Alberto Ferrari, Andy Leonard, Allan Hirt and Alex Yates.

Need some help?

As well as being a blogger, I'm an independent consultant specialising in Analysis Services, MDX, DAX, Power BI, Power Query and Power Pivot. I work with customers from all round the world solving design problems, performance tuning queries and delivering training courses, and I am happy to work on short-term engagements. For more details see http://www.crossjoin.co.uk