Archive for May 2013

At last, the Excel 2013 app I’ve really been waiting for! Those data visualization apps from last week have generated a lot of interest, but this is even cooler for a data geek like me.

A few months ago I came across FlatMerge, a startup that allows you to upload data and then make it available as an OData feed; I was going to blog about it but my fellow OData fan Jamie Thomson beat me to it. However at that point it was only a website where you could upload data… today, FlatMerge released its own, free (for the time being) Excel 2013 app which allows you to upload data direct from Excel. So you can take data from an Excel table:

Save it to FlatMerge:

And then import it into Excel 2013, PowerPivot, Data Explorer or any tool that supports OData feeds. Here’s the URL for the table I just uploaded (which, if I’ve read the docs correctly, should be publicly available):

It’s still a version 1.0 and there are a few features it’s missing that I’d like to see (like the ability to update a data source, and to control who has access to that data), but I think it’s very cool. I’ve seen tools that allow you to share data from Excel before but this is the first that uses OData, and this means you have a much greater degree of flexibility about how you consume your data. Arguably you could do the same thing by saving your Excel file to Sharepoint 2013 Excel Services and using the OData feed from an Excel Services table, but that’s a much more expensive and less user-friendly option.

I can imagine a whole bunch of uses for this, for example in a budgeting application where multiple Excel users need to submit their figures, which then need to be consolidated in a single Excel spreadsheet, maybe using Data Explorer.

I don’t usually like to blog about topics that I think other people have blogged about already, but despite the fact that Mosha blogged about this several years ago (in fact more than eight years ago, blimey) this particular problem comes up so often with my customers and on the MSDN Forum that I thought I should write something about it myself. So apologies if you know this already…

Here’s the problem description. If you define a calculated measure in MDX, that calculation will take place after the real measure values have all aggregated. For example, consider a super-simple cube with a Year dimension, two real measures called A and B and a calculated measure called [A * B] that returned the value of A multiplied by B. In a PivotTable you’d see the following result:

Note how the Grand Total for the [A * B] calculated measure is 12*16=192, and not 12+12+12+12=48. This is expected behaviour for calculated measures, and indeed the way you want your calculations to behave most of the time.

However, there are some scenarios where you want to do the calculation first and then aggregate up the result of that calculation; in our previous example that means you’d get 48 for the Grand Total instead. Currency conversion and weighted averages are common examples of calculations where this needs to happen. How can you handle this in MDX?

Let’s look at a slightly more complex example than the one above. In the following cube, based on Adventure Works data, I created a Date dimension that looks like this:

…and a Product dimension that looks like this:

I also created two measures on a fact table called A and B:

Now, let’s say that once again you want to calculate the value of A*B at the Date and Product granularity, and aggregate the result up. Probably the easiest way of handling this would be to do the calculation in the fact table, or in the DSV, and then bringing the result in as a new real measure. However this may not be possible with some types of calculation, or if the granularity that you want to do the calculation is not the same as the granularity of the fact table.

One way of approaching this in MDX would be to create a calculated measure like this:

The big problem with this approach (apart from the fact that it may break when you do a multi-select in certain client tools – but you could work around that) is that it is usually very, very slow indeed. Depending on the calculation, it may be unusably slow. So you need a different approach.

This is where scoped assignments come in. If you make a scoped assignment to a real measure, as opposed to a calculated measure, then the value of the assignment will aggregate up outside of the original scope. So, in this case, since you want the calculation to take place at the Date and Product granularity, if you scope on a real measure at that granularity the result of the calculation will aggregate up automatically.

The first step here is to create a new real (ie not calculated) measure for the calculation. This can be done in the DSV by creating a named calculation on your fact table which returns the value NULL:

You then need to create a new real measure on your measure group from this new named calculation:

In this example, I’ve left the AggregateFunction property of the measure to be the default of Sum, but you could use a different setting if you wanted a different type of aggregation. The next step is to process the cube, and once you’ve done that you’ll see a new measure that only returns the value 0:

Next, you need to create the scoped assignment on the Calculations tab of the Cube Editor. If you remember in my post last week about scoped assignments, I recommended avoiding writing scopes using user hierarchies; using only attribute hierarchies the scope statement becomes:

One very important thing to notice: the sets I’ve used for scoping on the Dates and Products do not include the All Member: for example, [Date].[Date].[Date].MEMBERS. If you use a set that includes the All Member, such as [Date].[Date].MEMBERS, the calculation will not aggregate up correctly.

Here’s the result:

This is going to be much more efficient than the pure MDX calculated measure approach, though just how well the calculation performs will depend on the complexity of the calculation and the size of the area that you are scoping on.

To test the Treemap out, I used Data Explorer to get the overall size on disk of the contents of the folders I use to store my presentation materials; I won’t go into detail about how I did it, but Erik Svenson has a great post on how to do this here. I ended up with a the following treemap:

It’s worth pointing out one cool thing about these apps: they still work when your worksheet is deployed to Sharepoint and viewed in a browser with the Excel Web App, even in Office 365!

I’ve had a lot of requests for more MDX content on my blog, so here’s something I’ve been meaning to write up for a long time: a worked example of how to use scoped assignments to implement two different types year-to-date calculation on two different hierarchies in the same dimension. Knowledge of how to use scoped assignments is the sign of a true MDX master (you can watch a video of a session I gave on the basics of scoped assignments at SQLBits here if you’re unfamiliar with them) but that’s because they can be very difficult to write and there’s surprisingly little information out there on the internet about them. They are incredibly powerful, though, and often they provide the most elegant and best-performing way to solve a problem.

Let’s start by looking at the Date dimension in the Adventure Works DW sample database, and more specifically the attributes, user hierarchies and attribute relationships:

Notice how we have two user hierarchies for Calendar Years (which start on January 1st) and Fiscal Years (which start on July 1st), called Calendar and Fiscal. Now, let’s say that you have a requirement to to show year-to-date values for a measure for both the Calendar and Fiscal hierarchies. It would be very easy to implement this as two separate calculated measures but what if you needed to show both types of year-to-date in the same calculated measure, showing Fiscal year-to-dates when the Fiscal hierarchy was used in a query and Calendar year-to-dates when the Calendar hierarchy was used in a query?

This is possible using scoped assignments. The first thing to point out, though, is that this is only going to be possible if you change the structure of the dimension. Why? Well, take a look at the Date levels of both hierarchies: they are both built using the Date attribute. If you were running a query with your YTD calculation on columns and only the Date attribute hierarchy on rows, would you expect to see Calendar or Fiscal YTD values? Certainly you couldn’t see both in the same cell, and this is the problem: if you expect to see Calendar YTD values at the bottom of your Calendar hierarchy, and Fiscal YTD values at the bottom of your Fiscal hierarchy, you need two separate Date attributes to do this. If you overwrite the values in the same cells twice using a scoped assignment, then you will only see the result of the second assignment.

Therefore, what we need to do is to create two new attributes, Calendar Date and Fiscal Date, to serve as the lowest levels of the Calendar and Fiscal hierarchies instead of the Date attribute. Here’s what the new attribute relationships look like:

From the end-user’s point of view nothing seems to have changed (these new attributes can have their AttributeHierarchyVisible property set to False) but this now means we have two attributes, two different slices of the cube, whose values we can overwrite separately instead of just one.

Now for the MDX. A good rule to follow when writing scoped assignments is to always use attribute hierarchies and never use user hierarchies; this is because there are rules you have to obey about the shape of the subcube of data you are overwriting with your scoped assignment. When defining a scope using only attribute hierarchies, you can only use the following types of set:

Every single member on the attribute hierarchy

Just one member on the attribute hierarchy

Multiple members on the attribute hierarchy not including the All Member

I wrote up a detailed set of rules for defining scopes in MDX Solutions, if you’re interested; if you don’t follow these rules you’ll get the infamous “An arbitrary shape of the sets is not allowed in the current context” error (I know a joke about that, incidentally).

For this calculation, we need to make two scoped assignments on a single calculated measure called [YTD Sales]: one to show the Fiscal YTD calculation for the Fiscal Date, Fiscal Month Name, Fiscal Quarter, Fiscal Semester and Fiscal Year attributes; and one to show the Calendar YTD calculation for the Calendar Date, Calendar Month Name, Calendar Quarter, Calendar Semester and Calendar Year attributes. When scoping on ranges of attributes like this, there’s another easy rule to follow: scope on the set of every member, including the All Member, from the attribute hierarchy at the lowest granularity attribute, and the set of every member, not including the All Member, from the highest granularity attribute. These two sets also need to be in the same, rather than separate, SCOPE statements for reasons I explain here.

Now you can see the YTD Sales calculated measure returns Calendar YTD values for the Calendar hierarchy:

…and it returns Fiscal YTD values for the Fiscal hierarchy:

There’s one last trick I want to show. It can be very difficult to know that your scoped assignment is covering the subcube you want it to cover, so while debugging scoped assignments I find it helps to assign values to the BACK_COLOR cell property so I can see exactly where I’m scoping. Here’s the MDX above with extra assignments to set the cell background colour to orange for the Calendar YTD calculation and blue for the Fiscal YTD calculations:

It can also help you understand what’s going on in the scenarios where the scopes overlap, for example if you put the Calendar and Fiscal hierarchies on rows and columns in the same PivotTable: the Fiscal hierarchy takes precedence, because it’s defined second.

With my Technitrain hat on I’m sitting in on Marco’s Advanced DAX course in London today, and the question of comments in DAX came up – which reminded me that this is something I’ve been meaning to blog about. DAX as a language supports comments, but unfortunately it’s not possible to add comments inside a DAX measure or calculated column expression in either PowerPivot or SSAS Tabular right now (which is where they’re most needed – I hope this changes in the future). That said, there are some other things you can do to add textual explanations and descriptions to your DAX measure code.

Before we get onto the workarounds, a quick word about comments in DAX. These can only be used in DAX queries, and the types of comment supported are the same as in MDX: double-dashes and double-forward-slashes for single line comments, and forward-slash-asterisk to start a multi-line comment and asterisk-forward-slash to close a multi-line comment. Here’s an example:

--single line comment

//another single line comment

/*a multi-line

comment*/

evaluate table1

What can be done with measures though? After all, that’s where the most complex DAX is usually written.

First of all, you can add a description to a measure by right-clicking on it in the measure grid and selecting Description:

Unfortunately this description is not easily accessible to end users anywhere (it would be great if it appeared as a tooltip in a PivotTable, for example) but it can be seen in an Excel worksheet by running a DMV query. DMV queries can be run in Excel 2013 in the same way as DAX queries, using a query table as described here; the DMV query to use is:

select

measure_name as [Measure Name], [description], measure_is_visible

from $system.mdschema_measures

Unfortunately all hidden and implicit measures are returned, and even when the table is filtered so that only measure_is_visible=true there are still a lot of measures that probably shouldn’t be shown.

Similarly, descriptions can be added to any column (calculated or not) in your model, again by right-clicking on it and selecting Description.

This description can be displayed in the worksheet using the following DMV query:

select

hierarchy_name as [Column Name], [description] as [Description]

from $system.mdschema_hierarchies

where cube_name='model'

You can also write text direct to cells in the measure grid too. When I first saw a customer do this I was worried that it might not be supported, but I’ve been told that it is; so long as you don’t use the =: used for defining measures then you should be ok.

This is probably the best way to add comments to your code, if only because it’s the most visible to anyone looking at your PowerPivot/SSAS Tabular model. Of course, for it to be effective you’ll need to have a system for arranging your measures in the measure grid; in “SQL Server Analysis Services 2012: The BISM Tabular Model”, Marco, Alberto and I recommended that you arrange all your measures in the top-left hand corner of the measure grid and I think that’s still a good idea, but the use of text in cells to create headings for groups of measures as well as descriptions can help a lot too.

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

One of the first questions I get asked after showing someone PowerPivot for the first time is “Can I add new data to a PowerPivot table that already has data in it?”. Out of the box, of course, the answer is no: when you process a table in PowerPivot you have to reload all the data from your data source, you can’t just append new data (unless you’re using copy/paste to load data, which isn’t a good idea). However, there are a lot of self-service BI scenarios where the ability to do this would be extremely useful: for example, you might want to scrape stock quotes from a web page every day and then, in an Excel workbook, accumulate that data in a table so you can analyse historical stock prices with PowerPivot. I ran into a scenario very much like this last week and I thought that Data Explorer should be able to help here. It can, but it’s not obvious how to do it – hence this blog post!

Here’s a super-simple example of how to accumulate data in a table then. Let’s start with a csv file that contains the following data:

Product,Sales
Apples,1
Oranges,2

It’s straightforward to import this data into Excel using Data Explorer and the ‘From csv’ data source:

Here’s the code that Data Explorer generates:

let

Source = Csv.Document(File.Contents("C:\InputData.csv")),

FirstRowAsHeader = Table.PromoteHeaders(Source),

ChangedType = Table.TransformColumnTypes(FirstRowAsHeader,

{{"Product", type text}, {"Sales", type number}})

in

ChangedType

Now, let’s imagine that you want to keep the data from this file in Excel and every time you click Refresh in Data Explorer you add the data from the file onto the end of the existing data you’ve already captured. The first thing you’ll probably want to do in this scenario is add a new column to the data that gives the date and time that the data was loaded, and you can do that quite easily in Data Explorer using the DateTimeZone.UtcNow() function as follows:

Table.AddColumn(ChangedType, “Load Date”, each DateTimeZone.UtcNow())

Data Explorer has functionality to append the data from one query onto the end of another query, but the problem you have to solve now is that when you click Refresh you want the new data to be appended onto the end of the data that has already been collected. It’s a recursive scenario not unlike the one I grappled with here. The solution to this problem is to first of all load the data into the PowerPivot (ie what we should be calling the Excel Data Model now) by clicking on the Load To Data Model link in the Data Explorer query pane:

Then, on a new sheet, create an Excel query table that returns all the data from the PowerPivot table that you’ve just loaded data into. Kasper shows how to do this here; there’s no need for any special DAX, you just need to connect to the PowerPivot table in the Existing Connections dialog:

At this point you should have two tables on two sheets that contain the same data. The next step is to modify the original Data Explorer query so that it contains a new step that appends data from the table you’ve just created (ie the table getting the data from PowerPivot) onto the data from the csv file. This can be done with three new steps, first to get the data from the new Excel table:

Now as I said, this is just a super-simple example and in the real world you’d need extra functionality to do things like delete rows you’ve already loaded and so on; but that’s all doable I think. It’s also worth mentioning that I encountered some strange errors and behaviour when implementing this, partly due to Data Explorer still being in preview I guess, so if you want to recreate this query you’ll need to follow my instructions exactly.

Quick summary: DAX measures in SSAS Tabular that use the UseRelationship() function return an error when row security is applied to a table. I’m surprised this hasn’t been documented somewhere – I know Marco came across it some time ago, but I ran into it again recently so I thought I’d mention it.

Consider the following simple SSAS Tabular model, based on Adventure Works DW:

There’s an active relationship between DateKey and OrderDateKey, and an inactive relationship between DateKey and ShipDateKey. The following measure returns the sum of Sales Amount and activates the inactive relationship:

Public MDX and SSAS Training Courses

I'm running several SSAS, MDX and PowerPivot-related training courses through Technitrain in 2015. Check out the Technitrain course catalogue for full details, and to see other upcoming courses from the likes of Allan Hirt, Jamie Thomson, Andy Leonard, Thomas Kejser and others.

Need some help?

As well as being a blogger, I'm an independent consultant specialising in Analysis Services, MDX, PowerPivot and DAX. I work with customers from all round the world solving cube design problems, performance tuning Analysis Services queries and calculations, writing complex MDX and delivering training courses, and I am happy to work on short-term engagements. For more details see http://www.crossjoin.co.uk