Microsoft Analysis Services, MDX, DAX, Power Pivot, Power Query and Power BI

Month: September 2015

There’s a new M function in the latest release of Power BI Desktop that I spotted: Tables.GetRelationships(). Here’s the help page for it:

Basically, it returns all of the relationships between the tables in a data source such as a SQL Server database. The documentation is a bit cryptic, but I worked out how to use it.

When you use a function like Sql.Database(), you get a table that contains a list of all of the tables in a database. For example, if I use the expression:

Sql.Database("localhost", "adventure works dw")

on my local instance of the Adventure Works DW database, this is the table that is returned:

This, it turns out, is the “navigation table” that the Tables.GetRelationships() function needs for its first parameter. The column called “Data” in this table, which contains links to the actual tables in the database, is what Tables.GetRelationships() needs for its second parameter. Put the two functions together in a query like this:
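A minimal sketch of a query that puts the two functions together (using the same server and database names as above) might look like this:

```m
let
    //the navigation table listing all the tables in the database
    Source = Sql.Database("localhost", "adventure works dw"),
    //pass in the navigation table, plus the name of the column
    //containing the links to the actual tables
    Relationships = Tables.GetRelationships(Source, "Data")
in
    Relationships
```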

Expand all the columns here and you get a table with one row for every relationship detected between every table in the database:

Useful if you need to report on the structure of a database, I guess. It’s a shame that this isn’t available in Power Query in Excel yet (it isn’t as of September 2015, I assume it’s coming soon) because it would be cool to use this data with NodeXL.

There are loads of great new features in today’s release of Power BI Desktop, but for me the most important by far is the introduction of calculated tables. Miguel Llopis gives a good introduction to what they are in the post announcing the release, but I thought it was worth going into a bit more detail about what they are and why they’re so useful.

What are calculated tables?

Calculated tables are tables in the Data Model whose data source is a DAX expression that returns a table. Here’s a simple example. Imagine that you have already imported the DimDate dimension table from the Adventure Works DW database into your Power BI Data Model. If you go to the Data tab you would see the contents of that table shown, and on the ribbon you can see the new New Table button:

Clicking the button allows you to enter a new table name and a DAX expression that returns the table, such as this one that returns a filtered subset of the rows in the DimDate table:
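For example, an expression like the following returns only the rows for one calendar year (the CalendarYear column name is an assumption based on the standard Adventure Works DW schema):

```dax
DimDate2003 =
FILTER ( DimDate, DimDate[CalendarYear] = 2003 )
```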

Calculated tables are created when the data in the model is refreshed (like calculated columns), and after that behave like any other table – so you can create relationships between calculated tables and other tables. You can also create calculated tables whose DAX expressions reference other calculated tables. They do take up memory like other tables too, so over-using them could be a bad thing.

Why are calculated tables useful?

Miguel’s blog post already lists some of the scenarios where calculated tables are useful, and I can already think of lots of practical scenarios where I’m going to be using them myself.

Role playing dimensions are one obvious use: in a lot of models you need to use the same dimension table more than once in different places, with different relationships and maybe with different filters in place. It might be that you have a single Company dimension in your data warehouse that contains all of the companies your organisation does business with; with calculated tables you only need to import that table once, and you can then use calculated tables to create filtered copies of that table to use as Supplier and Customer dimension tables, joining them to your Stock and Sales fact tables, and only showing the relevant companies in each case.
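As a sketch of this pattern (the Company table and its IsSupplier and IsCustomer columns are hypothetical), each filtered copy is created as its own calculated table:

```dax
//calculated table #1: the companies you buy from
Supplier =
FILTER ( Company, Company[IsSupplier] = TRUE () )

//calculated table #2: the companies you sell to
Customer =
FILTER ( Company, Company[IsCustomer] = TRUE () )
```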

They're certainly going to be handy for debugging complex DAX expressions, because they allow you to see what DAX table expressions return. We've already got DAX Studio for that, but now we don't have the hassle of switching to another application!

I can also see calculated tables as a way of doing certain types of ETL – which raises the question of whether you should do a certain operation in Get Data (ie what was Power Query) or using a calculated table. I strongly suspect that a lot of operations are going to be much faster with calculated tables because of the power of the underlying engine. It would be interesting to know if there are plans to allow Get Data to make use of calculated tables, for example as a way of buffering tables in memory, with M transformations folded back to DAX on those tables.
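For example (the table and column names here are assumptions based on the Adventure Works DW schema, and the grouping relies on a relationship existing between the two tables), a calculated table could pre-aggregate a fact table in a way you might otherwise do in Get Data:

```dax
SalesByYear =
SUMMARIZE (
    FactInternetSales,
    DimDate[CalendarYear],
    "Sales Amount", SUM ( FactInternetSales[SalesAmount] )
)
```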

The Calendar() and CalendarAuto() functions

If you were wondering what the new DAX Calendar() and CalendarAuto() functions were for, well, you can probably guess now – Date tables. The Calendar() function returns a table of continuous dates within a given range. So, the expression

CalendarDemo =
CALENDAR ( "1/1/2015", "2/2/2015" )

will return a table with one column containing all the dates from January 1st 2015 to February 2nd 2015:

The CalendarAuto() function looks at all of the Date columns in all of the other tables in the model, and returns a similar table but one where the first date is the beginning of the year that contains the earliest date found in any non-calculated column in any non-calculated table, and where the last date is the end of the year that contains the latest date found in any non-calculated column in any non-calculated table. By default the beginning of the year is January 1st and the end of the year is December 31st, but there’s also an optional parameter to specify a different month to end the year on, if you want to create a fiscal calendar table.
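For example, to create a fiscal calendar table whose years end on June 30th, you can pass the month number of the last month of the fiscal year:

```dax
FiscalDates =
CALENDARAUTO ( 6 )
```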

I’m going to be speaking at two SQL Relay events in October. SQL Relay is a series of one-day SQL Server events held in various places all over the UK and is always well worth attending. I’ll be speaking at:

[This blog post is relevant to Power Query in Excel 2010/2013, the Get & Transform section on the Data tab in Excel 2016, and the Get Data screen in Power BI Desktop. I’m going to use the term ‘Power Query’ in this post to refer to all of the previously mentioned functionality]

Sometimes, when you’re working with a table of data in Power Query, you want to be able to get the value from just one cell in that table. In this blog post I’ll show you how you can do this both in the UI and in M code, and talk through all of the more advanced options available in M. Incidentally this is a topic I covered in some detail in the M chapter of my Power Query book, but since that book is now somewhat out-of-date I thought it was worth covering again in a blog post.

Referencing Cell Values In The UI

Imagine your data source is an Excel table that looks like this:

If you import it into Power Query, and you want to get the value in the cell on the second row in ColumnB, then in the Query Editor window you just need to right-click on that cell and select Drill Down:

…and bingo, you have the value 5 returned:

Note that this is the value 5, and not the value 5 in a cell in a table – a Power Query query can return a value of any data type, and in this case it’s going to return a single integer value rather than a value of type table. If you load the output of this query into Excel you’ll still see it formatted as a table, but if you’re using the output of this query as an input for another query (for example, you might want to read a value from Excel and use that value as a filter in a SQL query) it’s much more convenient to have an integer value than a table with one column and one row.
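As a sketch of that pattern (the database, table and column names here are hypothetical, and I'm assuming the drill-down query above has been named GetCellValue), a second query can use the single value directly in a filter condition:

```m
let
    Source = Sql.Database("localhost", "adventure works dw"),
    Sales = Source{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
    //GetCellValue returns a single integer, not a table,
    //so it can be compared directly against a column value
    Filtered = Table.SelectRows(Sales, each [OrderQuantity] = GetCellValue)
in
    Filtered
```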

Referencing Cell Values in M

You can see in the screenshot above the M code that the UI generated for the Drill Down, and you can probably guess how it works. Here's a cleaned-up version of the query from the previous section for reference:
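It looks something like this (the source table name MyTable is an assumption):

```m
let
    Source = Excel.CurrentWorkbook(){[Name = "MyTable"]}[Content],
    //set the data types for all three columns to Whole Number
    ChangeDataTypes = Table.TransformColumnTypes(
        Source,
        {
            {"ColumnA", Int64.Type},
            {"ColumnB", Int64.Type},
            {"ColumnC", Int64.Type}
        }
    ),
    //return the value on the second row of ColumnB
    GetMiddleCell = ChangeDataTypes{1}[ColumnB]
in
    GetMiddleCell
```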

ChangeDataTypes sets the data types for the three columns in the table to be Whole Number

GetMiddleCell returns the value from the middle cell of the table returned by the ChangeDataTypes step

M allows you to refer to individual values in tables by a system of co-ordinates, using the name of the column and the zero-based row number (tables in Power Query are always assumed to be sorted, which makes it possible to ask for a value from the nth row). So the expression

ChangeDataTypes{1}[ColumnB]

returns the value from the cell on the second row in the column called ColumnB of the table returned by ChangeDataTypes, which is 5. Similarly, the expression

ChangeDataTypes{0}[ColumnC]

returns the value 3, which is the value in the column ColumnC on the first row.

It’s also worth pointing out that the row and column reference can be in any order, so the expression

ChangeDataTypes{1}[ColumnB]

…returns the same value as

ChangeDataTypes[ColumnB]{1}

As you’ll see in a moment, the order that the row and column reference come in could be important.

Referring To Rows Or Columns That Don’t Exist

What happens if you write an expression that refers to a row and/or column that doesn’t exist? You get an error of course! So using our example query, the expressions

ChangeDataTypes{4}[ColumnB]

and

ChangeDataTypes{1}[ColumnD]

…will both return errors because there isn’t a fifth row in the table, and there isn’t a column called ColumnD.

However, instead of an error you can return a null value by using the ? operator after the reference. For example, the expression

ChangeDataTypes{1}[ColumnD]?

returns the value null instead of an error:

You have to be careful though! The expression

ChangeDataTypes{4}?[ColumnB]

still returns an error, not because there isn’t a fifth row but because the reference to the fifth row returns a null value and there is no column called ColumnB in this null value.

The solution here is to reverse the order of the references, like so:

ChangeDataTypes[ColumnB]{4}?

or even better use ? with both references:

ChangeDataTypes{4}?[ColumnB]?

Unfortunately using ? won’t stop you getting an error if you use a negative value in a row reference.

The Effect Of Primary Keys

Did you know that a table in Power Query can have a primary key (ie a column or columns whose values uniquely identify each row) defined on it? No? I’m not surprised: it’s not at all obvious from the UI. However there are several scenarios where Power Query will define a primary key on a table, including:

When you import data from a table in a relational database like SQL Server, and that table has a primary key on it

When you use the Remove Duplicates button to remove all duplicate values from a column or columns, which behind the scenes uses the Table.Distinct() M function
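For example, a query that uses Table.Distinct() and then references a row by key might look like this (the source table and column names are assumptions):

```m
let
    Source = Excel.CurrentWorkbook(){[Name = "MyTable"]}[Content],
    //Remove Duplicates generates Table.Distinct(), which also
    //defines a primary key on the MyKeyColumn column
    RemovedDuplicates = Table.Distinct(Source, {"MyKeyColumn"}),
    //reference a row by its primary key value, not its position
    GetRow = RemovedDuplicates{[MyKeyColumn = "SecondRow"]}[ColumnB]
in
    GetRow
```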

The last step is the important one to look at: the row in the reference is identified not by its row number but by a value from the primary key column instead:

RemovedDuplicates{[MyKeyColumn="SecondRow"]}[ColumnB]

You can still use the previous row number-based notation, but when a table has a primary key column defined on it you can also use a value from the primary key column to identify a row.

A Last Warning About Performance

Being able to reference individual values like this is incredibly useful for certain types of query and calculation. However, bear in mind that there are often many different ways of solving the same problem and not all of them will perform as well as each other. One obvious use of the techniques I've shown in this post would be to write a previous period growth calculation, where you need to refer to a value in a previous row in a table – but my experience is that writing calculations using row and column references prevents query folding and leads to poor performance, and an alternative approach (maybe like the ones shown here and here involving joins) often performs much better. There aren't any general rules I can give you though, you just need to make sure you test thoroughly.
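To illustrate the join-based alternative (the Sales table and SalesAmount column are hypothetical), one approach is to add an index column and then merge the table with a copy of itself shifted by one row:

```m
let
    Source = Excel.CurrentWorkbook(){[Name = "Sales"]}[Content],
    //add a zero-based index to identify each row
    AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    //shift the copy down one row by incrementing its index
    Shifted = Table.TransformColumns(AddIndex, {{"Index", each _ + 1}}),
    //join each row to the previous row
    Merged = Table.NestedJoin(AddIndex, {"Index"}, Shifted, {"Index"},
        "Previous", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Previous",
        {"SalesAmount"}, {"PreviousSalesAmount"}),
    //growth relative to the previous row
    AddGrowth = Table.AddColumn(Expanded, "Growth",
        each [SalesAmount] - [PreviousSalesAmount])
in
    AddGrowth
```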

A month or so ago, before I went on holiday, I was working on a really cool MDX idea that involved the Axis() function. Unfortunately I’ve forgotten what that idea was but while I was working on it I did find out something interesting about the Axis() function – namely that it doesn’t do exactly what the documentation says it does.

The documentation says that the Axis() function returns the set of tuples on a given axis in an MDX query. Here’s a simple example query on the Adventure Works cube showing it in action:

WITH
MEMBER MEASURES.TEST AS SETTOSTR(AXIS(1))
SELECT {MEASURES.TEST} ON 0,
[Customer].[Gender].MEMBERS ON 1
FROM
[Adventure Works]

Here, I’m using the SetToStr() function to take the set returned by the Axis() function and display it in a calculated measure. As you can see from the screenshot, I’m showing all three members from the Gender hierarchy on the Customer dimension on rows and the set returned by Axis(1) is indeed that set.

BUT, now look at this second query and what it returns:

WITH
MEMBER MEASURES.FIRSTMEMBER AS
MEMBERTOSTR(AXIS(1).ITEM(0).ITEM(0))
MEMBER MEASURES.TEST AS
IIF(
[Customer].[Gender].CURRENTMEMBER.UNIQUENAME =
MEASURES.FIRSTMEMBER, NULL, 1)
SELECT MEASURES.TEST ON 0,
NON EMPTY
[Customer].[Gender].MEMBERS ON 1
FROM
[Adventure Works]

Why is this interesting? The calculated measure FIRSTMEMBER returns the unique name of the first member in the set returned by Axis(1), which should be the first member shown on the rows axis. The calculated measure TEST returns null if the currentmember on the Gender hierarchy has the same unique name as the member returned by FIRSTMEMBER. The calculated measure TEST is on columns in the query, and on rows we get all the members on the Gender hierarchy that return a non null value for TEST. Since only Female and Male are returned, the All Member on Gender must return null for TEST, which means that the All Member is the first member in the set returned by the Axis() function.

So, to summarise: the Axis() function actually returns the set of tuples on an axis of the current query before any NON EMPTY filtering is applied.

Office 2016 is on the verge of being released, and although Power BI is the cool new thing Excel 2016 has added several new BI-related features too. What is also interesting – and less well publicised – is that several of the BI features in Excel 2016 have been rebranded. Specifically:

Power Query (which is now no longer an add-in, but native Excel functionality) is not called Power Query any more, but “Get & Transform”. It has also been squeezed onto the Data tab, next to the older data import functionality:

Power Map is not called Power Map any more, but “3D Maps”

Power View is still Power View, but as John White points out here it is no longer visible on the ribbon by default, effectively hiding it from new users, although it's easy to add the Power View button back onto the ribbon. Power View in Excel 2016 is unchanged from Power View in Excel 2013. Read into this what you will.

Although Power Pivot still has its own tab on the ribbon (and has finally got a space in the middle of its name), there's also a "Manage Data Model" button on the Data tab in the ribbon that is visible even when the Power Pivot add-in has not been enabled; clicking this button opens the Power Pivot window. There's a subtle distinction between Power Pivot the add-in and the Excel Data Model (the database engine behind Power Pivot, present in all Windows desktop editions of Excel regardless of whether the add-in is enabled or not) that has existed since Excel 2013 and is generally unknown or misunderstood. The fact that this button is titled "Manage Data Model" rather than "Power Pivot" is telling.

All the add-ins now have the collective name “Data Analysis add-ins” and can be enabled with a single click:

So, clearly Excel has moved away from branding all its BI functionality as Power-something. My guess, informed by various conversations with various people in the know, is that this has happened for a couple of reasons:

The ‘Power’ prefix was intimidating for regular Excel users, who thought it represented something difficult and therefore not for them; it also made it look like this was functionality alien to Excel rather than a natural extension of Excel.

Having separate Power add-ins led to a disjointed experience, rather than giving the impression that all of these tools could and should be used together. It also made comparisons with other competing tools, by analysts and in corporate bake-offs, difficult: were the Power add-ins separate tools, or should they be considered a single tool along with Excel?

Previously there was a lot of confusion about whether these add-ins are anything to do with ‘Power BI’ or not. Up to now, depending on who you talked to, they either were or weren’t officially part of Power BI. Now there is a clear distinction between Excel and Power BI, despite the close technical relationships that remain.

The new names certainly address these problems and on balance I think this change was the right thing to do, even if I was quite annoyed when I first found out about them. There are significant downsides too: for example, changing the names means that several years of books, blog posts, articles and conference presentations about Power Query and Power Map now won’t be found by new users when they search the internet for help. Similarly, it won’t be obvious to new users that a lot of content is relevant for both Power BI Desktop and Excel. Now that the Power Query name has been de-emphasised, why should anyone looking at my old blog posts on that subject know that what I’ve written is still relevant for Excel 2016’s “Get & Transform” and Power BI Desktop? What would I call a second edition of my Power Query book, if I wrote one, given that Power Query exists only as the relatively nondescript “Get & Transform” in Excel 2016 and “Get Data” in Power BI Desktop?

