Becoming one with CALCULATE()

Intro from Rob

Hey, it starts out simple and powerful: CALCULATE is the SUMIF you always wished you’d had. It works in pivots. It’s the “anything IF.” It’s amazing, really, how many doors it opens.

Of course, CALCULATE is designed to be powerful in ways we can’t even IMAGINE in our first day/week/month of using it. You can spend years discovering all the things it can do – and that’s a good thing! But sooner or later you’re going to hit something with CALCULATE that makes you scratch your head – why is it returning THOSE results?

I myself entered this twilight zone with the Precedence Project – a series of posts that I quickly abandoned. It turns out that, practically speaking, you don’t need to achieve deep theoretical understanding of this stuff in order to achieve great results.

Below, however, Scott does a great job of resolving those mysteries. And he does so by “channeling” two old friends who live at the base of the Alps. Take it away, Scott…

Going to “Graduate School”

All right, so I’ve read Rob’s book a few times. (Heck, I am credited as tech editor on it.) I’ve devoured PowerPivotPro University. So now what, I ask Rob?

“Go forth and conquer – data is your ocean,” is his answer. He’s a practical sort of guy. Me, though? I’m never satisfied until I’ve completely torn the machine apart.

So, as I hinted in my last post, I went to graduate school and spent a few intense days engrossed in Marco and Alberto’s book.

I had to read and re-read the chapter on CALCULATE() three complete times, and for some paragraphs, probably closer to a dozen times. But then it happened. I became one with CALCULATE.

So, in the post, I am not really giving new info. Rob does a great job of the material in his book, again from that practical perspective. The Italians obviously do a great job. There are blog posts from Rob and others that discuss it as well. But this stuff is so fundamental to your understanding of DAX, I think it is worth retelling again. I know for me, I had to read different material again and again before I was one with the Matrix. So maybe my retelling will help you too.

The Boolean Case

CALCULATE() is simply the way we execute an expression in a modified filter context. The nerdy looking syntax: CALCULATE(<expression>,<filter1>,<filter2>…)

Let’s jump into an example. You already have this measure against your sales data.

[Avg Sales] := SUM(Sales[ExtendedAmount]) / COUNTROWS(Calendar)

We drop it on a pivot table broken out by “IsWeekend” and “DayOfWeek” and get something like what you see on the right.

We see that Saturday has our highest average sales, and decide we want to start building out a new measure… say [% of Saturday]. As a step in that direction, we pull out CALCULATE() and get to work!

If a filter parameter is against an existing filter context, it replaces the existing filter.

Otherwise, a new filter is added to the filter context.

Since our pivot table is split out by ShortWeekdayName (that is “on rows”), our filter parameter is just going to replace that filter with ShortWeekdayName = “Sat”. Right?

Well, Yes! That is exactly what happened! What we didn’t mention is that IsWeekday is also in the filter context, and we made no effort to replace that filter… so while the expected value of $26,319 shows up in both rows where it’s the weekend… we just get blanks for weekdays.

We simply add a second filter parameter to CALCULATE (you can pass as many as you like), this time replacing the filter context on Calendar. We could have just as well passed ALL(Calendar[IsWknd]) for the same impact on this pivot table, but it would break again once somebody drops Year/Quarter/etc on a row, column or slicer… so, we just strip away the filter on the whole darn table. We are crazy like that.

Note this is a different “style” of filter parameter to calculate. We are not passing a boolean expression, we passed it a table. That means I am typing in the wrong section and better move along…

The Table Case

Recall there are restrictions when using boolean parameters to CALCULATE(). For example, you can not reference more than one column (as the engine would get confused on which filters to remove). These restrictions (which allow for great performance benefits) are removed by passing a table as the parameter. When you pass a table, you are explicitly telling the engine “Hey DAX engine for this filter, here are all the rows I want you to use. No more, no less”.

We already saw ALL() used above, where we said “Hey DAX, clear off any filters from the Calendar table”. Another common usage is passing the results of FILTER(). What I want to do is compare the same logic we used above (in the boolean case)… but do it in the table case via FILTER().

Well, that’s… interesting. We have the same problem with IsWeekend, but now we have a new problem… there isn’t even a value for Sunday!?

Here is the thing. You see that Calendar parameter to FILTER? It is still under the influence of the original filter context. When we are evaluating Sunday the filter context is { IsWeekend=1, ShortWeekdayName=”Sun” }. From that set of rows, which of them will evaluate Calendar[ShortWeekdayName] = “Sat” to true? None of them! So, you get a blank for that Sunday row.

Will that help us? It certainly “reads nice”, but no. The result would look identical.

We have created a new filter context (with ALL) that strips filters off the Calendar table, but we still have the other filter where ShortWeekdayName must be Saturday. Both filters are applied. That is how CALCULATE works… every filter parameter is added to the filter context and basically AND’ed together. They don’t interact in any way.

If you do something silly like pass two filters, one where WeekdayName is Sat, and one is WeekdayName is Sun… you simply get no rows, because no row is both Sat and Sun.

The solution here is to move that ALL() to surround the first parameter to FILTER().

CALCULATE() is going to execute our expression [Avg Sales] for each row. On each execution the filter context will be different (each of the combinations of IsWeekend and ShortWeekdayName).

The filter context from 1, flows into each of the filter parameters passed to CALCULATE().

Each of the filter parameters to CALCULATE() then modifies the filter context in the same way as the boolean case. If the current filter context already has a filter on the column, it replaces it, otherwise it adds a new filter.

In our case, FILTER() accepts the filtered (say Weekend=Sun, Weekend=1) then nukes it by calling ALL(), then applies the new filter [ShortWeekdayName] = “Sat”… and as a result returns this new table back to CALCULATE(). “Hey DAX, for this filter, here are the rows I want you to use. No more. No less”.

Note that instead of FILTER(ALL(Calendar)… we could have passed FILTER(ALL(Calendar[ShortWeekdayName]) … and the results would be the same as the 2nd pivot table in this post. We would not have cleared the filter on IsWeekend, so Mon-Fri would be blank rows.

Conclusion

For me, this all clicked when I formed the mental model that for each filter parameter to CALCULATE(), a filter context flows in, and then flows out. In my head there are little arrows flowing into the parameter, then flowing back out. That mental image aids my understanding that my parameters (frequently a call to FILTER()) are always impacted by the existing filter context, but then my parameter can add/clear/change however it wishes before returning the filter context to CALCULATE().

If you keep that in mind, along with the understanding that all filter params to CALCULATE() will be AND’ed togther… you too will be one with CALCULATE()!

In general, every time you write a filter argument in CALCULATE such as:

table[column] =

you are in reality writing:

FILTER ( ALL ( table[column] ), table[column] = )

The performance depends on the cardinality (number of distinct values) of table[column]. Usually a column has a lower number of distinct values than the number of rows of the entire table, and for this reason it’s faster.

As far as I know, when you use All() as a top-level function argument to CALCULATE ALL() doesn’t really return a table. It merely removes all filters from that table that existed before the first filter argument of Calculate is processed.

A top-level function argument is an argument that isn’t nested inside another function like FILTER().
So in essence CALCULATE( … , ALL(Calender), …) doesn’t use the entire Calendar table to filter the final calculation dataset, it just make sure that all of the filters set on the calendar table is removed. (a filter might get added back in again by another filter argument of course as is the case here)

For this reason CALCULATE([Avg Sales], Calendar[ShortWeekdayName] = “Sat”, ALL(Calendar)) ought to be the fastest alternative.

That is because:
1.
Calendar[ShortWeekdayName] = “Sat” is transformed into FILTER(ALL(Calendar[ShortWeekdayName]) , Calendar[ShortWeekdayName] = “Sat”) and the ALL(..) here can get the distinct values of Calendar[ShortWeekdayName] from a column index instead of traversing the entire Calendar table. Here the ALL(..) is used with a table field argument instead of the entire table. It’s not a top-level argument since it’s inside the FILTER() function however, but can make use of a column index. The result of the FILTER() will be a one-row-one-column filter table with “Sat” as its value.
2.
The All(calendar) argument is a top-level argument to the CALCULATE in question and it doesn’t traverse the Calendar table nor is the Calendar table inserted as a filter table as explained above.
3.
The alternative CALCULATE expression: CALCULATE([Avg Sales], FILTER(ALL(Calendar), Calendar[ShortWeekdayName] = “Sat”)) would have to traverse the entire calendar table because ALL() is now not a top-level argument to CALCULATE but is nested inside the FILTER() function AND it uses the entire calendar table as the argument. It would have to examine every row to find those with Calendar[ShortWeekdayName] = “Sat”.

—-
Admittedly I don’t know if the rule from Jeffrey Wang’s blog post still applies nor whether the implementation of DAX make efficiently use of column indices. However should that be the case cardinality shouldn’t really matter in this case. (It would be nice to have this confirmed by the people having the right tools to do so though)

At point (3) you say that this expression has to iterate the entire Calendar table:
CALCULATE([Avg Sales], FILTER(ALL(Calendar), Calendar[ShortWeekdayName] = “Sat”))
In reality, internal optimizations might optimize such iteration, so even if the FILTER returns the rows from Calendar, the time required to obtain such a result does not require the test on ShortWeekdayName to be performend on each row of Calendar.
But this is not a general rule and a more complex logical condition will require complete iteration.

Kind of 2 different issues. The arrows in/out is to help me remember that for each filter param, the current/outer filter context is applied before my param does it’s magic… then my updated filter context flows out (and into the big AND). I hope to work on a visualization/graphic for this today to help…

Note: Just in case you try to copy formulas from the blog (to save some time), if there are double-quotes in the formula, you’ll need to (I had to) replace the copied double-quotation characters with ones you type in from your keyboard.

Could be a country-code setting, but it always seems to catch me off-guard.

When you apply a filter on a table (or on a column of a table) having one-to-many relationships with other tables, such a filter propagates to the other tables (those on the “many” side). Thus, if you apply a filter to Customers table, you are also filtering the Sales table considering only rows belonging to the filtered customers.

@Marco
My point is if you say Filter(Customer, [Some Column or Measure in Customer Table] [Some operator] [Some Value]) and this results in less number of rows left standing in the customers Table and therefore filters the Sales Table – This is Fine – The filters flow down hill

But in my example above – [mOrders] is a measure defined on the Sales Table
How come it is being used to Filter the Customers Table – which results in less number of Customer ID’s and this then flows back to the Sales Table !!!
So is the filter context not travelling up (from Sales to Customer) and then Travelling down (Customer to Sales) !!! – Like a Circular Reference !!!

you are saying “iterate the customers table and, for each Customer, check that the result of mOrders measure is greater than or equal to 2”
If you expand the measure, you have:

FILTER(Customers, CALCULATE ( SUM ( Sales[ExtendedAmount] ) ) >=2)

which means “iterate the customers table and, for each Customer, check that the sum of the ExtendedAmount value in Sales table for the rows of the customer is greater than or equal to 2”

The presence of CALCULATE (which is automatically involved when you have a reference to a measure, such as [mOrders] ) invoke what we call a “context transition”. The current row in Customer becomes a filter context, which propagates to the Sales table, filtering only the rows in Sales that correspond to the “current” Customer. This operation is repeated for all the rows of the Customers table iterated by the FILTER statement.

I assume you mean something like:
CALCULATE([Avg Sales], Sales[ExtendedAmount]) > [MedianSales])
And no, from http://technet.microsoft.com/en-us/library/ee634825.aspx
“The following restrictions apply to Boolean expressions that are used as arguments:
The expression cannot reference a measure.”