Events-In-Progress for Time Periods in DAX

Calculating the Events-In-Progress is a very common requirement and many of my fellow bloggers like Chris Webb, Alberto Ferrari and Jason Thomas already blogged about it and came up with some really nice solutions. Alberto also wrote a white-paper summing up all their findings which is a must-read for every DAX and Tabular/PowerPivot developer. However, I recently had a slightly different requirement where I needed to calculate the Events-In-Progress for Time Periods – e.g. the Open Orders in a given month – and not only for a single day. The calculations shown in the white-paper only work for a single day so I had to come up with my own calculation to deal with this particular problem.

Before we can start we need to identify which orders we actually want to count if a Time Period is selected. Basically we have to differentiate between 6 types of Orders for our calculation and which of them we want to filter or not:

Order

Definition

Order1 (O1)

Starts before the Time Period and ends after it

Order2 (O2)

Starts before the Time Period and ends in it

Order3 (O3)

Starts in the Time Period and ends after it

Order4 (O4)

Starts and ends in the Time Period

Order5 (O5)

Starts and ends after the Time Period

Order6 (O6)

Starts and ends before the Time Period

For my customer an order was considered as “open” if it was open within the selected Time Period, so in our case we need to count only Orders O1, O2, O3 and O4. The first calculation you would usually come up with may look like this:

[MyOpenOrders_FILTER] :=

CALCULATE (

DISTINCTCOUNT ( 'Internet Sales'[Sales Order Number] ),

FILTER (

'Internet Sales',

'Internet Sales'[Order Date]

<= CALCULATE ( MAX ( 'Date'[Date] ) )

),

FILTER (

'Internet Sales',

'Internet Sales'[Ship Date]

>= CALCULATE ( MIN ( 'Date'[Date] ) )

)

)

We apply custom filters here to get all orders that were ordered on or before the last day and were also shipped on or after the first day of the selected Time Period. This is pretty straight forward and works just fine from a business point of view. However, performance could be much better as you probably already guessed if you read Alberto’s white-paper.

So I integrate his logic into my calculation and came up with this formula (Note that I could not use the final Yoda-Solution as I am using a DISTINCTCOUNT here):

[MyOpenOrders_TimePeriod] :=

CALCULATE (

DISTINCTCOUNT ( 'Internet Sales'[Sales Order Number] ),

GENERATE (

VALUES ( 'Date'[Date] ),

FILTER (

'Internet Sales',

CONTAINS (

DATESBETWEEN (

'Date'[Date],

'Internet Sales'[Order Date],

'Internet Sales'[Ship Date]

),

[Date], 'Date'[Date]

)

)

)

)

To better understand the calculation you may want to rephrase the original requirement to this: “An open order is an order that was open on at least one day in the selected Time Period”.

I am not going to explain the calculations in detail again as the approach was already very well explained by Alberto and the concepts are the very same.

An alternative calculation would also be this one which of course produces the same results but performs “different”:

[MyOpenOrders_TimePeriod2] :=

CALCULATE (

DISTINCTCOUNT ( 'Internet Sales'[Sales Order Number] ),

FILTER (

GENERATE (

SUMMARIZE (

'Internet Sales',

'Internet Sales'[Order Date],

'Internet Sales'[Ship Date]

),

DATESBETWEEN (

'Date'[Date],

'Internet Sales'[Order Date],

'Internet Sales'[Ship Date]

)

),

CONTAINS ( VALUES ( 'Date'[Date] ), [Date], 'Date'[Date] )

)

)

I said it performs “different” as for all DAX calculations, performance also depends on your model, the data and the distribution and granularity of the data. So you should test which calculation performs best in your scenario. I did a simple comparison in terms of query performance for AdventureWorks and also my customer’s model and results are slightly different:

Calculation (Results in ms)

AdventureWorks

Customer’s Model

[MyOpenOrders_FILTER]

58.0

1,094.0

[MyOpenOrders_TimePeriod]

40.0

390.8

[MyOpenOrders_TimePeriod2]

35.5

448.3

As you can see, the original FILTER-calculation performs worst on both models. The last calculation performs better on the small AdventureWorks-Model whereas on my customer’s model (16 Mio rows) the calculation in the middle performs best. So it’s up to you (and your model) which calculation you should prefer.

The neat thing is that all three calculations can be used with any existing hierarchy or column in your Date-table and of course also on the Date-Level as the original calculation.

I found that in the real world (hard to see in Adventure Works) you can achieve a good optimization by applying a filter on a low-cardinality column in the fact table. For example, if you are filtering a range of dates and the maximum distance between Order Date and Ship Date is 60 days, you might exclude all the “other” years. Depending on data distribution, this type of optimization can be very effective (you should remove at least 80-90% of rows in order to see the improvement).

I am aware that it is very beneficial in terms of performance if the distance between the two used dates (Order Date and Ship Date) is low but what exactly do you mean by “excluding all the ‘other’ years”?
I guess what you wanted to point out is that if e.g. August 2014 is selected and the maximum distance is 60 days the calculation only needs to consider June 2014 until October 2014 which would improve performance significantly

this is an awesome piece of work, but it seems I have some trouble to use.
In my scenario I use an date table that is not related to the facttable, using all of your 3 solutions provide the same results (gladly).

This is something like an aggregated distinctcount over time …

Unfortunately this does not work with PowerView due to the lack of the relationship. As soon as I create the relationship I no longer have my (aggregated distinct count over time). Do you have an idea how I can solve my problem?

Hi Tom,
what exactly do you mean by “it does not work with PowerView”?
I guess you are referring to dynamic filtering? e.g. clicking on a bar-chart?

As soon as you have an active relationship between the Fact-table and the Date-table you would need to adopt the calculations. Otherwise you would have two relationships, the active one and the one that is created on the fly using FILTER/CONTAINS

in my first solution there was no relationship between the date table and the fact table this worked smoothly using the generate/contains approach.
Until PowerView reminded me that it needs a relationship, this was the reason why I wrote “it does not work in PowerView” 🙂

Now I have two date dimensions one is related (the normal date table) and one that is not related (even not inactive), I call it date_calc.

I adjusted the Generate/Contains approach a little and voila. I’m glad that there are some days off ahead, so that there is some time to optimize my solution.

Now I’m able to track the development (evolution) of things within a certain timeframe using a status, where the things reach a status (hopefully) or stay at their first status (the do not evolve).

If your request is simply to find orders which were ordered but not delivered on the very same day, you could imply use this formula:
SoldButNotShippedSameDay:=CALCULATE(
DISTINCTCOUNT(‘Facts'[OrderID]),
FILTER(‘Facts’,
‘Facts'[OrderDate] <> ‘Facts'[ShipDate])
)

I have a similar problem to Wesley. The problem I am encountering is that I am receiving a time period from the report and would like to evaluate if an order was shipped but not received on the first day of that date range.

do you mean that if you select July, you want to find orders that have been shipped on July 1st but not received on July 1st?
or shipped on July 1st but not received in the whole time-period of the report?

I have utilized the dax code behind this idea in an effort to breakdown open case age by month. Had to make a slight modification in terms of Min/Max and the inequality signs to pull the total open cases at the end of the month:

it seems that the operator (“=”) got lost in your comment – assuming that it was an equal sign as in your other comment. then the calculation would simply return all distinct incidents which ARRIVED on the last day of the selected time period.
you would need to filter on both, Arrival-Date and Close-Date.
You should first check out the articles that i linked to at the beginning of the blog and make your calculation work for a single day – then you can adopt it accordingly as described here

I have a similar situtation where user wants to select a date and the report has to find active ppl that point of time. I have only one table which has the start date and end date. And I have a time dimension. I don’t know which column should I join the date dimension to (if i should). As until I join ssas tabular won’t find a unique column.

the links at the very beginning of the post should cover your problem, this post is just an advanced version which also allows you to select a time range (e.g. months or weeks)
In general, you will not have an (active) relationship between your tables as relationships always require you to link two columns directly but do not allow “between”-joins. Thats why we need to use FILTER() and other functions to create this join on the fly.

but there also seems to be an issue with your Date-Table as it should contain unique dates and you should be able to link it to your fact-table – but again, for all Events-In-Progress-Calculations you must not have any active relationships between your tables

I am new to this blog, my apologies, if I am too direct.
I have a problem for which I am unable to find a solution. I was wondering if you could give some solution, would be really appreciative.
I am working in excel powerpivot, have slight knowledge of DAX.
I have a table in which I have userIDs, and their level play details. The details include level no, start time and end time of each level. I want to calculate the session count and approx duration for the for each session.
The sessions will be ofcourse for each individual user ID.
A Session will/can include multiple levels, I am taking all levels that start within 5 minutes of completion of previous level as one sesssion.

you can probably use the very same formula as I used in my example. You dont even need to take care of your +/- 5 mintues in between levels as you can do a distinct count on UserID and it does not matter whether he played 7 levels or just 1 level.
The average session duration will probably be more tricky. You would need a session-id or something, calculating this on-the-fly might not be feasible in DAX. The only thing I could think of is to create an artificial parent-child hierarchy between the different levels of the user and use the first part as the session-identifier

Thanks for this write-up. This is a very common business problem, could be orders or subscriptions like I am working on now. All customers has subscriptions with FromDate and ToDate on them, one row for each subscription, with an ID for the subscription which in addition to the customerid is the row identifier :

So, counting customers/subscriptions has already been solved thanks to Alberto’s and your article here. However, I would also like to use this table to calculate Churn/Retention etc. for our customers. The new and returning customers pattern may be applicable but that is based on sales transactions and in this case that is not good enough, since customers are billed monthly/quarterly etc. So, to get Churn I need to know which customers did I have last period (month if that is selected, year if that is selected) that I don’t have any longer.

Hi Stian,
thats actually a very interesting question!
So you want to know customers which existed in the past but do not exist anymore in the current period. As we cannot count something that does not exist anymore in the current period, we need to do the calculation the other way round – find active customers in the previous period and check if they still have a valid subscription in the current one.
Unfortunately I do not have too much time at the moment so I can just give you a littel snippet from which you can start working:

There are probably some typos in there but you should get an idea:
Find all customers with their SubscriptionIDs for the previous period
Check if they have any newer SubscriptionsIDs (which I assume is an ascending number?)
if they dont have any newer SubscriptionIDs, they churned
you probably need to add an extra check on the subscription validity and if it includes TODAY to avoid that all todays customers are counted as churn 😉

If you are using SSAS 2016 or PowerBI, you might also take a look at the new set functions like EXCEPT in combination with DAX Variables.
You could calculate a list of customers of the previous period and use EXCEPT to remove the cutsomers of the current period so only the lost customers are left. then you can do a simple COUNTROWS and would also get your results

would also be interesting in terms of performance

another approach would be to flag the last SubscriptionID of each customer in a calculated column as = partitioned by customer
having this flag you can simply extend your existing calculation with [Flag] = TRUE() to get only churned customers

I have a very similar case with retail networ where each location of stores across the country has a unique ID. Location are taken over from one company to another (eg. retail chain acquires small shop run by a private owner). Start date and end date of store operations are available. I need to calculate how many time location was taken over during in a selected time period. It would be grate to apply the solution of churning subscribers to my case as well.

I hope Gerhard will help for Stian to go forward with the churn pattern!

I has been a while since you originally posted this. However, I am relatively new with Power BI and I found it extremely helpful. I was able to use your technique to count a client census for our non-profit agency.

As I mentioned above, I was able to use your technique to compute a daily census of clients we have in our child care agency.

However, I am now trying to compute an average daily census by month using thud results. I have tried everything approach I can think of (which is a admittedly limited due to my inexperience with PowerBI) but have not been able to do this. The daily census is correct (using your approach) but I am unable to sum those amounts correctly.

For example, in November 2016 the sum of the daily census is 517. If I manually sum the computed daily census it is correct. I would then divide that by the number of days (30) to get and average daily census of 17.2.

However, I am unable to correctly sum up the daily census amounts for a month. Invariably the amounts are wrong. One measure I tried was this: (Daily Census” is the table and measure I am attempting to sum up)

I obviously do not understand some important concepts of Power BI. I would never have thought to use VALUES and datetable as the first parameter of the AVERAGEX function (I tried many, many variations of SUMX / #periods, AVERAGEX, etc). But now that you point it out it is obvious that the VALUES clause controls the range of dates to average.

I need to keep studying.

Again – I really appreciate you taking the time to reply and educate me!

I have attempted to use your measure; however, it does not seem work… Using Adventure Works 2016 DW, I get respectively 5, 4, and 5 for the first three days of Order Date data (2010-12-29, 2010-12-30, 2010-12-31). Instead, it should be 5, 9, and 14.
In other words, the code from the post is act no different than the distinct count. What am I doing wrong???

Hi Kevin, I would assume that you have an active relationship between ‘Date'[Date] and ‘Internet Sales'[Order Date] which filters the final result back down to the Orders that where placed on the selected day. Can you try to add either ALL(‘Date’) or CROSSFILTER(‘Date'[Date], ‘Internet Sales'[Order Date], None) to the inner ‘Internet Sales’ like this:
CALCULATETABLE(‘Internet Sales’, ALL(‘Date’))