SQL Server, T-SQL, BI and more…

Menu

Distinct Count in Analysis Services

Business users are often interested in finding distinct counts in addition to other aggregations. They may want to know the number of distinct customers calling customer support, the number of distinct employees selling business, or the number of distinct products that are being sold. Note the difference between distinct count and the other aggregations usually modeled in OLAP cubes; think sum of sales, count of orders.

Requirement:

I’ll use an online forum website as an example. Assume *SiteX* has several users, and these users are allowed to post on forums daily. Given this, let’s suppose, we want to find the count of distinct users that are posting in forums on any given day.

Star Schema:

I have created a star schema with two dimension tables, DimUser to store all the users, and DimDate to store all the dates. Fact table FactForum records user activity on forums.

DimData and DimUser has five rows each. The following image shows the Fact data – all the posts were posted by three distinct users.

Cube:

Next, I’ve created a cube with dimensions and measure group from the star schema.

The following is a snapshot from the cube browser, showing the total posts sliced by user and date.

It’s a walk in the park thus far. ain’t it?

Distinct Count:

Next, I created a new measure from the measures pane, configured usage to count of rows, and source table to DimUser. This creates a measure group from a dimension, in other words, dimension DimUsers now acts as a dimension as well as a measure.

The above dimension usage shows that DimUsers measure group is related to DimUsers dimension. The relationship is a fact relationship.

Lastly, I added a many to many relationship between DimUsers and DimDate. Many to many requires an intermediary measure group which in this case is FactForums.

The following image from the cube browser shows total posts and number of distinct users posting over calendar date (dimension).

Conclusion:

This is one way of modeling distinct count in analysis services. I find this useful for several reasons, one of which is performance.