SQL Server Statistics Density – What is the purpose?

There is a table in AdventureWorks2008R2 called SalesOrderHeader which has a column called TotalDue. Let us create statistics dennsity for this column

Transact-SQL

1

2

3

4

5

USE[AdventureWorks2008R2]

GO

CREATESTATISTICS[st_TotalDue]ON[Sales].[SalesOrderHeader]([TotalDue])

GO

Now, let us display the statistics that we have just created:

Transact-SQL

1

2

DBCCShow_Statistics('Sales.SalesOrderHeader',st_TotalDue)

GO

We all know the purpose of statistics, right? It helps the SQL Query Optimizer in cardinality estimation. In other words, it helps the optimizer to produce an efficient query execution plan.

What exactly is the purpose of DENSITY factor here?

Density can be calculated as 1/ distinct values and can be cross verified using the below query.

Transact-SQL

1

2

3

4

Select[DistinctValue]=Count(DistinctTotalDue),

[Density]=1/CAST(Count(DistinctTotalDue)ASFloat)

fromSales.SalesOrderHeader

GO

But what does this mean and where does it help really?
Density, just like other items in statistics describes the data. It is a calculation of uniqueness of data in that column. It answers the question: “How often duplicate values occur in TotalDue column?”

High density would mean less unique data. And low density would mean greater uniqueness. Density can only be between 0 and 1. So what if density is 1? It means 100% duplicate records.

Density information is vital to improve the Query Optimizer’s estimates for GROUP BY operations, and that’s just one of the uses.

Let us consider the same example as above, the number of distinct values in TotalDue column of Sales.SalesOrderHeader table is 4754. Density can be computed, as mentioned earlier, as 1 / “number of distinct values,” which will be 1 / 4754, which is 0.000210349179638199 as shown in the above image.

The Query Optimizer can benefit from already available density information to estimate the cardinality of GROUP BY queries (estimated number of distinct values). Since this density information is already available, now all the optimizer needs to do is to calculate the estimated number of distinct values by calculating the reciprocal of the density value. As shown earlier in the verification code, to estimate the cardinality of the following query using GROUP BY TotalDue, we can calculate the reciprocal of the TotalDue density which will be 1 / 0.000210349179638199, that gives us 4754, and that is the estimated number of rows as shown below:

[Turn on Actual Execution Plan]

Transact-SQL

1

2

SELECTTotalDuefromSales.SalesOrderHeader

GROUPBYTotalDue

In my next post I shall show how density information can be used to estimate the cardinality of a query using local variables.