This is the official team web log for SQL Server engineering (TIGER) team – the customer success team that focuses on learning customer workloads running on SQL Server, providing guidance to use SQL Server to solve their business challenges and integrating that feedback to enhance the product.

Columnstore Index – How to Estimate Compression Savings

SQL product team has made significant improvements in columnstore index functionality, supportability and performance during SQL Server 2016 based on the feedback from customers. Please refer to List of Blogs for all blogs published by SQL Tiger Team on columnstore index.

Issue
When SQL Server team released ROW and PAGE compression in SQL Server 2008, customers could invoke sp_estimate_data_compression_savings stored procedure to estimate the storage savings for ROW and PAGE compression. Note, the compression savings was just an estimate based on sampling a subset of rows from the source table and loading them into a temporary table and then measuring the size of this temporary table before/after the compression. For most cases, the compression savings estimate was good except when the data in the source table was skewed. Most customers found it useful as it was a convenient way to see the storage benefits. However, as some of you have found out, this stored procedure has not been extended to estimate storage savings from columnstore index. This is something we could consider for the future.

Workaround
For now, to estimate compression savings for columnstore index, we recommend the following steps

Create a staging table with identical schema

Load 2 million rows into the staging table. Note, I have chosen 2 million arbitrarily but it needs to be at least 1 million.

Use sp_spaceused to find the size of the table

Now create columnstore index on the table

Measure the storage using sp_spaceused.

Compare the numbers in (3) and (5)

The storage savings for the staging table as computed above will be a good estimate to work with. Here is a simple example to show case this

-- find the spaceused in the compressed state and then compare
sp_spaceused 'ccitest_temp'

-- Now you can drop the staging table
drop table ccitest_temp

Nonclustered Columnstore Index (NCCI)
Creating an NCCI does not save storage; in fact, it takes additional storage. If you are interested in finding the size of NCCI, you can follow the same step as above but instead CCI, create an NCCI and the see the increased size.

yes, it is only for CCI. NCCI will not compress the data as it is an additional index. If you are interested in finding the size of NCCI, you can follow the same step as above but instead CCI, create an NCCI and the see the increased size.