Partitioned Clustered Columnstores - Mind your deltastores!

March 10, 2018

We use hourly partitioning coupled with a clustered columnstore to support ingestion and short-term retention of various telemetry data. A fairly common pattern for us is to partition hourly on the InsertDate of the data (a column with a default of getutcdate()).
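A minimal sketch of that pattern might look like the following. The object names (pf_HourlyInsertDate, ps_HourlyInsertDate, dbo.Telemetry) and boundary values are illustrative, not our production objects:

```sql
-- Hourly RANGE RIGHT partition function on the insert timestamp;
-- in practice you'd have one boundary per hour of retention.
CREATE PARTITION FUNCTION pf_HourlyInsertDate (datetime2(3))
AS RANGE RIGHT FOR VALUES
    ('2018-03-10 00:00', '2018-03-10 01:00', '2018-03-10 02:00');

CREATE PARTITION SCHEME ps_HourlyInsertDate
AS PARTITION pf_HourlyInsertDate ALL TO ([PRIMARY]);

-- Clustered columnstore table, partitioned on InsertDate,
-- which defaults to getutcdate() as described above.
CREATE TABLE dbo.Telemetry
(
    ServerId   int           NOT NULL,
    Payload    varchar(8000) NULL,
    InsertDate datetime2(3)  NOT NULL
        CONSTRAINT DF_Telemetry_InsertDate DEFAULT (getutcdate()),
    INDEX ccx_Telemetry CLUSTERED COLUMNSTORE
) ON ps_HourlyInsertDate (InsertDate);
```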

One thing we’ve had to keep an eye on when using this kind of pattern is the number of open deltastores that can get left behind as inserts ‘move on’ to a new partition.

Hourly partitions are used because they are also the unit of truncation as data ages out in our environment. Due to the pain of these 'dangling deltastores' (as well as high partition counts impacting DMV usage), we're looking at making daily the smallest partition size we support.

When partitioning a clustered columnstore table you will have one or more deltastores per partition. The largest I’ve seen so far is 20 open deltastores in a single partition (with row counts ranging from 10k to 1M), though 4 is about average for this particular environment.

These open deltastores have a number of drawbacks:

Increased size on disk for uncompressed rows

Lack of rowgroup elimination for the rows in the deltastores

Reduced query performance for queries which read the deltastores

The rest of this post will cover these in a bit more detail, as well as how to see if you’re impacted and what to do about it.

Increased size on disk

The first example is best shown by a table which had escaped any kind of index maintenance:

The deltastores might hold less than 20% of the rows, but they account for more than 80% of the space used by the table! In the example above, compressing the open rowgroups reduced the table from 500GB to 150GB.
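One way to see this split for yourself is to group rowgroup sizes by state. This is a sketch against a stand-in table name (dbo.Telemetry):

```sql
-- Compare space used by open deltastores vs compressed rowgroups
-- for a single clustered columnstore table.
SELECT  state_description,
        COUNT(*)                          AS row_groups,
        SUM(total_rows)                   AS total_rows,
        SUM(size_in_bytes) / 1024 / 1024  AS size_mb
FROM    sys.column_store_row_groups
WHERE   object_id = OBJECT_ID('dbo.Telemetry')
GROUP BY state_description;
```

A lopsided size_mb for the OPEN rows relative to their share of total_rows is the symptom described above.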

Lack of rowgroup elimination

Rowgroup elimination is the ability of a query against the columnstore to skip entire rowgroups based on the metadata about min/max values stored for each column segment in the rowgroup (see sys.column_store_segments). When querying our telemetry table for a single record in the last day (already leveraging partition elimination to limit our query), we’ll also supply a predicate for a ServerId we know doesn’t exist. This is beyond the max value that should exist in any rowgroup.
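A sketch of that kind of probe query, using illustrative names and a made-up ServerId:

```sql
-- Restrict to the last day (partition elimination) and probe for a
-- ServerId beyond any value that exists, so every compressed rowgroup
-- can be eliminated from its segment metadata alone.
SET STATISTICS IO, TIME ON;

SELECT  *
FROM    dbo.Telemetry
WHERE   InsertDate >= CONVERT(datetime2(3), DATEADD(DAY, -1, GETUTCDATE()))
AND     ServerId = 99999999;
```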

Partition elimination won’t work if your predicate’s data type is more precise than the column you’re partitioning on. In our case the value from getutcdate() would have been converted to datetime2(7), more precise than our InsertDate column, which is datetime2(3), resulting in a full table scan [ouch]. Segment elimination has no such qualms about data type precision!
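The two shapes side by side, assuming the same illustrative table, show the fix is simply casting the boundary back to the column's precision:

```sql
-- Predicate evaluated at datetime2(7): no partition elimination,
-- every partition is scanned.
SELECT COUNT(*)
FROM   dbo.Telemetry
WHERE  InsertDate >= DATEADD(DAY, -1, GETUTCDATE());

-- Explicitly convert to the partitioning column's datetime2(3):
-- partition elimination works as intended.
SELECT COUNT(*)
FROM   dbo.Telemetry
WHERE  InsertDate >= CONVERT(datetime2(3), DATEADD(DAY, -1, GETUTCDATE()));
```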

The statistics of this query show that every segment was skipped (they were successfully eliminated), yet we spent 10 seconds doing an awful lot of reads. These reads were against the deltastores, as SQL Server had to check every single row to ensure none of them were for ServerId 99999999.

Reduced query performance

In addition to a lack of rowgroup elimination, there are additional performance issues when operating against deltastores. Although deltastore scans do seem to benefit from batch mode, they don’t benefit from aggregate pushdown.

How to check if you’re impacted

The DMV sys.column_store_row_groups tells you the status of each of your rowgroups - you’re looking for any open rowgroups that are no longer being inserted into, which is extremely common if you partition by an ever-increasing datetime value, more so as your partitions get smaller.
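A sketch of that check, again with a stand-in table name; any OPEN rowgroup in a partition older than the one currently taking inserts is a candidate for compression:

```sql
-- List open deltastore rowgroups per partition.
SELECT  partition_number,
        row_group_id,
        state_description,
        total_rows,
        size_in_bytes
FROM    sys.column_store_row_groups
WHERE   object_id = OBJECT_ID('dbo.Telemetry')
AND     state_description = 'OPEN'
ORDER BY partition_number, row_group_id;
```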

Fixing the problem

Although the command is easy, the impact on your system might not be if you have a lot of open rowgroups! You might instead opt to compress only a subset of partitions (much more typical if you have active partitions still accepting inserts):
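Both variants sketched below; the index and table names are illustrative, and the partition number would come from the DMV query above:

```sql
-- Compress every open rowgroup in the table
-- (can be heavy if many rowgroups are open):
ALTER INDEX ccx_Telemetry ON dbo.Telemetry
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);

-- Or target a single older partition, leaving the
-- active partition's deltastore alone:
ALTER INDEX ccx_Telemetry ON dbo.Telemetry
REORGANIZE PARTITION = 42
WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```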