Problem

An MSSQLTips webinar was presented about the columnstore index features in the
SQL Server 2016 database engine. This tip answers some of the questions asked during
the webinar. You can view the webinar on demand
here.

Solution

Question 1 – Rebuilding a Columnstore Index

With SQL Server 2016 rebuilding columnstore indexes is not mandatory anymore to get rid of
columnstore index "fragmentation", but is it always efficient regarding the
fragmentation level of the index?

Columnstore indexes don’t have fragmentation in the same sense that traditional
(aka row-store) indexes have. The “fragmentation” that you have in columnstore
indexes is mainly determined by the number of deleted rows in a rowgroup.

Maintenance on a columnstore will focus on two parts:

The deleted bitmap. This is a bitmap which indicates which rows have been
(logically) deleted from the rowgroups. This is often referred to as “columnstore
index fragmentation”. You can also have segments where all data has been
deleted, which can also count as some sort of fragmentation.

The open delta stores. These are row-storage heaps where new data is inserted
into before it is moved to a compressed row group.

You have two maintenance commands for indexes:

Reorganize. This will invoke the Tuple Mover,
which will turn closed delta stores into compressed rowgroups. If you specify
the hint COMPRESS_ALL_ROW_GROUPS_ON, all delta stores (closed and open) are
compressed. Since SQL Server 2016, reorganize is a bit more aggressive:

When 10% or more of the rows in a rowgroup are deleted, they are
physically removed and the rowgroup is recompressed.

Multiple rowgroups can be combined into one single rowgroup to reach
the maximum of 1,024,576 rows.

A combination of the previous two is possible where logically deleted
rows are removed and rowgroups are merged together.

Rebuild. The entire columnstore index will be rebuild.
Two copies of the columnstore index will exist at the same time, where the old
one is dropped once the rebuild is finished. Data can be reshuffled between
rowgroups (which means alignment for segment elimination can be disturbed).
All deleted rows are removed and there are no open delta stores.

To answer the question: reorganize will only help with fragmentation (deleted
rows) in SQL Server 2016 if there are more than 10% rows deleted in the rowgroup.
If not, only rebuild can get rid of deleted rows.

Question 2 – How much data should be in a table before a columnstore index makes
sense?

A columnstore index really pays off when you have large scans of the data. This
is because of the high compression ratio and segment elimination. In order to be
a bit useful, a columnstore index should at least have a couple of rowgroups, which
means a couple millions of rows.

However, it’s not only the number of rows that count, but also the size
on disk. If your table is only 50MB, you probably won’t notice much difference
in IO savings (but your queries can run in batch mode though). Once your table goes
over 1GB, you can see some real improvements because compression might reduce the
size to about 100MB (theoretically). You can also have great compression savings
by using page compression though. So it’s the combination of a large number
of rows and large size on disk – combined with typical data warehouse queries
- that becomes the sweet spot for columnstore indexes.

Question 3 – Rebuilding the Index

Rather than rebuilding a clustered columnstore index, would it be better to drop
it, create a clustered rowstore index on the column you're going to query,
then build the columnstore index?

The only advantage you will get from this method is that the columnstore index
segments will be aligned for the column specified in the row-store clustered index.
This improves segment elimination. If you just rebuild your columnstore index, your
segments might become misaligned.

Question 4 – Which columns should you include in the columnstore index?

If you create a clustered columnstore index, you have no choice: all columns
are included. If you create a non-clustered columnstore index you have the option
to specify columns. As a general rule I would include all columns, unless:

You are certain the column will never be needed by an analytical query for
which the index is created

The column is not supported by a columnstore index. You can find a list
of limitations in the
documentation.

If the column has too many distinct values and cannot be easily compressed.
A good example are comments fields. They are highly unique and typically contain
lots of text. This will blow up the size of the dictionary and results in rowgroups
with less rows than the maximum of 1,024,576 rows.

Question 5 - Will adding the column store index on a table impact the performance
of database?

Yes. The whole point of adding columnstore index is to improve read performance,
especially for data warehouse style of queries. However, like all indexes, they
will also impact queries that modify data (inserts, deletes and updates). Especially
updates are slower, because they are split out into deletes (which cause fragmentation
as discussed in the first question) and inserts (which might end up in open uncompressed
delta stores).

Question 6 – Enterprise Edition only?

Are columnstore indexes only supported in Enterprise Edition? Will columnstore indexes also work
in Developer Edition or any other edition?

In SQL Server 2012 and 2014 columnstore indexes are a feature of Enterprise Edition.
However, as for all releases of SQL Server, some editions have all the features
of Enterprise Edition:

Developer Edition

Evaluation Edition

This means you can use Enterprise, Developer and Evaluation edition to test out
columnstore indexes.

Since SQL Server 2016 service pack 1, columnstore indexes are also available
in Standard Edition (the Business Intelligence Edition has been removed in SQL Server
2016), as well as in Web and Express edition. So make sure you upgrade your SQL
Server 2016 instance to the latest service pack!

Question 7 – Compression

Using column store indexes the data is compressed right? What is the
percentage of savings that we get in storage? What percentage is the data
compressed?

Compression is one of the best features of columnstore indexes. You might expect
compression up to 10x less the original size. However, a lot depends on the number
of rows and the columns. For the columns, two factors play a very big role in the
compression ratio:

The type of data. Integers will probably compress better than decimals
because of the higher precision.

The cardinality (or uniqueness) of the data. A column containing country
names (lots of duplicates) will compress better than a column containing employee
names, because the latter has more distinct values.

Question 8 – Columnstore Indexes and OLTP

Can a columnstore index be used in an OLTP database?

Absolutely. This is referred to as “real-time operational analytics”
by Microsoft. You can even add columnstore indexes on an in-memory OLTP table (which
makes the columnstore index really “in memory”). However, there are
some guidelines to consider:

Preferably you would use a non-clustered columnstore index (since SQL Server
2016) instead of clustered columnstore index. Since OLTP systems are optimized
for write access (insert, update, delete), you might want to avoid this in a
clustered index, since these types of operations are slower for a clustered
columnstore index.

Only use columnstore indexes if you have analytical queries that scan large
amounts of data in your OLTP system. Traditional indexes are far better in queries
that seek data or those who scan a small amount of data.

As usual, test well before you implement.

Question 9 – Candidate Tables

How to identify the best candidates tables for a columnstore index?

As mentioned a couple of times in other answers, columnstore indexes work best
for large tables. In a data warehouse scenario, these are most likely your biggest
fact tables and possible very large dimensions.

About the author

Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name
*Email
Email me updates

Signup for our newsletter
I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.