Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I know a good bit about regular old row-store indexes, like the b-tree structure, differences in storage between the leaf level and b-tree pages, affects of included fields, optimizing to use them, order of keys, etc.

I'm having difficulty getting any good info on the internals of a columnstore index.

How is it structured?

Is there a b-tree? Some other structure in place?

How is data organized?

What sorts of specific operators are best suited to use it?

Any other anti-patterns to be avoided when using them?

A lot of what I can find out about them are basically the exact opposite of a "normal" index, i.e. no ordering of keys, no included fields, nonclustered ONLY.

There's quite a lot of fanout about technical implementations of column store databases on the wikipedia page. I imagine the index is just a column store data structure for a single column along with its keys. Maybe it uses a bitmap index, maybe a BTree.
–
ConcernedOfTunbridgeWellsJan 9 '12 at 21:54

It's actually for multiple columns. Also I'm assuming as with other SS implementations it'll be a little different than other products
–
JNK♦Jan 9 '12 at 21:55

@ConcernedOfTunbridgeWells - Do columnstore indexes use bitmap indexes?No. Columnstore indexes use a proprietary data representation based on Vertipaq. It’s not the same as a bitmap index and doesn’t use one. But it has some similar benefits to bitmap indexes, such as reducing the time it takes to filter on a column with a small number of distinct values.
–
Martin SmithJan 10 '12 at 10:19

1 Answer
1

Columnstore data is physically stored in one or moresegments (regular LOB allocation units) per column, and may also be partitioned in the usual way. Each segment contains roughly one million rows of highly-compressed values or value references (several compression techniques are available). A value reference links to an entry in one of up to two hash dictionaries.

Dictionaries are pinned in memory during query execution, with data value IDs from the segment being looked in the dictionary whenever execution requires the actual data value (this lookup is deferred for as long as possible for performance reasons).

Segments also have a header record containing metadata such as the minimum and maximum values stored in the segment. Information from the header can often be used to eliminate complete partitions from processing at execution time. Header record information is stored in the usual LOB data root structure, so eliminating a segment means the Storage Engine can skip reading the LOB data pages from physical storage entirely. Maximizing the potential for elimination can require careful design, including a dependency on the clustered index order at the time the Columnstore index is built.

Specific Plan Operators

SQL Server 2012 introduces a new execution mode called Batch Mode. In this mode, packets of roughly 1000 rows are passed between operators, significantly improving processor utilization efficiency. Within each packet, columnar data is represented as a vector. Not all plan operators support batch mode operation, but examples of those that do include Columnstore Index Scan, Hash Inner Join, Batch Hash Table Build, Bitmap Filter, Hash Aggregate (not scalar aggregates), Filter, and Compute Scalar (for projection and expression evaluation). Query execution plans have been enhanced to show estimated and actual execution mode.

Anti-Patterns

There are a large number of restrictions in the first release, including constraints on allowable data types. Most common types are supported; unsupported data types include DECIMAL with a precision greater than 18 digits, (N)VARCHAR(MAX), UNIQUEIDENTIFIER, CLR types, and (VAR)BINARY.

Use of string types, OUTER JOIN, IN, EXISTS, NOT IN, OR, UNION ALL may result in significantly reduced performance (Row Mode execution), unless workarounds are employed that typically involve unusual syntax rewrites as shown in the linked articles in this section.