Clustered Columnstore Indexes – part 32 (“Size Does Matter, but how ?”)

The Columnstore Row Group Size Question:

What is better for SQL Server performance: to have a big Row Groups (1.045.678 rows) or to have them as small as possible (100.000 rows) or maybe even less ?

Any person I have asked this question said that naturally the ideal size for any given Row Group is the maximum one, because once we are reading a lot of rows, we should be getting the optimum performance out of the better compressed Row Groups. This seems to be about right. In theory. :)

The Answer

This blogpost is about discovering if the most obvious answer on this question is actually the correct one.

First of all, please do not forget that the unit of any reading operation when using Clustered Columnstore Indexes is a Segment (as long as you are processing data from the Segment, in the case of Deleted Bitmap or Delta-Stores the story is pretty much different). It does not matter how many rows you are going to read from the Segment – just 1 row or the 1 Million rows, you are going to read each and every page & extent that are storing your Columnstore Index, which is pretty much expensive.

If you are using a predicate on the non-presorted column and you are needing the not that much data out of the most segments, then you just might be doing quite a big number of unnecessary reads. Of course in the real life you will be doing Segment elimination not only on 1, but on each and every column predicate, but in the case whenever you are working with a Table that contains billions of rows, you might be doing some very serious unnecessary reads.

Oh, I wish Microsoft would give us a chance to control the Row Group Size… Oh yeah, and I have already asked for it since the Nonclustered Columnstore Indexes announcement, a couple of years ago…

select *
into dbo.FactOnlineSales_SmallGroups
from dbo.FactOnlineSales;

And now, before executing the Clustered Columnstore Indexes creation, lets lower the amount of memory available for the process, thus forcing it to create Row Groups with smaller amounts of rows. In my case I put the maximum server memory on just 222 MB:NOTE: PLEASE DO NOT EXECUTE THIS ON YOUR SERVER UNLESS YOU KNOW WHAT YOU ARE DOING & RISKING:

The difference in sizes in our case is not that dramatic – we have 172 MB spent on our original table with the maximum number of rows at the Row Groups (1045678) while the smaller sized Row Groups occupy 192 MB, which means just a 20 MB difference.

Don’t get me wrong on this point, I have seen much bigger differences on the tables with a couple of thousands of Row Groups – and it can get very significant. :)

Now we can check the count of Row Groups for each of our test tables:

SELECT object_name(i.object_id) as TableName, count(*) as RowGroupsCount
FROM sys.indexes AS i
INNEr JOIN sys.column_store_row_groups AS rg with(nolock)
ON i.object_id = rg.object_id
AND i.index_id = rg.index_id
WHERE object_name(i.object_id) in ( 'FactOnlineSales','FactOnlineSales_SmallGroups')
group by object_name(i.object_id)
ORDER BY object_name(i.object_id);

The difference in the number of Row Groups is quite significant – we have 100 Row Groups for one of the tables and just 13 for the other, which is ~8 times less.

The actual maximum number of rows per Row Group for the FactOnlineSales table contains 1045678 rows and for the FactOnlineSales_SmallGroups table is just 131073 rows big.

I will use the amazing SQL Sentry Plan Explorer tool (you can download free version), which unfortunately does not understand Columnstore Indexes, but hopefully will be fixing this 2 year gap very soon.
Here is the actual execution plan for both of those queries:
they are equal and it makes sense. Now, we can check out on the some of the execution statistics (Important notice: as far as I understand the Query Plan Explorer simply ignores the read-ahead reads, I have controlled those numbers manually in order not to be fooled):
Now this is interesting, because according to this information, the first execution plan is slower than the second one – the CPU time is bigger, the duration time is bigger as well, the number of reads is actually much much bigger for the table with smaller number of Row Groups.
Even if we remove statements cleaning out pre-cached information, the final result according to the Execution plan statistics will remain the same.If we look at the Statistics IO, we shall find the the number of the reads (logical as well as the combined – 25460 reads vs 9730 reads) is quite superior in the case of the first table, and while the final execution results might vary significantly, right at the very first attempt we have managed to find the case where having more segments is actually an advantage.Important Note: I guess that there is some sort of mistake when reporting the reads because there is no such difference in the compared table structures.

When checking on the execution times, we can see a very slight and insignificant different between execution times – 279 ms for the 100 Rowgroups Table vs 281 ms for the 13 Rowgroups table.
I have executed those queries multiple times in order to make sure that the data is not a simple outlier, and on my test VM the results are very consistent.

Let us try to understand what is going on actually and for that I will use the query for analysing the size of the Columnstore Indexes:

Oha ! So actually, underneath it all we have a situation that even though our 100 segments should be hosting massive amounts of data compared to just 13 segments, in the reality they occupy a little bit less space (1.3 MB) as you can see on the picture above.

And here is what we have got: This is where we can see the difference between 2 tables, the dictionaries sizes for the FactOnlineSales_SmallGroups which has 100 Segments is definitely much bigger than the dictionaries sizes for the FactOnlineSales containing just 13 Segments.

The missing part should belong to such internal structures as Deleted Bitmaps, and as far as I know there is no way to consult its size or content directly. Unless of course someone will find a cool way with some internal function, such as DBCC CSIndex.

But what if we check out the sizes of our Dictionaries, corresponding to the columns that we have used in our query:

The dictionaries for the columns used in our test query are extremely small and they do not represent the actual weight of the dictionaries. We can look at them and notice that on some columns there are quite a number of local dictionaries, corresponding almost to each of the existing Segments.

If we re-run the previous query, this time without indications of the specific columns we can easily find the real weight behind the dictionaries sizes:

SalesOrderNumber is the column that uses the most space for the dictionaries and had we used it in our query, it would definitely increase the amount of memory required as well as the amounts of memory required to process the query.

As you can see, even if we would scan the whole table without any predicates, so many things would depend on the actual columns used.

So it all depends, isn’t it ? :)
Having smaller Row Groups can have its own advantages, you will need to test it in order to get a better understanding.

Disclaimer: I believe that Microsoft has very carefully selected the maximum size of the Row Groups, doing a lot of research before selecting it, but as an IT person, I wish could have an opportunity to control it in order to achieve better performance.