Note that this technique is valid for Nonclustered Columnstore Indexes as well as for Clustered Columnstore Indexes.
Let us consider the following scenario: we have an instance with SQL Server 2012+ where we have a partitioned table that is being hammered by a number of different queries.
To improve performance, we have identified the most common predicate in our SELECT queries and decided to implement Segment Clustering on it.

This has helped us a lot but we need a further improvement, because to be honest, there are almost no such situations when we have just 1 query that needs to be tuned. :)

In this example we have found 2 queries that need to be optimised and surprise – even though they have 1 common predicate (DateKey), they include a second predicate that is different for each of the queries:

-- Query 1: common DateKey predicate (year 2008) plus a second predicate on StoreKey
-- (the StoreKey threshold below is illustrative)
select sum(SalesAmount)
from dbo.FactOnlineSales
where (DateKey > '2008-01-01' and DateKey < '2009-01-01')
	and StoreKey < 100;

-- Query 2: common DateKey predicate (year 2009) plus a second predicate on SalesAmount
select sum(SalesAmount)
from dbo.FactOnlineSales
where (DateKey > '2009-01-01' and DateKey < '2010-01-01')
	and SalesAmount > 1000;

But let’s not hurry too much and start with our Segment Clustering implementation:
As so often, I will play with the Contoso Retail DW, my favourite free database.
Right after restoring the database, I will drop the foreign & primary keys on the FactOnlineSales table, on which we shall run our experiments:

I have already studied the data inside our table and decided to create the following partitions & filegroups: 1 for Archived Old Data, and 1 partition per each of the Years with data – 2007, 2008, 2009 & 2010:

It looks quite fine and we have done an optimisation to our system at this point.

It is perfectly aligned for Column Number 2 (DateKey), on which we have partitioned our Clustered Columnstore Index, but what about the 2 other columns, StoreKey (Column 3) & SalesAmount (Column 11):
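One way to check the alignment is to look at the segment metadata in sys.column_store_segments, for example:

```sql
-- Min/max value ids per segment for DateKey (2), StoreKey (3) & SalesAmount (11);
-- overlapping [min_data_id, max_data_id] ranges between segments mean poor alignment
SELECT p.partition_number, s.column_id, s.segment_id,
	s.min_data_id, s.max_data_id, s.row_count
FROM sys.column_store_segments s
	INNER JOIN sys.partitions p
		ON s.hobt_id = p.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.FactOnlineSales')
	AND s.column_id IN (2, 3, 11)
ORDER BY p.partition_number, s.column_id, s.segment_id;
```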

This means we have 1209 logical reads & 1325 logical reads respectively. Is this the best that we can do?

Partitioning information:

Let’s see how the data is spread around the different partitions:

-- Partitioning Information
SELECT $PARTITION.pfOnlineSalesDate(DateKey) AS Partition, COUNT(*) AS [Rows Count]
, Min(Year(DateKey)) as [Min], Max(Year(DateKey)) as [Max]
FROM dbo.FactOnlineSales
GROUP BY $PARTITION.pfOnlineSalesDate(DateKey)
ORDER BY Partition ;

It’s clear now that we will actually be concentrating our work on partitions 3 & 4 respectively, since they contain the data that interests us for the optimization of those 2 queries.

Query 1 (Partition 3):

We should concentrate on the first query, looking at the data within the year 2008, and thus let’s try to optimize it first (in practice the order of optimization does not matter as long as different partitions are involved):
Actually, what happens if at this point we reload our partition?
Clustered Columnstore Indexes should allow us to do that without any problem:

-- Load Data from the year 2008
SELECT *
into dbo.FactOnlineSales2008
FROM dbo.FactOnlineSales
WHERE $PARTITION.pfOnlineSalesDate(DateKey) = 3;

And what if we build a traditional row store clustered index on that table before building a Clustered Columnstore Index and switching it in:
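A sketch of those two steps, assuming the index name and the filegroup name FG2008 (the filegroup holding partition 3, which the table must reside on for the later SWITCH IN):

```sql
-- Sort the copied 2008 data on the predicate column of Query 1 (StoreKey)
-- with a traditional rowstore clustered index
CREATE CLUSTERED INDEX CI_FactOnlineSales2008
	ON dbo.FactOnlineSales2008 (StoreKey)
	ON FG2008;  -- assumed filegroup of partition 3, required for SWITCH IN

-- Replace it with a Clustered Columnstore Index; MAXDOP = 1 preserves
-- the sort order of the rowstore index inside the segments
CREATE CLUSTERED COLUMNSTORE INDEX CI_FactOnlineSales2008
	ON dbo.FactOnlineSales2008
	WITH (DROP_EXISTING = ON, MAXDOP = 1);
```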

Notice that we are building the Clustered Columnstore Index right after the rowstore clustered index that has sorted the data on disk, and that MAXDOP = 1 is extremely important as long as Microsoft has not fixed the bug with Segment Clustering.

Now we are ready to switch the data in, but before that we shall remove all the copied data from the year 2008:
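A sketch of the switch, assuming a staging table dbo.FactOnlineSales_2008_Old with matching structure on the same filegroup, and noting that the incoming table needs a trusted CHECK constraint matching the range of partition 3:

```sql
-- CHECK constraint proving the incoming rows fit partition 3's boundaries
ALTER TABLE dbo.FactOnlineSales2008 WITH CHECK
	ADD CONSTRAINT chkFactOnlineSales2008
	CHECK (DateKey >= '2008-01-01' AND DateKey < '2009-01-01' AND DateKey IS NOT NULL);

-- Switch the stale 2008 data out of partition 3 into the staging table and drop it
ALTER TABLE dbo.FactOnlineSales
	SWITCH PARTITION 3 TO dbo.FactOnlineSales_2008_Old;
DROP TABLE dbo.FactOnlineSales_2008_Old;

-- Switch the freshly reordered table into the now-empty partition 3
ALTER TABLE dbo.FactOnlineSales2008
	SWITCH TO dbo.FactOnlineSales PARTITION 3;
```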

You can clearly see that starting from the 9th Segment the alignment is perfect – we have got a different Segment Clustering for the 4th partition here. Notice that since we did not touch the data in the other partitions, that data is still perfectly aligned on the StoreKey column for the 1st query.

That’s enough talking, let’s run our second query again:

set statistics io on

select sum(SalesAmount)
from dbo.FactOnlineSales
where (DateKey > '2009-01-01' and DateKey < '2010-01-01')
	and SalesAmount > 1000;

We are down from 1325 logical reads to 344 logical reads.
In the case of the 2nd query we have received a 4 times improvement!!!

Final Thoughts

I have just shown you how to do multi-dimensional clustering (ordering) of a table with a Columnstore Index, where you can practically control the granularity of the first clustering dimension by controlling the size of the partitions – not always, and of course only up to the maximum supported 15,000 partitions.

The technique involves switching out a partition, ordering it by building a rowstore clustered index, and then creating a Columnstore Index on top of it.

Your results will naturally vary depending on the data distribution, and you might not get any improvement if the predicate inside the partition has no variation whatsoever.

Multidimensional clustering is all nice – but it is so hard to control. For example, rebuilding the table will likely “smear” adjacent segments into each other.

What do you think about the following? Create a composite partitioning key (e.g. “DateKey=2014-12-XX;StoreKey=8XX”, or make this an integer). That way we can reliably shove data into partitions, each of which will have uniform segments. This works for arbitrary dimensionality.
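For illustration, such a composite key could be encoded as a persisted computed column (the column name and the bucketing expression below are purely hypothetical):

```sql
-- Hypothetical composite partitioning key: the year combined with a StoreKey bucket,
-- persisted so that it could serve as a partitioning column
ALTER TABLE dbo.FactOnlineSales
	ADD CompositePartKey AS (YEAR(DateKey) * 1000 + StoreKey / 100) PERSISTED;
```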

thank you for sharing your thoughts :)
I agree that it is hard to control, but there will be enough environments where a composite partitioning key will not be an allowed option.

Also, notice that I have sorted different partitions on different columns – which allows me to adapt each partition to the most common predicates running against it, and this cannot be done by a composite partitioning key alone.

As time goes by, the queries extracting the data can change as well – and to keep the best performance you do not have to change your data, just adapt your maintenance processes.