Batch Sort

The possibility of having your data sorted with the help of the Batch Execution Mode was introduced in SQL Server and some of the key improvements for the SQL Server 2016, such as Window Functions depend on the velocity of execution of the Sort iterator
The vast majority of the time, it works fine, but some rare times it can bring you into some serious trouble.
For that purpose, the documented (I call it documented, since it is mentioned in the KB 3172787) Trace Flag was introduced to allow to disable the Batch Execution Mode for the iterator.

This trace flag works as a Configuration Parameter, as a Global Trace Flag, Session Trace Flag or even QueryTraceOn option).

I will use the free ContosoRetailDW database (it is so easy to play) and run the standard script for restoring a copy of it from C:\Install\, upgrading it’s compatibility level to 130 (batch mode improvements require it) and then dropping the primary clustered key from the FactOnlineSales table and create a Clustered Columnstore Index on it:

The 3rd from the left, the Sort Iterator is there, running in the Batch Execution Mode and occupying estimated 34% of the resources. It took just 171ms on my virtual machine.

Now, let’s us enable the Trace Flag 9347 and see if it shall bring the expected impact to the Batch Sort iterator (notice that if you are running on the RTM version of SQL Server, you will face a bug that was fixed in the Cumulative Update 1, which is already available):

Occupying estimated 84%! of the resources, the very same query runs over 15 times slower – 2731 ms, all because of the Sort Iterator being executed in the Row Execution Mode. What makes the performance suffer more is that the Sort Iterator does not get enough memory through the memory grant and that is why it is spilling on the disk.

Top N Sort

The Top N Sort iterator does sort like the above mentioned Sort iterator, with the difference that it passes only top N rows after sorting. There are different optimisations to the used algorithm that make this iterator functioning differently from the Sort iterator.
Given it is a different functionality and a different iterator, SQL Server 2016 has a separate Trace Flag that will allow you to disable the Batch Mode on it – the Trace Flag 9349. It is important to have it separate from the complete Sort Iterator, since a number of times one can catch both operators and having a possibility to shut down just a Top N Sort can be advantageous.

Let’s run a SELECT TOP X query, selecting 100.000 rows from the results of our previous query:

As you can see on the image above, the execution plan presents TOP N Sort Iterator functioning in the Batch Execution Mode, taking 101ms in total execution time on my VM, while occupying estimated 53% of the execution plan resources.

Let’s execute the same query this time with the help of the Trace Flag 9349, thus preventing the TOP N Sort Iterator from running in the Batch Execution Mode:

This time the query took good 1820ms, over 18 times slower then the query with Sort Iterator using the Batch Execution Mode. Even the estimated resource consumption jumped up to 92%, but from the other side the amount of memory that was granted for the second query is around 38MB vs 72MB that was granted for the query with the Batch Execution Mode. When running an OLTP that is using a huge number of parallel connections, the amount of used memory can be the most important resource and keeping it down to an accepted level can become the paramount of keeping the system alive.

Sort in a Complex Parallel Query

Sometimes there are complex queries that suffers from the unnecessary Batch Execution Mode for Sort Iterators and since Cumulative Update 1 for SQL Server 2016, there is a new trace flag – 9358 for disabling it.

I could not easily produce with a sample Database such an example query, but I hope to get back to this blog post one day and provide a practical example.
An important additional part to consider is that according to the official documentation, the good old Trace Flag 4199 will produce the same effect, meaning that if it is active – you will see some of your Sort Iterators running in the Row Execution Mode.

Dynamic memory grant for batch mode operators

Starting with SQL Server 2016, if you have enough RAM and suffering from the TempDB Spills that do have a significant impact on your workload, then you can enable the Trace Flag 9389 that will enable Batch Mode Iterators to request additional memory for the work and thus avoiding producing additional unnecessary I/O.

I am glad that Microsoft has created this functionality and especially that at the current release, it is hidden behind this track, and so Microsoft can learn from the applications before enabling it by default, hopefully in the next major release of SQL Server.

Merge & recompress during columnstore index reorganisation

Some of the most incredibly useful feature in SQL Server 2016 was removal, recompression and merge of the fragmented Row Groups (containing deleted data in the Deleted Bitmap) that I have already described in details in Columnstore Indexes – part 74 (“Row Group Merging & Cleanup, SQL Server 2016 edition”).
In SQL Server 2016 we also have a special knob for disabling this functionality and it is a Trace Flag 10204. In other words, your SQL Server 2016 database with compatibility level 130 will become a kind of SQL Server 2014, where Row Groups maintenance was non-functioning with ALTER INDEX … REORGANIZE command.

Disk-Based Clustered Columnstore

Nothing beats a practical example that is also presentable in a blog post and so here is the code to create an exact copy of a FactOnlineSales with a Clustered Columnstore Index, which will serve us for testing the ALTER INDEX … REORGANIZE command:

Everything is fine as you can see. We have a copy of our data with the Clustered Columnstore Index.

Let’s enable the Trace Flag 10204 (it does not make a lot of sense to do it on the session level of course, since the automated Tuple Mover runs in the background, unless disabled with the other Trace Flag 634):

dbcc traceon (10204, -1);

We can now easily advance with our delete process, and so the next statement shall delete the first 2 million rows ordered by the column OnlineSalesKey:

;with delCTE as (
select top (2000000) OnlineSalesKey
from dbo.FactOnlineSales_CCI del
order by OnlineSalesKey
)
delete from delCTE;

The statement does not even take 1 millisecond on my VM .. Because it does nothing. The activated Trace Flag goes deeper than the corresponding Extended Event columnstore_no_rowgroup_qualified_for_merge – which registers nothing at this point. Troubleshooting the issue with another potentially helpful Extended Event columnstore_rowgroup_merge_failed brings nothing as well. The trace flag simply shuts it down without a trace (pun intended).

Running the CISL functions will deliver the same result as we have had above, the ALTER INDEX … REORGANIZE falls back into SQL Server 2014 mode, where it takes care only of the open Delta-Store by default.

I guess that this functionality will drive people completely mad, if they know nothing about this trace flag, but hey, if you are using CISL from the version 1.3.1 and later, your cstore_GetSQLInfo.sql will provide you with the correct output (the current file at GitHub gives you this functionality already):

exec dbo.cstore_GetSQLInfo;

The Nonclustered Columnstore

We shall make now the same test we did for the Clustered Columnstore, but this time we shall do it for the Nonclustered Columnstore Index, which is also disk-based, as the previous one:

Surprise! On the image above, we have a Tail Row Group (row_group_id = -1), which is a kind of a Delta-Store, containing the surviving 200.000 rows, and all completely deleted Row Groups are simply removed from the internal structures.

Why did this happen ?
1. There is no REBUILD functionality for the Memory-Optimized Columnstore Tables yet, and so there must be a way to keep the table alive.
2. The Row Migration process for the Memory-Optimized Columnstore Tables moves rows from Tail Row Group into the compressed Row Groups and back, it does function in a different way than a disk-based columnstore and with no REBUILD possibility, this is the only way of cleaning up the rows from the Memory-Optimized Columnstore table (besides dropping and recreating the Columnstore Index, of course).

I am happy to see that the Trace Flag 10204 does not function for the Memory-Optimized Columnstore tables, because technically for the current version of SQL Server it would make no sense at all.

Final Thoughts

These Trace Flags presented in this article are some really great knobs, that Microsoft has inserted into the SQL Server 2016 (some of them are available only from SQL Server 2016 CU1), allowing you to expand the potential performance of your Columnstore Batch Execution Mode queries (Dynamic memory grant with TF 9389), and preventing the Sort operations from running in the Batch Execution Mode (could be important for overloaded OLTP systems).
The possibility of disabling Row Group Merge & Self-Compress should target the extreme OLTP System (Trace Flag 10204) is also nice, but I really hope that no unexperienced engineer will start applying this trace flag everywhere by default – though I will be updating the CISL to recognise and advise the user of this configuration.

Niko Neugebauer is Microsoft Data Platform professional. A SQL Server MVP, he is very passionate about Data Platform (with a focus on Columnstore) and Community.
He is a founder and a leader of the Portuguese SQL Server User Group, SQLPort and a co-founder of 2 more Data Platform User Groups in his home country.
His passion for community led him to organise the first SQLSaturday outside of North America – #78 Portuga. Niko presented at a number of different conferences, including PASS Summit, SQLRallys, SQLBits, and SQLSaturdays around the world.