Database design and development with Microsoft Sql Server

Monthly Archives: May 2014

Last time we have looked at the internal structure of delta store and delete bitmap in the clustered columnstore indexes. Today, I would like us to discuss a few practical aspects affecting performance of ETL processes and queries against tables with clustered columnstore indexes.

There are two different ways how you can import data into a table with clustered columnstore index. The first approach is bulk insert, which can be done with bcp utility, BULK INSERT command and other applications that utilize the bulk insert API. The second type, called trickle inserts, are regular INSERT operations that do not use the bulk insert API.

Bulk insert operations provide the number of rows in the batch as part of the API call. SQL Server inserts data into newly created row groups if that size exceeds a threshold of a little bit over 100,000 rows. Depending on the size of the batch, one or more row groups can be created and some rows may be stored in delta store.

Figure 1 below illustrates how data from the different batches are distributed between row groups and delta stores based on batch size.

01. Batch size and data distribution during bulk insert

Let’s do some tests now and see how performance is affected based on the batch size and, therefore, number of row groups in the table. In those tests, I created a set of the tables with the structure similar to what is shown below.

As the first step, I created CSV file with about 62M rows generated based on dbo.FactResellerSales table from the AdventureWorksDW2012 database and measured performance of the bulk import with bcp utility using 1,000,000-row batches and 102,500-row batches respectively in the 4-CPU virtual machine with 8GB of RAM allocated.

You can see row group statistics after the imports in Figure 2 below. The first import generated 62 1,000,000-row row groups while the second imported ended up with 604 102,500-row row groups.

02. Row groups after insert

Performance of import operation was affected by the batch size. Bcp utility were able to process about 103,500 rows per second with 1,000,000-row batches. In case of 102,500-row batches, the throughput was about 94,300 rows per second, which is about 9% slower.

It is also worth noting that in case of the smaller batches, SQL Server imports data into the delta stores converting them to fully-populated row groups later. While, on the one hand, it would generate efficient row groups, it significantly degraded performance of insert process. For example, in case of 99,999-row batches, the throughput in my environment was only 37,500 rows per second.

As the next test, I checked how partially populated row groups affected performance of the queries using the query shown below. That query performs a MAX() aggregation on 20 columns from a table. The result of the query is meaningless; however, it forces SQL Server to read data from 20 different column segments in each row group in the table.

Figure 3 illustrates execution statistics of the query against tables with fully and partially populated row groups (shown in Figure 2). As you can see, the query against a table with partially populated row groups took a considerably longer time to execute.

In the next step, let’s check how large delta store affects performance of the queries. For that test, I inserted one million rows to the table using small batches and run the test query. After that, I rebuilt the columnstore index, comparing the execution time of the test query before and after the index rebuild.

The index rebuild process moved all data from the delta store to row groups. You can see the status of row groups and the delta store before (on the left side) and after (on the right side) the index rebuild in Figure 4.

04. Row groups and delta store after insertion of 1,000,000 rows

Figure 5 illustrates the execution times of the test query in both scenarios, and it shows the overhead introduced by the large delta store scan during query execution.

05. Execution time and delta store size

Finally, let’s see how delete bitmaps affect query performance. For that test, I deleted almost 30,000,000 rows from a table (the one where I just rebuilt the index). You can see row groups’ information in Figure 6.

06.Row groups after deletion of 30,000,000 rows

The test query needs to validate that rows have not been deleted during query execution. Similar to the previous test, this adds considerable overhead. Figure 7 shows the execution time of the test query, comparing it to the execution time of the query before the data deletion.

07. Execution time and delete bitmap

The bottom line – partially populated row groups, and large delta stores and delete bitmaps, they all negatively affect performance of the systems that use clustered columnstore indexes. You can address all of these performance issues by rebuilding the columnstore index, which you can trigger with the ALTER INDEX REBUILD command. The index rebuild forces SQL Server to remove deleted rows physically from the index and to merge the delta stores’ and row groups’ data. All column segments are recreated with row groups fully populated.

Similar to index creation, the index rebuild process is very resource intensive. Moreover, it prevents any data modifications in the table by holding shared (S) table lock. However, other sessions can still read data from a table while the rebuild is running.

One of the methods you can use to mitigate the overhead of index rebuild is table/index partitioning. You can rebuild indexes on a partition-basis and only perform it for partitions that have volatile data. Old facts table data in most Data Warehouse solutions is relatively static, and ETL processes usually load new data only. Partitioning by date in this scenario localizes modifications within the scope of one or very few partitions. This can help you dramatically reduce the overhead of an index rebuild.

A columnstore indexes maintenance strategy should depend on the volatility of the data and the ETL processes implemented in the system. You should rebuild indexes when a table has a considerable amount of deleted rows and/or a large number of partially populated row groups.

To summarize:

You should design ETL processes in the way that data is bulk imported in the batches as close to 1,048,576 rows as possible. This will guarantee that every batch will become separate and fully populated row-group. Do not exceed this size and avoid spilling batches across multiple row groups

Even though clustered columnstore indexes are updateable, you should minimize such updates. Large delta stores and/or delete bitmaps negatively affect query performance. You should monitor their sizes and design index maintenance strategy in the way that keep them as small as possible

Columnstore index rebuild is very resource-intensive. Table partitioning would help you to mitigate performance impact by allowing index rebuild in the scope of the one or very few partitions. You should design partitioning strategy in the way, that limits data modification and/or import into small subset of partitions rebuilding them afterwards

I am OLTP guy. I cannot grasp concept of the columnstore indexes – indexes that do not care about an order of columns in the definition. It was a reason why Data Warehouses and I lived happily ever after. Just apart from each other.

It was not hard, after all. Even though you can find some use cases for nonclustered columnstore indexes in OLTP environment, inability to modify data after index was created makes those use cases quite rare. Technically, you can use data partitioning and partitioned views and benefit from them in some scenarios; however, such implementation requires large amount of work.

The situation changed after SQL Server 2014 release. Now you can define clustered columnstore indexes, which are updateable. It does not make them suited for OLTP environment – you should remember that they are optimized for large SCAN workloads. Moreover, as the opposite of nonclustered columnstore indexes, they are the only indexes you can define on the table. It is impossible to define B-Tree indexes on the same table and support both environments.

Anyway, I decided to spend some time and explore how clustered columnstore indexes handle data modifications. It was very easy to find some references online; however, neither of the references talks about internal implementation of the indexes. Today, we will try to close this gap.

As the first step, let’s talk about high level structure of clustered columnstore indexes. They use the same storage format as nonclustered columnstore indexes storing columnstore data in row groups. Each row group stores data for up to 1,048,576 rows in column-based format. Data from each column stored separately in highly compressed fashion.

Clustered columnstore indexes have two additional elements to support data modifications. The first is delete bitmap that indicates what rows were deleted from a table. The second structure is delta store that includes newly inserted rows. Both, delta store and delete bitmap use B-Tree format to store data.

SQL Server works with delete bitmap and delta stores transparently to users, which makes terminology confusing. You can often see delta stores being referenced as another row group in the documentation and technical articles. Moreover, delete bitmap is often considered as a part of delta store and/or row groups. I will use the following terminology today to avoid confusion. A term row group references data stored in column-based storage format. I will explicitly reference delta stores and delete bitmap as two separate set of internal objects whenever needed.

You can see example of the structure of clustered columnstore index in a table that has two partitions in Figure 1 below. Each partition can have a single delete bitmap and multiple delta stores. It is worth mentioning that delete bitmap and delta stores are created on-demand, for example, delete bitmap would not be created unless some of the rows in the row groups were deleted.

01. Clustered Columnstore Index Structure

Every time when you delete a row that is stored in a row group (not in a delta store), SQL Server adds information about deleted row to delete bitmap. Nothing happens to original row. It is still stored in a row group; however, SQL Server checks delete bitmap during query execution excluding deleted rows from the processing. When you insert data into columnstore index, it goes into a delta store. Updating a row that is stored in a row group do not change row data either. Such update triggers deletion of a row, which is, in fact, insertion to delete bitmap, and insertion of a new version of a row to a delta store. However, any data modifications of the rows in delta store are done the same way as in the regular B-Tree indexes by updating and deleting actual rows there. You will see one of such examples later.

Each delta store can be either in open or closed state. Open delta stores accept new rows and allow modifications and deletions of the data. SQL Server closes a delta store when it reaches 1,048,576 rows, which is the maximum number of rows that can be stored in a row group. Another SQL Server process, called tuple mover, runs every five minutes and converts closed delta stores to row groups that store data in column-based storage format.

You can examine the state of row groups and delta store with sys.column_store_row_groups view. Figure 2 illustrates an output of this view, which returns combined information about all columnstore index objects. Rows in OPEN or CLOSED state correspond to delta stores. Rows in COMPRESSED state correspond to row groups with data in column-based storage format. Finally, deleted_rows column provide statistics about deleted rows stored in delete bitmap.

02. Row Groups and Delta Stores

As you see, the second row in a view output shows closed delta store that have yet to be picked up by tuple mover process. The situation would change after tuple mover process converted closed delta store to a row group. Figure 3 illustrates the output from a view after it happened. It is worth mentioning that row_group_id of converted row group changed. Tuple mover created new row group dropping closed delta store afterwards.

03. Row Groups and Delta Store After Tuple Mover Process

Let’s look at the structure of delta store and delete bitmap rows. Listing below creates a table and populates it with the data creating clustered columnstore index afterwards. I am using MAXDOP=1 option to reduce the number of partially populated row groups.

As you see in Figure 4, columnstore index has two row groups and does not have delta store nor delete bitmap. You can see Col1 values that are stored in both row groups in min_data_id and max_data_id columns for the rows that have column_id=1.

04. Row Groups after Clustered Columnstore Index Creation

As the next step, let’s perform some data modifications in the table. First statement inserts two new rows into the table. Second statement deletes three rows, including one row we just inserted. Finally, we will update another, newly inserted, row.

Figure 6 shows partial content of a data page, which is a delta store page. As you can see, SQL Server stores data in regular row-based storage. There is one internal column CSILOCATOR in addition to two table columns. CSILOCATOR is used as internal unique identifier of the row in delta store. Finally, it is worth mentioning that a row with Col1=2000000, which we have inserted and deleted after clustered columnstore index was created, is not present in delta store. SQL Server deletes (and updates) rows in B-Tree delta store the same way as with regular B-Tree tables.

06. Delta Store Data Page

You can use the same approach to examine content of a deleted bitmap data page. In my case, the page id is 308. Figure 7 shows the partial output of DBCC PAGE command. As you see, delete bitmap includes two columns, which are uniquely identifying a row. The first column is a row group id and the second column is offset of the row in the segment. Do not be confused by the fact that column names match table columns. DBCC PAGE uses table metadata to prepare an output.

07. Delete Bitmap Data Page

As you see, both delta store and deleted bitmap pages were using row compression in our example, which means delta store and delete bitmap either row- or page-compressed. As you know, in case of page compression, SQL Server performs page compression only when page is full and retain it only if it provides significant space savings. Otherwise, data is kept in row-compressed format even when index is defined with page compression.

Let’s run a test that inserts large batch of rows that can benefit from page compression using code shown in Listing below.

Figure 8 illustrates content of the data page from delta store after insert. The presence of compression info record indicates that delta store is using page compression

08. Delta Store Data Page (with Page Compression)

Let’s examine what happens with delete bitmap and delete all rows from compressed row groups with code shown in Listing below.

delete from dbo.CCI

As you can see in Figure 9 below, page is still uses row compression even though now it is fully populated. Obviously, we cannot guarantee that delete bitmap is not defined with page compression – after all it is not documented – however, it could be logical to use row compression in this case when we have two small integer values. Row compression would perform perfectly here.

09. Delete Bitmap Data Page (Full with Row Compression)

Hope, that information can shed some light on clustered columnstore index internal structure.