outstanding query performance because it is memory optimized (xVelocity)

What makes the CCI especially useful is that the table remains writable, so you can perform any kind of loading operation (inserts, partition switching) on the table without needing to recreate the column store index afterwards. In fact, the column store index IS the table (hence the name clustered column store index).

However, when data is written to the CCI, it is not always compressed immediately. In short, newly inserted data is split into row groups. The status of a row group may be one of the following:

OPEN

This means that the rows are actually stored in a row store (a ‘normal’ table, the so-called delta store). PDW is still waiting for more rows to arrive before the rows are compressed into CCI storage.

CLOSED

This means that the current row store buffer (delta store) is full (at approximately 1 million rows of data). This buffer will be compressed asynchronously by a background process (the ‘tuple mover process’).

COMPRESSED

This means that the row group is actually compressed in the CCI (the final stage).

You can find more information on this including some illustrations in the PDW help file (look for ‘clustered columnstore index’).

In order to explain the nature of CCI write operations on the PDW, let’s go through some practical tests inserting, deleting and updating rows under different scenarios. You can find a similar example in the PDW help file but I wanted to cover more aspects here.

After each operation we want to see what the delta store looks like. I’m using the query from the metadata queries section of the chm file for this purpose:
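The exact query from the chm file is not reproduced here; the following is a minimal sketch of such a row-group metadata query, assuming the PDW catalog view sys.pdw_nodes_column_store_row_groups and the column names shown below:

```sql
-- Sketch of a row-group metadata query (assumption: the catalog view
-- sys.pdw_nodes_column_store_row_groups with these columns; the exact
-- query shipped in the PDW help file may differ).
select pdw_node_id,
       distribution_id,
       partition_number,
       row_group_id,
       state_desc,      -- OPEN / CLOSED / COMPRESSED
       total_rows,
       deleted_rows
from   sys.pdw_nodes_column_store_row_groups
order by pdw_node_id, distribution_id, partition_number, row_group_id
```

In practice you would filter this view down to the test table’s physical objects; the screenshots below show the result of such a query after each operation.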

For our tests, let’s first create a distributed table. Although not recommended, I chose the same column here for both partitioning and distribution.

create table CCITest (
    id int not null,
    val nvarchar(20) not null
)
with (
    distribution = hash(id),
    clustered columnstore index,
    partition (id range right for values (100))
)

Checking the row groups using the metadata query from above shows no results (no row groups are created in advance). This is also what we see after a truncate table operation:


We now insert one row of data into one partition of the table:

insert into CCITest values (1, 'Test')

As expected, this row goes to the delta store, causing an open row group. Please note that this row is not yet compressed but is waiting for more rows to arrive (in order to use CCI compression efficiently).


Let’s insert another row of data into another partition of the table:

insert into CCITest values (100, 'Test')

Since row groups are created per distribution and partition, we have created a second open row group. Both row groups are waiting for more rows of data before they are going to be compressed.


Now we delete the recently inserted row of data:

delete from CCITest where id = 100;

Since the row is not yet compressed, it is simply removed from the delta store:


Ok, now let’s insert 100,000 rows of data into the first partition of this table. I’m using a table “MyLargeTable” here which can be any table containing at least 100,000 rows:

insert into CCITest select top 100000 1, 'Test' from MyLargeTable

Together with the row from above, this gives 100,001 rows in the first row group. Please note that the row group still isn’t compressed (status ‘OPEN’).


Let’s empty the table (using a truncate) and then insert 200,000 rows of data into the first partition:

truncate table CCITest

insert into CCITest select top 200000 1, 'Test' from MyLargeTable

What you see here is that a 200,000-row insert causes the CCI to compress the data automatically (and synchronously). Status ‘COMPRESSED’ means that the data is now stored natively in the column store optimized format. So if more than 100,000 rows (more precisely, more than 102,400 rows) are written to a row group in a single batch, the rows are compressed automatically and synchronously.
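You can probe this threshold yourself with the same experiment style used throughout this post. A batch clearly above the ~102,400-row mark should land directly in a COMPRESSED row group, while a clearly smaller batch stays in the OPEN delta store (the exact batch sizes here are illustrative; the expected states follow the behavior described above):

```sql
-- Batch well above the threshold: row group is created directly as COMPRESSED
truncate table CCITest
insert into CCITest select top 150000 1, 'Test' from MyLargeTable

-- Batch well below the threshold: rows stay in the delta store (state OPEN)
truncate table CCITest
insert into CCITest select top 50000 1, 'Test' from MyLargeTable
```

Run the row-group metadata query after each insert to compare the resulting states.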


We empty the table again (truncate) and now insert 100,000 rows of data into the table ten times (I tag one row of the first insert for later use):

truncate table CCITest

insert into CCITest select top 99999 1, 'Test' from MyLargeTable
insert into CCITest select top 1 1, 'Test del' from MyLargeTable
insert into CCITest select top 100000 1, 'Test' from MyLargeTable
insert into CCITest select top 100000 1, 'Test' from MyLargeTable
insert into CCITest select top 100000 1, 'Test' from MyLargeTable
insert into CCITest select top 100000 1, 'Test' from MyLargeTable
insert into CCITest select top 100000 1, 'Test' from MyLargeTable
insert into CCITest select top 100000 1, 'Test' from MyLargeTable
insert into CCITest select top 100000 1, 'Test' from MyLargeTable
insert into CCITest select top 100000 1, 'Test' from MyLargeTable
insert into CCITest select top 100000 1, 'Test' from MyLargeTable

Although we inserted many more rows than before, the rows are not compressed; instead we now have one open row group with 1,000,000 rows of data:

The difference from the example above, where inserting 200,000 rows caused immediate compression, is that we now used smaller batches. Remember that our table now contains 1 million rows, all in uncompressed state (“slow” access). Please note that since we always used the same id (hash column), all these rows are stored in the same distribution.


In order to find out whether this goes on forever, we insert another 100,000 rows of data into the table:

insert into CCITest select top 100000 1, 'Test' from MyLargeTable

What we now observe is that the open row group was closed after reaching 1,048,576 rows. The remaining rows are inserted into a freshly created open row group:

Status ‘CLOSED’ means uncompressed row storage. Compression is now being performed asynchronously using a background task (‘tuple-mover process’) which runs every 5 minutes. So after a short moment, our table looks like this:

As you can see, the tuple mover has now compressed the ‘CLOSED’ row group. If you do not want to wait, you could also run this statement:

alter index all on CCITest reorganize partition = 1

In part 2 of this post, I’m continuing the examples showing update operations and also showing how row groups are created per distribution (and partition).