Geek City: Nonclustered Index Keys

I recently received a question about the storage of nonclustered index keys. I am assuming you are aware of the fact that if your table has a clustered index, SQL Server uses the clustered index key (all of its columns, if it is a composite index) as a 'bookmark' in your nonclustered indexes, to allow your nonclustered indexes to uniquely identify the row that the index is pointing to.

In particular, this question was about a couple of statements in Inside SQL Server 2005: Query Tuning and Optimization, Chapter 3. Now, I didn't write Chapter 3, but I thought I knew all about how index keys were stored. But (you may want to be sitting down for this) I WAS WRONG.

Here are the statements:

“…when you create a nonunique nonclustered index on a table with a clustered index, we append the clustered index keys to the nonclustered index keys if they are not explicitly part of the nonclustered index keys”

And then a bit later on, there is another quote discussing this same effect:

“Note that the key columns for each of the nonclustered indexes on T_clu include the clustered index key column a with the exception of T_clu_f, which is a unique index.”

Although I had read this chapter when we were putting the book together, I must have read this last sentence too fast. The first quoted sentence is basically what I said in my lead-in paragraph. But the second sentence now seemed wrong. I thought it was saying that if your nonclustered index is UNIQUE, the clustered key columns are not stored in the index at all. I knew that wasn't true, so I contacted the author of Chapter 3. It turns out he was using a very specific definition of key.

KEY COLUMN: a column that is stored in the leaf pages and the interior (node, or non-leaf) pages of the B-tree and that you can use in an index seek.

He also uses another term in some of the related sections of the chapter:

COVERED COLUMN: a column that can be returned by the index without performing a bookmark lookup; it may or may not be a key column but it is stored in the leaf pages of the B-tree.

The definition of an index key includes the the property that it is stored at all levels of an index, not just the leaf level. Covered columns are in the leaf, so we have the potential of a covering index, and the possibility of retrieving the data we need without accessing the table itself. But not all covered columns are key columns.

This was news to me. I assumed that the fact that the clustered index key was in the nonclustered index meant that it was part of the key. And all my previous testing seemed to bear that out; I must never have tested at this level of detail with a unique nonclustered index. What this means is that I have been wrong in the following claim: "Explicitly declaring your clustered key columns as part of your nonclustered indexes does not make any difference in the storage of the index." That statement is true for nonunique indexes, but not true for unique.

What does this really mean?

Consider a TabA with a clustered index on col1. These two index definitions are identical:

CREATE INDEX nc_index ON TabA(col2)

CREATE INDEX nc_index ON TabA(col2, col1)

For a nonunique, nonclustered index (which the above index is), the clustered key col1 is automatically included at the leaf, and at all other levels, just as if we explicitly declared col1 to be part of the nonclustered index key.

However, these two index definitions are not exactly the same:

CREATE UNIQUE INDEX nc_index ON TabA(col2)

CREATE UNIQUE INDEX nc_index ON TabA(col2, col1)

The first index definition will create a nonclustered index on col2, and add col1 only at the leaf. The second index definition will also create an nonclustered index on col2, but it will add col1 as a true key column and it will be stored at all index levels.

Here's a specific example

Let's look at a specific example. I will be demonstrating the contents of the index rows using DBCC PAGE, and determine the index pages I am interested in by using DBCC IND. For more information about these two undocumented commands, one of the best places to look is at a couple of blog posts by Paul Randal:

DBCC IND returns one row for every page of a table or index. The rows contain values indicating what type of page is listed (2 is an index page), what index the page belongs to, what level of an index the page is at (0 is the leaf), and then the file ID and page number. These last two values are then used in DBCC PAGE to get the contents of the page. The output also includes the file ID and page number for the next and previous pages in logical order. A page with no previous page is the first page in order.

When running DBCC IND on a large table, you can get lots of rows returned, and it can be tricky to find the exact rows you're interested in. For this reason, I usually take the output of DBCC IND and save it to a table, and then I can query the table. I have included the script to build a table called sp_index_info in the master database. Because of the sp_ prefix, this table can be accessed from any database.

I'm going to use a table that I copy from the AdventureWorks database into a test database. You can use any database you like to run this code on your own server.

USE testdbGOIF EXISTS (SELECT name FROM sys.tables WHERE name = 'Sales') DROP TABLE Sales;GOSELECT * INTO Sales FROM AdventureWorks.Sales.SalesOrderDetail;GO

Now build a clustered index on SalesOrderID and three similar nonclustered indexes on SalesOrderDetailID, which is unique.

CREATE CLUSTERED INDEX Sales_ID_Index ON Sales(SalesOrderID);GO-- The first index is not declared as a unique indexCREATE INDEX Sales_DetailID_Index1 ON Sales(SalesOrderDetailID);GO-- The second index is declared as a unique indexCREATE UNIQUE INDEX Sales_DetailID_Index2 ON Sales(SalesOrderDetailID);GO -- The third index is declared as a unique index and explicitly includes the clustered index keyCREATE UNIQUE INDEX Sales_DetailID_Index3 ON Sales(SalesOrderDetailID, SalesOrderID);GO

To look at the actual rows, we need to use DBCC PAGE. An option value of 3 is very useful for index pages, as it gives us tabular output showing each individual index row. I'll take each of the file and page numbers and use them in DBCC PAGE, and of course you'll have to substitute whatever page numbers you get.

The results shown are for the leaf level. Note that because my clustered index is not unique, it includes a uniqueifier for every row. The uniqueifier is never visible using SELECT statements, but is a hidden column stored within the row to make sure each row in the clustered index is unique. This uniqueifier column is always considered part of the clustered key internally, so every nonunique clustered index is really a composite index.

There is the exact same data in the leaf for all three nonclustered indexes (except for page numbers of course; all three indexes have their own pages). The same index keys are in each row of the first leaf level index page. However, you might note a couple of differences int the column headers. Just as the nonunique clustered index includes a uniqueifier, SQL Server has to have some way to make sure each nonclustered index row is unique. The column headers actually index which columns of my nonclustered indexes make up the key. For the nonunique index, SQL Server has to consider the entire clustered key as part of the nonclustered key. It knows the clustered key is unique, so adding it to the nonclustered guarantees uniqueness. You can see that all 3 columns are labeled as (key): SalesOrderDetailID, SalesOrderID (the clustered key) and the clustered index uniqueifier.

For the second index, which was declared as unique, but did not explicitly include the clustered key, the only key the nonclustered index needs is the key the index was built on. Because that single key was declared as unique, no further information is needed to be part of the key. However, the clustered key stored in the leaf level rows, because SQL Server needs that to be able to find the row in the table data.

The third index explicitly declared the clustered key to be part of the nonclustered index key. So the column labels indicate that both the column SalesOrderDetailID and SalesOrderID are key columns. Since the index was declared to be unique, only the explicitly declared key columns are part of the key. The uniqueifier is there, because as part of the clustered key is must be in the leaf level pages, but it is not marked as part of the nonclustered key.

So how are these indexes different?

The indexes differ in the upper levels. The table sp_index_info still contains the output of DBCC IND, so we can find an upper level page for each index.

At the level above the leaf the IndexLevel is 1. I don't want to see the rows for the upper level pages of the clustered index, so I filter them out.

Now you should see something different. The first index, nonunique, still has all three columns in the upper level page: the nonclustered key SalesOrderDetailID, and the two columns of the clustered key: SalesOrderID and the uniqueifier.

The second index is the unique nonclustered on a single column, and only has SalesOrderDetailID in the upper level page.

The third index is unique and composite, so both of the declared keys are in the upper level.

What other possibilities are there?

One additional question you might have is what happens if the clustered index is also unique. How do the nonclustered index rows look different? I think with the tools and examples presented here, you should be able to figure that out for yourself!

This kind of detail is so wonderful :) This is why in all of my design books, when I get past the "conceptual" stuff on indexes (and any other internal topics) there is always a note that says something like "If you really want to get deep into how things work, there is a person named Kalen whose book you need to get!" Thanks!

When ever I teach about SQL Server performance tuning I try can get across the message that there is no such think as a table. Does that sound odd, well it isn&#39;t, trust me. Rather than tables you need to consider structures. You have 1. Heaps 2. Indexes