SQL University: Index Usage

Greetings. Welcome once more to the Miskatonic University branch of SQL University. Does anyone know where to buy some camping equipment, cheap? I’ve been tagged to go an expedition to Antarctica and I need some cold weather gear a bit more substantial than my LL Bean boots. Evidently the last expedition found some caves in some mountains down there. Sounds like the perfect place to get away from all the crazy stuff that goes on here at Miskatonic. I mean, what could happen?

Anyway, our lastseveral talkshave all been about indexes and indexing. One of the things that we haven’t talked about is how to tell if, how or when your indexes are being used. Starting with SQL Server 2005, and continuing to 2008 and R2, there has been a wonderfully easy way to do just this through Dynamic Management Objects (DMOs). Specifically three DMOs: sys.dm_db_index_usage_stats, sys.dm_db_index_operational_stats(), and sys.dm_db_index_physical_stats().

sys.dm_db_index_usage_stats

The first, sys.dm_db_index_usage_stats is a fascinating Dynamic Management View (DMV). It returns information that shows how many times various operations have occurred on an index, and when the last time it was done. The information is stored since the last time the server was started, or when a database is attached or restored. If a database is detached or closed, the data for that database is removed from the DMV. It breaks the data down so that it shows user information separately from system information. This means you can see the number of times a user has updated the index as opposed to when the system did an update as part of maintenance. You can use this to get an idea of which indexes are used within your system and which ones are not.

Since it’s a DMV, it’s incredibly easy to use. Here’s a query that shows the number of times that user access have occurred on the SalesOrderDetail clustered index:

You can combine this with other queries and other DMOs to put together interesting information. But the key is to remember that you can only rely on this information to a certain degree. You can’t simply assume that the information here will tell you precisely which indexes have been used and which have not. You will have to assume that it might not cover all possible uses of indexes in your system because of a reboot on your server. If your server and the database that you’re interested in have been continuously online for over a year and you don’t have 18 month queries or something else, maybe. Otherwise, you’re taking a chance to simply use this to help clean up your system. However, it can help you clean up your system. Just do so with the knowledge of what this represents.

sys.dm_db_index_operational_stats

The next object is sys.dm_db_index_operational_stats(). This is a Dynamic Mangement Function (DMF) which means you must pass it parameters. With this DMF you can see current locking for every table or index. The call looks something like this:

The way I have it configured, I’m looking at all indexes for the table Sales.SalesOrderDetail. If I substituted NULL for the OBJECT_ID, I could see all access to tables within the database. Again, if I substituted NULL for the DB_ID I could see all active access across the system. Conversely, if I wanted to drill down, I would also add the IndexID value so that I would only see the activity of a given index and the same thing with the PartitionID.

But to really see it at work, let’s get a transaction going. I can do it very simply. If I open a second connection to the database and run this query:

And then go back and run the original query. I have three rows for the three indexes on the table:

This returns all sorts of information about the locks and waits that currently occurring. In this instance you can see that a single page lock was taken out and that there were a pair of page latch waits, all on the first index, index_id = 1, the cluster. This would be expected for a query that is performing a seek on the clustered index.

sys.dm_db_index_physical_stats

The last object is sys.dm_db_index_physical_stats(). This gives you information about your indexes, how deep they are, how many pages, how fragmented those pages are, etc. It’s extremely useful for determining when to defragment your indexes, but it also helps you understand what is being accessed when, why the optimizer chose to access a given index and more. The key to this particular DMF is that it can be very expensive to run. It has three modes; LIMITED, SAMPLED, and DETAILED. You’re going to get varying amounts of data back, but the DETAILED mode will scan the whole index and this can lead to blocking issues on a production system. Be cautious. Calling this DMF is not that different from the others:

Running the query in this manner can be used to show general information about the index and it’s level of fragmentation. If you’re interested in understanding more about the index, such as the average amount of space left on a page, you can move to SAMPLED or DETAILED. SAMPLED and DETAILED return the same data, but the SAMPLED information is taken from a 1% sample of the index. Although, if the index is less than 10,000 pages in size, DETAILED is used instead of SAMPLED, so that’s something to be aware of.

Running the query above would return a data set similar to this one:

This is all the data returned from a LIMITED scan, which only hits the tree structure above the leaf or the PFS & IAM pages of a heap. You can see the index types, if they store data outside of the row, the depth, average fragmentation in percentages, fragment counts, fragment size in pages and page count.

Conclusion

This is just an overview of what’s possible with these various DMOs. You can combine them with other DMOs to arrive at use information about your systems and the performance there.

Thanks for stopping by Miskatonic University. Please stop by the gift shop on your way out. Just remember to keep those meteor shards they sell there out of your water supply.