Improving my SQL skills through your questions!

First and foremost, happy spring! I truly hope we’re on the path to summer (although who would know it here – we’re in Florida for SQL Connections and the weather is a bit chilly and it’s been raining off/on today – I hope this is short term (no, I don’t want to look at the forecast as I don’t want to jinx it :)). But, wherever you are – I hope you’re on your way to nicer weather and minimal cold weather (ok, I guess I only have warm wishes for the northern hem… for you southies – I hope your fall is lovely!!).

But, for everyone – now’s a good time to start thinking about cleaning out the [non-literal index] closet… and getting rid of some of those dusty indexes?

Why/when should you get rid of some of your indexes?

It’s possible that some of them aren’t being used at all…

Especially when they’re not being used but even when they’re “redundant” (or minimally useful) they’re still costing you in many ways:

Wasting space on disk

Wasting space in memory (well, if they are being used then they’re cluttering up your cache)

You might be able to reduce your overall indexes with index consolidation…

So, for this post, I’m going to target #1: are there any indexes that just aren’t being used at all…

First, how do you know if your indexes are being used?

In SQL Server 2005 and higher, there’s a DMV (dynamic management view) called sys.dm_db_index_usage_stats and it’s there to track index usage patterns. However, it’s not persisted since the beginning of time and as a result, if you look at this and believe that it’s telling you ALL of the indexes that have been used in your database – then you might be mistaken. The index usage stats DMV is cleared when SQL Server is restarted as well as when you detach/attach the database or when you backup/restore the database.

Therefore, you don’t want to just run the following query and drop all of the indexes that aren’t being used. A better way to “trust” this information is to periodically persist the data from the DMV in your own table and then query it after you’ve completed a business cycle’s worth of activity – logging all of the usage stats. Then you can trust this much more. Again, here are a couple of negatives:

It’s not persisted

It only keeps the database_id and the index_id (which could change over time). You’re right in thinking it probably shouldn’t change but, a nonclustered index’s ID is not permanent so, it’s better to track the index name in addition to the index_id. And, when you run your queries to determine what to delete, you can easily verify indexes against the current indexes because your comparison is within the same db (more on this below).

If you want to persist this, then you have two ways to do this:

Store the index usage patterns in a table within the specific database you’re tracking:

Pro: it goes with the database when you back it up, etc. and, it’s easier to reverse engineer which actual indexes you’re referring to (grabbing the names and not just the IDs).

Con: it’s a bit more complex of a query to run and you’ll need to run it for all of your databases (ok, it’s really not all that bad – but, using something like sp_msForEachDB will really help)

Store the index usage patterns from all databases in a table within master or your own “performance database”:

Pro: you only need one job to handle all the index usage info AND object_name *does* support TWO parameters (object_id and database_id) so, as long as you trap the name at the time of insert then you’ll be good.

Con: it doesn’t go with the database (e.g. backup/restore – and if you’re restoring to a test system and you want to see what the usage patterns were then you’ll need to get this information as well…)

Here’s a simple query that you can run that shows all the indexes used right now – and adds the databasename/objectname into the results – in a persisted table you’ll also want to add the runtime:

SELECT getdate() AS RunTime
, DB_NAME(i.database_id) as DatabaseName
, OBJECT_NAME(i.object_id, i.database_id) as ObjectName
, *
FROM sys.dm_db_index_usage_stats AS i
WHERE object_id > 100

And, if you want to get a few more insghts into how to persist this on Paul’s blog here.

OK, so, you have a few options to think about and I have a few more parts to post! kt

8 Responses to Spring cleaning your indexes – Part I

Hi Paul I’m a little bit suprised that you did not mention another DMV named sys.dm_db_index_operational_stats. In my opiniion that DMV even more important as tells us how often the indexes are actually used during the execution of plans and, thus, which indexes are directly contributing to server performance. What do you think?

To address the first comment: you’re right but, I see it as though there are really two issues here – there’s the simple point of whether or not an index is *ever* used… and, easily, the DMV tracking usage stats is a quick and easy way to see if the index is ever touched. It doesn’t require any parameters, it works across all databases AND it’s actually even more transient than the usage stats.

Here’s a piece from the BOL that quickly sums it up: The data returned by sys.dm_db_index_operational_stats exists only as long as the metadata cache object that represents the heap or index is available. This data is neither persistent nor transactionally consistent. This means you cannot use these counters to determine whether an index has been used or not, or when the index was last used. For information about this, see sys.dm_db_index_usage_stats.

Now, having said that – **IF** an index **IS** being used then the next question to ask is *HOW* is it being used. This is where the operational stats can be even more interesting. It might turn out that one of your "useful" indexes is really only marginally useful and more costly than useful. So, it might be a candidate to consider for consolidation and/or removal as well… that’s part of one of the upcoming parts.

So… you’re definitely right that usage stats isn’t alone in providing some very good information about indexes but, they all have their pros/cons.

Oh, and btw – this is a Kimberly post… but, we get ourselves confused quite often too :) :) :)

Unfortunately there’s no way to clear these. In fact, even if the objects are removed from cache (using DROPCLEANBUFFERS) it still doesn’t remove the information from the DMVs. You’re basically left with a stop/restart for many of these (especially if your test runs across databases). But, if it’s only one database (or a small number of databases) AND, this is a testing environment – you can take the database offline to clear all of THAT database’s information from ALL DMVs (even those that wouldn’t have been cleared through a stop/restart – like index usage stats) and then just bring it back online.

In fact, that’s probably the best way because then you don’t need to restart SQL *and* you only clear that db. But, I have to admit, I don’t know if there are any really negative side effects here (well, I do know one – IF there are any files that are damaged or suspect, then you won’t be able to bring a database back online without FIRST taking all of the individual files offline). So, there could be side effects here if you have other problems. But, if the database is completely intact, then switching it offline/online might be a really easy (and quick) way of doing this.

The concern I have with this DMV is that there are situations where the presence of an index will change how a query plan is generated (for the better) but the index won’t show as being used in the DMV.