Finding Duplicate Indexes in Large SQL Server Databases

One important component of tuning a large, heavily used database, is to ensure that the tables are indexed optimally: enough indexing, but not too much indexing for the application you are running. There are rules of thumb about index tuning, but the entire issue is so complex that there’s no “silver bullet” solution that will work for every case.

Duplicate indexes in a simple case like this might seem obvious or even silly, but as Mr. McGehee points out in the article at the link above, they often creep into a bigger system as a result of various people tuning it at different times, or because of upgrades applied to third-party or vendor-created databases, or as a result of accidentally, manually creating indexes that duplicate those that are auto-generated by SQL Server.

In a small, simple database, one could simply poke around to locate these redundant indexes by using:

sp_helpindex 'myTable'

which, from our example, might return:

index_name

index_description

index_keys

ind_name

nonclustered located on PRIMARY

LastName

ind_name_1

nonclustered located on PRIMARY

LastName

ind_name_2

nonclustered located on PRIMARY

LastName, FirstName

PK_myTable

clustered, unique, primary key located on PRIMARY

IDNum

The index_keys column can be used to identify indexes on the same column(s), which would be candidates for removal. If a column, or sequence of columns, appears in the same order in the index_keys list, such as LastName, and LastName, FirstName, then the indexes are probably redundant. An index on LastName, FirstName would fulfill the same function as an index on LastName alone, so it’s not necessary to maintain both. Removing the duplicates should speed inserts without compromising the performance of selects.

This problem is considerably more difficult when tuning a large database, which might have been worked on over a period of years by various people, and could have hundreds of tables and indexes. Picking through such a large collection of tables one at a time would be quite time consuming. But there is a way to get SQL Server to help out with that task. The following statement will show all the indexes defined in a database, by pulling information from the SYSINDEXES table:

This query uses the INDEX_COL() function to determine what columns are being indexed, and to separate that list into discrete columns. If you run the query, you will probably find that there are several types of entries. “Normal” indexes will appear in the list, with the columns they index listed in fields col1 – colx. In addition, you’ll find that any statistics maintained in the database are also listed, generally with names starting “_WA_Sys_”. Those statistics entries are also flagged by the function INDEXPROPERTY(), which will fill the column “IsStats” as 1. Statistics are important, but fall outside the scope of our task here; for this exercise we’d like to filter them out. (See the Extra Credit section at the end of this article.)