Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I've been running an auto-index tool on our MS SQL database (I modified a script originating from Microsoft that looks at the index statistics tables - Automated Auto Indexing). From the stats, I now have a list of recommendations for indexes that need creating.

Edit:The Indexes described above take information from the DMV's that tell you what the database engine would use for indexes if they were available and the scripts take the Top x recommendations (by seeks, user impact etc.) and put these in a table.

(Edit above partially taken from Larry Coleman's answer below in order to clarify what the scripts are doing)

As I am new to database admin, and having had a quick search around the net, I am reluctant to take the plunge and blindly add the recommended indexes. However, not being experienced in the field, I am looking for some advice on how to determine whether the recommendations are necessary or not.

Do I need to run the SQL Profiler, or is it better to examine the code that queries the tables? And do you have any other advice?

6 Answers
6

I use Jason Strate's index analysis scripts. They tell you how much your existing indexes are used as well as how much missing indexes would have been used. I typically don't add indexes unless they make up more than 5 or 10% of the queries on a table.

Most importantly, though, it's about making sure the application responds fast enough for the users.

There are a few concepts and terms that are important to understand when dealing with indexes. Seeks, scans, and lookups are some of the ways that indexes will be utilized through select statements. Selectivity of key columns is integral to determining how effective an index can be.

A seek happens when the SQL Server Query Optimizer determines that the best way to find the data you have requested is by scanning a range within an index. Seeks typically happen when a query is "covered" by an index, which means the seek predicates are in the index key and the displayed columns are either in the key or included. A scan happens when the SQL Server Query Optimizer determines that the best way to find the data is to scan the entire index and then filter the results. A lookup typically occurs when an index does not include all requested columns, either in the index key or in the included columns. The query optimizer will then use either the clustered key (against a clustered index) or the RID (against a heap) to "lookup" the other requested columns.

Typically, seek operations are more efficient than scans, due to physically querying a smaller data set. There are situations where this is not the case, such as a very small initial data set, but that goes beyond the scope of your question.

Now, you asked how to determine how effective an index is, and there are a few things to keep in mind. A clustered index's key columns are called a clustering key. This is how records are made unique in the context of a clustered index. All nonclustered indexes will include the clustered key by default, in order to perform lookups when necessary. All indexes will be inserted to, updated to, or deleted from for every respective DML statement. That having been said, it is best to balance performance gains in select statements against performance hits in insert, delete, and update statements.

In order to determine how effective an index is, you must determine the selectivity of your index keys. Selectivity can be defined as a percentage of distinct records to total records. If I have a [person] table with 100 total records and the [first_name] column contains 90 distinct values, we can say that the [first_name] column is 90% selective. The higher the selectivity, the more efficient the index key. Keeping selectivity in mind, it is best to put your most selective columns first in your index key. Using my previous [person] example, what if we had a [last_name] column that was 95% selective? We would want to create an index with [last_name], [first_name] as the index key.

I know this was a bit long-winded answer, but there really are a lot of things that go into determining how effective an index will be, and a lot things you must weigh any performance gains against.

Thankyou for this explanation. This information combined with my Automation script and the Strate scripts (which I'll only use as a guide, not to actually create the Indexes), should help me in deciding if an index is viable.
–
misterjayteeJan 11 '11 at 22:04

I just want to stress on what has been said above: Indexes slow down your inserts/delete and updates. If you have to say insert a large amount of data in bulk, you are better off without the index (you can create it after, it's faster).
–
ndefontenayApr 21 '11 at 10:28

Normally one goes by having a specific workload (queries) and carefully testing the impact of each new index on the workload. This iterative process should always include careful analysis of the execution plans, which would reveal what indexes are used. The topic of analyzing a query is a lengthy one, and starting with the dedicated MSDN chapter Analyzing a Query is a good bet.

Sometimes when the workload is too complex or the knowledge of the database design is sketchy one uses the Database Engine Tuning Advisor, which does some automatic analysis of your workload and proposes some indices. The proposals should, of course, be carefully analyzed and the impact should be measured immediately.

So if you follow my idea, adding an index and measuring the impact is really just a case of A/B testing: you run your workload w/o the index as a base line, then you run it with the index, measure and compare with the base line and then decide, based on observed and measured metrics, if the impact is beneficial. The workload is best a good quality test suite, but it can also be a replay of a captured workload, see How to: Replay a Trace File.

A more synthetic answer is to look at the sys.dm_db_index_usage_stats view and see how indices are being utilized, but that is usually an approach for doing on-site analysis on an unknown workload (ie. a consultant called to help would probably start with this).

This does some good analysis of which indexes exist, how often they are used and how often the query engine is looking for an index that doesn't exist.

It's guidance is generally good. Sometimes it gets a bit over-suggestive of ideas. I have generally done the following so far:

Removed indexes that have NEVER been read (or maybe less than 50 times a month).

Added the most obvious indexes on foreign keys and fields I know we use a lot.

I haven't added all the recommended indexes, and have gone back a week later to find that they are no longer recommended since the query engine is using some of the other new indexes instead!

Generally you should avoid indexes on:

Very small tables (less than 50 to 200 records): often the query engine is faster if it scans the table rather than load up the index, read, process it etc.

Avoid indexes on columns with Low Cardinality (http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)) on the first mentioned column. E.g. Indexing a gender field (M/F) is of very little use, it's just as practical to scan the table and find the ~50% that match. If it's listed after something more specific in the index (e.g. [date of birth, gender]) that's better - you might want all Males born in a given time span.

Clustered Indexes are good - normally these are based on your primary key. They help the database engine put the data on the disk in good order. Very essential to understand this for the largest tables as a good clustered index often reduces the space the table occupies.

Review Regularly

Queries change, data volumes change, new features are added, old ones removed. You should be looking at them once a month (or more often if you have high volumes) and looking for where you can help the database out!

How many

In a recent video Brent recommends (typically) no more 5 indexes on a table with lots of writing (e.g. orders table), and no more than 10 if it is read a lot more than written (i.e. logging table for analytics) http://www.youtube.com/watch?v=gOsflkQkHjg

Overall

It depends!

Your mileage varies according to database. Cover the obvious (employee surname, order date etc) on your (now/future) larger tables. Monitor, review, and adjust as necessary. It should be part of your routine checklist when managing your database(s) :)

Starting with SQL 2005, SQL Server has DMV's that tell you what the database engine would use for indexes if they were available. The views can tell you which columns should be key columns, which columns should be included, and most importantly, how many times the index would have been used.

A good approach would be to sort the missing indexes query by number of seeks, and consider adding the top indexes first.

It depends on how that table is used. e.g. lets say I've a table which is read a lot of times but updates and inserts are rare. Plus I always query the table on some foreign key column. It will make sense to create (non clustered) index over that foreign key to speed up read queries. But the downside is, your insert,update will become slow.

There are few statistics queries which tell how much time queries are taking. Start with slowest ones. If the query predicate has no index, creating one will help.