SQL University: Advanced Indexing – Indexing Strategies

Right, I know it’s Friday and everyone’s tired and looking forward to the weekend, but I do need to finish off this indexing section and I’ll try to keep this short and interesting and hopefully keep everyone awake.

There’s no shortage of information available on how to create indexes. Hell, I’ve written a copious amount myself. Most of those many articles, however, are written from the point of view of indexing a single query: what you choose for the where clause, what has to go into the include clause to create the perfect index for that one query. Now that’s all well and good, but I’ve never met a system that had only one query per table (maybe there is such a system out there, but I’ve never found it).

So what I’m going to try to do today is address the topic of a strategy for indexing. How to approach indexing, not for a single query, but for the system as a whole. I won’t be able to cover this in-depth, this is material worthy of an entire book chapter, if not an entire book, but I can at least touch on the essential portions.

Now, there are two main positions that we could be in when considering indexing strategies for an entire system:
1) A brand new system that’s still in development
2) An existing system that’s being used actively.

One at a time…

Indexing strategies for a brand new system

Start by choosing a good clustered index. What makes a good clustered index? Well, it depends 🙂

The clustered index is the base; it will affect each and every nonclustered index, and it’s not trivial to change once the system is in use, so choose carefully. I’m not saying another word on the subject of a clustered index, not today.

Once that’s done…

Design any unique constraints, unique indexes or primary key constraints that are required by the database design. If the primary key should go on the column(s) that were chosen as the clustering key, great, the primary key gets created clustered. If not, the clustered index goes on the column(s) chosen as the clustering key and the primary key gets created as nonclustered.
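As a rough sketch of that second case (table and column names here are made up for illustration), the clustering key and the primary key end up on different columns:

```sql
-- Hypothetical table: the chosen clustering key is OrderDate,
-- but the logical primary key is OrderID.
CREATE TABLE dbo.Orders (
    OrderID    int      NOT NULL,
    OrderDate  datetime NOT NULL,
    CustomerID int      NOT NULL
);

-- Clustered index on the column chosen as the clustering key.
CREATE UNIQUE CLUSTERED INDEX idx_Orders_OrderDate
    ON dbo.Orders (OrderDate, OrderID);

-- Primary key created as nonclustered, since the clustered index
-- is already taken.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID);
```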

Index the foreign keys. These may end up not being the final indexes on these columns, but it’s an excellent place to start and it’s something that’s left out far, far too often.
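SQL Server creates no index automatically when a foreign key is defined, so this is a deliberate step. A sketch, again with hypothetical names:

```sql
-- The foreign key constraint itself creates no index.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);

-- Index the foreign key column so that joins to the parent table
-- and constraint checks on parent deletes don't have to scan.
CREATE NONCLUSTERED INDEX idx_Orders_CustomerID
    ON dbo.Orders (CustomerID);
```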

That’s the absolute bare minimum that must be done. That can all be done with just the database design. The rest is going to require some knowledge of the queries that will be running against the server.

Speak to the developers and see what queries they’re going to be sending to the database; speak to the business analysts (or users) and see what they think will be the most commonly used parts of the system. Bear in mind that both the developers and the business analysts (or users) may well be wrong. Not intentionally wrong, but wrong because they’re looking at things from a different perspective.

To give an example of that, a system I worked on some time back had custom security built into the DB (tables storing access rights to various sets of data). The users swore that the most accessed portion of the system was the address book. The developers claimed that the account balances procedure would have the heaviest impact. A trace showed that the custom security ran far more frequently than anything else.

Hence, if you can trace a workload (one from automated testing, user testing or acceptance testing is best), you should. Combine that with what the developers and business analysts say and the result should be reasonably accurate.

Find the most critical queries, the ones that are going to run often. These may be ‘housekeeping’ queries like the custom security, or maybe they’ll be queries run when the user opens the app. It’s going to differ for everyone.

Create a minimal set of indexes to support the most critical, most frequent queries. Do not, at this point, try to index everything. It’s going to be a waste of time without accurate stats of how the users really use the system. Create just a minimum of indexes to start with. You want enough so that the system runs acceptably but not so many that you’ll be cleaning unused indexes off for months after the system goes live.

I’m not going to go into detail here on how to create indexes; see all the links that I gave in the first part of this series. There is, however, one thing to keep in mind: once done, come back to the indexes after the system is in use and re-evaluate them.

Indexing strategies for an established system

This one depends heavily on how badly the system is indexed.

If tables are missing clustered indexes or primary keys, that should be the first priority. This is harder than for a new system since, in the absence of enforced constraints, duplicate values could have crept into the supposedly unique columns.
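A quick way to check for those duplicates before attempting to add the constraint (table and column names here are placeholders):

```sql
-- List any values that appear more than once in the column(s)
-- that are supposed to be unique. If this returns rows, the
-- primary key or unique constraint cannot be added as-is.
SELECT SomeKeyColumn, COUNT(*) AS Occurrences
FROM dbo.ExistingTable
GROUP BY SomeKeyColumn
HAVING COUNT(*) > 1;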

If primary keys can’t easily be added (due to errant data), the clustered indexes should still be considered. All the same considerations as for a new system apply here. It’s worth noting that adding clustered indexes to huge tables is not a quick exercise.

There are two factors to fixing the indexing for an existing system:
1) Creating or widening indexes to support the current workload
2) Removing indexes that are not used

Creating or widening indexes

This should be done based on the workload. Either SQL Trace or Extended Events can be used to capture the queries running against the server. The captured workload can be examined manually, or it can be submitted to the Database Engine Tuning Advisor (DTA).
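For reference, a minimal Extended Events capture might look like this (the session name and file path are placeholders; adjust the events and add predicates to suit):

```sql
-- Capture completed ad-hoc statements and stored procedure calls
-- to a file target for later analysis.
CREATE EVENT SESSION WorkloadCapture ON SERVER
ADD EVENT sqlserver.sql_statement_completed,
ADD EVENT sqlserver.rpc_completed
ADD TARGET package0.event_file
    (SET filename = N'C:\Traces\WorkloadCapture.xel');

ALTER EVENT SESSION WorkloadCapture ON SERVER STATE = START;
```

On a busy server, add predicates (for example on duration or database_id) so the session doesn’t capture more than you can analyse.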

If you use the DTA, its output must not be the entire story. DTA recommendations have to be carefully examined, tested, and implemented only if they make sense (and improve performance). Be wary of accidentally creating redundant indexes this way (How can you tell if an index is REALLY a duplicate?)

If manually tuning, the same cautions as mentioned under ‘Indexing strategies for a brand new system’ apply.

Also, be aware that you can’t create perfect indexes for all queries (except maybe in a data warehouse, but likely not even then). Index for the important (frequent or high priority) queries, create indexes that can support multiple queries. Tune the system, not the individual query. See Indexes: just because you can, doesn’t mean you should! (Kimberly Tripp).
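To illustrate indexing for the system rather than the query (names and columns here are invented), a single index can often serve several queries at once:

```sql
-- One index keyed on (CustomerID, OrderDate) with a covering
-- include column can support both of the queries below.
CREATE NONCLUSTERED INDEX idx_Orders_CustomerID_OrderDate
    ON dbo.Orders (CustomerID, OrderDate)
    INCLUDE (TotalDue);

-- Query 1: all orders for a customer (seeks on CustomerID).
-- SELECT OrderDate, TotalDue FROM dbo.Orders
--  WHERE CustomerID = @Customer;

-- Query 2: a customer's orders in a date range
-- (seeks on CustomerID, range scan on OrderDate).
-- SELECT TotalDue FROM dbo.Orders
--  WHERE CustomerID = @Customer
--    AND OrderDate >= @From AND OrderDate < @To;
```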

If the less important, less frequently run queries aren’t quite as optimal as they could be (but are still in the acceptable range), that’s fine. Be very careful of over-indexing.

Removing indexes

To be honest, this is fraught with peril. Telling whether an index is unused is not as easy as it may seem.

Start with the sys.dm_db_index_usage_stats DMV. Indexes that have no seeks and no scans, just updates, or ones that don’t appear there at all, may appear to be unused. Whether they are really unused, however, is another question.

The sys.dm_db_index_usage_stats DMV is cleared by a restart of SQL Server or any time the database is closed (beware auto_close). So if the SQL instance has only been running for three days, the best that can be said about indexes that appear in sys.dm_db_index_usage_stats with no seeks and no scans is that they haven’t been used in three days.
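One way to list the candidates is to join the DMV back to sys.indexes, so that indexes with no usage row at all also show up:

```sql
-- Indexes on user tables with no seeks, scans or lookups since the
-- last restart (or no usage row at all). These are candidates for
-- further monitoring, not for immediate removal.
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name                   AS IndexName,
       us.user_seeks, us.user_scans,
       us.user_lookups, us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id = i.object_id
      AND us.index_id = i.index_id
      AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND (us.index_id IS NULL
       OR us.user_seeks + us.user_scans + us.user_lookups = 0);
```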

Before deciding to drop any indexes, monitor that DMV over a period of time. How long? It depends on your application. If it’s an app with steady, consistent usage, maybe not long. If it’s an app with radically different usage patterns at different times of the month or year, then long enough to capture them all.

Also, make sure you keep documentation and preferably scripts of any indexes dropped, so that you can easily recreate them should it be necessary.

In conclusion

Comprehensive indexing strategies are not exactly easy things to write short posts on, but I hope this has given some idea of how to fit all the pieces together. For another view on this topic, be sure to watch Kimberly’s video on indexing strategies: http://technet.microsoft.com/en-us/sqlserver/gg545006.aspx

Oh, I almost forgot the answer to Wednesday’s homework (for anyone who’s still awake). There are likely many answers, but this one satisfies the requirements:

CREATE NONCLUSTERED INDEX idx_HomeworkAnswer
ON dbo.CallLog (Severity, LastUpdateDate, CallStatus)
INCLUDE (AssignedTo, LogDate)
WHERE LastUpdateDate IS NOT NULL AND Severity IN (1,2)

Stats aren’t a huge problem if the DB has auto_create_statistics on. If the index is dropped and hence its statistics go, SQL will recreate them if necessary.

On the uniqueness: I guess I should have called that out specifically. An index that’s being used for a unique constraint should never be considered unused, as it’s enforcing uniqueness. The only exception is if there are two indexes enforcing uniqueness on exactly the same columns; then one of them can go. Otherwise a unique index is part of the table’s design, not just an index.

If you have sufficiently normalized your tables, there shouldn’t be too much danger of over-indexing, because the clustered index and the foreign keys should be the only ways you’ll look into the table. Is there a scenario where this is not the case? I’d like to know your thoughts on indexing and highly normalized data.