Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I somewhat know the answer to this question already, but I always feel as though there is more I need to pick up on the topic.

My basic understanding is that generally speaking, a single index that just includes all the fields you might be querying/sorting on at any given time isn't likely to be useful, yet I have seen this type of thing. As in, someone thought, "Well, if we just put all this stuff in an index, the database can use it to find what it needs", without having ever seen an execution plan for some of the actual queries being run.

I might see a single index including the name, customerId and dateCreated fields.

But my understanding is that such an index would not be used in a query like, for example:

SELECT [id], [name], [customerId], [dateCreated]
FROM Representatives WHERE customerId=1
ORDER BY dateCreated

For such a query, it seems to me that a better idea would be an index including the customerId and dateCreated fields, with the customerId field being 'first'. This would create an index that would have the data organized in such a way that this query could quickly find what it needs - in the order that it needs.

Another thing I see, perhaps as frequently as the first, is individual indexes on each field; so, one each on name, customerId and dateCreated fields.

Unlike the first example, this type of arrangement seems to me sometimes to at least be partially useful; the query's execution plan may show that at least it's using the index on the customerId to select the records, but it's not using the index with the dateCreated field to sort them.

I know this is a broad question, because the specific answer to any particular query on any particular set of tables is usually to see what the execution plan says it's going to do, and otherwise take the specifics of the table(s) and queries into account. Also, I know that it depends on how often a query might be run as opposed to the overhead of maintaining a particular index for it.

But I suppose what I'm asking is as a general 'starting point' for indexes, does the idea of having specific indexes for specific, frequently-pulled queries and the fields in the WHERE or ORDER BY clauses make sense?

If it found both [customerId] and [customerId], [dateCreated], [name] its decision to prefer one over the other would depend on the index stats which depend on estimates of the balance of data in the fields. If [customerId], [dateCreated] were defined it should prefer that over the other two unless you give a specific index hint to the contrary.

It is not uncommon to see one index defined for every field in my experience either, though this is rarely optimal as the extra management needed to update the indexes on insert/update, and the extra space needed to store them, is wasted when half of them may never get used - but unless your DB sees write-heavy loads the performance is not going to stink badly even with the excess indexes.

Specific indexes for frequent queries that would otherwise be slow due to table or index scanning is generally a good idea, though don't overdo it as you could be exchanging one performance issue for another. If you do define [customerId], [dateCreated] as an index, for example, remember that the query planner will be able to use that for queries that would use an index on just [customerId] if present. While using just [customerId] would be slightly more efficient than using the compound index this may be mitigated by ending up having two indexes competing for space in RAM instead of one (though if your entire normal working set fits easily into RAM this extra memory competition may not be an issue).

+1 ; great info, especially the reminder (which I tend to forget!) that the planner can use a compound index at times when it only needs the first field(s) from it for a query.
–
Andrew BarberJan 5 '11 at 13:03

To answer your original question, yes, indexes have to be designed around the queries, not just the table. Order of fields in the index is vitally important. Designing a single index to be optimal for multiple queries is harder, and you will have to make trade-offs.

Regarding your second point, yes, a bunch of indexes on single individual fields is annoyingly common. I see it all the time in my environment, and its usually a red flag to me that the development team hasn't worked with a DBA to design proper indexes.

My strategy for designing indexes, is to index:

Fields used in WHERE (in order of selectivity)

Fields used in ORDER BY

Include other fields (if necessary) to make a covering index

So for your example:

SELECT [id], [name], [customerId], [dateCreated]
FROM Representatives WHERE customerId=1
ORDER BY dateCreated

I would probably design an index on (CustomerID, dateCreated) INCLUDE (id, name). This covering index means the query doesn't ever have to hit the original table, vastly improving performance.

This example is almost too simple, though. A naive index on just (CustomerID) would perform nearly as well (assuming that each customer only has a single rep, so only a single bookmark lookup to the table will be required). It also might even be beneficial to actually do a clustered index on (CustomerID, ID), depending on what other queries run against the table.