Introduction to Database Indexes

Put simply, database indexes help speed up retrieval of data. The other great benefit of indexes is that your server doesn’t have to work as hard to get the data. They are much the same as book indexes, providing the database with quick jump points on where to find the full reference (or to find the database row).

There are both advantages and disadvantages to using indexes,however.

One disadvantage is they can take up quite a bit of space – check a textbook or reference guide and you’ll see it takes quite a few pages to include those page references.

Another disadvantage is using too many indexes can actually slow your database down. Thinking of a book again, imagine if every “the”, “and” or “at” was included in the index. That would stop the index being useful – the index becomes as big as the text! On top of that, each time a page or database row is updated or removed, the reference or index also has to be updated.

So indexes speed up finding data, but slow down inserting, updating or deleting data.

Some fields are automatically indexed. A primary key or a field marked as ‘unique’ – for example an email address, a userid or a social security number – are automatically indexed so the database can quickly check to make sure that you’re not going to introduce bad data.

So when should a database field be indexed?

The general rule is anything that is used to limit the number of results you’re trying to find.

It’s hard to generalise so we’ll look at some specific but common examples.

Note – the database tables shown below are used as an example only and will not necessarily be the best setup for your particular needs.

In a database table that looks like this:

Note: The SQL code shown below works with both MySQL and PostgreSQL databases.

This allows the database to very quickly match the records from the ‘newsitem’ table to the ‘authors’ table. In database terminology this is called a table join – you should index any fields involved in a table join like this.

Since the ‘authorid’ in the authors table is a primary key, it is already indexed. The same goes for the ‘newsid’ in the news table, so we don’t need to look at those cases.

On a side note, table aliases make things a lot easier to see what’s happening. Using ‘newsitem n’ and ‘authors a’ means we don’t have to write:

However, doing this limits some ways the index can be used. A query against the table that uses both ‘newsid’ and ‘categoryid’ will be able to use this index. A query against the table that only gets the ‘newsid’ will be able to use the index.

A query against that table that only gets the ‘categoryid’ will not be able to use the index.

For a table like this:

CREATE TABLE example ( a int, b int, c int );

With this index:

CREATE INDEX example_index ON example(a,b,c);

It will be used when you check against ‘a’.

It will be used when you check against ‘a’ and ‘b’.

It will be used when you check against ‘a’, ‘b’ and ‘c’.

It will not be used if you check against ‘b’ and ‘c’, or if you only check ‘b’ or you only check ‘c’.

It will be used when you check against ‘a’ and ‘c’ but only for the ‘a’ column – it won’t be used to check the ‘c’ column as well.

A query against ‘a’ OR ‘b’ like this:

SELECT a,b,c FROM example where a=1 OR b=2;

Will only be able to use the index to check the ‘a’ column as well – it won’t be able to use it to check the ‘b’ column.

Multi-column indexes have quite specific uses, so check their use carefully.

Now that we’ve seen when we should use indexes, let’s look at when we shouldn’t use them. They can actually slow down your database (some databases may actually choose to ignore the index if there’s no reason to use it).

… looks pretty standard. The ‘active’ field tells us whether the news item is active and ready to be viewed on the site.

So… should we should create an index on this field for a query like this?

SELECT newsid, newstitle FROM news WHERE active=’1′;

No, we shouldn’t.

If most of your content is live, this index will take up extra space and slow the query down because almost all of the fields match this criteria. Imagine 500 news items in the database with 495 being active. It’s quicker to eliminate the ones that aren’t active than it is to list all of the active ones (if you do have an index on the ‘active’ field, some databases will choose to ignore it anyway because it will slow the query down).

The featured field tells us whether the news item should feature on the front page. S
hould we index this field? Yes. Most of our content is not featured, so an index on the ‘featured’ column will be quite useful.

Other examples of when to index a field include if you’re going to order by it in a query. To get the most recent news items, we do a query like this:

SELECT newtitle, newscontent FROM news ORDER BY newsdate DESC;

Creating an index on ‘newsdate’ will allow the database to quickly sort the results so it can fetch the items in the right order. Indexing can be a bit tricky to get right, however there are tools available for each database to help you work out if it’s working as it should.Well there you have it — my introduction to database indexes. Hopefully you’ve learned something from this article and can apply what you’ve learned to your own databases.

22 Responses to “Introduction to Database Indexes”

I think you need to be a bit more "the reader knows absolutly nothing" when describing the table joins. You lost me for a bit there. Perhaps a better step by step hand holding example would be better.

[ Editors note: Sure thing. I'll see what I can come up with for next month! If you're desperate for information and can't wait - drop me a line - chris at interspire dot com and I'll explain it further ]

This was a great explanation of indexes for me – I am self-taught when it comes to databases so the language in this tutorial was very easy for me to understand. Also, you used great examples to help explain your information. THANKS!