Comment

The sales table has a million rows, 50,000 of which match category_id 1 and 5,000 of which match salesperson_id 2. But there are only 100 rows that match both. You have two indexes, one on category_id and one on salesperson_id.

When MySQL goes to evaluate this query, it will only ever use one index per relation (table) to retrieve the data. If it chooses to use the category_id index, it will read 50,000 rows off disk and filter them by salesperson_id in memory, and return to you 100 rows. The same goes for the salesperson_id index, except that it will read 5,000 rows and filter by category_id in memory.

In general, the name of the game is to reduce disk I/O, so you go with the index that results in the fewest rows read from disk.

Now let's say you drop those two indexes and instead add a single index (category_id, salesperson_id). Then you'll only be reading 100 rows when you evaluate this query.

While it sounds much better, if you were to change the query to

SELECT product_id FROM sales WHERE salesperson_id = 2;

That index we just added can't be used, and we'll get a table scan (we read 1 million rows), because we must provide the prefix (category_id, in this case), in order for the compound index to be useful. The article gmouse suggested explores this in detail.

If you look at the results of that query, it should be easy for you to visually find all the rows where page_id = 1, right? They're all clumped together nicely.

But if you're just looking for user_id = 1 without a page_id, well, all of the sudden its not very easy to do visually, because they seem to be all over the place, because they are sorted first by page_id, which you aren't asking about, and then again by zone_id.

So if you asked for page_id = 1 and zone_id = 2, visually, you'd be able to find the rows easily, because you'd first find where the rows for page_id = 1 start in your results, and then start looking for zone_id = 2. The same goes if you add in user_id.

Comment

Scenario #1 doesn't use the primary key* and scenario #2 will use the page_id part of the key.

*Not to confuse you more, but technically this query in scenario #1 will be evaluated only using the primary key, since the key contains all the columns required to produce the result set without any table lookups. If you selected out a column not in the index, then a table scan would result instead. You can actually see this in the output of EXPLAIN when "using index" shows up.