Greater, Less and BETWEEN

The biggest performance risk of an INDEX RANGE SCAN is the leaf node traversal. It is therefore the golden rule of indexing to keep the scanned index range as small as possible. You can check that by asking yourself where an index scan starts and where it ends.

The question is easy to answer if the SQL statement mentions the start and stop conditions explicitly:

Of course an ideal index has to cover both columns, but the question is in which order?

The following figures show the effect of the column order on the scanned index range. For this illustration we search all employees of subsidiary 27 who were born between January 1st and January 9th 1971.

Figure 2.2 visualizes a detail of the index on DATE_OF_BIRTH and SUBSIDIARY_ID—in that order. Where will the database start to follow the leaf node chain, or to put it another way: where will the tree traversal end?

Figure 2.2. Range Scan in DATE_OF_BIRTH, SUBSIDIARY_ID Index

The index is ordered by birth dates first. Only if two employees were born on the same day is the SUBSIDIARY_ID used to sort these records. The query, however, covers a date range. The ordering of SUBSIDIARY_ID is therefore useless during tree traversal. That becomes obvious if you realize that there is no entry for subsidiary 27 in the branch nodes—although there is one in the leaf nodes. The filter on DATE_OF_BIRTH is therefore the only condition that limits the scanned index range. It starts at the first entry matching the date range and ends at the last one—all five leaf nodes shown in Figure 2.2.

The picture looks entirely different when reversing the column order. Figure 2.3 illustrates the scan if the index starts with the SUBSIDIARY_ID column.

Figure 2.3. Range Scan in SUBSIDIARY_ID, DATE_OF_BIRTH Index

The difference is that the equals operator limits the first index column to a single value. Within the range for this value (SUBSIDIARY_ID 27) the index is sorted according to the second column—the date of birth—so there is no need to visit the first leaf node because the branch node already indicates that there is no employee for subsidiary 27 born after June 25th 1969 in the first leaf node.

The tree traversal directly leads to the second leaf node. In this case, all where clause conditions limit the scanned index range so that the scan terminates at the very same leaf node.

Tip

Rule of thumb: index for equality first—then for ranges.

The actual performance difference depends on the data and search criteria. The difference can be negligible if the filter on DATE_OF_BIRTH is very selective on its own. The bigger the date range becomes, the bigger the performance difference will be.

With this example, we can also falsify the myth that the most selective column should be at the leftmost index position. If we look at the figures and consider the selectivity of the first column only, we see that both conditions match 13 records. This is the case regardless whether we filter by DATE_OF_BIRTH only or by SUBSIDIARY_ID only. The selectivity is of no use here, but one column order is still better than the other.

To optimize performance, it is very important to know the scanned index range. With most databases you can even see this in the execution plan—you just have to know what to look for. The following execution plan from the Oracle database unambiguously indicates that the EMP_TEST index starts with the DATE_OF_BIRTH column.

The PostgreSQL database does not indicate index access and filter predicates in the execution plan. However, the Index Cond section lists the columns in order of the index definition. In that case, we see the two DATE_OF_BIRTH predicates first, than the SUBSIDIARY_ID. Knowing that any predicates following a range condition cannot be an access predicate the SUBSIDIARY_ID must be a filter predicate. See Section 2.3 for more details.

The predicate information for the INDEX RANGE SCAN gives the crucial hint. It identifies the conditions of the where clause either as access or as filter predicates. This is how the database tells us how it uses each condition.

Note

The execution plan was simplified for clarity. The appendix explains the details of the “Predicate Information” section in an Oracle execution plan.

The conditions on the DATE_OF_BIRTH column are the only ones listed as access predicates; they limit the scanned index range. The DATE_OF_BIRTH is therefore the first column in the EMP_TEST index. The SUBSIDIARY_ID column is used only as a filter.

Important

The access predicates are the start and stop conditions for an index lookup. They define the scanned index range.

Index filter predicates are applied during the leaf node traversal only. They do not narrow the scanned index range.

The PostgreSQL database does not indicate index access and filter predicates in the execution plan. However, the Index Cond section lists the columns in order of the index definition. In that case, we see the SUBSIDIARY_ID predicate first, than the two on DATE_OF_BIRTH. As there is no further column filtered after the range condition on DATE_OF_BIRTH we know that all predicates can be used as access predicate. See Section 2.3 for more details.