8.2.1.3 Range Optimization

The range access method
uses a single index to retrieve a subset of table rows that
are contained within one or several index value intervals. It
can be used for a single-part or multiple-part index. The
following sections give a detailed description of how
intervals are extracted from the WHERE
clause.

8.2.1.3.1 The Range Access Method for Single-Part Indexes

For a single-part index, index value intervals can be
conveniently represented by corresponding conditions in the
WHERE clause, so we speak of
range conditions rather than
“intervals.”

The definition of a range condition for a single-part index
is as follows:

For both BTREE and
HASH indexes, comparison of a key
part with a constant value is a range condition when
using the
=,
<=>,
IN(),
IS NULL, or
IS NOT NULL operators.

Additionally, for BTREE indexes,
comparison of a key part with a constant value is a
range condition when using the
>,
<,
>=,
<=,
BETWEEN,
!=,
or
<>
operators, or LIKE
comparisons if the argument to
LIKE is a constant string
that does not start with a wildcard character.

For all types of indexes, multiple range conditions
combined with OR or
AND form a range condition.

“Constant value” in the preceding descriptions
means one of the following:

Any expression composed entirely from subexpressions of
the preceding types

Here are some examples of queries with range conditions in
the WHERE clause:

SELECT * FROM t1
WHERE key_col > 1
AND key_col < 10;
SELECT * FROM t1
WHERE key_col = 1
OR key_col IN (15,18,20);
SELECT * FROM t1
WHERE key_col LIKE 'ab%'
OR key_col BETWEEN 'bar' AND 'foo';

Some nonconstant values may be converted to constants during
the constant propagation phase.

MySQL tries to extract range conditions from the
WHERE clause for each of the possible
indexes. During the extraction process, conditions that
cannot be used for constructing the range condition are
dropped, conditions that produce overlapping ranges are
combined, and conditions that produce empty ranges are
removed.

Consider the following statement, where
key1 is an indexed column and
nonkey is not indexed:

SELECT * FROM t1 WHERE
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
(key1 < 'bar' AND nonkey = 4) OR
(key1 < 'uux' AND key1 > 'z');

Remove nonkey = 4 and key1
LIKE '%b' because they cannot be used for a
range scan. The correct way to remove them is to replace
them with TRUE, so that we do not
miss any matching rows when doing the range scan. Having
replaced them with TRUE, we get:

(key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR
(key1 < 'bar' AND TRUE) OR
(key1 < 'uux' AND key1 > 'z')

Collapse conditions that are always true or false:

(key1 LIKE 'abcde%' OR TRUE) is
always true

(key1 < 'uux' AND key1 >
'z') is always false

Replacing these conditions with constants, we get:

(key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)

Removing unnecessary TRUE and
FALSE constants, we obtain:

(key1 < 'abc') OR (key1 < 'bar')

Combining overlapping intervals into one yields the
final condition to be used for the range scan:

(key1 < 'bar')

In general (and as demonstrated by the preceding example),
the condition used for a range scan is less restrictive than
the WHERE clause. MySQL performs an
additional check to filter out rows that satisfy the range
condition but not the full WHERE clause.

The range condition extraction algorithm can handle nested
AND/OR
constructs of arbitrary depth, and its output does not
depend on the order in which conditions appear in
WHERE clause.

Currently, MySQL does not support merging multiple ranges
for the range access
method for spatial indexes. To work around this limitation,
you can use a UNION with
identical SELECT statements,
except that you put each spatial predicate in a different
SELECT.

8.2.1.3.2 The Range Access Method for Multiple-Part Indexes

Range conditions on a multiple-part index are an extension
of range conditions for a single-part index. A range
condition on a multiple-part index restricts index rows to
lie within one or several key tuple intervals. Key tuple
intervals are defined over a set of key tuples, using
ordering from the index.

For example, consider a multiple-part index defined as
key1(key_part1,
key_part2,
key_part3), and the
following set of key tuples listed in key order:

The interval covers the 4th, 5th, and 6th tuples in the
preceding data set and can be used by the range access
method.

By contrast, the condition
key_part3 =
'abc' does not define a single interval and cannot
be used by the range access method.

The following descriptions indicate how range conditions
work for multiple-part indexes in greater detail.

For HASH indexes, each interval
containing identical values can be used. This means that
the interval can be produced only for conditions in the
following form:

key_part1cmpconst1
AND key_part2cmpconst2
AND ...
AND key_partNcmpconstN;

Here, const1,
const2, … are
constants, cmp is one of the
=,
<=>,
or IS NULL comparison
operators, and the conditions cover all index parts.
(That is, there are N
conditions, one for each part of an
N-part index.) For example,
the following is a range condition for a three-part
HASH index:

For a BTREE index, an interval might
be usable for conditions combined with
AND, where each condition
compares a key part with a constant value using
=,
<=>,
IS NULL,
>,
<,
>=,
<=,
!=,
<>,
BETWEEN, or
LIKE
'pattern' (where
'pattern'
does not start with a wildcard). An interval can be used
as long as it is possible to determine a single key
tuple containing all rows that match the condition (or
two intervals if
<>
or !=
is used).

The optimizer attempts to use additional key parts to
determine the interval as long as the comparison
operator is
=,
<=>,
or IS NULL. If the operator
is
>,
<,
>=,
<=,
!=,
<>,
BETWEEN, or
LIKE, the
optimizer uses it but considers no more key parts. For
the following expression, the optimizer uses
= from
the first comparison. It also uses
>=
from the second comparison but considers no further key
parts and does not use the third comparison for interval
construction:

key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10

The single interval is:

('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)

It is possible that the created interval contains more
rows than the initial condition. For example, the
preceding interval includes the value ('foo',
11, 0), which does not satisfy the original
condition.

If conditions that cover sets of rows contained within
intervals are combined with
OR, they form a condition
that covers a set of rows contained within the union of
their intervals. If the conditions are combined with
AND, they form a condition
that covers a set of rows contained within the
intersection of their intervals. For example, for this
condition on a two-part index:

In this example, the interval on the first line uses one
key part for the left bound and two key parts for the
right bound. The interval on the second line uses only
one key part. The key_len column in
the EXPLAIN output
indicates the maximum length of the key prefix used.

In some cases, key_len may indicate
that a key part was used, but that might be not what you
would expect. Suppose that
key_part1 and
key_part2 can be
NULL. Then the
key_len column displays two key part
lengths for the following condition: