8.2.1.16 GROUP BY Optimization

The most general way to satisfy a GROUP BY
clause is to scan the whole table and create a new temporary
table where all rows from each group are consecutive, and then
use this temporary table to discover groups and apply
aggregate functions (if any). In some cases, MySQL is able to
do much better than that and to avoid creation of temporary
tables by using index access.

The most important preconditions for using indexes for
GROUP BY are that all GROUP
BY columns reference attributes from the same index,
and that the index stores its keys in order (for example, this
is a BTREE index and not a
HASH index). Whether use of temporary
tables can be replaced by index access also depends on which
parts of an index are used in a query, the conditions
specified for these parts, and the selected aggregate
functions.

There are two ways to execute a GROUP BY
query through index access, as detailed in the following
sections. In the first method, the grouping operation is
applied together with all range predicates (if any). The
second method first performs a range scan, and then groups the
resulting tuples.

8.2.1.16.1 Loose Index Scan

The most efficient way to process GROUP
BY is when an index is used to directly retrieve
the grouping columns. With this access method, MySQL uses
the property of some index types that the keys are ordered
(for example, BTREE). This property
enables use of lookup groups in an index without having to
consider all keys in the index that satisfy all
WHERE conditions. This access method
considers only a fraction of the keys in an index, so it is
called a loose index scan. When there
is no WHERE clause, a loose index scan
reads as many keys as the number of groups, which may be a
much smaller number than that of all keys. If the
WHERE clause contains range predicates
(see the discussion of the
range join type in
Section 8.8.1, “Optimizing Queries with EXPLAIN”), a loose index scan looks
up the first key of each group that satisfies the range
conditions, and again reads the least possible number of
keys. This is possible under the following conditions:

The query is over a single table.

The GROUP BY names only columns that
form a leftmost prefix of the index and no other
columns. (If, instead of GROUP BY,
the query has a DISTINCT clause, all
distinct attributes refer to columns that form a
leftmost prefix of the index.) For example, if a table
t1 has an index on
(c1,c2,c3), loose index scan is
applicable if the query has GROUP BY c1,
c2,. It is not applicable if the query has
GROUP BY c2, c3 (the columns are not
a leftmost prefix) or GROUP BY c1, c2,
c4 (c4 is not in the
index).

The only aggregate functions used in the select list (if
any) are MIN() and
MAX(), and all of them
refer to the same column. The column must be in the
index and must follow the columns in the GROUP
BY.

Any other parts of the index than those from the
GROUP BY referenced in the query must
be constants (that is, they must be referenced in
equalities with constants), except for the argument of
MIN() or
MAX() functions.

For columns in the index, full column values must be
indexed, not just a prefix. For example, with
c1 VARCHAR(20), INDEX (c1(10)), the
index cannot be used for loose index scan.

If loose index scan is applicable to a query, the
EXPLAIN output shows
Using index for group-by in the
Extra column.

Assume that there is an index
idx(c1,c2,c3) on table
t1(c1,c2,c3,c4). The loose index scan
access method can be used for the following queries:

SELECT c1, c2 FROM t1 GROUP BY c1, c2;
SELECT DISTINCT c1, c2 FROM t1;
SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;
SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;

The following queries cannot be executed with this quick
select method, for the reasons given:

8.2.1.16.2 Tight Index Scan

A tight index scan may be either a full index scan or a
range index scan, depending on the query conditions.

When the conditions for a loose index scan are not met, it
still may be possible to avoid creation of temporary tables
for GROUP BY queries. If there are range
conditions in the WHERE clause, this
method reads only the keys that satisfy these conditions.
Otherwise, it performs an index scan. Because this method
reads all keys in each range defined by the
WHERE clause, or scans the whole index if
there are no range conditions, we term it a tight
index scan. With a tight index scan, the
grouping operation is performed only after all keys that
satisfy the range conditions have been found.

For this method to work, it is sufficient that there is a
constant equality condition for all columns in a query
referring to parts of the key coming before or in between
parts of the GROUP BY key. The constants
from the equality conditions fill in any “gaps”
in the search keys so that it is possible to form complete
prefixes of the index. These index prefixes then can be used
for index lookups. If we require sorting of the
GROUP BY result, and it is possible to
form search keys that are prefixes of the index, MySQL also
avoids extra sorting operations because searching with
prefixes in an ordered index already retrieves all the keys
in order.

Assume that there is an index
idx(c1,c2,c3) on table
t1(c1,c2,c3,c4). The following queries do
not work with the loose index scan access method described
earlier, but still work with the tight index scan access
method.

There is a gap in the GROUP BY, but
it is covered by the condition c2 =
'a':

SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;

The GROUP BY does not begin with the
first part of the key, but there is a condition that
provides a constant for that part: