15.6.11.1 Configuring Persistent Optimizer Statistics Parameters

The persistent optimizer statistics feature improves
plan stability by
storing statistics to disk and making them persistent across
server restarts so that the
optimizer is more likely
to make consistent choices each time for a given query.

Formerly, optimizer statistics were cleared on each server
restart and after some other operations, and recomputed on the
next table access. Consequently, different estimates could be
produced when recalculating statistics, leading to different
choices in query execution plans and thus variations in query
performance.

The innodb_stats_auto_recalc
configuration option, which is enabled by default, determines
whether statistics are calculated automatically whenever a
table undergoes substantial changes (to more than 10% of the
rows). You can also configure automatic statistics
recalculation for individual tables using a
STATS_AUTO_RECALC clause in a
CREATE TABLE or
ALTER TABLE statement.
innodb_stats_auto_recalc is
enabled by default.

Because of the asynchronous nature of automatic statistics
recalculation (which occurs in the background), statistics may
not be recalculated instantly after running a DML operation
that affects more than 10% of a table, even when
innodb_stats_auto_recalc is
enabled. In some cases, statistics recalculation may be
delayed by a few seconds. If up-to-date statistics are
required immediately after changing significant portions of a
table, run ANALYZE
TABLE to initiate a synchronous (foreground)
recalculation of statistics.

If innodb_stats_auto_recalc
is disabled, ensure the accuracy of optimizer statistics by
issuing the ANALYZE TABLE
statement for each applicable table after making substantial
changes to indexed columns. You might run this statement in
your setup scripts after representative data has been loaded
into the table, and run it periodically after DML operations
significantly change the contents of indexed columns, or on a
schedule at times of low activity. When a new index is added
to an existing table, index statistics are calculated and
added to the innodb_index_stats table
regardless of the value of
innodb_stats_auto_recalc.

Caution

To ensure statistics are gathered when a new index is
created, either enable the
innodb_stats_auto_recalc
option, or run ANALYZE TABLE
after creating each new index when the persistent statistics
mode is enabled.

STATS_PERSISTENT specifies whether to
enable
persistent
statistics for an InnoDB table.
The value DEFAULT causes the persistent
statistics setting for the table to be determined by the
innodb_stats_persistent
configuration option. The value 1
enables persistent statistics for the table, while the
value 0 turns off this feature. After
enabling persistent statistics through a CREATE
TABLE or ALTER TABLE
statement, issue an ANALYZE
TABLE statement to calculate the statistics,
after loading representative data into the table.

STATS_AUTO_RECALC specifies whether to
automatically recalculate
persistent
statistics for an InnoDB table.
The value DEFAULT causes the persistent
statistics setting for the table to be determined by the
innodb_stats_auto_recalc
configuration option. The value 1
causes statistics to be recalculated when 10% of the data
in the table has changed. The value 0
prevents automatic recalculation for this table; with this
setting, issue an ANALYZE
TABLE statement to recalculate the statistics
after making substantial changes to the table.

STATS_SAMPLE_PAGES specifies the number
of index pages to sample when estimating cardinality and
other statistics for an indexed column, such as those
calculated by ANALYZE
TABLE.

All three clauses are specified in the following
CREATE TABLE example:

15.6.11.1.3 Configuring the Number of Sampled Pages for InnoDB Optimizer Statistics

The MySQL query optimizer uses estimated
statistics about key
distributions to choose the indexes for an execution plan,
based on the relative
selectivity of the
index. Operations such as ANALYZE
TABLE cause InnoDB to sample
random pages from each index on a table to estimate the
cardinality of the
index. (This technique is known as
random dives.)

To give you control over the quality of the statistics
estimate (and thus better information for the query
optimizer), you can change the number of sampled pages using
the parameter
innodb_stats_persistent_sample_pages,
which can be set at runtime.

Statistics are not accurate enough and the
optimizer chooses suboptimal plans, as shown by
EXPLAIN output. The
accuracy of statistics can be checked by comparing the
actual cardinality of an index (as returned by running
SELECT
DISTINCT on the index columns) with the
estimates provided in the
mysql.innodb_index_stats persistent
statistics table.

If a balance cannot be achieved between accurate
statistics and ANALYZE
TABLE execution time, consider decreasing the
number of indexed columns in the table or limiting the
number of partitions to reduce
ANALYZE TABLE complexity.
The number of columns in the table's primary key is also
important to consider, as primary key columns are appended
to each non-unique index.

By default, InnoDB reads uncommitted data
when calculating statistics. In the case of an uncommitted
transaction that deletes rows from a table,
InnoDB excludes records that are
delete-marked when calculating row estimates and index
statistics, which can lead to non-optimal execution plans for
other transactions that are operating on the table
concurrently using a transaction isolation level other than
READ UNCOMMITTED. To avoid
this scenario,
innodb_stats_include_delete_marked
can be enabled to ensure that InnoDB
includes delete-marked records when calculating persistent
optimizer statistics.

15.6.11.1.5 InnoDB Persistent Statistics Tables

The persistent statistics feature relies on the internally
managed tables in the mysql database, named
innodb_table_stats and
innodb_index_stats. These tables are set up
automatically in all install, upgrade, and build-from-source
procedures.

Table 15.3 Columns of innodb_table_stats

Column name

Description

database_name

Database name

table_name

Table name, partition name, or subpartition name

last_update

A timestamp indicating the last time that InnoDB
updated this row

n_rows

The number of rows in the table

clustered_index_size

The size of the primary index, in pages

sum_of_other_index_sizes

The total size of other (non-primary) indexes, in pages

Table 15.4 Columns of innodb_index_stats

Column name

Description

database_name

Database name

table_name

Table name, partition name, or subpartition name

index_name

Index name

last_update

A timestamp indicating the last time that InnoDB
updated this row

stat_name

The name of the statistic, whose value is reported in the
stat_value column

stat_value

The value of the statistic that is named in stat_name
column

sample_size

The number of pages sampled for the estimate provided in the
stat_value column

stat_description

Description of the statistic that is named in the
stat_name column

Both the innodb_table_stats and
innodb_index_stats tables include a
last_update column showing when
InnoDB last updated index statistics, as
shown in the following example:

The innodb_table_stats and
innodb_index_stats tables are ordinary
tables and can be updated manually. The ability to update
statistics manually makes it possible to force a specific
query optimization plan or test alternative plans without
modifying the database. If you manually update statistics,
issue the FLUSH TABLE
tbl_name command to make
MySQL reload the updated statistics.

15.6.11.1.6 InnoDB Persistent Statistics Tables Example

The innodb_table_stats table contains one
row per table. The data collected is demonstrated in the
following example.

To immediately update statistics, run
ANALYZE TABLE (if
innodb_stats_auto_recalc is
enabled, statistics are updated automatically within a few
seconds assuming that the 10% threshold for changed table rows
is reached):

Table statistics for table t1 show the last
time InnoDB updated the table statistics
(2014-03-14 14:36:34), the number of rows
in the table (5), the clustered index size
(1 page), and the combined size of the
other indexes (2 pages).

The innodb_index_stats table contains
multiple rows for each index. Each row in the
innodb_index_stats table provides data
related to a particular index statistic which is named in the
stat_name column and described in the
stat_description column. For example:

size: Where
stat_name=size, the
stat_value column displays the total
number of pages in the index.

n_leaf_pages: Where
stat_name=n_leaf_pages,
the stat_value column displays the
number of leaf pages in the index.

n_diff_pfxNN:
Where
stat_name=n_diff_pfx01,
the stat_value column displays the
number of distinct values in the first column of the
index. Where
stat_name=n_diff_pfx02,
the stat_value column displays the
number of distinct values in the first two columns of the
index, and so on. Additionally, where
stat_name=n_diff_pfxNN,
the stat_description column shows a
comma separated list of the index columns that are
counted.

To further illustrate the
n_diff_pfxNN
statistic, which provides cardinality data, consider the
t1 table example. As shown below, the
t1 table is created with a primary index
(columns a, b), a
secondary index (columns c,
d), and a unique index (columns
e, f):

For the PRIMARY index, there are two
n_diff% rows. The number of rows is equal
to the number of columns in the index.

Note

For non-unique indexes, InnoDB appends
the columns of the primary key.

Where
index_name=PRIMARY
and
stat_name=n_diff_pfx01,
the stat_value is 1,
which indicates that there is a single distinct value in
the first column of the index (column
a). The number of distinct values in
column a is confirmed by viewing the
data in column a in table
t1, in which there is a single distinct
value (1). The counted column
(a) is shown in the
stat_description column of the result
set.

Where
index_name=PRIMARY
and
stat_name=n_diff_pfx02,
the stat_value is 5,
which indicates that there are five distinct values in the
two columns of the index (a,b). The
number of distinct values in columns a
and b is confirmed by viewing the data
in columns a and b
in table t1, in which there are five
distinct values: (1,1),
(1,2), (1,3),
(1,4) and (1,5). The
counted columns (a,b) are shown in the
stat_description column of the result
set.

For the secondary index (i1), there are
four n_diff% rows. Only two columns are
defined for the secondary index (c,d) but
there are four n_diff% rows for the
secondary index because InnoDB suffixes all
non-unique indexes with the primary key. As a result, there
are four n_diff% rows instead of two to
account for the both the secondary index columns
(c,d) and the primary key columns
(a,b).

Where index_name=i1
and
stat_name=n_diff_pfx01,
the stat_value is 1,
which indicates that there is a single distinct value in
the first column of the index (column
c). The number of distinct values in
column c is confirmed by viewing the
data in column c in table
t1, in which there is a single distinct
value: (10). The counted column
(c) is shown in the
stat_description column of the result
set.

Where index_name=i1
and
stat_name=n_diff_pfx02,
the stat_value is 2,
which indicates that there are two distinct values in the
first two columns of the index (c,d).
The number of distinct values in columns
c an d is confirmed
by viewing the data in columns c and
d in table t1, in
which there are two distinct values:
(10,11) and (10,12).
The counted columns (c,d) are shown in
the stat_description column of the
result set.

Where index_name=i1
and
stat_name=n_diff_pfx03,
the stat_value is 2,
which indicates that there are two distinct values in the
first three columns of the index
(c,d,a). The number of distinct values
in columns c, d, and
a is confirmed by viewing the data in
column c, d, and
a in table t1, in
which there are two distinct values:
(10,11,1) and
(10,12,1). The counted columns
(c,d,a) are shown in the
stat_description column of the result
set.

Where index_name=i1
and
stat_name=n_diff_pfx04,
the stat_value is 5,
which indicates that there are five distinct values in the
four columns of the index (c,d,a,b).
The number of distinct values in columns
c, d,
a and b is confirmed
by viewing the data in columns c,
d, a, and
b in table t1, in
which there are five distinct values:
(10,11,1,1),
(10,11,1,2),
(10,11,1,3),
(10,12,1,4) and
(10,12,1,5). The counted columns
(c,d,a,b) are shown in the
stat_description column of the result
set.

For the unique index (i2uniq), there are
two n_diff% rows.

Where
index_name=i2uniq
and
stat_name=n_diff_pfx01,
the stat_value is 2,
which indicates that there are two distinct values in the
first column of the index (column e).
The number of distinct values in column
e is confirmed by viewing the data in
column e in table
t1, in which there are two distinct
values: (100) and
(200). The counted column
(e) is shown in the
stat_description column of the result
set.

Where
index_name=i2uniq
and
stat_name=n_diff_pfx02,
the stat_value is 5,
which indicates that there are five distinct values in the
two columns of the index (e,f). The
number of distinct values in columns e
and f is confirmed by viewing the data
in columns e and f
in table t1, in which there are five
distinct values: (100,101),
(200,102),
(100,103), (200,104)
and (100,105). The counted columns
(e,f) are shown in the
stat_description column of the result
set.