8.2.3 Optimizing INFORMATION_SCHEMA Queries

Applications that monitor databases may make frequent use of
INFORMATION_SCHEMA tables. Certain types of
queries for INFORMATION_SCHEMA tables can be
optimized to execute more quickly. The goal is to minimize file
operations (for example, scanning a directory or opening a table
file) to collect the information that makes up these dynamic
tables.

1) Try to use constant lookup values for
database and table names in the WHERE
clause

You can take advantage of this principle as follows:

To look up databases or tables, use expressions that
evaluate to a constant, such as literal values, functions
that return a constant, or scalar subqueries.

Avoid queries that use a nonconstant database name lookup
value (or no lookup value) because they require a scan of
the data directory to find matching database directory
names.

Within a database, avoid queries that use a nonconstant
table name lookup value (or no lookup value) because they
require a scan of the database directory to find matching
table files.

This principle applies to the
INFORMATION_SCHEMA tables shown in the
following table, which shows the columns for which a constant
lookup value enables the server to avoid a directory scan. For
example, if you are selecting from
TABLES, using a constant lookup
value for TABLE_SCHEMA in the
WHERE clause enables a data directory scan to
be avoided.

Use of the literal database name test enables
the server to check only the test database
directory, regardless of how many databases there might be. By
contrast, the following query is less efficient because it
requires a scan of the data directory to determine which
database names match the pattern 'test%':

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA LIKE 'test%';

For a query that is limited to a specific constant table name,
checks need be made only for the named table within the
corresponding database directory. Example:

Use of the literal table name t1 enables the
server to check only the files for the t1
table, regardless of how many tables there might be in the
test database. By contrast, the following
query requires a scan of the test database
directory to determine which table names match the pattern
't%':

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME LIKE 't%';

The following query requires a scan of the database directory to
determine matching database names for the pattern
'test%', and for each matching database, it
requires a scan of the database directory to determine matching
table names for the pattern 't%':

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test%' AND TABLE_NAME LIKE 't%';

2) Write queries that minimize the number
of table files that must be opened

For queries that refer to certain
INFORMATION_SCHEMA table columns, several
optimizations are available that minimize the number of table
files that must be opened. Example:

In this case, after the server has scanned the database
directory to determine the names of the tables in the database,
those names become available with no further file system
lookups. Thus, TABLE_NAME requires no files
to be opened. The ENGINE (storage engine)
value can be determined by opening the table's
.frm file, without touching other table
files such as the .MYD or
.MYI file.

Some values, such as INDEX_LENGTH for
MyISAM tables, require opening the
.MYD or .MYI file as
well.

The file-opening optimization types are denoted thus:

SKIP_OPEN_TABLE: Table files do not need
to be opened. The information has already become available
within the query by scanning the database directory.

OPEN_FRM_ONLY: Only the table's
.frm file need be opened.

OPEN_TRIGGER_ONLY: Only the table's
.TRG file need be opened.

OPEN_FULL_TABLE: The unoptimized
information lookup. The .frm,
.MYD, and .MYI
files must be opened.

The following list indicates how the preceding optimization
types apply to INFORMATION_SCHEMA table
columns. For tables and columns not named, none of the
optimizations apply.

3) Use
EXPLAIN to determine whether the
server can use INFORMATION_SCHEMA
optimizations for a query

This applies particularly for
INFORMATION_SCHEMA queries that search for
information from more than one database, which might take a long
time and impact performance. The Extra value
in EXPLAIN output indicates
which, if any, of the optimizations described earlier the server
can use to evaluate INFORMATION_SCHEMA
queries. The following examples demonstrate the kinds of
information you can expect to see in the
Extra value.

No lookup values are provided (there is no
WHERE clause), so the server must scan the
data directory and each database directory. For each table thus
identified, the table name and row format are selected.
TABLE_NAME requires no further table files to
be opened (the SKIP_OPEN_TABLE optimization
applies). ROW_FORMAT requires all table files
to be opened (OPEN_FULL_TABLE applies).
EXPLAIN reports
OPEN_FULL_TABLE because it is more expensive
than SKIP_OPEN_TABLE.

No table name lookup value is provided, so the server must scan
the test database directory. For the
TABLE_NAME and TABLE_TYPE
columns, the SKIP_OPEN_TABLE and
OPEN_FRM_ONLY optimizations apply,
respectively. EXPLAIN reports
OPEN_FRM_ONLY because it is more expensive.

For the first EXPLAIN output row:
Constant database and table lookup values enable the server to
avoid directory scans for TABLES values.
References to TABLES.TABLE_NAME require no
further table files.

For the second EXPLAIN output
row: All COLUMNS table values are
OPEN_FRM_ONLY lookups, so
COLUMNS.TABLE_NAME requires the
.frm file to be opened.

The example for joining TABLES with COLUMNS shows good EXPLAIN plan for TABLES, but poor EXPLAIN plan for columns, since the table name and schema for COLUMNS cannot be deduced ahead.But this is just due to poor optimizer's work. Here's how to get the same results, but with far better execution plan; we push the constants down to COLUMNS: