1.2.2 UTF-8 for Metadata

Metadata is “the
data about the data.” Anything that
describes the database—as opposed to
being the contents of the database—is
metadata. Thus column names, database names, user names, version
names, and most of the string results from
SHOW are metadata. This is also
true of the contents of tables in
INFORMATION_SCHEMA because those tables by
definition contain information about database objects.

Representation of metadata must satisfy these requirements:

All metadata must be in the same character set. Otherwise,
neither the SHOW statements
nor SELECT statements for
tables in INFORMATION_SCHEMA would work
properly because different rows in the same column of the
results of these operations would be in different character
sets.

Metadata must include all characters in all languages.
Otherwise, users would not be able to name columns and
tables using their own languages.

To satisfy both requirements, MySQL stores metadata in a Unicode
character set, namely UTF-8. This does not cause any disruption
if you never use accented or non-Latin characters. But if you
do, you should be aware that metadata is in UTF-8.

Storage of metadata using Unicode does not
mean that the server returns headers of columns and the results
of DESCRIBE functions in the
character_set_system character
set by default. When you use SELECT column1 FROM
t, the name column1 itself is
returned from the server to the client in the character set
determined by the value of the
character_set_results system
variable, which has a default value of utf8.
If you want the server to pass metadata results back in a
different character set, use the SET
NAMES statement to force the server to perform
character set conversion. SET
NAMES sets the
character_set_results and other
related system variables. (See
Section 1.4, “Connection Character Sets and Collations”.) Alternatively, a client
program can perform the conversion after receiving the result
from the server. It is more efficient for the client to perform
the conversion, but this option is not always available for all
clients.

Error messages returned from the server to the client are
converted to the client character set automatically, as with
metadata.

If you are using (for example) the
USER() function for comparison or
assignment within a single statement, don't worry. MySQL
performs some automatic conversion for you.

SELECT * FROM t1 WHERE USER() = latin1_column;

This works because the contents of
latin1_column are automatically converted to
UTF-8 before the comparison.

INSERT INTO t1 (latin1_column) SELECT USER();

This works because the contents of
USER() are automatically
converted to latin1 before the assignment.

Although automatic conversion is not in the SQL standard, the
standard does say that every character set is (in terms of
supported characters) a “subset” of Unicode.
Because it is a well-known principle that “what applies to
a superset can apply to a subset,” we believe that a
collation for Unicode can apply for comparisons with non-Unicode
strings. For more information about coercion of strings, see
Section 1.8.4, “Collation Coercibility in Expressions”.