10.1.9.1 The utf8 Character Set (3-Byte UTF-8 Unicode Encoding)

UTF-8 (Unicode Transformation Format with 8-bit units) is an
alternative way to store Unicode data. It is implemented
according to RFC 3629, which describes encoding sequences that
take from one to four bytes. (An older standard for UTF-8
encoding, RFC 2279, describes UTF-8 sequences that take from
one to six bytes. RFC 3629 renders RFC 2279 obsolete; for this
reason, sequences with five and six bytes are no longer used.)
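The RFC 3629 length rules can be sketched in Python as follows. This is an
illustration, not MySQL code: the function name utf8_sequence_length is
hypothetical. The range limits (U+007F, U+07FF, U+FFFF, U+10FFFF) and the
exclusion of the surrogate range U+D800..U+DFFF come from RFC 3629. Python's
built-in UTF-8 encoder serves only as a cross-check.

```python
def utf8_sequence_length(code_point: int) -> int:
    """Return how many bytes RFC 3629 UTF-8 uses for one code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("RFC 3629 limits UTF-8 to U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encoded in UTF-8")
    if code_point <= 0x7F:
        return 1  # 0xxxxxxx
    if code_point <= 0x7FF:
        return 2  # 110xxxxx 10xxxxxx
    if code_point <= 0xFFFF:
        return 3  # 1110xxxx 10xxxxxx 10xxxxxx
    return 4      # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx


# Cross-check the boundaries against Python's own UTF-8 encoder.
for cp in (0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_sequence_length(cp) == len(chr(cp).encode("utf-8"))
```

No RFC 3629 sequence is longer than four bytes, because U+10FFFF is the
highest code point it permits.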

The idea of UTF-8 is that various Unicode characters are
encoded using byte sequences of different lengths:

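Basic Latin letters, digits, and punctuation signs use one
byte.

Most European and Middle East script letters fit into a
2-byte sequence: extended Latin letters (with tilde, macron,
acute, grave, and other accents), Cyrillic, Greek, Armenian,
Hebrew, Arabic, Syriac, and others.

Korean, Chinese, and Japanese ideographs use 3-byte or
4-byte sequences.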
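As a quick illustration, the following Python snippet encodes one sample
character from several scripts and records the length of each UTF-8 sequence.
The sample characters are arbitrary choices made for this example. The last
one, U+20000, is a CJK ideograph outside the Basic Multilingual Plane (BMP),
which is why it needs four bytes.

```python
# (character, description) pairs chosen for illustration only.
samples = [
    ("A", "Basic Latin letter"),
    ("7", "digit"),
    ("é", "Latin letter with acute accent"),
    ("Ω", "Greek"),
    ("Ж", "Cyrillic"),
    ("あ", "Japanese hiragana"),
    ("中", "CJK ideograph in the BMP"),
    ("한", "Korean Hangul syllable"),
    ("\U00020000", "CJK ideograph outside the BMP"),
]

byte_lengths = {}
for ch, description in samples:
    encoded = ch.encode("utf-8")
    byte_lengths[ch] = len(encoded)
    # Show the code point, its byte count, and the raw bytes in hex.
    print(f"U+{ord(ch):04X} {description}: {len(encoded)} byte(s), {encoded.hex(' ')}")
```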

The utf8 character set in MySQL has these
characteristics:

No support for supplementary characters (BMP characters
only, that is, code points U+0000 through U+FFFF).

A maximum of three bytes per multibyte character.

Exactly the same set of characters is available in
utf8 and ucs2. That is,
they have the same repertoire.
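Because utf8 covers only the BMP, a string can be stored in a utf8 column only
if none of its characters lies above U+FFFF. The following Python sketch checks
that condition. The names fits_mysql_utf8, max_utf8_bytes_per_char, and
fits_ucs2 are hypothetical helpers, not MySQL functions. The sketch assumes the
text contains no lone surrogate code points. The last helper illustrates the
shared repertoire with ucs2: for BMP-only text, UTF-16 uses exactly two bytes
per character, just as UCS-2 does.

```python
def fits_mysql_utf8(text: str) -> bool:
    # Hypothetical helper: the 3-byte utf8 character set accepts BMP
    # characters only, that is, code points U+0000..U+FFFF.
    return all(ord(ch) <= 0xFFFF for ch in text)


def max_utf8_bytes_per_char(text: str) -> int:
    # Longest UTF-8 sequence among the characters (0 for an empty string).
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_ucs2(text: str) -> bool:
    # BMP text takes exactly 2 bytes per character in UTF-16, as in UCS-2;
    # a supplementary character would need a 4-byte surrogate pair instead.
    return len(text.encode("utf-16-be")) == 2 * len(text)


for text in ("café", "日本語", "emoji \U0001F600"):
    print(repr(text), fits_mysql_utf8(text),
          max_utf8_bytes_per_char(text), fits_ucs2(text))
```

For these inputs, the two BMP-only strings pass both checks and need at most
three bytes per character. The string containing U+1F600 fails both checks.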

Tip

To save space with UTF-8, use
VARCHAR instead of
CHAR. Otherwise, MySQL must
reserve three bytes for each character in a CHAR
CHARACTER SET utf8 column because that is the
maximum possible character length. For example, MySQL must
reserve 30 bytes for a CHAR(10) CHARACTER SET
utf8 column.
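The storage arithmetic in this tip can be sketched in Python. This is a
simplified model that ignores storage-engine row-format details.
char_column_bytes and varchar_value_bytes are hypothetical names. The VARCHAR
length prefix follows the general MySQL rule: one byte when the column's
maximum byte length is 255 or less, and two bytes otherwise.

```python
def char_column_bytes(n: int) -> int:
    # CHAR(n) CHARACTER SET utf8: reserve the 3-byte maximum for every character.
    return 3 * n


def varchar_value_bytes(value: str, n: int) -> int:
    # VARCHAR(n) CHARACTER SET utf8: the value's actual UTF-8 bytes plus a
    # length prefix (1 byte if the column maximum of 3 * n bytes is <= 255, else 2).
    if len(value) > n:
        raise ValueError("value is longer than the declared column length")
    if any(ord(ch) > 0xFFFF for ch in value):
        raise ValueError("utf8 accepts BMP characters only")
    prefix = 1 if 3 * n <= 255 else 2
    return len(value.encode("utf-8")) + prefix


print(char_column_bytes(10))             # 30
print(varchar_value_bytes("hello", 10))  # 5 data bytes + 1 prefix byte = 6
print(varchar_value_bytes("日本", 10))    # 6 data bytes + 1 prefix byte = 7
```

Under this model, a short ASCII value in a VARCHAR(10) column uses only a few
bytes. The equivalent CHAR(10) column always reserves 30.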