12.4.2 The BINARY and VARBINARY Types

The BINARY and VARBINARY
types are similar to CHAR and
VARCHAR, except that they contain
binary strings rather than nonbinary strings. That is, they
contain byte strings rather than character strings. This means
they have the binary character set and
collation, and comparison and sorting are based on the numeric
values of the bytes in the values.

The permissible maximum length is the same for
BINARY and VARBINARY as it
is for CHAR and
VARCHAR, except that the length
for BINARY and VARBINARY
is a length in bytes rather than in characters.

The BINARY and VARBINARY
data types are distinct from the CHAR BINARY
and VARCHAR BINARY data types. For the latter
types, the BINARY attribute does not cause
the column to be treated as a binary string column. Instead, it
causes the binary (_bin) collation for the
column character set to be used, and the column itself contains
nonbinary character strings rather than binary byte strings. For
example, CHAR(5) BINARY is treated as
CHAR(5) CHARACTER SET latin1 COLLATE
latin1_bin, assuming that the default character set is
latin1. This differs from
BINARY(5), which stores 5-bytes binary
strings that have the binary character set
and collation. For information about differences between binary
strings and binary collations for nonbinary strings, see
Section 11.1.8.5, “The binary Collation Compared to _bin Collations”.

If strict SQL mode is not enabled and you assign a value to a
BINARY or VARBINARY column
that exceeds the column's maximum length, the value is truncated
to fit and a warning is generated. For cases of truncation, you
can cause an error to occur (rather than a warning) and suppress
insertion of the value by using strict SQL mode. See
Section 6.1.8, “Server SQL Modes”.

When BINARY values are stored, they are
right-padded with the pad value to the specified length. The pad
value is 0x00 (the zero byte). Values are
right-padded with 0x00 on insert, and no
trailing bytes are removed on select. All bytes are significant
in comparisons, including ORDER BY and
DISTINCT operations. 0x00
bytes and spaces are different in comparisons, with
0x00 < space.

For VARBINARY, there is no padding on insert
and no bytes are stripped on select. All bytes are significant
in comparisons, including ORDER BY and
DISTINCT operations. 0x00
bytes and spaces are different in comparisons, with
0x00 < space.

For those cases where trailing pad bytes are stripped or
comparisons ignore them, if a column has an index that requires
unique values, inserting into the column values that differ only
in number of trailing pad bytes will result in a duplicate-key
error. For example, if a table contains 'a',
an attempt to store 'a\0' causes a
duplicate-key error.

You should consider the preceding padding and stripping
characteristics carefully if you plan to use the
BINARY data type for storing binary data and
you require that the value retrieved be exactly the same as the
value stored. The following example illustrates how
0x00-padding of BINARY
values affects column value comparisons:

When BINARY or VARBINARY values are stored, e.g. from literal strings like 'abc' or 'Hello', there is of course a character set involved. It' s the standard character set of the operating system that is used to translate each character 'a','b','c' or 'H','e','l','o' to its byte value. (Or byte values for multi-byte character sets.)Thus, the operating system, with its standard character set, defines how characters are converted into binary values. Only there is no MySQL character set definition involved.

@Andreas, what it means that a value "has no character set" is that what you describe breaks down if the operating system has a different charset when inserting and when querying. If one user inserts text as binary and he's using for example ASCII, then another user reads the binary as text but he's using utf-16, he'll read a different text.

When you use char types the server takes care of things so that anyone will read the same text that was inserted, regardless of everyone's local encoding settings.

Posted by
Ben Griffin
on
March 25, 2013

I'm not really sure that the documentation explains whether the binary column is indexed as a number or as a binarystring. This becomes relevant regarding where the LSB is in UUID fields. Typically with sequences (eg auto-increment) it's the least significant value which changes most frequently, whereas with UUID() I notice that it is the most significant value which changes most frequently. Which generates the most efficient indices?

In the end, I guess I am wondering if it would be more efficient to index the reverse of the UUID, or leave it as it is?

So the options I'm interested in are as follows (assumption is that a primary field of BINARY(16) is populated with the function):