11.7 Data Type Storage Requirements

The storage requirements for data vary according to the storage
engine being used for the table in question. Different storage
engines use different methods for recording the raw data and
different data types. In addition, some engines may compress the
information in a given row, either on a per-column or whole-row
basis, making calculation of the storage requirements for a given
table or column structure more difficult.

However, all storage engines must communicate and exchange
information on a given row within a table using the same
structure, and this information is consistent, irrespective of the
storage engine used to write the information to disk.

This section includes some guidelines and information about the
storage requirements for each data type supported by MySQL,
including details of the internal format and the sizes used by
storage engines that use a fixed-size representation for
different types. Information is listed by category or storage
engine.

The internal representation of a table has a maximum row size of
65,535 bytes, even if the storage engine is capable of supporting
larger rows. This figure excludes
BLOB or
TEXT columns, which contribute only
9 to 12 bytes toward this size. For
BLOB and
TEXT data, the information is
stored internally in a different area of memory than the row
buffer. Different storage engines handle the allocation and
storage of this data in different ways, according to the method
they use for handling the corresponding types. For more
information, see Chapter 14, Storage Engines, and
Section D.7.4, “Limits on Table Column Count and Row Size”.

Storage Requirements for InnoDB Tables

Storage Requirements for NDBCLUSTER Tables

Important

For tables using the NDBCLUSTER
storage engine, there is the factor of 4-byte
alignment to be taken into account when calculating
storage requirements. This means that all
NDB data storage is done in
multiples of 4 bytes. Thus, a column value that would take 15
bytes in a table using a storage engine other than
NDB requires 16 bytes in an
NDB table. This requirement applies
in addition to any other considerations that are discussed in
this section. For example, in
NDBCLUSTER tables, the
TINYINT,
SMALLINT,
MEDIUMINT, and
INTEGER
(INT) column types each require 4
bytes of storage per record due to the alignment factor.

An exception to this rule is the
BIT type, which is
not 4-byte aligned. In MySQL Cluster
tables, a BIT(M)
column takes M bits of storage space.
However, if a table definition contains 1 or more
BIT columns (up to 32
BIT columns), then
NDBCLUSTER reserves 4 bytes (32
bits) per row for these. If a table definition contains more
than 32 BIT columns (up to 64
such columns), then NDBCLUSTER
reserves 8 bytes (that is, 64 bits) per row.
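As a rough sketch of the arithmetic described above (illustrative helpers only, not an official NDB sizing tool):

```python
def ndb_aligned_size(column_bytes):
    """Round a column's storage up to the next multiple of 4 bytes,
    since NDB stores all data in 4-byte-aligned units."""
    return (column_bytes + 3) // 4 * 4

def ndb_bit_overhead(num_bit_columns):
    """Per-row bytes NDBCLUSTER reserves for BIT columns:
    4 bytes for 1 to 32 BIT columns, 8 bytes for 33 to 64."""
    if num_bit_columns == 0:
        return 0
    return 4 if num_bit_columns <= 32 else 8

# A 15-byte value occupies 16 bytes in an NDB table.
assert ndb_aligned_size(15) == 16
# TINYINT (1 byte) still consumes 4 bytes per record.
assert ndb_aligned_size(1) == 4
```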

In addition, while a NULL itself does not
require any storage space,
NDBCLUSTER reserves 4 bytes per row
if the table definition contains any columns that permit
NULL, up to 32 such
columns. (If a MySQL Cluster table is defined with more than 32
and up to 64 columns that permit NULL,
then 8 bytes per row are reserved.)

When calculating storage requirements for MySQL Cluster tables,
you must also remember that every table using the
NDBCLUSTER storage engine requires a
primary key; if no primary key is defined by the user, then a
“hidden” primary key will be created by
NDB. This hidden primary key consumes
31-35 bytes per table record.

The storage requirements for
DECIMAL (and
NUMERIC) are version-specific:

As of MySQL 5.0.3, values for
DECIMAL columns are represented
using a binary format that packs nine decimal (base 10) digits
into four bytes. Storage for the integer and fractional parts of
each value is determined separately. Each multiple of nine digits
requires four bytes, and the “leftover” digits
require some fraction of four bytes. The storage required for
leftover digits is given by the following table.

Leftover Digits    Number of Bytes
0                  0
1–2                1
3–4                2
5–6                3
7–9                4
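The packing rule can be sketched in code. The leftover-digit byte counts (1–2 digits → 1 byte, 3–4 → 2, 5–6 → 3, 7–9 → 4) are taken from the manual's leftover-digits table; treat them as assumptions if your MySQL version differs:

```python
def decimal_storage_bytes(precision, scale):
    """Bytes needed for a DECIMAL(precision, scale) value under the
    MySQL >= 5.0.3 packed format: each full group of 9 digits takes
    4 bytes, leftover digits take 0-4 bytes, and the integer and
    fractional parts are packed separately."""
    LEFTOVER = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}

    def part_bytes(digits):
        full_groups, rest = divmod(digits, 9)
        return full_groups * 4 + LEFTOVER[rest]

    return part_bytes(precision - scale) + part_bytes(scale)

# DECIMAL(18,9): nine integer digits (4 bytes) + nine fractional
# digits (4 bytes) = 8 bytes.
assert decimal_storage_bytes(18, 9) == 8
```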

Storage Requirements for String Types

In the following table, M represents
the declared column length in characters for nonbinary string
types and bytes for binary string types.
L represents the actual length in bytes
of a given string value.

Data Type

Storage Required

CHAR(M)

M × w bytes,
0 <= M
<= 255, where w is
the number of bytes required for the maximum-length
character in the character set. See
Section 14.2.9.5, “Physical Row Structure” for information
about CHAR data type storage
requirements for InnoDB tables.

ENUM('value1','value2',...)

1 or 2 bytes, depending on the number of enumeration values (65,535
values maximum)

SET('value1','value2',...)

1, 2, 3, 4, or 8 bytes, depending on the number of set members (64
members maximum)

Variable-length string types are stored using a length prefix plus
data. The length prefix requires from one to four bytes depending
on the data type, and the value of the prefix is
L (the byte length of the string). For
example, storage for a MEDIUMTEXT
value requires L bytes to store the
value plus three bytes to store the length of the value.
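As a sketch, the prefix sizes for the variable-length string types can be tabulated; the mapping for types other than MEDIUMTEXT comes from general MySQL documentation rather than this excerpt, so verify it against your version's manual:

```python
# Length-prefix bytes for the variable-length string/BLOB types.
# Only the MEDIUMTEXT value (3 bytes) is stated in the text above;
# the rest are the commonly documented MySQL prefix sizes.
PREFIX_BYTES = {
    "TINYBLOB": 1, "TINYTEXT": 1,
    "BLOB": 2, "TEXT": 2,
    "MEDIUMBLOB": 3, "MEDIUMTEXT": 3,
    "LONGBLOB": 4, "LONGTEXT": 4,
}

def text_storage_bytes(type_name, value_byte_len):
    """L bytes of data plus the type's length prefix."""
    return value_byte_len + PREFIX_BYTES[type_name.upper()]

# A 10,000-byte MEDIUMTEXT value: 10,000 + 3 = 10,003 bytes.
assert text_storage_bytes("MEDIUMTEXT", 10_000) == 10_003
```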

To calculate the number of bytes used to store a particular
CHAR,
VARCHAR, or
TEXT column value, you must take
into account the character set used for that column and whether
the value contains multibyte characters. In particular, when using
the utf8 Unicode character set, you must keep
in mind that not all characters use the same number of bytes and
that a character can require up to three bytes. For a breakdown of
the storage used for different categories of
utf8 characters, see
Section 10.1.10, “Unicode Support”.

VARCHAR,
VARBINARY, and the
BLOB and
TEXT types are variable-length
types. For each, the storage requirements depend on these factors:

The actual length of the column value

The column's maximum possible length

The character set used for the column, because some character
sets contain multibyte characters

For example, a VARCHAR(255) column can hold a
string with a maximum length of 255 characters. Assuming that the
column uses the latin1 character set (one byte
per character), the actual storage required is the length of the
string (L), plus one byte to record the
length of the string. For the string 'abcd',
L is 4 and the storage requirement is
five bytes. If the same column is instead declared to use the
ucs2 double-byte character set, the storage
requirement is 10 bytes: The length of 'abcd'
is eight bytes and the column requires two bytes to store lengths
because the maximum length is greater than 255 (up to 510 bytes).
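The worked example above can be expressed as a small helper (a sketch of the rule as stated, ignoring per-engine details such as InnoDB row formats):

```python
def varchar_storage_bytes(max_chars, value_chars, bytes_per_char=1):
    """Storage for a VARCHAR value: the encoded byte length L plus a
    1-byte length prefix when the column's maximum byte length is at
    most 255, or a 2-byte prefix otherwise."""
    max_bytes = max_chars * bytes_per_char
    prefix = 1 if max_bytes <= 255 else 2
    return value_chars * bytes_per_char + prefix

# 'abcd' in a latin1 VARCHAR(255): L = 4, plus 1 prefix byte = 5.
assert varchar_storage_bytes(255, 4, bytes_per_char=1) == 5
# The same column declared with ucs2 (2 bytes/char): 8 + 2 = 10.
assert varchar_storage_bytes(255, 4, bytes_per_char=2) == 10
```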

The effective maximum number of bytes that
can be stored in a VARCHAR or
VARBINARY column is subject to the
maximum row size of 65,535 bytes, which is shared among all
columns. For a VARCHAR column that
stores multibyte characters, the effective maximum number of
characters is less. For example,
utf8 characters can require up to three bytes
per character, so a VARCHAR column
that uses the utf8 character set can be
declared to be a maximum of 21,844 characters. See
Section D.7.4, “Limits on Table Column Count and Row Size”.

As of MySQL 5.0.3, the NDBCLUSTER
engine supports only fixed-width columns. This means that a
VARCHAR column from a table in a
MySQL Cluster will behave as follows:

If the size of the column is less than 256 characters, the
column requires one extra byte of storage per row.

If the size of the column is 256 characters or more, the
column requires two extra bytes of storage per row.

The number of bytes required per character varies according to the
character set used. For example, if a
VARCHAR(100) column in a Cluster table uses the
utf8 character set, each character requires 3 bytes of
storage. This means that each record in such a column takes up
100 × 3 + 1 = 301 bytes of storage,
regardless of the length of the string actually stored in any
given record. For a VARCHAR(1000) column in a
table using the NDBCLUSTER storage
engine with the utf8 character set, each record
will use 1000 × 3 + 2 = 3002 bytes of storage; that
is, the column is 1,000 characters wide, each character requires 3
bytes of storage, and each record has a 2-byte overhead because 1,000
>= 256.
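Both worked examples follow from one formula, sketched below (it reproduces the text's arithmetic and, like the text's examples, does not apply the 4-byte alignment factor):

```python
def ndb_varchar_storage_bytes(declared_chars, bytes_per_char):
    """Per-record storage for a VARCHAR column in a fixed-width
    NDBCLUSTER table: the full declared width is always consumed,
    plus a 1-byte prefix below 256 characters or a 2-byte prefix
    at 256 characters and above."""
    prefix = 1 if declared_chars < 256 else 2
    return declared_chars * bytes_per_char + prefix

# VARCHAR(100) with utf8 (3 bytes/char): 100 * 3 + 1 = 301 bytes.
assert ndb_varchar_storage_bytes(100, 3) == 301
# VARCHAR(1000) with utf8: 1000 * 3 + 2 = 3002 bytes.
assert ndb_varchar_storage_bytes(1000, 3) == 3002
```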

TEXT and
BLOB columns are implemented
differently in the NDB Cluster storage engine, wherein each row in
a TEXT column is made up of two
separate parts. One of these is of fixed size (256 bytes), and is
actually stored in the original table. The other consists of any
data in excess of 256 bytes, which is stored in a hidden table.
The rows in this second table are always 2,000 bytes long. This
means that the size of a TEXT
column is 256 if size <= 256 (where
size represents the size of the row);
otherwise, the size is 256 + size +
(2000 – (size – 256) %
2000).
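A direct transcription of the formula above (it reproduces the expression exactly as stated in this section; actual NDB allocation behavior should be verified against your version):

```python
def ndb_text_storage_bytes(size):
    """NDB TEXT storage per the formula in this section: values up
    to 256 bytes fit in the fixed in-table part; longer values also
    occupy rows of a hidden table that are always 2,000 bytes long."""
    if size <= 256:
        return 256
    return 256 + size + (2000 - (size - 256) % 2000)

# A 100-byte value still consumes the 256-byte in-table part.
assert ndb_text_storage_bytes(100) == 256
```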

The size of an ENUM object is
determined by the number of different enumeration values. One byte
is used for enumerations with up to 255 possible values. Two bytes
are used for enumerations having between 256 and 65,535 possible
values. See Section 11.4.4, “The ENUM Type”.

The size of a SET object is
determined by the number of different set members. If the set size
is N, the object occupies
(N+7)/8 bytes,
rounded up to 1, 2, 3, 4, or 8 bytes. A
SET can have a maximum of 64
members. See Section 11.4.5, “The SET Type”.
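The ENUM and SET sizing rules above can be sketched as (illustrative helpers based solely on the two paragraphs above):

```python
def enum_storage_bytes(num_values):
    """1 byte for up to 255 enumeration values,
    2 bytes for 256 to 65,535 values."""
    return 1 if num_values <= 255 else 2

def set_storage_bytes(num_members):
    """(N+7)/8 bytes, rounded up to 1, 2, 3, 4, or 8 bytes;
    a SET can have at most 64 members."""
    raw = (num_members + 7) // 8
    for allowed in (1, 2, 3, 4, 8):
        if raw <= allowed:
            return allowed
    raise ValueError("SET supports at most 64 members")

assert enum_storage_bytes(300) == 2   # needs the 2-byte form
assert set_storage_bytes(9) == 2      # 2 bytes cover 9 members
assert set_storage_bytes(33) == 8     # 5 raw bytes round up to 8
```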

User Comments

Had a lot of trouble finding the maximum table size in bytes for capacity planning. More specifically it was InnoDB tables that I had a problem with. Average row size is good, but I wanted maximum row size.

I checked several products and could not find what I wanted. Some of the tables I deal with are 300+ fields and so manual calculation was not practical.

So I wrote a little perl script that does it. Thought it might be of some use, so I include it here...it does all field types except enum/set types. It does not calculate anything regarding index size.

Just do a mysqldump -d (just the schema) of your DB to a file, and run this perl script specifying the schema file as the only argument.

----------------------------------------------------------------
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;

The above scripts do not take into account several important pieces of information (so they are outdated):

1. The database/table encoding. If you have UTF8 encoding for a varchar(100), it will take up 300 bytes (3 bytes per UTF symbol). "[...]As of MySQL 4.1, to calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all utf8 characters use the same number of bytes and can require up to three bytes per character."

2. An enum can take either 1 or 2 bytes. "[...]The size of an ENUM object is determined by the number of different enumeration values. One byte is used for enumerations with up to 255 possible values. Two bytes are used for enumerations having between 256 and 65,535 possible values."

Here I wrote another script based on Marc's, which takes into account what Alex wrote and more. It calculates VARCHAR/CHAR/TEXT sizes taking CHARSET or COLLATION into account, calculates SET and ENUM sizes properly, and computes DECIMAL/NUMERIC according to the >5.0.3 packed standard. It also calculates the least row byte size for dynamic-row-length tables. It uses the "mysql" and "mysqldump" tools internally. Any argument to this script is passed through as an argument to mysqldump. Example: {scriptname} --all-databases. Please report any bug, especially when it comes to size calculations. Enjoy.

It appears that TEXT fields with no length specified default to a length of 10 Bytes in your script output. However, information_schema.columns.character_maximum_length lists all my text fields as 65535?

Here is an SQL script that can be used to determine maximum space per row for InnoDB tables using the COMPACT row format.

I have tested the results against my database structures loaded with maximum length records @ 100,000 , 500,000 , and 1,000,000 records. The results seem to be fairly accurate.

I based the maximum space calculations for fields using the following MySQL reference above. I based the calculations for InnoDB Compact row format primary and secondary index record headers using the following MySQL reference:

The SQL produces all sizes in Bytes. If the SQL encounters an unknown data type, it assigns a byte value of 999999999999999 Bytes for that field. You must update TABLE_SCHEMA = 'Your Schema Name' in two places. The query adds no overhead factor to its results. Any overhead factor must be added to the results produced by this query.

The formulas above apply to MyISAM. For InnoDB data, the quick answer is to calculate for MyISAM, then double or triple that value.

The more complex way is something like:
Step 1: Compute the basic length of each field (without the length field for VAR fields); add 1 or 2 to that length (1 if all the fields are 'short').
Step 2: Add those together, plus 29 bytes for record overhead.
Step 3: Add 40% for the blocks not being full.
Step 4: Multiply by the number of rows.

That contorted computation can easily be off by a significant amount, either way.
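The steps above can be sketched as a loose estimator; the 29-byte overhead and 40% fill factor come from the comment, not any official formula, and the per-field "add 1 or 2" is applied uniformly here:

```python
def estimate_myisam_table_bytes(field_bytes, num_rows, length_bytes=1):
    """Rough MyISAM size estimate following the comment's steps:
    sum of per-field basic lengths plus 1 (or 2) length bytes each,
    plus 29 bytes of record overhead, inflated 40% for partially
    filled blocks, times the row count. `field_bytes` maps column
    names to their basic lengths; all constants are the comment's
    guesses, not documented values."""
    row = sum(length + length_bytes for length in field_bytes.values())
    row += 29          # record overhead (Step 2)
    row *= 1.4         # blocks not being full (Step 3)
    return row * num_rows

# One 10-byte field, one row: (10 + 1 + 29) * 1.4 = 56 bytes.
estimate = estimate_myisam_table_bytes({"name": 10}, 1)
assert abs(estimate - 56.0) < 1e-6
```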