This section provides an overview of TOAST (The Oversized-Attribute Storage
Technique).

PostgreSQL uses a fixed page
size (commonly 8 kB), and does not allow tuples to span multiple
pages. Therefore, it is not possible to store very large field
values directly. To overcome this limitation, large field values
are compressed and/or broken up into multiple physical rows. This
happens transparently to the user, with only small impact on most
of the backend code. The technique is affectionately known as
TOAST (or "the best thing since sliced bread").

Only certain data types support TOAST — there is no need to impose the
overhead on data types that cannot produce large field values. To
support TOAST, a data type
must have a variable-length (varlena)
representation, in which the first 32-bit word of any stored
value contains the total length of the value in bytes (including
itself). TOAST does not
constrain the rest of the representation. All the C-level
functions supporting a TOAST-able data type must be careful to
handle TOASTed input values.
(This is normally done by invoking PG_DETOAST_DATUM before doing anything with an
input value, but in some cases more efficient approaches are
possible.)

TOAST usurps two bits of
the varlena length word (the high-order bits on big-endian
machines, the low-order bits on little-endian machines), thereby
limiting the logical size of any value of a TOAST-able data type to 1 GB (2^30
- 1 bytes). When both bits are zero, the value is an ordinary
un-TOASTed value of the data
type, and the remaining bits of the length word give the total
datum size (including length word) in bytes. When the
highest-order bit (on big-endian machines) or lowest-order bit (on
little-endian machines) is set, the value has only a
single-byte header instead of the normal four-byte header, and
the remaining bits give the total datum size (including length
byte) in bytes. As a special case, if the remaining bits are all
zero (which would be impossible for a self-inclusive length), the
value is a pointer to out-of-line data stored in a separate TOAST
table. (The size of a TOAST pointer is given in the second byte
of the datum.) Values with single-byte headers aren't aligned on
any particular boundary, either. Lastly, when the highest-order
(big-endian) or lowest-order (little-endian) bit is clear but the
adjacent bit is set, the
content of the datum has been compressed and must be decompressed
before use. In this case the remaining bits of the length word
give the total size of the compressed datum, not the original
data. Note that compression is also possible for out-of-line data
but the varlena header does not tell whether it has occurred —
the content of the TOAST pointer tells that, instead.
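
The header rules above can be illustrated with a small decoder. This is only a sketch of the little-endian case as described in this section, not PostgreSQL's actual C macros; the names classify_varlena_header and four_byte_header and the returned labels are hypothetical.

```python
import struct


def classify_varlena_header(datum: bytes):
    """Classify a datum by its header, following the little-endian rules above.

    Returns a (kind, size) tuple. Names and labels are illustrative only.
    """
    first = datum[0]
    if first & 0x01:
        # Lowest-order bit set: single-byte header; the other 7 bits hold the size.
        size = first >> 1
        if size == 0:
            # A self-inclusive length can never be zero, so this marks a
            # TOAST pointer, whose size is given in the second byte.
            return ("toast_pointer", datum[1])
        return ("short_header", size)
    # Lowest-order bit clear: ordinary four-byte length word.
    (word,) = struct.unpack("<I", datum[:4])
    size = word >> 2  # the remaining 30 bits
    if word & 0x02:
        # Adjacent bit set: content is compressed; size is the compressed size.
        return ("compressed", size)
    return ("plain", size)


def four_byte_header(size: int, compressed: bool = False) -> bytes:
    """Build a four-byte length word for the sketch above (hypothetical helper)."""
    word = (size << 2) | (0x02 if compressed else 0x00)
    return struct.pack("<I", word)
```

Because two of the 32 bits are reserved, 30 bits remain for the length, which is where the limit of 2^30 - 1 bytes comes from.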

If any of the columns of a table are TOAST-able, the table will have an associated
TOAST table, whose OID is
stored in the table's pg_class.reltoastrelid entry. Out-of-line
TOASTed values are kept in the
TOAST table, as described in
more detail below.

The compression technique used is a fairly simple and very
fast member of the LZ family of compression techniques. See
src/backend/utils/adt/pg_lzcompress.c
for the details.

Out-of-line values are divided (after compression if used)
into chunks of at most TOAST_MAX_CHUNK_SIZE bytes (by default this value
is chosen so that four chunk rows will fit on a page, making it
about 2000 bytes). Each chunk is stored as a separate row in the
TOAST table for the owning
table. Every TOAST table has
the columns chunk_id (an OID
identifying the particular TOASTed value), chunk_seq (a sequence number for the chunk
within its value), and chunk_data
(the actual data of the chunk). A unique index on chunk_id and chunk_seq provides fast retrieval of the
values. A pointer datum representing an out-of-line
TOASTed value therefore needs
to store the OID of the TOAST
table in which to look and the OID of the specific value (its
chunk_id). For convenience, pointer
datums also store the logical datum size (original uncompressed
data length) and actual stored size (different if compression was
applied). Allowing for the varlena header bytes, the total size
of a TOAST pointer datum is
therefore 18 bytes regardless of the actual size of the
represented value.
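
The chunking scheme and the pointer size arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: the chunk size of 2000 is the approximate figure quoted above, numbering chunk_seq from zero is an assumption, and the field order and signedness in POINTER_LAYOUT are guesses chosen only to show how two header bytes plus four 4-byte fields add up to 18 bytes. The function names are hypothetical.

```python
import struct

TOAST_MAX_CHUNK_SIZE = 2000  # illustrative; the text says "about 2000 bytes"


def split_into_chunks(chunk_id, stored_value, chunk_size=TOAST_MAX_CHUNK_SIZE):
    """Return (chunk_id, chunk_seq, chunk_data) rows for one out-of-line value."""
    return [
        (chunk_id, seq, stored_value[offset:offset + chunk_size])
        for seq, offset in enumerate(range(0, len(stored_value), chunk_size))
    ]


def reassemble(rows, chunk_id):
    """Rebuild a stored value by reading its chunks in chunk_seq order."""
    wanted = sorted((seq, data) for cid, seq, data in rows if cid == chunk_id)
    return b"".join(data for _, data in wanted)


# Assumed pointer layout: 2 varlena header bytes, then logical size,
# stored size, value OID (chunk_id), and TOAST table OID, 4 bytes each.
POINTER_LAYOUT = "<BBiiII"
POINTER_SIZE = struct.calcsize(POINTER_LAYOUT)  # 1 + 1 + 4 + 4 + 4 + 4
```

In this sketch the sort on (chunk_seq, chunk_data) plays the role that the unique index on chunk_id and chunk_seq plays in the real TOAST table: it lets the chunks of one value be fetched in order.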

The TOAST code is triggered
only when a row value to be stored in a table is wider than
TOAST_TUPLE_THRESHOLD bytes (normally 2
kB). The TOAST code will
compress and/or move field values out-of-line until the row value
is shorter than TOAST_TUPLE_TARGET bytes
(also normally 2 kB) or no more gains can be had. During an
UPDATE operation, values of unchanged fields are normally
preserved as-is; so an UPDATE of a row with out-of-line values
incurs no TOAST costs if none
of the out-of-line values change.

The TOAST code recognizes
four different strategies for storing TOAST-able columns:

PLAIN prevents either compression
or out-of-line storage; furthermore it disables use of
single-byte headers for varlena types. This is the only
possible strategy for columns of non-TOAST-able data types.

EXTENDED allows both compression
and out-of-line storage. This is the default for most
TOAST-able data types.
Compression will be attempted first, then out-of-line storage
if the row is still too big.

EXTERNAL allows out-of-line
storage but not compression. Use of EXTERNAL will make substring operations on
wide text and bytea columns faster (at the penalty of increased
storage space) because these operations are optimized to
fetch only the required parts of the out-of-line value when
it is not compressed.

MAIN allows compression but not
out-of-line storage. (Actually, out-of-line storage will
still be performed for such columns, but only as a last
resort when there is no other way to make the row small
enough.)

Each TOAST-able data type
specifies a default strategy for columns of that data type, but
the strategy for a given table column can be altered with
ALTER TABLE SET STORAGE.
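
The interaction between the threshold, the target, and the four strategies can be sketched as a toy simulation. Everything here is an assumption made for illustration: the function toast_row is hypothetical, zlib stands in for the server's LZ compressor, the 2000-byte constants approximate "normally 2 kB", per-row overhead is ignored, and the exact order in which columns are visited (widest first, compression before out-of-line storage) is a simplification rather than the server's actual algorithm.

```python
import zlib

TOAST_TUPLE_THRESHOLD = 2000  # illustrative stand-in for "normally 2 kB"
TOAST_TUPLE_TARGET = 2000     # illustrative stand-in for "also normally 2 kB"
POINTER_SIZE = 18             # size of a TOAST pointer datum, per the text


def toast_row(columns, compress=zlib.compress):
    """Simulate TOASTing one row.

    columns: list of (strategy, value_bytes) pairs.
    Returns a list of [state, stored_size] entries, where state is
    "inline", "compressed", or "out_of_line".
    """
    result = [["inline", len(value)] for _, value in columns]

    def row_size():
        return sum(size for _, size in result)

    # The TOAST code is triggered only for rows wider than the threshold.
    if row_size() <= TOAST_TUPLE_THRESHOLD:
        return result

    def widest_first(strategies):
        picked = [i for i, (s, _) in enumerate(columns) if s in strategies]
        return sorted(picked, key=lambda i: result[i][1], reverse=True)

    # Compression is allowed for EXTENDED and MAIN columns only.
    for i in widest_first({"EXTENDED", "MAIN"}):
        if row_size() < TOAST_TUPLE_TARGET:
            break
        packed = len(compress(columns[i][1]))
        if packed < result[i][1]:  # keep the compressed form only if it helps
            result[i] = ["compressed", packed]

    # Out-of-line storage: EXTENDED and EXTERNAL first, MAIN as a last resort.
    # PLAIN columns are never touched.
    for strategies in ({"EXTENDED", "EXTERNAL"}, {"MAIN"}):
        for i in widest_first(strategies):
            if row_size() < TOAST_TUPLE_TARGET:
                break
            if result[i][1] > POINTER_SIZE:
                result[i] = ["out_of_line", POINTER_SIZE]
    return result
```

For example, a row holding a short PLAIN key and a 10000-byte EXTENDED value of repeated text ends up with the wide value compressed in place, while the same value in an EXTERNAL column is replaced by an 18-byte pointer.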

This scheme has a number of advantages compared to a more
straightforward approach such as allowing row values to span
pages. Assuming that queries are usually qualified by comparisons
against relatively small key values, most of the work of the
executor will be done using the main row entry. The big values of
TOASTed attributes will only
be pulled out (if selected at all) at the time the result set is
sent to the client. Thus, the main table is much smaller and more
of its rows fit in the shared buffer cache than would be the case
without any out-of-line storage. Sort sets shrink also, and sorts
will more often be done entirely in memory. A small test showed
that a table containing typical HTML pages and their URLs was
stored in about half of the raw data size including the
TOAST table, and that the main
table contained only about 10% of the entire data (the URLs and
some small HTML pages). There was no run time difference compared
to an un-TOASTed comparison
table, in which all the HTML pages were cut down to 7 kB to
fit.