This section provides an overview of the page format used
within PostgreSQL tables and
indexes.[1] Sequences and
TOAST tables are formatted
just like a regular table.

In the following explanation, a byte
is assumed to contain 8 bits. In addition, the term item refers to an individual data value that is
stored on a page. In a table, an item is a row; in an index, an
item is an index entry.

Every table and index is stored as an array of pages of a fixed size (usually 8 kB, although a
different page size can be selected when compiling the server).
In a table, all the pages are logically equivalent, so a
particular item (row) can be stored in any page. In indexes, the
first page is generally reserved as a metapage holding control information, and there
may be different types of pages within the index, depending on
the index access method.
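
Because pages have a fixed size, locating a page is simple arithmetic. The following is a minimal sketch, assuming the default 8 kB page size and a relation stored in a single file; the names `page_byte_range` and `read_page` are hypothetical helpers, not part of PostgreSQL.

```python
PAGE_SIZE = 8192  # assumed default; the real value is fixed when the server is compiled


def page_byte_range(page_number, page_size=PAGE_SIZE):
    """Return the (start, end) byte offsets of a page within its file."""
    start = page_number * page_size
    return start, start + page_size


def read_page(path, page_number, page_size=PAGE_SIZE):
    """Read one whole page from a file, assuming the relation is a single file."""
    start, _ = page_byte_range(page_number, page_size)
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(page_size)
    if len(data) != page_size:
        raise ValueError("file does not contain a complete page at that position")
    return data
```

The sketch ignores any splitting of a large relation across several files, so it should only be read as an illustration of the "array of pages" idea.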

Table 49-2
shows the overall layout of a page. There are five parts to each
page.

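Table 49-2. Overall Page Layout

| Item | Description |
|---|---|
| PageHeaderData | 20 bytes long. Contains general information about the page, including free space pointers. |
| ItemIdData | Array of (offset,length) pairs pointing to the actual items. 4 bytes per item. |
| Free space | The unallocated space. New item pointers are allocated from the start of this area, new items from the end. |
| Items | The actual items themselves. |
| Special space | Index access method specific data. Different methods store different data. Empty in ordinary tables. |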

The first 20 bytes of each page consist of a page header
(PageHeaderData). Its format is detailed in Table 49-3.
The first two fields track the most recent WAL entry related to
this page. They are followed by three 2-byte integer fields
(pd_lower, pd_upper, and pd_special). These contain byte offsets from
the page start to the start of unallocated space, to the end of
unallocated space, and to the start of the special space. The
last 2 bytes of the page header, pd_pagesize_version, store both the page size
and a version indicator. Beginning with PostgreSQL 8.0 the version number is 2;
PostgreSQL 7.3 and 7.4 used
version number 1; prior releases used version number 0. (The
basic page layout and header format have not changed in these
versions, but the layout of heap row headers has.) The page size
is present only as a cross-check; there is no support
for having more than one page size in an installation.

Table 49-3. PageHeaderData Layout

| Field | Type | Length | Description |
|---|---|---|---|
| pd_lsn | XLogRecPtr | 8 bytes | LSN: next byte after last byte of xlog record for last change to this page |
| pd_tli | TimeLineID | 4 bytes | TLI of last change |
| pd_lower | LocationIndex | 2 bytes | Offset to start of free space |
| pd_upper | LocationIndex | 2 bytes | Offset to end of free space |
| pd_special | LocationIndex | 2 bytes | Offset to start of special space |
| pd_pagesize_version | uint16 | 2 bytes | Page size and layout version number information |

All the details may be found in src/include/storage/bufpage.h.
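
As an illustration of the header layout, the sketch below unpacks the 20 header bytes with Python's `struct` module. It assumes a little-endian machine and keeps pd_lsn as 8 raw bytes rather than interpreting it. The split of pd_pagesize_version into a high byte (page size) and a low byte (version) is an assumption that is not stated in this section and should be checked against bufpage.h; the helper names are hypothetical.

```python
import struct
from collections import namedtuple

# Field order and widths follow the PageHeaderData layout table.
# "<" assumes little-endian byte order and disables padding: 8+4+2+2+2+2 = 20.
PAGE_HEADER = struct.Struct("<8sIHHHH")

PageHeader = namedtuple(
    "PageHeader",
    "pd_lsn pd_tli pd_lower pd_upper pd_special pd_pagesize_version",
)


def parse_page_header(page):
    """Decode the first 20 bytes of a page into a PageHeader tuple."""
    return PageHeader(*PAGE_HEADER.unpack_from(page, 0))


def free_space(header):
    """Bytes of unallocated space between the item identifiers and the items."""
    return header.pd_upper - header.pd_lower


def split_pagesize_version(value):
    """Assumed encoding: page size in the high byte, version in the low byte."""
    return value & 0xFF00, value & 0x00FF
```

For a page read into `page`, `parse_page_header(page).pd_lower` gives the start of free space, and `split_pagesize_version` can serve as the cross-check the text mentions.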

Following the page header are item identifiers (ItemIdData), each requiring four bytes. An item
identifier contains a byte offset to the start of an item, its
length in bytes, and a few attribute bits which affect its
interpretation. New item identifiers are allocated as needed from
the beginning of the unallocated space. The number of item
identifiers present can be determined by looking at pd_lower, which is increased to allocate a new
identifier. Because an item identifier is never moved until it is
freed, its index may be used on a long-term basis to reference an
item, even when the item itself is moved around on the page to
compact free space. In fact, every pointer to an item (ItemPointer, also known as CTID) created by PostgreSQL consists of a page number and the
index of an item identifier.
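
The relationship between pd_lower and the item identifier array can be sketched as follows. The count calculation follows directly from the text. The bit packing used in `decode_item_id` (15 bits of offset, 2 flag bits, 15 bits of length, lowest bits first in a little-endian 32-bit word) and the 1-based item index are assumptions that depend on the compiler and platform, so verify them against the source before relying on them. All function names are hypothetical.

```python
import struct

PAGE_HEADER_SIZE = 20  # from the overall page layout
ITEM_ID_SIZE = 4       # each item identifier needs four bytes


def item_count(pd_lower):
    """Number of item identifiers implied by pd_lower."""
    return (pd_lower - PAGE_HEADER_SIZE) // ITEM_ID_SIZE


def item_id_position(index):
    """Byte position of an item identifier; index is assumed to start at 1."""
    return PAGE_HEADER_SIZE + (index - 1) * ITEM_ID_SIZE


def decode_item_id(page, index):
    """Return (offset, flags, length) under the assumed bit packing."""
    (word,) = struct.unpack_from("<I", page, item_id_position(index))
    offset = word & 0x7FFF          # assumed: low 15 bits
    flags = (word >> 15) & 0x3      # assumed: next 2 bits
    length = (word >> 17) & 0x7FFF  # assumed: high 15 bits
    return offset, flags, length
```

Because the identifier keeps its position while the item may move, a reference made of a page number and an identifier index stays valid even after the page is compacted.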

The items themselves are stored in space allocated backwards
from the end of unallocated space. The exact structure varies
depending on what the table is to contain. Tables and sequences
both use a structure named HeapTupleHeaderData, described below.

The final section is the "special
section" which may contain anything the access method
wishes to store. For example, b-tree indexes store links to the
page's left and right siblings, as well as some other data
relevant to the index structure. Ordinary tables do not use a
special section at all (indicated by setting pd_special to equal the page size).
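
Taken together, the three offsets in the page header divide a page into its five parts. The sketch below derives the byte range of each part; `page_regions` is a hypothetical helper, and the default page size of 8192 bytes is an assumption.

```python
def page_regions(pd_lower, pd_upper, pd_special, page_size=8192, header_size=20):
    """Map each part of a page to its (start, end) byte range."""
    if not header_size <= pd_lower <= pd_upper <= pd_special <= page_size:
        raise ValueError("inconsistent page offsets")
    return {
        "header": (0, header_size),
        "item_ids": (header_size, pd_lower),
        "free_space": (pd_lower, pd_upper),
        "items": (pd_upper, pd_special),
        "special": (pd_special, page_size),
    }
```

For an ordinary table page, pd_special equals the page size, so the "special" range comes out empty.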

All table rows are structured in the same way. There is a
fixed-size header (occupying 27 bytes on most machines), followed
by an optional null bitmap, an optional object ID field, and the
user data. The header is detailed in Table
49-4. The actual user data (columns of the row) begins at the
offset indicated by t_hoff, which
must always be a multiple of the MAXALIGN distance for the
platform. The null bitmap is only present if the HEAP_HASNULL bit is set in t_infomask. If it is present it begins just
after the fixed header and occupies enough bytes to have one bit
per data column (that is, t_natts
bits altogether). In this list of bits, a 1 bit indicates
not-null, a 0 bit is a null. When the bitmap is not present, all
columns are assumed not-null. The object ID is only present if
the HEAP_HASOID bit is set in t_infomask. If present, it appears just before
the t_hoff boundary. Any padding
needed to make t_hoff a MAXALIGN
multiple will appear between the null bitmap and the object ID.
(This in turn ensures that the object ID is suitably
aligned.)
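
The rules for t_hoff and the null bitmap can be illustrated with a short calculation. This sketch assumes the 27-byte fixed header quoted above, a 4-byte object ID, and a bitmap whose bits are numbered from the least significant bit of each byte; the last two points are assumptions not stated in the text. The MAXALIGN distance is passed in as a parameter because it varies by platform, and the function names are hypothetical.

```python
FIXED_HEADER_SIZE = 27  # "on most machines", per the text
OID_SIZE = 4            # assumed size of the optional object ID field


def maxalign(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def null_bitmap_bytes(natts):
    """Bytes needed to hold one bit per column."""
    return (natts + 7) // 8


def expected_hoff(natts, has_nulls, has_oid, alignment=4):
    """Offset at which user data would begin for the given row shape."""
    size = FIXED_HEADER_SIZE
    if has_nulls:
        size += null_bitmap_bytes(natts)
    if has_oid:
        size += OID_SIZE
    return maxalign(size, alignment)


def is_null(bitmap, column):
    """True if the 0-based column is null: a 0 bit means null, a 1 bit not-null."""
    return not (bitmap[column >> 3] & (1 << (column & 7)))
```

The padding the text describes does not change the total, only where the object ID sits, so the sketch computes the final offset without modelling the padding position.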

Table 49-4. HeapTupleHeaderData Layout

| Field | Type | Length | Description |
|---|---|---|---|
| t_xmin | TransactionId | 4 bytes | insert XID stamp |
| t_cmin | CommandId | 4 bytes | insert CID stamp |
| t_xmax | TransactionId | 4 bytes | delete XID stamp |
| t_cmax | CommandId | 4 bytes | delete CID stamp (overlays with t_xvac) |
| t_xvac | TransactionId | 4 bytes | XID for VACUUM operation moving a row version |
| t_ctid | ItemPointerData | 6 bytes | current TID of this or newer row version |
| t_natts | int16 | 2 bytes | number of attributes |
| t_infomask | uint16 | 2 bytes | various flag bits |
| t_hoff | uint8 | 1 byte | offset to user data |

All the details may be found in src/include/access/htup.h.
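
The fixed row header can be decoded in the same style as the page header. The sketch below assumes a little-endian machine and treats t_cmax and t_xvac as one shared 4-byte slot, as the table indicates, which gives the 27 bytes mentioned earlier. The internal layout of t_ctid (three 16-bit words: high and low halves of the page number, then the item index) and the flag values 0x0001 for HEAP_HASNULL and 0x0010 for HEAP_HASOID are assumptions to be checked against htup.h. All function names are hypothetical.

```python
import struct
from collections import namedtuple

# 4+4+4+4 (t_cmax and t_xvac share a slot) + 6 + 2 + 2 + 1 = 27 bytes, no padding.
TUPLE_HEADER = struct.Struct("<IIII6shHB")

TupleHeader = namedtuple(
    "TupleHeader",
    "t_xmin t_cmin t_xmax t_cmax_or_xvac t_ctid t_natts t_infomask t_hoff",
)

HEAP_HASNULL = 0x0001  # assumed flag value
HEAP_HASOID = 0x0010   # assumed flag value


def parse_tuple_header(item):
    """Decode the fixed-size header at the start of a table row."""
    return TupleHeader(*TUPLE_HEADER.unpack_from(item, 0))


def decode_ctid(raw):
    """Return (page number, item index) under the assumed 3 x uint16 layout."""
    hi, lo, index = struct.unpack("<HHH", raw)
    return (hi << 16) | lo, index


def has_nulls(header):
    """True if the row carries a null bitmap."""
    return bool(header.t_infomask & HEAP_HASNULL)


def has_oid(header):
    """True if the row carries an object ID field."""
    return bool(header.t_infomask & HEAP_HASOID)
```

The bytes of a row would come from the page slice that its item identifier points to; the user data then begins at `t_hoff` within that slice.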

Interpreting the actual data can only be done with information
obtained from other tables, mostly pg_attribute. The key values needed to identify
field locations are attlen and
attalign. There is no way to
directly get a particular attribute, except when there are only
fixed width fields and no NULLs. All this trickery is wrapped up
in the functions heap_getattr, fastgetattr and heap_getsysattr.

To read the data you need to examine each attribute in turn.
First check whether the field is NULL according to the null
bitmap. If it is, go to the next. Then make sure you have the
right alignment. If the field is a fixed-width field, its bytes
are simply stored in place at that position. If it's a variable-length field
(attlen = -1) then it's a bit more complicated. All
variable-length datatypes share the common header structure
varattrib, which includes the total length
of the stored value and some flag bits. Depending on the flags,
the data may be either inline or in a TOAST table; it might be compressed, too (see
Section 49.2).
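
The attribute-by-attribute procedure can be sketched for the fixed-width case. This is only an illustration under several assumptions: each column is described by a hypothetical (attlen, alignment in bytes) pair already derived from pg_attribute, the user data is passed in starting at t_hoff, offsets are aligned relative to that start, and the null bitmap uses least-significant-bit-first numbering. Variable-length fields are deliberately not handled. It is not a substitute for heap_getattr and its relatives.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def walk_fixed_attributes(data, columns, null_bitmap=None):
    """Return the raw bytes of each column, or None for a null column.

    data        -- user data of the row, starting at t_hoff
    columns     -- list of (attlen, alignment_in_bytes) pairs, one per column
    null_bitmap -- bytes of the null bitmap, or None if the row has none
    """
    values = []
    offset = 0
    for i, (attlen, alignment) in enumerate(columns):
        # A 0 bit in the bitmap marks a null; nulls occupy no space in the data.
        if null_bitmap is not None and not (null_bitmap[i >> 3] & (1 << (i & 7))):
            values.append(None)
            continue
        if attlen < 0:
            raise NotImplementedError("variable-length fields are not handled here")
        offset = align_up(offset, alignment)
        values.append(data[offset:offset + attlen])
        offset += attlen
    return values
```

The loop shows why a column generally cannot be reached directly: its position depends on the nulls and alignment padding of every column before it.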

Notes

[1] Actually, index access methods need not use this page
format. All the existing index methods do use this basic
format, but the data kept on index metapages usually doesn't
follow the item layout rules.