Synopsis

Inputs

Outputs

Description

CLUSTER instructs PostgreSQL to cluster the table specified by
table based on the index
specified by indexname. The index
must already have been defined on tablename.

When a table is clustered, it is physically reordered based on
the index information. Clustering is a one-time operation: when
the table is subsequently updated, the changes are not clustered.
That is, no attempt is made to store new or updated tuples
according to their index order. If one wishes, one can
periodically re-cluster by issuing the command again.

Notes

In cases where you are accessing single rows randomly within
a table, the actual order of the data in the heap table is
unimportant. However, if you tend to access some data more than
others, and there is an index that groups them together, you
will benefit from using CLUSTER.

Another place where CLUSTER is
helpful is in cases where you use an index to pull out several
rows from a table. If you are requesting a range of indexed
values from a table, or a single indexed value that has
multiple rows that match, CLUSTER will
help because once the index identifies the heap page for the
first row that matches, all other rows that match are probably
already on the same heap page, saving disk accesses and
speeding up the query.

During the cluster operation, a temporary copy of the table
is created that contains the table data in the index order.
Temporary copies of each index on the table are created as
well. Therefore, you need free space on disk at least equal to
the sum of the table size and the index sizes.

Because the optimizer records statistics about the ordering
of tables, it is advisable to run ANALYZE on the newly clustered table. Otherwise,
the optimizer may make poor choices of query plans.

There is another way to cluster data. The CLUSTER command reorders the original table
using the ordering of the index you specify. This can be slow
on large tables because the rows are fetched from the heap in
index order, and if the heap table is unordered, the entries
are on random pages, so there is one disk page retrieved for
every row moved. (PostgreSQL
has a cache, but the majority of a big table will not fit in
the cache.) The other way to cluster a table is to use

SELECT columnlist INTO TABLE newtable
FROM table ORDER BY columnlist

which uses the PostgreSQL
sorting code in the ORDER BY clause to create the desired
order; this is usually much faster than an index scan for
unordered data. You then drop the old table, use ALTER TABLE...RENAME to rename newtable to the old name, and recreate
the table's indexes. However, this approach does not preserve
OIDs, constraints, foreign key relationships, granted
privileges, and other ancillary properties of the table --- all
such items must be manually recreated.