An index access method can choose whether it supports
concurrent updates of the index by multiple processes. If the
method's pg_am.amconcurrent flag is true, then the core
PostgreSQL system obtains
AccessShareLock on the index during an
index scan, and RowExclusiveLock when
updating the index. Since these lock types do not conflict, the
access method is responsible for handling any fine-grained
locking it may need. An exclusive lock on the index as a whole
will be taken only during index creation, destruction, or
REINDEX. When amconcurrent is false, PostgreSQL still obtains AccessShareLock during index scans, but it obtains
AccessExclusiveLock during any update.
This ensures that updaters have sole use of the index. Note that
this implicitly assumes that index scans are read-only; an access
method that might modify the index during a scan will still have
to do its own locking to handle the case of concurrent scans.
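
The lock choices described above can be summarized in a small model. The following Python sketch is illustrative only and is not PostgreSQL code: the function names conflicts and index_lock are hypothetical, and only the three lock modes named in the text are modeled (PostgreSQL defines more).

```python
# Toy model of the index-level locks taken by the core system.
# Hypothetical helper names; only the modes mentioned in the text appear.
ACCESS_SHARE = "AccessShareLock"
ROW_EXCLUSIVE = "RowExclusiveLock"
ACCESS_EXCLUSIVE = "AccessExclusiveLock"

# Unordered pairs of modes that conflict with each other.
CONFLICTS = {
    frozenset([ACCESS_SHARE, ACCESS_EXCLUSIVE]),
    frozenset([ROW_EXCLUSIVE, ACCESS_EXCLUSIVE]),
    frozenset([ACCESS_EXCLUSIVE]),  # AccessExclusiveLock conflicts with itself
}


def conflicts(mode_a, mode_b):
    """Return True if two backends cannot hold these modes at the same time."""
    return frozenset([mode_a, mode_b]) in CONFLICTS


def index_lock(amconcurrent, operation):
    """Lock mode taken on the index for a 'scan' or an 'update'."""
    if operation == "scan":
        return ACCESS_SHARE
    if operation == "update":
        return ROW_EXCLUSIVE if amconcurrent else ACCESS_EXCLUSIVE
    raise ValueError(operation)
```

In this model a concurrent access method sees scans and updates proceed side by side, which is why it must supply its own fine-grained locking, whereas a non-concurrent method has each updater wait for all other users of the index.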

Recall that a backend's own locks never conflict; therefore,
even a non-concurrent index type must be prepared to handle the
case where a backend is inserting or deleting entries in an index
that it is itself scanning. (This is of course necessary to
support an UPDATE that uses the index to
find the rows to be updated.)
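
The self-conflict rule can be illustrated with a toy lock table. This is a minimal sketch under stated assumptions, not PostgreSQL's lock manager: the class IndexLockTable and its methods are hypothetical, and only the two lock modes relevant to a non-concurrent index are modeled.

```python
ACCESS_SHARE = "AccessShareLock"
ACCESS_EXCLUSIVE = "AccessExclusiveLock"


class IndexLockTable:
    """Toy lock table for a single index (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.held = []  # (backend_id, mode) pairs currently granted

    def can_acquire(self, backend_id, mode):
        for holder, held_mode in self.held:
            if holder == backend_id:
                continue  # a backend's own locks never conflict
            if ACCESS_EXCLUSIVE in (mode, held_mode):
                return False  # AccessExclusiveLock conflicts with every mode
        return True

    def acquire(self, backend_id, mode):
        if not self.can_acquire(backend_id, mode):
            raise RuntimeError("would block")
        self.held.append((backend_id, mode))


locks = IndexLockTable()
locks.acquire(1, ACCESS_SHARE)      # backend 1 starts scanning the index
locks.acquire(1, ACCESS_EXCLUSIVE)  # the same backend updates it mid-scan
```

Both requests from backend 1 are granted, so the index-level lock gives the access method no protection against a scan and an update interleaving within one backend.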

Building an index type that supports concurrent updates
usually requires extensive and subtle analysis of the required
behavior. For the b-tree and hash index types, you can read about
the design decisions involved in src/backend/access/nbtree/README and src/backend/access/hash/README.

Aside from the index's own internal consistency requirements,
concurrent updates create issues about consistency between the
parent table (the heap) and the index.
Because PostgreSQL separates
accesses and updates of the heap from those of the index, there
are windows in which the index may be inconsistent with the heap.
We handle this problem with the following rules:

A new heap entry is made before making its index entries.
(Therefore a concurrent index scan is likely to fail to see
the heap entry. This is okay because the index reader would
be uninterested in an uncommitted row anyway. But see
Section 48.5.)

When a heap entry is to be deleted (by VACUUM), all its index entries must be removed
first.

For concurrent index types, an index scan must maintain a
pin on the index page holding the item last returned by
amgettuple, and ambulkdelete cannot delete entries from
pages that are pinned by other backends. The need for this
rule is explained below.
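
The ordering imposed by the first two rules can be sketched as follows. This Python model is an illustration only: ToyTable, its dictionaries, and its method names are hypothetical stand-ins for the heap and the index. It checks after every step that no index entry points at a missing heap slot, which is the property the ordering is meant to preserve.

```python
class ToyTable:
    """Hypothetical model: heap maps TID -> row, index maps key -> set of TIDs."""

    def __init__(self):
        self.heap = {}
        self.index = {}
        self.checks = 0

    def check(self):
        # Every index entry must point at a heap slot that still exists.
        for key, tids in self.index.items():
            for tid in tids:
                assert tid in self.heap, f"dangling index entry {key!r} -> {tid}"
        self.checks += 1

    def insert_row(self, tid, key, row):
        self.heap[tid] = row                        # heap entry first
        self.check()
        self.index.setdefault(key, set()).add(tid)  # then its index entry
        self.check()

    def vacuum_row(self, tid, key):
        self.index[key].discard(tid)                # index entries first
        self.check()
        del self.heap[tid]                          # then the heap entry
        self.check()
```

Reversing either pair of steps would make the check fail at the intermediate point, because an index entry would briefly refer to a heap slot that is not there.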

If an index is concurrent, then it is possible for an index
reader to see an index entry just before it is removed by
VACUUM, and then to arrive at the
corresponding heap entry after that entry, too, has been removed by VACUUM. (With a non-concurrent index, this is not
possible because of the conflicting index-level locks that are
taken.) This creates no serious problems if that item
number is still unused when the reader reaches it, since an empty
item slot will be ignored by heap_fetch(). But what if a third backend has
already re-used the item slot for something else? When using an
MVCC-compliant snapshot, there is no problem because the new
occupant of the slot is certain to be too new to pass the
snapshot test. However, with a non-MVCC-compliant snapshot (such
as SnapshotNow), it would be possible to
accept and return a row that does not in fact match the scan
keys. We could defend against this scenario by requiring the scan
keys to be rechecked against the heap row in all cases, but that
is too expensive. Instead, we use a pin on an index page as a
proxy to indicate that the reader may still be "in flight" from the index entry to the matching
heap entry. Making ambulkdelete
block on such a pin ensures that VACUUM
cannot delete the heap entry before the reader is done with it.
This solution costs little in run time, and adds blocking
overhead only in the rare cases where there actually is a
conflict.
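
The race described above, and the effect of the pin, can be replayed in a single-threaded model. The sketch below is illustrative only: the function simulate, the dictionary layout, and the backend labels are hypothetical, the non-MVCC visibility test is reduced to "is the row live right now", and a blocked VACUUM is modeled by skipping its work rather than by making it wait.

```python
def simulate(reader_holds_pin):
    """Return the key of the row the reader accepts while looking up key 'a'."""
    heap = {1: {"key": "a", "live": False}}  # dead row awaiting VACUUM
    index_page = {"a": 1}                    # index entry: key -> TID
    pins = {"reader"} if reader_holds_pin else set()

    # Reader: finds the index entry for 'a' and remembers its TID.
    tid = index_page["a"]

    # VACUUM: may remove entries only if no other backend pins the page.
    if not (pins - {"vacuum"}):
        del index_page["a"]  # index entry first
        del heap[tid]        # then the heap entry
        # A third backend re-uses the freed slot for an unrelated live row.
        heap[tid] = {"key": "z", "live": True}

    # Reader: arrives at the heap and applies a non-MVCC test,
    # without rechecking the scan key against the row.
    row = heap.get(tid)
    pins.discard("reader")   # the reader is done with this item
    if row is not None and row["live"]:
        return row["key"]
    return None
```

With the pin held, the reader finds the original dead row and returns nothing; without it, the reader accepts a row whose key is 'z' although it was searching for 'a'.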

This solution requires that index scans be "synchronous": we have to fetch each heap tuple
immediately after scanning the corresponding index entry. This is
expensive for a number of reasons. An "asynchronous" scan in which we collect many TIDs
from the index, and only visit the heap tuples sometime later,
requires much less index locking overhead and may allow a more
efficient heap access pattern. Per the above analysis, we must
use the synchronous approach for non-MVCC-compliant snapshots,
but an asynchronous scan is workable for a query using an MVCC
snapshot.
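
Why an asynchronous scan is safe under an MVCC snapshot can be shown with a similar model. This is a simplified sketch, not PostgreSQL's visibility logic: mvcc_visible and asynchronous_scan are hypothetical names, and a snapshot is reduced to a single transaction ID threshold (real snapshots also track in-progress transactions).

```python
def mvcc_visible(row, snapshot_xid):
    """Simplified test: inserted before the snapshot and not deleted before it."""
    inserted_before = row["xmin"] < snapshot_xid
    deleted_before = row["xmax"] is not None and row["xmax"] < snapshot_xid
    return inserted_before and not deleted_before


def asynchronous_scan(key, snapshot_xid):
    heap = {
        1: {"key": "a", "xmin": 10, "xmax": 50},    # dead row
        2: {"key": "a", "xmin": 20, "xmax": None},  # live row
    }
    index = {"a": [1, 2]}

    # Phase 1: collect all matching TIDs from the index, holding no pins.
    tids = list(index[key])

    # Between the phases: VACUUM removes the dead row, and a later
    # transaction (xid 120) re-uses its slot for an unrelated row.
    index["a"].remove(1)
    del heap[1]
    heap[1] = {"key": "z", "xmin": 120, "xmax": None}

    # Phase 2: visit the heap; the snapshot test filters out the new occupant.
    return [tid for tid in tids
            if tid in heap and mvcc_visible(heap[tid], snapshot_xid)]
```

Calling asynchronous_scan("a", 100) yields only TID 2: the re-used slot holds a row inserted after the snapshot was taken, so it fails the visibility test even though its key was never rechecked.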

In an amgetmulti index scan, the
access method need not guarantee to keep an index pin on any of
the returned tuples. (It would be impractical to pin more than
the last one anyway.) Therefore it is only safe to use such scans
with MVCC-compliant snapshots.
