26.5. Hot
Standby

Hot Standby is the term used to describe the ability to
connect to the server and run read-only queries while the server
is in archive recovery or standby mode. This is useful both for
replication purposes and for restoring a backup to a desired
state with great precision. The term Hot Standby also refers to
the ability of the server to move from recovery through to normal
operation while users continue running queries and/or keep their
connections open.

Running queries in hot standby mode is similar to normal query
operation, though there are several usage and administrative
differences explained below.

26.5.1. User's Overview

When the hot_standby
parameter is set to true on a standby server, it will begin
accepting connections once the recovery has brought the system
to a consistent state. All such connections are strictly
read-only; not even temporary tables may be written.

The data on the standby takes some time to arrive from the
primary server so there will be a measurable delay between
primary and standby. Running the same query nearly
simultaneously on both primary and standby might therefore
return differing results. We say that data on the standby is
eventually consistent with the
primary. Once the commit record for a transaction is replayed
on the standby, the changes made by that transaction will be
visible to any new snapshots taken on the standby. Snapshots
may be taken at the start of each query or at the start of each
transaction, depending on the current transaction isolation
level. For more details, see Section 13.2.

Transactions started during hot standby may issue the
following commands:

Query access - SELECT,
COPY TO

Cursor commands - DECLARE, FETCH, CLOSE

Parameters - SHOW,
SET, RESET

Transaction management commands

BEGIN, END, ABORT, START
TRANSACTION

SAVEPOINT,
RELEASE, ROLLBACK TO SAVEPOINT

EXCEPTION blocks
and other internal subtransactions

LOCK TABLE, though only
when explicitly in one of these modes: ACCESS SHARE, ROW
SHARE or ROW
EXCLUSIVE.

Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD

Plugins and extensions - LOAD

Transactions started during hot standby will never be
assigned a transaction ID and cannot write to the system
write-ahead log. Therefore, the following actions will produce
error messages:

Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY
FROM, TRUNCATE. Note
that there are no allowed actions that result in a
trigger being executed during recovery. This restriction
applies even to temporary tables, because table rows
cannot be read or written without assigning a transaction
ID, which is currently not possible in a Hot Standby
environment.

Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT. This restriction applies even
to temporary tables, because carrying out these
operations would require updating the system catalog
tables.

SELECT ... FOR SHARE |
UPDATE, because row locks cannot be taken without
updating the underlying data files.

Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED, ROLLBACK PREPARED because even read-only
transactions need to write WAL in the prepare phase (the
first phase of two phase commit).

Sequence updates - nextval(), setval()

LISTEN, UNLISTEN, NOTIFY

In normal operation, “read-only” transactions are allowed to
use LISTEN, UNLISTEN, and NOTIFY, so Hot Standby sessions operate under
slightly tighter restrictions than ordinary read-only sessions.
It is possible that some of these restrictions might be
loosened in a future release.

During hot standby, the parameter transaction_read_only is always true and may
not be changed. But as long as no attempt is made to modify the
database, connections during hot standby will act much like any
other database connection. If failover or switchover occurs,
the database will switch to normal processing mode. Sessions
will remain connected while the server changes mode. Once hot
standby finishes, it will be possible to initiate read-write
transactions (even from a session begun during hot
standby).

Users will be able to tell whether their session is
read-only by issuing SHOW
transaction_read_only. In addition, a set of functions
(Table 9.80)
allow users to access information about the standby server.
These allow you to write programs that are aware of the current
state of the database. These can be used to monitor the
progress of recovery, or to allow you to write complex programs
that restore the database to particular states.

26.5.2. Handling Query
Conflicts

The primary and standby servers are in many ways loosely
connected. Actions on the primary will have an effect on the
standby. As a result, there is potential for negative
interactions or conflicts between them. The easiest conflict to
understand is performance: if a huge data load is taking place
on the primary then this will generate a similar stream of WAL
records on the standby, so standby queries may contend for
system resources, such as I/O.

There are also additional types of conflict that can occur
with Hot Standby. These conflicts are hard conflicts in the sense that
queries might need to be canceled and, in some cases, sessions
disconnected to resolve them. The user is provided with several
ways to handle these conflicts. Conflict cases include:

Access Exclusive locks taken on the primary server,
including both explicit LOCK
commands and various DDL actions, conflict with table
accesses in standby queries.

Dropping a tablespace on the primary conflicts with
standby queries using that tablespace for temporary work
files.

Dropping a database on the primary conflicts with
sessions connected to that database on the standby.

Application of a vacuum cleanup record from WAL
conflicts with standby transactions whose snapshots can
still “see” any of the rows to be
removed.

Application of a vacuum cleanup record from WAL
conflicts with queries accessing the target page on the
standby, whether or not the data to be removed is
visible.

On the primary server, these cases simply result in waiting;
and the user might choose to cancel either of the conflicting
actions. However, on the standby there is no choice: the
WAL-logged action already occurred on the primary so the
standby must not fail to apply it. Furthermore, allowing WAL
application to wait indefinitely may be very undesirable,
because the standby's state will become increasingly far behind
the primary's. Therefore, a mechanism is provided to forcibly
cancel standby queries that conflict with to-be-applied WAL
records.

An example of the problem situation is an administrator on
the primary server running DROP
TABLE on a table that is currently being queried on the
standby server. Clearly the standby query cannot continue if
the DROP TABLE is applied on the
standby. If this situation occurred on the primary, the
DROP TABLE would wait until the
other query had finished. But when DROP
TABLE is run on the primary, the primary doesn't have
information about what queries are running on the standby, so
it will not wait for any such standby queries. The WAL change
records come through to the standby while the standby query is
still running, causing a conflict. The standby server must
either delay application of the WAL records (and everything
after them, too) or else cancel the conflicting query so that
the DROP TABLE can be applied.

When a conflicting query is short, it's typically desirable
to allow it to complete by delaying WAL application for a
little bit; but a long delay in WAL application is usually not
desirable. So the cancel mechanism has parameters,
max_standby_archive_delay and
max_standby_streaming_delay, that define the maximum
allowed delay in WAL application. Conflicting queries will be
canceled once it has taken longer than the relevant delay
setting to apply any newly-received WAL data. There are two
parameters so that different delay values can be specified for
the case of reading WAL data from an archive (i.e., initial
recovery from a base backup or “catching up” a
standby server that has fallen far behind) versus reading WAL
data via streaming replication.

In a standby server that exists primarily for high
availability, it's best to set the delay parameters relatively
short, so that the server cannot fall far behind the primary
due to delays caused by standby queries. However, if the
standby server is meant for executing long-running queries,
then a high or even infinite delay value may be preferable.
Keep in mind however that a long-running query could cause
other sessions on the standby server to not see recent changes
on the primary, if it delays application of WAL records.

Once the delay specified by max_standby_archive_delay or max_standby_streaming_delay has been exceeded,
conflicting queries will be canceled. This usually results just
in a cancellation error, although in the case of replaying a
DROP DATABASE the entire
conflicting session will be terminated. Also, if the conflict
is over a lock held by an idle transaction, the conflicting
session is terminated (this behavior might change in the
future).

Canceled queries may be retried immediately (after beginning
a new transaction, of course). Since query cancellation depends
on the nature of the WAL records being replayed, a query that
was canceled may well succeed if it is executed again.

Keep in mind that the delay parameters are compared to the
elapsed time since the WAL data was received by the standby
server. Thus, the grace period allowed to any one query on the
standby is never more than the delay parameter, and could be
considerably less if the standby has already fallen behind as a
result of waiting for previous queries to complete, or as a
result of being unable to keep up with a heavy update load.

The most common reason for conflict between standby queries
and WAL replay is “early cleanup”. Normally, PostgreSQL allows cleanup of old row
versions when there are no transactions that need to see them
to ensure correct visibility of data according to MVCC rules.
However, this rule can only be applied for transactions
executing on the master. So it is possible that cleanup on the
master will remove row versions that are still visible to a
transaction on the standby.

Experienced users should note that both row version cleanup
and row version freezing will potentially conflict with standby
queries. Running a manual VACUUM
FREEZE is likely to cause conflicts even on tables with
no updated or deleted rows.

Users should be clear that tables that are regularly and
heavily updated on the primary server will quickly cause
cancellation of longer running queries on the standby. In such
cases the setting of a finite value for max_standby_archive_delay or max_standby_streaming_delay can be considered
similar to setting statement_timeout.

Remedial possibilities exist if the number of standby-query
cancellations is found to be unacceptable. The first option is
to set the parameter hot_standby_feedback, which prevents
VACUUM from removing recently-dead
rows and so cleanup conflicts do not occur. If you do this, you
should note that this will delay cleanup of dead rows on the
primary, which may result in undesirable table bloat. However,
the cleanup situation will be no worse than if the standby
queries were running directly on the primary server, and you
are still getting the benefit of off-loading execution onto the
standby. If standby servers connect and disconnect frequently,
you might want to make adjustments to handle the period when
hot_standby_feedback feedback is
not being provided. For example, consider increasing
max_standby_archive_delay so that
queries are not rapidly canceled by conflicts in WAL archive
files during disconnected periods. You should also consider
increasing max_standby_streaming_delay to avoid rapid
cancellations by newly-arrived streaming WAL entries after
reconnection.

Another option is to increase vacuum_defer_cleanup_age
on the primary server, so that dead rows will not be cleaned up
as quickly as they normally would be. This will allow more time
for queries to execute before they are canceled on the standby,
without having to set a high max_standby_streaming_delay. However it is
difficult to guarantee any specific execution-time window with
this approach, since vacuum_defer_cleanup_age is measured in
transactions executed on the primary server.

The number of query cancels and the reason for them can be
viewed using the pg_stat_database_conflicts system view on
the standby server. The pg_stat_database system view also contains
summary information.

26.5.3. Administrator's
Overview

If hot_standby is on in postgresql.conf (the default value) and there
is a recovery.conf file present,
the server will run in Hot Standby mode. However, it may take
some time for Hot Standby connections to be allowed, because
the server will not accept connections until it has completed
sufficient recovery to provide a consistent state against which
queries can run. During this period, clients that attempt to
connect will be refused with an error message. To confirm the
server has come up, either loop trying to connect from the
application, or look for these messages in the server logs:

LOG: entering standby mode
... then some time later ...
LOG: consistent recovery state reached
LOG: database system is ready to accept read only connections

Consistency information is recorded once per checkpoint on
the primary. It is not possible to enable hot standby when
reading WAL written during a period when wal_level was not set to replica or logical on the primary. Reaching a consistent
state can also be delayed in the presence of both of these
conditions:

A write transaction has more than 64
subtransactions

Very long-lived write transactions

If you are running file-based log shipping ("warm standby"),
you might need to wait until the next WAL file arrives, which
could be as long as the archive_timeout setting on the primary.

The setting of some parameters on the standby will need
reconfiguration if they have been changed on the primary. For
these parameters, the value on the standby must be equal to or
greater than the value on the primary. Therefore, if you want
to increase these values, you should do so on all standby
servers first, before applying the changes to the primary
server. Conversely, if you want to decrease these values, you
should do so on the primary server first, before applying the
changes to all standby servers. If these parameters are not set
high enough then the standby will refuse to start. Higher
values can then be supplied and the server restarted to begin
recovery again. These parameters are:

max_connections

max_prepared_transactions

max_locks_per_transaction

max_worker_processes

It is important that the administrator select appropriate
settings for
max_standby_archive_delay and
max_standby_streaming_delay. The best choices vary
depending on business priorities. For example if the server is
primarily tasked as a High Availability server, then you will
want low delay settings, perhaps even zero, though that is a
very aggressive setting. If the standby server is tasked as an
additional server for decision support queries then it might be
acceptable to set the maximum delay values to many hours, or
even -1 which means wait forever for queries to complete.

Transaction status "hint bits" written on the primary are
not WAL-logged, so data on the standby will likely re-write the
hints again on the standby. Thus, the standby server will still
perform disk writes even though all users are read-only; no
changes occur to the data values themselves. Users will still
write large sort temporary files and re-generate relcache info
files, so no part of the database is truly read-only during hot
standby mode. Note also that writes to remote databases using
dblink module, and other
operations outside the database using PL functions will still
be possible, even though the transaction is read-only
locally.

The following types of administration commands are not
accepted during recovery mode:

Data Definition Language (DDL) - e.g. CREATE INDEX

Privilege and Ownership - GRANT, REVOKE, REASSIGN

Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX

Again, note that some of these commands are actually allowed
during "read only" mode transactions on the primary.

As a result, you cannot create additional indexes that exist
solely on the standby, nor statistics that exist solely on the
standby. If these administration commands are needed, they
should be executed on the primary, and eventually those changes
will propagate to the standby.

pg_cancel_backend() and
pg_terminate_backend() will work
on user backends, but not the Startup process, which performs
recovery. pg_stat_activity does
not show recovering transactions as active. As a result,
pg_prepared_xacts is always
empty during recovery. If you wish to resolve in-doubt prepared
transactions, view pg_prepared_xacts on the primary and issue
commands to resolve transactions there or resolve them after
the end of recovery.

pg_locks will show locks
held by backends, as normal. pg_locks also shows a virtual transaction
managed by the Startup process that owns all AccessExclusiveLocks held by transactions
being replayed by recovery. Note that the Startup process does
not acquire locks to make database changes, and thus locks
other than AccessExclusiveLocks do
not show in pg_locks for the
Startup process; they are just presumed to exist.

The Nagios plugin
check_pgsql will work, because
the simple information it checks for exists. The check_postgres monitoring script will also
work, though some reported values could give different or
confusing results. For example, last vacuum time will not be
maintained, since no vacuum occurs on the standby. Vacuums
running on the primary do still send their changes to the
standby.

WAL file control commands will not work during recovery,
e.g. pg_start_backup,
pg_switch_wal etc.

Dynamically loadable modules work, including pg_stat_statements.

Advisory locks work normally in recovery, including deadlock
detection. Note that advisory locks are never WAL logged, so it
is impossible for an advisory lock on either the primary or the
standby to conflict with WAL replay. Nor is it possible to
acquire an advisory lock on the primary and have it initiate a
similar advisory lock on the standby. Advisory locks relate
only to the server on which they are acquired.

Trigger-based replication systems such as Slony, Londiste and Bucardo won't run on the standby at all,
though they will run happily on the primary server as long as
the changes are not sent to standby servers to be applied. WAL
replay is not trigger-based so you cannot relay from the
standby to any system that requires additional database writes
or relies on the use of triggers.

New OIDs cannot be assigned, though some UUID generators may still work as long as
they do not rely on writing new status to the database.

Currently, temporary table creation is not allowed during
read only transactions, so in some cases existing scripts will
not run correctly. This restriction might be relaxed in a later
release. This is both a SQL Standard compliance issue and a
technical issue.

DROP TABLESPACE can only
succeed if the tablespace is empty. Some standby users may be
actively using the tablespace via their temp_tablespaces parameter. If there are
temporary files in the tablespace, all active queries are
canceled to ensure that temporary files are removed, so the
tablespace can be removed and WAL replay can continue.

Running DROP DATABASE or
ALTER DATABASE ... SET TABLESPACE
on the primary will generate a WAL entry that will cause all
users connected to that database on the standby to be forcibly
disconnected. This action occurs immediately, whatever the
setting of max_standby_streaming_delay. Note that
ALTER DATABASE ... RENAME does not
disconnect users, which in most cases will go unnoticed, though
might in some cases cause a program confusion if it depends in
some way upon database name.

In normal (non-recovery) mode, if you issue DROP USER or DROP
ROLE for a role with login capability while that user is
still connected then nothing happens to the connected user -
they remain connected. The user cannot reconnect however. This
behavior applies in recovery also, so a DROP USER on the primary does not disconnect
that user on the standby.

The statistics collector is active during recovery. All
scans, reads, blocks, index usage, etc., will be recorded
normally on the standby. Replayed actions will not duplicate
their effects on primary, so replaying an insert will not
increment the Inserts column of pg_stat_user_tables. The stats
file is deleted at the start of recovery, so stats from primary
and standby will differ; this is considered a feature, not a
bug.

Autovacuum is not active during recovery. It will start
normally at the end of recovery.

The background writer is active during recovery and will
perform restartpoints (similar to checkpoints on the primary)
and normal block cleaning activities. This can include updates
of the hint bit information stored on the standby server. The
CHECKPOINT command is accepted
during recovery, though it performs a restartpoint rather than
a new checkpoint.

26.5.5. Caveats

There are several limitations of Hot Standby. These can and
probably will be fixed in future releases:

Full knowledge of running transactions is required
before snapshots can be taken. Transactions that use
large numbers of subtransactions (currently greater than
64) will delay the start of read only connections until
the completion of the longest running write transaction.
If this situation occurs, explanatory messages will be
sent to the server log.

Valid starting points for standby queries are
generated at each checkpoint on the master. If the
standby is shut down while the master is in a shutdown
state, it might not be possible to re-enter Hot Standby
until the primary is started up, so that it generates
further starting points in the WAL logs. This situation
isn't a problem in the most common situations where it
might happen. Generally, if the primary is shut down and
not available anymore, that's likely due to a serious
failure that requires the standby being converted to
operate as the new primary anyway. And in situations
where the primary is being intentionally taken down,
coordinating to make sure the standby becomes the new
primary smoothly is also standard procedure.

At the end of recovery, AccessExclusiveLocks held by prepared
transactions will require twice the normal number of lock
table entries. If you plan on running either a large
number of concurrent prepared transactions that normally
take AccessExclusiveLocks,
or you plan on having one large transaction that takes
many AccessExclusiveLocks,
you are advised to select a larger value of max_locks_per_transaction, perhaps as
much as twice the value of the parameter on the primary
server. You need not consider this at all if your setting
of max_prepared_transactions
is 0.

The Serializable transaction isolation level is not
yet available in hot standby. (See Section 13.2.3
and
Section 13.4.1 for details.) An attempt to set a
transaction to the serializable isolation level in hot
standby mode will generate an error.