Continuous archiving can be used to create a high availability (HA) cluster configuration with
one or more standby servers ready to
take over operations if the primary server fails. This capability
is widely referred to as warm standby or
log shipping.

The primary and standby server work together to provide this
capability, though the servers are only loosely coupled. The
primary server operates in continuous archiving mode, while each
standby server operates in continuous recovery mode, reading the
WAL files from the primary. No changes to the database tables are
required to enable this capability, so it offers low
administration overhead compared to some other replication
solutions. This configuration also has relatively low performance
impact on the primary server.

Directly moving WAL records from one database server to
another is typically described as log shipping. PostgreSQL implements file-based log
shipping by transferring WAL records one file (WAL segment) at a
time. WAL files (16 MB each) can be shipped easily and cheaply over any
distance, whether it be to an adjacent system, another system at
the same site, or another system on the far side of the globe.
The bandwidth required for this technique varies according to the
transaction rate of the primary server. Record-based log shipping
is more granular and streams WAL changes incrementally over a
network connection (see Section
25.2.5).

It should be noted that log shipping is asynchronous, i.e.,
the WAL records are shipped after transaction commit. As a
result, there is a window for data loss should the primary server
suffer a catastrophic failure; transactions not yet shipped will
be lost. The size of the data loss window in file-based log
shipping can be limited by use of the archive_timeout parameter, which can be set as low
as a few seconds. However such a low setting will substantially
increase the bandwidth required for file shipping. Streaming
replication (see Section 25.2.5)
allows a much smaller window of data loss.
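
For example, the trade-off might be expressed in the primary's
postgresql.conf like this (the 30-second value is purely
illustrative, not a recommendation):

# postgresql.conf on the primary (illustrative value)
# Force a segment switch at least every 30 seconds, bounding the
# file-based data-loss window at the cost of extra shipped files.
archive_timeout = 30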

Recovery performance is sufficiently good that the standby
will typically be only moments away from full availability once
it has been activated. As a result, this is called a warm standby
configuration, which offers high availability. Restoring a server
from an archived base backup and rollforward will take
considerably longer, so that technique only offers a solution for
disaster recovery, not high availability. A standby server can
also be used for read-only queries, in which case it is called a
Hot Standby server. See Section
25.5 for more information.

It is usually wise to create the primary and standby servers
so that they are as similar as possible, at least from the
perspective of the database server. In particular, the path
names associated with tablespaces will be passed across
unmodified, so both primary and standby servers must have the
same mount paths for tablespaces if that feature is used. Keep
in mind that if CREATE
TABLESPACE is executed on the primary, any new mount point
needed for it must be created on the primary and all standby
servers before the command is executed. Hardware need not be
exactly the same, but experience shows that maintaining two
identical systems is easier than maintaining two dissimilar
ones over the lifetime of the application and system. In any
case the hardware architecture must be the same — shipping
from, say, a 32-bit to a 64-bit system will not work.
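
For example, assuming an illustrative mount point /srv/pg_tblspc
that already exists on the primary and on every standby:

-- Run on the primary only after the directory exists on all
-- servers; the path is replayed verbatim on each standby.
CREATE TABLESPACE ts1 LOCATION '/srv/pg_tblspc';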

In general, log shipping between servers running different
major PostgreSQL release
levels is not possible. It is the policy of the PostgreSQL
Global Development Group not to make changes to disk formats
during minor release upgrades, so it is likely that running
different minor release levels on primary and standby servers
will work successfully. However, no formal support for that is
offered and you are advised to keep primary and standby servers
at the same release level as much as possible. When updating to
a new minor release, the safest policy is to update the standby
servers first — a new minor release is more likely to be able
to read WAL files from a previous minor release than vice
versa.

In standby mode, the server continuously applies WAL
received from the master server. The standby server can read
WAL from a WAL archive (see restore_command)
or directly from the master over a TCP connection (streaming
replication). The standby server will also attempt to restore
any WAL found in the standby cluster's pg_xlog directory. That typically happens after
a server restart, when the standby again replays WAL that was
streamed from the master before the restart, but you can also
manually copy files to pg_xlog at any
time to have them replayed.

At startup, the standby begins by restoring all WAL
available in the archive location, calling restore_command. Once it reaches the end of WAL
available there and restore_command
fails, it tries to restore any WAL available in the pg_xlog directory. If that fails, and streaming
replication has been configured, the standby tries to connect
to the primary server and start streaming WAL from the last
valid record found in archive or pg_xlog. If that fails, or if
streaming replication is not configured, or if the connection is
later disconnected, the standby goes back to the beginning and
tries to restore the file from the archive again. This loop of
retries from the archive,
pg_xlog, and via streaming
replication goes on until the server is stopped or failover is
triggered by a trigger file.

Standby mode is exited and the server switches to normal
operation when pg_ctl promote is run
or a trigger file is found (trigger_file). Before failover, any WAL
immediately available in the archive or in pg_xlog will be restored, but no attempt is
made to connect to the master.
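
For example, failover might be triggered in either of these ways
(the data directory path and trigger file name are illustrative):

# Promote the standby directly:
pg_ctl promote -D /var/lib/pgsql/standby_data

# Or create the file named by trigger_file in recovery.conf:
touch /var/lib/pgsql/standby_data/failover.trigger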

Set up continuous archiving on the primary to an archive
directory accessible from the standby, as described in Section 24.3. The archive
location should be accessible from the standby even when the
master is down, i.e., it should reside on the standby server
itself or another trusted server, not on the master server.

If you want to use streaming replication, set up
authentication on the primary server to allow replication
connections from the standby server(s); that is, create a role
and provide a suitable entry or entries in pg_hba.conf with the database field set to
replication. Also ensure max_wal_senders is set to a sufficiently large
value in the configuration file of the primary server.
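
A minimal sketch of that primary-side setup, assuming an
illustrative role name and setting value (the matching
pg_hba.conf entry is shown later in this section):

-- On the primary: a dedicated role for replication connections.
CREATE ROLE foo WITH REPLICATION LOGIN PASSWORD 'foopass';

and in the primary's postgresql.conf:

# Allow up to three concurrent standby connections (illustrative value).
max_wal_senders = 3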

Take a base backup as described in Section
24.3.2 to bootstrap the standby server.

To set up the standby server, restore the base backup taken
from primary server (see Section
24.3.3). Create a recovery command file recovery.conf in the standby's cluster data
directory, and turn on standby_mode.
Set restore_command to a simple
command to copy files from the WAL archive. If you plan to have
multiple standby servers for high availability purposes, set
recovery_target_timeline to latest, to make the standby server follow the
timeline change that occurs at failover to another standby.
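
Putting those settings together, a minimal recovery.conf might
look like this (the archive path is illustrative):

# recovery.conf in the standby's data directory
standby_mode = 'on'
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
recovery_target_timeline = 'latest'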

Note: Do not use pg_standby or similar tools with
the built-in standby mode described here. restore_command should return immediately if
the file does not exist; the server will retry the command
again if necessary. See Section 25.4 for using
tools like pg_standby.

If you want to use streaming replication, fill in primary_conninfo with a libpq connection string,
including the host name (or IP address) and any additional
details needed to connect to the primary server. If the primary
needs a password for authentication, the password needs to be
specified in primary_conninfo as
well.

If you're setting up the standby server for high
availability purposes, set up WAL archiving, connections and
authentication like the primary server, because the standby
server will work as a primary server after failover.

If you're using a WAL archive, its size can be minimized
using the archive_cleanup_command
parameter to remove files that are no longer required by the
standby server. The pg_archivecleanup utility is designed
specifically to be used with archive_cleanup_command in typical
single-standby configurations; see pg_archivecleanup. Note,
however, that if you're using the archive for backup purposes, you need
to retain files needed to recover from at least the latest base
backup, even if they're no longer needed by the standby.
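
For example, in the standby's recovery.conf (the archive path is
illustrative):

# Remove archived WAL files that the standby no longer needs.
archive_cleanup_command = 'pg_archivecleanup /mnt/server/archivedir %r'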

Streaming replication allows a standby server to stay more
up-to-date than is possible with file-based log shipping. The
standby connects to the primary, which streams WAL records to
the standby as they're generated, without waiting for the WAL
file to be filled.

Streaming replication is asynchronous by default (see
Section
25.2.6), in which case there is a small delay between
committing a transaction in the primary and the changes
becoming visible in the standby. This delay is however much
smaller than with file-based log shipping, typically under one
second assuming the standby is powerful enough to keep up with
the load. With streaming replication, archive_timeout is not required to reduce the
data loss window.

If you use streaming replication without file-based
continuous archiving, you have to set wal_keep_segments in the master to a value high
enough to ensure that old WAL segments are not recycled too
early, while the standby might still need them to catch up. If
the standby falls behind too much, it needs to be reinitialized
from a new base backup. If you set up a WAL archive that's
accessible from the standby, wal_keep_segments is not required as the standby
can always use the archive to catch up.
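
A sketch with an illustrative value; the right number depends on
how far behind the standby may fall:

# postgresql.conf on the master (illustrative value)
# Keep at least 32 segments (32 x 16 MB = 512 MB) of WAL in pg_xlog.
wal_keep_segments = 32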

To use streaming replication, set up a file-based
log-shipping standby server as described in Section 25.2. The step
that turns a file-based log-shipping standby into a streaming
replication standby is setting the primary_conninfo setting in the
recovery.conf file to point to the primary server. Set listen_addresses
and authentication options (see pg_hba.conf) on the primary so that the standby
server can connect to the replication
pseudo-database on the primary server (see Section
25.2.5.1).

Set the maximum number of concurrent connections from the
standby servers (see max_wal_senders
for details).

When the standby is started and primary_conninfo is set correctly, the standby
will connect to the primary after replaying all WAL files
available in the archive. If the connection is established
successfully, you will see a walreceiver process in the
standby, and a corresponding walsender process in the
primary.
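
As an illustrative check (exact ps output varies by platform and
release):

$ ps -ef | grep -E 'wal (sender|receiver)'
postgres ... postgres: wal sender process foo 192.168.1.100(52724) streaming 0/2A000060
postgres ... postgres: wal receiver process   streaming 0/2A000060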

It is very important that the access privileges for
replication be set up so that only trusted users can read the
WAL stream, because it is easy to extract privileged
information from it. Standby servers must authenticate to the
primary as an account that has the REPLICATION privilege. So a role with the
REPLICATION and LOGIN privileges needs to be created on the
primary.

Note: It is recommended that a dedicated user account be used
for replication. Although the REPLICATION privilege is granted
to superuser accounts by default, using a superuser account for
replication is not recommended. While the REPLICATION privilege
gives very high permissions, it does not allow the user to
modify any data on the primary system, which the SUPERUSER
privilege does.
Client authentication for replication is controlled by a
pg_hba.conf record specifying
replication in the database field. For example, if the
standby is running on host IP 192.168.1.100 and the account name for
replication is foo, the
administrator can add the following line to the pg_hba.conf file on the primary:
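
# Allow the user "foo" from host 192.168.1.100 to connect to the primary
# as a replication standby if the user's password is correctly supplied.
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    replication     foo             192.168.1.100/32        md5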

The host name and port number of the primary, connection
user name, and password are specified in the recovery.conf file. The password can also be
set in the ~/.pgpass file on the
standby (specify replication in the
database field). For example,
if the primary is running on host IP 192.168.1.50, port 5432, the account name for replication is
foo, and the password is foopass, the administrator can add the
following line to the recovery.conf
file on the standby:

# The standby connects to the primary that is running on host 192.168.1.50
# and port 5432 as the user "foo" whose password is "foopass".
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'

An important health indicator of streaming replication is
the amount of WAL records generated in the primary, but not
yet applied in the standby. You can calculate this lag by
comparing the current WAL write location on the primary with
the last WAL location received by the standby. They can be
retrieved using pg_current_xlog_location on the primary and
pg_last_xlog_receive_location on the standby, respectively (see Table
9-57 and Table
9-58 for details). The last WAL receive location in the
standby is also displayed in the process status of the WAL
receiver process, displayed using the ps command (see Section 27.1 for details).
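
As a sketch, the two locations might be compared like this:

-- On the primary: current WAL write location.
SELECT pg_current_xlog_location();

-- On the standby: last WAL location received and synced to disk.
SELECT pg_last_xlog_receive_location();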

You can retrieve a list of WAL sender processes via the
pg_stat_replication view. Large differences between
pg_current_xlog_location and the view's
sent_location field might indicate
that the master server is under heavy load, while differences
between sent_location and
pg_last_xlog_receive_location
on the standby might indicate network delay, or that the
standby is under heavy load.
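
For example (columns chosen for illustration; availability varies
slightly by release):

-- On the primary: one row per connected WAL sender.
SELECT application_name, state, sent_location, sync_state
  FROM pg_stat_replication;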

PostgreSQL streaming
replication is asynchronous by default. If the primary server
crashes then some transactions that were committed may not have
been replicated to the standby server, causing data loss. The
amount of data loss is proportional to the replication delay at
the time of failover.

Synchronous replication offers the ability to confirm that
all changes made by a transaction have been transferred to one
synchronous standby server. This extends the standard level of
durability offered by a transaction commit. This level of
protection is referred to as 2-safe replication in computer
science theory.

When requesting synchronous replication, each commit of a
write transaction will wait until confirmation is received that
the commit has been written to the transaction log on disk of
both the primary and standby server. The only possibility that
data can be lost is if both the primary and the standby suffer
crashes at the same time. This can provide a much higher level
of durability, though only if the sysadmin is cautious about
the placement and management of the two servers. Waiting for
confirmation increases the user's confidence that the changes
will not be lost in the event of server crashes but it also
necessarily increases the response time for the requesting
transaction. The minimum wait time is the round-trip time
between the primary and the standby.

Read only transactions and transaction rollbacks need not
wait for replies from standby servers. Subtransaction commits
do not wait for responses from standby servers, only top-level
commits. Long running actions such as data loading or index
building do not wait until the very final commit message. All
two-phase commit actions require commit waits, including both
prepare and commit.

Once streaming replication has been configured,
configuring synchronous replication requires only one
additional configuration step:
synchronous_standby_names must be set to a non-empty
value. synchronous_commit must also
be set to on, but since this is the
default value, typically no change is required. This
configuration will cause each commit to wait for confirmation
that the standby has written the commit record to durable
storage, even if that takes a very long time. synchronous_commit can be set by individual
users, so it can be configured in the configuration file, for
particular users or databases, or dynamically by
applications, in order to control the durability guarantee on
a per-transaction basis.
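
A minimal sketch, assuming an illustrative standby name that
matches the application_name in that standby's primary_conninfo:

# postgresql.conf on the primary
synchronous_standby_names = 'standby1'
synchronous_commit = on          # the default; shown for clarity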

After a commit record has been written to disk on the
primary, the WAL record is then sent to the standby. The
standby sends reply messages each time a new batch of WAL
data is written to disk, unless wal_receiver_status_interval is set to zero on
the standby. If the standby is the first matching standby, as
specified in synchronous_standby_names on the primary, the
reply messages from that standby will be used to wake users
waiting for confirmation that the commit record has been
received. These parameters allow the administrator to specify
which standby servers should be synchronous standbys. Note
that the configuration of synchronous replication is mainly
on the master.

Users will stop waiting if a fast shutdown is requested.
However, as when using asynchronous replication, the server
will not fully shut down until all outstanding WAL
records are transferred to the currently connected standby
servers.

Synchronous replication usually requires carefully planned
and placed standby servers to ensure applications perform
acceptably. Waiting doesn't utilize system resources, but
transaction locks continue to be held until the transfer is
confirmed. As a result, incautious use of synchronous
replication will reduce performance for database applications
because of increased response times and higher
contention.

PostgreSQL allows the
application developer to specify the durability level
required via replication. This can be specified for the
system overall, though it can also be specified for specific
users or connections, or even individual transactions.

For example, an application workload might consist of 10% of
changes that are important customer details and 90% of changes
that are less important data the business could more easily
survive losing, such as chat messages between users.

With synchronous replication options specified at the
application level (on the primary) we can offer synchronous
replication for the most important changes, without slowing
down the bulk of the total workload. Application level
options are an important and practical tool for allowing the
benefits of synchronous replication for high performance
applications.
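
For instance, the chat-message example above might downgrade
durability for a single transaction (a sketch):

-- Less important change: do not wait for the synchronous standby.
BEGIN;
SET LOCAL synchronous_commit TO off;
-- ... perform the less important changes here ...
COMMIT;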

You should consider that the network bandwidth must be
higher than the rate of generation of WAL data.

Commits made when synchronous_commit is set to on will wait until the sync standby responds.
The response may never occur if the last, or only, standby
should crash.

The best solution for avoiding data loss is to ensure you
don't lose your last remaining sync standby. This can be
achieved by naming multiple potential synchronous standbys
using synchronous_standby_names. The
first named standby will be used as the synchronous standby.
Standbys listed after this will take over the role of
synchronous standby if the first one should fail.

When a standby first attaches to the primary, it will not
yet be properly synchronized. This is described as catch-up
mode. Once the lag between standby and primary reaches zero for
the first time, we move to the real-time streaming state. The catch-up
duration may be long immediately after the standby has been
created. If the standby is shut down, then the catch-up
period will increase according to the length of time the
standby has been down. The standby is only able to become a
synchronous standby once it has reached streaming state.

If the primary restarts while commits are waiting for
acknowledgement, those waiting transactions will be marked
fully committed once the primary database recovers. There is
no way to be certain that all standbys have received all
outstanding WAL data at time of the crash of the primary.
Some transactions may not show as committed on the standby,
even though they show as committed on the primary. The
guarantee we offer is that the application will not receive
explicit acknowledgement of the successful commit of a
transaction until the WAL data is known to be safely received
by the standby.

If you really do lose your last standby server then you
should disable synchronous_standby_names and reload the
configuration file on the primary server.
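
A sketch of that procedure (the data directory path is
illustrative):

# In postgresql.conf on the primary, set:
#   synchronous_standby_names = ''
# then reload the configuration:
pg_ctl reload -D /var/lib/pgsql/data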

If the primary is isolated from remaining standby servers
you should fail over to the best candidate of those other
remaining standby servers.

If you need to re-create a standby server while
transactions are waiting, make sure that the commands to run
pg_start_backup() and pg_stop_backup() are run in a session
with synchronous_commit = off; otherwise those requests will wait
forever for the standby to appear.
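
A sketch of such a session (the backup label is illustrative):

-- In the session taking the new base backup:
SET synchronous_commit = off;
SELECT pg_start_backup('rebuild-standby');
-- ... copy the data directory to the new standby ...
SELECT pg_stop_backup();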