As with everything that contains valuable data, Postgres databases should be backed up
regularly. While the procedure is essentially simple, it is
important to have a basic understanding of the underlying
techniques and assumptions.

There are two fundamentally different approaches to backing up
Postgres data: the SQL dump, discussed here, and the
file system level backup.

The idea behind this method is to generate a text file with
SQL commands that, when fed back to the server, will recreate
the database in the same state as it was at the time of the
dump. Postgres provides the
utility program pg_dump for
this purpose. The basic usage of this command is:

pg_dump dbname > outfile

As you see, pg_dump writes
its results to the standard output. We will see below how this can
be useful.
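For example, assuming a database named mydb (a placeholder for your
own database name), a dump could be written to a file like this:

pg_dump mydb > mydb.sql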

pg_dump is a regular
Postgres client application
(albeit a particularly clever one). This means that you can do
this backup procedure from any remote host that has access to
the database. But remember that pg_dump does not operate with special
permissions. In particular, you must have read access to all
tables that you want to back up, so in practice you almost
always have to be a database superuser.

To specify which database server pg_dump should contact, use the command
line options -h host and -p
port. The default host is
the local host or whatever your PGHOST
environment variable specifies. Similarly, the default port is
indicated by the PGPORT environment
variable or, failing that, by the compiled-in default.
(Conveniently, the server will normally have the same
compiled-in default.)
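For example, assuming a server running on the hypothetical host
db.example.com and listening on port 5432, a remote dump could be
taken with:

pg_dump -h db.example.com -p 5432 mydb > mydb.sql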

Like any other Postgres
client application, pg_dump
will by default connect with the database user name that is
equal to the current Unix user name. To override this, either
specify the -u option to force a prompt
for the user name, or set the environment variable PGUSER. Remember that pg_dump connections are subject to the
normal client authentication mechanisms (which are described in
Chapter 4).
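For example, using Bourne shell syntax and a hypothetical user name,
either of the following connects as a user other than the current
Unix user:

pg_dump -u mydb > mydb.sql
PGUSER=postgres pg_dump mydb > mydb.sql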

Dumps created by pg_dump
are internally consistent, that is, updates to the database
while pg_dump is running will
not be in the dump. pg_dump
does not block other operations on the database while it is
working. (Exceptions are those operations that need to operate
with an exclusive lock, such as VACUUM.)

Important: When your database schema relies on
OIDs (for instance as foreign keys) you must instruct
pg_dump to dump the OIDs
as well. To do this, use the -o
command line option.
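For instance, to include OIDs in a dump of the placeholder database
dbname:

pg_dump -o dbname > outfile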

The text files created by pg_dump are intended to be read in by
the psql program. The
general command form to restore a dump is

psql dbname < infile

where infile is what you used
as outfile for the pg_dump command.
The database dbname will not be
created by this command; you must create it yourself from template0
before executing psql (e.g., with
createdb -T template0 dbname).
psql supports similar options to pg_dump for controlling the database
server location and the user names. See its reference page
for more information.
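Putting these steps together, a complete restore of a dump file
mydb.sql into a freshly created database mydb (both names being
placeholders) could look like this:

createdb -T template0 mydb
psql mydb < mydb.sql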

If the objects in the original database were owned by
different users, then the dump will instruct psql to connect as each affected user in
turn and then create the relevant objects. This way the
original ownership is preserved. This also means, however,
that all these users must already exist, and furthermore that
you must be allowed to connect as each of them. It might
therefore be necessary to temporarily relax the client
authentication settings.

The ability of pg_dump
and psql to write to or read
from pipes also makes it possible to dump a database directly
from one server to another, for example

pg_dump -h host1 dbname | psql -h host2 dbname

Important: The dumps produced by pg_dump are
relative to template0. This means that any languages,
procedures, etc. added to template1 will also be dumped
by pg_dump. As a result,
when restoring, if you are using a customized template1,
you must create the empty database from template0, as in
the example above.

The above mechanism is cumbersome and inappropriate when
backing up an entire database cluster. For this reason the
pg_dumpall program is
provided. pg_dumpall backs
up each database in a given cluster and also makes sure that
the state of global data such as users and groups is
preserved. The call sequence for pg_dumpall is simply

pg_dumpall > outfile

The resulting dumps can be restored with psql as described above. But in this case it
is definitely necessary that you have database superuser access, as
that is required to restore the user and group information.
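For example, assuming the dump was written to outfile, the restore
could be run as the database superuser against an existing database
such as template1:

psql template1 < outfile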

pg_dumpall has one little
flaw: it is not prepared to authenticate interactively to
each database it dumps. If you are using password
authentication then you need to set the environment
variable PGPASSWORD to communicate the
password to the underlying calls to pg_dump. More severely, if you have
different passwords set up for each database, then
pg_dumpall will fail. You
can either choose a different authentication mechanism for
the purposes of backup or adjust the pg_dumpall shell script to your needs.
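For example, using Bourne shell syntax and a placeholder password:

PGPASSWORD=secret pg_dumpall > outfile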

Acknowledgement: Originally written by Hannu
Krosing (<hannu@trust.ee>) on
1999-06-19

Since Postgres allows
tables larger than the maximum file size on your system, it
can be problematic to dump such a table to a file, because the
resulting file will likely exceed the maximum size allowed by
your system. As pg_dump writes to the standard output, you
can use standard Unix tools to work around this possible
problem.

Use split. This
allows you to split the output into pieces that are
acceptable in size to the underlying file system. For
example, to make chunks of 1 megabyte:

pg_dump dbname | split -b 1m - filename

Reload with

createdb dbname
cat filename.* | psql dbname

Use the custom dump format (V7.1). If PostgreSQL
was built on a system with the zlib compression library
installed, the custom dump format will compress data as it
writes it to the output file. For large databases, this
will produce similar dump sizes to using gzip, but has the
added advantage that the tables can be restored
selectively. The following command dumps a database using
the custom dump format:
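pg_dump -Fc dbname > outfile

A dump in this format is not a script for psql; it must be restored
with pg_restore instead. See the pg_dump and pg_restore reference
pages for details.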

pg_dump (and by
implication pg_dumpall) has
a few limitations which stem from the difficulty of
reconstructing certain information from the system catalogs.

Specifically, the order in which pg_dump writes the objects is not very
sophisticated. This can lead to problems for example when
functions are used as column default values. The only answer
is to manually reorder the dump. If you created circular
dependencies in your schema then you will have more work to
do.

For reasons of backward compatibility, pg_dump does not dump large objects by
default. To dump large objects you must use either the custom
or the TAR output format, and use the -b option in
pg_dump. See the reference
pages for details. The directory contrib/pg_dumplo of the Postgres source tree also contains a
program that can dump large objects.
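For example, a dump including large objects in the custom format
could be created like this (dbname and outfile again being
placeholders):

pg_dump -Fc -b dbname > outfile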