Top Backup Tools for PostgreSQL

PostgreSQL has the reputation of being rock solid from its beginnings, and over the years has accumulated a set of impressive features. However the peace of mind that your on-disk data is ACID compliant — if not complemented by an equivalent well thought backup strategy — can be easily shattered.

Backup Types

Before diving into the available tools, let’s look at the available PostgreSQL backup types and what their characteristics are:

SQL dumps (or logical)

Does not block readers or writers.

Geared towards small sets of data because of the negative impact on system load and the long time required for both backup and restore operations. The performance may be increased with the –no-sync flag, but refer to the man page for the risks associated with disabling the wait for writes.

A post-restore ANALYZE is required in order to optimize the statistics.

Global objects such as roles and tablespaces can only be backed up using pg_dumpall utility. Note that tablespace directories must be manually created prior to starting the restore.

Supports parallelism at the expense of increased system load. Read man pg_dump for its caveats and special requirements e.g. synchronized snapshots.

Dumps can be loaded in newer versions of PostgreSQL, or even another machine architecture, however they are not guaranteed to be backwards compatible between major versions so some manual editing of the dump file may be required.

Filesystem (or physical)

Requires the database to be shut down.

Faster than logical backups.

Includes cluster data.

Can only be restored on the same major version of PostgreSQL.

Continuous archiving (or Point In Time Recovery or PITR)

Suitable for very large databases where logical or physical backups would take too long.

Some directories inside the data directory can be excluded to speed up the process.

Snapshots

Suitable for applications where both data directory and the database must be in sync e.g. LAMP applications, provided that the two snapshots are synchronized.

Not recommended when the database files are stored across multiple filesystems (must snapshot all filesystems simultaneously).

Cloud

All cloud providers implement backups in their PostgreSQL offering. Logical backups can be performed as usual, while physical backups and PITR are available through the cloud service offerings since access to the data store is not available (see for example Amazon Aurora for PostgreSQL). Therefore, backing up PostgreSQL in the cloud will need to be a topic for another blog.

Agent base

Features

While PostgreSQL provides out of the box the tools required to perform logical, physical, and PITR backups, specialized backup applications rely on the native PostgreSQL and operating system tools to fill the need of implementing a backup strategy that addresses the following points:

automation

frequency

retention period

integrity

ease of use

Additionally, PostgreSQL backup tools attempt to provide features common to generic backup tools such as:

incremental backups for saving storage space

backup catalogs

ability to store backups on premise or in the cloud

alerting and notification

comprehensive reporting

access control

encryption

graphical interface and dashboards

backups of remote hosts

adaptive throughput in order to minimize load on the targets

handling multiple hosts in parallel

backup orchestration e.g. jobs chaining

REST APIs

Lab Setup

For this exercise I’ve setup a command-and-control host that where I’ll be installing the backup tools, that also runs two PostgreSQL instances — 9.6 and 10 — installed from PGDG repositories:

--- a/var/lib/pgsql/9.6/data/pg_hba.conf
+++ b/var/lib/pgsql/9.6/data/pg_hba.conf
@@ -79,7 +79,8 @@
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
-host all all 127.0.0.1/32 ident
+host all all 127.0.0.1/32 trust
+host all amandabackup 10.1.9.243/32 trust
# IPv6 local connections:
host all all ::1/128 ident
# Allow replication connections from localhost, by a user with the

Barman

Barman is a disaster recovery solution for PostgreSQL maintained by 2ndQuadrant. It is designed to manage backups for multiple databases and has the ability to restore to a previous point in time using the PITR feature of PostgreSQL.

Barman’s features at a glance:

handles multiple targets

support for different PostgreSQL versions

zero data loss

streaming and/or standard archiving of WALs

local or remote recovery

simplified point in time recovery

As noted in the Barman Manual, support for incremental backups, parallel jobs, data deduplication, and network compression is available only when using the rsync option. Also, streaming WALs from a standby using the archive_command isn’t currently supported.

After following the instructions in the manual for setting up the environment we can verify:

pgBackRest

pgBackRest implements a full system backup that doesn’t rely on the common tools tar and rsync. It is currently hosted and made available by CrunchyData under an MIT license. See Recognition for details on its origins.

It offers all the features one would expect from a PostgreSQL centric tool:

The installation requires a 64-bit Linux/Unix system and it is outlined in the user guide. The guide also introduces the reader to the main concepts, very useful to those new to PostgreSQL or storage technology.

Although the guide uses command examples for Debian/Ubuntu the pgBackRest is available in the PGDG yum repository, and the installer will pull in all the dependencies:

With 2 CPUs the backup ran almost 20% faster which can make a big difference when running against a large data set.

Conclusion

PostgreSQL centric backup tools offer, as expected, more options than general purpose tools. Most PostgreSQL backup tools offer the same core functionality, but their implementation introduces limitations that can only be discovered by carefully following the documentation to test drive the product.

Viorel Tabara is a Guest Writer for Severalnines. He is a system administrator with many years of experience in a variety of environments and technologies. His passion for PostgreSQL started when `postmaster` was at version 7.4. While having to wear many hats at his day job, Viorel takes the opportunity of being a guest blogger at Severalnines to give back to the open source community that shaped his 20+ years career. When not in front of his computer Viorel enjoys the great Canadian outdoors with his wife and two kids.

More from 'Become a PostgreSQL DBA' series

Benchmarking is one of the ways to get insight into your database server capacity. There are different ways to perform a capacity test before you load your database with production data. One of the popular tools out there for RDBMS benchmarks is sysbench. This blog post shows how to generate a test load for your PostgreSQL servers using sysbench.

The PostgreSQL System Catalog is a schema with tables and views that contain metadata about all the other objects inside the database and more. With it, we can discover when various operations happen, how tables or indexes are accessed, and even whether or not the database system is reading information from memory or needing to fetch data from disk.