Managing VACUUM on Heroku Postgres

Table of Contents

Postgres uses a mechanism called
MVCC
to track changes in your database. As a side effect, some rows become
“dead” and are no longer visible to any running transaction. Dead rows are generated not just by DELETE operations, but also by UPDATEs, as well as transactions that have to be rolled back.

Your database needs periodic maintenance to clean out these dead rows. This is essentially a form of garbage collection. Typically, this happens automatically, but it can be useful to understand the details and tune the maintenance settings as needed.

Vacuuming a database

The built-in mechanism for managing this cleanup is called
VACUUM. This can be run as a regular command, but Postgres also
includes facilities for running the VACUUM process automatically in
the background as a maintenance task, periodically trying to clean out
old data where necessary. This process will perform its
maintenance based on a set of configuration parameters.

The default Heroku configuration is enough for many applications, but in some
situations, you may need to make some changes or occasionally take
manual action.

Determining bloat

To check whether you need to vacuum, you can run a query to give you
information on table and index “bloat”. Bloat is extra space taken by these
database objects on disk due to dead rows. The simplest way to do
this is to install the
pg-extras plugin for the
Heroku CLI.

The “bloat” column shows the bloat factor, which is the fraction of the original table that exists as bloat. Because it is a ratio, there are no units. The “waste” column shows the total bloat (in bytes) in each table and index in the system.

A very large bloat factor on a table or index can lead to poor
performance for some queries, as Postgres will plan them without
considering the bloat.

The threshold for excessive bloat varies according to your query
patterns and the size of the table, but generally anything with a
bloat factor over 10 is worth looking into, especially on tables over
100 MB.

To check on vacuuming in your database, you can use another pg-extras
command:

This will tell you when each table was last vacuumed, and whether that
was through a manual action or the autovacuum background worker. It
also shows the threshold number of dead rows that will trigger an
autovacuum for that particular table, and whether you should expect an
autovacuum to occur.

VACUUM variants

Bloat can be contained by ensuring that VACUUM runs regularly, and
reduced by running VACUUM FULL if it’s getting out of hand. Note
that the autovacuum process will only ever run a regular, non-full
VACUUM command.

Note that while VACUUM FULL offers a more exhaustive cleanup,
actually reducing bloat (rather than just flagging that space as
available, as with regular VACUUM), it’s also a much more
heavyweight operation: VACUUM FULL actually rewrites the entire
table, and thus prevents any other statements from running
concurrently (even simple SELECT queries). Generally, it’s a good
idea to keep autovacuum in an aggressive-enough configuration so that
VACUUM FULL is never needed.

In some cases, where a table is only used to track transient data
(such as a work queue), it may be useful to run the TRUNCATE command
instead. This will delete all the data in the table in a batch
operation. For very bloated tables, this can be much faster than a
DELETE and VACUUM FULL.

Automatic vacuuming with autovacuum

The most effective way to manage bloat is by tweaking autovacuum
settings as necessary.

You can change when a table is eligible for VACUUM. This is controlled by two settings (on Heroku, the changes can only be made on a per-table basis):

On large tables, you may want to decrease the scale factor to allow
vacuum to start making progress earlier. For very small tables, you
may decrease the threshold, though this is typically not necessary.

Furthermore, autovacuum has a built in cost-based rate-limiting
mechanism, to avoid having it overwhelm the system with VACUUM
activity. In busy databases, however, this can mean that autovacuum
does not make progress quickly enough, leading to excessive bloat.

To avoid that, you can change the back-off settings to be less
deferential. These changes can be made at the database level.

The cost limit determines how much “cost” (in terms of I/O operations)
autovacuum can accrue before being forced to take a break; the cost
delay determines how long that break is (in milliseconds). Note that
these settings affect both autovacuum and manual vacuum
(autovacuum-only variants do exist, but they can only be set on a per-table basis
on Heroku Postgres at this time). The cost limit is set to 200 by
default. Increasing the cost limit (up to 1000 or so) or adjusting the
vacuum_cost_page_* parameters can help autovacuum progress more
efficiently.

Manual vacuuming

If your database happens to have a very periodic workload, it may be
more efficient to use a simple worker process to “manually” run a
VACUUM (or even VACUUM FULL, if the locking is not an issue) and
trigger it with a tool like Heroku
Scheduler during off-peak hours.

A manual VACUUM does not have a threshold for when it “kicks in”: it
is always triggered by running the VACUUM command. The cost-based
back-off (as with autovacuum) still applies, but it is turned off by
default (the vacuum_cost_delay is set to 0). You can increase this
on a per-table basis if you find that a manual VACUUM has too much impact on
your regular workload.

To run VACUUM, open a psql shell to the desired database and type the command:

$ heroku pg:psql
=> VACUUM;
WARNING: skipping "pg_authid" --- only superuser can vacuum it
WARNING: skipping "pg_database" --- only superuser can vacuum it
WARNING: skipping "pg_tablespace" --- only superuser can vacuum it
WARNING: skipping "pg_pltemplate" --- only superuser can vacuum it
WARNING: skipping "pg_auth_members" --- only superuser can vacuum it
WARNING: skipping "pg_shdepend" --- only superuser can vacuum it
WARNING: skipping "pg_shdescription" --- only superuser can vacuum it
WARNING: skipping "pg_db_role_setting" --- only superuser can vacuum it
VACUUM

The warnings you’ll see are expected and can be ignored. You can also
restrict VACUUM to a particular table, if only one or two need
manual vacuuming:

$ heroku pg:psql
=> VACUUM users;
VACUUM

When running VACUUM, you can add the VERBOSE keyword to get more
details about its progress.