Your database may be silently headed for trouble. PostgreSQL, like all modern MVCC-based relational databases, is subject to what’s called “bloat.” The consequences range from slowdowns and wasted space to transaction id wraparound – aka the “night of the living dead” when deleted rows come back to life.

Bloat is the disk space claimed by unneeded data rows and indices. Quinn Weaver
from Postgres Experts
explains the causes and consequences of bloat as well as its prevention and steps for emergency intervention.

Summary

Why do Postgres databases get bloat?

Couldn’t you just not do that vacuum thing? It causes so many problems!

Vacuuming is necessary for MVCC systems like Postgres

Multi-version concurrency control

When you delete a row it doesn’t really go away, instead it becomes invisible to future transactions

Transactions started before the deletion continue to see the row

Likewise an update is just a delete followed by insert

The transaction visibility of every row is stored in the hidden (but accessible) xmin and
xmax` columns on every table

Example of MVCC

What’s good about MVCC?

It’s efficient

In particular it avoids the locking found in earlier relational databases

Works with a large number of concurrent transactions

It’s now in use by all relational databases

What’s bad about MVCC?

Rows persist even when they are no longer visible to any current or future transaction

That is what we call bloat

Indices also get bloat

Leaf nodes point at dead rows, and inner nodes point at those superfluous leaf nodes, etc etc

You can’t actually remove an index pointer until its entire page stops pointing at rows

Hence indices tend to stay bloated more than tables do

What are the consequences of bloat?

It fills up pages (you have to read more pages into memory from disk)

Slows queries down

Have to keep more things in memory, some of which is junk

Bloat can in fact push the working set to be greater than ram, and you spill to disk and life sucks