Update: this article was mentioned on ycombinator. There's a bunch of extra
info and more tips there. I want to reiterate that I couldn't simply copy over
his slides; there are a couple of details ("more than 32GB memory in your
server") that I left out, as I've only got 10 fingers :-). Anyway, on to the
actual summary! End of update.

Just open a postgresql configuration file. Your first impression will be "this
looks like the dashboard of an ancient Boeing 747" and "we're all going to
die": some 500 options and knobs you can twist. But... it can also be very
easy if you know what to do.

It actually is pretty hard to seriously misconfigure postgres. Almost all
performance problems are application problems. Here are the main things you
need to look at:

Logging. Be generous with logging. You mostly won’t notice the
performance impact. And it helps a lot to find performance problems. Look at
his slides for the piece of config you need to just
copy/paste.
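His slides have the exact copy/paste block; as a rough sketch, a generous-logging fragment in postgresql.conf tends to look something like this (the specific values here are my assumptions, not his):

```
logging_collector = on
log_min_duration_statement = 250ms  # log every statement slower than this
log_temp_files = 0                  # log all temp files; handy for work_mem tuning
log_checkpoints = on
log_lock_waits = on
log_connections = on
log_disconnections = on
```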

Memory. Only four really matter.

shared_buffers. Below 2GB of system memory: set it to 20% of full memory;
below 32GB: 25% of your full memory.

work_mem. Start low at 32/64MB. Look for temporary file lines in
logs. Then set it to 2-3x the largest temp file that you see. This setting
can give a huge speed boost (if set properly).

maintenance_work_mem. Set it to 10% of system memory.

effective_cache_size is only a hint to postgres. Just set it to the amount
of filesystem cache available.
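As a worked example of those four rules, a postgresql.conf fragment for a hypothetical dedicated 16GB server (the numbers are mine, just applying the percentages above):

```
shared_buffers = 4GB           # 25% of 16GB (the "below 32GB" rule)
work_mem = 64MB                # start low, raise to 2-3x the largest temp file
maintenance_work_mem = 1600MB  # ~10% of system memory
effective_cache_size = 8GB     # roughly the OS filesystem cache on a dedicated box
```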

Checkpoints. Postgres doesn’t write everything directly to disk as that
kills performance. There are some checkpoint settings you can set to tune
when postgres writes.

wal_buffers = 16MB

checkpoint_completion_target = 0.9

checkpoint_timeout = 10m (up to 30m, depending on acceptable startup time)

Planner settings.

(Just look at his slides. I cannot type that fast).

After tweaking those settings, you never have to touch your configuration
again. There are other tips for performance:

Disable the linux OOM killer.

Do not run anything besides postgresql on the host.
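The OOM-killer advice usually comes down to disabling memory overcommit at the OS level; a common /etc/sysctl.conf fragment for a dedicated postgres box (an assumption about your setup):

```
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
```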

Stupid DB tricks you should not do:

Sessions in the DB. Use redis or so for it.

Constantly-updated counters.

Celery task queues in the database.

Big files in the database. Put them on the filesystem where they belong.

Very long-running transactions.

Using .save() all the time for creating many objects. Just use
postgres’s COPY.
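That last COPY point can be sketched like this; the `people` table and the psycopg2 `conn` are assumptions, the buffer building is plain stdlib:

```python
import io

def rows_to_copy_buffer(rows):
    """Build a tab-separated in-memory buffer suitable for COPY ... FROM STDIN."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(str(col) for col in row) + "\n")
    buf.seek(0)
    return buf

# Instead of one .save() (one INSERT) per object, stream everything in one COPY:
#
#   buf = rows_to_copy_buffer([(1, "alice"), (2, "bob")])
#   with conn.cursor() as cur:  # psycopg2 connection: an assumption
#       cur.copy_from(buf, "people", columns=("id", "name"))
#   conn.commit()
```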

Some quick tricks you could look at on your application’s side:

Don’t use gigantic IN clauses. Django generates them a lot. JOINs are fine,
but IN isn’t well-supported in postgres.

Use postgres’s full text search instead of like%this%.

Don’t randomly create indexes. Maintaining an index takes time, too. Look at
pg_stat_user_indexes and pg_stat_user_tables to see whether indexes
are needed. Get a database expert to tweak your indexes instead of slapping
on index=True in your Django model fields.
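Those two statistics views can be queried directly; for example:

```sql
-- Indexes that are hardly ever scanned are candidates for dropping:
SELECT relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 ORDER BY idx_scan;

-- Tables with lots of sequential scans might be missing an index:
SELECT relname, seq_scan, idx_scan
  FROM pg_stat_user_tables
 ORDER BY seq_scan DESC;
```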

On the database server, you can switch off the auto-vacuum process and
schedule VACUUM ANALYZE for specific low-load times. You ought to run an
ANALYZE also after loading lots of data: it refreshes the statistics that help
postgres optimize your queries.
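Sketched as SQL (the table name is hypothetical):

```sql
-- In a nightly low-load maintenance window:
VACUUM ANALYZE;

-- And right after a bulk load, so the planner has fresh statistics:
ANALYZE measurements;
```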

Note that once in a very long while (once a year or so), postgres does a
VACUUM FREEZE. Real housekeeping. To prevent it from bringing your app to its
knees at a random moment, do the VACUUM FREEZE every few months yourself.

Always monitor postgres, at least disk space and system load. Memory and IO
stats are very handy, too. Use 1-minute intervals.

Backups? pg_dump works fine. But... there's also streaming replication. Easy
to set up. And... it maintains an exact copy of the database on a different
host. (But really, really make sure it is on a different machine!) Streaming
replication doesn't safeguard at all against accidental errors, though, as
it'll also replicate your accidental 'drop table' just fine.

And... replicas can be used for read-only copies! And you could do a pg_dump
on such a read-only copy.
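On recent postgres versions, the replica side of streaming replication boils down to a base backup plus a couple of settings; roughly (hostname and user are assumptions, and details vary per version):

```
# On the replica, in postgresql.conf (plus an empty standby.signal file):
primary_conninfo = 'host=db-primary port=5432 user=replicator'
hot_standby = on    # allow read-only queries on the replica
```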

WAL archiving is great for safety. It does take some more time to set up, but
it helps you sleep at night.

Watch out with encoding: the encoding is fixed at database creation time. So
just create it with UTF-8.

Schema migration can take a lot of time, especially if a column with a default
value is added: every row needs updating. Try to allow NULL in your columns if
at all possible, so don't set NOT NULL unless absolutely needed. (Update: I
wrote it down backwards; I originally said you should stick to NOT NULL. See
one of the comments below.) Enforce mandatory fields in your Django code instead.
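The default-value problem looks like this (hypothetical table; recent postgres versions avoid the rewrite for constant defaults, but the general caution stands):

```sql
-- Rewrites every row of a big table while holding a heavy lock:
ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';

-- Cheap: add the column as nullable and enforce "mandatory" in Django:
ALTER TABLE orders ADD COLUMN status text;
```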

For hardware, use your own physical hardware, especially with good IO. RAID 10
for the main DB, a separate RAID for the logs. Don't use a SAN unless it is
really good.

And... watch out with Django's transactions. You might want to manage your own
transactions instead of using Django's automatic ones. See his weblog at
http://thebuild.com; he writes a lot about Django and transactions there.