Waiting for 9.5 – Add pg_rewind, for re-synchronizing a master server after failback.

Add pg_rewind, for re-synchronizing a master server after failback.
Earlier versions of this tool were available (and still are) on github.
Thanks to Michael Paquier, Alvaro Herrera, Peter Eisentraut, Amit Kapila,
and Satoshi Nagayasu for review.

So, we have a situation where we have a master and a slave, with some kind of replication, and we have to fail over. This is trivial: just promote the slave to standalone, and we're good. But then we no longer have a slave. The normal procedure is to set up a new slave on the old master from scratch, using a full data directory sync. But with non-trivial data sizes, this can take a long time.

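pg_rewind should solve this problem by using the xlogs (WAL) to find what changed since the servers diverged, and copying only that, instead of the whole data directory.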

Let's see if it really works that way. For added difficulty, I will make the old master do *some* work after the failover; this work should be “rolled back” once the old master is set up as a slave of the newly elected master.

To test it we will need two PostgreSQL instances, but first I'll start with just the master:

The where clause is complicated because I wanted to update a semi-random row, while making sure that the same script, when run on the slave, updates different rows. I will put the slave on port 5436, so the master will always update even rows (with i divisible by 2), and the slave will update odd rows.
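The parity trick can be sketched in shell as follows. This is only an illustration: both helper names are hypothetical, and the master's port (5435 here) is an assumption. All that matters is that it is odd, while the slave's port, 5436, is even.

```shell
#!/bin/sh
# Hypothetical helper: which remainder of i / 2 a server on the given port updates.
# An even port (the slave, 5436) gets odd rows (1); an odd port gets even rows (0).
row_parity_for_port() {
    echo $(( ($1 + 1) % 2 ))
}

# Hypothetical helper: the parity part of the where clause for that port.
parity_condition() {
    echo "i % 2 = $(row_parity_for_port "$1")"
}

parity_condition 5435   # assumed (odd) master port
parity_condition 5436   # slave port
```

Because the condition depends only on the port, the very same script can be pointed at either server without the two ever touching the same rows.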

I repeat this update 10 times in a transaction, and as soon as one transaction is done, I start another.
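A rough sketch of such a load generator, assuming the single UPDATE statement is passed in as text. build_transaction is a hypothetical helper, and the psql loop in the trailing comment (including its port and database name) is just one possible way to run transactions back to back.

```shell
#!/bin/sh
# Hypothetical helper: wrap the given statement, repeated 10 times,
# in a single transaction and print the result.
build_transaction() {
    echo "BEGIN;"
    n=0
    while [ "$n" -lt 10 ]; do
        echo "$1"
        n=$((n + 1))
    done
    echo "COMMIT;"
}

# Placeholder statement; the real UPDATE would go here.
build_transaction "SELECT 1;"

# To keep a server busy, feed a new transaction to psql as soon as the
# previous one finishes (port and database name are assumptions):
#   while true; do build_transaction "$update_sql" | psql -q -p 5436 -d postgres; done
```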

=$ pg_rewind -D/var/tmp/master --source-server="port=5436 user=pgdba dbname=postgres"
The servers diverged at WAL position 0/9BC9E268 on timeline 1.
Rewinding from last common checkpoint at 0/9B8B20D0 on timeline 1
Done!

This took ~ 1 second. Now, after running pg_rewind, the state of /var/tmp/master is as if I had run pg_basebackup from the “slave” (the current master), including the wrong port (5436) and no sensible recovery.conf. Let's fix that:
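A minimal sketch of what such a fix could look like on 9.5, not necessarily the exact commands used here. The helper name is hypothetical, the old master's port is passed in as an argument because its value is an assumption, and the connection settings simply reuse the ones given to pg_rewind above.

```shell
#!/bin/sh
# Hypothetical helper: make a rewound data directory usable as a slave.
#   $1 = data directory
#   $2 = port this server should listen on
#   $3 = conninfo of the new master
make_slave_config() {
    datadir=$1; port=$2; conninfo=$3

    # Override the port copied over from the new master; when a setting
    # appears more than once in postgresql.conf, the last one wins.
    echo "port = $port" >> "$datadir/postgresql.conf"

    # In 9.5 a standby is configured through recovery.conf in the data directory.
    cat > "$datadir/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = '$conninfo'
recovery_target_timeline = 'latest'
EOF
}

# Example call (5435 is an assumed port for the old master):
#   make_slave_config /var/tmp/master 5435 "port=5436 user=pgdba"
```

Setting recovery_target_timeline to 'latest' lets the rewound server follow the new master onto the timeline created by the promotion.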