"Randy Isbell (jisbell)" <jisbell(at)cisco(dot)com> writes:
> I found that if I take an offline backup created around the same time as
> my online backup, roll forward the transaction log files included in the
> offline backup using a recovery.conf file, the duplicate records do NOT
> exist.
> Therefore it seems there is no corruption in the WAL files. The problem
> must be in the PITR processing of the online backup file.
... or there's something wrong with your backup procedure.
I hadn't looked closely at that point before, but I see you describe it
as
> 3. Issue pg_start_backup()
> 4. Save off the data cluster
> 5. Issue pg_stop_backup()
> 6. Collect the WAL files
> 7. Create a big hairy tar file with the stuff from items 4 and 6.
> 8. Take the big hairy tar file to another server running the same pg
> 8.2.3, untar and start postgres
AFAICS this procedure is *not* suggested anywhere in our documentation.
What's bothering me about it is that I don't see anything guaranteeing
that you have a full set of WAL files back to pg_start_backup(). If
checkpoints occur during step 4, as is virtually certain given you say
step 4 takes 20 minutes, then WAL files you need will get
renamed/recycled. What are you doing to "collect the WAL files"
exactly?
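For what it's worth, one way to sanity-check step 6 is sketched below. It is only an illustration, not anything shipped with Postgres: the function names are made up, and it assumes 16MB segments with 255 segments per log file (the numbering I believe applies to 8.2-era releases, where the segment ending in FF is skipped). The start and stop segment names would come from the backup history file that pg_stop_backup() writes; the list of collected names is whatever ended up in the tar file.

```python
import re

# Assumption: 255 segments per "log" file, i.e. names run from ...00 to ...FE
# before the middle eight hex digits are incremented.
SEGS_PER_LOG = 0xFF

# A WAL segment file name is 24 hex digits: timeline, log, segment.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def seg_to_int(name, segs_per_log=SEGS_PER_LOG):
    """Map a segment file name to a running segment counter."""
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return log * segs_per_log + seg


def int_to_seg(timeline, n, segs_per_log=SEGS_PER_LOG):
    """Inverse of seg_to_int for a given 8-digit timeline prefix."""
    log, seg = divmod(n, segs_per_log)
    return "%s%08X%08X" % (timeline, log, seg)


def missing_segments(start_seg, stop_seg, collected,
                     segs_per_log=SEGS_PER_LOG):
    """Return the names in [start_seg, stop_seg] absent from collected."""
    timeline = start_seg[:8]
    # Ignore anything that is not a plain segment file (e.g. *.backup).
    have = set(n for n in collected if WAL_NAME.match(n))
    lo = seg_to_int(start_seg, segs_per_log)
    hi = seg_to_int(stop_seg, segs_per_log)
    wanted = (int_to_seg(timeline, i, segs_per_log)
              for i in range(lo, hi + 1))
    return [name for name in wanted if name not in have]
```

If this returns a non-empty list for the segments between the start and stop of the backup, then the tar file does not contain everything recovery needs, which is the scenario I'm worried about.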
Also, what do you consider to be an "offline backup", and what's
different in your process for creating that?
regards, tom lane