On Fri, 2006-10-27 at 22:19 +0100, Simon Riggs wrote:
> So we definitely have a nasty problem here.
>
> VACUUM FREEZE is just a loaded gun right now.
>
> > Maybe it's OK to say that during WAL replay we keep it
> > all the way back to the freeze horizon, but I'm not sure how we keep the
> > system from wiping clog it still needs right after switching to normal
> > operation. Maybe we should somehow not xlog updates of datvacuumxid?
>
> Thinking...
Suggestions:
1. Create a new Utility rmgr that can issue XLOG_UTIL_FREEZE messages
for each block that has had any tuples frozen on it during normal
VACUUMs. We need log only the relid, blockid and vacuum's xid to redo
the freeze operation.
2. VACUUM FREEZE need not generate any additional WAL records, but will
do an immediate sync following execution and before clog truncation.
That way the large number of changed blocks will all reach disk before
we do the updates to the catalog.
3. We don't truncate the clog during WAL replay, so the clog will grow
during recovery. Nothing to do there to make things safe.
4. When InArchiveRecovery we should set all of the datminxid and
datvacuumxid fields to be the Xid from where recovery started, so that
clog is not truncated soon after recovery. Performing a VACUUM FREEZE
after a recovery would be mentioned as an optional task at the end of a
PITR recovery on a failover/second server.
5. At 3.5 billion records during recovery we should halt the replay, do
a full database scan to set hint bits, truncate clog, then restart
replay. (Automatically within the recovery process).
6. During WAL replay, put out a warning message every 1 billion rows
saying that a hint bit scan will eventually be required if recovery
continues.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com