"Mark Reid" <mail(at)markreid(dot)org> writes:
> This morning I noticed that a normally small table (18 wide rows) was
> bloated to 6GB in size. This has happened before using older postgres
> versions in the past, where the main table got vacuumed, but the pg_toastXX
> table did not. This is the first time I've seen a problem for this version
> of postgres (8.3.5). So I decided to manually vacuum the table to find out
> what was going on. I saw a lot of "dead row versions cannot be removed
> yet." (on the order of 60k in the main table, and 3.8M in the toast table).
> There was a day-old "idle in transaction" query for that same user, so I
> killed that and tried again. Same deal. So I tried a "CLUSTER
> properties_pkey ON properties;" which gave the following error message:
> ERROR: missing chunk number 0 for toast value 396950697 in
> pg_toast_373928870
> [ and closing an old open transaction made the error go away ]
> Andres concluded "The bug is that CLUSTER seems to use the wrong xid horizon
> when determining visibility, Or autovacuum. Not sure"
No, I don't think so. I've been able to replicate the failure with this
sequence, in the presence of an old open transaction:
regression=# create table foo (f1 serial primary key, f2 text);
NOTICE: CREATE TABLE will create implicit sequence "foo_f1_seq" for serial column "foo.f1"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
CREATE TABLE
regression=# insert into foo(f2) select repeat('xyzzyklflencxlkslnvlzsm;pemc', 10000);
INSERT 0 1
regression=# begin;
BEGIN
regression=# update foo set f2 = f2 || 'z';
UPDATE 1
regression=# update foo set f2 = f2 || 'z';
UPDATE 1
regression=# update foo set f2 = f2 || 'z';
UPDATE 1
regression=# commit;
COMMIT
regression=# vacuum foo;
VACUUM
regression=# cluster foo using foo_pkey;
ERROR: missing chunk number 0 for toast value 41347 in pg_toast_41336
The problem is that CLUSTER is trying to copy a tuple that it believes
to be RECENTLY_DEAD, but that is in fact DEAD (ie, inaccessible to every
open transaction). The preceding VACUUM did not remove the tuple, but
it did remove the underlying TOAST data, which means that CLUSTER fails
when trying to copy the tuple's toast data to the new table. The error
goes away once OldestXmin advances enough to let the problem tuple be
seen as DEAD.
So, how is it that the vacuum removed the underlying toast data but not
the putatively-recently-dead tuple? The problem tuple is in the middle
of the update chain that I created with those updates, and the
difficulty arises from the following bit in HeapTupleSatisfiesVacuum:
/*
* Deleter committed, but check special cases.
*/
if (TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
HeapTupleHeaderGetXmax(tuple)))
{
/*
* Inserter also deleted it, so it was never visible to anyone else.
* However, we can only remove it early if it's not an updated tuple;
* else its parent tuple is linking to it via t_ctid, and this tuple
* mustn't go away before the parent does.
*/
if (!(tuple->t_infomask & HEAP_UPDATED))
return HEAPTUPLE_DEAD;
}
if (!TransactionIdPrecedes(HeapTupleHeaderGetXmax(tuple), OldestXmin))
{
/* deleting xact is too recent, tuple could still be visible */
return HEAPTUPLE_RECENTLY_DEAD;
}
/* Otherwise, it's dead and removable */
return HEAPTUPLE_DEAD;
Those middle-of-the-chain tuples have xmin=xmax, and so do their
supporting toast tuples. Problem is the toast tuples aren't "updated",
they are independent tuples so far as the toast table is concerned.
Therefore, the toast tuples are reported as DEAD regardless of
OldestXmin, but their parent tuples escape the first test and can be
reported RECENTLY_DEAD. This allows the toast tuples to be removed
ahead of their parent tuple, and then you've got trouble waiting to
happen.
This issue clearly is problematic for CLUSTER (as well as VACUUM FULL
in 9.0+). I wonder whether it might explain some of the heretofore
unexplainable reports of "missing chunk number 0" errors in other
operations. You'd have to suppose that those other commands were
being run with stale snapshots, so it's not super credible, but ...
I'm inclined to think that the xmin=xmax test is just too cute and
should be dropped altogether, ie removing the first if-block quoted
above would fix the problem. Anyone see a better answer?
regards, tom lane