Thursday, March 29, 2012

Postgres Update Release, Data Loss, and a Million Monkeys

One of the major bugfixes in the most recent PostgreSQL update release (9.1.3, et al.) is one of those annoying timing-based issues which you'd expect to be very rare in production. And it is rare, as measured against the total number of inserts you do in a database. Unfortunately, it's also a "million monkeys" issue, which means that it's not that rare as measured by the number of databases in which it occurs. For that reason, I'd urge all users to update their PostgreSQL releases as soon as they can. Let me explain.

The issue, as I understand it, was that autovacuum has been (since 8.3.0) re-using a pointer for multiple passes over the same leaf page while pruning dead nodes in b-tree indexes. If a concurrent process inserted a new index pointer into dead space on that same leaf page at just the wrong moment, autovacuum could cause the new index pointer to be flagged dead and removed. As a result, a row would exist in the table but not in the index -- a form of mild data loss or data corruption.

The above depends on split-millisecond timing to occur, which is how we were able to miss the issue for so long. The odds of it happening for any particular insert, even on a busy system, were minuscule, maybe 1 in a million. Further, it required frequent inserts into a table which also had heavy updates and deletes. Not much to worry about overall, yes? Well, no.

This is where the million monkeys come in. Just as the apocryphal mega-primate literary club will, eventually, due to random typewriter-key-mashing, produce War and Peace, with enough database activity your chances of hitting this bug move from "highly improbable" to "nearly certain". Every high-traffic production database has at least one table which receives heavy inserts, updates, and deletes (think queue table), and given a year or so might have a couple billion writes and a million autovacuum runs. Suddenly "one in a million" timing becomes 99.5% probability.
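To make that arithmetic concrete (the specific figures below are illustrative, not measured): the chance of at least one hit in n independent one-in-a-million opportunities is 1 - (1 - p)^n, and you can evaluate it right in psql:

```sql
-- Chance of at least one hit across n independent trials of probability p.
-- p = 1e-6 ("one in a million"); n = a few million dangerous
-- insert/autovacuum collisions accumulated over a year (illustrative).
SELECT 1 - power(1 - 1e-6, 5300000) AS p_at_least_one_hit;
-- roughly 0.995
```

With only a million opportunities the probability is already around 63%; a few million pushes it past 99%.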

Indeed, on examination, we've found this kind of index corruption on four production customer databases. And we haven't checked nearly all of them.

Obviously this means you should update your production databases as soon as you can (downtime this weekend, perhaps?). Folks have also asked me about how to check for this form of index corruption. It's actually quite simple:

First, do a count of the rows in the table by full table scan:

staffos=# SELECT count(*) FROM candidates;
 count
-------
  1449
(1 row)

Then, force an index scan, and do a count(*) using a condition you know to be universally true for an indexed column in the table:

staffos=# set enable_seqscan = false;
SET
staffos=# select count(*) from candidates where id > 0;
 count
-------
  1448
(1 row)

Results like the above would indicate that you've encountered the index corruption bug. Note that you need to test each index to be sure. You fix it by reindexing:
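To enumerate the indexes you need to repeat the test for, one approach (a sketch using the standard pg_indexes system view; swap in your own table name) is:

```sql
-- List every index on the table, so each one can be checked in turn.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'candidates';
```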

staffos=# REINDEX INDEX candidates_pkey;
REINDEX
staffos=# select count(*) from candidates where id > 0;
 count
-------
  1449
(1 row)
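One housekeeping note: since enable_seqscan was disabled for the test, remember to restore it for your session once you're done checking:

```sql
RESET enable_seqscan;  -- return the planner to its default behavior
```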

Hopefully this helps with the bug. Our apologies for not catching this issue earlier, but in our defense, PostgreSQL's record on subtle bugs like this is much better than competing databases. This does show you why it's vital to stay up to date with the most recent update releases, though!

To give you a concrete idea of how hard this is to hit deliberately: we tried reproducing the issue on a test database. It took an average of eight days of fast-as-we-can inserts, updates, and deletes on a queueing table (around 12 million of them) to trigger it.

Disclaimer

This is my personal technical blog. Unless explicitly stated otherwise, I am not speaking on behalf of PostgreSQL Experts, the PostgreSQL Project, Software in the Public Interest, or any other organization of which I am a member.