I'm seeing a problem on my benchmark machine: checkpoints stop happening
after the ramp-up period.
It looks like the bgwriter gets starved waiting on the
CheckpointStartLock. The CheckpointStartLock is held in shared mode across
the XLogFlush at commit, and on an extremely busy system like a
benchmark that is always long enough for a new transaction to acquire the
CheckpointStartLock before the previous holder releases it, so the lock
is never free for the bgwriter.
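To illustrate the overlap I mean, here is a toy model, not PostgreSQL code. It assumes that shared requests are granted even while the checkpoint is waiting, and that the checkpoint can only get the lock at a moment with no shared holders. The function name, the tick-based clock and all parameters are made up for the illustration.

```c
#include <assert.h>

/*
 * Hypothetical toy model. Time advances in ticks. A committer arrives
 * every arrival_gap ticks (starting at tick 0) and holds the lock in
 * shared mode for hold_ticks, standing in for the XLogFlush.
 *
 * Returns the first tick below horizon at which no shared holder
 * remains, or -1 if the waiter never gets a chance within the horizon.
 */
static int
first_free_tick(int hold_ticks, int arrival_gap, int horizon)
{
    assert(hold_ticks > 0 && arrival_gap > 0);

    for (int t = 0; t < horizon; t++)
    {
        int holders = 0;

        /* Count committers whose interval [start, start + hold_ticks) covers t. */
        for (int start = 0; start <= t; start += arrival_gap)
        {
            if (t < start + hold_ticks)
                holders++;
        }

        if (holders == 0)
            return t;
    }
    return -1;
}
```

When the hold time exceeds the arrival gap (say 5 ticks against 3), there is always at least one holder and the function returns -1. When flushes are short relative to the gap (2 against 5), it returns 2, the first gap between holders.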
I'm running another test with more logging to confirm that's what's
happening, but I'm pretty sure that's it...
As a proposed fix, instead of acquiring the CheckpointStartLock in
RecordTransactionCommit, we would set a flag in MyProc saying "commit in
progress". After computing the new redo record ptr, the checkpoint would
scan through the procarray, make note of any transactions with a commit
in progress, and wait for all of them to finish before flushing clog.
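A rough sketch of the idea, as a single-threaded toy rather than a patch. ToyProc, procs and every function name here are hypothetical and not existing PostgreSQL symbols, and all locking is omitted. I've also assumed the checkpoint remembers transaction ids rather than just the flag, so that a backend which finishes one commit and immediately starts another isn't mistaken for one that is still busy.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 4

/* Hypothetical stand-in for the per-backend entry in the procarray. */
typedef struct ToyProc
{
    unsigned    xid;                /* transaction being committed */
    bool        commit_in_progress; /* the proposed flag */
} ToyProc;

static ToyProc procs[MAX_PROCS];

/* Committer side: set the flag instead of taking the lock. */
static void
commit_begin(int backend, unsigned xid)
{
    assert(backend >= 0 && backend < MAX_PROCS);
    procs[backend].xid = xid;
    procs[backend].commit_in_progress = true;
}

static void
commit_end(int backend)
{
    assert(backend >= 0 && backend < MAX_PROCS);
    procs[backend].commit_in_progress = false;
}

/*
 * Checkpoint side, step 1: after computing the redo pointer, note the
 * xids of all commits currently in progress. xids must have room for
 * MAX_PROCS entries. Returns the number of xids stored.
 */
static int
snapshot_committers(unsigned *xids)
{
    int n = 0;

    for (int i = 0; i < MAX_PROCS; i++)
    {
        if (procs[i].commit_in_progress)
            xids[n++] = procs[i].xid;
    }
    return n;
}

/*
 * Checkpoint side, step 2: true while any noted commit is still running.
 * Commits that started after the snapshot are ignored.
 */
static bool
any_still_committing(const unsigned *xids, int n)
{
    for (int i = 0; i < MAX_PROCS; i++)
    {
        if (!procs[i].commit_in_progress)
            continue;
        for (int j = 0; j < n; j++)
        {
            if (procs[i].xid == xids[j])
                return true;
        }
    }
    return false;
}
```

The checkpoint would call snapshot_committers() once and then poll any_still_committing() until it returns false, and only then flush clog. Since commits that start after the snapshot are never waited for, the wait is bounded by the commits already in flight.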
Unless someone has a better idea, I'll write a patch to do the above.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com