On Thu, 29 Mar 2007, Erik Jones wrote:
> As far as the procs responsible for the writes go, we were unable to see that
> from the OS level as the guy we had as a systems admin last year totally
> screwed us with the way he set up the SunCluster on the boxes and we have
> been unable to run Dtrace which has left us watching a lot of iostat.
There are two processes spawned by Postgres that handle collecting
statistics and doing the background writing. You don't need any fancy
tools (you Solaris guys and your Dtrace, sheesh) to see if they're busy.
Just run top and switch the display to show the full command line instead
of just the process name (on Linux the 'c' key does this) and you'll see
the processes that had just been "postgres" before label themselves.
The system I saw get nailed by the bug Tom mentioned was also doing an
update-heavy workload. It manifested itself as one CPU spending almost
all its time running the statistics collector. That process's issues
kicked up I/O waits from minimal to >25% and the background writer was
incredibly sluggish as well. The problem behavior was intermittant in
that it would be crippling at times, but merely a moderate slowdown at
others. Your case sounds similar in several ways.
If you see the stats collector process taking up any significant amount of
CPU time in top, you should strongly consider the possibility that you're
suffering from this bug. It's only a few characters to patch the bug if
you don't want to wait for the next packaged release. In your situation,
I'd do it just eliminate this problem from your list of possible causes
ASAP.
To fix, edit src/backend/postmaster/pgstat.c
Around line 1650 you'll find:
write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
Change it to match the current line in the CVS tree for 8.3:
write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
That's all it took to resolve things for me.
--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD