On May 24, 2012, at 3:35 PM, Tom Lane wrote:
> Jeff Frost <jeff(at)pgexperts(dot)com> writes:
>> On May 24, 2012, at 3:13 PM, Tom Lane wrote:
>>> Huh. A bit bigger, but not by that much. It doesn't seem like this
>>> would be enough to make seqscan performance fall off a cliff, as it
>>> apparently did. Unless maybe the slightly-bloated catalogs were just a
>>> bit too large to fit in RAM, and so the repeated seqscans went from
>>> finding all their data in kernel disk buffers to finding none of it?
>
>> Seems unlikely.
>> Server has 128GB of RAM.
>
> Hm ... sure seems like that ought to be enough ;-), even with a near-2GB
> pg_attribute table. What's your shared_buffers setting?
It's 8GB.
>
>> BTW, when I connected to it this time, I had a really long time before my psql was able to send a query, so it seems to be still broken at least.
>
> Yeah, I was afraid that re-initdb would be at best a temporary solution.
Oh, sorry, I wasn't clear on that. The currently running system is still happy, but the old data directory got stuck in 'startup' for a while when I connected via psql.
>
> It would probably be useful to confirm the theory that relcache rebuild
> is what makes it stall. You should be able to manually remove the
> pg_internal.init file in the database's directory; then connect with
> psql, and time how long it takes before the pg_internal.init file
> reappears.
So, you're thinking autovac invalidates the cache and causes it to be rebuilt, then a bunch of new connections get stalled as they all wait for the rebuild?
I'll see if I can get the customer to move the data directory to a test system so I can futz with it on a non-production system.
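
For the timing test described above (remove pg_internal.init, connect, and time how long it takes for the file to reappear), something like the following might do on the test box. This is only a rough sketch: the function name, the polling interval, and the timeout are my own assumptions, not anything from PostgreSQL itself. The caller supplies the path to the file and a callable that starts the connection without blocking.

```python
import time
from pathlib import Path
from typing import Callable


def time_until_file_reappears(init_file: Path,
                              start_connection: Callable[[], object],
                              timeout: float = 600.0,
                              poll_interval: float = 0.05) -> float:
    """Remove init_file, start a connection, and return the number of
    seconds that pass before init_file exists again.

    start_connection is a hypothetical hook: it should kick off the
    connection attempt and return promptly (for example by spawning
    psql in the background), so that the polling loop below is what
    measures the wait.
    """
    # Delete the existing file, if any (missing_ok needs Python 3.8+).
    init_file.unlink(missing_ok=True)

    started = time.monotonic()
    start_connection()

    # Poll until the file is recreated or we give up.
    while not init_file.exists():
        if time.monotonic() - started > timeout:
            raise TimeoutError(f"{init_file} did not reappear within {timeout}s")
        time.sleep(poll_interval)

    return time.monotonic() - started
```

On the test system I would expect to call it roughly as time_until_file_reappears(Path(datadir) / "base" / db_oid / "pg_internal.init", lambda: subprocess.Popen(["psql", "-d", "mydb", "-c", "SELECT 1"])), where datadir, db_oid, and mydb are placeholders for the real data directory, the database's OID directory, and the database name. The measured time includes the polling granularity, so it is only good to about poll_interval.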