Monday, December 06, 2010

The Cost of Page Faults

Over Thanksgiving, I wrote a bit about some work I did profiling PostgreSQL, and about squeezing a bit of overhead out of the backend shutdown process. After making that change, I did some further profiling of connection startup/teardown, and was dismayed to see that the revised profile looked pretty mundane, with most of the time being taken up by functions like memset() and memcpy() that are typically hard to optimize.

As it turns out, this profile wasn't really showing what I thought it was showing. Andres Freund and Tom Lane theorized that memset() and memcpy() showed up so high in the profile not because those operations were intrinsically expensive, but because those functions were triggering page faults. A page fault of this kind occurs when a process first touches a portion of its address space, and the kernel must arrange to map that chunk of address space to an actual chunk of physical memory. It appears that Andres and Tom were right: processing a page fault is 2 or 3 times more expensive than zeroing a page of memory.
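To make the distinction concrete, here is a rough sketch of how one might separate the two costs. It is not the measurement I actually ran; the function names time_first_touch() and elapsed_seconds() are made up for illustration, and it assumes a POSIX system with mmap() and clock_gettime(). The idea is to memset() a freshly mapped region twice: the first pass pays for the page faults plus the zeroing, while the second pass pays only for the zeroing.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Seconds elapsed between two clock readings. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: map `bytes` of fresh anonymous memory and
 * memset() it twice, timing each pass.
 *
 *   *first_pass  - pages are touched for the first time, so this
 *                  includes the page faults as well as the zeroing
 *   *second_pass - pages are already mapped, so this is just zeroing
 *
 * Returns 0 on success, -1 if the mapping could not be created.
 */
int time_first_touch(size_t bytes, double *first_pass, double *second_pass)
{
    struct timespec t0, t1, t2;
    unsigned char *region;

    region = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memset(region, 0, bytes);       /* faults in every page */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    memset(region, 0, bytes);       /* no new faults expected */
    clock_gettime(CLOCK_MONOTONIC, &t2);

    *first_pass = elapsed_seconds(t0, t1);
    *second_pass = elapsed_seconds(t1, t2);

    munmap(region, bytes);
    return 0;
}
```

Calling time_first_touch() from a small main() with a region of a few thousand pages and comparing the two timings should give a rough feel for the fault overhead. The exact ratio will depend on the kernel, the hardware, and the page size, and a single run is noisy, so treat any one number with suspicion.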

I found this a bit surprising, because I'm in the habit of thinking of process startup on UNIX-like systems as being very cheap, but it appears that in this case there's so little actual work going on that the page faults become the dominant cost. This means that if we want to make a significant further reduction in our connection overhead, we're probably going to have to avoid starting a new process for each new connection. I posted a few ideas on this topic, to which Tom Lane responded. In short, there may be some benefit in making PostgreSQL follow a model more like Apache's, where workers are spawned before they are actually needed, rather than on demand. I don't presently have time to follow up on this, but I think it's got potential.