On Thu, Sep 04, 2008 at 04:59:44PM +0900, KOSAKI Motohiro wrote:
> > When a process enters direct reclaim it will expend effort identifying
> > and releasing pages in the hope of obtaining a page. However as these
> > pages are released asynchronously there is every possibility that the
> > pages will have been consumed by other allocators before the reclaimer
> > gets a look in. This is particularly problematic where the reclaimer is
> > attempting to allocate a higher order page. It is highly likely that
> > a parallel allocation will consume lower order constituent pages as we
> > release them preventing them coelescing into the higher order page the
> > reclaimer desires.
> >
> > This patch set attempts to address this for allocations above
> > ALLOC_COSTLY_ORDER by temporarily collecting the pages we are releasing
> > onto a local free list. Instead of freeing them to the main buddy lists,
> > pages are collected and coelesced on this per direct reclaimer free list.
> > Pages which are freed by other processes are also considered, where they
> > coelesce with a page already under capture they will be moved to the
> > capture list. When pressure has been applied to a zone we then consult
> > the capture list and if there is an appropriatly sized page available
> > it is taken immediatly and the remainder returned to the free pool.
> > Capture is only enabled when the reclaimer's allocation order exceeds
> > ALLOC_COSTLY_ORDER as free pages below this order should naturally occur
> > in large numbers following regular reclaim.
>
> Hi Andy,
>
> I like almost part of your patch.
> (at least, I can ack patch 1/4 - 3/4)
>
> So, I worry about OOM risk.
> Can you remember desired page size to capture list (or any other location)?
> if possible, __capture_on_page can avoid to capture unnecessary pages.
>
> So, if __capture_on_page() can make desired size page by buddy merging,
> it can free other pages on capture_list.
>
> In worst case, shrink_zone() is called by very much process at the same time.
> Then, if each process doesn't back few pages, very many pages doesn't be backed.

The testing we have done pushes the system pretty damn hard, about as
hard as you can. Without the zone watermark checks in capture we would
periodically lose a test to an OOM. Since adding those checks I have
never seen an OOM, so I am confident we are safe. That said, clearly
some wider testing in -mm would be very desirable to confirm that this
does not tickle OOM for some unexpected workload.

I think the idea of trying to short-circuit capture once it has a page
of the requisite order or greater is eminently sensible. I suspect we
are going to have trouble getting the information to the right place,
but it is clearly worth investigating. It feels like a logical step on
top of this, so I would propose to do it as a patch on top of this set.
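
To make the short-circuit idea concrete, here is a rough user-space
sketch in plain C. It is not kernel code and not taken from the patch
set: struct capture_ctl, its fields and capture_should_take() are all
hypothetical names invented for illustration. It only shows the control
flow being suggested: record the wanted order alongside the capture
list, and stop capturing once a block of at least that order is held.

```c
#include <stdbool.h>

/* Hypothetical per-reclaimer capture state; not from the patch set. */
struct capture_ctl {
	unsigned int wanted_order;	/* order the reclaimer wants to allocate */
	bool satisfied;			/* set once a big enough block is held */
	unsigned long captured;		/* base pages held on the capture list */
};

/*
 * Decide whether a freed (possibly buddy-merged) block of the given
 * order should be captured. Returns true if it was captured, false if
 * it should go back to the normal free lists instead.
 */
bool capture_should_take(struct capture_ctl *cc, unsigned int order)
{
	/* Short-circuit: the reclaimer already has a suitable block. */
	if (cc->satisfied)
		return false;

	cc->captured += 1UL << order;
	if (order >= cc->wanted_order)
		cc->satisfied = true;

	return true;
}
```

In this sketch the blocks captured before the target was reached would
still have to be handed back separately, and getting wanted_order to
the place where pages are freed is exactly the plumbing problem
mentioned above.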