> > > > On Thu, 2 Mar 2000, Lawrence Manning wrote:
> > >
> > > You meant 2.3.43-6 and 2.3.43-7, right?
> >
> > Sorry, yes! The patches between 2.3.42 and 2.3.43! The uname is still
> > 2.3.42, so I got confused :( oops, sorry for confusion folks!! Won't
> > happen again :) Its first time I've looked at pre patches...
>
> Thanks for hunting it down, though. The 6-7 is one of the bigger jumps,
> but it does contain a few rather suggestive things.
>
> In particular, pre-6-7 is where the per-zone balancing thing went in.. And
> I think I see something really strange: the "pages_min/low/high" stuff is

Actually, I think the zone balancing thing first came out in 2.3.41, which
probably exonerates it from the Elevator problems.

But look at the explanation for the cumulative numbers below.

> per-zone, but the zone size requirements are calculated with some really
> strange calculations based off the _cumulative_ size of the zones as they
> are added.
>
> Kanoj, I think that was your change. Would you mind looking at
>
>	free_area_init_core()
>
> and explaining the logic behind the
>
>	zone->pages_min = mask;
>	zone->pages_low = mask*2;
>	zone->pages_high = mask*3;
>
> calculations (or if no logic to them, which is my suspicion, then maybe
> come up with a fix ;).
>
> Using a cumulative number and it's ratio just can't be right. And getting
> these numbers wrong will make kswapd try to page out some zone much more
> aggressively than it should, so it would explain dbench performance going
> down..
>
>		Linus
>

This is from the file Documentation/vm/balance:

"In 2.2, memory balancing/page reclamation would kick off only when the
_total_ number of free pages fell below 1/64 th of total memory. With the
right ratio of dma and regular memory, it is quite possible that balancing
would not be done even when the dma zone was completely empty. 2.2 has
been running production machines of varying memory sizes, and seems to be
doing fine even with the presence of this problem. In 2.3, due to
HIGHMEM, this problem is aggravated.

In 2.3, zone balancing can be done in one of two ways: depending on the
zone size (and possibly of the size of lower class zones), we can decide
at init time how many free pages we should aim for while balancing any
zone. The good part is, while balancing, we do not need to look at sizes
of lower class zones, the bad part is, we might do too frequent balancing
due to ignoring possibly lower usage in the lower class zones. Also,
with a slight change in the allocation routine, it is possible to reduce
the memclass() macro to be a simple equality.

Another possible solution is that we balance only when the free memory
of a zone _and_ all its lower class zones falls below 1/64th of the
total memory in the zone and its lower class zones. This fixes the 2.2
balancing problem, and stays as close to 2.2 behavior as possible. Also,
the balancing algorithm works the same way on the various architectures,
which have different numbers and types of zones. If we wanted to get
fancy, we could assign different weights to free pages in different
zones in the future."

So, the cumulative logic in free_area_init_core() is simply this:
when do we consider a zone unbalanced? When the #free pages in a
class of memory (the zone plus all its lower class zones) falls below
1/64th of the total #pages in the class.
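
To make the cumulative rule concrete, here is a rough sketch of how I read
it. This is not the code in free_area_init_core(): the struct, the function
names, the BALANCE_RATIO constant and the "at least one page" floor are all
made up for illustration, and I am assuming that the check compares the
class's free pages against pages_min. Only the 1/64 ratio and the
mask, mask*2, mask*3 pattern come from the text above.

```c
#include <stddef.h>

/* Hypothetical, simplified zone descriptor; NOT the kernel's zone_t. */
struct zone_sketch {
	unsigned long size;        /* total pages in this zone */
	unsigned long free_pages;  /* pages currently free in this zone */
	unsigned long pages_min;
	unsigned long pages_low;
	unsigned long pages_high;
};

/* The 1/64 figure quoted from Documentation/vm/balance. */
#define BALANCE_RATIO 64UL

/*
 * Zones must be ordered lowest class first (e.g. DMA, NORMAL, HIGHMEM).
 * Each zone's watermarks are derived from the cumulative size of the
 * zone and all lower class zones, not from the zone's own size.
 */
static void init_watermarks(struct zone_sketch *zones, size_t nr)
{
	unsigned long cumulative = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned long mask;

		cumulative += zones[i].size;
		mask = cumulative / BALANCE_RATIO;
		if (mask < 1)
			mask = 1;  /* assumed floor so tiny zones get a nonzero mark */

		zones[i].pages_min = mask;
		zones[i].pages_low = mask * 2;
		zones[i].pages_high = mask * 3;
	}
}

/*
 * A class is zone idx plus every lower class zone. Returns 1 when the
 * free pages summed over the class fall below the cumulative 1/64 mark.
 */
static int class_unbalanced(const struct zone_sketch *zones, size_t idx)
{
	unsigned long free_in_class = 0;
	size_t i;

	for (i = 0; i <= idx; i++)
		free_in_class += zones[i].free_pages;

	return free_in_class < zones[idx].pages_min;
}
```

With a 4096-page DMA zone and a 28672-page regular zone, this gives the DMA
zone a pages_min of 64 and the regular zone a pages_min of 512 (1/64th of
the 32768 pages in its class). If the DMA zone is completely empty while
the regular zone has 600 pages free, the DMA class is flagged as unbalanced
but the regular class is not, which is the 2.2 case the documentation
describes.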