This is V5 of the patchset to size zones and memory holes in an architecture-independent manner. It has been rebased against 2.6.17-rc3-mm1 and, as there were no objections to V4, I would like to have it considered for merging. If there are merge conflicts with later trees, let me know what to rebase against.

The reasons why I'd like to see this merged include:

o Less architecture-specific code, particularly for x86 and ppc64
o More maintainable. Changes to zone layout need only be made in one place
o Zone-sizing and memory hole calculation is one less job that needs to be
  done for new architecture ports
o With the architecture-independent representation, zone-based
  anti-fragmentation needs a lot less architecture-specific code, making it
  more portable between architectures. This will be important for future
  hugepage-availability work
o Nigel Cunningham has stated that software suspend could potentially use
  the architecture-independent representation to discover what pages need
  to be saved during suspend

Changelog since V4
o Rebase to 2.6.17-rc3-mm1
o Calculate holes on x86 with SRAT correctly

Changelog since V3
o Rebase to 2.6.17-rc2
o Allow the active regions to be cleared. Needed by x86_64 when it decides
  the SRAT table is bad halfway through the registering of active regions
o Fix for flatmem x86_64 machines booting

Changelog since V2
o Fix a bug where holes in lower zones get double counted
o Catch the case where a new range is registered that is within an
  existing range
o Catch the case where a zone boundary is within a hole
o Use the EFI map for registering ranges on x86_64+numa
o On IA64+NUMA, add the active ranges before rounding for granules
o On x86_64, remove e820_hole_size and e820_bootmem_free and use
  arch-independent equivalents
o On x86_64, remove the map walk in e820_end_of_ram()
o Rename memory_present_with_active_regions, name ambiguous
o Add absent_pages_in_range() for arches to call

At a basic level, architectures define structures to record where active ranges of page frames are located. Once located, the code to calculate zone sizes and holes in each architecture is very similar. Some of this zone and hole sizing code is difficult to read for no good reason. This set of patches eliminates the similar-looking architecture-specific code.

The patches introduce a mechanism where architectures register where the active ranges of page frames are with add_active_range(). When all areas have been discovered, free_area_init_nodes() is called to initialise the pgdat and zones. The zone sizes and holes are then calculated in an architecture-independent manner.
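
To illustrate the idea of registering active ranges and then deriving sizes and holes from them, here is a minimal userspace sketch. It is not the code from the patches: the toy_* names, the fixed-size table and the function signatures are all hypothetical, and it assumes registered ranges never overlap. It only shows that once the ranges are recorded in one table, present pages and holes for any PFN span can be computed without arch-specific knowledge.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only. The names and signatures are illustrative assumptions,
 * not the kernel interfaces introduced by the patches. */
#define TOY_MAX_ACTIVE_REGIONS 32

struct toy_active_range {
	int nid;                 /* node the range belongs to */
	unsigned long start_pfn; /* first page frame, inclusive */
	unsigned long end_pfn;   /* last page frame, exclusive */
};

static struct toy_active_range toy_regions[TOY_MAX_ACTIVE_REGIONS];
static size_t toy_nr_regions;

/* Forget all registered ranges, e.g. when a firmware table turns out bad. */
static void toy_clear_active_ranges(void)
{
	toy_nr_regions = 0;
}

/* Record that PFNs [start_pfn, end_pfn) on node nid are backed by memory.
 * Returns 0 on success, -1 if the range is empty or the table is full. */
static int toy_add_active_range(int nid, unsigned long start_pfn,
				unsigned long end_pfn)
{
	if (start_pfn >= end_pfn || toy_nr_regions == TOY_MAX_ACTIVE_REGIONS)
		return -1;
	toy_regions[toy_nr_regions].nid = nid;
	toy_regions[toy_nr_regions].start_pfn = start_pfn;
	toy_regions[toy_nr_regions].end_pfn = end_pfn;
	toy_nr_regions++;
	return 0;
}

/* Pages on node nid that are present within the span [lo, hi).
 * Assumes the registered ranges do not overlap each other. */
static unsigned long toy_present_pages(int nid, unsigned long lo,
				       unsigned long hi)
{
	unsigned long present = 0;
	size_t i;

	for (i = 0; i < toy_nr_regions; i++) {
		const struct toy_active_range *r = &toy_regions[i];
		unsigned long s, e;

		if (r->nid != nid)
			continue;
		/* Clamp the range to the span being examined. */
		s = r->start_pfn > lo ? r->start_pfn : lo;
		e = r->end_pfn < hi ? r->end_pfn : hi;
		if (s < e)
			present += e - s;
	}
	return present;
}

/* Pages within the span [lo, hi) that are holes on node nid. */
static unsigned long toy_absent_pages(int nid, unsigned long lo,
				      unsigned long hi)
{
	unsigned long present;

	if (lo >= hi)
		return 0;
	present = toy_present_pages(nid, lo, hi);
	assert(present <= hi - lo); /* holds while ranges do not overlap */
	return (hi - lo) - present;
}
```

For example, if node 0 registers PFNs [0, 100) and [150, 400), a zone spanning PFNs [0, 256) on that node has 206 present pages and 50 pages of holes in this model.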

At this point, there is a reduction of 421 architecture-specific lines of code and a net reduction of 25 lines. The arch-independent code is a lot easier to read in comparison to some of the arch-specific stuff, particularly in arch/i386/.

For Patch 6, it was also noted that page_alloc.c has a *lot* of initialisation code which makes the file harder to read than it needs to be. Patch 6 creates a new file mem_init.c and moves a lot of initialisation code from page_alloc.c to it. After the patch is applied, there is still a net loss of 8 lines.

I have successfully boot tested the patches and verified that the zones are the correct size on:

o x86, flatmem with 1.5GiB of RAM
o x86, NUMAQ
o x86, NUMA, with SRAT
o x86 with SRAT CONFIG_NUMA=n
o PPC64, NUMA
o PPC64, CONFIG_NUMA=n
o Power, RS6000 (Had difficulty here with missing __udivdi3 symbol in
  pci_32.o)
o x86_64, NUMA with SRAT
o x86_64, NUMA with broken SRAT that falls back to k8topology discovery
o x86_64, ACPI_NUMA, ACPI_MEMORY_HOTPLUG && !SPARSEMEM to trigger the
  hotadd path without sparsemem fun in srat.c (SRAT broken on test machine
  and I'm pretty sure the machine does not support physical memory hotadd
  anyway, so the test may not have been effective other than being a
  compile test.)
o x86_64, CONFIG_NUMA=n
o x86_64, AMD64 desktop machine with flatmem

Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig, gensparse_defconfig and defconfig. Bob Picco has also tested and debugged on IA64. Jack Steiner successfully boot tested on a mammoth SGI IA64-based machine. These were on patches against 2.6.17-rc1 but there have been no ia64 changes made between release 3 and 5 of these patches.

There are differences in the zone sizes for x86_64 as the arch-specific code for x86_64 accounts the kernel image and the starting mem_maps as memory holes but the architecture-independent code accounts the memory as present.

The net reduction seems small but the big benefit of this set of patches is the reduction of 421 lines of architecture-specific code, some of which is very hairy. There should be a greater net reduction when other architectures use the same mechanisms for zone and hole sizing but I lack the hardware to test on.

Comments?

Additional credit:
	Dave Hansen for the initial suggestion and comments on early patches
	Andy Whitcroft for reviewing early versions and catching numerous errors
	Tony Luck for testing and debugging on IA64
	Bob Picco for testing and fixing bugs related to pfn registration
	Jack Steiner and Yasunori for testing on IA64