Jeff, I posted a RHEL4 patch to rhkernel-list to deal with swapin_readahead
under memory pressure on 2/23/07. This looks like the exact same issue. Can you
grab the show_mem() output from /var/log/messages when this OOM kill happened?
I really don't have enough to go on with the above output.
Larry

Jeff, in this case the system ran out of swap space and every zone's
all_unreclaimable is "yes", so it's unlikely that there is any pagecache memory to
reclaim, and it can't reclaim any anonymous memory. Does this workload run on any
RHEL4, RHEL5 or RT kernel based system without OOM killing?
Larry

Jeff, it's most likely that linux-2.6-mm-prevent-oom-fixes.patch needs to be
ported to the RT kernel. I ported this patch from RHEL4 to RHEL5 to prevent
transient low memory combined with swap space exhaustion problems from causing
OOM kills. I'll port it to the RT kernel as well so you can give it a try.
Larry

Larry,
Thanks. I will try it now. In the future, can you please add an identifier
to the kernel N-V-R?
Currently your build and the "real"/original 2.6.21-6.el5rt have the same name.
When I run this through RHTS it will skew the results for the real
2.6.21-6.el5rt kernel.
Jeff

Larry,
I don't have the results that you are looking for, and the scratch build you
did is now gone. Can you please push through a new brew kernel? Please make sure
you give the kernel a unique name. I will re-run the tests. FYI, the latest
kernel is 2.6.21-23.el5rt.
Jeff

Jeff, can you reproduce this with the latest RT kernel on an x86_64 system that
has swap space and get me the show_mem() output when the OOM kill occurs? The
cause of the x86 failure is that the Normal zone is consumed by the slab cache,
and x86_64 systems are not limited by lowmem.
slab:207535
Normal free:3576kB min:3736kB low:4668kB high:5604kB active:1204kB
inactive:916kB present:888800kB pages_scanned:6915 all_unreclaimable? yes
Larry
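
The figures quoted above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes 4 kB pages and assumes the slab pages come almost entirely from the Normal zone (on x86 the slab cache lives in lowmem). The helper name parse_zone_line is hypothetical and not part of any kernel tool.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as on x86


def parse_zone_line(text):
    """Collect the 'name:<number>kB' fields of a show_mem() zone line."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)kB", text)}


# Zone line and slab page count copied from the comment above.
zone = parse_zone_line(
    "Normal free:3576kB min:3736kB low:4668kB high:5604kB active:1204kB "
    "inactive:916kB present:888800kB pages_scanned:6915 all_unreclaimable? yes"
)
slab_pages = 207535

slab_kb = slab_pages * PAGE_KB            # slab footprint in kB
below_min = zone["free"] < zone["min"]    # free memory under the min watermark?
slab_share = slab_kb / zone["present"]    # fraction of the zone held by slab

print(f"slab: {slab_kb} kB, share of Normal zone: {slab_share:.1%}")
print(f"free below min watermark: {below_min}")
```

Under those assumptions, free memory in the Normal zone is below the min watermark and the slab cache accounts for the large majority of the zone, which matches the diagnosis in the comment.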

So the OOM kill in comment #2 is on an x86_64 system that exhausted swap space,
therefore this is expected behavior. The OOM kill in comment #8 is on an x86
system where the slab cache exhausted lowmem. This is not expected behavior, but I
need the /proc/slabinfo output to see which data structures are consuming all of
this memory. How should we proceed?
Larry
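
One way to see which slab caches hold the most memory is to rank the entries of /proc/slabinfo by num_slabs * pagesperslab. The sketch below assumes the "slabinfo - version: 2.1" layout and 4 kB pages. The function name top_slab_caches and the SAMPLE values are hypothetical, made up for illustration, and are not data from this bug.

```python
def top_slab_caches(slabinfo_text, page_size=4096, limit=10):
    """Rank slab caches by approximate memory use, largest first.

    Assumes the slabinfo 2.1 layout:
    name active_objs num_objs objsize objperslab pagesperslab
      : tunables ... : slabdata active_slabs num_slabs sharedavail
    """
    rows = []
    for line in slabinfo_text.splitlines():
        # Skip the version banner, the column header, and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        if "slabdata" not in line:
            continue
        fields = line.split(":")[0].split()
        if len(fields) < 6:
            continue
        name = fields[0]
        pages_per_slab = int(fields[5])
        slabdata = line.split("slabdata", 1)[1].split()
        num_slabs = int(slabdata[1])
        rows.append((name, num_slabs * pages_per_slab * page_size))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Hypothetical sample; on a live system read /proc/slabinfo instead
# (this usually requires root).
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
dentry_cache 1000 1200 136 29 1 : tunables 120 60 8 : slabdata 40 42 0
buffer_head 5000 5000 52 75 1 : tunables 120 60 8 : slabdata 60 67 0
size-4096 10 10 4096 1 1 : tunables 24 12 8 : slabdata 10 10 0
"""

ranked = top_slab_caches(SAMPLE)
for name, nbytes in ranked:
    print(f"{name:20s} {nbytes // 1024:8d} kB")
```

Capturing this at the time of the OOM kill, or shortly before it, would show which cache is growing.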

Jeff, in all of these examples the system has completely exhausted swap space.
Since the swap cache adds and deletes were pretty much the same, the active and
inactive pages were not reclaimable without more swap space being available.
That's why we OOM killed.
Larry