Hi,
Thanks to everyone who responded to my queries. I've tried to summarise
the responses below for others' reference. I hope this is useful.
For BIOS memory settings, you may want to disable "Node Memory Interleave".
Enabling it may decrease memory bandwidth and noticeably increase memory
latency (this is supported by the measurements in
http://www.digit-life.com/articles2/cpu/rmma-numa.html).
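One rough way to check the effect of the interleave setting from Linux is to count the NUMA nodes the kernel exposes. The sketch below is only illustrative: it assumes a NUMA-enabled kernel that publishes nodes under /sys/devices/system/node, and the function name numa_nodes is made up for this example. On a multi-socket Opteron, more than one node would typically suggest node interleaving is off; a single node may mean it is on (or that the kernel lacks NUMA support).

```python
import os
import re


def numa_nodes(sysfs_dir="/sys/devices/system/node"):
    """Return the sorted NUMA node ids found under sysfs_dir.

    The default path is an assumption about where the running kernel
    exposes NUMA topology; an empty list means nothing was found there.
    """
    if not os.path.isdir(sysfs_dir):
        return []
    ids = []
    for name in os.listdir(sysfs_dir):
        # Node directories are named node0, node1, ...; skip other entries.
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    nodes = numa_nodes()
    print("NUMA nodes visible to the kernel:", nodes)
```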
With the K8SRE board in particular, there may be issues with the Linux
Broadcom driver in kernels > 2.6.5 which could cause stability problems at
high load. If you see problems, you may want to use either 2.6.4 or 2.6.16+.
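As a quick way to flag nodes running a kernel in the range mentioned above, something like the following sketch could be run on each node. It only compares version numbers (newer than 2.6.5 and older than 2.6.16); it does not check whether the Broadcom driver is actually in use, and the helper names are hypothetical.

```python
import platform
import re


def parse_kernel_version(release):
    """Extract the leading numeric fields, e.g. '2.6.9-42.ELsmp' -> (2, 6, 9)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    # A missing third field (e.g. '3.0') is treated as 0.
    return tuple(int(g) if g else 0 for g in match.groups())


def in_suspect_range(release):
    """True for kernels newer than 2.6.5 and older than 2.6.16."""
    return (2, 6, 5) < parse_kernel_version(release) < (2, 6, 16)


if __name__ == "__main__":
    release = platform.release()
    status = "in the suspect range" if in_suspect_range(release) else "outside the suspect range"
    print("kernel %s is %s" % (release, status))
```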
Similarly, there are known issues with the nForce4 chipset which may cause
NFS errors or K8SRE shutdowns. You may need an NFS patch if these problems
occur.
Enabling ECC scrubbing (for both cache and DRAM) using the highest scrub
times (normally 84ms) should not have a significant performance impact
and should make for a slightly more reliable system. Note that scrubbing
with the lowest times (i.e. the highest frequency) may impact performance.
Enabling Chipkill should also increase memory reliability without any
performance impact and is recommended.
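To put the 84ms scrub setting mentioned above in perspective, here is a small back-of-the-envelope calculation. It rests on an assumption not stated in the replies: that the scrub time is the interval between scrubbing successive 64-byte blocks. If that holds, the slowest setting touches memory so rarely that a negligible performance cost is plausible. The function name and the 4 GiB figure are purely illustrative.

```python
def full_pass_seconds(memory_bytes, interval_seconds, block_bytes=64):
    """Estimate the time to scrub all memory once.

    Assumes one block of block_bytes is scrubbed every interval_seconds
    (an assumption about how the scrubber works, not a measured figure).
    """
    blocks = memory_bytes / block_bytes
    return blocks * interval_seconds


if __name__ == "__main__":
    four_gib = 4 * 2**30
    seconds = full_pass_seconds(four_gib, 0.084)
    print("full pass over 4 GiB at 84ms per block: %.1f days" % (seconds / 86400))
```

Under that assumption a full pass over 4 GiB at 84ms per block takes roughly 65 days, which also suggests the slowest setting gives only a modest reliability benefit.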
It is recommended to use the mcelog package so that any memory errors
are recorded at the operating system level.
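Once mcelog is logging, a node can be checked periodically for recorded errors. The sketch below makes two assumptions that should be verified against the local setup: that decoded records are written to /var/log/mcelog, and that each record starts with a line beginning "MCE ". The function names are hypothetical.

```python
def count_mce_records(log_text):
    """Count lines that look like the start of a machine check record.

    Assumes each record begins with a line starting with 'MCE '.
    """
    return sum(1 for line in log_text.splitlines() if line.startswith("MCE "))


def read_log(path="/var/log/mcelog"):
    """Return the log contents, or an empty string if it cannot be read."""
    try:
        with open(path, errors="replace") as handle:
            return handle.read()
    except OSError:
        # Missing file or insufficient permissions: treat as no data.
        return ""


if __name__ == "__main__":
    print("machine check records found:", count_mce_records(read_log()))
```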
Thanks to:
Alex Ninaber
Bruce Allen
Mark Hahn
Eric W. Biederman
-stephen
--
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
GMIT, Dublin Rd, Galway, Ireland. http://www.aplpi.com