bob wrote:...
If you want to go as far as I did, run a lightweight kernel. No virtual memory or anything there, rock-solid repeatability.

I have googled it, but it appears you cannot disable virtual memory in Linux. What kernel did you run?
What test metrics do you use for testing?

Lightweight kernels for Linux clusters (Sandia Labs has one, for example) don't do virtual memory. You don't disable it; the kernel simply doesn't support it... I've been out of this for 1.5 years now, so I can't tell you what is current, just that you probably won't want to run that on a general-purpose machine...

bob wrote: You can also use Intel's VTune tool and get even more information, e.g. total clock cycles, total instructions executed, total cache hits/misses, etc...

On Linux, perf also gives that information.

Another interesting application of perf is identifying (false) sharing, which easily kills performance with many threads ("perf c2c").

Ditto for VTune, which was the first thing I liked about it. VTune also has a "smart mode" where it will start sampling most everything, and then begin to home in on issues that are having a performance impact...

I also agree about perf. It really helps find lots of performance bottlenecks...
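
For anyone who wants something concrete to point "perf c2c" at, here is a minimal C sketch of the false-sharing pattern mentioned above. It is an illustration, not code from anyone's engine: the struct and function names (shared_counters, padded_counters, run_shared, run_padded) are made up for this example, and the 64-byte cache line is an assumption that holds on typical x86-64 parts but should be checked for your CPU. Two threads each increment their own counter; in the first layout the counters sit next to each other and most likely share a cache line, in the second each is aligned to its own line.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache-line size; verify for your CPU */

/* Adjacent counters: very likely in the same cache line (false sharing). */
struct shared_counters {
    volatile uint64_t a;
    volatile uint64_t b;
};

/* Same counters, each aligned to its own (assumed) 64-byte line. */
struct padded_counters {
    _Alignas(CACHE_LINE) volatile uint64_t a;
    _Alignas(CACHE_LINE) volatile uint64_t b;
};

struct job {
    volatile uint64_t *counter; /* the one counter this thread touches */
    uint64_t iters;
};

/* Thread body: increment only our own counter, iters times. */
static void *bump(void *arg)
{
    struct job *j = arg;
    for (uint64_t i = 0; i < j->iters; i++)
        (*j->counter)++;
    return NULL;
}

/* Run two threads, one per counter, and return the sum of both counters. */
static uint64_t run_pair(volatile uint64_t *a, volatile uint64_t *b,
                         uint64_t iters)
{
    pthread_t ta, tb;
    struct job ja = { a, iters };
    struct job jb = { b, iters };
    int rc;

    *a = 0;
    *b = 0;
    rc = pthread_create(&ta, NULL, bump, &ja);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, bump, &jb);
    assert(rc == 0);
    (void)rc;
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return *a + *b;
}

/* Counters packed together: the case perf c2c is meant to flag. */
uint64_t run_shared(uint64_t iters)
{
    static struct shared_counters c;
    return run_pair(&c.a, &c.b, iters);
}

/* Counters on separate lines: same work, no line ping-pong expected. */
uint64_t run_padded(uint64_t iters)
{
    static struct padded_counters c;
    return run_pair(&c.a, &c.b, iters);
}
```

Both functions return the same total, since no data is actually shared; only the memory layout differs. To try it, call run_shared and run_padded with a large iteration count from a small main, build with something like "cc -O2 -pthread", time the two calls, and run the binary under "perf c2c record" followed by "perf c2c report". How big the gap is depends on the machine and on which cores the threads land on, so treat any numbers as specific to your setup.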