I upgraded the E5-2690 server to CentOS 6.2, then tried both the current kernel.org release, 3.4.4, and 3.5.0-rc5. The "lc" test results are about the same, but the "cv" results are worse still: the same test takes about 229 seconds to run.

Also noteworthy is that the HP has generally better performance than the Dell. But the HP E5-2690 is still worse than the X5550.

In all cases, for all servers, I disabled power-saving features (cpu frequency scaling, C-states, C1E). I verified with i7z that all CPUs spend 100% of their time in state C0.
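For anyone who wants to double-check the frequency scaling side on their own box, here is a rough sketch that reads the cpufreq governor for each CPU from sysfs. It assumes the standard Linux path /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor; the function names are just made up for illustration. If scaling is disabled in the BIOS, the cpufreq directories may not exist at all, in which case the result is simply empty. It does not replace i7z for checking C-state residency.

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        # gov_file is .../cpuN/cpufreq/scaling_governor, so two levels up is cpuN
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def non_performance_cpus(governors):
    """List the CPUs whose governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    govs = read_governors()
    if not govs:
        print("no cpufreq entries found (scaling may be disabled in firmware)")
    for cpu in non_performance_cpus(govs):
        print(f"{cpu}: governor is {govs[cpu]}")
```

An empty list from non_performance_cpus() only says the governors look right; it says nothing about C-states or C1E.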

Is this simply a corner case where Sandy Bridge is worse than its predecessor? Or is there an implementation problem?

No, for ultra low latency applications, it is absolutely critical to disable all powersaving features in modern CPUs. This is standard practice in industries such as high frequency trading and some other high performance computing situations. There is a measurable latency hit for a CPU to transition from a low power/slow state to a higher power/fast state. The latency to transition from one state to another is lessened in SNB compared to Westmere, but it's still there.

In other words, with this benchmark, enabling any kind of power saving features (on either CPU) makes things worse.

To be fair, turbo boost is debatable. In this particular benchmark, it improves things slightly; but overall, SNB still falls well behind Westmere.
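As a side note on the transition latency mentioned above: besides BIOS settings, Linux has a PM QoS interface where a process writes a 32-bit latency limit (in microseconds) to /dev/cpu_dma_latency and keeps the file open; the kernel then avoids idle states with a longer exit latency for as long as the descriptor stays open. Below is a minimal sketch of that idea. The helper name and its parameters are hypothetical, it normally needs root, and I am not claiming it is what was used for the numbers in this thread.

```python
import os
import struct
from contextlib import contextmanager


@contextmanager
def hold_cpu_latency(max_latency_us=0, device="/dev/cpu_dma_latency"):
    """Hold a PM QoS latency request for the duration of the with-block.

    The request is dropped automatically when the descriptor is closed.
    """
    fd = os.open(device, os.O_WRONLY)
    try:
        # The kernel expects a native 32-bit signed integer (microseconds).
        os.write(fd, struct.pack("i", max_latency_us))
        yield fd
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        with hold_cpu_latency(0):
            print("latency request held; run the benchmark here")
    except OSError as exc:
        print(f"could not open the PM QoS device: {exc}")
```

The request has to be held around the whole benchmark run, since it disappears as soon as the block exits.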

You can forget the filesystem benchmarks, as that is ext3 vs. ext4. You could also try different compiler flags and -march=native. I doubt that you lose that much latency to power management; maybe try a low-latency kernel.

If you're interested, I encourage you to compile and run the sample program to which I linked (instructions for building are in the top comment).

FWIW, I have tested this program with several compilers. I haven't tried gcc 4.7 yet, but I have tried 4.6.3 on gentoo, having re-emerged the whole system with -march=corei7-avx, and built my demo program similarly. I also tried Intel's compiler (I didn't rebuild the whole gentoo system w/icc, but did build my program). The different compilers have so far made very little difference.

My sample program doesn't actually do much that is interesting; it's more or less a system call benchmark. So my suspicion is that these kernel functions I'm using are implemented sub-optimally for SNB, or this is simply a corner-case where SNB is slower than previous-gen CPUs.

I've been playing with this for a while, so I'm kind of hoping to get the attention of someone with deeper knowledge of kernel-CPU internals than me.

Basically, your benchmark is the worst possible multicore example. Let's talk about cv mode: when you specify different cores, htop never shows more than 55% load on each core. With lc mode you see 100% load, but in both cases your code is written to run on one core! The speed difference is extreme. I did not wait for cv to finish with two different cores; that just takes too long. Used an i7-3770S, Turbo fixed at 39.
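To illustrate why each core sits well below 100% in that kind of test, here is a toy ping-pong between two threads over a condition variable. This is only a sketch: it assumes "cv" means a condition-variable handoff, it is not the linked sample program, and the function name and parameters are invented. In Python the interpreter lock adds a lot of overhead, so the absolute timings are not comparable to a C/C++ build; the point is just the shape of the workload, where each thread spends roughly half its time asleep waiting for the other.

```python
import os
import threading
import time


def ping_pong(round_trips=10000, cpus=None):
    """Bounce a token between two threads; return (handoffs, seconds).

    cpus, if given, is a pair of CPU numbers to pin the two threads to
    (Linux only; silently skipped where sched_setaffinity is unavailable).
    """
    cond = threading.Condition()
    state = {"turn": "ping", "count": 0}

    def player(me, other, cpu):
        if cpu is not None and hasattr(os, "sched_setaffinity"):
            os.sched_setaffinity(0, {cpu})  # pid 0 = the calling thread
        for _ in range(round_trips):
            with cond:
                while state["turn"] != me:
                    cond.wait()  # sleep until the other side hands over
                state["count"] += 1
                state["turn"] = other
                cond.notify()

    cpu_a, cpu_b = cpus if cpus else (None, None)
    threads = [
        threading.Thread(target=player, args=("ping", "pong", cpu_a)),
        threading.Thread(target=player, args=("pong", "ping", cpu_b)),
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"], time.perf_counter() - start


if __name__ == "__main__":
    handoffs, seconds = ping_pong(20000)
    print(f"{handoffs} handoffs in {seconds:.3f} s")
```

Passing something like cpus=(0, 1) versus cpus=(0, 0) gives a rough feel for the same-core versus different-core difference, with the caveat about interpreter overhead above.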

If latency is important, and you are working for a large finance house, why aren't you using the Red Hat messaging kernel that's designed for low latency?