Preliminary tests

First of all, we decided to evaluate the performance of each configuration in the synthetic benchmark Everest 4.6. It is not the latest version of this popular benchmark, but real software is not updated instantly either, so the results are still interesting even though version 4.6 is not optimized for Nehalem. We added another processor to these tests, the Core i5 750, operating with the same pair of memory modules as the Core i7 860 in one of the modes. Why? The UnCore frequency of the Core i7 860 is 2.4 GHz, while in the i5 750 and i7 920 it is lower: 2.13 GHz. So let's see whether this affects anything and, if so, how.

Read performance apparently depends on memory frequency. The bottleneck is not theoretical memory bandwidth (we get close to its limit only in single-channel mode, not in dual-channel mode, and 30GB/s in triple-channel mode is just a dream); the results simply scale with memory frequency. DDR3-1066 memory demonstrates the same results with the 860 and the 920. DDR3-1333 is noticeably faster. As expected, it is a little faster with the 860 than with the 750. But we were surprised to find that the mode with two modules per channel was even faster.
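To put "theoretical memory bandwidth" into numbers, here is a minimal sketch that computes the nominal peak for several of the modes discussed. It assumes the standard 64-bit (8-byte) DDR3 channel and uses the nominal data rates from the module names; the function name `peak_bandwidth_gbs` and the list of configurations are ours, chosen for illustration. These are ceilings, not measured results.

```python
# Theoretical peak bandwidth of DDR3:
# transfers per second x 8 bytes per transfer x number of channels.
BYTES_PER_TRANSFER = 8  # assumption: standard 64-bit DDR3 channel


def peak_bandwidth_gbs(data_rate_mts, channels):
    """Return the theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


# Illustrative configurations built from the nominal data rates in the text.
configs = [
    ("DDR3-1333, single-channel", 1333, 1),
    ("DDR3-1333, dual-channel", 1333, 2),
    ("DDR3-1066, dual-channel", 1066, 2),
    ("DDR3-1066, triple-channel", 1066, 3),
]

for name, rate, channels in configs:
    print(f"{name}: {peak_bandwidth_gbs(rate, channels):.1f} GB/s")
```

Comparing a measured read rate against these ceilings shows how far each mode is from being limited by bandwidth.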

Now that's interesting: the memory write rate does not depend on memory frequency; it is tied directly to UnCore frequency! So here is the obvious way to raise the performance of the Core i7/i5: overclock this unit to obtain better results in memory-critical applications. However, unlike LGA1366 processors, these models do not give us much freedom. The UnCore multiplier is fixed in processors for the new platform, so the only way to increase the frequency of this unit is to raise the base clock rate. That is exactly what must be done anyway to overclock a processor whose core multiplier is locked (there are no other models for LGA1156), so the frequencies of all units grow synchronously. Besides, even if users had such freedom, they would hardly overclock only selected units and ignore the rest.
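The synchronous growth is easy to see in a small sketch. It assumes the nominal 133.33 MHz base clock of this platform; the multipliers are inferred from that clock and the nominal frequencies (2.8 GHz cores and 2.4 GHz UnCore for the Core i7 860, DDR3-1333 memory), and the raised base clock values are arbitrary examples, not tested settings.

```python
# Assumption: nominal base clock (BCLK) of about 133.33 MHz.
NOMINAL_BCLK_MHZ = 400 / 3

# Assumed multipliers, inferred from nominal frequencies (not measured).
MULTIPLIERS = {
    "core": 21,    # 2.8 GHz nominal for Core i7 860
    "uncore": 18,  # 2.4 GHz, fixed on LGA1156
    "memory": 10,  # DDR3-1333 effective data rate
}


def unit_frequencies(bclk_mhz):
    """Return the frequency of each unit in MHz for a given base clock."""
    return {unit: bclk_mhz * mult for unit, mult in MULTIPLIERS.items()}


# 150 and 160 MHz are hypothetical overclocked base clocks for illustration.
for bclk in (NOMINAL_BCLK_MHZ, 150.0, 160.0):
    freqs = unit_frequencies(bclk)
    summary = ", ".join(f"{unit} {mhz:.0f} MHz" for unit, mhz in freqs.items())
    print(f"BCLK {bclk:.1f} MHz: {summary}")
```

With fixed multipliers, every unit scales by the same factor as the base clock, so the UnCore cannot be raised on its own.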

Better results require both high-frequency memory and a high UnCore frequency; both components are important here. Fewer modules per channel are a plus. So the absolute leader is the Core i7 860 with two DDR3-1333 modules. Latency grows when four modules are used (two per channel), but not by much: the Core i5 750 is slower even with a pair of modules. But it is faster than the Core i7 920 in dual-channel mode, owing to 1333MHz versus 1066MHz memory in the latter. However, if we install two 1066MHz memory modules per channel for the LGA1156 processor, latency goes beyond 40 ns. The only thing that saves this mode is that the triple-channel mode in processors for LGA1366 is even slower. Note that we used one module per channel on that platform. It would have been about 5% worse if we had used two modules.

So what conclusions can we draw from the low-level tests? We can see that LGA1156 processors work with memory a tad better than LGA1366 models. Using DDR3-1333 memory (officially supported) is justified, although DDR3-1066 is not much worse. Faster memory is no better in this case, because the processors simply cannot handle it in the nominal mode.

For our next tests we modified the list of contenders. As we have already sorted everything out with the Core i5 750, we replace it with the Phenom II X4 965 to compare performance in applications. The latter was also tested with two memory configurations: one 1333MHz module per channel (that is, 4GB) with 7-7-7-20 timings, and two modules per channel (6GB), also DDR3-1333, but with 8-8-8-24 timings (this mode is more demanding for the memory controller, so we had to relax the timings to preserve stability).
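To see what the change from 7-7-7-20 to 8-8-8-24 means in absolute terms, the timings can be converted from clock cycles to nanoseconds. The sketch below assumes the usual reading of the four numbers as CL-tRCD-tRP-tRAS and a memory clock equal to half the data rate; the helper name `timing_ns` is ours.

```python
# Assumption: the four timings are CL-tRCD-tRP-tRAS, in memory clock cycles.
TIMING_NAMES = ("CL", "tRCD", "tRP", "tRAS")


def timing_ns(cycles, data_rate_mts):
    """Convert a timing in memory clock cycles to nanoseconds."""
    clock_mhz = data_rate_mts / 2  # DDR: two transfers per clock cycle
    return cycles / clock_mhz * 1000


# The two Phenom II X4 965 configurations, both at DDR3-1333.
for label, timings in (("7-7-7-20", (7, 7, 7, 20)), ("8-8-8-24", (8, 8, 8, 24))):
    parts = ", ".join(
        f"{name} {timing_ns(cycles, 1333):.1f} ns"
        for name, cycles in zip(TIMING_NAMES, timings)
    )
    print(f"{label}: {parts}")
```

At the same frequency, each extra cycle of CAS latency adds about 1.5 ns.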

3D visualization

This group of applications includes several programs that need more than 4GB of memory, so worse timings (for the Phenom II) and even lower memory frequency (for the Core i7 860) are compensated by the increased memory volume (6GB). However, memory latency in the triple-channel mode of the Core i7 920 grows too much, so this processor loses ground with 6GB of memory. The fastest mode for the LGA1156 processor is 1333 MHz, but with 8GB of memory.

3D rendering

Rendering is not very sensitive to memory size or timings, so test results depend primarily on the processor itself. For the same processor they are usually identical to within measurement error, so a "slower" configuration can even look better than the others.

Scientific calculations

The picture is similar to 3D visualization: such programs sometimes require more than 4GB of memory. However, memory latency in the triple-channel mode of LGA1366 processors surges, so this mode is even slower. In the other cases there are some performance gains from using more memory, even with slower modules.

Bitmap processing

There is a small performance gain from the increased memory volume (Adobe Photoshop) for AM3 and LGA1156, and a small performance drop in the triple-channel mode for LGA1366.

As for Photoshop, we publish detailed data on the most illustrative operation (Convert) for four configurations with Core i7 860:

Configuration    4GB, 1333    6GB, 1066    8GB, 1333
Convert time     0:07:58      0:04:37      0:04:37

There's a noticeable performance drop when memory is reduced from 6GB to 4GB. However, expanding memory from 6GB to 8GB changes nothing. Why? The size of the processed file is chosen for systems with 6GB of memory :) If it had been smaller, 4GB would have been fine. If bigger, even 6GB would have been insufficient. As we can see, the amount of memory is a critical parameter for Adobe Photoshop. Expanding memory has an undeniable effect, but only until all simultaneously processed files fit into memory. On the other hand, there is no practical upper limit on how much may be needed. This application is not designed for processing photos taken with a phone camera, so even 8-12GB will come in handy. Sometimes even larger memory volumes are useful, no matter the cost. Two 4GB modules cost about as much as Photoshop itself, so we cannot imagine a professional who bought this program but could not invest in equipment.
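The size of the drop is easier to judge as a ratio. This short sketch converts the three Convert timings quoted above to seconds and compares each with the 6GB result; the helper name `to_seconds` is ours.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Convert timings for the Core i7 860, as quoted in the text.
convert_times = {
    "4GB, 1333": "0:07:58",
    "6GB, 1066": "0:04:37",
    "8GB, 1333": "0:04:37",
}

baseline = to_seconds(convert_times["6GB, 1066"])

for config, hms in convert_times.items():
    secs = to_seconds(hms)
    print(f"{config}: {secs} s, {secs / baseline:.2f}x the 6GB time")
```

The 4GB run takes roughly 1.7 times as long as the 6GB run, while the 8GB run takes exactly the same time.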