Windows 7 Application Performance

3dsmax 9

Today's desktop processors are more than fast enough to do professional-level 3D rendering at home. To look at performance under 3dsmax we ran the SPECapc 3dsmax 8 benchmark (only the CPU rendering tests) under 3dsmax 9 SP1. The results reported are the rendering composite scores.

Offline 3D rendering applications make some of the best use of CPU cores; unfortunately, our test here doesn't scale all that well. We only see a 7% increase over the 2600K. If we look at a more modern 3D workload however...

Cinebench 11.5

Created by the Cinema 4D folks we have Cinebench, a popular 3D rendering benchmark that gives us both single and multi-threaded 3D rendering results.

Single threaded performance is marginally better than the 2600K thanks to the 3960X's slightly higher max turbo speed. What's more important than the performance here is the fact that the 3960X is able to properly power gate all idle cores and give a single core free rein over the chip's TDP. Turbo is alive and well in SNB-E, just as it was in Sandy Bridge.

Here the performance gains are staggering. The 3960X is 53% faster than the 2600K and 19% faster than Intel's previous 6-core flagship, the 990X. The Bulldozer comparison is almost unfair: the 3960X is 75% faster (granted, it is also multiple times the price of the FX-8150).

7-Zip Benchmark

While Cinebench shows us multithreaded floating point performance, the 7-zip benchmark gives us an indication of multithreaded integer performance:

Here we see huge gains over the 2600K (58%), indicating that the larger cache and added memory bandwidth help a bit on top of the increase in core count. The advantage over the 990X is only 7%. This gives us a bit of a preview of what we can expect from SNB-EP Xeon server performance.

PAR2 Benchmark

Par2 is an application used for reconstructing downloaded archives. It can generate parity data from a given archive and later use it to recover the archive.

Chuchusoft took the source code of par2cmdline 0.4 and parallelized it using Intel’s Threading Building Blocks 2.1. The result is a version of par2cmdline that can spawn multiple threads to repair par2 archives. For this test we took a 708MB archive, corrupted nearly 60MB of it, and used the multithreaded par2cmdline to recover it. The scores reported are the repair and recover time in seconds.
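To illustrate the general idea of generating parity data and later using it for recovery, here is a minimal Python sketch based on plain XOR parity across equal-sized blocks. This is a deliberate simplification and an assumption for illustration only: PAR2 itself uses Reed-Solomon coding and can repair many damaged blocks, whereas this sketch can rebuild just one. The function names are hypothetical and are not part of par2cmdline.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(blocks):
    """Build a single parity block from all data blocks (hypothetical helper)."""
    return xor_blocks(blocks)

def recover_block(blocks, parity, missing_index):
    """Rebuild the one block at missing_index from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_blocks(survivors + [parity])

# Toy "archive" split into three equal-sized blocks.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(data)

# Simulate corruption by losing the middle block, then repair it.
damaged = list(data)
damaged[1] = None
restored = recover_block(damaged, parity, 1)
```

XOR parity keeps the example short; a real repair workload like the one timed here involves far heavier arithmetic per block, which is why it benefits from being spread across threads.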

Here we see a 40% increase in performance over the 2600K and FX-8150.

TrueCrypt Benchmark

TrueCrypt is a very popular encryption package that offers full AES-NI support. The application also features a built-in encryption benchmark that we can use to measure CPU performance with:

As both the 990X and 3960X have AES-NI support, both are equally capable at cranking through an AES workload. Per core performance doesn't appear to have changed all that much with the move to Sandy Bridge, so here we have a situation where the 3960X is much faster than the 2600K but no faster than the 990X. I suspect these types of scenarios will be fairly rare.

x264 HD 3.03 Benchmark

Graysky's x264 HD test uses x264 to encode a 4Mbps 720p MPEG-2 source. The focus here is on quality rather than speed, thus the benchmark uses a 2-pass encode and reports the average frame rate in each pass.

Single threaded performance isn't significantly faster than your run-of-the-mill Sandy Bridge, which means the first x264 HD pass doesn't look all that impressive on SNB-E.

The second pass, however, stresses all six cores far more readily, resulting in a 47.5% increase in performance over the 2600K. Even compared to the 990X there's a 15% increase in performance.

Adobe Photoshop CS4

To measure performance under Photoshop CS4 we turn to the Retouch Artists’ Speed Test. The test does basic photo editing; there are a couple of color space conversions, many layer creations, color curve adjustment, image and canvas size adjustment, unsharp mask, and finally a gaussian blur performed on the entire image.

The whole process is timed and thanks to the use of Intel's X25-M SSD as our test bed hard drive, performance is far more predictable than back when we used to test on mechanical disks.

Time is reported in seconds and the lower numbers mean better performance. The test is multithreaded and can hit all four cores in a quad-core machine.

Our Photoshop test is multithreaded but there are only spikes that use more than four cores. That combined with the short duration of the benchmark shows no real advantage to the 3960X over the 2600K. Sandy Bridge E is faster than Intel's old 6-core solution though.

Compile Chromium Test

You guys asked for it and finally I have something I feel is a good software build test. Using Visual Studio 2008 I'm compiling Chromium. It's a pretty huge project that takes over forty minutes to compile from the command line on the Core i3 2100. But the results are repeatable and the compile process will stress all 12 threads at 100% for almost the entire time on a 980X so it works for me.

Our compile test is extremely well threaded, so once again it does well on the 3960X. The gains aren't as big as what we saw in some of our earlier 3D/transcoding tests, but if you're looking to build the fastest development workstation you'll want a Sandy Bridge E.

Excel Monte Carlo

Multithreaded compute does well on SNB-E regardless of the type of application. Excel is multithreaded and if you have a beefy enough workload, you'll see huge gains over the 2600K.

Computers are only getting faster one way today, and that is more cores. Designing for a fixed maximum number of cores is merely stupidity in today's world.

That said, developing games that support multiple cores might be somewhat more difficult than designing highly concurrent applications that process data or requests for data. (I can't say for sure as I have only briefly touched the game development part of the industry, but I work with the other part on a daily basis.)

But while you might save development cost right now by going down that road, you will spend the savings once you suddenly have to think in 8 cores.

Carrying technical debt is never a good thing (and designing with a set number of cores in mind can, in my programming experience, only add to it). It will only get more expensive to remove down the road; that has been proven true again and again.

And that is even considering that Frostbite 3 might be developed from the ground up: they still have to think up the concept again, whereas had they gone for high concurrency, that concept would already be in place for the next version.

Given that QPI @ 3.2 GHz, 205 Gb/s (25.6 GB/s), also handled the PCI load, can't we have something in the middle? I'm still a little confused: is DMI 2.0 still mainly a simple parallel interface, whereas QPI is a high-speed serial interface?
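For reference, the 25.6 GB/s figure quoted in the comment above can be reproduced with a little arithmetic. The sketch below assumes the commonly cited QPI parameters, which the comment itself does not state: data transferred on both clock edges, 16 data bits (2 bytes) per transfer in each direction, and both directions counted together.

```python
# Assumed QPI link parameters (not stated in the comment above).
CLOCK_GHZ = 3.2            # link clock quoted in the comment
TRANSFERS_PER_CLOCK = 2    # assumption: double data rate
BYTES_PER_TRANSFER = 2     # assumption: 16 data bits per direction
DIRECTIONS = 2             # assumption: both directions counted together

gigatransfers = CLOCK_GHZ * TRANSFERS_PER_CLOCK         # GT/s per direction
per_direction_gbytes = gigatransfers * BYTES_PER_TRANSFER
total_gbytes = per_direction_gbytes * DIRECTIONS         # GB/s, both directions
total_gbits = total_gbytes * 8                           # Gb/s, both directions

print(f"{gigatransfers:.1f} GT/s, {total_gbytes:.1f} GB/s, {total_gbits:.1f} Gb/s")
```

Under these assumptions the total works out to 25.6 GB/s, or 204.8 Gb/s, which the comment rounds to 205 Gb/s.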

Clearly you didn't read a single one of my points, or you simply lack the understanding.

Applications are not developed to target specific cores; your OS handles all that. It is a simple matter of pushing out jobs in threads or processes.

Processing in 10, 100 or 1000 threads/processes is no more difficult than doing it in 4... it just requires you have enough "JOBS" to process (and that term was deliberately chosen)...

This requires a different mindset though, and it might be more difficult to think of games that way right now, mostly because developers have been used to running everything in that single game loop, but doing it now could be a rather good ROI down the road.
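The job-based approach described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `process_job` and `run_jobs` are hypothetical names, and the work done per job is a placeholder. The point is that the code submits jobs to a pool and never mentions a specific core count.

```python
from concurrent.futures import ThreadPoolExecutor

def process_job(job_id):
    """Hypothetical unit of work; a stand-in for one independent task."""
    return job_id * job_id

def run_jobs(job_ids, workers=None):
    """Push every job to a worker pool and collect the results in order.

    workers=None lets the executor choose a pool size based on the machine,
    so the same code runs unchanged on 4, 8, or more cores.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_job, job_ids))

# 1000 jobs are submitted the same way 4 would be.
results = run_jobs(range(1000))
```

One caveat on the design choice: in CPython, threads will not speed up CPU-bound Python code because of the global interpreter lock, so a process pool or a natively threaded language would be the usual choice for real compute work. The structure of the code, a queue of independent jobs with no fixed core count, stays the same.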

How about overclocking with turbo boost enabled? I mean, if the 3960X is stable at 4.4GHz, can it be stable at 4.8GHz when games or applications only use four cores? Then it would overclock and perform as well as a 2600K with four heavy threads.

Guys, there are always people with more money than brains who will purchase just about anything. That's not the point. Having the fastest CPU makes it a status symbol, and whoever makes it can have the luxury of pricing it in the $1000 range, for fools to buy. I don't know about CPUs, but I do know that the top performing GPUs (HD6990 and GTX590) are sold in extremely low volumes, both because of the relatively low ROI and because the market is so small that inventory is scarce to begin with. So, you may be right on the CPU side, but in general, you're both wrong.

This said, my point was that if AMD had performed and delivered a good CPU instead of the FX8150, OR the FX8150 at a good price point ($170, not $279), then Intel would have had a tougher time pushing out the 3960X at this price, AND it would have had to work harder on the chipset. However, because of the huge lead it has over AMD, Intel can now comfortably rebrand a "mid range" chipset and shove it at the customer, who has no choice but to take it if they want the best CPU.

I agree that having only two 6Gbps SATA ports is a disappointment. It is interesting, though, to run two SSDs in RAID 0 on the Intel controller. With two Kingston SSDs I manage really good figures (Crystal Disk Mark), 4000MB test: 1040MB/s read and 621MB/s write (SEQ) / 675 and 481 (512K) / 28 and 253 (4K) / 279 and 405 (4K QD32). I never managed this kind of throughput on the Z68 or P67 on-board controllers. These numbers are getting close to hardware RAID controllers like ARECA and LSI. I would have been interested to see where the bottleneck lies if X79 had had more ports. Even though X58 is 3Gbps SATA, you had no problem bottlenecking the Intel RAID controller at around 800MB/s.