Bumpy road to multi-core: Ars reviews the 12-core 2010 Mac Pro

What has twelve cores, twenty-four threads, a brand new GPU, and still doesn't …

Twelve cores and twenty-four threads—that's what I'm sitting in front of. Even after owning an 8-core Nehalem Xeon Mac Pro, I just wasn't prepared for the 8 extra threads in my shiny new 12-core Westmere Xeon Mac Pro. It's just that crazy. Sometimes, you look up at the menu bar and think that Iran has Photoshopped extra iStat CPU bars up there to convince you of this machine's awesome powers:

Every time that happens, I hear a Black Sabbath guitar solo off in the distance, and my mouse hand does this of its own accord:

But then the smoke machine fog dies down, and I'm left with the rest of my programs that don't cause multi-core god rays to appear. This is life with many cores.

Custom-built Mac Pro 2010 specs

Dual-socket six-core 2.66GHz Westmere Xeon Mac Pro

15GB RAM

OCZ Vertex Turbo 120GB system disk

2TB striped RAID working disk

ATI 5870 1GB

dual NEC 2490WUXi LCDs at 1920x1200

Comparison: 2009 Mac Pro specs

Dual quad-core 2.66GHz Nehalem Xeon Mac Pro

24GB RAM

OCZ Vertex Turbo 120GB system disk

2TB striped RAID working disk

ATI 4870 512MB

dual NEC 2490WUXi LCDs at 1920x1200

Both machines are using the same hard drives—I reformatted the system disk and put it in the new Mac Pro after running the benchmarks on the 2009 Mac Pro.

The Hardware

The 2010 Mac Pro now comes with an 802.11n Wi-Fi card by default. I always use wired networking, but Wi-Fi was a dumb thing to make an optional upgrade in the first place, because there are times when you find you need wireless. The new Mac Pros also come with the Magic Mouse.

There isn't much else to say about the 2010 internals that wasn't said already in my 2009 Mac Pro review. That's no complaint—the internals of this machine are great, and not much was in need of revising. It's still the easiest Mac ever to upgrade, and all the goodies like the thumbscrew PCI card block are still there:

The thumbscrew block with a sexy candid shot of our lovely new ATI 5870.

The memory slot count is unchanged at eight, so users need to be careful not to fill them just for the sake of filling them. My 15GB RAM allocation may seem like an odd number, but it's the proper pairing:

Combining three 1GB with three 4GB modules means that the memory is running in triple-channel mode. Filling all eight slots wouldn't be the best way to go.

The ATI Radeon 5870

Since I do 3D work, I upgraded from the Radeon 5770 to the Radeon 5870. On a purely aesthetic level, it's a beautiful design:

If a leather jacket from the original V series and HAL from 2001 had a baby on an ugly particle-board desk, that's what it would look like. I should have been a product photographer.

The 5870 has three outputs: one dual-link DVI and two Mini DisplayPorts, so it should drive three screens at resolutions as high as the 27" display Apple just released. I wanted to test it with three LCDs and thought it would work without any hitches, since my triple 1920x1200 screens are on the conservative side. After receiving a second single-link Mini DisplayPort-to-DVI adapter, I've only been able to use two and, it seems, this is not a bug. If you want to connect a second DVI display via the Mini DisplayPorts (no matter the resolution), you have to use the more expensive Mini DisplayPort-to-dual-link-DVI adapter. Apple's docs covering this issue are here:

You are aware that you've seriously messed up the memory subsystem by going with 15GB of RAM, right? Nehalem processors REALLY want the banks to be full and have all matching DIMMs; any other memory config can have a 20-50% drag on memory throughput vs. an ideal config.

It's been a while since I've used Maya and ZBrush, but don't they run better on a PC anyways?

You didn't read the article. mental ray for Maya is faster to render on OS X but the realtime 3D is faster in Windows.

afidel wrote:

You are aware that you've seriously messed up the memory subsystem by going with 15GB of RAM, right? Nehalem processors REALLY want the banks to be full and have all matching DIMMs; any other memory config can have a 20-50% drag on memory throughput vs. an ideal config.

I timed the render speeds with 6 matching DIMMs and with 3x1GB + 3x4GB, and it was the same. Matching sets of three are what's needed, and that's what was used. I spoke to a representative at datamem.com and they confirmed this was appropriate.
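If anyone wants to sanity-check raw memory throughput directly instead of inferring it from render times, here's a rough STREAM-style triad in plain C (the array size, repeat count, and scalar are arbitrary assumptions, not numbers from my testing):

/* Minimal STREAM-style triad: a[i] = b[i] + s * c[i].
 * Arrays are large enough to stay well outside the CPU caches,
 * so the timing mostly reflects sustained memory bandwidth. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N (64 * 1024 * 1024)   /* 64M doubles per array, 512MB each */
#define REPS 10

static double now(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double best = 1e30;
    for (int r = 0; r < REPS; r++) {
        double t0 = now();
        for (size_t i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];
        double t = now() - t0;
        if (t < best) best = t;
    }
    /* each pass touches 3 arrays of N doubles (2 reads + 1 write) */
    double gb = 3.0 * N * sizeof(double) / 1e9;
    printf("best triad: %.3f s, ~%.1f GB/s\n", best, gb / best);
    free(a); free(b); free(c);
    return 0;
}

A single thread won't saturate a dual-socket Westmere's memory controllers, but a 20-50% drop between DIMM configurations would still show up.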

I wonder what the Shark profile for that Handbrake encode would look like. x264 is absolutely capable of saturating that many cores - the problem is most likely a decoding bottleneck inside Handbrake or its iPad preset misconfiguring x264.

Wow. Not surprised, but just think of the AMD hardware you can get for that price. Using dual Socket G34 processors, you could get 16-24 full cores for the price of a quad-core from Apple. Or likely something matching or exceeding the high-end option for a whole lot cheaper. Perhaps if I am bored later I will post numbers when the Apple store is back up.

So to sum up your review: unless you're doing fancy 3D work or video editing, extra cores get wasted.

For now, yes.

Actually, it's going to stay like that for a bit longer, too. Unless you do video or 3D rendering or massive calculations that can be threaded easily, 4, 8, or 12 cores won't make much of a difference. It's specialized hardware for people with specialized needs. The average consumer is OK with 2-4 cores 90+% of the time.

These reviews just don't seem very meaningful to me. The conclusions drawn could be expected after just comparing a new Xeon vs. an old Xeon, OpenCL on Nvidia vs. ATI, and a 5870 vs. a 4870.

Windows has equivalent graphics design software, so why not add a similarly configured machine (custom build/God Box) and show some meaningful results that illustrate whether paying the Apple premium is worth it when it comes to getting a job done. Simply showing a Mac Pro 2009 vs. a Mac Pro 2010 seems worthless ... of course the new one outperforms the old one, and "multicore waste" could be argued comparing a Mac Pro 2008 to a Mac Pro 2009 (i.e., it's not something unique to Apple; it's a fact of life until devs decide to code for it).

Well... I must admit I find it slightly amusing how excited he got over the 5870 when I've had two of those in my laptop for quite some time now...

And I find it even more amusing that you think a mobile 5870 is a 5870 (it's actually a little bit slower than a 5770.) Even with 2x in Crossfire you're still going to be a little slower than a 5870...

If we assume the working hour of a halfway talented designer is worth, say, US$150, and said designer would spend 6 hours getting all the components together, price-compared and built, along with another 2 hours (likely more) of getting OS X to run on that machine, that's a cost of US$1,200 that you'd have to add to the self-built machine, compared to the 10 minutes it takes to order it from Apple. And that's assuming the setup will run perfectly fine from then on.

Personally, I like to spend money so I don't have to give a shit about all that.

I'm an audio professional. Would it be possible to run some demanding Audio Unit effects in Logic to see how the cores get used? Back in the Emagic days, when Logic was cross-platform, benchmarks between systems were run by seeing how many instances of the "Platinum Verb" plugin could be run before audio choked. I'd love to know how audio DSP gets divvied up on a many-core system.
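In the meantime, one crude way to watch how the load spreads while stacking plugin instances is to poll the Mach per-processor counters. A minimal sketch (OS X only; the two-second sample window is an arbitrary choice, not anything Logic-specific):

/* Print per-core busy percentage over a short sample window (OS X).
 * Useful for eyeballing how plugin/DSP work spreads across cores. */
#include <stdio.h>
#include <unistd.h>
#include <mach/mach.h>

static kern_return_t snapshot(processor_cpu_load_info_t *info, natural_t *ncpu) {
    mach_msg_type_number_t count;
    return host_processor_info(mach_host_self(), PROCESSOR_CPU_LOAD_INFO,
                               ncpu, (processor_info_array_t *)info, &count);
}

int main(void) {
    processor_cpu_load_info_t t0, t1;   /* two snapshots; leaked on exit, fine for a one-shot tool */
    natural_t ncpu;

    if (snapshot(&t0, &ncpu) != KERN_SUCCESS) return 1;
    sleep(2);                           /* sample window */
    if (snapshot(&t1, &ncpu) != KERN_SUCCESS) return 1;

    for (natural_t i = 0; i < ncpu; i++) {
        unsigned busy = 0, total = 0;
        for (int s = 0; s < CPU_STATE_MAX; s++) {
            unsigned d = t1[i].cpu_ticks[s] - t0[i].cpu_ticks[s];
            total += d;
            if (s != CPU_STATE_IDLE) busy += d;
        }
        printf("core %2u: %3.0f%% busy\n", (unsigned)i,
               total ? 100.0 * busy / total : 0.0);
    }
    return 0;
}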

Just a note for anyone on Ars actually looking to purchase this (or really anything) for professional purposes: thanks to the new small business act that was passed a few weeks ago, you can take accelerated depreciation on capital assets (like workstations and PCs) for anything bought before the end of 2010. So when you do your taxes, you can get about double what you would normally get for depreciating new hardware this year.

It's not a ton of money, but if you're looking to buy in that time frame anyway, far better to do it late 2010 than early 2011.

If we assume the working hour of a halfway talented designer is worth, say, US$150, and said designer would spend 6 hours getting all the components together, price-compared and built, along with another 2

Except the "talented designer" doesn't have to do his own legwork. All he has to do is know his tools. He has to know the hardware well enough to know what tool to buy. He has to know the hardware well enough to determine whether or not he's getting overcharged for an under-performing machine.

I wonder what the Shark profile for that Handbrake encode would look like. x264 is absolutely capable of saturating that many cores - the problem is most likely a decoding bottleneck inside Handbrake or its iPad preset misconfiguring x264.

Yes. Those Handbrake numbers just seem wrong. Any threaded transcoder should be able to do way better than that even on much cheaper hardware.
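For what it's worth, here's a bare-bones sketch of how the thread count reaches the encoder through libx264's own C API rather than Handbrake's wrapper (the preset and 1280x720 resolution are just placeholders):

/* Open an x264 encoder with automatic threading (libx264 C API).
 * With i_threads left at X264_THREADS_AUTO (0), x264 sizes its frame-thread
 * pool from the number of logical cores, so 24 threads should get used. */
#include <stdio.h>
#include <x264.h>

int main(void) {
    x264_param_t p;
    if (x264_param_default_preset(&p, "medium", NULL) < 0) return 1;

    p.i_width   = 1280;               /* placeholder resolution */
    p.i_height  = 720;
    p.i_csp     = X264_CSP_I420;
    p.i_threads = X264_THREADS_AUTO;  /* 0: let x264 pick per machine */

    if (x264_param_apply_profile(&p, "high") < 0) return 1;

    x264_t *enc = x264_encoder_open(&p);
    if (!enc) return 1;
    printf("encoder opened with i_threads=%d (0 = auto)\n", p.i_threads);
    x264_encoder_close(enc);
    return 0;
}

If the iPad preset really were strangling the encode, this thread setting (or the decode side feeding it) is where I'd look first.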

I've never really understood why Apple bothers focusing on increasing the number of CPU threads/cores yet sticks with comparatively weak GPU options. E.g. I'd assume having two Radeon 5870s in Crossfire would be more useful and price-efficient than having a combined total of 12 cores from the CPUs. Sure not all programs may utilize a high end graphics card configuration but I wouldn't say there are many that would utilize 24 threads.

If we assume the working hour of a halfway talented designer is worth, say, US$150, and said designer would spend 6 hours getting all the components together, price-compared and built, along with another 2

Except the "talented designer" doesn't have to do his own legwork. All he has to do is know his tools. He has to know the hardware well enough to know what tool to buy. He has to know the hardware well enough to determine whether or not he's getting overcharged for an under-performing machine.

Wrong. See, the geeks assume everyone cares about hardware as much as they do. They don't. Hence they have more time to earn the money to buy whatever they like using. The main tool of a "talented designer" is his software, if anything. Now, I am sure you can find software that's way cheaper than what Adobe and Autodesk offer. You'll have a hard time finding a designer willing to use it, though.

JEDIDIAH wrote:

The rest is just telling some vendor online what components to use.

That's like 5 minutes of "work".

Cool. Please provide me with the link to a vendor that builds a machine exactly like the Mac Pro that comes with OS X 10.6 preinstalled and ready to run. Preferably one that is not going out of business in the next 6 months. And I'd like the same, exact hardware features, cause I like my temperature sensors to be in my RAMZ so the vents aren't annoying me all the time.

Or buy an EFi-X dongle and roll your own Mac. I would also like to see more inter-platform competition, not just old Mac vs. new Mac. It would be nice to see a Linux god box vs. a similarly spec'd Mac box. Or Windows, for that matter. Nevertheless, I found myself riveted and lost sleep reading the article. This is the reason I religiously check arstechnica.com damn near every 2 hours. I. Love. The. Articles. Keep up the great work.

I've never really understood why Apple bothers focusing on increasing the number of CPU threads/cores yet sticks with comparatively weak GPU options. E.g. I'd assume having two Radeon 5870s in Crossfire would be more useful and price-efficient than having a combined total of 12 cores from the CPUs. Sure not all programs may utilize a high end graphics card configuration but I wouldn't say there are many that would utilize 24 threads.

Upping the CPUs is easier; they'd need decent 3D drivers first for the GPUs to make sense. And most of their markets are in audio, video, and 2D design, with a subset doing actual 3D work that requires quick realtime 3D. Once OpenCL and the like are used more across all the apps, they'll likely go multi-GPU, too. Also, I wouldn't be surprised if adding more video cards would mean a new case, a new motherboard, and the like—if the Mac Pro isn't selling millions, that's likely a cost they want to avoid for as long as they can.
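The OpenCL plumbing for that future is already in 10.6, though. A minimal sketch of the first step any multi-GPU-aware app would take, just enumerating the GPUs the runtime can see (plain OpenCL C API; nothing Apple-specific beyond the header path):

/* List the GPUs OpenCL can see, with compute units and global memory.
 * On OS X the header lives in the OpenCL framework; elsewhere use <CL/cl.h>. */
#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main(void) {
    cl_platform_id platform;
    cl_device_id devices[8];
    cl_uint ndev = 0;

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) return 1;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 8, devices, &ndev) != CL_SUCCESS)
        return 1;

    for (cl_uint i = 0; i < ndev; i++) {
        char name[128];
        cl_uint units;
        cl_ulong mem;
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof name, name, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof units, &units, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof mem, &mem, NULL);
        printf("GPU %u: %s, %u compute units, %llu MB\n",
               (unsigned)i, name, (unsigned)units, (unsigned long long)(mem >> 20));
    }
    return 0;
}

An app that wanted to split work across two 5870s would take the device list from a call like this and create a context and queue per device; the hard part is the scheduling, not the discovery.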

With Apple's continued focus on consumer mobile electronics, is there really ANY reason to buy a Mac, given that Windows-based PCs do everything a Mac does faster and cheaper?

I'm not really sure what the first half of that sentence has to do with the second, but seriously? Is it so hard to understand that some people just prefer Macs? Crucially, you also missed out the third corner of the triangle: "better."

I have hackintoshed a lot, and I would hesitate to say I could build it cheaper. Your processors alone cost $1,000 each, and you have to use a really expensive motherboard plus ECC DIMMs, the PSU is 1,000W, and the build quality is excellent. The costs add up.

Can you please justify your reasoning for saying 5870's memory is insufficient?

I've studied GPU benchmarks for years, and I have yet to find one instance where the reference memory amount doesn't provide the maximum benefit. Time and again, I've seen video card makers come out with models with more than the reference memory, and in every single case the benefit amounts to about 2 extra frames per second. These models can be described in one word: gimmicks. They're for people who think performance increases linearly with memory amount (which is actually a lot of people).

Compare the bar graphs of the 2GB and 1GB versions of the 5870. And keep in mind that the card reviewed in that link is factory-overclocked, so even that tiny performance increase is mostly unrelated to the extra memory.

Can you please justify your reasoning for saying 5870's memory is insufficient?

I've studied GPU benchmarks for years, and I have yet to find one instance where the reference memory amount doesn't provide the maximum benefit. Time and again, I've seen video card makers come out with models with more than the reference memory, and in every single case the benefit amounts to about 2 extra frames per second. These models can be described in one word: gimmicks. They're for people who think performance increases linearly with memory amount (which is actually a lot of people).

Compare the bar graphs of the 2GB and 1GB versions of the 5870. And keep in mind that the card reviewed in that link is factory-overclocked, so even that tiny performance increase is mostly unrelated to the extra memory.

His requirements and the focus of the Mac Pro are professional applications, not games. In this case the issue is with Mudbox.