Compute Performance

As always, we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the few games with a benchmark that can isolate DirectCompute usage and its resulting performance.
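The article doesn't say which compressed format Civ V decodes, so the following is only a hedged illustration. It is a minimal Python sketch of decoding one block of BC1 (DXT1), the standard DirectX block-compression format, used here purely as a stand-in. It shows why this kind of work suits GPU compute: every 4x4 block decodes independently from just 8 bytes, so a compute shader could plausibly assign one thread per block or per texel. The function names are hypothetical, and the interpolation uses truncating integer division, so exact rounding may differ slightly from a given GPU's decoder.

```python
import struct


def rgb565_to_rgb888(c: int) -> tuple[int, int, int]:
    """Expand a 16-bit RGB565 color to 8 bits per channel by bit replication."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_bc1_block(block: bytes) -> list[list[tuple[int, int, int]]]:
    """Decode one 8-byte BC1 (DXT1) block into a 4x4 grid of RGB texels."""
    if len(block) != 8:
        raise ValueError("a BC1 block is exactly 8 bytes")
    # Two little-endian RGB565 endpoints followed by 32 bits of 2-bit indices.
    c0_raw, c1_raw, indices = struct.unpack("<HHI", block)
    c0 = rgb565_to_rgb888(c0_raw)
    c1 = rgb565_to_rgb888(c1_raw)
    if c0_raw > c1_raw:
        # Four-color mode: two extra colors interpolated at 1/3 and 2/3.
        c2 = tuple((2 * a + b) // 3 for a, b in zip(c0, c1))
        c3 = tuple((a + 2 * b) // 3 for a, b in zip(c0, c1))
    else:
        # Three-color mode: the midpoint plus black (transparent when BC1 alpha is used).
        c2 = tuple((a + b) // 2 for a, b in zip(c0, c1))
        c3 = (0, 0, 0)
    palette = [c0, c1, c2, c3]
    # Texel i (row-major) uses bits 2i..2i+1 of the index word.
    return [[palette[(indices >> (2 * (4 * y + x))) & 0x3] for x in range(4)]
            for y in range(4)]
```

For example, decode_bc1_block(bytes([0xFF, 0xFF, 0, 0, 0, 0, 0, 0])) yields a 4x4 grid of white (255, 255, 255) texels, since every index is 0 and the first endpoint is 0xFFFF.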

Our Civilization V compute benchmark is just that, a compute benchmark, so our results aren’t too surprising. This is one of the few compute tests NVIDIA does well in, so the GTX 650 Ti Boost is close to both Radeon cards and not all that far behind the GTX 660 either.

Our next benchmark is LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL-accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years, as it maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.
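As a hedged illustration of why ray tracing maps well to GPUs (this is not SmallLuxGPU's code), here is a minimal Python sketch that casts one primary ray per pixel at a single sphere. Each pixel's ray is computed without reference to any other pixel, which is the property that lets a GPU run thousands of them in parallel. The scene, camera setup, and function names are all hypothetical.

```python
import math


def ray_sphere_t(origin, direction, center, radius):
    """Distance along a normalized ray to the nearest sphere hit in front of it, or None."""
    # Solve |origin + t*direction - center|^2 = radius^2 with |direction| = 1,
    # i.e. t^2 + 2*b*t + c = 0 where b = direction . (origin - center).
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None  # the ray misses the sphere entirely
    root = math.sqrt(disc)
    t0, t1 = -b - root, -b + root
    if t0 > 0:
        return t0
    return t1 if t1 > 0 else None


def render_hit_mask(width, height, center=(0.0, 0.0, -3.0), radius=1.0):
    """Return a height x width grid of booleans: does each pixel's ray hit the sphere?"""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Independent per-pixel work: on a GPU this body would be one thread.
            # Map the pixel center to [-1, 1] on an image plane at z = -1.
            px = (x + 0.5) / width * 2 - 1
            py = 1 - (y + 0.5) / height * 2
            length = math.sqrt(px * px + py * py + 1)
            d = (px / length, py / length, -1 / length)
            row.append(ray_sphere_t((0.0, 0.0, 0.0), d, center, radius) is not None)
        mask.append(row)
    return mask
```

Turning the nested loops into one GPU thread per pixel is, loosely, what an OpenCL renderer does; production renderers add many bounces, materials, and acceleration structures on top.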

With LuxMark we quite frankly transition into the more typical compute benchmark pattern for NVIDIA, which sees Kepler flopping. The GTX 650 Ti Boost can’t come even remotely close to a 7770, let alone the 7850. On the NVIDIA side, it doesn’t help that, this being a compute benchmark, the GTX 650 Ti Boost gains fairly little over the GTX 650 Ti.

Our 3rd benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

CLBenchmark is much the same story as LuxMark, with NVIDIA cards bringing up the rear. The fluid simulation ends up being the more painful of the two benchmarks for the GTX 650 Ti Boost, which delivers less than one-third the performance of the 7850.

Moving on, our 4th compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that distributes work to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single-precision and double-precision floating-point performance, with single precision being the more useful metric for most consumer cards due to their low double-precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation (explicit) or not (implicit); including them adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home is moving exclusively to OpenCL this year with FAHCore 17.

NVIDIA still struggles at compute in FAHBench (the move to OpenCL isn’t doing them any favors), but it’s not the blowout our last two benchmarks were. Interestingly, explicit mode favors NVIDIA more than implicit mode does, which may mean NVIDIA is handling the overhead better than AMD is. Still, any Folding @ Home users will be far better served by AMD than NVIDIA here.

Our 5th compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself and to accelerate the video encoding step. With video encoding increasingly being offloaded to dedicated DSPs these days, we’re focusing on the editing and compositing process, rendering to a low CPU overhead format (XDCAM EX). This specific test comes from Sony and measures how long it takes to render a video.

Vegas is another OpenCL benchmark, and another benchmark where NVIDIA brings up the rear. Certainly the additional compute performance of the GTX 650 Ti Boost over the GTX 650 Ti is helping NVIDIA here, but it can’t make up for a gap of over 30 seconds.

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in a previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

SystemCompute mixes things up a bit with its multiple sub-benchmarks, but it still doesn’t change the fact that Kepler and the GTX 650 Ti Boost just don’t do that well in most compute scenarios. 68K points is enough to tie the 6870 of all things, itself not a particularly good compute card. Otherwise, AMD sets the bar at over 100K points.

This is probably the first Kepler part Nvidia has launched so far that actually comes off looking like a good value. It's probably where price:performance should've been a year ago, but it has taken nearly a full year for 28nm prices to trickle down to this point. Still, it's pretty amazing how much Nvidia has milked Kepler. They now have 7-8 SKUs (not counting OC variants) in this sub-$300 market based on 3 ASICs (GK104, GK106, GK107). Reminds me of that Mickey Mouse cartoon where they keep slicing razor-thin pieces off a bean. At least this part makes sense, however, and fills a pretty cavernous void in the $150-$200 range between the 660 and the old 650 Ti.

Valid point to be made, however, about the huge disparity in gaming bundles. AMD really is kicking Nvidia's teeth in with their gaming bundles of late. Nvidia's F2P bundle stinks compared to AMD's recent offerings of Crysis 3, BioShock Infinite, Tomb Raider, etc. In a $150-$200 market, where a hot AAA game can easily account for 1/4 to 1/3 of the sticker price, the perceived bundle value does matter. I'm sure AC3 helped the 650 Ti, but that card was a bit underperforming relative to even last-gen cards. The cards in the $150+ range are much better performers, actually providing tangible upgrades over most last-gen parts in this range (GTX 560, 6850, etc.).

Spot on about the huge disparity in the game bundles. In the last two months I've picked up a 2GB 7850 and two 7870s. Without Never Settle Reloaded I honestly probably wouldn't have bought any of them. Sold two of the bundles and kept one.

Good card compared to the 7790, but the 2GB 7850 is still a better buy, thanks to the two games you get with it and the fact that an overclocked 7850 is going to eat the 650 Ti Boost (the 650 Ti Boost does not have much overclocking room, already running at over 1050MHz vs. 860MHz for the 7850). It competes much better at the low end (1GB) than at the higher end.

WHY didn't they just drop the price of the GTX660 to something like a $170 MSRP? I mean, if they're just fusing off part of the card, their cost is the same, if not higher due to whatever labor is involved in fusing off that SMX. This, IMO, is a card that shouldn't even exist. The GTX660 is priced far too high for the performance offered. Random FPS hiccups or no, all my recommendations are AMD until Nvidia stops pricing themselves out of competition. And this is coming from someone who was Nvidia-only for a long time, ever since I had three horrid experiences with ATI in a row back in the day.