Compute Performance

As always we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. While DirectCompute is used in many games, this is one of the few games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Our next benchmark is LuxMark2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

Our 3rd benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we're focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

Moving on, our 4th compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that has work distributed to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home is moving exclusively to OpenCL this year with FAHCore 17.

Our 5th compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself, and in the video encoding step. With video encoding being increasingly offloaded to dedicated DSPs these days we’re focusing on the editing and compositing process, rendering to a low CPU overhead format (XDCAM EX). This specific test comes from Sony, and measures how long it takes to render a video.

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.
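To give a sense of what C++ AMP code looks like, here is a minimal SAXPY sketch using `concurrency::parallel_for_each` and `array_view` (this is an illustrative example of mine, not code from SystemCompute). The `restrict(amp)` kernel is what the Windows runtime lowers to DirectCompute; since `amp.h` is a Visual C++ extension, a plain CPU loop stands in on other compilers.

```cpp
#include <cstddef>
#include <vector>
#ifdef _MSC_VER
#include <amp.h>  // C++ AMP ships with Visual C++; not available elsewhere
#endif

// Illustrative SAXPY (y = a*x + y). On MSVC the work is expressed as a
// C++ AMP kernel, which the Windows runtime executes via DirectCompute;
// on other compilers an equivalent CPU loop is compiled instead.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
#ifdef _MSC_VER
    using namespace concurrency;
    array_view<const float, 1> xv(static_cast<int>(x.size()), x);
    array_view<float, 1> yv(static_cast<int>(y.size()), y);
    parallel_for_each(yv.extent, [=](index<1> i) restrict(amp) {
        yv[i] = a * xv[i] + yv[i];  // one GPU thread per element
    });
    yv.synchronize();  // copy results back to host memory
#else
    // CPU fallback with identical semantics
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = a * x[i] + y[i];
#endif
}
```

The appeal is visible even in a toy like this: the kernel is ordinary C++ lambda syntax, with no separate shader files or explicit buffer management.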

Initially I presumed this to be some "Optimus"-esque dynamic context switching power saving routine. However, the patent explicitly states, "This architecture lets a user run one or more GPUs in parallel, but only for the purpose of increasing performance, not to reduce power consumption." Which struck me as some kind of expansion on the nebulous "hybrid CrossFire" tech that AMD has been playing with since it birthed the 3000-series 780G iGPU.

Based on AMD's previous endeavors in this area on the PC side, I would be skeptical of the benefits/merit of pairing the comparatively anemic iGPUs of Kabini with a presumably Bonaire-derived GPU. As an aside: since SLI/CrossFire work by issuing frames to the next available GPU, if one GPU is substantially faster than the other(s), frames get finished out of order and the IGP's/slower GPU's tardy frames simply get dropped, which may make the final rendered output stuttery/choppy.
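The frame-dropping effect described above can be modeled with a toy simulation (my construction, with made-up frame times, not anything from the comment): frames are issued round-robin to the two GPUs, and any frame that completes after a later-issued frame has already been shown is stale and gets dropped.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Toy model of alternate-frame rendering across two mismatched GPUs.
// Frames are issued round-robin; each GPU finishes its frames serially.
// A frame that completes after a newer frame has already been displayed
// is stale and is dropped. Returns (displayed, dropped).
std::pair<int, int> simulate_afr(int frames, double fast_ms, double slow_ms) {
    double fast_free = 0.0, slow_free = 0.0;
    std::vector<std::pair<double, int>> done;  // (completion time, frame index)
    for (int f = 0; f < frames; ++f) {
        double& gpu_free = (f % 2 == 0) ? fast_free : slow_free;
        gpu_free += (f % 2 == 0) ? fast_ms : slow_ms;
        done.emplace_back(gpu_free, f);
    }
    std::sort(done.begin(), done.end());  // replay frames in completion order
    int last = -1, displayed = 0, dropped = 0;
    for (const auto& [t, f] : done) {
        (void)t;
        if (f > last) { last = f; ++displayed; }
        else          { ++dropped; }  // an older frame finishing late: drop
    }
    return {displayed, dropped};
}
```

With a 4:1 speed mismatch (e.g. 10 ms vs. 40 ms frame times) nearly all of the slow GPU's frames arrive stale and are discarded, whereas with matched GPUs nothing is dropped, which is the commenter's point about pairing an IGP with a much faster discrete card.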

Pairing an IGP with a disproportionately powerful discrete GPU simply does not work for realtime rendering.

It is certainly possible that, with the static nature of the console, and perhaps especially the unified nature of the GDDR5 memory pool, performance gains could be had.

Oh well... my last point of arithmetic was simply that one fully enabled 4-core Kabini would, I suspect, have a 128-shader iGPU. Factor in the much-ballyhooed 8-core CPU in the PS4 and we would have two Kabinis (128 + 128 = 256) plus a Bonaire-derived 896-SP GPU, all on some kind of custom MCM-style packaging, a "semi-custom APU" (rumor had it that the majority of Sony's R&D contributions were in the stacking/packaging department).
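For what it's worth, the commenter's speculative shader math (every count here is the commenter's guess, not a confirmed spec) sums to 1,152 stream processors:

```cpp
// Speculative shader-count arithmetic from the comment above:
// two 128-SP Kabini iGPUs plus an 896-SP Bonaire-class discrete GPU.
constexpr int kabini_igpu_sps = 128;                      // per-Kabini guess
constexpr int igpu_total      = 2 * kabini_igpu_sps;      // 128 + 128 = 256
constexpr int bonaire_sps     = 896;                      // discrete GPU guess
constexpr int combined_total  = igpu_total + bonaire_sps; // 256 + 896 = 1152
```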