Compute Performance

As always, we'll start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the few games with a benchmark that can isolate the use of DirectCompute and its resulting performance.
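To make the idea of an isolated sub-benchmark concrete, here is a minimal sketch of the general pattern: take a fixed set of compressed assets, decompress them repeatedly, and report throughput. This is not Civ V's code and it does not use DirectCompute. It runs on the CPU, with Python's zlib standing in for the texture codec, and the function name `benchmark_decompression` and the synthetic "textures" are assumptions made purely for illustration.

```python
import time
import zlib


def benchmark_decompression(blobs, passes=50):
    """Decompress every blob `passes` times; return (seconds, MB/s)."""
    total_bytes = 0
    start = time.perf_counter()
    for _ in range(passes):
        for blob in blobs:
            # Only the decompression work sits inside the timed region.
            total_bytes += len(zlib.decompress(blob))
    elapsed = time.perf_counter() - start
    return elapsed, total_bytes / elapsed / 1e6


# Hypothetical stand-in "textures": eight 1 MiB buffers, compressed up front
# so that setup cost is excluded from the measurement.
textures = [bytes([i % 251]) * (1 << 20) for i in range(8)]
blobs = [zlib.compress(t) for t in textures]

elapsed, mb_per_s = benchmark_decompression(blobs, passes=10)
print(f"{elapsed:.3f} s, {mb_per_s:.0f} MB/s")
```

Preparing the inputs outside the timed loop and repeating the same workload many times is what lets a test like this isolate one stage of the pipeline rather than measuring the whole game.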

Our next benchmark is LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL-accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

Our third benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

Moving on, our fourth compute benchmark is FAHBench, the official Folding@Home benchmark. Folding@Home is the popular Stanford-backed research and distributed computing initiative that distributes work to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation (explicit includes them), which adds quite a bit of work and overhead. This is another OpenCL test, as Folding@Home is moving exclusively to OpenCL this year with FAHCore 17.

Our fifth compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself, and in the video encoding step. With video encoding being increasingly offloaded to dedicated DSPs these days, we’re focusing on the editing and compositing process, rendering to a low CPU overhead format (XDCAM EX). This specific test comes from Sony, and measures how long it takes to render a video.

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

107 Comments

Well said, and thanks. I no longer visit Dailytech for the same reasons. I enjoy reading comments, since they can offer other perspectives from like-minded people, but unmoderated is worse than nothing at all. This used to be my favorite tech site, but the comments section here has slowly been pushing me to avoid it most of the time.

The 7790 reminds me of the 4770. Sure, that was on a new process node, but it's a late addition to the line designed to take advantage of tweaks, process improvements, etc.

There may be a lot of transistors in a GCN design but I couldn't help but feel that there were power savings to be had. For this reason, I'd hope that their next flagship doesn't exceed the 7970GE's power draw whilst providing a decent performance boost.

For those who want the most power in the smallest package and power drain, look no further than the Radeon 7790. The only disappointment was the heat factor, but more or less the same performance as the 7850 at half the power; that's great. Also, I don't mind that AMD went the 6 GHz VRAM route, because now there is even more reason to get 2 GB, which is especially needed if you apply dozens or hundreds of mods to your games. Also, it's the 128-bit interface that kept the power low, so despite everyone's cussing AMD made the right choices. I have a GTX 460 which easily uses at least 200 watts. This 7790 is almost twice as fast and uses 2.5 times less power. The pricing is acceptable; if they were to include 2 GB by default, then why bother with the 7850? They still want people to buy that one.

Hey Anand, can you guys please do a video quality test? I mean, I haven't seen any such test on any website for over 3 years. So please, can you do a video quality test in movies and games, and please also use low quality video as well, not just top of the line 1080p type videos that would look amazing even on a GeForce 3.