GPU renderers proliferate, show newfound maturity

GPU renderers have hit the market in numbers large enough to make them a …

Despite its very limited shader model and lack of area lights, beta 2.2 of the CUDA-based Octane makes my ZBrushed faux-vinyl lettering look physically realistic with very little set-up.

As SIGGRAPH 2010 winds down, one thing has become obvious: GPU rendering has matured quickly. GPU-based rendering initially got a bad name because public attention had mostly been on real-time ray-tracing implementations for games, where corners are cut to keep frame rates high. In real-time rendering schemes like those shown by Intel, light bounces were limited, color bleeding was missing, and ambient occlusion (a key component of realistic rendering) was also AWOL. The end result looked like something from a ray-tracing white paper from the early '80s: flat, lifeless images that couldn't compete even with games like Uncharted 2, which used straight-up OpenGL with a combination of tricks like baked lighting and screen-space ambient occlusion for realism.
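For anyone unfamiliar with the screen-space trick mentioned above, here's a deliberately stripped-down sketch of the idea; everything in it is invented for illustration, and real game implementations run on the GPU with view-space positions and normals. The gist: darken each pixel by the fraction of nearby pixels in the depth buffer that are noticeably closer to the camera. No rays are traced and no lights are consulted, which is exactly why it's a cheat rather than a simulation.

```cpp
// Toy "screen-space" ambient occlusion over a synthetic depth buffer.
// A nearer box sits in front of a distant background; pixels on the
// background right next to the box pick up an occlusion halo.
#include <cstdio>
#include <vector>

int main() {
    const int W = 16, H = 8;
    std::vector<float> depth(W * H, 10.0f);       // background at depth 10
    for (int y = 2; y < 6; ++y)                   // a nearer "box" in part of the frame
        for (int x = 4; x < 12; ++x)
            depth[y * W + x] = 4.0f;

    const int offsets[8][2] = {{-2,0},{2,0},{0,-2},{0,2},{-1,-1},{1,1},{-1,1},{1,-1}};
    const float bias = 0.5f;                      // how much closer a neighbor must be to count

    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            int occluded = 0, counted = 0;
            for (const auto& o : offsets) {
                int nx = x + o[0], ny = y + o[1];
                if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
                ++counted;
                if (depth[ny * W + nx] + bias < depth[y * W + x]) ++occluded;
            }
            float ao = counted ? 1.0f - float(occluded) / counted : 1.0f;
            putchar(ao > 0.9f ? '.' : ao > 0.6f ? '+' : '#');   // darker where more occluded
        }
        putchar('\n');
    }
    return 0;
}
```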

Over the last couple of years, with help from CUDA and OpenCL, GPU renderers have steadily progressed to exploit the speed of the GPU without sacrificing rendering quality. Now it seems we're spoiled for choice. Only a few were on display here at SIGGRAPH, but the growing list of GPU renderers is already impressive: iRay, Arion, Furryball, Octane (which I often use when I want a fast and stylish render, as seen above), V-Ray RT—and there's even the free and open-source GPU version of Luxrender. There are probably others that I'm missing—it seems like a new GPU renderer comes out every month.

In sharp contrast to the "do the bare minimum" real-time raytracing systems, most of these CUDA and OpenCL GPU renderers are "unbiased." This is a catch-all term for a renderer that doesn't approximate anything—it attempts to do everything that light does all at once and doesn't pull tricks with limited sampling. No, they're not real-time and their "unbiased" quality still depends on the engineering, but they are typically much faster than CPU-based unbiased solutions and they produce very good results very quickly.
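To make "unbiased" a little more concrete, here's a toy sketch of the Monte Carlo structure these renderers share; none of it is any shipping product's code, and the scene (one diffuse sphere under a constant emissive sky) is invented for illustration. The key point is that nothing is baked or composited: each pixel is just the average of many randomly sampled light paths, and the remaining error shows up as noise that fades as more samples accumulate.

```cpp
// Minimal path-tracing estimator: follow random diffuse bounces until the
// ray escapes to the sky, then average many such paths for one camera ray.
#include <cmath>
#include <cstdio>
#include <random>

struct V { double x, y, z; };
V add(V a, V b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
V sub(V a, V b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
V mul(V a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double dot(V a, V b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
V cross(V a, V b) { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
V norm(V a) { return mul(a, 1.0 / std::sqrt(dot(a, a))); }

const V center = {0, 0, -3};
const double radius = 1.0, PI = 3.14159265358979;
const V albedo = {0.7, 0.7, 0.7};   // diffuse reflectance of the sphere
const V sky    = {1.0, 1.0, 1.2};   // the only light source: a constant emissive sky

std::mt19937 rng(7);
std::uniform_real_distribution<double> uni(0.0, 1.0);

// Distance to the sphere along ray (o, d), or -1 for a miss.
double hitSphere(V o, V d) {
    V oc = sub(o, center);
    double b = dot(oc, d), c = dot(oc, oc) - radius * radius, disc = b * b - c;
    if (disc < 0) return -1;
    double t = -b - std::sqrt(disc);
    return t > 1e-4 ? t : -1;
}

// Cosine-weighted direction on the hemisphere around normal n.
V cosineSample(V n) {
    double phi = 2 * PI * uni(rng), r2 = uni(rng), r = std::sqrt(r2);
    V u = norm(cross(std::fabs(n.x) > 0.1 ? V{0, 1, 0} : V{1, 0, 0}, n));
    V v = cross(n, u);
    return norm(add(add(mul(u, std::cos(phi) * r), mul(v, std::sin(phi) * r)),
                    mul(n, std::sqrt(1 - r2))));
}

// One random light path: bounce diffusely until the ray escapes to the sky or
// the depth limit is hit. With cosine-weighted sampling the throughput update
// reduces to a multiply by the albedo.
V radiance(V o, V d) {
    V throughput = {1, 1, 1};
    for (int depth = 0; depth < 4; ++depth) {
        double t = hitSphere(o, d);
        if (t < 0)
            return {throughput.x * sky.x, throughput.y * sky.y, throughput.z * sky.z};
        V p = add(o, mul(d, t)), n = norm(sub(p, center));
        throughput = {throughput.x * albedo.x, throughput.y * albedo.y, throughput.z * albedo.z};
        o = p;
        d = cosineSample(n);
    }
    return {0, 0, 0};
}

int main() {
    const int samples = 4096;
    V sum = {0, 0, 0};
    for (int s = 0; s < samples; ++s)            // Monte Carlo average for one camera ray
        sum = add(sum, radiance({0, 0, 0}, norm({0.1, 0.1, -1})));
    V c = mul(sum, 1.0 / samples);
    printf("estimated radiance: %.3f %.3f %.3f\n", c.x, c.y, c.z);
    return 0;
}
```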

V-Ray RT was on show at SIGGRAPH, and it's one of the more advanced GPU renderers, with support for area lights. It also uses the same materials as V-Ray, so there's no redundant work when switching from the software renderer.

You can't see it in the scaled-down screenshot above, but the elapsed render time running on three GTX 480s was 0.9s—even with the remaining noise, that's very impressive. V-Ray RT will be out in the fall and available as a free addition to V-Ray in a service pack for both Maya (on all platforms) and Max for Windows. It's based on OpenCL.

The photographic tone-mapping is also done by the Octane Renderer. At €50, it's not hard to make Octane pay for itself.

Conventional unbiased renderers will still be around for a while.

The current feature set available in GPU renderers, even in the most complete solutions, is not as flexible or robust as that of conventional renderers. The GPU is very fast but very dumb compared to a CPU, so implementing complex features in a way that can be processed by a GPU will take a lot of work and time. If you compare the unbiased GPU renderers to Maxwell—the gold standard of unbiased CPU renderers—there are still many basic things lacking from the GPU renderers:

Caustics (the focused patterns of light produced after it reflects off or refracts through a material and hits another surface)

There are plenty of other things that GPU renderers lack, but you get the idea: GPU renderers aren't at a stage where they can be considered complete solutions on their own, and lacking something as crucial as caustics shows how ambiguous the term "unbiased" can be. Many GPU renderers are integrated as complements to existing software renderers, like iRay and V-Ray RT. It's hard to tell how long it will take for them to get these crucial features—I know that Furryball recently tackled subsurface scattering and displacement, but many things, like floating-point render buffers, pose significant problems for GPUs and their comparatively limited memory.

By next year's SIGGRAPH, the landscape will likely look very different.

This wraps up our SIGGRAPH 2010 coverage. Stay tuned for the second part of our 3D on the Mac series, coming within the next couple of weeks. We'll be covering animation, dynamics, and a look at the differences and advantages of the major 3D renderers.

That's really cool. The GPU programming scene really is taking off these days. However, I see no reason why there should ever be a complete GPU renderer. Certainly it is good to make sure all algorithms in a renderer are at least implemented both ways, but some algorithms may never favor a GPU architecture. And even if all ray-tracing algorithms do, it is still wasteful to let a large, powerful multi-core CPU sit there idle. The ultimate renderer would spread the work as efficiently as possible across all the compute resources available, giving more serial tasks to the CPU and more parallel tasks to the GPU/dedicated coprocessor.

There are plenty of other things that GPU renderers lack, but you get the idea: GPU renderers aren't at a stage where they can be considered complete solutions on their own, and lacking something as crucial as caustics shows how ambiguous the term "unbiased" can be. Many GPU renderers are integrated as complements to existing software renderers, like iRay and V-Ray RT. It's hard to tell how long it will take for them to get these crucial features—I know that Furryball recently tackled subsurface scattering—but things like displacement (which balloons the size of polygonal meshes) and floating-point render buffers pose significant problems for GPUs and their comparatively limited memory.

If there is a sufficient market for this (and I suspect there will be; currently the silicon is more advanced than the gaming graphics engines, and it's always been viewed as "just a matter of time" before it happened), there will be a product that addresses these issues.

Gaming will continue to drive progress in the graphics market because, let's face it, if a single card can produce extremely high-quality renders in a short amount of time, the industry won't need very many of them. Ray-tracing has always been "just over the horizon" as far as gaming render engines are concerned, but the silicon will get there. It may eventually become a moot point; the GPU is moving into the CPU and as soon as someone does it right it will rocket real-time ray-tracing to the forefront of gaming.

Intel is on the right path, but IMO they need to purchase NVidia to make real headway on the GPU front. There may be some antitrust concerns there, but I think they can be overcome (especially in light of AMD/ATI). If NVidia's prowess with GPU design could be matched with Intel's engineering talent and fab process, it could produce a real winner.

The ultimate renderer would spread the work as efficiently as possible across all the compute resources available, giving more serial tasks to the CPU and more parallel tasks to the GPU/dedicated coprocessor.

It's pretty easy to do that right now with OpenCL, which can treat your CPU cores as additional OpenCL devices. However, that doesn't really make maximum use of the CPU and its strong points compared to GPUs (large memory maps, better support for branchy code).
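For what it's worth, the host-side part really is that easy. Here's a minimal sketch (standard OpenCL 1.0 host API only, error handling stripped) that simply lists every device, CPU and GPU alike, that could be handed the same kernels:

```cpp
// Enumerate OpenCL platforms and devices; a multi-core CPU shows up as a
// device right alongside the GPU. On Mac OS X the header is <OpenCL/opencl.h>.
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platforms[8];
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(8, platforms, &numPlatforms);

    for (cl_uint p = 0; p < numPlatforms; ++p) {
        cl_device_id devices[16];
        cl_uint numDevices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &numDevices);

        for (cl_uint d = 0; d < numDevices; ++d) {
            char name[256] = {0};
            cl_device_type type = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
            printf("%s device: %s\n",
                   (type & CL_DEVICE_TYPE_CPU) ? "CPU" : "GPU/other", name);
        }
    }
    return 0;
}
```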

Wait, isn't the GPU's primary function to render? I could be going crazy, but I thought games have been doing GPU rendering ever since the 3dfx Voodoo came out.

So does a renderer in this context mean a rendering application like Blender?

Yes. As was mentioned in the beginning of the article, games "cheat" and use all kinds of hacks and workarounds to produce something that approximates reality, but doesn't even try to simulate it like professional renderers do.

I take issue with this. Ambient occlusion is in fact not a part of realistic rendering at all. There is no physical process that works anything like ambient occlusion. It's just a trick that's popular because it looks nice and is much faster to calculate compared to real caustics/radiosity/global illumination/whatever you like to call it.


Maybe I'm just picking nits.

I will join your monkey clan and pick nits with you. AO is a hack. Radiosity is probably the holy grail here, combined with the good old Kajiya light equation.

The bigger problem with realistic computer graphics is not the raw fill rate of shaders, rather it is a problem with database queries. Geometric intersection testing is the real killer here, and I think hardware vendors should focus on accelerating that. I think that will be the "next big thing" along with increased use of procedurally generated assets.
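To illustrate what that query workload looks like, here's the classic ray/axis-aligned-bounding-box "slab" test, the inner loop of the bounding-volume hierarchies most ray tracers use for intersection testing. The scene values are made up; the point is that a renderer executes enormous numbers of these tests per frame before any shading math even starts:

```cpp
// Ray vs. axis-aligned bounding box via the slab method: clip the ray's
// parametric range against each pair of axis-aligned planes and report a hit
// if a non-empty range survives.
#include <algorithm>
#include <cstdio>
#include <utility>

struct Ray  { float ox, oy, oz; float dx, dy, dz; };   // origin, direction
struct AABB { float min[3], max[3]; };

bool hitAABB(const Ray& r, const AABB& b, float tMax) {
    float orig[3] = {r.ox, r.oy, r.oz};
    float dir[3]  = {r.dx, r.dy, r.dz};
    float tMin = 0.0f;
    for (int axis = 0; axis < 3; ++axis) {
        float inv = 1.0f / dir[axis];                  // relies on IEEE infinities for axis-parallel rays
        float t0 = (b.min[axis] - orig[axis]) * inv;
        float t1 = (b.max[axis] - orig[axis]) * inv;
        if (inv < 0.0f) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMax < tMin) return false;                 // slabs don't overlap: miss
    }
    return true;
}

int main() {
    AABB box = {{-1, -1, -1}, {1, 1, 1}};
    Ray  ray = {0, 0, -5, 0, 0, 1};                    // shooting down +z at the box
    printf("hit: %s\n", hitAABB(ray, box, 1e30f) ? "yes" : "no");
    return 0;
}
```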

That's probably got nothing to do with the GPU - it's just a fluid solver. Have you seen Realflow? That's the industry standard for fluid effects like this. It's also by Next Limit (the Maxwell devs). I used it for this:

Apart from the free demo of Octane (refractivesoftware.com), Bunkspeed Shot (iRay-based), and SmallLuxGPU, there isn't anything from the show that's free.

Quote:

I take issue with this. Ambient occlusion is in fact not a part of realistic rendering at all. There is no physical process that works anything like ambient occlusion. It's just a trick that's popular because it looks nice and is much faster to calculate compared to real caustics/radiosity/global illumination/whatever you like to call it.

Well, I think it's crucial to factor in ambient occlusion for realism. The problem is that most software (unbiased) renderers require you to do it as a pass and composite, so it's a hack. In an unbiased renderer the phenomenon of ambient occlusion (light attenuation due to self-occlusion) is part of the light/GI/color bleed computation so you don't see it as a simple dark dirt pass. I'll admit I'm not 100% sure on this so I'll look into it more and revise the article if it's misleading.
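For readers following along, here's a rough sketch of what a standalone AO pass actually computes, with an invented one-sphere scene (production AO passes usually cosine-weight the samples and run on far more interesting geometry): fire short random rays over the hemisphere at a shading point and record the fraction that escape. The arbitrary cutoff distance and the absence of any actual light source are what make it a trick; in an unbiased renderer the same darkening falls out of the full GI solution.

```cpp
// Ambient occlusion estimator at a single point on a ground plane, occluded
// by one nearby sphere. AO = fraction of short hemisphere rays that escape.
#include <cmath>
#include <cstdio>
#include <random>

struct P3 { double x, y, z; };

const P3 sphereC = {0.8, 0.5, 0.0};   // a nearby occluding sphere
const double sphereR = 0.5;
const double maxDist = 2.0;           // AO's arbitrary cutoff distance

// Does the ray from `o` along unit direction `d` hit the occluder within maxDist?
bool occluded(P3 o, P3 d) {
    P3 oc = {o.x - sphereC.x, o.y - sphereC.y, o.z - sphereC.z};
    double b = oc.x*d.x + oc.y*d.y + oc.z*d.z;
    double c = oc.x*oc.x + oc.y*oc.y + oc.z*oc.z - sphereR*sphereR;
    double disc = b*b - c;
    if (disc < 0) return false;
    double t = -b - std::sqrt(disc);
    return t > 1e-4 && t < maxDist;
}

int main() {
    std::mt19937 rng(1);
    std::uniform_real_distribution<double> uni(-1.0, 1.0);

    // Shading point at the origin on a ground plane with normal (0, 1, 0).
    const int N = 4096;
    int unoccluded = 0;
    for (int i = 0; i < N; ++i) {
        // Uniform direction on the upper hemisphere via rejection sampling
        // (a real AO pass would typically cosine-weight these samples).
        double x, y, z, len2;
        do {
            x = uni(rng); y = uni(rng); z = uni(rng);
            len2 = x*x + y*y + z*z;
        } while (len2 > 1.0 || len2 < 1e-6);
        double inv = 1.0 / std::sqrt(len2);
        P3 d = {x * inv, std::fabs(y) * inv, z * inv};   // flip below-plane samples up
        if (!occluded({0, 0, 0}, d)) ++unoccluded;
    }
    printf("ambient occlusion term: %.3f\n", double(unoccluded) / N);
    return 0;
}
```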

Quote:

Did they show you the rendering without the noise? It's a little suspicious, because noise can be used to hide quality issues.

The V-Ray render got very clean after about 15 seconds. I've seen it in action before - it's very good.

How do these implementations compare to the, I believe it was in Boston, demonstration by Rapidmind done with a Cell proc in '06? I read a piece about them in Spectrum and they included a pic of the result (the front of a car, IIRC), which looked, IIRC, utterly real.

Intel is on the right path, but IMO they need to purchase NVidia to make real headway on the GPU front. There may be some antitrust concerns there, but I think they can be overcome (especially in light of AMD/ATI). If NVidia's prowess with GPU design could be matched with Intel's engineering talent and fab process, it could produce a real winner.

Intel is on the entirely wrong path, and their CPU/GPU combo parts are terrible, horribly limited products compared to even their competitors' lowest-end discrete (and even integrated) parts of the last generation or two. Intel buying nVidia would probably do nothing but gut nVidia and leave them a shattered shell of what they used to be.


How do these implementations compare to the, I believe it was in Boston, demonstration by Rapidmind done with a Cell proc in '06? I read a piece about them in Spectrum and they included a pic of the result (the front of a car, IIRC), which looked, IIRC, utterly real.

It's hard to say how it compares - from my quick glance, Rapidmind seems like it was designed as an OpenCL-type language before OpenCL was announced - it's not a dedicated renderer. They just did a demo of ray tracing on it as a distributed streaming example. Anyway, I haven't seen any renders or seen it in action, so I can't say anything about how it compares.

So does a renderer in this context mean a rendering application like Blender?

First of all, yes; second, these are specifically ray-tracing renderers, which is a very different technique from what games use.

Yes, but I don't remember "renderer" on its own becoming synonymous with "ray tracer" or "offline professional renderer." I am just saying that for most people, "GPU renderer" is a tautology :-)

I know. Just like the modern predilection to talk about "3D games", which somehow no longer means games rendered in 3D, but things like the 3DS where the screen has some visualization kludge that makes the 3D world look more 3D to us.

I was talking with someone the other day who had mentioned that the new iMacs now use all ATI graphics and he was bummed because he can no longer use some CUDA based rendering apps.

Until OpenCL matures and developers start to port over, you'll have to stick to the Pro gear to use CUDA on the Mac, at least for the time being. Not a big deal for pros, but kind of a bummer for tinkerers.

It depends on the program. V-Ray RT is high-end and uses OpenCL, so it's not a matter of OpenCL maturity (V-Ray RT has a more complete feature set than Octane, which is CUDA-based). The main reason most renderers are using CUDA is that it was the first GPU compute language that allowed this type of thing to be done, and NVIDIA was faster out of the gate with a lot of stream processing cores.