Caustic Graphics tilts at real-time ray-tracing windmill

A new start-up announces a dedicated add-in board for accelerating real-time ray tracing.

If I were to put together a list of forward-looking articles that I've written that have turned out to be completely wrong, this 2006 piece on the coming real-time ray tracing (RTRT) revolution would be at the top of it. That's what I get for mouthing off authoritatively about something outside my wheelhouse, and the RTRT vs. rasterization debate fits that description (it's in an adjoining wheelhouse, but I still have to leave mine to get to it... and when I wander in there I typically break something).

So when I recently learned about a new startup called Caustic Graphics that aims to bring RTRT to the masses with a dedicated add-in accelerator board, I was immediately skeptical; I also resolved not to write about the company's prospects without first chasing down sources and checking into forums to see what people who are more up on this than I am thought of it.

Here's what I learned: with the sole exception of Jon Peddie, who's so excited about Caustic that gushing quotes from him make up a big chunk of the company's launch-day press release, skepticism reigned among most informed observers who weighed in via my inbox and in forums like Beyond3D. And for good reason: doing something truly interesting with ray tracing (i.e., something superior to what you can do with rasterization) is very hard, and most of the bottlenecks are on the bandwidth, not the compute, side of the hardware problem. Furthermore, the visual results often aren't better than what rasterization plus a collection of hacks can get you.

But in spite of the technical challenges and dubious merits, a PC add-in board dedicated to RTRT acceleration is one of those bad ideas that everyone who follows the graphics market from the sidelines—and a few who are directly involved in graphics—has at least contemplated silently at some point. The logic is that a dedicated coprocessor on a daughtercard worked for the SGI rasterization pipeline, so the same type of thing should work for RTRT, as well, if we can only figure out how to do it. Indeed, in the misguided post linked above, I reference an add-in board prototype for RTRT that some researchers were trying out. But in solving this problem you inevitably end up with a many-core solution where bandwidth is the gating factor.

Then there's the economic challenge associated with any type of application-specific coprocessor. I described these challenges in detail in this article on Ageia, another company that tried (and failed) to produce a custom, many-core coprocessor on an add-in board, and it's mainly an issue of volume. It's very tough to sell enough of these to make them profitable, especially when the value proposition isn't jaw-droppingly apparent to every gamer who sees a demo video, as was the case with the original 3Dfx Voodoo.

The game and movie studio market that Caustic appears to be courting with this is way too small to support a custom coprocessor, especially one that's likely to be a chip multiprocessor with substantial high-level similarities to Intel's mass-market Larrabee GPU. And if Caustic thinks that there is currently a place in the evaporating high-end gaming hardware market for their part, I'm sure NVIDIA has a bridge to sell them. In short, even if Caustic can do what they claim to do, it's tough to see who will buy their product in large enough volumes to make it profitable.

19 Reader Comments

As an occasional 3D artist, I've been hoping for one or more of the apps I use (Strata, Lightwave, Swift3D) to gain 'access' to the powerful GPUs now available to speed up rendering. There have been some tentative steps in that direction. This would be attractive to me if it were below $500, but I highly doubt it will be less than $1000. Even if it is a reasonable price, I always seem to be using the apps that aren't at the top of the list, and would probably have to wait years before they would support it.

Yeah, my first thought was how Larrabee would make this product moot. If Intel can figure out how to make good raster drivers for Larrabee, maybe they can do the same for this piece. Or if Larrabee makes RTRT more common, someone like Nvidia will consider buying Caustic and adding the tech to their existing raster cards.

The holy grail in 3D rendering has never been ray-tracing, except for the uninformed. As the article pointed out, there are so many issues with RT that it's very impractical. Things like (soft) shadows for example are impossible without RT unless you fake them using various hacks. Lighting? Scene (global) illumination? All things RT struggles with and often fails at.

The real holy grail is called Photon Mapping and is somewhat like RT, except it maps every photon from every light source in the scene to calculate shadows, illumination and so on. As it's a carbon copy of what happens in reality, there are no flaws with it other than huge processing requirements. Ironically the easiest short-cuts to make something look close to PM are found in rasterization techniques.

In other words I'm failing to see what Caustic is trying to accomplish here.

RTRT is vaporware... For years there've been claims that some special hardware could make current quality graphics possible with raytracing. But whenever they come out, the quality is no longer "current". Classic 3D stays steps ahead.

There is no breakthrough in sight, either. Both types benefit from massively parallel architectures. What would tip the scale for RT?

Originally written by John Carmack: Mark Peercy of SGI has shown, quite surprisingly, that all Renderman surface shaders can be decomposed into multi-pass graphics operations if two extensions are provided over basic OpenGL: the existing pixel texture extension, which allows dependent texture lookups (matrox already supports a form of this, and most vendors will over the next year), and signed, floating point colors through the graphics pipeline. It also makes heavy use of the existing, but rarely optimized, copyTexSubImage2D functionality for temporaries.

That was a post from his .plan file in 2000. You can check it out for yourself here.

So, the ability for custom ASICs like GeForce, et al to do in software what Caustic is marketing in hardware is nothing new. What they can offer are performance and quality. A word on the latter for a second. Does anyone here know enough about raytracing to say whether single precision floats are enough to do the scattering calculations from the ray hitting the first surface? How about the second surface?

The reason I bring this up is that Larrabee is widely rumored to be single precision also. If true, this puts them in the unenviable position of keeping the performance crown away from the proverbial 800 lb. gorilla. Giving them the benefit of the doubt that there actually is demand strong enough to turn a profit (which is a big question, in this case), it seems to me that they've shot themselves in the foot by keeping the precision as low as they have here.

I do 3D as a hobby (LightWave), but am certainly no expert on the tech. The thing that piques my curiosity about Larrabee is that if it is using multiple x86 cores, it "could" (?) be simple to turn them into multiple render nodes.

SO1OS said: So, the ability for custom ASICs like GeForce, et al to do in software what Caustic is marketing in hardware is nothing new.

Decomposing PRMan (RenderMan) passes into GPU shader passes doesn't really have anything to do with ray tracing. PRMan (especially at that time) isn't a ray tracer to begin with; it's a REYES renderer that has added more ray tracing and GI support over the years (originally relying on BMRT for ray tracing support).

quote:

Does anyone here know enough about raytracing to say whether single precision floats are enough to do the scattering calculations from the ray hitting the first surface? How about the second surface?

Yes, although you'll likely hear different from others.

quote:

Elledan said: The real holy grail is called Photon Mapping and is somewhat like RT, except it maps every photon from every light source in the scene to calculate shadows, illumination and so on.

This holy grail doesn't exist. Photon mapping isn't a holy grail; it's just another GI algorithm amongst many in a renderer's toolkit (and often brought up by people who try to seem more informed than others). Photon mapping by itself doesn't produce much of an image. It's just something that's used (usually in a ray tracer, although it can be used in scanline renderers as well) to compute the ambient term.

Realtime raytracing will never catch on. In gaming, it makes far more sense to use a baked engine like Turtle and spend your cycles on post effects (look at Killzone 2). Bouncing rays is only one part of a good raytrace - getting ambient occlusion (real AO, not screen-space), global illumination, SSS and caustics in realtime is a pipe dream. It takes 15 minutes to get a 512x512 Cornell box image out of Maxwell (the best unbiased renderer) on a monster desktop and even if they could speed it up, what happens when the company goes out of business and Maxwell's engine gets updated?

Another thing about gaming: Deferred renderers like Killzone 2 can use thousands of lights in a scene because it renders lighting as a final pass and only lights unoccluded pixels. Try 1000 lights with a realtime raytracer.

Originally posted by BEIGE: Another thing about gaming: Deferred renderers like Killzone 2 can use thousands of lights in a scene because it renders lighting as a final pass and only lights unoccluded pixels. Try 1000 lights with a realtime raytracer.

A raytracer that is actually calculating light bounces does not need to have 1000 lights in a scene; only as many lights as there are actual light sources. The reason games need so many damn lights is because these lights don't bounce, and thus can't illuminate anything that is not in direct path of the light. Of course, modern hardware makes these simple lights very fast.

Two words: procedural generation. Ray tracing, if done correctly, can provide high quality images with shadows and reflections of procedurally generated art. So the question is not whether ray tracing can be more efficient or provide a better image than other forms of rendering images, but whether it makes overall game design much cheaper and enables things that cannot currently be done due to cost of development.

Studies I have seen show that custom-built hardware can blow away attempts to perform ray tracing algorithms on conventional CPUs/GPUs (just look at some of the recent state of ray tracing papers). So I could see Caustic making RTRT possible.

The problem is that unless it is widely adopted physically (which seems unlikely, as pointed out in this article), it is never going to take off with developers. About the most you could hope for is that a console maker decides to include it (or some version of it) in the next-gen console. Sony seems to be one that pushes for the most outlandish specs, so I am hoping Sony takes the plunge.

And again, it is not because the ray traced game will be prettier; it is because the ray traced game will potentially allow the developer to make a more expansive game for less. Of course, improvements in AI would also help, but I am more skeptical of AI improvements, so this is about the only thing I see possibly moving the cool 8-10 hour game back to, IMO, a more proper 20-30 hours (and keeping epic RPGs at 40-80 hours).

Anyway, until we see more, this is merely interesting. But reducing the cost of ray tracing is good for everyone because it can still be used to render video frames even if it never makes it into games. So go Caustic.

Elledan and BEIGE are spot on. RT in itself is pretty much a useless tech demo of something that both acts and looks rather unrealistic.

Simple "hacks" added to rasterization usually gives much better and much faster visual results that look more real - though it requires artists to do it. Placing lots of different lights all around to emulate bouncing aren't as easy as just placing real light sources and have the computer calculate the rest...

...but with traditional raytracing this calculation lacks a lot of the real-world physics needed to actually produce any noteworthy result. You need to add in radiosity, caustics and a lot of other "hacks" to get anywhere near. If the RTRT card doesn't support all these RT hacks as well, it's pretty much useless except for rendering clean conceptual CAD drawings.

Steve Sheldon - I agree that procedural generation is a nice idea, and the time that rendering in realtime would in theory free up is nice, but gamers will not settle for a drop in quality. Bragging about one REAL caustic effect and two light bounces in a scene isn't going to yield the kind of results that only come from a very slow baked occlusion pass and global illumination. Look at this and then convince gamers that they don't want baked lighting, they want it easier for developers:

Well one thing people consistently gloss over is that hybrid rendering pipelines are the solution to almost every problem. People keep talking about ray tracing and rasterization as if they are mutually exclusive, but no one uses one or the other, except for people conducting tech demos, or making video games. Look at Hollywood. PRMan has been a hybrid renderer for years. Whatever gets the job done the fastest and cheapest, with quality results, will be used. Ambient occlusion and reflection occlusion passes are done separately and then projected through a light shader, or composited afterwards, with diffuse renders.

That being said, a specialized card is not going to gain any traction, especially in our current economy. They're probably just hoping for an acquisition of their tech by a larger company. Specialized raytracing hardware did not work for SGI and it's not going to work for them.

Beige - completely agree that gamers won't go for reductions in quality. But I don't think a very nice lighting effect is completely out of the realm of possibility for RTRT. Because if done with all the listed "hacks," ray tracing can look very pretty. So it is all about finding the right balance of algorithms and hardware. I expect that sometime between 2011-2013 we will be able to do it. But your link makes me wonder if game engines will continue to improve to the point that the benefits of RTRT are minimal, even for the developer. We shall see.

Sigh. Misunderstandings abound on this topic. It's not a simple question, and all this pro-skub anti-skub stuff gets in the way of understanding.

I'm surprised that RT is quoted as being disadvantaged in bandwidth compared to rasterization. As a generalization, RT consumes bandwidth proportional to the desired visual quality (i.e., the number of rays sampled), whereas rasterization requires bandwidth proportional to the total number of primitives in the scene. Deferred rendering helps the constant factors but doesn't change the overall complexity. Also, the issue is somewhat blurred by RT requiring an acceleration structure, which is in essence 3D rasterization. However, unlike rendering, this structure can typically be retained and updated between frames.
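To make that scaling concrete, here's a back-of-envelope sketch in Python; every constant in it (bytes per ray, bytes per primitive, scene sizes) is an illustrative assumption, not a measurement:

```python
# Back-of-envelope model of the scaling claim above. All constants are
# illustrative assumptions, not measurements.

def rt_bandwidth(rays, bytes_per_ray=64):
    # Ray tracing: each sampled ray walks the acceleration structure and
    # touches a roughly bounded amount of data, so bandwidth ~ ray count.
    return rays * bytes_per_ray

def raster_bandwidth(primitives, bytes_per_prim=32):
    # Rasterization: every primitive is streamed through the pipeline
    # each frame, so bandwidth ~ scene size.
    return primitives * bytes_per_prim

pixels = 1920 * 1080  # one primary ray per pixel
for prims in (100_000, 1_000_000, 10_000_000):
    print(f"{prims:>10} prims: raster {raster_bandwidth(prims) / 2**20:8.1f} MiB/frame, "
          f"RT {rt_bandwidth(pixels) / 2**20:8.1f} MiB/frame")
```

The point of the toy numbers: the RT column stays flat as the scene grows, while the raster column climbs with it.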

quote:

Things like (soft) shadows for example are impossible without [sic] RT unless you fake them using various hacks.

Soft shadows are straightforward with RT, merely requiring multiple shadow sample rays distributed over the area of the light. Whether this is faster than rasterizing a shadow map depends on comparing constant factors.
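A minimal sketch of what that looks like, assuming a disc-shaped light and an `occluded()` shadow-ray query standing in for whatever the renderer actually provides:

```python
import math
import random

def soft_shadow(point, light_center, light_radius, occluded, samples=16):
    """Fraction of a disc-shaped area light visible from `point`.

    `occluded(origin, target)` is an assumed interface standing in for
    the renderer's shadow-ray query: True if geometry blocks the segment.
    """
    visible = 0
    for _ in range(samples):
        # Distribute shadow rays uniformly over the light's disc
        # (sqrt keeps samples uniform in area, not clumped at the center).
        r = light_radius * math.sqrt(random.random())
        theta = 2.0 * math.pi * random.random()
        target = (light_center[0] + r * math.cos(theta),
                  light_center[1] + r * math.sin(theta),
                  light_center[2])  # disc assumed to face down the z axis
        if not occluded(point, target):
            visible += 1
    return visible / samples  # 0.0 fully shadowed .. 1.0 fully lit
```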

quote:

Lighting? Scene (global) illumination? All things RT struggles with and often fails at.

Actually all are quite easily integrated into the RT framework. This is one of its great advantages: arbitrary visual effects are generally quite easy to describe in terms of sampling rays, unlike rasterization, where various effects often require quite different ways of passing over the data and then merging those passes. This is also one of RT's disadvantages, because tracing individual rays is rather expensive compared to rasterizing a primitive.

quote:

The real holy grail is called Photon Mapping and is somewhat like RT

Actually, it's not. There is no holy grail. The most general approach is path tracing, and assuming Moore's law holds, it will eventually be viable in real time (i.e., it's in the right complexity class). That said, I wouldn't bet on seeing it any time soon.
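For the curious, a one-sample path-tracing estimator is only a few lines; `scene.intersect` and the hit's `emitted`/`sample_bsdf` here are assumed interfaces, not any real library:

```python
def radiance(ray, scene, depth=0, max_depth=5):
    """One-sample path-tracing estimate of the light arriving along `ray`.

    Averaging many calls per pixel converges to the rendering equation's
    solution (the depth cap truncates the bounce series slightly).
    """
    hit = scene.intersect(ray)
    if hit is None or depth >= max_depth:
        return (0.0, 0.0, 0.0)
    # Sample one continuation direction according to the surface's BSDF.
    next_ray, throughput = hit.sample_bsdf(ray)
    bounce = radiance(next_ray, scene, depth + 1, max_depth)
    # Emitted light plus attenuated light from the sampled bounce.
    return tuple(e + t * b for e, t, b in zip(hit.emitted, throughput, bounce))
```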

quote:

As it's a carbon copy of what happens in reality, there are no flaws with it other than huge processing requirements. Ironically the easiest short-cuts to make something look close to PM are found in rasterization techniques.

It's not a carbon copy of what happens in reality. The closest copy of what happens in reality is forward photon tracing, which is extremely expensive as few photons hit the sensor. Adjoint photon tracing inverts the math and helps the situation but still wastes a lot of effort as adjoint photons must happen to hit a primary light emitter.

Photon mapping is a biased estimate: in a very specific mathematical way it will actually not match nature. That said, we don't care, because it's quite fast as an estimator compared to full path tracing and looks great.

But photon mapping is not the only reasonable approach for real-time GI in the coming decade. Instant radiosity can provide very good results, and both suffer from similar drawbacks (inter-frame shimmer, etc.) that must be addressed. Tensor clustering instant radiosity is easily the most interesting and promising approach I've seen recently, and is quite GPU friendly.
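The instant radiosity idea itself is simple enough to sketch: shoot particles from the lights and turn each bounce into a virtual point light (VPL). `light.sample_ray` and `scene.intersect` below are placeholder interfaces:

```python
def generate_vpls(lights, scene, particles_per_light=64):
    """Instant radiosity, step one: trace particles from each light and
    deposit a virtual point light (VPL) wherever one lands.

    `light.sample_ray()` (returns a ray and its carried power) and
    `scene.intersect()` are assumed interfaces. The frame is then shaded
    with the VPLs like ordinary point lights, approximating one-bounce GI.
    """
    vpls = []
    for light in lights:
        for _ in range(particles_per_light):
            ray, power = light.sample_ray()  # particle leaves the light
            hit = scene.intersect(ray)
            if hit is not None:
                # The bounced energy becomes a new point light at the hit.
                vpls.append((hit.position, hit.reflect(power)))
    return vpls
```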

quote:

For years there've been claims that some special hardware could make current quality graphics possible with raytracing. But whenever they come out, the quality is no longer "current". Classic 3D stays steps ahead.

This is true, but it has more to do with market effects than purely technical differences between the algorithms. Existing games are written to and optimized for APIs that largely assume rasterization. It will be quite difficult for any company to break in because of this. Other niche markets may be more receptive, such as visualizing large CAD models out of core.

quote:

What would tip the scale for RT?

From my armchair, two things. Firstly, games being structured in a way that's RT friendly. id Tech 6 will likely be the first of these; depending on its success, others may follow. Secondly, how the cost of bandwidth in and out of chips evolves.

In all likelihood, what will happen will not be that one purely wins. Rather, as GPUs gain a more general programming model and are more closely integrated with the primary CPU, hybrid methods will become reasonable and most attractive, despite their complexity. This is the same path offline rendering took, and if that's any indication, it will likely end in some form of path tracing eventually "winning" in the far distant future. In the meantime, though, combinations will abound. One interesting one that may be viable on Larrabee is non-uniform rasterization, which makes shadow maps dead simple. With simple, fast shadow maps, instant radiosity starts looking more attractive.

quote:

That was a post from his .plan file in 2000. You can check it out for yourself here.

Note that the market actually made a different choice, and in some ways Carmack mispredicted this in the early stages of the Doom 3 technology. The market chose rich shaders over the bazillion-passes model from Stanford. A simplified way to look at this is as two ways to arrange a loop in rendering: "for each shader step, for each primitive" vs. "for each primitive, for each shader step". The latter won because ALUs are cheaper than bandwidth. And in fact the trend towards deferred shading shows this shift still has momentum.
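Schematically, the two loop arrangements look like this (function and variable names are mine, purely for illustration):

```python
def stanford_multipass(shader_steps, primitives):
    # Multi-pass: re-stream the whole scene once per shader step, so
    # bandwidth grows with (steps x primitives).
    for step in shader_steps:
        for prim in primitives:
            step(prim)

def rich_shaders(shader_steps, primitives):
    # What the market chose: touch each primitive once and run the full
    # shader on it, spending cheap ALU cycles instead of bandwidth.
    for prim in primitives:
        for step in shader_steps:
            step(prim)
```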

This observation about multi-pass evaluation of PRMan shaders no longer has any relevance in the current market.

quote:

Does anyone here know enough about raytracing to say whether single precision floats are enough to do the scattering calculations from the ray hitting the first surface? How about the second surface?

There can be artifacts, and 64-bit is certainly better if it's free. If it's not free, then the tradeoff depends on the context, but 32-bit is generally favored.
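The classic 32-bit artifact is self-intersection ("shadow acne"), and the usual workaround is to nudge secondary rays off the surface; something like this hypothetical helper:

```python
def offset_ray_origin(hit_point, normal, eps=1e-4):
    """Nudge a secondary ray's origin off the surface along the normal so
    32-bit rounding error doesn't make the ray re-hit the surface it just
    left ("shadow acne"). The epsilon is scene-scale dependent and purely
    illustrative; production renderers typically scale it adaptively.
    """
    return tuple(p + eps * n for p, n in zip(hit_point, normal))
```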

quote:

It takes 15 minutes to get a 512x512 Cornell box image out of Maxwell (the best unbiased renderer)

This is really an unfair comparison. Maxwell is an offline renderer that specifically targets fidelity to physics over performance. Maxwell's performance today really says little about the balance between real-time algorithms over the next decade.

quote:

Bouncing rays is only one part of a good raytrace - getting ambient occlusion (real AO, not screen-space), global illumination, SSS and caustics in realtime is a pipe dream.

AO need not be faked: just fire a couple rays at the skybox. GI is more costly of course, but it's not a pipe dream. Biased estimates like photon mapping are close to real-time performance already, and would particularly help with caustics. Of course, rasterization has some untapped approaches here too, such as Keller's instant radiosity.
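That ray-fired AO is easy to sketch; `occluded()` below is an assumed scene query, not any particular engine's API:

```python
import math
import random

def ambient_occlusion(point, normal, occluded, samples=8):
    """Ray-fired AO: count how many hemisphere rays escape to the sky.

    `occluded(origin, direction)` is an assumed interface: True if the
    ray hits geometry before reaching the skybox.
    """
    open_rays = 0
    for _ in range(samples):
        # Uniform direction on the sphere (normalized Gaussian vector)...
        d = [random.gauss(0.0, 1.0) for _ in range(3)]
        length = math.sqrt(sum(x * x for x in d))
        d = [x / length for x in d]
        # ...flipped into the hemisphere above the surface.
        if sum(a * b for a, b in zip(d, normal)) < 0.0:
            d = [-x for x in d]
        if not occluded(point, d):
            open_rays += 1
    return open_rays / samples  # 1.0 = fully open to the sky
```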

quote:

Deferred renderers like Killzone 2 can use thousands of lights in a scene because it renders lighting as a final pass and only lights unoccluded pixels. Try 1000 lights with a realtime raytracer.

Actually they don't. "Lighting" is not a final pass; each light is its own pass. Because of this they use a limited number of local lights and implement a lot of hacks specifically to optimize small distant lights. One advantage of raytracing is that small distant lights are quite cheap already. If further optimization is required, there are equally useful hacks available, such as lightcuts.
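In other words, the structure is roughly this (a Python schematic with `gbuffer` and `light.shade` as placeholder interfaces; this is not Killzone 2's actual engine):

```python
def deferred_lighting(gbuffer, lights):
    """Schematic deferred shading: geometry is rasterized once into the
    G-buffer, then EACH light runs its own full-screen pass that reads
    the G-buffer and accumulates into the frame.
    """
    frame = [[(0.0, 0.0, 0.0) for _ in range(gbuffer.width)]
             for _ in range(gbuffer.height)]
    for light in lights:                  # one pass per light
        for y in range(gbuffer.height):
            for x in range(gbuffer.width):
                surf = gbuffer.at(x, y)   # position / normal / albedo
                r, g, b = frame[y][x]
                lr, lg, lb = light.shade(surf)
                frame[y][x] = (r + lr, g + lg, b + lb)
    return frame
```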

quote:

RT in itself is pretty much a useless tech demo of something that both acts and looks rather unrealistic.

Simple "hacks" added to rasterization usually gives much better and much faster visual results that look more real - though it requires artists to do it. Placing lots of different lights all around to emulate bouncing aren't as easy as just placing real light sources and have the computer calculate the rest...

Offline production points against this argument. Path tracers have steadily gained exactly because of the artist burden. Even a tool that would place bounce lights automatically would be vastly preferable to the "just let the artists do it manually" argument.

quote:

Bragging about one REAL caustic effect and two light bounces in a scene isn't going to yield the kind of results that only come from a very slow baked occlusion pass and global illumination. Look at this and then convince gamers that they don't want baked lighting, they want it easier for developers:

You're asking the wrong question: will hardware become fast enough to moot baking? Or put a different way, will baking always have such better visual quality compared to real-time lighting that it's worth baking's disadvantages? On top of that, ask yourself how bandwidth is going to scale vs. calculation, and in the case of baking GI and light probes, particularly what last-mile bandwidth is going to be like as games move away from physical media. I think baking of some form will be important, but I also think there are clear advantages to moving calculation from baking to runtime as hardware allows. How fast that shift happens is a tricky question indeed.

Just one point about the bandwidth issue: surely the solution here would be to have a dedicated chip that takes the rendered frames and compresses them to some kind of compressed format like MPEG-2 or something? Also, more memory onboard if scene data is too large, so the board can render any map it has loaded in.

It's a worthy project, but with rasterizing accelerators dominating the market it's gonna be tough to deliver something that makes enough of a difference to be compelling.