Today we have a new add-in board coprocessor in town. Caustic Graphics has announced their CausticOne hardware and CausticGL API which will enable hardware accelerated raytracing. We are reminded of Ageia's venture into dedicated hardware for physics, but Caustic Graphics seems to be taking a more balanced approach to bringing their hardware to market. The goal is to start at the top where cost is no object and get developers interested in and working with their hardware before they bring it to the end user.

Pixar and other studios that make heavy use of computer generated animation for films tend to have render farms that can take seconds, minutes or even hours to render a single frame. With full length films running about 150,000 frames (plus or minus), that time really adds up. Those that need to render one frame as near reality as possible (say car designers doing preliminary visualization of a new model) can kick off rendering jobs that take days to complete. These guys put tons of cash into their computer systems. Time is money and if Caustic can save these guys more time than it would cost them to buy the hardware and port their software, then Caustic will do well.

The long term goals might have something to do with gaming, but we definitely aren't looking at that option right now. By trying to penetrate the market at the back end like this, Caustic Graphics may avoid the pitfalls we saw Ageia run into. Of course, at this point it is unclear whether or not the end user will even need a dedicated raytracing card by the time the hardware makes it to market. With current GPUs getting faster all the time, CPUs becoming increasingly parallel, and Larrabee on the horizon, there are quite a number of factors that will affect the viability of a part like this in the consumer space.

Regardless, Caustic Graphics is here and ready to start making an impact. Their SDK should be available to developers today, with hardware soon to follow. Before we take a deeper look at what Caustic Graphics is offering, let's talk a little bit about the differences between rasterization (what current GPUs do) and raytracing (what the Caustic Graphics hardware will accelerate).

Wouldn't the ClearSpeed e710 PCI board outperform this? The hardware is there. This company should focus on the software libraries.

The CATS™ 700 1U rack module contains 12 of these and delivers over 1 teraflops. With the right software you could probably do near-real-time raytracing (seconds rather than minutes or hours per scene). Farm the CATS 700 and yes, you could do real-time raytracing with a one or two second "lag". That is, use 20 of these things, all rendering separate frames, one frame apart. It will take you a few seconds to build the first scene, but then you will have the rest ready to show in real time.
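To put rough numbers on the frame-parallel idea above, here is a minimal sketch. It assumes every unit renders one whole frame independently and that each frame takes the same time; the function names and the 1 s and 2 s per-frame figures are illustrative assumptions, not measurements of any CATS or Caustic hardware.

```python
import math


def frame_parallel_stats(nodes: int, seconds_per_frame: float) -> tuple[float, float]:
    """Return (startup_lag_seconds, steady_state_fps) when each node renders whole frames."""
    if nodes <= 0 or seconds_per_frame <= 0:
        raise ValueError("nodes and seconds_per_frame must be positive")
    # The first frame has to finish before anything can be shown.
    startup_lag = seconds_per_frame
    # With staggered starts, one frame completes every seconds_per_frame / nodes.
    steady_fps = nodes / seconds_per_frame
    return startup_lag, steady_fps


def nodes_for_target(target_fps: float, seconds_per_frame: float) -> int:
    """Smallest number of nodes that sustains target_fps under the same assumptions."""
    return math.ceil(target_fps * seconds_per_frame)


if __name__ == "__main__":
    for seconds in (1.0, 2.0):  # hypothetical per-frame render times
        lag, fps = frame_parallel_stats(20, seconds)
        print(f"{seconds:.0f} s/frame on 20 nodes: lag {lag:.0f} s, {fps:.0f} fps")
    print("nodes for 24 fps at 2 s/frame:", nodes_for_target(24, 2.0))
```

Under these assumptions the lag equals the per-frame render time no matter how many units are added; more units only raise the frame rate. That fits playback of a known sequence better than anything interactive.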

How to manage this? Well that's the software issue these guys should be working on... NOT reinventing hardware that is already out there.

AFTER they get it working on CATS then perhaps they might consider developing their own optimised hardware. But that should come SECOND not FIRST.

If this can appreciably speed up raytracing, it'll find a market in the same demographic that buys workstation graphics cards. But the way CPU cores are multiplying, they'll have to hurry up. Nothing loves multicores more than 3D rendering, and once we've got 32-core boxes this tech may be obsolete.

ART PURE/Renderdrive anyone? It's been done, and the last couple of times it has been tried, it was cheaper to buy six render nodes that ended up being the same speed, within six months. If it's gonna be a year before they are even shipping cards to consumers, I doubt they are going to get *anywhere*.

...and no one has a clue what that will be. Come on people, this article was little more than a puff press piece; interesting to read and make geeks giddy, but no actual substance. To be honest, this should be a blog post. Yes, the description of the differences between ray-tracing and rasterization is nice and all, but there is no meat to this product for the time being.

So no, it's not 20X faster, or 100X faster, or 2% faster; it doesn't yet exist, and until independent testing has been done, I won't believe a word I read.

To all the people who claim this is a bad idea because Intel will just copy it and make it run on their CPUs: well, people, the CPU has other tasks too. If ray-tracing is to be used in games, I would be rather annoyed if I couldn't get physics and AI calculations because my CPU had to do all the rendering work. So a card to off-load some of the computations would surely be a nice addition. Of course, Intel could then cooperate with NVIDIA and offload physics and AI calculations to the graphics card, but that doesn't seem very likely at the moment.

Also, Caustic doesn't claim that this is a production board; in fact they state that this is a prototype, and that their final product will use ASICs, not FPGAs. Furthermore, Caustic's design does in fact consider the bandwidth requirements for ray-tracing; they actually claim that their algorithms are specially designed to cope with the limited bandwidth, and that this is their major achievement. Personally I think they use some sort of ray-bundling. Although this has also been implemented in software ray-tracers, they must have invented some new tricks to make it even better.

Another great aspect of ray-tracing is that the frame rate depends more on the number of pixels you wish to render than on the number of triangles in the scene, in contrast to rasterization.
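The scaling claim above can be illustrated with a deliberately crude cost model. This sketch assumes a tree-style acceleration structure, so each primary ray costs about log2(triangles) steps, and a rasterizer that touches every triangle once plus every pixel once. Both formulas and all names are simplifying assumptions for illustration, not how any particular renderer or the Caustic hardware behaves.

```python
import math


def raster_cost(triangles: int, pixels: int) -> float:
    """Toy model: every triangle is processed once, plus one unit of work per pixel."""
    return float(triangles + pixels)


def raytrace_cost(triangles: int, pixels: int) -> float:
    """Toy model: one primary ray per pixel, each walking a tree of depth ~log2(triangles)."""
    return pixels * math.log2(max(triangles, 2))


if __name__ == "__main__":
    pixels = 1920 * 1080  # assumed resolution
    for name, cost in (("raster", raster_cost), ("raytrace", raytrace_cost)):
        more_triangles = cost(2_000_000, pixels) / cost(1_000_000, pixels)
        more_pixels = cost(1_000_000, 2 * pixels) / cost(1_000_000, pixels)
        print(f"{name}: 2x triangles -> {more_triangles:.2f}x, 2x pixels -> {more_pixels:.2f}x")
```

In this model, doubling the triangle count barely moves the ray-tracing cost while doubling the pixel count doubles it. Real scenes add secondary rays, shading and structure rebuilds, none of which are modelled here.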

Well I don't know much about all this, but I saw their video. The co-founder says that ray-tracing isn't a compute problem anymore, and that they looked at it in a different way. So, I'm wondering, if they're using a new algorithm or something that's making all the difference, can't Intel use their general purpose Larrabee to simulate that?

It is strange to see someone propose a hardware architecture that is bandwidth limited by design. They have split ray tracing at the point where the bandwidth is highest -- which does not seem like a very smart move.

By doing the ray traversal and intersection on their card and the shading on the GPU they must constantly transfer ray data between the two: transfer the hit point of each ray/pixel to the GPU (point, normal, texture coordinates, shader ID, ...) and then transfer any rays newly generated by a shader back to their chip (origin, direction, min/max_dir, ...). Each of those transfers is easily between 30 and 60 bytes per ray.

So for an HD screen this is easily 100MB for just a single ray generation and only one sample per pixel. Given a PCIe 1.0 x4 bandwidth of 1GB/s this gives a theoretical maximum of just 5 fps -- and we have not done any work yet. No AA, no shadow rays, no reflections; they all generate multiples of this in bandwidth. Even with PCIe 3.0 and 16 lanes this will be a huge bandwidth issue.
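The arithmetic behind that estimate can be checked with a short sketch. It assumes a 1920x1080 frame, 48 bytes per ray (an arbitrary value inside the 30 to 60 byte range quoted above), a link speed of 1e9 bytes per second, and that the data crosses the bus once in each direction per frame. That last assumption is one reading that reproduces the roughly 5 fps figure; the function names are illustrative.

```python
def bytes_per_frame(width: int, height: int, bytes_per_ray: int,
                    directions: int = 2, samples_per_pixel: int = 1) -> int:
    """Bytes moved over the bus per frame for one generation of rays."""
    rays = width * height * samples_per_pixel
    return rays * bytes_per_ray * directions


def max_fps(link_bytes_per_second: float, frame_bytes: int) -> float:
    """Upper bound on frame rate if the bus did nothing but move ray data."""
    return link_bytes_per_second / frame_bytes


if __name__ == "__main__":
    one_way = bytes_per_frame(1920, 1080, 48, directions=1)
    round_trip = bytes_per_frame(1920, 1080, 48)
    print(f"one way:    {one_way / 1e6:.1f} MB per frame")
    print(f"round trip: {round_trip / 1e6:.1f} MB per frame")
    print(f"max fps at 1 GB/s: {max_fps(1e9, round_trip):.2f}")
```

Raising `samples_per_pixel`, or adding further ray generations for shadows and reflections, multiplies the per-frame traffic and divides the frame-rate ceiling by the same factor.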

Compare this with the latest machine used by Caustic, likely with dual quad-core CPUs giving something like 16 cores (with hyperthreading), each one easily providing a multiple of the FLOPS of the old Athlon. So it seems their performance does not even reach what was already done 7 years ago.

So in summary, I am not really impressed by what Caustic claims. Their hardware architecture is severely limited by design and their software results are way behind what was done many years ago.