Quite interesting: I have 4 CPU compute units (my Q6600) and 10 GPU units (the ATI HD4870). It looks like the units are 4-way SIMD and have a very limited amount of global memory (only 128MB) and an even more limited local memory (16K) ... I guess my old ATI is good for demos and not much else.

jeanphi wrote: You can get extreme speeds with a CUDA path tracer, so I guess it's going to be the same with OpenCL. Unfortunately OpenCL apps currently segfault on my PC, so I'm not yet able to test it.

Psor, I'm very curious too. That is exactly the point of this little scholastic exercise: comparing a CPU and a GPU implementation of the same program ... on my PC, not in some video on YouTube.

That's what I thought too, it might be an issue with my 5770 card. Have you seen the video of the VRay demo at the last SIGGRAPH? Pretty amazing stuff. I've also seen results from another GPU path tracer, and they confirmed the potential.

I ported smallpt to C, converted it from double to float, changed it from a recursive implementation to an iterative one, etc., but I was too impatient, so while porting smallpt I decided to rush out a simpler test: the Mandelbrot set. Yeah, I have a vivid imagination. I wrote the first implementation. First impression was "Wow, it works", second was "wtf, it is slow". It was just a bit faster than a single-core implementation on the CPU. I scratched my head for a while, then I realized that the code I copied from ATI's samples is quite misleading and I was using only one OpenCL "local thread" for each compute unit instead of 256. Flipped the switch and BOOM!
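To make the "1 versus 256 local threads" difference concrete, here is a rough sketch of the arithmetic only. It assumes a one-dimensional range with one work-item per pixel and takes a 1024x768 image as an example size; the helper names round_up and work_group_count are made up for illustration and are not from the actual host code. In OpenCL 1.x the global work size passed to clEnqueueNDRangeKernel has to be a multiple of the local work size, which is why the rounding helper is there.

```c
#include <stddef.h>

/* Hypothetical helper: round `global` up to the next multiple of `local`,
   since OpenCL 1.x requires the global size to be divisible by the local size. */
static size_t round_up(size_t global, size_t local)
{
    return (global + local - 1) / local * local;
}

/* Hypothetical helper: number of work-groups the runtime has to schedule
   for `global` work-items split into groups of `local`. */
static size_t work_group_count(size_t global, size_t local)
{
    return round_up(global, local) / local;
}
```

With 1024 * 768 = 786432 pixels, a local size of 1 means 786432 tiny work-groups of a single work-item each, while a local size of 256 gives 3072 work-groups that can each keep the SIMD units busy.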

The test

For testing purposes I ran at 1024x768 with an insane value for the maximum number of iterations: 10000.
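For reference, the per-pixel work being timed looks roughly like the sketch below. This is not the actual source of the test programs, just a minimal escape-time loop in C using float; the function names and the viewed region of the complex plane are my assumptions for illustration.

```c
/* Escape-time iteration count for the point c = (cx, cy).
   Returns max_iterations if the point never escapes |z| > 2. */
static int mandel_iterations(float cx, float cy, int max_iterations)
{
    float x = 0.0f, y = 0.0f;
    int i = 0;
    while (i < max_iterations && x * x + y * y <= 4.0f) {
        float xt = x * x - y * y + cx;  /* real part of z*z + c */
        y = 2.0f * x * y + cy;          /* imaginary part of z*z + c */
        x = xt;
        ++i;
    }
    return i;
}

/* Fill `out` (width * height entries) with iteration counts.
   The view starts at real coordinate -2.5 and is centered vertically on 0;
   that region is an assumed choice, not taken from the test programs. */
static void mandel_render(int *out, int width, int height, int max_iterations)
{
    const float scale = 4.0f / (float)width;  /* same step on both axes */
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            float cx = -2.5f + (float)px * scale;
            float cy = ((float)py - (float)height / 2.0f) * scale;
            out[py * width + px] = mandel_iterations(cx, cy, max_iterations);
        }
    }
}
```

Calling mandel_render(buffer, 1024, 768, 10000) reproduces the test settings. Every pixel inside the set runs the full 10000 iterations, which is what makes this limit so heavy.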

mandelCPU

This is just a simple single-threaded CPU implementation (no OpenCL involved). Result:

What the hell ... it is 38 times faster than the single-threaded CPU implementation! MandelGPU is quite amazing to use. I recorded a small video (sorry for the low quality) just to give you an idea of how fast it is: http://vimeo.com/7876686