
// And also a command queue for the context
cmd_queue = clCreateCommandQueue(context, device, 0, NULL);
}

#pragma mark Program and Kernel Creation
{
// Load the program source from disk
// The kernel/program is in the project directory, and in Xcode the executable
// is set to launch from that directory, hence we use a relative path
const char * filename = "example.cl";
char *program_source = load_program_source(filename);
program[0] = clCreateProgramWithSource(context, 1, (const char**)&program_source,
NULL, &err);
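
The excerpt calls load_program_source() without showing it. A minimal sketch of such a helper follows, assuming it reads the whole file into a NUL-terminated heap buffer that the caller frees, and returns NULL on failure. The body is an illustration, not the original poster's code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a NUL-terminated buffer.
 * Returns NULL if the file cannot be opened or read; the caller must
 * free() the result. */
char *load_program_source(const char *filename)
{
    assert(filename != NULL);

    FILE *fh = fopen(filename, "rb");
    if (fh == NULL) {
        return NULL;
    }

    /* Find the file size by seeking to the end. */
    if (fseek(fh, 0, SEEK_END) != 0) {
        fclose(fh);
        return NULL;
    }
    long size = ftell(fh);
    if (size < 0) {
        fclose(fh);
        return NULL;
    }
    rewind(fh);

    /* One extra byte for the terminating NUL. */
    char *source = (char *)malloc((size_t)size + 1);
    if (source == NULL) {
        fclose(fh);
        return NULL;
    }

    size_t bytes_read = fread(source, 1, (size_t)size, fh);
    fclose(fh);
    if (bytes_read != (size_t)size) {
        free(source);
        return NULL;
    }

    source[size] = '\0';
    return source;
}
```

With a helper like this, it is worth checking program_source for NULL before passing it to clCreateProgramWithSource, since a wrong working directory would otherwise only show up later as a build failure.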

Re: OpenCL on MacBook Pro with NVidia 320m

What information do you have saying the CPU is faster than your GPU?

Also, that message is a good message - it means that you have initialized your hardware correctly.

The main thing I see that might be problematic is the number of memory accesses you have in the kernel. IIRC, global memory can be fickle: where it is actually stored depends on the implementation. Yours might be keeping global memory in main RAM instead of on the GPU, thus causing a huge amount of communication. <Note: I could be wrong on this; David.Garcia should correct me if so.>

Reading a[gid] and b[gid] once into local variables at the top of the kernel would more or less store those values for that gid point on the card. Thus, it would minimize the number of global memory accesses you are doing. If the communication overhead is really your issue, this should take care of it.
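
The idea of caching a[gid] and b[gid] can be sketched in plain C, with each function standing in for one work-item and gid playing the role of get_global_id(0). The original kernel is not shown in this thread, so the multiply-accumulate body, the output array c, and the function names are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Naive version: touches a[gid], b[gid] and c[gid] on every iteration,
 * which on a GPU would mean repeated global memory traffic. */
void work_item_naive(const float *a, const float *b, float *c,
                     size_t gid, int iterations)
{
    assert(iterations >= 0);
    c[gid] = 0.0f;
    for (int i = 0; i < iterations; ++i) {
        c[gid] += a[gid] * b[gid];   /* hypothetical loop body */
    }
}

/* Cached version: one read of a[gid], one read of b[gid], and one write
 * of c[gid]; the loop itself only uses private variables. */
void work_item_cached(const float *a, const float *b, float *c,
                      size_t gid, int iterations)
{
    assert(iterations >= 0);
    const float a_val = a[gid];
    const float b_val = b[gid];
    float acc = 0.0f;
    for (int i = 0; i < iterations; ++i) {
        acc += a_val * b_val;        /* same hypothetical loop body */
    }
    c[gid] = acc;
}
```

Both versions compute the same result; only the number of accesses to the arrays differs. In an actual kernel the same pattern applies, with a, b and c declared as __global pointers.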

I don't see anything else glaring at me for this issue. I'll take a look at it tonight when I am at home and can run it on my non-Mac desktop to see if I can reproduce your slowness.

Re: OpenCL on MacBook Pro with NVidia 320m

A good compiler will optimize those memory accesses and use a local variable as HolyGeneralK suggested, but I wouldn't rely on that. If, however, the low-level VM code is making those accesses on the fly and executing all these computations sequentially, it wouldn't surprise me that the CPU version is faster, as memory accesses are cheaper and sequential code is more efficient on a CPU than on a GPU.

Re: OpenCL on MacBook Pro with NVidia 320m

Thanks for the very thorough follow-up! Glad the suggestions helped.

Also, since you're on NVIDIA, you might want to try the loop unrolling extension. It would take way too long to fully unroll the 100000 iterations, but the extension can divide the loop into N unrolled "chunks". The tradeoff is compiled kernel size and compilation time vs. a potential speedup from avoiding conditional statements. GPUs like unrolled kernels; CPUs are optimized for conditionals and sequential loops.
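
To show what unrolling in chunks means, here is a hand-unrolled sketch in plain C. The post does not name the extension; it is presumably cl_nv_pragma_unroll, where a `#pragma unroll N` placed before the loop asks the compiler to do this transformation for you. The loop body, the chunk size of 4, and the function names below are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define ITERATIONS 100000u
#define UNROLL 4u   /* hypothetical chunk size N */

/* Hypothetical stand-in for the real loop body. */
uint32_t loop_body(uint32_t acc, uint32_t b)
{
    return acc * 1664525u + b;
}

/* Rolled loop: one loop-condition check per iteration. */
uint32_t run_rolled(uint32_t seed, uint32_t b)
{
    uint32_t acc = seed;
    for (uint32_t i = 0; i < ITERATIONS; ++i) {
        acc = loop_body(acc, b);
    }
    return acc;
}

/* Unrolled in chunks of UNROLL: one loop-condition check per four
 * iterations, at the cost of a larger loop body. */
uint32_t run_unrolled(uint32_t seed, uint32_t b)
{
    assert(ITERATIONS % UNROLL == 0);   /* no remainder loop needed */
    uint32_t acc = seed;
    for (uint32_t i = 0; i < ITERATIONS; i += UNROLL) {
        acc = loop_body(acc, b);
        acc = loop_body(acc, b);
        acc = loop_body(acc, b);
        acc = loop_body(acc, b);
    }
    return acc;
}
```

The two functions return the same value for the same inputs. Whether the unrolled form is actually faster depends on the device and compiler, so it is worth timing a few chunk sizes rather than assuming a gain.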