Also, is it generally better to use CL_MEM_COPY_HOST_PTR than CL_MEM_USE_HOST_PTR for speed?

CL_MEM_USE_HOST_PTR should be faster in general. However, the implementation will not always be able to take your host pointer and map it into the device's address space, and when it can't there will be copies back and forth.

For best performance across the board I recommend creating images with CL_MEM_ALLOC_HOST_PTR, then calling clEnqueueMapImage() to get a pointer to the image data and overwriting it with whatever contents you want. Unmap the image with clEnqueueUnmapMemObject() before any kernel uses it. It takes a few more lines of code, but it makes it much more likely that the CPU and GPU will truly share the data and avoid copies.

ssyed

05-12-2011, 12:43 AM

I see. I checked, and apparently there is no image support on Macs yet.