However, I know there are some differences when targeting ATI: the warp/wavefront size is different, I think, and I've also heard that you gain performance by using vector types, which is not the case when programming for NVIDIA.
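To make the warp/wavefront point concrete: NVIDIA GPUs execute work-items in warps of 32, while the high-end AMD GPUs of this generation use 64-wide wavefronts (some lower-end parts used narrower ones, so 64 is an assumption here). The sketch below uses a hypothetical helper, `simd_groups`, to count how many warps or wavefronts a work-group occupies and what fraction of the hardware lanes does useful work. In real host code you would query the value rather than hard-code it, for example with `clGetKernelWorkGroupInfo` and `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` (OpenCL 1.1+).

```python
import math

# Hypothetical helper (not part of any OpenCL API): returns how many
# warps/wavefronts a work-group needs and the fraction of lanes that are busy.
def simd_groups(work_group_size: int, simd_width: int) -> tuple[int, float]:
    groups = math.ceil(work_group_size / simd_width)
    utilization = work_group_size / (groups * simd_width)
    return groups, utilization

NVIDIA_WARP = 32    # warp size on NVIDIA GPUs
AMD_WAVEFRONT = 64  # assumed wavefront size for high-end AMD GPUs of that era

for wg in (32, 96, 128):
    print(wg, simd_groups(wg, NVIDIA_WARP), simd_groups(wg, AMD_WAVEFRONT))
```

A work-group of 96 fills three warps exactly on NVIDIA but needs two 64-wide wavefronts on AMD, leaving a quarter of those lanes idle. Picking a work-group size that is a multiple of 64 keeps both vendors fully occupied.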

I'm in grad school right now, which I guess means I've just joined the ranks of the un-dead.

FreakSoftware wrote: I wish I had some tips for you, but I don't. I only ever figured out how to use it and write some things with it, but never got around to tuning for specific vendors.

I had to program in CUDA a lot for a course in parallel programming, which is where I learned to target NVIDIA hardware. NVIDIA's best-practices guide for OpenCL was essentially a copy-and-paste job from its CUDA guide. So I actually know way more about targeting NVIDIA hardware than I would care to now...

To answer my own question over a year later: AMD now has an OpenCL programming guide that covers everything you need to optimize for its architecture. Since AMD abandoned its own standard in favor of OpenCL, it may be destined to become the most significant OpenCL supporter in the industry. My, how things change...