Yesterday I fixed a nasty bug that only shows up on AMD platforms with multiple GPUs. To sum up my findings (http://bloerg.net/2013/07/10/amds-opencl-multi-gpu-bug.html): if you build one program separately for each device, enqueueing one of the program's kernels fails with CL_INVALID_PROGRAM_EXECUTABLE. Not cool if you want to include different code paths depending on the device architecture.
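
To make the "different code paths" use case concrete, here is a minimal sketch of how per-device build options might be chosen. It is an illustration under assumptions, not the code from the linked post: the helper build_options_for and the macros HAVE_GCN and HAVE_VLIW are hypothetical, and the device names are only examples of what CL_DEVICE_NAME might report. The comment at the end shows the per-device clBuildProgram pattern that, as far as I can tell, leads to the error on multi-GPU AMD setups.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: map a device name (as queried with
 * clGetDeviceInfo(..., CL_DEVICE_NAME, ...)) to the build options
 * that select an architecture-specific code path in the kernel source.
 * The macro names are made up for this example.
 */
const char *build_options_for(const char *device_name)
{
    assert(device_name != NULL);

    if (strcmp(device_name, "Tahiti") == 0)
        return "-DHAVE_GCN";

    if (strcmp(device_name, "Cypress") == 0 ||
        strcmp(device_name, "Cayman") == 0)
        return "-DHAVE_VLIW";

    /* Unknown device: fall back to the generic code path. */
    return "";
}

/*
 * The options would then be used in a per-device build loop, roughly:
 *
 *   for (i = 0; i < n_devices; i++)
 *       clBuildProgram(program, 1, &devices[i],
 *                      build_options_for(names[i]), NULL, NULL);
 *
 * This is the "one program, built separately for each device" pattern;
 * on the affected setups a later clEnqueueNDRangeKernel returns
 * CL_INVALID_PROGRAM_EXECUTABLE.
 */
```

In the kernel source, the hypothetical macros would guard the architecture-specific parts with plain #ifdef HAVE_GCN ... #endif blocks.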