Hi, I am doing some simple OpenCL tests and I found that my kernel code compiles much faster on an Nvidia GPU (GeForce GTX 295) than on an AMD GPU (Cayman).

I am using a separate .cl file of 533 lines containing only one kernel. This kernel runs 1000 iterations of an algorithm. My program works as expected on the Nvidia card (where it takes 0.37 s), but on the AMD card the build runs for more than 25 minutes and then aborts with the message "UNREACHABLE executed!".

When I reduce the number of iterations to 10, the kernel works as expected on the AMD card, but it still takes considerably longer to build: 6 min 29 s on the AMD card versus 20 s on the Nvidia card.

Hopefully the AMD OpenCL driver writers know about the issue and fix it at some stage.

chippies

12-19-2012, 06:52 AM

Gopal_HC, are you using loop unrolling on that large loop? It seems a little odd that just changing the number of iterations would affect the compile time unless the loop is being unrolled. If it is, that could also explain the error, since unrolling that loop would produce a very large piece of code for the compiler to process.
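To make the unrolling point concrete, here is a minimal plain C sketch (not OpenCL, and not Gopal_HC's kernel, which isn't shown in the thread). The names step, run_rolled and run_unrolled4 are hypothetical, and step() is just a stand-in for one iteration of the real algorithm. A rolled loop contains its body once regardless of the iteration count; unrolling copies the body, so a full unroll of a 1000-iteration loop hands the compiler roughly 1000 copies of it, versus about 10 copies for the 10-iteration version.

```c
#include <assert.h>

/* Hypothetical stand-in for one iteration of the real algorithm.
   Unsigned arithmetic wraps on overflow, so results are well defined. */
static unsigned step(unsigned x, int i)
{
    return x * 31u + (unsigned)i;
}

/* Rolled loop: the body appears once in the generated code,
   no matter how many iterations run. */
unsigned run_rolled(unsigned x, int n)
{
    for (int i = 0; i < n; ++i)
        x = step(x, i);
    return x;
}

/* Partially unrolled by a factor of 4: the body is copied 4 times,
   plus a remainder loop for n not divisible by 4. A full unroll of
   a 1000-iteration loop would instead paste in 1000 copies. */
unsigned run_unrolled4(unsigned x, int n)
{
    assert(n >= 0);
    int i = 0;
    for (; i + 3 < n; i += 4) {
        x = step(x, i);
        x = step(x, i + 1);
        x = step(x, i + 2);
        x = step(x, i + 3);
    }
    for (; i < n; ++i)   /* leftover iterations */
        x = step(x, i);
    return x;
}
```

Both versions compute the same value for any n; the only difference is how much code the compiler has to process, which is what grows with the iteration count when a loop is fully unrolled.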

Gopal_HC

12-28-2012, 12:19 AM

Hi chippies,

Thanks again!

Yes, I was using loop unrolling on the large loop. After removing the loop unroll pragma from it, my code compiles fine on the AMD card, and the compilation is much faster.

So why was it compiling quickly on the Nvidia card even with the loop unroll pragma?

chippies

01-02-2013, 04:31 AM

I think Nvidia has spent many more years on the various parts of its compiler architecture than AMD has, so I am not too surprised that AMD's compiler is slower.