You have multiple data-dependent loops in your kernel and it is not surprising that the runtime may be variable. To debug this, first get a reproducible case, then use printf to debug the variables that control the loops. I would also suggest unrolling the outermost loop to speed up your kernel.
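To make the "print the loop-control variables" step concrete, here is a minimal host-side sketch in plain C. It is not your kernel: `walk`, `MAX_ITERS`, and the `pos`/`step`/`limit` variables are hypothetical stand-ins for whatever controls your loops. The idea is to count iterations, cap them, and print the control variables when the cap is hit, so a runaway input shows up as a log line instead of a driver reset.

```c
#include <stdio.h>

/* Assumed safety cap for illustration; pick a bound that fits the real kernel. */
#define MAX_ITERS 100000u

/* Hypothetical stand-in for one work-item with a data-dependent loop:
   the trip count depends entirely on the input values pos, step, limit.
   Returns the number of iterations executed and sets *capped to 1 if the
   loop had to be cut off at MAX_ITERS. */
unsigned walk(unsigned item_id, unsigned pos, unsigned step,
              unsigned limit, int *capped)
{
    unsigned iters = 0;
    *capped = 0;
    while (pos < limit) {
        if (iters == MAX_ITERS) {
            /* Dump the variables that control the loop for this item. */
            printf("item %u: cap reached, pos=%u step=%u limit=%u\n",
                   item_id, pos, step, limit);
            *capped = 1;
            break;
        }
        pos += step;   /* a step of 0 read from the data never terminates */
        ++iters;
    }
    return iters;
}
```

For example, `walk(0, 0, 4, 16, &capped)` finishes after 4 iterations, while `walk(1, 0, 0, 16, &capped)` hits the cap and prints its control variables. The same counter-and-cap pattern can be carried into the kernel once you have a reproducible input.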

I don't get bad performance from this kernel on HD5xxx GPUs. This behavior is HD6xxx-specific; it seems you missed that fact. And the execution time changes from ~10 ms to 20 seconds (!). I would not call that "not surprising". The app's data flow can't explain such a huge difference.

Originally posted by: Raistmer I don't get bad performance from this kernel on HD5xxx GPUs. This behavior is HD6xxx-specific; it seems you missed that fact. And the execution time changes from ~10 ms to 20 seconds (!). I would not call that "not surprising". The app's data flow can't explain such a huge difference.

I think the reason for the timeout in this kernel has to be the data-dependent loops in the code. Have you confirmed that it is indeed this kernel that is timing out? To check, put a call to clFinish right after the enqueue of this kernel and see whether execution gets past it.

Originally posted by: aheirich Have you confirmed that it is indeed this kernel that is timing out?

I posted profiler data. That kernel took 20 seconds according to the profiler, and it was the last line the profiler produced. AFAIK the watchdog timer is set to 2 seconds. So either the ATI Stream Profiler reports garbage (in which case it's a job for your profiler team) or it's exactly this kernel.

I got new reports from the card owner (it's hard to debug things remotely, but...): when he moved his GPU to another host there were no driver restarts. Also, the card began to report a different core frequency: 950 MHz instead of 880 MHz. Is it possible that on this particular kernel the GPU's power consumption increased a lot, the card then dropped its frequency, and the kernel took too long and caused a driver restart? That is, it looks like a hardware problem, not a software one... In the old host he was able to reproduce the driver restart on the same input data; in the new host he can't get a driver restart (under both Cat 11.1 and 11.2) on the same input data.
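As a rough sanity check on the frequency hypothesis, here is a small arithmetic sketch in C. It assumes the kernel time scales inversely with the core clock, which is only an approximation for a compute-bound kernel, and the function names are made up for illustration. The inputs are the numbers from this thread: ~10 ms normal runtime, 950 MHz vs 880 MHz, and a 2 second watchdog.

```c
/* Assumption: runtime is inversely proportional to core clock. */

/* Estimated runtime (ms) at new_mhz for a kernel that takes base_ms at base_mhz. */
double scaled_ms(double base_ms, double base_mhz, double new_mhz)
{
    return base_ms * base_mhz / new_mhz;
}

/* Core clock (MHz) at which that same kernel would take limit_ms. */
double clock_for_limit_mhz(double base_ms, double base_mhz, double limit_ms)
{
    return base_mhz * base_ms / limit_ms;
}
```

Under that assumption, `scaled_ms(10.0, 950.0, 880.0)` is about 10.8 ms, and `clock_for_limit_mhz(10.0, 950.0, 2000.0)` is 4.75 MHz. So the difference between the two reported clocks would not by itself push a ~10 ms kernel past the watchdog; if the card really was the cause, the slowdown must have been far more drastic than the reported frequencies suggest.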

So it was definitely not the data-dependent loops, as I already said before. If I get another reproducible report of such a failure I'll post it here, but for now it looks more like a hardware issue (the card was underpowered) than a software one. Perhaps your driver team could incorporate better power-monitoring logic into the driver...

Raistmer, it seems the power/frequency issue is a result of hardware improvements to control performance and power consumption. Power efficiency and power management: "Lastly, AMD has also worked on power efficiency and power management. With PowerTune Technology, the GPU TDP is clamped to a pre-determined level. The GPU includes counters across all blocks which are monitored and applied to an algorithm to infer power draw. The core clock is then adjusted dynamically to enforce the TDP level.