When I run a kernel on the GPU, the kernel execution time (taken from clGetEventProfilingInfo) is always half of the execution time that I measure with gettimeofday on the host. It can't be a fixed setup cost, because the gap is not constant: the profiled time is always half of the time measured on the host, whatever the kernel. For the rest of the time the GPU seems to be idle.

I transfer the data before I start measuring (with a blocking clEnqueueWriteBuffer), so data transfer shouldn't be the reason. I tried several kernels, and all of them show the same strange behavior. When I run the kernel on the CPU, the profiling time is equal to the gettimeofday time (no problem there). To wait for the kernel I tried clFinish, clWaitForEvents, and polling on the host with clGetEventInfo, all with the same result.

I'll attach a screenshot of my recent tests with CodeXL. The kernel is called "nest". Interestingly, the idle time always comes before the actual execution, and it always lasts as long as the execution itself. That's really strange.
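
To narrow down where the idle time goes, the host interval can be split using all four timestamps that clGetEventProfilingInfo reports for the kernel event (CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START, CL_PROFILING_COMMAND_END). The following is only a minimal sketch of that arithmetic, not my actual measurement code: the helper names `break_down` and `timeval_diff_ns` and the struct `event_breakdown` are made up for illustration, and the four timestamps are assumed to have been read already as nanosecond values.

```c
#include <stdint.h>
#include <sys/time.h>

/* Durations in nanoseconds between the four OpenCL profiling timestamps.
 * Hypothetical helper type, not part of the OpenCL API. */
typedef struct {
    uint64_t queued_to_submit; /* command waiting in the host-side queue */
    uint64_t submit_to_start;  /* submitted to the device, not yet running */
    uint64_t start_to_end;     /* the kernel actually executing */
} event_breakdown;

/* Split one event into its three phases. The arguments are assumed to be
 * the values returned for CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START
 * and _END, in that order. */
static event_breakdown break_down(uint64_t queued, uint64_t submit,
                                  uint64_t start, uint64_t end)
{
    event_breakdown b;
    b.queued_to_submit = submit - queued;
    b.submit_to_start  = start - submit;
    b.start_to_end     = end - start;
    return b;
}

/* Host wall-clock interval between two gettimeofday() samples,
 * converted to nanoseconds so it can be compared with the values above. */
static uint64_t timeval_diff_ns(struct timeval t0, struct timeval t1)
{
    int64_t sec  = (int64_t)t1.tv_sec  - (int64_t)t0.tv_sec;
    int64_t usec = (int64_t)t1.tv_usec - (int64_t)t0.tv_usec;
    return (uint64_t)(sec * 1000000000LL + usec * 1000LL);
}
```

If `submit_to_start` turns out to be about as large as `start_to_end`, the idle time sits between submission and the start of execution on the device. If the three phases together are still much shorter than the `timeval_diff_ns` result, the missing time is spent on the host side, outside the event.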