If I recall, Core_15 was CUDA only. Those work units did not require a full CPU core to keep the GPU fed with data: CPU utilization of the process was around 0.2%, with some variation but relatively small. The OpenCL cores, by comparison, each use a full CPU core. Does this mean CUDA has some abstraction that OpenCL does not? Specifically, an abstraction that allows CUDA apps to move those instructions from the CPU to the GPU?

CUDA is nVidia proprietary, so they do their best to sell it over any open alternative. That includes making improvements to CUDA that never make it into their OpenCL package, and making the cost of a license that ATI might like to buy prohibitively expensive. ATI has focused on enhancing their OpenCL support. From FAH's perspective, it's more expensive to maintain one FAHCore for ATI and another for nVidia, though they did do that for Core_15/Core_16.

The amount of CPU used is governed mostly by the way the drivers are written, whether they're from nVidia or ATI. Drivers that use 100% of a CPU core probably get better frame rates in games than ones that are more sparing in their use of CPU resources. I don't know whether that same distinction also applies to FAH performance.
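
To make the driver-design point concrete, here is a minimal sketch (not FAH or driver code, and the timings are illustrative only) of why a spin-polling wait shows up as 100% of a core while a blocking wait shows up as near-zero, even though both threads spend the same wall-clock time waiting:

```python
# Hypothetical illustration: compare CPU time consumed by a spin-wait
# versus a blocking sleep over roughly the same wall-clock interval.
# A driver thread that spin-polls the GPU for completion looks like
# spin_wait() and registers as ~100% of a core in a process monitor;
# one that blocks in the kernel looks like blocking_wait().
import time

def spin_wait(wall_seconds):
    """Busy-poll until `wall_seconds` of wall time pass; burns CPU."""
    cpu0, t0 = time.process_time(), time.perf_counter()
    while time.perf_counter() - t0 < wall_seconds:
        pass  # the polling loop itself is the CPU load
    return time.process_time() - cpu0  # CPU seconds actually consumed

def blocking_wait(wall_seconds):
    """Sleep for `wall_seconds`; the scheduler parks the thread."""
    cpu0 = time.process_time()
    time.sleep(wall_seconds)
    return time.process_time() - cpu0  # near zero

if __name__ == "__main__":
    spin_cpu = spin_wait(0.1)
    sleep_cpu = blocking_wait(0.1)
    print(f"spin_cpu={spin_cpu:.4f}s sleep_cpu={sleep_cpu:.4f}s")
    print("spin used more CPU:", "yes" if spin_cpu > sleep_cpu else "no")
```

The trade-off the drivers face is latency: the spinning thread notices a ready GPU within nanoseconds, while the blocked thread pays a wakeup delay, which is presumably why gaming-oriented drivers prefer to spin.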

We don't get to vote on the issue, but how much degradation in FAH performance on your GPU would you be willing to accept if SOMETHING could be redesigned to minimize CPU utilization?

Like bruce said, I don't think CUDA moved the processing to the GPU; I think OpenCL is doing more processing, so the CPU works harder to keep the GPU fed with data. The size of the Core_15 work units versus Core_21 would seem to support this.

@7im: It may not be proven, but isn't it likely that the difference in CPU usage is due to different ways of waiting for the GPU to need data transfers? The particular evidence is that, although the nV OpenCL service thread uses 100% of a core, that core can do significant other work without a corresponding hit to GPU PPD. It is known that the CPU is only doing useful FAH work when the GPU needs data transferred. The evidence required to support this: above a certain non-FAHCore CPU load, when the GPU data transfer gets delayed, does the GPU slow down?
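
The proposed experiment could be sketched roughly as follows. This is a hypothetical stand-in, not FAH tooling: a sleep-based polling loop plays the role of the feeder thread, background busy processes play the role of the non-FAHCore load, and the worst overshoot past the requested poll interval is the proxy for "the GPU data transfer gets delayed". In the real test you would instead watch GPU PPD while stepping the load up.

```python
# Hypothetical experiment sketch: measure how much a feeder-style polling
# loop's intervals stretch as background CPU load rises. If poll intervals
# stretch, a real GPU feeder thread would be late delivering data.
import multiprocessing
import time

def busy(stop_evt):
    """Background load process: spin until told to stop."""
    while not stop_evt.is_set():
        pass

def measure_poll_jitter(n_load, polls=100, interval=0.001):
    """Run `polls` sleep-based poll cycles under `n_load` busy processes;
    return the worst observed overshoot past the requested interval."""
    stop_evt = multiprocessing.Event()
    procs = [multiprocessing.Process(target=busy, args=(stop_evt,))
             for _ in range(n_load)]
    for p in procs:
        p.start()
    worst = 0.0
    try:
        for _ in range(polls):
            t0 = time.perf_counter()
            time.sleep(interval)
            worst = max(worst, time.perf_counter() - t0 - interval)
    finally:
        stop_evt.set()
        for p in procs:
            p.join()
    return worst

if __name__ == "__main__":
    for n_load in (0, 2, 8):
        worst = measure_poll_jitter(n_load)
        print(f"load={n_load} worst_overshoot={worst:.6f}s")
```

If the overshoot stays flat until the load exceeds the number of free cores and then jumps, that would match the observation that the spinning service thread can share its core with other work up to a point.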