I finally got to try CLIJ on my computer. I have a large collection of images to crunch, so I saw a good opportunity to give it a shot and test it out. I have an ASUS VivoBook S running Windows 10, with an NVIDIA GeForce 940MX (I also have an on-board Intel GPU).

The thing is, I could not get past the initial tests. When I run a benchmarking test (two 1000-round loops, one on the GPU, one on the CPU), CLIJ takes 10+ times longer than the standard CPU processing, and all the load seems to be on the CPU during the whole test.
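For context, the kind of benchmark loop I mean looks roughly like this ImageJ macro. This is a sketch from memory, not my exact macro; the CLIJ2-style method names (`Ext.CLIJ2_push`, `Ext.CLIJ2_gaussianBlur2D`, `Ext.CLIJ2_pull`) should be checked against the macro auto-complete or the CLIJ reference before running:

```
// benchmark sketch: same Gaussian blur, once on GPU via CLIJ, once on CPU
run("CLIJ2 Macro Extensions", "cl_device=");
open("blobs.gif");
input = getTitle();
blurred = "blurred";

// GPU loop
Ext.CLIJ2_push(input);
start = getTime();
for (i = 0; i < 1000; i++) {
    Ext.CLIJ2_gaussianBlur2D(input, blurred, 10, 10);
}
Ext.CLIJ2_pull(blurred);
print("GPU: " + (getTime() - start) + " ms");

// CPU loop
selectWindow(input);
start = getTime();
for (i = 0; i < 1000; i++) {
    run("Gaussian Blur...", "sigma=10");
}
print("CPU: " + (getTime() - start) + " ms");
```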

Here are a couple of screenshots of the results for the repeated 2° rotation (I know…) and the 10 px Gaussian blur:

My first guess is that you are working with very small images, where the GPU cannot outperform the CPU (blobs.gif is only 64 kB). I tried your macro on my Intel UHD 620 GPU and its Intel i7-8650U CPU (the test laptop from the paper, but in battery mode) and added a line after loading blobs.gif to test it with a bigger image:
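For reference, such a line could look something like the following; the scale factor here is my own placeholder, not necessarily the exact value used (blobs.gif is 256×254 px, so ×8 gives a 2048×2032 image of several megapixels):

```
// enlarge blobs.gif so the workload is no longer dominated by
// per-call overhead and CPU cache effects (factor is a placeholder)
run("Scale...", "x=8 y=8 width=2048 height=2032 interpolation=Bilinear average create");
```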

Furthermore, I had to turn the number of rotations down to 100, because the CPU took so long.

The Log window then shows these timings:

The explanation might be: very small images fit into the CPU's cache, which has a high access speed. The CPU does not need to access RAM while processing the image and thus outperforms the GPU. You can see a bit of that in the benchmark plot for the Rotate2D method:
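You can reproduce the cache effect outside ImageJ, too. Here is a rough Python/NumPy illustration (my own sketch, not from the benchmark; absolute numbers are machine-dependent): a trivial per-pixel operation usually achieves a higher pixels-per-second rate on a cache-resident image than on one that has to stream through RAM.

```python
import time
import numpy as np

def throughput_mpx_per_s(n, reps):
    """Apply a trivial per-pixel operation reps times to an n x n
    float32 image and return processed megapixels per second."""
    img = np.random.rand(n, n).astype(np.float32)
    start = time.perf_counter()
    for _ in range(reps):
        img = img * 1.01  # memory-bound, not compute-bound
    elapsed = time.perf_counter() - start
    return (n * n * reps) / elapsed / 1e6

small = throughput_mpx_per_s(256, 2000)   # ~256 kB image: fits in cache
large = throughput_mpx_per_s(4096, 8)     # ~64 MB image: forced through RAM
print(f"small image: {small:.0f} Mpx/s, large image: {large:.0f} Mpx/s")
```

On most machines the small, cache-resident image shows a noticeably higher per-pixel throughput, which is exactly why a 64 kB blobs.gif is a friendly workload for the CPU.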

You’re welcome. Regarding the blip in the GPU usage: rotating images involves hardly any computation. In classical image processing, CPUs and GPUs are generally pretty bored; it’s all about reading and writing pixels, so only the RAM is busy. If you find a computationally heavy algorithm that might be worth implementing on the GPU, let me know!