Something is wrong with the filter function. The loadPPM and savePPM functions (part of the CUDA samples) work with another kernel function, but with this filter function I get a black image.

So the question is: what did I do wrong?

Some other comprehension questions:
Here https://www.nvidia.com/docs/IO/116711/sc11-cuda-c-basics.pdf I read that threads can only communicate within a block (shared memory, __syncthreads(), ...). So in my function, is the image split into rectangular blocks, and does the picture on page 9 of the Image Processing slides show just one block? What about the pixels at the edge of a block? Are they left unchanged?

So when you launch that kernel, threadIdx.y will always be zero, as will blockIdx.y.
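
To illustrate the point, here is a minimal sketch, not your actual filter. The kernel markCovered and the helper countCovered are hypothetical names, and the 64x48 image size and the 256-thread and 16x16 block sizes are arbitrary assumptions. The kernel computes x and y the way a 2D image kernel usually does and marks each pixel it reaches. With a 1D launch, blockDim.y and gridDim.y are both 1, so y stays 0 and only the first row is touched; with a 2D launch, every row is reached.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for a filter kernel: each thread marks the pixel
// it would have processed, using the usual 2D index computation.
__global__ void markCovered(unsigned char *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = 1;
}

// Launches markCovered with the given configuration and returns how many
// pixels were marked. Error checking is omitted to keep the sketch short.
static int countCovered(dim3 grid, dim3 block, int width, int height)
{
    size_t n = (size_t)width * height;
    unsigned char *d_out = nullptr;
    cudaMalloc(&d_out, n);
    cudaMemset(d_out, 0, n);

    markCovered<<<grid, block>>>(d_out, width, height);

    std::vector<unsigned char> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n, cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    int count = 0;
    for (unsigned char v : h_out)
        count += v;
    return count;
}

int main()
{
    const int width = 64, height = 48;  // assumed image size

    // 1D launch: only the x dimension is set, so threadIdx.y and
    // blockIdx.y are always 0 inside the kernel.
    dim3 block1d(256);
    dim3 grid1d((width * height + block1d.x - 1) / block1d.x);
    int covered1d = countCovered(grid1d, block1d, width, height);

    // 2D launch: enough 16x16 blocks to tile the whole image.
    dim3 block2d(16, 16);
    dim3 grid2d((width + block2d.x - 1) / block2d.x,
                (height + block2d.y - 1) / block2d.y);
    int covered2d = countCovered(grid2d, block2d, width, height);

    printf("1D launch covered %d of %d pixels\n", covered1d, width * height);
    printf("2D launch covered %d of %d pixels\n", covered2d, width * height);
    return 0;
}
```

If the output buffer starts out as zeros, every row the kernel never reaches stays zero, which would be consistent with a (nearly) black image.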

I made a modified version of your code that does not depend on PPM image load/store (it uses synthetic data instead) and changed the launch to use a 2D grid and 2D threadblocks, to be consistent with your kernel. With those changes, the code seems to run correctly for me and produces output that looks like it might be filtered output, instead of zeros: