Launch this kernel function asynchronously on stream, with grid dimensions (xdim, ydim), to execute on the current CUDA device. If stream is not an instance of CUStream, the kernel is launched on the default stream (stream 0).
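
The stream-selection rule can be modelled in a few lines. The sketch below is not this library's implementation and performs no real CUDA launch: the CUStream class here is a hypothetical stand-in for the binding's stream type, and resolve_stream and launch_config are illustrative names that do not appear in the original API. It only shows which stream handle a call would end up using under the rule stated above.

```python
class CUStream:
    """Hypothetical stand-in for the binding's stream handle type."""

    def __init__(self, handle):
        self.handle = handle


DEFAULT_STREAM = 0  # stream 0 is the default stream


def resolve_stream(stream):
    """Return the stream handle a launch would use under the documented rule."""
    if isinstance(stream, CUStream):
        return stream.handle
    # Anything else (None, an int, a string, ...) falls back to stream 0.
    return DEFAULT_STREAM


def launch_config(xdim, ydim, stream=None):
    """Describe a launch: the 2D grid plus the resolved stream handle.

    Illustrative only; a real launch would pass these to the CUDA driver.
    """
    return {"grid": (xdim, ydim), "stream": resolve_stream(stream)}


if __name__ == "__main__":
    print(launch_config(4, 2, CUStream(7)))  # uses the given stream
    print(launch_config(4, 2))               # falls back to stream 0
```

In this model the fallback is silent: a raw integer handle, or a stream object of some other type, is treated as "no stream given" and the launch goes to stream 0 without raising an error.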