I ran into this problem like many others, but my solution turned out to be very different. My kernels were loading just fine, fully serialized, until I increased the amount of data I was processing. I didn't realize I had gone beyond the maximum shared memory per block.
I pulled this program apart for 2 days, and even compiled my own Cudafy libraries so I could step through it line by line to find out just what was killing me. The kernel allocated the memory without any reported failure, yet it was then terminated with a memory access violation under my debug settings.

So, two lessons: 1) be aware of how much shared memory (gThrd.AllocateShared<>()) you are trying to allocate, and 2) check for NULL pointers to buffers!!! LOL
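For anyone who wants to turn those two lessons into a habit, here is a minimal sketch in plain C (not Cudafy code) of checks you could run before launching. The helper names are hypothetical, and the per-block limit is assumed to be the value your device reports (in the CUDA runtime that is cudaDeviceProp::sharedMemPerBlock); the 48 KB figure in the usage note is only an example, so query your own card.

```c
#include <stddef.h>

/* Hypothetical helper for lesson 1: returns 1 if a shared-memory request of
   `count` elements of `elem_size` bytes fits within `limit_bytes`, else 0.
   `limit_bytes` is assumed to come from the device properties
   (e.g. cudaDeviceProp::sharedMemPerBlock in the CUDA runtime). */
int shared_request_fits(size_t count, size_t elem_size, size_t limit_bytes)
{
    /* Compare via division first so count * elem_size cannot overflow. */
    if (elem_size != 0 && count > limit_bytes / elem_size)
        return 0;
    return count * elem_size <= limit_bytes;
}

/* Hypothetical helper for lesson 2: returns 1 only if every buffer pointer
   in bufs[0..n-1] is non-NULL, so a failed allocation is caught on the host
   instead of surfacing later as an access violation inside the kernel. */
int all_buffers_allocated(void *const *bufs, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (bufs[i] == NULL)
            return 0;
    }
    return 1;
}
```

With an example limit of 49152 bytes (48 KB), shared_request_fits(12288, sizeof(float), 49152) returns 1 on a platform with 4-byte floats, while 12289 floats returns 0. That is exactly the kind of "just one more element" growth that can push a previously working kernel over the edge.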

Great product, by the way. C# is such an elegant language; it took me 3 months to code something I had struggled with for just under a year in C/C++.

I learned quite a bit this weekend. Most of it actually wasn't about the shared allocations: I fought tooth and nail to learn how to pick decent thread/block settings when launching the kernel(s). The launch usually failed with the message above if the configuration couldn't schedule enough threads to cover the work, and even then it's gosh darn picky! Live and learn LOL
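On picking launch settings, the usual starting point is to fix a block size within the device's limit and derive the number of blocks by ceiling division, so that blocks * threads >= items. Below is a minimal C sketch of that arithmetic; the function names are hypothetical, and max_threads is assumed to be what the device reports (cudaDeviceProp::maxThreadsPerBlock in the CUDA runtime). This is a generic technique, not necessarily what fixed my launches.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if threads_per_block is nonzero and does
   not exceed the device limit max_threads (assumed to come from the device
   properties, e.g. cudaDeviceProp::maxThreadsPerBlock). */
int block_size_ok(unsigned int threads_per_block, unsigned int max_threads)
{
    return threads_per_block > 0 && threads_per_block <= max_threads;
}

/* Hypothetical helper: number of blocks needed so that
   blocks * threads_per_block >= n_items (ceiling division).
   Returns 0 for a zero block size to avoid dividing by zero. */
unsigned int blocks_needed(size_t n_items, unsigned int threads_per_block)
{
    if (threads_per_block == 0)
        return 0;
    return (unsigned int)((n_items + threads_per_block - 1) / threads_per_block);
}
```

For 1000 items at 256 threads per block this gives 4 blocks (1024 threads), so the last block is only partly used. The kernel therefore still needs its own guard (skip any thread whose global index is >= the item count), or those extra threads will read and write past the end of the buffers.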