Sometimes you see these kinds of unexplained errors when the kernel launch preceding the memory copy has failed. Are you checking the return status of your kernel launches? If not, try adding an error check after each one.
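A minimal sketch of such a check, assuming CUDA Fortran with the cudafor module. The module, kernel, and variable names (kernels_m, scale_kernel, a_dev) are hypothetical stand-ins, not taken from your code; only the error-checking lines after the launch are the point.

```fortran
! Hypothetical example: names are placeholders for illustration only.
module kernels_m
  use cudafor
contains
  attributes(global) subroutine scale_kernel(a, n)
    integer, value :: n
    real :: a(n)          ! dummy arrays in a global subroutine live on the device
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = 2.0 * a(i)
  end subroutine scale_kernel
end module kernels_m

program check_launch
  use cudafor
  use kernels_m
  implicit none
  integer, parameter :: n = 1024
  real, device, allocatable :: a_dev(:)
  real :: a(n)
  integer :: ierr

  a = 1.0
  allocate(a_dev(n))
  a_dev = a

  call scale_kernel<<<(n + 255) / 256, 256>>>(a_dev, n)

  ! Report any error raised by the launch itself (bad configuration, etc.).
  ierr = cudaGetLastError()
  if (ierr /= cudaSuccess) then
    print *, 'kernel launch failed: ', trim(cudaGetErrorString(ierr))
  end if

  ! Kernels run asynchronously, so synchronize to surface execution errors.
  ! Assumption: a toolkit that provides cudaDeviceSynchronize; older
  ! toolkits use cudaThreadSynchronize instead.
  ierr = cudaDeviceSynchronize()
  if (ierr /= cudaSuccess) then
    print *, 'kernel execution failed: ', trim(cudaGetErrorString(ierr))
  end if

  a = a_dev   ! a failed kernel would otherwise show up as an error here
  deallocate(a_dev)
end program check_launch
```

Without the check, a failed kernel is typically reported only at the next CUDA call, which is why the memory copy appears to be the culprit.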

But I can't see why the original code fails. Under the "release" configuration, the new code runs when compiled by PVF 11.1 with CUDA Toolkit 3.2, but it still fails when compiled by PVF 10.8 with CUDA Toolkit 3.1. Under the "debug" configuration, the program doesn't run at all: it exits immediately without any warning.

My best guess is that you're getting an access violation when indexing the UA_dev, YN_dev, or ART_dev arrays. The first thing I would do is compile the code in emulation mode with array bounds checking enabled (-Mcuda=emu -Mbounds). If that doesn't show anything, I would then break up your "UX_dev=" expression into separate lines and comment out each line in turn until you can determine which array is causing the fault.