I suspect it's the same issue, where the OS kills your long-running job. This can occur when the GPU running your kernel is also attached to a monitor.

Note the following code in your kernel:

Code:

if ( I <= N ) then
   do j=1, N
      C(i)= A(i) + B(i)
   end do
end if

The loop body does not depend on j, so every thread executes the same vector add N times. You may be doing this on purpose for benchmarking, but it makes your code take a lot longer than it should. To fix it, remove the do loop.
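As a rough sketch of the kernel with the do loop removed: the module name, kernel name, argument list, and the real data type below are my assumptions, since I can't see your declarations, so adapt them to your code.

```fortran
module vecadd_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: names and types are assumed, not taken from your code.
  attributes(global) subroutine vecadd(A, B, C, N)
    integer, value :: N
    real, device :: A(N), B(N), C(N)
    integer :: i

    ! Global thread index (CUDA Fortran indices are 1-based)
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x

    ! Each thread adds exactly one element; no inner loop over j
    if (i <= N) then
      C(i) = A(i) + B(i)
    end if
  end subroutine vecadd
end module vecadd_mod
```

With this version each element is computed once, so the work per launch drops from N*N adds to N adds.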

Do a web search for "CUDA Windows Watchdog Timer" and you'll find a workaround. However, the workaround requires you to edit your registry and disable the GPU watchdog, which leaves your system susceptible to freeze-ups. You can try this, but it's not recommended. Instead, you should consider breaking your long-running kernels into several shorter ones, using smaller data sets, or getting a dedicated compute GPU.
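To illustrate what breaking a long-running kernel into shorter ones can look like, here is a minimal sketch that processes the arrays in chunks, one launch per chunk. Everything named here (vecadd_chunk, vecadd_in_chunks, the chunk argument, the 256-thread block size) is hypothetical and chosen only for illustration. A plain vector add shouldn't need this once the do loop is gone; the pattern is meant for kernels that really do run long.

```fortran
module chunked_vecadd_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: adds elements offset+1 .. offset+count only.
  attributes(global) subroutine vecadd_chunk(A, B, C, N, offset, count)
    integer, value :: N, offset, count
    real, device :: A(N), B(N), C(N)
    integer :: i

    ! Index within the current chunk (1-based)
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= count) then
      C(offset + i) = A(offset + i) + B(offset + i)
    end if
  end subroutine vecadd_chunk

  ! Hypothetical host driver: one short launch per chunk of the data.
  subroutine vecadd_in_chunks(A_d, B_d, C_d, N, chunk)
    integer, intent(in) :: N, chunk
    real, device :: A_d(N), B_d(N), C_d(N)
    integer, parameter :: tpb = 256   ! threads per block (assumed value)
    integer :: offset, count, istat

    do offset = 0, N - 1, chunk
      ! The last chunk may be shorter than the others
      count = min(chunk, N - offset)
      call vecadd_chunk<<<(count + tpb - 1) / tpb, tpb>>>(A_d, B_d, C_d, N, offset, count)
      ! Wait for this launch to finish before queuing the next one,
      ! so each kernel completes as its own short unit of work
      istat = cudaDeviceSynchronize()
    end do
  end subroutine vecadd_in_chunks
end module chunked_vecadd_mod
```

The chunk size is something you would tune by experiment so that each launch finishes well inside the watchdog limit on your system.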