I have been running some very long loops (millions of iterations) where, in each iteration, I call a few CUDA kernels via feval using pre-allocated arrays of fixed size. I noticed that the host memory grows linearly with the number of iterations, and eventually MATLAB crashes.
While trying to isolate the problem, I found out the following:
- When using feval to call a CUDA kernel, you have to have all the arguments of the function already cast as gpuArrays, even if you pass scalar variables.
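For example, a call shaped roughly like this (the kernel file names and launch configuration below are placeholders, not my actual code):

% Hypothetical kernel compiled from my_kernel.cu; names are placeholders
k = parallel.gpu.CUDAKernel('my_kernel.ptx', 'my_kernel.cu');
k.ThreadBlockSize = 256;

x = gpuArray.rand(1e4, 1, 'single');
n = 1e4;                      % plain host scalar
out = feval(k, x, n);         % vs. feval(k, x, gpuArray(n))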
This also applies to functions like gpuArray.rand or randn:

n = 1e4;
for i = 1:1e6
    out = gpuArray.rand(n, 1, 'single');
end

The above code causes the host memory to grow for the duration of the execution (about 100 MB per 250K iterations).
If instead of n = 1e4; you write n = gpuArray(1e4);, the subsequent loop does not cause the memory to grow.
I also found out that the above loop executes much faster when n is in host memory than when n is a gpuArray (about 3 times faster).

- Even more puzzling is the following example:

x = gpuArray.rand(1e4, 1, 'single');
for i = 1:1e6
    out = sqrt(x);
end

The above loop does not cause MATLAB's memory footprint to grow.
However, if we replace sqrt(x) with sqrt(1./x), then we get the memory blowup again.
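In other words, the variant that leaks is:

x = gpuArray.rand(1e4, 1, 'single');
for i = 1:1e6
    out = sqrt(1./x);   % the intermediate 1./x brings the memory growth back
end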
I am using MATLAB R2013a 64-bit on Windows 7 Professional. My video card is a GTX 650 with 2 GB of memory.
Thanks in advance for any insights.

Hi Michael, could you explain a bit more about your comment "you have to pass all the arguments as gpuArrays, even if you pass scalars"? This shouldn't be the case.

In your first example you are passing "n" as a non-GPU scalar and this should work both for functions and CUDAKernels. I have tried several CUDAKernel calls and passing non-GPU data (either scalar or array) seems to be fine. I'm obviously not trying the right thing - could you provide an example of it not working so that we can investigate what is going wrong?

I've also been trying to reproduce the memory leak you're hitting and am not getting very far. To help narrow down the differences between what I am trying and what you are trying, can you let me know the graphics driver version that you are using? Also, could you describe the error that appears when MATLAB crashes just in case that reveals something?

After about 30 million iterations, MATLAB's memory footprint grew to about 14 GB (my PC has 16 GB of RAM). MATLAB started to pause at certain times because of hard drive activity. I didn't manage to crash it; memory usage hovered around 13.9-14.5 GB, but the system was almost unresponsive due to hard drive activity. I still need to find the code that produced the crash. I remember, though, that the error message was something about Java heap space.

If I try

n = gpuArray(1e3);
for i = 1:1e8
    out = gpuArray.rand(n, 1, 'single');
    disp(i)
end

then MATLAB's memory does not grow! However, it is quite a bit slower than the first loop (about 2-3 times). That's what I meant when I said that I needed to cast scalar input variables as gpuArrays.

I used Windows' task manager to monitor MATLAB's memory usage.
My driver version is 314.07. I installed the latest version (320.18); it didn't make a difference.
Thanks for your help,
Michael

Thanks Michael, you are indeed right: this appears to be a bug introduced in R2013a. There is no realistic workaround I can provide right now, but I will post an update here once I have some more helpful suggestions.

The reason why the memory does not leak with certain calls is that they force a synchronisation event (in your first example, SQRT can error so has to wait to see if the error was hit; in the second the scalar parameter "n" has to be transferred back to host memory, which also causes a sync). You could achieve the same by inserting a "wait(gpu)" after every call:
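For example (assuming gpu = gpuDevice(), and re-using the gpuArray.rand loop from above):

gpu = gpuDevice();                       % handle to the current GPU
n = 1e4;
for i = 1:1e6
    out = gpuArray.rand(n, 1, 'single');
    wait(gpu);                           % force a synchronisation after each call
end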

Hi Ben,
I tried a piece of code that I have been working on.
Without the patch, MATLAB ends up consuming 7 GB of RAM (after 4 million iterations). With the patch, MATLAB ends up consuming 1.3 GB.
At the beginning of the loop MATLAB was using 0.5 GB.
For the simpler pieces of code I have (like the ones I posted in my question), there is no memory growth.
Thanks for the patch; it basically solves the problem completely for me.