I cannot think of any immediate improvements. Part of the problem is that CUFFT itself uses a lot of scratch space internally.

The only suggestion I can give right now is to not create FFT_MY_SIGNAL before the final memory copy. Instead, copy the arrays into d_ptr directly using two memory copies, one after the other. And if OpenGL permits, read each pair as (d_ptr[i], d_ptr[i + signalSize]) instead of (d_ptr[i], d_ptr[i + 1]).
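
To make the layout concrete, here is a rough host-side sketch of the two-copy idea. The helper name pack_planar, the float element type, and the names a and b are my assumptions for illustration; they are not from your code. On the device, each std::memcpy would be a cudaMemcpy with cudaMemcpyDeviceToDevice on the corresponding device pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: packs two equal-length arrays into dst back to back,
// so dst = [a[0..n-1], b[0..n-1]], using exactly two copies and no
// intermediate interleaved buffer. n plays the role of signalSize.
// dst must have room for 2 * n floats.
void pack_planar(float* dst, const float* a, const float* b, std::size_t n) {
    assert(dst != nullptr && a != nullptr && b != nullptr);
    std::memcpy(dst, a, n * sizeof(float));      // first half of dst
    std::memcpy(dst + n, b, n * sizeof(float));  // second half of dst
}
```

With this layout, the i-th pair is (dst[i], dst[i + n]) rather than (dst[2 * i], dst[2 * i + 1]), which is why it only helps if the OpenGL side can consume the two components from separate halves of the buffer.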

Pavan Yalamanchili, ArrayFire
~ If it is not broken, you have not tried hard enough ~