I have a test function (below) that calculates the inverse of a complex<double> matrix A of size 90x90. When I call this code inside a parallel for loop using OpenMP, it segfaults without displaying any error message. If I call it in a sequential loop, it works. Likewise, if I comment out the call to "magma_zgetri_gpu", the parallel loop does not segfault (and no error messages appear either).

The parallel for loop starts 48 OpenMP threads, so at any given time there are 48 matrices of size 90x90 to invert, about 6 MB in total. That is not large at all. I have also tried using only 2 threads, and it still segfaults.
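A quick check of the 6 MB figure, assuming 16 bytes per complex<double> entry (two 8-byte doubles) and ignoring the pivot arrays and any workspace the inversion needs:

```latex
48 \times (90 \times 90) \times 16\,\text{B}
  = 48 \times 129\,600\,\text{B}
  = 6\,220\,800\,\text{B}
  \approx 6.2\,\text{MB} \;(\approx 5.9\,\text{MiB})
```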

I think MAGMA itself is thread safe, but the linear algebra libraries it depends on are not. For example, I can run matrix multiplication functions in different threads at the same time without a problem (this uses ATLAS). So the problem seems to be related to CUBLAS: magma_zgetri_gpu fails when it is called by multiple threads at the same time.
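If concurrent calls are the trigger, one possible workaround (not tested against MAGMA here) is to keep the loop parallel but serialize the library call with an OpenMP named critical section. The sketch below is CPU-only so it can run without a GPU: `invert2x2_unsafe` is a hypothetical stand-in for the magma_zgetri_gpu call, deliberately writing to a shared global to mimic library-level state, and the critical-section name `gpu_library` is also hypothetical.

```c
/* Shared scratch value: mimics library-level global state that is unsafe
 * to touch from two threads at once. */
static double g_det;

/* Hypothetical stand-in for the non-thread-safe GPU inversion
 * (in the real code this would wrap the magma_zgetri_gpu call).
 * Inverts a row-major 2x2 matrix a into inv. */
static void invert2x2_unsafe(const double a[4], double inv[4])
{
    g_det = a[0] * a[3] - a[1] * a[2];
    inv[0] =  a[3] / g_det;
    inv[1] = -a[1] / g_det;
    inv[2] = -a[2] / g_det;
    inv[3] =  a[0] / g_det;
}

/* Inverts `count` row-major 2x2 matrices stored back to back in `mats`.
 * The loop is parallel; only the non-thread-safe call is serialized. */
void invert_all(const double *mats, double *invs, int count)
{
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        const double *a = mats + 4 * i; /* per-thread setup stays parallel */
        double *inv = invs + 4 * i;

        /* Only one thread at a time may enter the library call. */
        #pragma omp critical(gpu_library)
        {
            invert2x2_unsafe(a, inv);
        }
    }
}
```

The trade-off is that all 48 threads then wait on one another for the GPU call, so this only helps if the crash comes from the calls overlapping; any CPU-side work outside the critical section still runs concurrently.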

A MAGMA developer would provide more reliable information on that though...

Yes, MAGMA itself should be thread safe. For now we do not create our own threads or have global variables that may compromise thread safety. The problem is more likely coming from the way the math libraries or CUDA are used with OpenMP. Also, for 90x90 matrices the current implementation would not be very efficient (for matrices smaller than 64 we use CPU code). We are working on adding functionality for this case (many small matrices), similar to the batched GEMMs in CUBLAS.

The old CUBLAS API does not guarantee thread safety, and the documentation for CUBLAS 4.0 and later recommends using the version 2 API, with a stream created by each individual CPU thread.
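The structure of that recommendation can be sketched without a GPU: each OpenMP thread creates its own context once, binds it to its own stream, and passes it explicitly to every call. In the sketch, `toy_handle` and `toy_launch` are hypothetical CPU-only stand-ins for a cublasHandle_t and a library call; the comments name the real version 2 calls (`cudaStreamCreate`, `cublasCreate`, `cublasSetStream`, `cublasDestroy`, `cudaStreamDestroy`) that would sit in those places.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical model of a handle-based ("v2-style") API: the stream
 * travels inside the handle instead of living in a global variable. */
typedef struct {
    int stream_id; /* stands in for a cudaStream_t */
} toy_handle;

/* Stands in for a library call; reports the stream the work would run on. */
static int toy_launch(const toy_handle *h) { return h->stream_id; }

/* Runs n_tasks calls, recording the stream each one used.
 * Returns how many ran on a stream other than their own thread's. */
int run_per_thread(int n_tasks, int *stream_used)
{
    int mismatches = 0;
    #pragma omp parallel reduction(+ : mismatches)
    {
        /* One private handle per thread, bound to that thread's stream.
           Real API here: cudaStreamCreate, cublasCreate, cublasSetStream. */
        toy_handle h = { omp_get_thread_num() };

        #pragma omp for
        for (int i = 0; i < n_tasks; ++i) {
            stream_used[i] = toy_launch(&h);
            if (stream_used[i] != omp_get_thread_num())
                mismatches++;
        }
        /* Real API here: cublasDestroy and cudaStreamDestroy. */
    }
    return mismatches;
}
```

Creating the handle once per thread, rather than once per loop iteration, keeps the setup cost out of the inner loop while still ensuring that no two threads share a handle or stream.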

The current version of MAGMA relies on the old CUBLAS API, and that may break the execution order of kernels in the few calls that use multiple cudaStreams. In particular, using cublasSetKernelStream is dangerous because it accesses global variables maintained by CUBLAS. MAGMA also has a global variable, "magma_stream", used for magmablas execution, which is another potential flaw.
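Why a global "current stream" is a problem can be shown deterministically by replaying one bad interleaving of two threads, step by step, in a single thread. The functions below are hypothetical CPU-only stand-ins, not the real cublasSetKernelStream or magma_stream; they only model the pattern of keeping the selected stream in a global.

```c
/* Hypothetical model of a legacy API that keeps the selected stream in a
 * global variable instead of in a per-thread handle. */
static int g_current_stream = 0;

static void legacy_set_stream(int s) { g_current_stream = s; }
static int legacy_launch(void) { return g_current_stream; }

/* Replays one possible interleaving of threads A and B in order, so the
 * outcome is reproducible. Returns the stream thread A's kernel lands on. */
int replay_interleaving(void)
{
    legacy_set_stream(1);   /* thread A selects stream 1            */
    legacy_set_stream(2);   /* thread B runs in between: stream 2   */
    return legacy_launch(); /* thread A launches: stream 2, not 1   */
}
```

With real OpenMP threads the same overwrite can happen at any point, so which stream a kernel lands on varies from run to run.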