I am using magma_zgetrf from gfortran with pinned memory. I get correct solutions, but the MAGMA call is 3-4 times slower than the scalar LAPACK routine. I am using iso_c_binding to pass pointers from Fortran. Are there any suggestions as to what could be causing the slowdown?

When I run the testing_magma_zgetrf example I get the performance you would expect: the GPU is faster.

I have attached a snippet of my code. It is a little messy from debugging.

So I think I have solved my own problem. It turns out it isn't a programming problem. The matrix I was trying to decompose was fairly sparse and didn't require much computational effort, so the scalar LAPACK routine could do it quickly, whereas MAGMA had to spend time allocating device memory, etc. If I go back and put in a random matrix like the benchmark examples use, I see that MAGMA is indeed faster.
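A rough way to reason about this trade-off is a fixed-overhead cost model: the GPU path pays a one-time setup cost (allocation, transfers) and then works at a higher rate. The sketch below is only an illustration in Python. The function names and the rates and setup time in the usage note are hypothetical, not measurements from MAGMA or LAPACK.

```python
def cpu_time(work_flops, cpu_rate):
    """Time for the CPU path: pure compute, no setup cost (assumed)."""
    return work_flops / cpu_rate


def gpu_time(work_flops, gpu_rate, setup_seconds):
    """Time for the GPU path: fixed setup cost plus compute."""
    return setup_seconds + work_flops / gpu_rate


def break_even_flops(cpu_rate, gpu_rate, setup_seconds):
    """Amount of work at which both paths take the same time.

    Solves W / cpu_rate = setup_seconds + W / gpu_rate for W.
    If the GPU is not faster than the CPU, it never catches up.
    """
    if gpu_rate <= cpu_rate:
        return float("inf")
    return setup_seconds * cpu_rate * gpu_rate / (gpu_rate - cpu_rate)
```

For example, with made-up values of 10e9 flop/s for the CPU, 100e9 flop/s for the GPU and 0.1 s of setup, `break_even_flops(10e9, 100e9, 0.1)` gives the amount of work below which the CPU path wins in this model. A problem that needs little effective work sits below that threshold, which would be consistent with what I saw.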

We are going to post MAGMA 1.0 RC3 tomorrow. It will include an example of how to call MAGMA from FORTRAN. Part of the release will be a timing function that ensures previously started GPU computations have finished before the time is read. Using this timer, we get the same performance for MAGMA through the C and FORTRAN testers.
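To illustrate why the synchronization matters, here is a minimal sketch in Python. It is not the MAGMA timer. `FakeDevice`, `sync_wtime` and `time_launch` are hypothetical names, and a background thread stands in for a GPU whose launches return before the work is done.

```python
import threading
import time


class FakeDevice:
    """Stand-in for a GPU: launch() returns at once, work finishes later."""

    def __init__(self):
        self._pending = []

    def launch(self, seconds):
        # Simulate an asynchronous kernel that takes `seconds` to complete.
        worker = threading.Thread(target=time.sleep, args=(seconds,))
        worker.start()
        self._pending.append(worker)

    def synchronize(self):
        # Block until every previously launched piece of work has finished.
        for worker in self._pending:
            worker.join()
        self._pending.clear()


def sync_wtime(device):
    """Wait for outstanding device work, then read the wall clock."""
    device.synchronize()
    return time.perf_counter()


def time_launch(device, seconds, synced):
    """Time one launch, with or without synchronizing before the end stamp."""
    start = sync_wtime(device)
    device.launch(seconds)
    end = sync_wtime(device) if synced else time.perf_counter()
    device.synchronize()  # leave the device idle for the next measurement
    return end - start
```

Calling `time_launch(FakeDevice(), 0.2, synced=False)` mostly measures the cost of starting the work, while `synced=True` includes the time the work actually takes. A timer that skips the synchronization can therefore make asynchronous GPU code look faster than it is.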