I am starting to learn about GPUs and MAGMA; I would like to call LAPCK and BLAS equivalents for GPU from a FORTRAN program(using gfortran to compile).

However, I am confused with the various options available. I can do call spotrf( ... )and call magmaf_spotrf( ... )with slight difference in time used (small example); my first question is, what is what - is therea difference in those two calls and if so, what do I call with spotrf - I don't think I have loaded a 'standard' lapack library.

Secondly, I would like to call spotri - however, calling magmaf_spotri( ) (or magma_spotri) gives an undefined reference, althoughlooking at testing_spotri.cpp in the testing directory of magma, there are calls to magma_spotrf() and magma_spotri().what am I missing? do I need the magma version, or does call spotri() give me s routine which uses the GPU? (how do I tell?)

I have googled for tutorials/instructions, but to no avail - does some documentation for this exist? If so, I'd greatly appreciate some pointers to it.

spotrf(...) calls LAPACK, on the CPU.magmaf_spotrf(...) calls MAGMA, which will use both the CPU and GPU.

A moderately large matrix, say 2000x2000, is required before seeing the benefit of using the GPU. For small matrices, on the order of 100x100, the overhead associated with copying to-and-from the GPU outweighs any benefit.

If you successfully linked with MAGMA, then you linked with some version of LAPACK, because MAGMA depends on LAPACK. This may the LAPACK provided in Intel MKL, ACML, or Mac OS veclib, if you use one of those.

For spotri, the first question to ask is, do you really want to invert a matrix? An inverse is rarely needed. It is faster, more accurate, and uses less memory to use spotrs or sposv to do a solve, than to explicitly invert and then multiply. However, if you really do need the inverse, for magmaf_spotri(...), there is currently the C interface, magma_spotri, but not the Fortran interface, magmaf_spotri. This was just an oversight; we will add it in the next release. Adding the interface is fairly simple. Just add these functions:

many thanks the prompt detailed & answer. That did the trick! Is there a way to add corresponding interfaces for the magma_blas functions/routines? I looked at definitions in include/magmablas_s.h, but can't quite see how to do it. Or could/should I call the corresponding CUBLAS functions instead, if I want to execute STRMM, SSYRK, SGEMM using the GPU? (I also need SLAUUM).

For BLAS functions (gemm, gemv, etc.), probably the easiest thing is to call the CUBLAS functions, which NVIDIA provides Fortran wrappers for in the CUBLAS library.

lauum is not a BLAS function, so CUBLAS won't have it. The wrapper should look like below. Basically, the wrappers do two things. First, make an appropriate fortran name, which is like magmaf_slauum, magmaf_slauum_, or MAGMAF_SLAUUM, depending on your Fortran compiler. (That's handled by the MAGMA_FORTRAN_NAME macro.) Second is to convert all input arguments to pointers. Then it calls the C interface, converting pointers to scalars as necessary. For instance, below n is passed as a pointer (magma_int_t*) to the Fortran interface, and dereferenced as *n to call the C interface. Values that are already pointers, such as A, don't need to be changed.

There is no dgesl, either in LAPACK or in MAGMA. I think you meant dgesv.

magma_dgesv and magma_dgesv_gpu do the same job, factorizing and solving AX=B, but differ in where the input and output matrices are.

magma_dgesv_gpu takes a matrix dA and right-hand side dB, both on the GPU. It produces a factorization LU that overwrites dA, then solves LU X = dB, with the result X overwriting dB on the GPU. So both the input and output matrices are on the GPU.

magma_dgesv takes A and B on the CPU. In the simple case, it transfers data to the GPU, calls magma_dgetrf_gpu and magma_dgetrs_gpu, and copies data back to the CPU. If you set the environment variable $MAGMA_NUM_GPUS > 1, it will use multiple GPUs. If the matrix does not fit in GPU memory, it will use a non-GPU-resident algorithm.

As for least-squares, there is a magma_dgels_gpu function. This uses the lower level magma_dgeqrf_gpu and magma_dgeqrs_gpu routines.

For least-squares, there is also magma_dsgeqrsv_gpu, which is a high-level mixed-precision least-squares solver (note "ds" prefix). It first tries to factor the problem in single-precision and do iterative refinement to achieve double-precision accuracy. Since single precision is twice as fast as double precision, this is usually faster than factoring in double precision. If iterative refinement fails (e.g., due to a large condition number), it falls back to doing a double-precision factorization and solve.

lauum is not a BLAS function, so CUBLAS won't have it. The wrapper should look like below. Basically, the wrappers do two things. First, make an appropriate fortran name, which is like magmaf_slauum, magmaf_slauum_, or MAGMAF_SLAUUM, depending on your Fortran compiler. (That's handled by the MAGMA_FORTRAN_NAME macro.) Second is to convert all input arguments to pointers. Then it calls the C interface, converting pointers to scalars as necessary. For instance, below n is passed as a pointer (magma_int_t*) to the Fortran interface, and dereferenced as *n to call the C interface. Values that are already pointers, such as A, don't need to be changed.