I have been doing some tests on dgetrf.cpp and testing_dgetrf.cpp. I am using a version of dgetrf.cpp that reports its block size and which BLAS it is using. I have also limited the block size to 192, because larger block sizes seemed to trigger the problem.

I think the problem must lie somewhere outside dgetrf and magmablas_dtrsm, and that whatever it is feeds those routines information that makes them crash. I can see no other explanation for the following two consecutive runs.

testing_dgetrf with dgetrf using magmablas_dtrsm.

The first run is fine except for the last size; the second one collapses immediately. I don't know what argument 7 of dgetrf is, as it has only 6 arguments. It must be a hidden argument of some sort.
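For reference, LAPACK-style routines report bad arguments through INFO. INFO = -k means that the k-th argument (counting from 1) had an illegal value, and a positive INFO from dgetrf means U(k,k) is exactly zero. The C sketch below mimics the checks that reference LAPACK's dgetrf(M, N, A, LDA, IPIV, INFO) makes. check_getrf_args is a hypothetical helper, not a MAGMA function.

With only six arguments, a value of 7 cannot come from checks of this kind. That suggests, though it is unconfirmed, that the "Argument 7 of dgetrf had an illegal value" messages in the second run are reporting some other internal condition through the same mechanism.

```c
/* Hypothetical helper mirroring the argument checks in reference LAPACK
   dgetrf(M, N, A, LDA, IPIV, INFO). Returns 0 if the arguments are legal,
   otherwise -k, where k is the 1-based position of the first bad argument. */
static int check_getrf_args(int m, int n, int lda)
{
    if (m < 0)
        return -1;                      /* argument 1: M */
    if (n < 0)
        return -2;                      /* argument 2: N */
    /* arguments 3 (A) and 5 (IPIV) are arrays and are not range-checked */
    if (lda < (m > 1 ? m : 1))
        return -4;                      /* argument 4: LDA must be >= max(1, M) */
    return 0;                           /* argument 6 (INFO) is output only */
}
```

For example, check_getrf_args(3072, 3072, 3000) returns -4, which a caller would print as "Argument 4 of dgetrf had an illegal value."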

    M      N   CPU GFlop/s   GPU GFlop/s   ||PA-LU||/(||A||*N)
============================================================
magma dgetrf block size is 64 (magmablas_dtrsm)
 1024   1024         22.35         22.17   4.223855e-18
magma dgetrf block size is 64 (magmablas_dtrsm)
 2048   2048         24.40         42.54   3.579287e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 3072   3072         25.31         56.89   4.001358e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 4032   4032         26.27         60.96   3.816939e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 5184   5184         26.12         64.34   3.612047e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 6016   6016         26.01         65.92   3.492312e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 7040   7040         25.73         67.28   3.401059e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 8064   8064         26.39         68.07   3.306196e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
 9088   9088         26.06         68.94   3.232232e-18
magma dgetrf block size is 192 (magmablas_dtrsm)
can not bind to texture
can not bind to texture
.......................
can not bind to texture
10112  10112         25.94        449.60   nan
fletcher@fletcher-desktop:~/magma_1.0.0-rc3/testing$ ./testing_dgetrf
device 0: GeForce GTX 460, 1400.0 MHz clock, 2047.2 MB memory

Usage: testing_dgetrf -M 1024 -N 1024

    M      N   CPU GFlop/s   GPU GFlop/s   ||PA-LU||/(||A||*N)
============================================================
magma dgetrf block size is 64 (magmablas_dtrsm)
can not bind to texture
can not bind to texture
.......................
can not bind to texture
 1024   1024         22.36         41.14   nan
magma dgetrf block size is 64 (magmablas_dtrsm)
Argument 7 of dgetrf had an illegal value.
 2048   2048         24.25     212019.54   1.766772e-01
magma dgetrf block size is 192 (magmablas_dtrsm)
Argument 7 of dgetrf had an illegal value.
 3072   3072         25.99     715653.21   1.767735e-01
^C

Here is the equivalent with cublasDtrsm. I notice that for small sizes the GPU values are better.

// === Define what BLAS to use ============================================
#define PRECISION_z
#if (defined(PRECISION_s) || defined(PRECISION_d))
  #define cublasZtrsm magmablas_ztrsm
#endif
// === End defining what BLAS to use =======================================

The #if is true only when one of the real (non-complex) precisions, s or d, is defined. In this double complex source, cublasZtrsm is therefore never redirected, and magmablas_ztrsm is never called.
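A self-contained C sketch of the same guard makes the point concrete. Only the #if condition is taken from dgetrf.cpp. The enum, the two stub functions, the default mapping and the #undef are hypothetical additions so that both branches compile. With PRECISION_z defined, a call written as cublasZtrsm() still reaches the CUBLAS stand-in.

```c
#define PRECISION_z          /* as in the generated double-complex source */

enum trsm_backend { USES_CUBLAS, USES_MAGMABLAS };

/* Hypothetical stubs standing in for the two libraries. */
static enum trsm_backend cublas_trsm_stub(void)    { return USES_CUBLAS; }
static enum trsm_backend magmablas_trsm_stub(void) { return USES_MAGMABLAS; }

/* Default: the CUBLAS name resolves to the CUBLAS stub. */
#define cublasZtrsm cublas_trsm_stub

/* Same condition as in dgetrf.cpp: only the real precisions (s, d)
   redirect the call to the MAGMA BLAS routine. */
#if (defined(PRECISION_s) || defined(PRECISION_d))
  #undef  cublasZtrsm
  #define cublasZtrsm magmablas_trsm_stub
#endif

/* Reports which routine a call written as cublasZtrsm() actually reaches. */
static enum trsm_backend selected_trsm(void) { return cublasZtrsm(); }
```

Changing the first line to #define PRECISION_d makes selected_trsm() return USES_MAGMABLAS instead.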

Am I correct that the routine magmablas_ztrsm does not exist yet, and that there is no Fermi version of magmablas_dtrsm either?

I have now had a look at dtrsm_tesla.cu. The routine allocates two work arrays and contains code to zero the first array but not the second. I have added a line to zero the second array as well. This happens in two places (the left and right cases), and I have made the same change in both. I have also made the same change in strsm_tesla.cu.
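In C terms, the change amounts to the pattern below: allocate both work arrays, then clear both before use. This is a host-side sketch with hypothetical names (trsm_work, trsm_work_alloc). The real buffers in dtrsm_tesla.cu are device memory. cudaMalloc does not clear what it allocates, and the corresponding CUDA runtime call for clearing would be cudaMemset(ptr, 0, bytes). Setting every byte to zero gives 0.0 for IEEE 754 doubles, which is why a byte-wise clear works for floating-point buffers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pair of work arrays, standing in for the two device buffers. */
typedef struct {
    double *w1;
    double *w2;
} trsm_work;

/* Allocate both arrays and zero BOTH of them.
   Returns 0 on success, -1 if either allocation fails. */
static int trsm_work_alloc(trsm_work *w, size_t n1, size_t n2)
{
    w->w1 = malloc(n1 * sizeof *w->w1);
    w->w2 = malloc(n2 * sizeof *w->w2);
    if (w->w1 == NULL || w->w2 == NULL) {
        free(w->w1);
        free(w->w2);
        w->w1 = w->w2 = NULL;
        return -1;
    }
    memset(w->w1, 0, n1 * sizeof *w->w1);   /* first array: was already cleared */
    memset(w->w2, 0, n2 * sizeof *w->w2);   /* second array: the missing clear */
    return 0;
}

static void trsm_work_free(trsm_work *w)
{
    free(w->w1);
    free(w->w2);
    w->w1 = w->w2 = NULL;
}
```

On the host, calloc would do the same job in one call. The sketch uses malloc followed by memset only because that mirrors the cudaMalloc/cudaMemset pair that device code needs.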

Further testing on a cold start of the computer, which is when the errors happened before, has shown no problems.

I have run testing_dgesv_gpu, dgetrf_gpu and dgetrf, all of which had problems before, and seen no problems.

This implies that the dtrsm algorithm uses the memory I have now zeroed in a way that assumes it is already zero, which it was not. I don't know much about CUDA programming, so I haven't dug into the algorithm.
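The C sketch below shows one way an algorithm can quietly rely on zeroed workspace. It is a hypothetical mechanism, not a reading of the dtrsm_tesla.cu kernels.

* It inverts a 2x2 lower-triangular block into a workspace but writes only the lower triangle.
* It then applies that block as a full 2x2 product, the way a GEMM-based triangular solve might.

If the unwritten upper entry happens to hold zero, the answer is right. If it holds stale data, simulated here with NaN, the result is corrupted, much like the nan residuals in the logs.

```c
#include <math.h>

/* Invert the 2x2 lower-triangular block L (column-major) into W,
   writing ONLY the lower triangle of W; W[2] is deliberately left alone. */
static void invert_lower_2x2(const double L[4], double W[4])
{
    /* L[0] = L11, L[1] = L21, L[2] = L12 (unused, zero), L[3] = L22 */
    W[0] = 1.0 / L[0];
    W[1] = -L[1] / (L[0] * L[3]);
    W[3] = 1.0 / L[3];
}

/* x = W * b using the FULL 2x2 block, including the unwritten W[2]. */
static void apply_full_2x2(const double W[4], const double b[2], double x[2])
{
    x[0] = W[0] * b[0] + W[2] * b[1];
    x[1] = W[1] * b[0] + W[3] * b[1];
}

/* Simulate stale, never-cleared memory by filling the workspace with NaN. */
static void fill_stale(double W[4])
{
    for (int i = 0; i < 4; ++i)
        W[i] = NAN;
}
```

With L = [2 0; 1 4] and b = (2, 9), a zeroed workspace gives the correct solution x = (1, 2). A stale workspace turns x[0] into NaN, while x[1] is unaffected.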

I have not looked to see whether any other routines need similar action to that on dtrsm.

Thanks! This indeed helps. I have forwarded these comments and results to the colleague that is trying to fix it.

Your remark about ztrsm not being implemented in MAGMA is correct. We redirect to CUBLAS in the complex precision case. This is done purely for software engineering convenience. We generate all four precisions from a double complex version, so if we don't have a particular routine we still have to define it, and we define it as the reference CUBLAS one.

Thank you for your comments. I am glad to help, and I am sure many people will benefit from the hoped-for result: a set of routines that becomes as well established as LAPACK.

I had a look around and could not find any routines other than dtrsm and strsm that allocate work memory in this way.

In another thread I have reported a further bug in dsgesv_gpu. I think it is in addition to this one and lies somewhere in the single precision code.

I am making good progress with my own project which uses MAGMA and it is proving fairly easy to convert existing (FORTRAN) code to run with MAGMA using dgetrf and dgetrs once I have figured out how to get the data across.
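When moving data between Fortran and C callers of getrf/getrs-style routines, two details usually matter: column-major storage with a leading dimension, and 1-based pivot indices. The sketch below is a hypothetical helper, not a MAGMA routine. It applies the row interchanges recorded in a LAPACK-style ipiv to a right-hand side, which is the first step dgetrs performs before its two triangular solves. It assumes ipiv follows the LAPACK 1-based convention.

```c
#include <stddef.h>

/* Element (i, j) of a column-major matrix with leading dimension ld,
   0-based in C. Fortran A(i,j) corresponds to a[IDX(i-1, j-1, ld)]. */
#define IDX(i, j, ld) ((size_t)(i) + (size_t)(j) * (size_t)(ld))

/* Apply the row interchanges from a LAPACK-style getrf to the n x nrhs
   right-hand side b (column-major, leading dimension ldb), in order
   k = 1..n, as the first step of a getrs-style solve does.
   ipiv is 1-based: row k was interchanged with row ipiv[k-1]. */
static void apply_ipiv(int n, int nrhs, double *b, int ldb, const int *ipiv)
{
    for (int k = 0; k < n; ++k) {
        int p = ipiv[k] - 1;            /* convert to a 0-based row index */
        if (p == k)
            continue;
        for (int j = 0; j < nrhs; ++j) {
            double t = b[IDX(k, j, ldb)];
            b[IDX(k, j, ldb)] = b[IDX(p, j, ldb)];
            b[IDX(p, j, ldb)] = t;
        }
    }
}
```

For a 3 x 2 right-hand side with ipiv = {3, 3, 3}, each column ends up holding original rows 3, 1 and 2, in that order.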