With this change, the resulting libmagmablas.a archive can be linked into a MATLAB MEX file (or mine, anyway...) and seems to run correctly. I tend to stumble around with these compiler options, so I'm happy to hear corrections if this is off the mark. This is the first time I've gotten a MAGMA routine to work with MATLAB...yay!

(I also changed the zero padding in my application so that the matrix dimensions are padded to a multiple of 96, rather than 32.)
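
For anyone curious what that padding looks like, here is a minimal sketch rather than my actual code. It assumes column-major single-precision storage (as BLAS expects), and the helper names `round_up` and `pad_matrix` are made up for illustration; they are not part of MAGMA or CUBLAS.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of `multiple` (e.g. 96).
 * Hypothetical helper, not a MAGMA or CUBLAS function. */
size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Copy an m x n column-major matrix `a` (leading dimension lda) into a
 * newly allocated, zero-filled buffer whose row and column counts are both
 * rounded up to a multiple of `multiple`. The padded sizes are returned in
 * *mp and *np; *mp is also the leading dimension of the result.
 * Returns NULL if allocation fails. The caller frees the result. */
float *pad_matrix(const float *a, size_t m, size_t n, size_t lda,
                  size_t multiple, size_t *mp, size_t *np)
{
    *mp = round_up(m, multiple);
    *np = round_up(n, multiple);

    /* calloc zero-fills, so the padding rows and columns are already 0. */
    float *p = calloc(*mp * *np, sizeof *p);
    if (p == NULL)
        return NULL;

    /* Copy one column at a time, since the leading dimensions differ. */
    for (size_t j = 0; j < n; ++j)
        memcpy(p + j * *mp, a + j * lda, m * sizeof *p);

    return p;
}
```

With `multiple` set to 96, a 1000 x 1000 matrix ends up as 1056 x 1056, and the padded leading dimension (`*mp`) is what would then be passed as `lda` to SGEMM.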

My own application has matrices on the smallish side (dimensions of O(1000)), but runs a long loop of repeated calculations. Each loop iteration calls SGEMM 3 times, in addition to a few other BLAS routines and culaDeviceSgesv. Replacing the CUBLAS 3.1 SGEMM with the MAGMA SGEMM improved the overall computation time by 17% - the computation takes about 20 min. to complete on a GTX480. So that's a nice improvement!