We are in the process of launching a new page here on culatools.com that is focused on user success stories with the CULA library. We are accepting scholarly papers, presentations, and whitepapers and will be posting our own papers there as well. Please contact us using the contact form in order to submit. We are excited to share your stories with all the CULA users!

Many of the questions on our forums are described as CULA issues but are in fact misunderstandings of the LAPACK interface. Admittedly, that interface is difficult for a newcomer to the library! When we set out to write CULA, we made a conscious decision to follow the LAPACK interface because it's so pervasive in numerical computing. I wanted to go over a few of the more difficult points for new programmers.

Data is Column Major

C/C++ programmers are used to data stored in row-major format, in which the items in a row are stored contiguously in memory. LAPACK, being a FORTRAN package, uses column-major storage, in which the items in a column are stored contiguously instead. For some, this will require reworking their code to produce column-major data or transposing a matrix prior to calling the CULA routine. Any code that calls CUBLAS or CULA has this restriction, and so should be written primarily for column-major data layouts.
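
To make the two layouts concrete, here is a minimal sketch in plain C of converting a row-major matrix into column-major storage. The function name row_to_col_major is ours for illustration and is not part of CULA or LAPACK.

```c
#include <assert.h>
#include <stddef.h>

/* Copy an m-by-n row-major matrix (src) into column-major storage (dst).
 * Both buffers hold m*n doubles and must not overlap.
 *   row-major:    element (i, j) lives at src[i * n + j]
 *   column-major: element (i, j) lives at dst[i + j * m]
 * Hypothetical helper for illustration; not a CULA or LAPACK routine. */
void row_to_col_major(const double *src, double *dst, int m, int n)
{
    assert(src != NULL && dst != NULL && m > 0 && n > 0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            dst[(size_t)i + (size_t)j * m] = src[(size_t)i * n + j];
}
```

For the 2x3 matrix with rows (1 2 3) and (4 5 6), the row-major buffer is 1 2 3 4 5 6 and the column-major buffer is 1 4 2 5 3 6.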

LD* Parameters

Each matrix parameter passed to a CULA/LAPACK routine is invariably followed by an integer parameter called LDA, LDB, etc. This parameter, the leading dimension, signifies the physical size of the matrix as allocated: the number of elements between the start of one column and the start of the next. Parameters such as M and N, by contrast, describe the size of the data to be operated on. Specifying the two separately is useful for describing submatrices or padded allocations. This figure describes the relationship of these parameters when the valid data (blue) is a region of a physically larger data allocation (green).
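
As a rough illustration of how the leading dimension enters the indexing, the sketch below scales an M-by-N block that sits inside a larger column-major allocation. The names cm_index and scale_block are hypothetical helpers written for this post, not CULA or LAPACK routines.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a column-major array whose consecutive
 * columns start lda elements apart. Hypothetical helper. */
size_t cm_index(int i, int j, int lda)
{
    return (size_t)i + (size_t)j * (size_t)lda;
}

/* Multiply an m-by-n block by alpha. The pointer a refers to the block's
 * top-left element; lda is the row count of the enclosing allocation,
 * so lda >= m, and lda > m when the block is a submatrix or padded. */
void scale_block(double *a, int m, int n, int lda, double alpha)
{
    assert(a != NULL && m >= 0 && n >= 0 && lda >= 1 && lda >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            a[cm_index(i, j, lda)] *= alpha;
}
```

To touch only the 2x2 block starting at row 1, column 1 of a 4x3 allocation, one would pass the pointer a + cm_index(1, 1, 4) with M = 2, N = 2 and LDA = 4. The rest of the allocation is left untouched.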

Outputs are Shared with Inputs

Simple enough: in most cases, the data is operated on in place. Take the LU decomposition, which reduces a matrix A to two triangular matrices, A=L*U. Those two "output" matrices are stored quite economically in the same storage where matrix A was located on input. The bottom line is that if you want to preserve your data, you will often need to copy it to a new matrix prior to calling CULA routines.
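
The copy-before-calling pattern can be sketched in plain C as follows. Note that lu_inplace is a deliberately simplified stand-in with no pivoting, written only to show the overwriting behavior; it is not the GETRF routine from CULA or LAPACK, and both function names are ours.

```c
#include <assert.h>
#include <string.h>

/* Simplified in-place LU without pivoting (illustration only).
 * On return, the strict lower triangle of a holds L (its unit diagonal
 * is implied) and the upper triangle holds U. The original A is gone.
 * Returns 0 on success, or k + 1 if pivot k is exactly zero. */
int lu_inplace(double *a, int n, int lda)
{
    assert(a != NULL && n > 0 && lda >= n);
    for (int k = 0; k < n; ++k) {
        double pivot = a[k + k * lda];
        if (pivot == 0.0)
            return k + 1;
        for (int i = k + 1; i < n; ++i) {
            a[i + k * lda] /= pivot;  /* multiplier l_ik, stored in place */
            for (int j = k + 1; j < n; ++j)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
        }
    }
    return 0;
}

/* Preserve the caller's matrix by factoring a copy of it instead. */
int lu_of_copy(const double *a, double *lu, int n, int lda)
{
    assert(a != NULL && lu != NULL && n > 0 && lda >= n);
    memcpy(lu, a, sizeof(double) * (size_t)lda * (size_t)n);
    return lu_inplace(lu, n, lda);
}
```

The extra buffer costs memory and a copy, which is the reason the LAPACK convention leaves that decision to the caller.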

Routines

Finding the correct routine in LAPACK can be a challenge. Take, for instance, inverting a matrix. The signature is simple enough: culaGetri(N,A,LDA,IPIV), and we have already covered the conventions governing the first three parameters. The challenge is in the finer points of the documentation, where it is noted that the input matrix, A, is not an arbitrary matrix. It is instead "On entry, the factors L and U from the factorization A = P*L*U as computed by GETRF." This is a way of saying that the input matrix A should be the result of first calling the routine GETRF (LU decomposition) on the original matrix, and IPIV should be the pivot array that GETRF produced.
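
The two-step flow can be sketched without any GPU library. In the plain C sketch below, demo_getrf and demo_getri are hypothetical stand-ins that we wrote to mirror the roles of GETRF and GETRI; they are not the CULA or LAPACK implementations. One assumption to note: the pivot indices here are 0-based, whereas LAPACK's IPIV is 1-based.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Step 1 (stand-in for GETRF): in-place LU with partial pivoting.
 * ipiv[k] records the row swapped with row k (0-based here).
 * Returns 0 on success, or k + 1 if the matrix is singular at step k. */
int demo_getrf(int n, double *a, int lda, int *ipiv)
{
    assert(a != NULL && ipiv != NULL && n > 0 && lda >= n);
    for (int k = 0; k < n; ++k) {
        int p = k;
        double best = a[k + k * lda] < 0 ? -a[k + k * lda] : a[k + k * lda];
        for (int i = k + 1; i < n; ++i) {  /* largest magnitude in column k */
            double v = a[i + k * lda] < 0 ? -a[i + k * lda] : a[i + k * lda];
            if (v > best) { best = v; p = i; }
        }
        ipiv[k] = p;
        if (best == 0.0)
            return k + 1;
        if (p != k)
            for (int j = 0; j < n; ++j) {  /* swap full rows k and p */
                double t = a[k + j * lda];
                a[k + j * lda] = a[p + j * lda];
                a[p + j * lda] = t;
            }
        for (int i = k + 1; i < n; ++i) {
            a[i + k * lda] /= a[k + k * lda];
            for (int j = k + 1; j < n; ++j)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
        }
    }
    return 0;
}

/* Step 2 (stand-in for GETRI): a and ipiv must come from demo_getrf.
 * Overwrites the factors with the inverse of the original matrix by
 * solving L*U*x = P*e_c for each column e_c of the identity. */
int demo_getri(int n, double *a, int lda, const int *ipiv)
{
    assert(a != NULL && ipiv != NULL && n > 0 && lda >= n);
    double *inv = malloc(sizeof(double) * (size_t)n * (size_t)n);
    if (inv == NULL)
        return -1;
    for (int c = 0; c < n; ++c) {
        double *x = inv + (size_t)c * n;
        for (int i = 0; i < n; ++i)
            x[i] = (i == c) ? 1.0 : 0.0;
        for (int k = 0; k < n; ++k) {      /* apply the recorded row swaps */
            double t = x[k]; x[k] = x[ipiv[k]]; x[ipiv[k]] = t;
        }
        for (int i = 0; i < n; ++i)        /* forward solve, unit-diagonal L */
            for (int j = 0; j < i; ++j)
                x[i] -= a[i + j * lda] * x[j];
        for (int i = n - 1; i >= 0; --i) { /* back solve with U */
            for (int j = i + 1; j < n; ++j)
                x[i] -= a[i + j * lda] * x[j];
            x[i] /= a[i + i * lda];
        }
    }
    for (int c = 0; c < n; ++c)
        memcpy(a + (size_t)c * lda, inv + (size_t)c * n, sizeof(double) * (size_t)n);
    free(inv);
    return 0;
}
```

Calling demo_getri on the original, unfactored matrix would silently produce a wrong answer rather than an error, which is exactly the trap described above: the second routine has no way of knowing whether its input has already been factored.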

One of the big announcements at GTC was Matlab's integrated GPU computing toolbox and this generated considerable buzz. And one of the questions we receive most often is regarding the potential for Matlab to experience speedups from GPU computing. Matlab is one of those great products in terms of usability but the most common complaint is that it's too slow, so GPUs are an obvious fit here. Our friends over at Accelereyes have put together a nice summary on the state of GPU computing in Matlab, and we wanted to share that. For the advanced CULA and Matlab users out there, it is also worth checking out our recent blog series where we describe the process of manually integrating CULA routines into Matlab code.