In row-major order, the dpotrf layer:
i. transposes a on input into a_t
ii. calls dpotrf_ with a_t
iii. transposes the output a_t back into a

The dpotrs layer:
i. transposes a on input into a_t and b into b_t
ii. calls dpotrs_ with a_t and b_t
iii. transposes b_t back into b

By switching UPLO 'L' <--> 'U' in both layers, you can work purely with a and remove the need for a_t. b_t is still required, however.
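To make the UPLO switch concrete, here is a minimal sketch in C. It does not call LAPACK: chol_colmajor is a hypothetical stand-in for dpotrf_ (column-major, touches only the triangle named by uplo), sqrt_newton is only there so the sketch needs no libm, and chol_rowmajor_lower and lower_residual are names made up for illustration. The point is that a row-major buffer read as column-major is the transpose, so the row-major lower triangle is exactly the column-major upper triangle.

```c
#include <assert.h>

/* Newton iteration for sqrt, used only to keep this sketch free of libm. */
static double sqrt_newton(double s)
{
    double x = s > 1.0 ? s : 1.0;
    for (int it = 0; it < 60; ++it)
        x = 0.5 * (x + s / x);
    return x;
}

/* Hypothetical stand-in for dpotrf_: column-major Cholesky that reads and
 * writes only the triangle selected by uplo. Returns 0 on success, or the
 * 1-based index of the first non-positive pivot. */
static int chol_colmajor(char uplo, int n, double *a, int lda)
{
    assert(n >= 0 && lda >= n && (uplo == 'L' || uplo == 'U'));
    /* Strides that walk the chosen triangle as if it were the lower one. */
    int rs = (uplo == 'L') ? 1 : lda;
    int cs = (uplo == 'L') ? lda : 1;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            double s = a[i * rs + j * cs];
            for (int k = 0; k < j; ++k)
                s -= a[i * rs + k * cs] * a[j * rs + k * cs];
            if (i == j) {
                if (s <= 0.0)
                    return i + 1;
                a[i * rs + i * cs] = sqrt_newton(s);
            } else {
                a[i * rs + j * cs] = s / a[j * rs + j * cs];
            }
        }
    }
    return 0;
}

/* Row-major "lower" Cholesky with no transposition and no a_t:
 * ask the column-major routine for the upper triangle instead. */
int chol_rowmajor_lower(int n, double *a, int lda)
{
    return chol_colmajor('U', n, a, lda);
}

/* Largest |(L*L^T - A)(i,j)| over the lower triangle; l and a are both
 * row-major with leading dimension n. */
double lower_residual(int n, const double *l, const double *a)
{
    double worst = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j <= i; ++j) {
            double s = 0.0;
            for (int k = 0; k <= j; ++k)
                s += l[i * n + k] * l[j * n + k];
            double d = s - a[i * n + j];
            if (d < 0.0)
                d = -d;
            if (d > worst)
                worst = d;
        }
    return worst;
}
```

With a row-major 3x3 input whose lower triangle is 4 / 2 5 / 2 3 6, the factor is left in the row-major lower triangle and the strictly upper entries are never read or written. The same switch would have to be made in the matching solve wrapper so that both see the same triangle.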

Similar tricks are possible in many other cases, e.g. SVD, by switching U <--> VT and m <--> n; or orcsd and uncsd, which have a 'trans' argument, so no transposes are required at all.
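For the SVD case, the reasoning behind the swap can be written out in one line. This is only a sketch of the argument, assuming the usual convention that the routine returns U and V^T: a row-major m-by-n buffer holding A is, read as column-major, the n-by-m matrix A^T.

```latex
A = U \Sigma V^{T}
\quad\Longrightarrow\quad
A^{T} = V \Sigma U^{T}
```

So a column-major SVD called with m and n swapped returns V where it normally puts U, and U^T where it normally puts VT. Read back in row-major, those two buffers hold V^T and U, which is why passing the caller's VT array in the U slot and the U array in the VT slot gives the right answer without transposing anything.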

In some cases the algorithm changes between UPLO='L' and 'U', so the wrappers would need to be consistent across the parts of a set (such as dpotrf and dpotrs above).

It would be a nice project for someone to work on these and produce benchmarks on the performance improvements and memory savings that these changes would make.

I agree with you. (This was actually mentioned in our discussion during the design of LAPACKE.) "One" could do this. That (1) would be "fun" and (2) it would be useful by saving (quite a lot of) memory and time. Yep: DGESVD, DPOTRF, DORMQR, etc. All these could be written with this in mind.

The idea is the same as for the CBLAS (C interface to the BLAS). Actually, the CBLAS supports Row Major Format and Column Major Format by (1) relying only on a Column Major Format implementation, and (2) not performing any memory allocation. This is done just by playing with the order of operands, the transpose arguments, and tricks like this. It's fun (and useful). Beautiful.
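As an illustration of the operand-order trick, here is a minimal sketch for the no-transpose matrix product. gemm_colmajor is a hypothetical stand-in for a column-major-only GEMM (not the real dgemm_ signature, which also takes trans flags, alpha and beta), and gemm_rowmajor is a made-up name for the wrapper. Since the row-major buffers are the column-major transposes, C = A*B is obtained by asking for C^T = B^T * A^T, i.e. by swapping the operands and the m and n dimensions.

```c
#include <assert.h>

/* Hypothetical stand-in for a column-major-only GEMM: C = A * B, with
 * A m-by-k, B k-by-n and C m-by-n, all stored column-major. */
static void gemm_colmajor(int m, int n, int k,
                          const double *a, int lda,
                          const double *b, int ldb,
                          double *c, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = s;
        }
}

/* Row-major C = A * B (A m-by-k, B k-by-n, C m-by-n) with no copies and no
 * allocation: swap the operands and swap m with n. */
void gemm_rowmajor(int m, int n, int k,
                   const double *a, int lda,
                   const double *b, int ldb,
                   double *c, int ldc)
{
    gemm_colmajor(n, m, k, b, ldb, a, lda, c, ldc);
}
```

The leading dimensions pass through unchanged: a row-major leading dimension (row length) is exactly the column-major leading dimension of the transposed view.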

Well anyway, yes, this is a good idea, and so I added it to our Wish List. This wish list item should already have been listed, but never made it. http://www.netlib.org/lapack/WishList/ We might get to it at some point.

In the wish list as well, and on a related topic, we also wanted to try to have the in-place transposition algorithm of Fred Gustavson, Lars Karlsson and Bo Kågström. (See "Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion", ACM TOMS 2012.) The algorithm is in PLASMA already thanks to Mathieu Faverge, and it would be nice to have it in LAPACK and to have LAPACKE use it for transposition. When no trick is possible in the layer and transposition is necessary, this would avoid memory allocation.
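To show what "transposition without memory allocation" means in the simplest terms, here is a plain cycle-following sketch. To be clear, this is not the parallel, cache-efficient algorithm of the cited paper or the PLASMA code; it is the textbook approach, and transpose_inplace is a name made up for illustration. It relies on the fact that the element at linear index i of a row-major m-by-n matrix belongs at index (i * m) mod (m*n - 1) in the transposed layout, with the first and last elements staying put.

```c
#include <assert.h>

/* In-place transposition of a row-major m-by-n matrix into a row-major
 * n-by-m matrix (equivalently: row-major to column-major conversion),
 * using O(1) extra storage. Simple cycle-following, not tuned for speed. */
void transpose_inplace(double *a, int m, int n)
{
    assert(m >= 0 && n >= 0);
    long total = (long)m * n;
    if (total < 2)
        return;
    long mod = total - 1;

    for (long start = 1; start < mod; ++start) {
        /* Walk the cycle; only process it from its smallest index. */
        long i = start;
        do {
            i = (i * m) % mod;
        } while (i > start);
        if (i < start)
            continue; /* this cycle was already rotated earlier */

        /* Rotate the cycle, carrying one element at a time. */
        double carry = a[start];
        i = start;
        do {
            long dest = (i * m) % mod;
            double tmp = a[dest];
            a[dest] = carry;
            carry = tmp;
            i = dest;
        } while (i != start);
    }
}
```

The cost of avoiding the workspace is the extra cycle walk used to detect already-processed cycles, plus a scattered access pattern; that is the kind of overhead a blocked, cache-aware algorithm is meant to address.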