When I use magma_zgeqrf2_gpu, I have direct access to R, but there is no matching function to recover Q: magma_zungqr and magma_zungqr2 both require A to be in host memory, and magma_zungqr_gpu requires the dT array, which magma_zgeqrf2_gpu does not return.

If I use magma_zgeqrf3_gpu instead, can I use magma_zungqr_gpu to obtain Q and the code from testing_zgeqrf_gpu.cpp to recover R?

A small side question: what are the computational complexities of *geqrf* and *ungqr*? Is the cost of *ungqr* negligible compared to *geqrf* (and is that the reason why there is only a CPU *ungqr* for magma_zgeqrf2_gpu)?

That magma_zunmqr2_gpu was written for a particular use in the eigenvalue codes, so it's unusual in that it takes both dA (on the GPU) and wA (on the host).

There's no particular reason that magma_zungqr2_gpu doesn't exist. We've just never needed it.

For a real, square matrix:
geqrf is 4/3 n^3 flops
ungqr is 4/3 n^3 flops
In complex, those get multiplied by about 4.

For rectangular matrices, it depends on what part of Q you want. LAPACK Working Note (LAWN) 41 has detailed flop counts for most of the routines (listed under the single-precision names: sgeqrf, sorgqr, etc.):
http://www.netlib.org/lapack/lawnspdf/lawn41.pdf
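To see how the counts scale for rectangular shapes, it can help to code up the leading terms. This is a sketch: the formulas below are the leading-order totals (multiplications plus additions) for real arithmetic as I recall them from LAWN 41, with lower-order terms dropped, so check them against the note before relying on them. The helper names geqrf_flops and ungqr_flops are hypothetical; the factor of 4 for complex is the rough multiplier quoted above.

```c
#include <assert.h>

/* Leading-order flop counts (multiplications + additions) for REAL
   arithmetic, as tabulated in LAWN 41; lower-order terms are dropped.
   For the complex (z/c) routines multiply by roughly COMPLEX_FACTOR. */
#define COMPLEX_FACTOR 4.0

/* xGEQRF: QR factorization of an m-by-n matrix, m >= n. */
double geqrf_flops(double m, double n)
{
    assert(m >= n);
    return 2.0 * m * n * n - 2.0 / 3.0 * n * n * n;
}

/* xORGQR / xUNGQR: generate the first n columns of Q (m-by-n)
   from k Householder reflectors, m >= n >= k. */
double ungqr_flops(double m, double n, double k)
{
    assert(m >= n && n >= k);
    return 4.0 * m * n * k - 2.0 * (m + n) * k * k
         + 4.0 / 3.0 * k * k * k;
}
```

For m = n = k both functions reduce to 4/3 n^3, matching the square case. By these formulas, generating the thin m-by-n Q (n = k) costs 2mn^2 - 2/3 n^3, the same as the factorization itself, while generating the full m-by-m Q costs considerably more, so ungqr is generally not negligible next to geqrf.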

Often, you can use unmqr (multiply by Q) instead of ungqr (generate explicit Q), but not always.