"CPU" does not mean that only CPUs would be used - it means "CPU interface" (the input data and the output result is expected to be on the CPU memory). The "GPU" or "GPU interface" means that the input matrix as well as the output is on the GPU memory. In either case both the GPUs and the CPUs are used.For the case of QR, if you have more than one GPU, you can set environment variable MAGMA_NUM_GPUS to the number of GPUs you would like to use. For example, setting

Thanks, Stan. If that is the case, it would seem that the input data to the magma_sgeqrf functions must all reside in CPU memory, but in testing_sgeqrf_gpu.cpp it appears to me that d_A is in device memory. So does SGEQRF also have a GPU interface, in which case the "Computation Routines in Magma 1.1" table I linked earlier needs to be updated?

Also, what is the difference between sgeqrf_gpu, sgeqrf2_gpu, and sgeqrf3_gpu?

Yes, this is a typo: QR has both CPU and GPU interfaces. Thanks for pointing this out. We will fix it.

Regarding the different versions, sgeqrf2_gpu is LAPACK-consistent in terms of input and output data layout. The sgeqrf_gpu version additionally stores the triangular matrices used in the factorization. sgeqrf3_gpu stores the triangular matrices too, but also modifies the storage for the Householder vectors used in the factorization: 0s are put in the upper triangular parts of the panels, 1s on the diagonal, and the upper triangular parts are stored separately. See also this discussion topic.
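To make the sgeqrf3_gpu layout change concrete, here is a small host-side sketch in plain C++. It is not MAGMA code and calls no MAGMA routine: split_panel is a hypothetical helper, and the column-major panel with leading dimension ld is an assumption made for the illustration. It only mimics the transformation described above on a single m x nb panel: the upper triangular part (the R entries) is copied out to separate storage, then the panel gets 0s above the diagonal and 1s on the diagonal, so the Householder vectors below the diagonal become an explicit unit lower trapezoidal block. Where MAGMA actually keeps the separated triangles is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not part of MAGMA).
// `panel` is an m x nb block in column-major order with leading dimension ld.
// On return, the upper triangle (diagonal included) has been copied into the
// returned nb x nb column-major matrix, and inside the panel it has been
// overwritten with 0s above the diagonal and 1s on the diagonal.
std::vector<float> split_panel(std::vector<float>& panel, int m, int nb, int ld) {
    assert(nb >= 0 && m >= nb && ld >= m);
    assert(panel.size() >= static_cast<std::size_t>(ld) * nb);

    std::vector<float> r(static_cast<std::size_t>(nb) * nb, 0.0f);
    for (int j = 0; j < nb; ++j) {
        for (int i = 0; i <= j; ++i) {
            const std::size_t idx = static_cast<std::size_t>(j) * ld + i;
            r[static_cast<std::size_t>(j) * nb + i] = panel[idx];  // save the R entry
            panel[idx] = (i == j) ? 1.0f : 0.0f;                   // explicit unit diagonal
        }
    }
    return r;
}
```

For a 3 x 2 panel with columns (7, 2, 3) and (4, 5, 6), the call returns the triangle with entries 7, 4, 5 and leaves the panel with columns (1, 2, 3) and (0, 1, 6). Having the 0s and 1s stored explicitly means the panel can be used directly as a dense matrix of Householder vectors, without special handling of the implicit unit diagonal.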