Abstract

Matrix operations are common and expensive computations in a wide variety of applications,
occurring frequently in high-performance computing, graphics, graph processing, and
machine learning.

This paper discusses how to map a variety of important matrix computations,
including sparse matrix-vector multiplication (SpMV), sparse triangle solve (SpTS),
graph processing, and dense matrix-matrix multiplication, to GPUs. Since many emerging
systems will use heterogeneous architectures (e.g., CPUs and GPUs) to attain the desired
performance targets under strict power constraints, this paper also discusses the implications
and future research directions for matrix processing with heterogeneous designs.

Conclusions common to the matrix operations discussed in this paper are:
(1) Future algorithms should be written to ensure that the essential computations fit
into local memory, which may require direct programmer management.
(2) Algorithms are needed that expose high levels of parallelism.
(3) While the scale of computation is often sufficient to support algorithms with superior
asymptotic order, additional considerations, such as memory capacity and bandwidth, must
also be carefully managed.
(4) Libraries should be used to provide portable performance.