ABSTRACT: Memory bandwidth is a major limiting factor in the scalability of
parallel iterative algorithms that rely on sparse matrix-vector
multiplication (SpMV). This paper introduces Hierarchical Diagonal
Blocking (HDB), an approach that we believe captures many of the
existing optimization techniques for SpMV in a common
representation. Using this representation in conjunction with
precision-reduction techniques, we develop and evaluate
high-performance SpMV kernels. We also study the implications of
using our SpMV kernels in a complete iterative solver. Our method
of choice is a Combinatorial Multigrid solver that can fully utilize
our fastest reduced-precision SpMV kernel without sacrificing the
quality of the solution. We provide an extensive empirical evaluation
of the approach's effectiveness on a variety of benchmark
matrices, demonstrating substantial speedups on all matrices
considered.
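For readers unfamiliar with the baseline kernel, the following is a minimal sketch of SpMV over a plain compressed sparse row (CSR) matrix. It illustrates one assumed form of precision reduction: nonzero values are stored as 32-bit floats to cut the bytes read from memory, while products are accumulated in 64-bit doubles. This is not the HDB format or the paper's kernels; the function names `to_csr_f32` and `csr_spmv_reduced` are hypothetical.

```python
from array import array


def to_csr_f32(dense):
    """Convert a dense row-major matrix (list of lists) to CSR.

    Hypothetical helper: nonzero values are stored with array typecode 'f'
    (32-bit float), halving value storage relative to 64-bit doubles.
    """
    row_ptr, col_idx, vals = [0], [], array('f')
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                col_idx.append(j)
                vals.append(v)  # rounded to single precision on storage
        row_ptr.append(len(col_idx))  # end of this row's nonzeros
    return row_ptr, col_idx, vals


def csr_spmv_reduced(row_ptr, col_idx, vals_f32, x):
    """Compute y = A x for a CSR matrix with single-precision values.

    Each stored value is read back as a Python float (a 64-bit double),
    so the dot product for every row is accumulated in double precision.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals_f32[k] * x[col_idx[k]]
        y[i] = acc
    return y


if __name__ == "__main__":
    A = [[4.0, 0.0, 1.0],
         [0.0, 3.0, 0.0],
         [2.0, 0.0, 5.0]]
    row_ptr, col_idx, vals = to_csr_f32(A)
    print(csr_spmv_reduced(row_ptr, col_idx, vals, [1.0, 2.0, 3.0]))
```

Values such as 4.0 or 0.5 survive the conversion exactly, whereas a value like 0.1 is rounded to the nearest single-precision number. Whether that rounding is acceptable depends on the surrounding solver, which is why the abstract pairs the reduced-precision kernel with a solver that tolerates it.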