Table 5.6 shows execution rates for the 64-bit
matrix-vector multiply PBLAS routine PSGEMV/PDGEMV.
The rates listed are for a matrix-vector product
in which A is a square matrix of order N and
x and y are vectors that are both distributed
over a process column.

The Level 3 PBLAS are not necessarily limited
by memory bandwidth, because they perform
many flops for each word of data accessed.
Their flop rates are correspondingly higher.
Table 5.7 shows the performance results obtained
by the general matrix-matrix multiply PBLAS routine
PSGEMM/PDGEMM. These results were obtained for a
matrix-matrix multiply operation in which A, B,
and C are square matrices of order N.
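
The contrast between the matrix-vector and matrix-matrix routines can be
illustrated with a rough count of flops per word of data. The sketch below
is not taken from the tables; it assumes the usual leading-order operation
counts (about 2N^2 flops for a matrix-vector multiply and 2N^3 for a
matrix-matrix multiply) and counts each matrix or vector element once.
The function names are hypothetical, and the model ignores distribution
and communication costs entirely.

```python
# Rough flops-per-word model for square operands of order n.
# Assumptions (not from the source): leading-order flop counts,
# each operand element counted once, no communication cost.

def gemv_flops_per_word(n):
    """Matrix-vector multiply: ~2*n^2 flops on n^2 + 2*n words (A, x, y)."""
    flops = 2 * n * n
    words = n * n + 2 * n
    return flops / words

def gemm_flops_per_word(n):
    """Matrix-matrix multiply: ~2*n^3 flops on 3*n^2 words (A, B, C)."""
    flops = 2 * n ** 3
    words = 3 * n * n
    return flops / words

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, gemv_flops_per_word(n), gemm_flops_per_word(n))
```

Under these assumptions the matrix-vector ratio stays below 2 for every N,
while the matrix-matrix ratio grows as 2N/3, which is why the latter
leaves room for much higher flop rates.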