In order to take advantage of the higher performance of sequential
level-3 BLAS, we rewrite the
above algorithm in terms of matrix-matrix operations.
Partition

where and are sub-matrices.
Now,

This in turn yields the equations

We thus conclude that the following steps will implement the
Cholesky factorization, overwriting the lower triangular
portion of A with L :

, where Cholesky factor overwrites
the lower triangular part of .

compute the Cholesky factorization of updated
recursively, yielding .

Notice that to compute , a level-2 implementation
of the Cholesky factorization can be used.
The bulk of the computation is in the operation
,
which now uses a matrix-matrix operation, thereby achieving
better performance.

The PLAPACK code for this level-3 BLAS based implementation is given
in
Figure 8.2.
Notice that the block size b must be determined. This block size
should equal the width of that makes the
the symmetric rank-k update:
most efficient.
The call

PLA_Environ_nb_alg( PLA_OP_SYM_PAN_PAN, template, &nb_alg );

queries the infrastructure for the optimal block size for
symmetric rank-k (panel-panel) update.
A second optimization, which could also have been used for the
level-2 BLAS right-looking variant, recognizes that computation
naturally flows to the right and down: notice that the columns are
completed from left to right, and similarly the order in which
rows no longer participate is from top to bottom.
The calls