The two referenced papers above take a more linear algebraic approach,
rather than just providing a matrix API. McDonald's Cactus system is
probably more appropriate for building a computer algebra system, as
it gives a framework for reasoning about general finite-dimensional
vector spaces.

Lazy evaluation

Expression templates

Expression templates are a C++ technique for building an expression's
parse tree at compile time, which enables lazy evaluation without
runtime overhead. CLOS does much more at runtime than the C++ class
and template systems do, so expression templates aren't necessarily as
useful for Lisp. However, the underlying idea of lazy evaluation is
useful for avoiding temporaries. In fact, lazy evaluation is probably
easier in Lisp, given its dynamic type system.

It would be interesting to see how much we could do in Lisp at compile
time. Doing so may require constructing a "template system" or
"compile-time class system." Part of this is already handled by
series expressions (proposed in an appendix of CLtL2 and implemented
by the SERIES library, though not part of the ANSI standard). By
writing the appropriate scanner functions for e.g. vectors and vector
views, we get part of the effect of lazy evaluation -- at least the
part that describes how to combine different vectors.

Lazy evaluation for matrix factorizations?

Lazy evaluation is handy for vector-style operations, as the
expression templates page explains. However, how useful would it
be for matrix factorizations? It doesn't really help us reason about
whether a particular operation could be performed in place
rather than by copying -- that requires compile-time code analysis.
Without such analysis, it is up to programmers to select an
in-place operation when they know that they can overwrite the input
array.

Where it can be helpful, though, is with implicit representations
of factors. A good example is the Q factor from a QR factorization.
We can save a lot if we know that we don't need to construct Q
explicitly. That is a sort of "lazy evaluation." However, making the
lazy evaluation effective in saving computation requires some
reasoning. For example, the user may start asking for columns of the
matrix. If it's just one or two columns, it's probably more efficient
to keep the implicit representation and operate on columns of the
identity matrix to extract them. If the programmer asks for a lot
of columns, however, it may be more efficient just to compute the
whole matrix explicitly. Reasoning about that at runtime is possible but incurs
overhead: for example, the "Q factor expression" can notice that the
programmer is calling for a lot of columns and thus decide to expand
itself into an explicit representation. Part of the overhead in this
case is examining the vectors to see if they are columns of the
identity.

Here are some cases where lazy evaluation is useful:

- Avoiding the allocation of temporaries (costs of temporaries may
  include function call overhead for allocation, register/cache
  pollution, etc.)

- Sticking to implicit representations as long as doing so is
  efficient (handy if computing an explicit representation is
  expensive but operating with the implicit representation is
  possible and efficient)

Other lazy evaluation references

The ZOOM LinearAlgebra C++ package. Its design documents give a
particularly hideous example ("double dispatching") of how much
easier CLOS makes life for Lisp programmers than the C++ class
system does for C++ programmers.

Parallel processing in Lisp

The only rational option for getting Lisp to support parallel
operations "natively" is to wrap some portable parallel processing
library, like MPI or more likely GASNet. We can't just hide it
under the covers (of e.g. ScaLAPACK) because setting up the data for
calling a library such as ScaLAPACK requires some communication
(you have to distribute the data before you can work on it!).

MPI in Lisp

Here is a paper: G. Cooperman, "STAR/MPI: Binding a
Parallel Library to Interactive Symbolic Algebra Systems," Proc. of
the International Symposium on Symbolic and Algebraic Computation
(ISSAC '95), ACM Press, pp. 126-132. You can get it on Citeseer.
Gene's package, ParGCL, uses
GCL as the underlying
Lisp. It is open-source and available for
download here. The last update on the FTP site appears to be
12/16/2005, as of 10/5/2006. According to the GCL website, the GCL
authors plan to incorporate ParGCL into GCL itself.

ScaLAPACK

Writing high-level wrappers for ScaLAPACK is a decidedly non-trivial
task. Matlab*P does it, as does Cleve Moler's upcoming parallel
Matlab. Again, it's not enough to wrap ScaLAPACK; there need to be
parallel primitive operations in order to distribute the data.