internally, a HashMatrix is simply a std::vector< std::map<int,Scalar> >. It is simple to implement and easy to use but it is also damn slow ! would be really nice if we could find a better alternative for efficient random writes.

each expression defines a InnerIterator which allows to efficiently traverse the inner coefficients of a sparse (or dense) matrix. Of course, these InnerIterator can be nested exactly like expressions are nested such that our sparse matrices already support expression templates !

InnerIterator are implemented for:

a default implementation for MatrixBase

CwiseUnaryOp

CwiseBinaryOp (with some shortcomings, see below)

SparseMatrix

HashMatrix

efficient code for the product of two CCS to a CCS.

Write/Update access patterns

Well dealing with sparse matrix, a critical operation is the efficient set/update of the coefficients. In order to offer best efficiency it is important to adapt the solution to the required access pattern. For instance, a temporary sparse matrix based on a std::map can handle any kind of access pattern, but it also perform very poorly if you are able to fill a matrix in a coherent order. We can identify four different access pattern schemes with their respective technical solutions ranging from the most efficient to the most flexible.

Fully coherent access

Here we assume the matrix is set in a fully coherent order, i.e., such that the coefficients (i,j) are set with increasing i + j*outersize where i is the inner coordinates and j the outer coordinate. In such a case we can directly set a compressed sparse matrix as we would fill a dynamic vector. In order to reduce memory allocations/memory copies, it is important to be able to give a hint about the expected number of non-zeros such that we are able to preallocate enough memory. Of course using a more dynamic data structure like a linked list of small array would probably performs better compared to resizing an single large buffer but since this not a standard storage format and that the standard compressed scheme works not bad it's probably better to use it directly.

Updating a sparse matrix using this access pattern can be done by filling a new temporary matrix and then do an efficient shallow copy.

Inner coherent access

Here we assume the coefficients (i,j) are set with increasing inner coordinate i for each outer vector j. For instance the following sequence of coordinates is valid:

(2,10) (4,7) (1,12) (4,10) (3,12) (2,1)

On the other hand, at that point it is forbidden to set the coefficient (2,12) since (3,12) has already been set. A typical use case of such an access pattern is to copy a row matrix B to a column matrix A: since B has to be traversed in a row major order the sequence of coordinates (i,j) won't fit the requirements of the fully coherent access pattern but perfectly match the current one.

To implement such a behavior, we need a dynamic data structure per inner vector. Since there is no such standard storage scheme, we are free to choose whatever we want. Currently, this access pattern is implemented by mean of small linked array in the LinkedVectorMatrix class. Still to do:

make the granularity of the chunks configurable at runtime

write a memory allocator for the chunks shared at the matrix level ?

write a clean linked vector class for reuse.

Outer coherent access

Random access

=> currently by mean of an array of std::map but I'd like to try:

array of a custom sorted linked lists

array of sorted binary trees (good if the filling follow a quasi-random pattern, degenerate to a linked list if it is filled coherently unless we keep the tree balanced that is rather expensive IMO)

Issues

The major issue with sparse matrices is that they can only be efficiently traversed in a specific order, i.e., per column for a CCS matrix and per row for a CRS matrix. The nighmare starts when you have to deal with expressions mixing CCS matrices (column major) and CRS matrices (row major).

At that point m.nonZeros() == 6 and you cannot add any other nonzero entries. For instance x=m(0,0); will returns zero, while m(0,0)=x; will issue an assert. Of course you can still update an existing nonzero: m(7,2) += 1;.

To allow to treat any matrix as a sparse matrix we could define dummy startFill(), fill() and endFill() members to MatrixBase but maybe we could find some better API ? Indeed, while efficient there are two major limitations:

we cannot update the matrix, only set it from zero

to be consistent, for dense matrices startFill() should do a setZero() that is not really nice

For instance we could imagine something based on the facade design pattern where you could request for a random setter or coherent setter or an updater or whatever else is needed ? For dense matrices these facade objects would degenerate to a simple references.

Binary operators (aka. operator+)

Here the problem is that we cannot mix a CCS with a CRS. In such a case one of the argument have to be evaluated to the other format (let's pick the default format). This requires some modifications in Eigen/Core:

probably needs to add a SparseBit flag

needs a more advanced mechanism to determine the storage order of a CwiseBinaryOP<> expression

needs to modify nesting in ei_traits<CwiseBinaryOP<>> such that if an argument is sparse and has a different storage order than the expression itself, then it has to be evaluated to a sparse temporary.

needs to modify ei_eval such that a sparse xpr gets evaluated to a sparse matrix (simply add an optional template parameter equal to Xpr::Flags&SparseBit and do the specialization in Eigen/Sparse)

Transpose

Basically, here the challenge is to copy a CRS matrix to a CCS one. I can imagine three strategies:

copy to a temporary column major HashMatrix and then compress it to a CCS (slow)

perform a column major traversal of the CRS matrix:

create rows InnerIterators iters;

for j=0..cols do

for i=0..rows do

if iters[i].index()==j then

res(i,j) = iters[i].value; iters[i]++

of course creating rows iterators might be too expensive so we need a way to provide lightweight iterators where the reference to the expression, the current outer index etc. are not stored. Maybe we could slightly update the InnerIterator such that they can loop over an arbitrary number of outer columns/rows.

the third options is like the first one but with a much more efficient data structure. Indeed, during a row major processing the coefficients of the column major destination matrix will be set in coherent order per column. So we can use a special temporary sparse matrix with a column/row vector of linked fixed size vectors. This seems to be the best option. If it is a temporary there is no need to convert it to a compact CCS matrix.

Matrix product (again !)

Here we have to investigate all the possible combinations for the 2 operands and the results (2*2*2=8 possibilities):

lhs

rhs

res

comments

col

col

col

Loop over rhs in a column major order (j, k), accumulate res.co(j) += rhs(k,j) * lhs.col(k). The accumulation needs a single column temporary which is initialized for each j. If the result of the product is dense (nnz > 4%), then the best is to allocate a dense vector, otherwise a sorted linked list peforms best (the accumulation of each column of lhs is coherent, so we can exploit this coherence to optimize search and insertion in the linked list)

col

col

row

weird case, let's evaluate to a col major temporary and then transpose

row

col

*

Loop over the result coefficient in the preferred order and perform coherent dot products

row

row

*

like col col *

col

row

col

if transpose is fast then evaluate rhs to a column major format, otherwise use a dynamic temporary for the results (array of sorted linked lists)

col

row

row

like col row col

Review of some existing libs

GMM++

native support of basics (add, mul, etc)

native support of various iterative linear solvers including various pre-conditioners for selfadjoint and non-selfadjoint matrix (most of them come from ITL)

can use superLU for direct solvers

provides various sparse matrix formats:

dense array (col or row) of std::map<int,scalar>

pro: easy to implement, relatively fast random access (read/write)

cons: no ideal to use as input of the algorithms, one dimension is dense