Created attachment 196
A patch to add update/downdate functionality
The LDLT decomposition of a matrix A can be "updated" to yield the decomposition of A + W*W^T. Typically W is a column vector (a "rank-1 update"), but this patch also permits efficient rank-r updates. It also supports "downdates," which correspond to removing previously added columns of W.

hi, thanks for the patch. I see you adapted an algorithm initially designed for sparse matrices, but for dense ones do you know how it compares to more classical algorithms based on Givens rotations? For instance, see this page for a tech report and code on such an algorithm:
http://lapmal.epfl.ch/software/index.shtml

(In reply to comment #1)
> hi, thanks for the patch. I see you adapted an algorithm initially designed for
> sparse matrices, but for dense ones do you know how does it compare to more
> classical algorithms based on Givens rotation? For instance see this page for a
> tech report and code on such an algorithm:
> http://lapmal.epfl.ch/software/index.shtml
I don't know; I haven't tested it myself.
One advantage of this algorithm is that, for multiple-rank updates, it does the diagonals first and then L. If the diagonals were maintained in a separate array, this would presumably be better from a memory-locality perspective. (See comments in the summary section of the paper I referred to.)
However, since Eigen maintains the diagonals as part of the overall matrix, it may not be advantageous. Also, the main use case will presumably be rank-1 updates, for which it sounds like the BLAS-based code may have advantages.
Finally, it may be worth noting that the link you sent me contains code for LLT, whereas what I contributed is for LDLT.

Fair enough. For the record, this paper is very interesting too: "Methods for Modifying Matrix Factorizations", starting from page 10.
Regarding your patch, don't expect it to be integrated very quickly, because it will require some work: cleaning, unit testing, making the API consistent with the rest of Eigen, etc. But again, thanks a lot for sharing.

(In reply to comment #3)
> Fair enough.
I did some testing. I'll attach one test program---maybe it can serve as a starting point for a unit test?
Testing: DataType = float, W has 200 rows & 50 columns
Speedwise, the new update() functionality seems to be comparable to, and indeed faster than, the current decomposition code. Last line of valgrind output:
With just the conventional decomposition of A enabled (all else commented out, except for cout << ldltDirect.vectorD().coeff(0) to ensure that the computation is not optimized out):
I refs: 18,949,850
With just the updating (W-based) decomposition enabled (for comparison leaving the calculation of A intact, even though it is not used):
I refs: 15,118,056
With the (useless) calculation of A also disabled:
I refs: 6,705,744
So if one is working with matrices originally defined as W W^T, the updating code is almost 3-fold faster than the current code. This seems like an enhancement worth having.
Accuracywise, the new update() functionality is significantly worse, perhaps due to the lack of pivoting (the update does not re-pivot). Here was my output:
Direct err: 0.0289234
Update err: 0.548278
Direct: post/pre = 2.74359e-06
Update: post/pre = 2.42571e-05
The last comparison is probably the most relevant (see comments in code), since we typically care far more about the inverse properties of the decomposition than its accuracy in reconstructing the original matrix.
> For the record, this paper is very interesting too: "Methods for
> Modifying Matrix Factorizations", starting from page 10.
Thanks for the tip, I will look at it. On a related note, for updating SVD (something I may also be interested in implementing) I have the impression that the state-of-the-art is Brand 2002, "Incremental singular value decomposition of uncertain data with missing values." Even without missing values, my naive impression is that this algorithm is now widely used.
> Regarding your patch, don't expect it to be integrated very quickly because it
> would require some bits of work such as cleaning, unit testing, make the API
> consistent with the rest of Eigen, etc. but again thanks a lot for sharing.
OK. If there's more I can do, let me know. I'd be curious to see how it evolves in the hands of someone who really knows the API; it would surely be educational for me.

Further update: if one changes to 200 dimensions and 200 data vectors, one obtains:
"Classical" decomposition (including calculation of A): I refs: 44,198,467
"Update" decomposition (skipping the calculation of A): I refs: 51,286,285
So in the previous test case, the main advantage of update() was simply that W is smaller than A. When they are of the same size, the classical code has a slight advantage, but the two are comparable.

(In reply to comment #3)
> Fair enough. For the record, this paper is very interesting too: "Methods for
> Modifying Matrix Factorizations", starting from page 10.
Finally, I just noticed that the algorithm in the paper I cited (Davis & Hager) is simply a reorganization of the "Method C1" algorithm in the paper you cited (Gill et al.). The Davis version apparently improves the memory access pattern by making only a single pass through L (which should improve cache behavior, I think). I doubt my own Eigenized implementation is ideal; I'm sure it could be improved upon.

Tim, thanks for the experiments. I played a bit with it too, and here, if dims == vectors == 200, the direct method is an order of magnitude faster. Same regarding accuracy: the update method is much less accurate than the direct one. Nevertheless, I also compared it to the LLT update method (recently available in the default branch), which is based on Givens rotations, and the algorithm you implemented seems quite comparable in terms of performance and slightly better in terms of accuracy; though, as you noticed, one is LDLT and the other LLT, so they are not perfectly comparable.
For the unit tests, look at eigen/test/cholesky.cpp, there are already tests for LLT::rankUpdate.
Some remarks regarding the function:
update(MatrixBase<Derived>& w, int alpha)
- update -> rankUpdate
- the alpha should be a RealScalar (if your algorithm supports any alpha)
- w should be const qualified, and its values should really not be modified.
others:
- check it works for complexes, and for "upper" decompositions.
- I'm wondering whether it makes sense to do multi-rank updates at once, because:
1 - the speed improvement is quite small, especially for small-rank updates
2 - for relatively large updates it is faster and more accurate to restart from scratch anyway
3 - multi-rank updates require more temporaries

(In reply to comment #8)
> Same regarding the accuracy: the update method is much less accurate than the
> direct. Nevertheless, I also compared it to the LLT update method (recently
> available in the default branch) which is based on Givens rotations, and it
> seems the algorithm you implemented is quite comparable in term of
> performance, and slightly better in terms of accuracy, though as you noticed one
> is about LDLt and the other LLT, so they are not perfectly comparable.
The more I think about it, the stranger I find it that updating is of lower accuracy. After all, the condition number of W W^T is the square of that of W. So I am increasingly puzzled by this result. However, I don't have any ideas at the moment for what to do about it.
> For the unit tests, look at eigen/test/cholesky.cpp, there are already tests
> for LLT::rankUpdate.
Thanks---I hadn't noticed that LLT already had rankUpdate, or I would have
based my work on it.
I simply made copies of the LLT tests for LDLT.
> Some remarks regarding the function:
>
> update(MatrixBase<Derived>& w,int alpha)
>
> - update -> rankUpdate
Done
> - the alpha should be a RealScalar (if your algorithm supports any alpha)
Mathematically any value is acceptable, so this seems reasonable, even though
I can't imagine why anyone would want to use a value different from +1 or -1.
> - w should be const qualified, and its values should really not be modified.
Done (this was easy to achieve once I restricted it to rank-1 updates, which I did in response to your other suggestions below).
>
> others:
>
> - check it works for complexes, and for "upper" decompositions.
It works for complex lower, and it works for real upper, but it does not work
for complex upper. Interestingly, I notice that LLT updating seems to fail its
unit test for complex upper, too (cholesky_6). Not quite sure what is going
on, or what to do about it---maybe the error is in Transpose?
> - I'm wondering whether it makes sense to do multi-rank updates as once,
> because:
> 1 - the speed improvement is quite small, especially for small rank update
> 2 - for relatively large updates it is faster and more accurate to restart
> from scratch anyway
> 3 - multi-rank updates requires more temporaries
Agreed, and in any case my own work requires only rank-1 updates. I changed
it, and this somewhat streamlines the code.

thanks a lot for the updated patch, it looks pretty good now, except for the upper/complex case; see below.
(In reply to comment #10)
> (In reply to comment #8)
> > Same regarding the accuracy: the update method is much less accurate than the
> > direct. Nevertheless, I also compared it to the LLT update method (recently
> > available in the default branch) which is based on Givens rotations, and it
> > seems the algorithm you implemented is quite comparable in term of
> > performance, and slightly better in terms of accuracy, though as you noticed one
> > is about LDLt and the other LLT, so they are not perfectly comparable.
>
> The more I think about it, the stranger I find it that updating is of lower
> accuracy. After all, the condition number of W W^T is the square of that of W.
> So I am increasingly puzzled by this result. However, I don't have any ideas
> at the moment for what to do about it.
I don't know the reason either, but that seems to be a well-known fact!
> > - the alpha should be a RealScalar (if your algorithm supports any alpha)
> Mathematically any value is acceptable, so this seems reasonable, even though
> I can't imagine why anyone would want to use a value different from +1 or -1.
Think of alpha as a weight in incremental weighted least squares. This is also for consistency with SelfAdjointView::rankUpdate.
> > - check it works for complexes, and for "upper" decompositions.
> It works for complex lower, and it works for real upper, but it does not work
> for complex upper. Interestingly, I notice that LLT updating seems to fail its
> unit test for complex upper, too (cholesky_6). Not quite sure what is going
> on, or what to do about it---maybe the error is in Transpose?
yes, I noticed it a few days ago, and the easiest solution was to pass w.conjugate(); see the latest version of the devel branch. Perhaps the same works here.
> > - I'm wondering whether it makes sense to do multi-rank updates as once,
> > because:
> > 1 - the speed improvement is quite small, especially for small rank update
> > 2 - for relatively large updates it is faster and more accurate to restart
> > from scratch anyway
> > 3 - multi-rank updates requires more temporaries
>
> Agreed, and in any case my own work requires only rank-1 updates. I changed
> it, and this somewhat streamlines the code.
I think rankUpdate should still accept matrices, but simply process them one column at a time. I probably have to make the same change for LLT::rankUpdate() ;)

I see I've not checked this bug report for a long time, and due to my lingering confusion about how hg works, I had the impression the patch had already been committed. But with the impending release of 3.1 I checked out a fresh copy and noticed (I think) that it's still not in. I'd like to finish this patch so it can be included.
I updated my copy of the default branch. Unless I'm doing something wrong (which is entirely possible), LLT is still failing cholesky_6. So I am not certain which part of the "latest version of the devel branch" (see comment above) I am supposed to check.

> I think rankUpdate should still accept matrices, but simply process it one
> column at a time. I probably have to do the same change for
> LLT::rankUpdate() ;)
I'm happy to implement this too, but before I do, I should ask: having it loop over columns would seem to introduce a very slight penalty when the user knows there is only one column. (If the updating matrix has ColsAtCompileTime = 1, this can be optimized away, but with Dynamic the loop would still have to run.) Are you concerned at all about this? Presumably it is a very small penalty compared to the actual matrix operations; nevertheless, for the absolute most in efficiency, it would seem best to make the user run the loop when they want multi-rank updates. Just let me know what you prefer.

hi, I've just released alpha1, but there is still time to include this for the next beta. I'm not worried at all about the performance of looping over the columns. If the user passes a vector, the loop will be optimized away, and a loop is mainly an "if", so compared to the rest of the computations it is nothing!
cholesky_6 should not be failing; it works here on many different systems.