Gains from providing sparsity patterns of derivatives?

I am solving an optimization problem using fmincon with the interior-point method. The problem has about 5000 variables and 5000 nonlinear constraints. By a very large margin, most of the computational time is spent in the routine 'factorKKTmatrix', which I guess does what its name suggests. As the trust-region version of the interior-point method does not appear to be more efficient, one idea for speeding things up is to provide sparsity patterns for the Jacobian of the constraints and the Hessian of the Lagrangian (both are very sparse). To my knowledge, however, this is not possible with fmincon. My question is thus whether I can expect significant gains in CPU time from providing said sparsity patterns, which would make it worthwhile to try other solvers (e.g., Knitro) as well.
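For concreteness, here is a minimal sketch of my setup, where objFun, conFun, and hessFun are placeholders for my actual routines and x0, lb, ub are my data:

options = optimset('Algorithm', 'interior-point', ...
    'GradObj', 'on', ...            % objFun also returns the gradient
    'GradConstr', 'on', ...         % conFun also returns the (sparse) Jacobians
    'Hessian', 'user-supplied', ...
    'HessFcn', @hessFun);           % analytical Hessian of the Lagrangian
x = fmincon(@objFun, x0, [], [], [], [], lb, ub, @conFun, options);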


1 Answer

There would definitely be significant gains from taking advantage of Hessian sparsity, but I don't know why you think that means abandoning FMINCON. This link discusses a number of options in FMINCON for customizing Hessian calculations.

The trust-region method offers HessPattern (to specify the sparsity pattern, as you mentioned) and HessMult. With HessMult, you can customize multiplication with the Hessian in any way you want, including using a sparse matrix to represent the Hessian.

The interior-point method offers both HessMult and HessFcn, which allow similar customizations.
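For the interior-point method, HessFcn is called with the current point and the Lagrange multiplier structure and must return the Hessian of the Lagrangian. A minimal sketch, assuming only nonlinear inequality constraints, where objHess and conHess are hypothetical user routines:

function H = hessFun(x, lambda)
% Sketch of a user-supplied Hessian of the Lagrangian (objHess and
% conHess are placeholders for analytical routines).
H = objHess(x);                            % sparse Hessian of the objective
cH = conHess(x);                           % cell array of constraint Hessians
for k = 1:numel(cH)
    H = H + lambda.ineqnonlin(k) * cH{k};  % weight by the multipliers
end
% (equality constraints would add corresponding lambda.eqnonlin terms)
end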

Thanks for your answer! Unfortunately, I believe you may have misunderstood my question, possibly because I formulated it somewhat sloppily. Just to be clear: I have implemented routines for computing the Jacobian and Hessian based on analytical expressions. Both routines return sparse matrices. What I would like to do, possibly, is to exploit the fact that I know the sparsity patterns of these matrices. Unlike the matrices themselves, the pattern can be treated as constant throughout the optimization. A drawback, of course, is the possible occurrence of explicit zeros in the KKT matrix.

Also, I should point out that I have tried the trust-region version of the interior-point method (selected by setting options.SubproblemAlgorithm to 'cg'), not the trust-region-reflective method, which does not apply to my problem.
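Concretely, that setting is:

options = optimset(options, 'SubproblemAlgorithm', 'cg');   % CG/trust-region step instead of the default direct (LDL) factorization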

I see. Well, I doubt there's much to be gained if you're already using sparse matrix math. It doesn't look like sparsity patterns are used in the Optimization Toolbox (and hence I'd imagine not elsewhere either) for anything other than to simplify finite difference calculations, and since you're already supplying your own Hessian, that's moot.

You could possibly save a bit of overhead by pre-storing your sparsity pattern as a vector of linear indices, along with a template of your sparse Hessian matrix. Then in your HessMult, or whatever, you can update the Hessian using the syntax

Hessian(indices) = NewValues;

where "Hessian" and "indices" are fixed parameters or persistent variables.

I expect this would be better than building the matrix from scratch, since no sorting is required. Compared to other steps in the optimization, though, I'm guessing this won't have a huge impact ... but you could try.
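A sketch of what I have in mind, inside whatever function fmincon calls for the Hessian; hessPattern and hessValues are placeholders for your own routines:

function H = updateHessian(x, lambda)
% Sketch: build the sparse template and its linear indices once, then
% overwrite only the stored values on each call (no sorting needed).
persistent H0 idx
if isempty(H0)
    [i, j] = find(hessPattern());    % fixed sparsity pattern, known up front
    n = numel(x);
    H0 = sparse(i, j, 1, n, n);      % template with the right structure
    idx = sub2ind([n n], i, j);      % linear indices of the stored entries
end
H0(idx) = hessValues(x, lambda);     % new values, in the same column-major order
% note: assigning exact zeros removes them from sparse storage
H = H0;
end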

I believe that at least some other solvers (Knitro, SNOPT, and Ipopt, I think) can take advantage of a fixed sparsity pattern. I guess I will have to try those solvers, then.

Thanks for the idea regarding the Hessian. I am already doing something along the lines of what you suggest. In fact, the cost of assembling the Hessian matrix is negligible compared to that of solving the linear systems in the optimization.

"I believe that at least some other solvers (Knitro, SNOPT, and Ipopt, I think) can take advantage of a fixed sparsity pattern."

Well, it would be interesting for us here to know more about that once you've tried it. You seem to confirm my suspicion that the linear system sub-problem is the bottleneck. I have only a fuzzy idea of how the fixed sparsity pattern could be used to mitigate that.

I have just done some testing, comparing fmincon with Knitro on the above-mentioned problem. In both cases I use an interior-point method with direct step computation and line search (the defaults). Both solvers use roughly the same number of function and derivative evaluations (the total cost of which is negligible here anyway; on the order of 10-20 seconds for runtimes of 5-20 minutes) to reach the globally optimal solution (which I happen to know) with similar accuracy. However, depending on the initial point, Knitro is always between two and four times faster than fmincon. As I also believe they both use the same linear solver, MA57, I am led to believe that exploiting a fixed sparsity pattern can indeed yield significant gains. I am not sure why, though. Perhaps because memory for the matrix factor only needs to be allocated once?

Regarding your question: it seems that if one does not supply sparsity patterns, Knitro treats the Jacobian and Hessian as dense matrices (even though my functions return them as sparse), making things very slow for anything but very small problems. Unless there is some setting in Knitro that I'm missing, the comparison you suggest can't really be done.

Knitro is compiled, while fmincon is written primarily in MATLAB. Better memory management (as you mentioned) and faster calls to external libraries such as MA57 tend to lead to faster execution, generally speaking.

The sparsity pattern of the Hessian or Jacobian is not used in the interior-point algorithm of fmincon. For Knitro, it is only used for memory allocation purposes.

Some algorithms, like the trust-region-reflective algorithm in fmincon, use a sparsity pattern to perform sparse finite differences for Jacobian/Hessian approximations. However, the trust-region-reflective algorithm doesn't solve problems with nonlinear constraints.
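For example, with the trust-region-reflective algorithm (which requires a user-supplied gradient), the pattern is passed as an option; Hpattern here is a placeholder sparse 0-1 matrix with ones wherever the Hessian may be nonzero:

options = optimset('Algorithm', 'trust-region-reflective', ...
    'GradObj', 'on', ...          % this algorithm requires the gradient
    'HessPattern', Hpattern);     % used for sparse finite differencing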