In quasi-Newton methods, the Hessian matrix of second derivatives is not evaluated directly. Instead, it is approximated using updates specified by gradient evaluations (or approximate gradient evaluations). Quasi-Newton methods generalize the secant method, which finds the root of the first derivative, to multidimensional problems. In more than one dimension, the secant equation does not determine a unique solution, and quasi-Newton methods differ in how they constrain the solution. The BFGS method is one of the most popular members of this class.[2] Also in common use is L-BFGS, a limited-memory version of BFGS that is particularly suited to problems with very large numbers of variables (e.g., >1000). The L-BFGS-B[3] variant handles simple box constraints.
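For orientation, both variants are available through common libraries; a minimal sketch using SciPy's `minimize` (the test function, starting point, and bounds are illustrative choices, not part of the method itself):

```python
import numpy as np
from scipy.optimize import minimize

def rosen(x):
    """Rosenbrock function, a standard test problem for quasi-Newton methods."""
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

# L-BFGS-B combines the limited-memory update (suited to many variables)
# with simple box constraints, here 0 <= x_i <= 2 on each coordinate.
res = minimize(rosen, x0=np.array([0.5, 1.5]),
               method="L-BFGS-B", bounds=[(0, 2), (0, 2)])
print(res.x)  # close to [1, 1], the minimizer of the Rosenbrock function
```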

The search direction $\mathbf{p}_k$ at stage $k$ is given by the solution of the analogue of the Newton equation,

$$B_k \mathbf{p}_k = -\nabla f(\mathbf{x}_k),$$

where $B_k$ is an approximation to the Hessian matrix, updated iteratively at each stage, and $\nabla f(\mathbf{x}_k)$ is the gradient of the function evaluated at $\mathbf{x}_k$. A line search in the direction $\mathbf{p}_k$ is then used to find the next point $\mathbf{x}_{k+1}$. Instead of requiring the full Hessian matrix at the point $\mathbf{x}_{k+1}$ to be computed as $B_{k+1}$, the approximate Hessian at stage $k$ is updated by the addition of two matrices:

$$B_{k+1} = B_k + U_k + V_k.$$

Both $U_k$ and $V_k$ are symmetric rank-one matrices, but their sum is a rank-two update matrix.
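Concretely, the two terms are chosen so that $B_{k+1}$ satisfies the secant equation $B_{k+1} \mathbf{s}_k = \mathbf{y}_k$ mentioned above; with $\mathbf{s}_k$ and $\mathbf{y}_k$ as defined in the algorithm below, the standard BFGS choice is

$$U_k = \frac{\mathbf{y}_k \mathbf{y}_k^{\mathrm{T}}}{\mathbf{y}_k^{\mathrm{T}} \mathbf{s}_k}, \qquad V_k = -\frac{B_k \mathbf{s}_k \mathbf{s}_k^{\mathrm{T}} B_k}{\mathbf{s}_k^{\mathrm{T}} B_k \mathbf{s}_k}.$$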

From an initial guess $\mathbf{x}_0$ and an approximate Hessian matrix $B_0$, the following steps are repeated as $\mathbf{x}_k$ converges to the solution.

1. Obtain a direction $\mathbf{p}_k$ by solving $B_k \mathbf{p}_k = -\nabla f(\mathbf{x}_k)$.
2. Perform a line search to find an acceptable stepsize $\alpha_k$ in the direction found in the first step, then update $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k$.
3. Set $\mathbf{s}_k = \alpha_k \mathbf{p}_k$.
4. Compute $\mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)$.
5. Update $B_{k+1} = B_k + \dfrac{\mathbf{y}_k \mathbf{y}_k^{\mathrm{T}}}{\mathbf{y}_k^{\mathrm{T}} \mathbf{s}_k} - \dfrac{B_k \mathbf{s}_k \mathbf{s}_k^{\mathrm{T}} B_k}{\mathbf{s}_k^{\mathrm{T}} B_k \mathbf{s}_k}$.

$f(\mathbf{x})$ denotes the objective function to be minimized. Convergence can be checked by observing the norm of the gradient, $\|\nabla f(\mathbf{x}_k)\|$. Practically, $B_0$ can be initialized with $B_0 = I$, so that the first step is equivalent to a gradient descent step, but further steps are progressively refined by $B_k$, the approximation to the Hessian.
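A minimal sketch of these steps in Python, using NumPy only (the Armijo backtracking line search and the curvature safeguard are illustrative simplifications; practical implementations enforce the Wolfe conditions so that $\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k > 0$):

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    """Minimal BFGS sketch following the five steps above."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                 # B_0 = I: first step is gradient descent
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:    # convergence test on the gradient norm
            break
        p = np.linalg.solve(B, -g)     # step 1: solve B_k p_k = -grad f(x_k)
        alpha, fx = 1.0, f(x)          # step 2: Armijo backtracking line search
        while f(x + alpha * p) > fx + 1e-4 * alpha * (g @ p) and alpha > 1e-12:
            alpha *= 0.5
        s = alpha * p                  # step 3: s_k = alpha_k p_k
        x = x + s
        g_new = grad(x)
        y = g_new - g                  # step 4: y_k = grad f(x_{k+1}) - grad f(x_k)
        g = g_new
        sy = s @ y
        if sy > 1e-10:                 # skip update if curvature condition fails,
            Bs = B @ s                 # keeping B_k positive definite
            B += np.outer(y, y) / sy - np.outer(Bs, Bs) / (s @ Bs)
    return x
```

Solving $B_k \mathbf{p}_k = -\nabla f(\mathbf{x}_k)$ afresh each iteration costs $O(n^3)$; maintaining the inverse approximation described next avoids the linear solve.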

The first step of the algorithm is carried out using the inverse of the matrix $B_k$, which can be obtained efficiently by applying the Sherman–Morrison formula to the fifth step of the algorithm, giving

$$B_{k+1}^{-1} = B_k^{-1} + \frac{(\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k + \mathbf{y}_k^{\mathrm{T}} B_k^{-1} \mathbf{y}_k)(\mathbf{s}_k \mathbf{s}_k^{\mathrm{T}})}{(\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k)^2} - \frac{B_k^{-1} \mathbf{y}_k \mathbf{s}_k^{\mathrm{T}} + \mathbf{s}_k \mathbf{y}_k^{\mathrm{T}} B_k^{-1}}{\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k}.$$

This can be computed efficiently without temporary matrices by recognizing that $B_k^{-1}$ is symmetric and that $\mathbf{y}_k^{\mathrm{T}} B_k^{-1} \mathbf{y}_k$ and $\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k$ are scalars, using an expansion such as

$$B_{k+1}^{-1} = B_k^{-1} + \frac{\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k + \mathbf{y}_k^{\mathrm{T}} B_k^{-1} \mathbf{y}_k}{(\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k)^2} (\mathbf{s}_k \mathbf{s}_k^{\mathrm{T}}) - \frac{1}{\mathbf{s}_k^{\mathrm{T}} \mathbf{y}_k} \left( B_k^{-1} \mathbf{y}_k \mathbf{s}_k^{\mathrm{T}} + \mathbf{s}_k \mathbf{y}_k^{\mathrm{T}} B_k^{-1} \right).$$
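A sketch of this expansion in the same NumPy style as above, assuming the inverse approximation $H_k = B_k^{-1}$ is stored directly (only one matrix-vector product, two scalars, and vector outer products are needed):

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Rank-two update of H, the current approximation to the inverse Hessian.

    Follows the expansion above: H is symmetric, and s^T y and y^T H y are
    scalars, so no temporary matrix-matrix products are required.
    """
    Hy = H @ y        # B_k^{-1} y_k, the only matrix-vector product
    sy = s @ y        # scalar s_k^T y_k (assumed positive: curvature condition)
    yHy = y @ Hy      # scalar y_k^T B_k^{-1} y_k
    H += ((sy + yHy) / sy ** 2) * np.outer(s, s)
    H -= (np.outer(Hy, s) + np.outer(s, Hy)) / sy
    return H
```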

In statistical estimation problems (such as maximum likelihood or Bayesian inference), credible intervals or confidence intervals for the solution can be estimated from the inverse of the final Hessian matrix. However, these quantities are technically defined by the true Hessian, and the BFGS approximation may not converge to it.
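As an illustration, SciPy's BFGS implementation returns its final inverse-Hessian approximation as `res.hess_inv`, which is often used, with exactly this caveat, as an approximate covariance matrix. A sketch with a synthetic sample (the normal model and its parameterization are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)   # synthetic sample

def nll(theta):
    """Negative log-likelihood of a normal model, parameters (mu, log sigma)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return np.sum(log_sigma + 0.5 * ((data - mu) / sigma) ** 2)

res = minimize(nll, x0=np.array([0.0, 0.0]), method="BFGS")
# res.hess_inv is the final BFGS inverse-Hessian approximation; treating it
# as a covariance matrix gives approximate standard errors, subject to the
# caveat that it need not match the inverse of the true Hessian.
std_errs = np.sqrt(np.diag(res.hess_inv))
print(res.x, std_errs)
```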

A high-precision arithmetic version of BFGS (pBFGS), implemented in C++ and integrated with the high-precision arithmetic package ARPREC, is robust against numerical instability (e.g., round-off errors).