A modified conjugate gradient algorithm for geometry optimization is outlined for use with ab initio MO methods. Since the computation time for analytical energy gradients is approximately the same as for the energy, the optimization algorithm evaluates and utilizes the gradients each time the energy is computed. The second derivative matrix, rather than its inverse, is updated employing the gradients. At each step, a one-dimensional minimization using a quartic polynomial is carried out, followed by an n-dimensional search using the second derivative matrix. By suitably controlling the number of negative eigenvalues of the second derivative matrix, the algorithm can also be used to locate transition structures. Representative timing data for optimizations of equilibrium geometries and transition structures are reported for ab initio SCF-MO calculations.

INTRODUCTION
During the past decade, the analytical calculation of the first derivatives of the energy has become practical and efficient for ab initio molecular orbital methods.1 For SCF calculations, analytical gradients can provide an order of magnitude more information about a molecule than the energy computation alone, while only doubling the computational time. Analytical derivative calculations recently have been extended to methods that include correlation energy.2 Energy gradients were first used to calculate vibrational force constants.3 However, geometry optimization has now become the most widespread application. In this article we report a modified conjugate gradient algorithm for geometry optimization that is incorporated in the ab initio molecular orbital program GAUSSIAN 80.4 The algorithm takes into account the special economics of ab initio calculations, and is capable of converging on saddle points as well as minima. From standard numerical analysis methods,5 a variety of algorithms for functional minimization are available that make effective use of gradients, e.g., conjugate gradients, variable metric, super memory gradient, and so forth. These techniques are all closely related and a number of applications to quantum mechanical optimization problems already exist.6 However, the established methods

* Alfred P. Sloan Fellow, 1981-1983.

can be improved by adapting them specifically to the unique features of ab initio calculations. Energy calculations by ab initio methods require large quantities of computer time and central memory. Therefore, the optimization method can be quite elaborate and yet not significantly increase the overall demand on computer resources. Furthermore, the time to evaluate the n-dimensional gradient is frequently less than the energy computation time. Many standard algorithms assume the gradients require n times the effort of the function evaluation, based on computing the derivatives numerically, and hence limit their use of the gradients. For ab initio optimizations, it is more efficient to calculate and use the gradient each time the energy is computed. Most conjugate gradient algorithms work with the inverse of the hessian or second derivative matrix, and use a relatively simple updating scheme to improve the initial matrix. Usually this procedure must be restarted after n steps by reinitializing the hessian matrix. In the current application, one cannot afford to discard so much information. Additionally, for geometry optimization, a great deal of information about the second derivative matrix is available from empirical rules and chemical principles. Thus, it is preferable to devise an algorithm based on the hessian, rather than its inverse, so that restarts can be avoided and so that chemical information can be incorporated readily.

ALGORITHM
Consider a function E depending on n variables x_i, and the n-dimensional gradient of this function, g_i = ∂E/∂x_i. We wish to construct a series of steps leading to a stationary point of E, i.e., a point where g = 0. Assume that p such steps have already been taken. At step p + 1 we have the value of the function, E^a, and its gradient, g^a, at m + 1 points (0 ≤ a ≤ m ≤ p). Let this information be arranged such that x^0 is the current point, x^1 the most recent previous point, and so on, with x^m the oldest. From the previous step we also have an n × n matrix F that is an approximation to the true second derivative matrix, i.e., F_ij ≈ ∂²E/∂x_i∂x_j. A single optimization step can be divided into three parts: (a) obtaining corrections to the approximate second derivative matrix, (b) searching for a minimum along the line between the current point and the previous point, and (c) estimating the location of the stationary point in the entire n-dimensional space by using the gradient and the approximate second derivative matrix. If no previous points are available, for example at step 1, parts (a) and (b) are omitted. At the beginning of the optimization process a suitable estimate of the second derivative matrix is required (vide infra).

Part a. Construct an orthonormal basis set of vectors r^a for the space spanned by the vectors (x^a - x^0) by Schmidt orthogonalization:

    r^a = N_a [ (x^a - x^0) - Σ_{b<a} ((x^a - x^0)·r^b) r^b ]    (1)

where N_a is a normalization constant. Under certain circumstances, it may be necessary to eliminate some of the previous points. If a point is too far from the current point (|x^a - x^0| > r_max) or if points are nearly collinear (|r̄^a| < r_thresh), then the point is discarded. Thus at step p + 1, a total of m + 1 energy and gradient evaluations remain for use in correcting the approximate force constant matrix. In the space spanned by the vectors r^a, obtain the new and the old estimates of the second derivatives, k^new_ab and k^old_ab, respectively:

    k^new_ab = [ (g^a - g^0)·r^b - Σ_{c<a} ((x^a - x^0)·r^c) k^new_cb ] / ((x^a - x^0)·r^a),
    k^old_ab = r^a · F^old · r^b    (2)

These values are then used to update the current estimate of F:

    F^new = F^old + Σ_{a≤b} (k^new_ab - k^old_ab) [ r^a (r^b)† + (1 - δ_ab) r^b (r^a)† ]    (3)
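The update of part a, eqs. (1)-(3), can be sketched in a few lines of linear algebra. The following Python sketch is a reconstruction under stated assumptions: the triangular-solve form of eq. (2), the function name, and the threshold values are ours, not taken from GAUSSIAN 80.

```python
import numpy as np

def update_hessian(F, x0, g0, xs, gs, r_max=1e3, r_thresh=1e-8):
    """Update an approximate Hessian F from previous points (eqs. 1-3).

    x0, g0 -- current point and gradient; xs, gs -- previous points and
    gradients, most recent first.  A reconstruction of the published
    scheme, not the program's actual code.
    """
    rs, dxs, dgs = [], [], []
    for xa, ga in zip(xs, gs):
        dx = xa - x0
        if np.linalg.norm(dx) > r_max:          # point too far away: discard
            continue
        v = dx - sum((dx @ r) * r for r in rs)  # eq. (1): Schmidt step
        if np.linalg.norm(v) < r_thresh:        # nearly collinear: discard
            continue
        rs.append(v / np.linalg.norm(v))
        dxs.append(dx)
        dgs.append(ga - g0)
    if not rs:
        return F.copy()
    m = len(rs)
    R = np.column_stack(rs)                     # n x m orthonormal basis
    S = np.array([[dx @ r for r in rs] for dx in dxs])  # lower triangular
    G = np.array([[dg @ r for r in rs] for dg in dgs])
    k_new = np.linalg.solve(S, G)               # eq. (2): new estimates
    k_old = R.T @ F @ R                         # eq. (2): old estimates
    D = k_new - k_old
    F_new = F.copy()
    for a in range(m):                          # eq. (3): symmetric update
        for b in range(a, m):
            outer = np.outer(rs[a], rs[b])
            F_new += D[a, b] * (outer + (outer.T if a != b else 0.0))
    return F_new
```

For a quadratic surface this update reproduces the exact hessian within the subspace spanned by the displacement vectors, which is the property the text emphasizes.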

In eq. (2) the values for k^new_ab and k^new_ba will, in general, be different if E is not a quadratic function. However, we wish F to be a symmetric matrix. The magnitude of the nonquadratic contributions to (g^a - g^0) can be assumed to increase with the distance |x^a - x^0|. If we make the further assumption that each step predicts a point closer to the minimum, or reorder the x^a so that this is true, then the (g^a - g^0) are ordered approximately in terms of increasing nonquadratic contributions. Hence, for b > a, k^new_ab can be expected to have a smaller nonquadratic component than k^new_ba. If the magnitude of the gradient vector is very large, it may be better to avoid updating the second derivative matrix until the gradients fall below a given cutoff. Thus the later steps in the optimization will not be spent correcting the bad approximation to the second derivatives obtained in the first few steps. The net effect of this approach is to let the first few steps be chosen in a steepest descent sense, until the gradients are reduced to a reasonable value.

Part b. Search for a minimum in E in the direction (x^1 - x^0) by fitting a polynomial to E^0, g^0, E^1, and g^1. In the current application, a quartic function is fitted to the two energies and directional gradients with the added constraint that d²E/dx² ≥ 0, with the equality holding at only one point. Such a curve has the advantage of possessing a single local minimum that is also the global minimum. This is found to be more satisfactory than a cubic polynomial. Let the coordinates of the minimum of the one-dimensional search be x̄. The derivatives ḡ at x̄ can then be obtained from g^0 and g^1 by interpolation.

Part c. Using the new estimate of the second derivative matrix F from part a, and the result from the one-dimensional search of part b, the next estimate of the location of the stationary point is given by
    x^new_i = x̄_i - Σ_j (F⁻¹)_ij ḡ_j    (4)

The inverse of the second derivative matrix is calculated by first diagonalizing the matrix. The

Figure 1. Sample input for an optimization run. The geometry is specified via the Z matrix in terms of internal coordinates. For example, the sixth card of the Z matrix defines atom 6 as a hydrogen attached to atom 1, at a distance of BH2, making an angle with atom 2 of OBH2, and a dihedral angle 6-1-2-3 of -F90. Variables that are to be optimized are defined in a list following the Z matrix. Constants, if any, are specified in an additional list.

not met, the process is continued unless the maximum number of steps is exceeded.

After the implementation of the above approach, an optimization method by Brodlie7 came to our attention. This algorithm also maintains the hessian instead of its inverse but uses a somewhat different updating scheme. For a quadratic function, Brodlie's approach and most earlier algorithms require accurate one-dimensional searches in order to converge to the exact hessian. The present updating formula yields the exact hessian within the space spanned by the displacement vectors at any stage during the optimization, regardless of the manner in which the previous points are generated (albeit, at the cost of storing all the previous data). A further similarity is that both algorithms use a polynomial model for the one-dimensional search. Brodlie has adopted a cubic polynomial, whereas we find a constrained quartic to be more suitable for minimizations. The fact that both methods perform well reinforces the merit and utility of the features common to the two algorithms.

IMPLEMENTATION

The algorithm outlined above is currently used in GAUSSIAN 80 to optimize molecular structures. This program, like its predecessors,8 employs the Z-matrix notation to input the geometry of a molecule using internal coordinates. The position of an atom is expressed in terms of bond lengths, valence angles, and dihedral angles relative to atoms already defined. The original Z-matrix definition8 has been extended to provide a simple, yet flexible method for specifying geometry optimizations, as illustrated in Figure 1. The internal coordinates in the Z matrix that are to be optimized are treated as algebraic variables and are given names which can be prefixed with a plus or minus sign. The initial value of each variable is specified in a parameter list appended to the Z matrix.
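To make the Z-matrix convention concrete, the sketch below converts such internal-coordinate entries to Cartesian positions. The placement conventions (second atom along x, third atom in the xy-plane, dihedral sign) and the tuple format are illustrative assumptions, not GAUSSIAN 80's actual input format.

```python
import numpy as np

def zmat_to_cartesian(zmat):
    """Convert Z-matrix-style entries to Cartesian coordinates.

    Entry i may reference only atoms numbered below i:
      ()                        -- first atom, at the origin
      (a, r)                    -- bond length r to atom a
      (a, r, b, theta)          -- plus angle i-a-b (degrees)
      (a, r, b, theta, c, phi)  -- plus dihedral i-a-b-c (degrees)
    Conventions here are illustrative, not GAUSSIAN's.
    """
    coords = []
    for entry in zmat:
        if len(entry) == 0:
            coords.append(np.zeros(3))
            continue
        a, r = entry[0], entry[1]
        if len(entry) == 2:
            coords.append(coords[a] + np.array([r, 0.0, 0.0]))
            continue
        b, theta = entry[2], entry[3]
        A, B = coords[a], coords[b]
        e = B - A
        e /= np.linalg.norm(e)                  # unit vector from a toward b
        if len(entry) == 4:
            ref = np.array([0.0, 1.0, 0.0])     # put the third atom in xy
        else:
            ref = coords[entry[4]] - B          # reference toward atom c
        perp = ref - (ref @ e) * e              # in-plane perpendicular
        perp /= np.linalg.norm(perp)
        if len(entry) == 6:
            ph = np.radians(entry[5])           # rotate by the dihedral
            perp = np.cos(ph) * perp + np.sin(ph) * np.cross(e, perp)
        th = np.radians(theta)
        coords.append(A + r * (np.cos(th) * e + np.sin(th) * perp))
    return np.array(coords)
```

A planar zigzag chain, for instance, can be specified entirely with bond lengths of 1, angles of 90 degrees, and dihedrals of 0.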
The optimization of all coordinates or only a few parameters, the imposition of constraints on individual variables, and the utilization of symmetry are all readily achieved with this notation. In addition to starting values of geometrical parameters, the optimization requires an initial estimate of the second derivative matrix. Various methods of increasing accuracy and expense are available within the framework of the GAUSSIAN 80 geometry optimization package. Bond stretching force constants can be estimated fairly

eigenvalues can then be tested to ensure that the matrix is positive definite and thus that the optimization step proceeds toward a minimum. If a negative eigenvalue is found, its sign is reversed. The net effect is to force a steepest descent step along the direction of the eigenvector possessing the negative eigenvalue. Additional bounds on the maximum and minimum magnitudes of the eigenvalues may be useful to improve the behavior of the optimization. The inverse matrix can then be constructed from the eigenvectors and eigenvalues. The total predicted change in any individual x_i resulting from both the linear and the quadratic search is required to be within certain limits. If the change in x is too large, then the entire predicted change is scaled so that the limits are not exceeded. The next step is carried out in a similar manner, retaining information from at most n previous steps and discarding the oldest information if necessary. At each step the gradient and displacement vectors are tested to see if the optimization has converged. The root-mean-square gradient and the absolute value of the largest component of the gradient must fall below their respective thresholds. The estimated displacement must pass a similar test on the root mean square and the absolute value of the largest component. If these four conditions are satisfied, the optimization is considered complete. If the conditions are
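The eigenvalue handling just described can be summarized compactly. In this Python sketch (a reconstruction; the bound values and the function name are illustrative, not taken from the program), negative eigenvalues are sign-reversed, magnitudes are bounded, the inverse is applied spectrally, and an over-large step is scaled back:

```python
import numpy as np

def quasi_newton_step(F, g, eig_min=1e-4, eig_max=25.0, max_disp=0.3):
    """Part c: n-dimensional step dx = -F^{-1} g with eigenvalue control.

    Signs of negative eigenvalues of F are reversed (forcing a steepest
    descent step along those eigenvectors), eigenvalue magnitudes are
    kept within [eig_min, eig_max], and the predicted displacement is
    scaled so its largest component stays within max_disp.  The numeric
    bounds here are illustrative, not values from the paper.
    """
    w, V = np.linalg.eigh(F)
    w = np.clip(np.abs(w), eig_min, eig_max)  # flip negatives, bound magnitudes
    dx = -V @ ((V.T @ g) / w)                 # -F^{-1} g via the spectral form
    big = np.max(np.abs(dx))
    if big > max_disp:                        # scale an over-large step
        dx *= max_disp / big
    return dx
```

Because any negative curvature is inverted, the returned step is always a descent direction for the current gradient.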

well from empirical rules,9 and average angle-bending and torsion constants can be obtained from vibrational spectra or theoretical calculations. Such an empirical guess at a diagonal force constant matrix is the default in GAUSSIAN 80. A more elaborate empirical force field similar to classical strain minimization calculations is a better starting point for optimizations of strongly coupled coordinates, such as those occurring in cyclic molecules.10 More accurate still is the direct calculation of a few second derivatives by finite differences. For each of the chosen coordinates, the molecule is displaced from its starting geometry and an additional calculation is performed. The diagonal second derivative can be obtained from an energy calculation at the displaced geometry combined with the energy and gradient at the starting point. If the gradient as well as the energy is calculated at the displaced geometry, then all the off-diagonal second derivatives associated with that coordinate can also be obtained. The best initial guess is a full force constant matrix, either read in from a previous calculation or calculated directly by numerical or analytical means.2a Often it is useful to calculate the force constants with a smaller and cheaper basis set. Typical examples of geometry optimizations* are collected in Table I. With reasonable guesses at the starting geometry, the number of optimization steps is usually fewer than the number of variables. Strongly coupled coordinates and very flexible modes of motion increase the number of steps.

TRANSITION STRUCTURES

a Convergence thresholds were set so that the errors in the final geometries are less than ±0.001 Å for bond lengths and ±0.1° for angles. Unless otherwise indicated, calculations were performed on the Amdahl 470V/6 at Wayne State University. CDC Cyber 172 at Merck Sharp & Dohme using GAUSSIAN 78. Starting with the geometry shown in Fig. 1. Starting with the 3-21G optimized structure.

A transition structure for a chemical reaction is a stationary point on an energy surface that is a local maximum along one and only one direction while being a local minimum in all other orthogonal directions. This unique direction is termed the transition vector and is the eigenvector associated with the negative eigenvalue of the force constant matrix. In general, the orientation of the transition vector is not known a priori, and hence must be determined in the course of a transition structure optimization. A recent review examines a variety of transition structure optimization methods12; most are unsatisfactory because of slow convergence, lack of generality, or optimization to false stationary points.

* With this optimization method and an automatic archiving mechanism, it has been possible to optimize and tabulate more than 3000 molecules within the past two years. See ref. 11.

The minimization algorithm outlined above can be modified to converge on transition structures as well, by forcing the second derivative matrix to contain a single negative eigenvalue. If certain initial conditions are met, the optimization converges to the transition structure in the same manner as it proceeds to a minimum. The starting geometry must be chosen within the quadratic region of the transition structure, i.e., that region where the force constant matrix contains one negative eigenvalue. Additionally, the initial guess at the second derivative matrix must contain an appropriate negative eigenvalue. This is usually accomplished by calculating explicitly a few of the diagonal and off-diagonal force constants for the coordinates expected to dominate the transition vector, by using the finite-difference technique built into the minimization program. The updating scheme used to improve the second derivative matrix (part a above) automatically improves the direction of the transition vector, since it is an eigenvector of the matrix. At each step in the optimization, the eigenvalues of the force constant


mization space is defined by the user and a maximum is found in one subspace, a minimum in the other. The main drawback is that the division of the optimization space is fixed, whereas in the present algorithm, it is improved in the course of the optimization. Cerjan and Miller [J. Chem. Phys., 75, 2800 (1981)] have devised a hill-climbing procedure for finding transition structures that requires analytical second derivatives at each step. Compared to the present work, a similar number of steps may be required, but each second derivative calculation requires roughly n times the computational effort of a gradient evaluation.

The author would like to thank Professor J. A. Pople and Professor L. C. Allen for constant encouragement during the development of these methods. He would also like to thank Professor Pople for an improvement to the algorithm. Financial support from the National Research Council of Canada and the Petroleum Research Foundation is gratefully acknowledged.

a See footnote a of Table I. Preliminary steps were used to calculate certain elements of the force constant matrix. The total number of energy plus gradient computations is the sum of the preliminary and the optimization steps.

References

matrix are tested to ensure that a single negative eigenvalue is retained (see part c above). The one-dimensional search (part b) must be omitted, since it cannot be determined with any degree of certainty whether a minimum or a maximum is desired in that one direction. The second derivative matrix can be computed directly, if it is necessary to verify that the stationary point found is indeed a proper saddle point. The present method does have a drawback in that the starting geometry must be in the quadratic region of the saddle point. For many reactions this is not a problem, since this region is relatively large and a reasonable starting structure can be estimated easily. However, there are some reactions, such as those forbidden or nearly forbidden by orbital symmetry arguments, that may require a preliminary, manual survey of the energy surface to locate the quadratic region of the saddle point. A technique devised by Muller may be useful for this.13 Examples of transition structure optimizations are presented in Table II. The convergence to the saddle point is similar to convergence for minimization. The somewhat slower rate might be expected since transition states are generally more flexible than equilibrium structures, and since starting geometries and initial force constants cannot be estimated as well. Performance is enhanced further if the coordinates can be chosen so that the transition vector is dominated by only one or two coordinates.
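The saddle-point variant of the step can be sketched in the same spirit (again a reconstruction; names and thresholds are illustrative): the spectrum of F is adjusted so that exactly one eigenvalue, the lowest, whose eigenvector approximates the transition vector, remains negative before the Newton step is taken.

```python
import numpy as np

def transition_step(F, g, eig_min=1e-4):
    """Step for a saddle-point search with eigenvalue control.

    The eigenvalues of F are adjusted so that exactly one (the most
    negative, taken as the transition vector's curvature) stays negative
    while all others are made positive; the step is then the Newton step
    -F^{-1} g.  A sketch of the idea in the text, not the program's code.
    """
    w, V = np.linalg.eigh(F)                 # eigenvalues in ascending order
    w_adj = np.maximum(np.abs(w), eig_min)   # all positive, bounded from zero
    w_adj[0] = -max(abs(w[0]), eig_min)      # keep a single negative eigenvalue
    dx = -V @ ((V.T @ g) / w_adj)
    return dx, w_adj
```

On a quadratic surface whose hessian already has the correct signature, this reduces to the plain Newton step and lands on the saddle point in one move.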
Note added in proof. After the submission of this article, two additional publications appeared on finding transition states. Bell, Gordon, and Fletcher [Chem. Phys. Lett., 82, 122 (1981)] have modified a quasi-Newton (BFGS) algorithm to locate saddle points. A partitioning of the opti-