The text for this page is taken mainly from my PhD thesis, chapter 2, and my paper in Int. J. Quant. Chem. (2006).

General considerations

The energy profile for a molecule can be visualized as in the first figure, where the energy is given as a function of some intrinsic coordinate of the molecule. This intrinsic coordinate is a combination of all relevant coordinates of the molecule, but it is most easily visualized for a diatomic molecule, where it is simply the distance between the two atoms.
The equilibrium geometry, indicated in the first figure by the star, is found where the energy is lowest. In a geometry optimization, one aims to find this equilibrium geometry. One could use energy values only, but as this requires many energy evaluations, it is useful only for methods in which evaluating the energy is fast (as in the case of semi-empirical or molecular mechanics methods). A more efficient approach is to also use the gradient \(g\), a vector containing the first derivatives of the energy \(U\) with respect to the Cartesian coordinates of the atoms \(r_{i}\):
$$g_{i} = \frac{\partial U}{\partial r_{i}}$$
In the second figure, the gradient belonging to the energy profile of the first figure is sketched as a function of the intrinsic coordinate. At the equilibrium geometry it is zero. On both sides of the equilibrium, a force acts on the molecule to push it back towards equilibrium. This can be exploited in geometry optimizations: one has to find the geometry where the gradient is zero.
The simplest algorithm takes the negative gradient as the step in the geometry optimization (steepest descent).[1] This requires many optimization steps, as the convergence is usually slow. It can be improved upon by using the conjugate gradient method. This method constructs its steps by taking a mixture of the current negative gradient and the previous search direction, thereby making each step conjugate to the previous one. This improves the convergence to some extent, but it remains slow.
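The two schemes can be sketched on a toy quadratic energy surface. This is an illustrative sketch only (the matrix `A`, the step size, and the starting point are arbitrary choices, not taken from any real molecule); the conjugate gradient variant uses the Fletcher–Reeves mixing factor with an exact line search, which is valid on a quadratic surface:

```python
import numpy as np

# Model energy surface: a 2D quadratic U(x) = 1/2 x^T A x with its minimum
# at the origin. A and the step size alpha are illustrative choices.
A = np.array([[3.0, 0.0],
              [0.0, 1.0]])

def gradient(x):
    return A @ x

def steepest_descent(x, alpha=0.1, steps=50):
    """Take the (scaled) negative gradient as the optimization step."""
    for _ in range(steps):
        x = x - alpha * gradient(x)
    return x

def conjugate_gradient(x, steps=2):
    """Mix the current negative gradient with the previous search direction,
    making each step conjugate to the last (exact line search on a quadratic)."""
    g = gradient(x)
    d = -g                                  # first step: plain steepest descent
    for _ in range(steps):
        alpha = (g @ g) / (d @ A @ d)       # exact line search on a quadratic
        x = x + alpha * d
        g_new = gradient(x)
        beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves mixing factor
        d = -g_new + beta * d               # conjugate to the previous step
        g = g_new
    return x

x0 = np.array([1.0, 1.0])
```

On this two-dimensional quadratic, conjugate gradients with exact line search converge in two steps, whereas steepest descent with a fixed step size still has a visible residual after fifty.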
A further improvement of optimization techniques can be obtained by including the Hessian \(H\), a matrix containing the second derivatives of the energy \(U\) with respect to the atomic coordinates \(r_{i}\): $$H_{ij} = \frac{\partial^2 U}{\partial r_{i} \partial r_{j}}$$ One could compute this matrix after every geometry optimization step and use it to construct a new step, but as the calculation of the Hessian matrix may be very time-consuming, the use of approximate Hessians may be worthwhile. These approximate Hessians are not recalculated after every geometry optimization cycle, but simply updated based on the gradient change. One of the most popular and powerful update schemes is the Broyden–Fletcher–Goldfarb–Shanno (BFGS) scheme.[Jensen] In this scheme, the Hessian is updated from \(H_{0}\) to \(H_{+}\) by using the step vector \(s\) and the gradient difference \(\Delta g\): $$H_{+} = H_{0} + \frac{\Delta g \otimes \Delta g}{\Delta g \cdot s} - \frac{(H_{0} \cdot s) \otimes (H_{0} \cdot s)}{s \cdot H_{0} \cdot s}$$ In this equation, the tensor product has been used: $$(a \otimes b)_{ij} = a_{i} b_{j}$$ Under certain weak conditions on the step vector \(s\), the BFGS scheme has the property of non-negative definiteness. This property is useful as it ensures that the quadratic model has a minimum, or put otherwise, that the eigenvalues of the Hessian are all larger than or equal to zero.
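The BFGS formula above translates directly into a few lines of linear algebra; the tensor product \(a \otimes b\) is `np.outer`. The example values below are arbitrary and only serve to show two properties of the update: the secant condition \(H_{+} \cdot s = \Delta g\) holds exactly, and (when \(\Delta g \cdot s > 0\)) the updated Hessian keeps non-negative eigenvalues:

```python
import numpy as np

def bfgs_update(H0, s, dg):
    """BFGS update of the approximate Hessian H0, given the step vector s
    and the gradient difference dg."""
    Hs = H0 @ s
    return (H0
            + np.outer(dg, dg) / (dg @ s)    # (dg (x) dg) / (dg . s)
            - np.outer(Hs, Hs) / (s @ Hs))   # (H0 s (x) H0 s) / (s . H0 . s)

# Illustrative values, not from any real calculation:
H0 = np.eye(2)
s = np.array([1.0, 0.0])
dg = np.array([2.0, 0.5])          # note dg . s > 0
Hp = bfgs_update(H0, s, dg)
```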

Optimization steps

Optimization techniques rely on a Taylor expansion of the energy \(E\) about the atomic coordinates \(X\),[Jensen] which is usually truncated at second order: $$ E(X_{k} + \Delta X) = E_{k} + g^{T} \cdot \Delta X + \tfrac{1}{2}\, \Delta X^{T} \cdot H \cdot \Delta X + \ldots $$ Close to a minimum, the energy surface is quadratic, and as a result, the best guess for the step to take is given by the Newton–Raphson step \(\Delta X = - H^{-1} \cdot g\). The success of the step depends critically on the accuracy of the curvature of the energy surface, i.e., the Hessian matrix, which, in terms of the number of geometry cycles, is best recalculated at every step. It is more cost-effective,[Bakken] however, to use an approximate Hessian \(H_{a}\), with the corresponding quasi-Newton step \(\Delta X = - H^{-1}_{a} \cdot g\). This increases the number of geometry cycles, but as the Hessian does not have to be calculated, it decreases the actual time used, in practice saving up to 84% of the computer time.[Bakken]
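On an exactly quadratic surface, the Newton–Raphson step lands on the minimum in a single cycle, which is what makes it the reference point for quasi-Newton methods. A minimal sketch (the Hessian and starting geometry are illustrative values only):

```python
import numpy as np

# Quadratic model surface E(x) = 1/2 x^T H x, with an assumed exact Hessian.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])

def grad(x):
    return H @ x

x = np.array([0.5, -1.0])
# Newton-Raphson step dX = -H^-1 g; solve the linear system rather than
# explicitly inverting the Hessian.
dX = np.linalg.solve(H, -grad(x))
x_new = x + dX
```

Because the surface is exactly quadratic here, `x_new` is the minimum (the origin) after one step; on a real energy surface this holds only approximately, and only close to the minimum.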
Only close to the minimum is the energy surface actually quadratic, and only there can the Taylor expansion up to second order be trusted; this region is called the trust region, with a radius \(\tau\). If the quasi-Newton (QN) or Newton–Raphson (NR) step is smaller than \(\tau\), the QN/NR step is taken; otherwise the restricted second-order (RSO) model[Yeager] (also called the level-shifted trust-region Newton method)[Bakken] is used. In the RSO model,[Yeager] a step is taken on the hypersphere of radius \(\tau\), using a Lagrange multiplier to ensure that the step length equals \(\tau\).
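The level-shifted construction can be sketched as follows. This is a simplified illustration, not the QUILD implementation: it assumes a positive-definite Hessian, so the length of the shifted step \((H + \lambda I)^{-1}(-g)\) decreases monotonically with the multiplier \(\lambda\), and a simple bisection suffices to place the step on the hypersphere of radius \(\tau\):

```python
import numpy as np

def rso_step(H, g, tau):
    """Level-shifted trust-region step: take the plain Newton step if it
    fits within the trust radius tau; otherwise bisect on the Lagrange
    multiplier lam so that |(H + lam*I)^-1 (-g)| = tau."""
    n = len(g)
    step = np.linalg.solve(H, -g)
    if np.linalg.norm(step) <= tau:
        return step                        # QN/NR step inside the trust region
    lo, hi = 0.0, 1.0
    while np.linalg.norm(np.linalg.solve(H + hi * np.eye(n), -g)) > tau:
        hi *= 2.0                          # bracket the multiplier
    for _ in range(100):                   # bisection on lam
        lam = 0.5 * (lo + hi)
        if np.linalg.norm(np.linalg.solve(H + lam * np.eye(n), -g)) > tau:
            lo = lam
        else:
            hi = lam
    return np.linalg.solve(H + hi * np.eye(n), -g)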
Although at every point the QN/NR step is the best option, the geometry optimization can be enhanced further by using GDIIS.[Csaszar] Although the original paper proposed using the step as error vector, later studies showed that it is more effective to use the gradient as error vector.[25] Furthermore, Farkas and Schlegel[Farkas] have proposed a set of four rules that the GDIIS vectors have to fulfill.
We have implemented in QUILD the option to use either the step, the gradient, or the “energy” vector (e.g., \(B_{ij} = g^{T}_{i} \cdot H^{-1}_{k} \cdot g_{j}\))[Eckert] as error vector, either with or without the Farkas–Schlegel rules; we observed the best performance using the gradient as error vector, with a maximum of five GDIIS vectors, and imposing the Farkas–Schlegel rules.
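As an illustration of how a gradient-based DIIS step can be assembled (a sketch, not the QUILD implementation, and without the Farkas–Schlegel screening rules): the mixing coefficients \(c_{i}\) minimize the norm of the interpolated error vector \(\sum_{i} c_{i} g_{i}\) under the constraint \(\sum_{i} c_{i} = 1\), enforced by a Lagrange multiplier, and the new geometry is the relaxed interpolation \(\sum_{i} c_{i} (x_{i} - H^{-1} g_{i})\):

```python
import numpy as np

def gdiis_step(xs, gs, Hinv):
    """GDIIS step using the gradient as error vector. xs and gs hold the
    stored geometries and gradients; Hinv is an (approximate) inverse
    Hessian. Solves the augmented linear system B c = rhs, where
    B[i, j] = g_i . g_j and the last row/column carry the constraint."""
    m = len(xs)
    B = np.empty((m + 1, m + 1))
    for i in range(m):
        for j in range(m):
            B[i, j] = gs[i] @ gs[j]       # overlap of error vectors
    B[m, :m] = 1.0                        # constraint row: sum_i c_i = 1
    B[:m, m] = 1.0                        # constraint column
    B[m, m] = 0.0
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    c = np.linalg.solve(B, rhs)[:m]       # drop the Lagrange multiplier
    # Relaxed interpolation: mix the quasi-Newton-corrected geometries.
    return sum(ci * (xi - Hinv @ gi) for ci, xi, gi in zip(c, xs, gs))
```

With the exact inverse Hessian of a quadratic surface, each corrected point \(x_{i} - H^{-1} g_{i}\) already sits at the minimum, so the mixture returns the minimum regardless of the coefficients; on a real surface with an approximate Hessian, the interpolation is what accelerates convergence.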