Tuesday, December 1, 2015

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

The table of contents for this issue features contributions from CCH editors Steven Bachrach, Christoph Jacob and Jan Jensen:

Here is a neat trick I learned from @gyges01 on Twitter. Say you want to share an interactive molecular structure or animation with someone not well versed in structural visualization tools. Here's what you do.

Sunday, November 1, 2015

The table of contents for this issue features contributions from CCH editors Steven Bachrach, David Bowler and Jan Jensen:

Thursday, October 1, 2015

The table of contents for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Sunday, September 27, 2015

Jimmy ran many PM6-D3H+ geometry optimizations using different starting geometries. Typing "grep EQUILIBRIUM *.log" showed that they had all converged, but how to extract the heat of formation for each optimized geometry? This Python program does the trick. It finds the last heat of formation in each file or, more precisely, the second-to-last "word" in the last line whose first word is HEAT.
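A minimal Python sketch of what such a script could look like (the filenames and the MOPAC-style output format, where the value is the second-to-last word on lines starting with HEAT, are assumptions):

```python
import glob

def last_heat(filename):
    """Return the last heat of formation in a MOPAC-style output file.

    Scans for lines whose first word is HEAT and returns the
    second-to-last word of the last such line as a float.
    """
    heat = None
    with open(filename) as f:
        for line in f:
            words = line.split()
            if words and words[0] == "HEAT":
                heat = float(words[-2])
    return heat

# Print the last heat of formation for every .log file in the directory
for name in sorted(glob.glob("*.log")):
    print(name, last_heat(name))
```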

Update 2015.10.03.
1. Apologies to +Jimmy Charnley Kromann for the poor formulation. Jimmy ran the calculations, I wrote the script. After the post came online he was immediately teased mercilessly by his so-called friends for writing such crappy code. That's on me.

2. As +Anders Steen Christensen pointed out in the comments this can be done in one line using Bash: "for f in *.log; do tac $f | grep -m1 HEAT; done"

3. If you want to do it in Fortran +Lars Bratholm suggested the following code

Tuesday, September 1, 2015

The table of contents for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Sunday, August 9, 2015

RHF is often not accurate enough to predict the change in energy due to a chemical reaction, no matter how big a basis set we use. The reason is the error due to the molecular orbital approximation
$$ \Psi ({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N}) \approx \left| {{\phi _1}(1){{\bar \phi }_1}(2) \ldots {\phi _{N/2}}(N - 1){{\bar \phi }_{N/2}}(N)} \right\rangle \equiv \Phi $$
and the energy difference due to this approximation is known as the correlation energy. Just like we improve the LCAO approximation by including more terms in an expansion, we can improve the orbital approximation by an expansion in terms of Slater determinants
$$\Psi ({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N}) \approx \sum\limits_{i = 1}^L {{C_i}{\Phi _i}({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N})} $$
The “basis set” of Slater determinants $\{\Phi_i \}$ is generated by first computing an RHF wave function $\Phi_0$ as usual, which also generates a lot of virtual orbitals, and then generating other determinants with these orbitals. For example, for an atom or molecule with two electrons the RHF wave function is $\left| {{\phi _1}{{\bar \phi }_1}} \right\rangle $ and we have $K-1$ virtual orbitals (${\phi _2}, \ldots ,{\phi _K}$, where $K$ is the number of basis functions), which can be used to make other Slater determinants like $\Phi _1^2 = \left| {{\phi _1}{{\bar \phi }_2}} \right\rangle $ and $\Phi _{11}^{22} = \left| {{\phi _2}{{\bar \phi }_2}} \right\rangle $ (Figure 1).

Figure 1. Schematic representation of the electronic structure of some of the determinants used in Equation 3

Conceptually (in analogy to spectroscopy), an electron is excited from an occupied to a virtual orbital: $\left| {{\phi _1}{{\bar \phi }_2}} \right\rangle$ represents a single excitation and $\left| {{\phi _2}{{\bar \phi }_2}} \right\rangle $ a double excitation. For systems with more than two electrons higher excitations (like triple and quadruple excitations) are also possible. In general
$$\Psi \approx {C_0}{\Phi _0} + \sum\limits_a {\sum\limits_r {C_a^r\Phi _a^r} } + \sum\limits_a {\sum\limits_b {\sum\limits_r {\sum\limits_s {C_{ab}^{rs}\Phi _{ab}^{rs}} } } } + \ldots $$
The expansion coefficients can be found using the variational principle
$$\frac{{\partial E}}{{\partial {C_i}}} = 0 \ \textrm{for all} \ i$$
and this approach is called configuration interaction (CI). The more excitations we include (i.e. the larger $L$ in the determinant expansion above), the more accurate the expansion and the resulting energy become. If the expansion includes all possible excitations (known as full CI, FCI), then we have a numerically exact wave function for the particular basis set, and if we also use a basis set where the HF limit is reached, then we have a numerically exact solution to the electronic Schrödinger equation! That’s the good news …

The bad news is that the FCI “basis set of determinants” is much, much larger than the LCAO basis set (i.e. $L \gg K$),
$$L = \frac{{K!}}{{N!(K - N)!}}$$
where $N$ is the number of electrons. Thus, an RHF/6-31G(d,p) calculation on water involves 24 basis functions and roughly $\tfrac{1}{8}K^4$ = 42,000 2-electron integrals, but a corresponding FCI/6-31G(d,p) calculation involves nearly 2,000,000 Slater determinants.
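These numbers are easy to verify with a few lines of Python, using the formula above with $K = 24$ and $N = 10$ for water:

```python
from math import comb

K = 24   # number of basis functions for water with 6-31G(d,p)
N = 10   # number of electrons in water

# Rough number of 2-electron integrals in an RHF calculation, K^4/8
n_integrals = K**4 // 8

# Number of Slater determinants in the FCI expansion, L = K!/(N!(K-N)!)
L = comb(K, N)

print(n_integrals)  # 41472, i.e. roughly 42,000
print(L)            # 1961256, i.e. nearly 2,000,000
```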

Just like finding the LCAO coefficients involves the diagonalization of the Fock matrix, finding the CI coefficients ($C_i$) and the lowest energy also involves a matrix diagonalization.

$$\bf{E} = {{\bf{C}}^t}{\bf{HC}}$$

where $\bf{E}$ is a diagonal matrix whose smallest value ($E_0$) corresponds to the variational energy minimum. While the Fock matrix is a $K \times K$ matrix, the CI Hamiltonian ($\bf{H}$) is an $L \times L$ matrix. Just holding the 2 million by 2 million matrix for the water molecule using the 6-31G(d,p) basis set requires tens of thousands of gigabytes! Clever programming and large computers actually make an FCI/6-31G(d,p) calculation on $\ce{H2O}$ possible, but FCI is clearly not a routine molecular modeling tool.

Thus, we need at least single and double excitations (CISD)
$$\Psi \approx {C_0}{\Phi _0} + \sum\limits_a {\sum\limits_r {C_a^r\Phi _a^r} } + \sum\limits_a {\sum\limits_b {\sum\limits_r {\sum\limits_s {C_{ab}^{rs}\Phi _{ab}^{rs}} } } } $$
to get any correlation energy. However, in general including doubles already results in an $\bf{H}$ matrix that is impractically large for a matrix diagonalization. CI, i.e. finding the $C_i$ coefficients using the variational principle, is therefore rarely used to compute the correlation energy.

Perhaps the most popular means of finding the $C_i$’s is by perturbation theory, a standard mathematical technique in physics to compute corrections to a reference state (in this case RHF). Perturbation theory using this reference is called Møller-Plesset perturbation theory, and there are several successively more accurate and more expensive variants: MP2 (which includes some double excitations), MP3 (more double excitations than MP2), and MP4 (single, double, triple, and some quadruple excitations).

Another approach is called coupled cluster, which has a similar hierarchy of methods, such as CCSD (singles and doubles) and CCSD(T) (CCSD plus an estimate of the triples contribution). In terms of accuracy vs expense, MP2 is the best choice of a cheap correlation method, followed by CCSD and CCSD(T). For example, MP4 is not too much cheaper than CCSD(T), but the latter is much more accurate. In fact, for many practical purposes it is rarely necessary to go beyond CCSD(T) in terms of accuracy, provided a triple-zeta or higher basis set is used. However, CCSD(T) is usually too computationally demanding for molecules with more than 10 non-hydrogen atoms. In general, the computational expense of these correlated methods scales much worse than RHF with respect to basis set size: MP2 ($K^5$), CCSD ($K^6$), and CCSD(T) ($K^7$). These methods also require a significant amount of computer memory compared to RHF, which is often the practical limitation of these post-HF methods. Finally, it should be noted that all these calculations also imply an RHF calculation as the first step.
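The practical consequence of these scalings is easy to quantify: doubling the number of basis functions multiplies the cost by $2^5$ for MP2 but by $2^7$ for CCSD(T). A quick sketch:

```python
# Formal cost scaling exponents with respect to basis set size K
scaling = {"RHF": 4, "MP2": 5, "CCSD": 6, "CCSD(T)": 7}

# Cost increase when the number of basis functions K is doubled
for method, power in scaling.items():
    print(f"{method}: doubling K makes the calculation {2**power}x more expensive")
```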

In conclusion we now have ways of systematically improving the wave function, and hence the energy, by increasing the number of basis functions ($K$) and the number of excitations ($L$) as shown in Figure 2.

Figure 2 Schematic representation of the increase in accuracy due to using better correlation methods and larger basis sets.

The most important implication of this is that in principle it is possible to check the accuracy of a given level of theory without comparison to experiment! If going to a better correlation method or a bigger basis set does not change the answer appreciably, then we have a genuine prediction with only the charges and masses of the particles involved as empirical input. These kinds of calculations are therefore known as ab initio or first-principles calculations. In practice, different properties converge at different rates, so it is better to monitor the convergence of the property you are actually interested in rather than the total energy. For example, energy differences (e.g. between two conformers) converge earlier than the molecular energies themselves. Furthermore, the molecular structure (bond lengths and angles) tends to converge faster than the energy difference. It is therefore common to optimize the geometry at a low level of theory [e.g. RHF/6-31G(d)] followed by an energy computation (a single point energy) at a higher level of theory [e.g. MP2/6-311+G(2d,p)]. This level of theory is denoted MP2/6-311+G(2d,p)//RHF/6-31G(d).

Finally, the correlation energy is not just a fine-tuning of the RHF result but introduces an important intermolecular force called the dispersion energy. The dispersion energy (also known as the induced dipole-induced dipole interaction) is a result of the simultaneous excitation of at least two electrons and is not accounted for in the RHF energy. For example, the stacked orientation of base pairs in DNA is largely a result of dispersion interactions and cannot be predicted using RHF.

Monday, August 3, 2015

The table of contents for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Wednesday, July 15, 2015

The table of contents for this issue features contributions from CCH editors Steven Bachrach, Tobias Schwabe and Jan Jensen:

Monday, June 15, 2015

In order to compute the energy we need to define mathematical functions for the orbitals. In the case of atoms we can simply use the solutions to the Schrödinger equation for the $\ce{H}$ atom as a starting point and find the best exponent for each function using the variational principle.

But what functions should we use for molecular orbitals (MOs)? The wave function of the $\ce{H2+}$ molecule provides a clue (Figure 1)

Figure 1. Schematic representation of the wave function of the $\ce{H2+}$ molecule.

It looks a bit like the sum of two 1$s$ functions centered at each nucleus (A and B),
$${\Psi ^{{\text{H}}_{\text{2}}^ + }} \approx \tfrac{1}{{\sqrt 2 }}\left( {\Psi _{1s}^{{{\text{H}}_{\text{A}}}} + \Psi _{1s}^{{{\text{H}}_{\text{B}}}}} \right)$$
Thus, one way of constructing MOs is as a linear combination of atomic orbitals (the LCAO approximation),
$${\phi _i}(1) = \sum\limits_{\mu = 1}^K {{C_{\mu i}}{\chi _\mu }(1)} $$
an approximation that becomes better and better as $K$ increases. Here $\chi _\mu$ is a mathematical function that looks like an AO, and is called a basis function (a collection of basis functions for various atoms is called a basis set), and $C_{\mu i}$ is a number (sometimes called an MO coefficient) that indicates how much basis function $\chi _\mu$ contributes to MO $i$, and is determined for each system via the variational principle. Note that every MO is expressed in terms of all basis functions, and therefore extends over the entire molecule.
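The LCAO idea can be illustrated numerically with the $\ce{H2+}$ wave function from Figure 1. The sketch below (which, for simplicity, ignores the overlap between the two 1s functions when normalizing the sum) shows the constructive buildup of wave function between the two nuclei:

```python
import numpy as np

def psi_1s(x, R):
    """1s function centered at R: (1/sqrt(pi)) * exp(-|x - R|)."""
    return np.exp(-np.abs(x - R)) / np.sqrt(np.pi)

# Place the two protons (A and B) 2 bohr apart on the x axis
RA, RB = -1.0, 1.0

# LCAO approximation to the H2+ wave function: (1s_A + 1s_B)/sqrt(2)
def psi_lcao(x):
    return (psi_1s(x, RA) + psi_1s(x, RB)) / np.sqrt(2)

# At the midpoint the two 1s functions add constructively, so the MO has
# more amplitude there than a lone 1s function at the same distance
print(psi_lcao(0.0), psi_1s(0.0, RA))
```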

If we want to calculate the RHF energy of water, the basis set for the two $\ce{H}$ atoms would simply be the lowest energy solution to the Schrödinger equation for the $\ce{H}$ atom
$${\chi _{{{\text{H}}_{\text{A}}}}}(1) = \Psi _{1s}^{\text{H}}({r_{1A}}) = \frac{1}{{\sqrt \pi }}{e^{ - \left| {{{\bf{r}}_1} - {{\bf{R}}_A}} \right|}}$$
For the O atom, the basis set is the AOs obtained from, say, an ROHF calculation on $\ce{O}$, i.e. $1s$, $2s$, $2p_x$, $2p_y$, and $2p_z$ functions from the solutions to the Schrödinger equation for the $\ce{H}$ atom, where the exponents ($\alpha_i$'s) have been variationally optimized for the $\ce{O}$ atom,
$$\Psi _{1s}^{\text{H}},\;\Psi _{2s}^{\text{H}},\;\Psi _{2p}^{\text{H}} \xrightarrow{\frac{\partial E}{\partial \alpha_i}=0} \phi _{1s}^{\text{O}},\;\phi _{2s}^{\text{O}},\;\phi _{2p}^{\text{O}} \equiv \;\left\{ {{\chi _O}} \right\}$$
Notice that this only has to be done once, i.e. we will use this oxygen basis set for all oxygen-containing molecules. We then provide a guess at the water structure and the basis functions are placed at the coordinates of the respective atoms. Then we find the best MO coefficients by variational minimization,
$$\frac{{\partial E}}{{\partial {C_{\mu i}}}} = 0 $$
for all $i$ and $\mu$. Thus, for water we need a total of seven basis functions to describe the five doubly occupied water MOs ($K = 7$ and $i = 1$–$5$ in the LCAO expansion). This is an example of a minimal basis set, since it is the minimum number of basis functions per atom that makes chemical sense.

However, the integrals over such exponential functions [known as Slater-type orbitals (STOs)] are expensive to evaluate, so in practice each STO basis function is approximated by a linear combination of Gaussian functions, for which the integrals are much easier,
$$\chi _\mu (1) = \sum\limits_{i = 1}^X {{a_{i\mu }}{e^{ - {\beta _i}r_{1A}^2}}} $$

Figure 2. (a) An exponential function is not well represented by one Gaussian, but (b) can be well represented by a linear combination of three Gaussians.

Here the $a_{i\mu}$ parameters (or contraction coefficients) as well as the Gaussian exponents are determined just once for a given STO basis function. $\chi_\mu$ is called a contracted basis function and the $X$ individual Gaussian functions are called primitives. Generally, three primitives are sufficient to represent an STO, and this basis set is known as the STO-3G basis set. $p$- and $d$-type STOs are expanded in terms of $p$- and $d$-type primitive Gaussians [e.g. $({x_1} - {x_A}){e^{ - \beta r_{1A}^2}}$ and $({x_1} - {x_A})({y_1} - {y_A}){e^{ - \beta r_{1A}^2}}$]. An RHF calculation using the STO-3G basis set is denoted RHF/STO-3G. Unless otherwise noted, this usually also implies that the geometry is computed (i.e. the minimum energy structure is found) at this level of theory.
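The idea behind Figure 2 can be checked numerically: with fixed exponents (the values below are arbitrary illustrations, not the actual STO-3G exponents), a linear least-squares fit of three Gaussians reproduces a Slater-type $e^{-r}$ function far better than a single Gaussian:

```python
import numpy as np

r = np.linspace(0.0, 5.0, 500)
slater = np.exp(-r)   # STO-like target function

def fit_rmse(exponents):
    """Least-squares fit of Gaussians with fixed exponents to the Slater
    function; returns the root-mean-square error of the best fit."""
    A = np.column_stack([np.exp(-b * r**2) for b in exponents])
    coeffs, *_ = np.linalg.lstsq(A, slater, rcond=None)
    return np.sqrt(np.mean((A @ coeffs - slater) ** 2))

err1 = fit_rmse([0.3])             # one Gaussian, as in Figure 2a
err3 = fit_rmse([0.1, 0.3, 2.0])   # three Gaussians, as in Figure 2b
print(err1, err3)                  # three Gaussians fit much better
```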

Minimal basis sets are usually not sufficiently accurate to model reaction energies. This is because the atomic basis functions cannot change size to adjust to their bonding environment. This can be remedied by using some of the contraction coefficients as variational parameters. Doing so increases the basis set size (and hence the computational cost), so it must be done judiciously. For example, we’ll get the most improvement by worrying about the basis functions that describe the valence electrons, which participate most in bonding. Thus, for the $\ce{O}$ atom we leave the 1$s$ core basis function alone, but “split” the valence 2$s$ basis function into linear combinations of two and one Gaussians, respectively,
$$\begin{split}
{\chi _{1s}} &= \sum\limits_i^3 {{a_{i1s}}{e^{ - {\beta _i}r_{1A}^2}}} \\
{\chi _{2{s_a}}} &= \sum\limits_i^2 {{a_{i2s}}{e^{ - {\beta _i}r_{1A}^2}}} \\
{\chi _{2{s_b}}} &= {e^{ - {\beta _{2{s_b}}}r_{1A}^2}} \\
\end{split} $$
and similarly for the 2$p$ basis functions. This is known as the 3-21G basis set (pronounced “three-two-one g” not “three-twenty one g”), which denotes that core basis functions are described by 3 contracted Gaussians, while the valence basis functions are split into two basis functions, described by 2 and 1 Gaussian each. Thus, using the 3-21G basis set to describe water requires 13 basis functions: two basis functions on each $\ce{H}$ atom (1$s$ is the valence basis function of the H atom) and 9 basis functions on the $\ce{O}$ atom (one 1$s$ function and two each of 2$s$, 2$p_x$, 2$p_y$, and 2$p_z$).

The $\chi _{2{s_a}}$ basis function is smaller (i.e., the Gaussians have a larger exponent) than the $\chi _{2{s_b}}$ basis function. Thus, one can make a function of any intermediate size by (variationally) mixing these two functions (Figure 3). 3-21G is an example of a split valence or double zeta basis set (zeta, ζ, is often used as the symbol for the exponent, but I find it hard to write and don’t use it in my lectures). Similarly, one can make other double zeta basis sets such as 6-31G, or triple zeta basis sets such as 6-311G.

Figure 3. Sketch of two different sized $s$-type basis functions that can be used to make a basis function of intermediate size

As the number of basis functions ($K$ in the LCAO expansion) increases, the error associated with the LCAO approximation should decrease and the energy should converge to what is called the Hartree-Fock limit ($E_{\text{HF}}$), which is higher than the exact energy ($E_{\text{exact}}$) (Figure 4). The difference is known as the correlation energy,
$$E_{\text{corr}}=E_{\text{HF}}-E_{\text{exact}}$$
and is the error introduced by the orbital approximation
$$\Psi ({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N}) \approx \left| {{\phi _1}(1){{\bar \phi }_1}(2) \ldots {\phi _{N/2}}(N - 1){{\bar \phi }_{N/2}}(N)} \right\rangle $$

Figure 4. Plot of the energy as a function of the number of basis functions.

In the case of a one-electron molecule like $\ce{H2+}$, however, we would expect the energy to converge to $E_{\text{exact}}$, since there is no orbital approximation. But if we try this with the basis sets discussed thus far, we find that this is not the case (Figure 5)!

What’s going on? Again we get a clue by comparing the exact wave function to the LCAO-wave function (Figure 6).

Figure 6. Comparison of the exact wave function and one computed using the 6-311G basis set.

We find that, compared to the exact result, there is not “enough wave function” between the nuclei and too much at either end. As we increase the basis set we only add $s$-type basis functions (of varying size). Since they are spherical, they cannot be used to shift electron density from one side of the $\ce{H}$ atom to the other. However, $p$-functions are perfect for this (Figure 7).

Figure 7. Sketch of the polarization of an s basis function by a p basis function

So basis set convergence is not a matter of simply increasing the number of basis functions, it is also important to have the right mix of basis function types. Similarly, $d$-functions can be used to “bend” $p$-functions (Figure 8).

Figure 8. Sketch of the polarization of a p basis function by a d basis function

Such functions are known as polarization functions and are denoted as follows: 6-31G(d) denotes $d$ polarization functions on all non-$\ce{H}$ atoms and can also be written as 6-31G*. 6-31G(d,p) is a 6-31G(d) basis set where $p$-functions have been added on all $\ce{H}$ atoms, and can also be written 6-31G**. An RHF/6-31G(d,p) calculation on water involves 24 basis functions: 13 basis functions for the 6-31G part (just like for 3-21G) plus 3 $p$-type polarization functions on each $\ce{H}$ atom and 5 $d$-type polarization functions on the $\ce{O}$ atom (some programs use 6 Cartesian $d$-functions instead of the usual 5).
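The basis function counts quoted above for water can be reproduced with a simple tally; the per-atom counts below are taken directly from the text, while the helper function is just for illustration:

```python
# Number of basis functions per atom, as described in the text
counts = {
    "3-21G":      {"O": 9,  "H": 2},   # split-valence only
    "6-31G(d,p)": {"O": 14, "H": 5},   # plus 5 d on O and 3 p on each H
}

def n_basis(basis, atoms):
    """Total number of basis functions for a list of atoms."""
    return sum(counts[basis][atom] for atom in atoms)

water = ["O", "H", "H"]
print(n_basis("3-21G", water))        # 13
print(n_basis("6-31G(d,p)", water))   # 24
```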

Anions tend to have very diffuse electron distributions, and very large basis functions (with very small exponents) are often needed for accurate results. These diffuse functions are denoted with “+” signs: e.g. 6-31+G denotes one $s$-type and three $p$-type diffuse Gaussians on each non-$\ce{H}$ atom, and 6-31++G denotes the addition of a single diffuse $s$-type Gaussian on each $\ce{H}$ atom. Diffuse functions also tend to improve the accuracy of calculations on van der Waals complexes and other structures where an accurate representation of the outer part of the electron distribution is important.

Of course there are many other basis sets available, but in general they have the same kinds of attributes as described already. For example, aug-cc-pVTZ is a more modern basis set: “aug” stands for “augmented”, meaning “augmented with diffuse functions”, and “pVTZ” means “polarized valence triple zeta”, i.e. it is of roughly the same quality as 6-311++G(d,p). “cc” stands for “correlation consistent”, meaning the parameters were optimized for correlated wave functions (like MP2) rather than HF wave functions, unlike the Pople basis sets [such as 6-31G(d)] described thus far.

Monday, June 1, 2015

The table of contents for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Sunday, May 24, 2015

Let’s start by considering a simple example with one coordinate (for example a distance), $R$, and where the energy is given by

$$ E= \frac{1}{2}k(R-R_e)^2$$
The corresponding potential energy surface, known as a quadratic PES, is shown in Figure 1.

Figure 1. Plot of the (quadratic) potential energy surface given by Equation 1.

Here $R_e$ is the value of $R$ at which the energy is lowest (this is known as the equilibrium geometry) and this is what we’d like to find. We start by taking a guess at $R$, $R_g$. We already know how to check whether this is an energy minimum: we need to evaluate the gradient, which is
$$\left(\frac{\partial E}{\partial R}\right)_{R_g}=k(R_g-R_e)$$
It’s clear that the gradient is non-zero for $R_g \ne R_e$. However, by rearranging the equation we see that the gradient also tells us how to change $R$ to reach the energy minimum
$$R_e=R_g-\frac{1}{k}\left(\frac{\partial E}{\partial R}\right)_{R_g}=R_g+\frac{1}{k}F_g$$
where $F$ is the force, which is the negative gradient. If we know $k$ we can find $R_e$ in one step starting from $R_g$ (Figure 2).

Figure 2. The equilibrium geometry can be found in one step on a quadratic PES

If we don’t know $k$ then it is safest to take many small steps, i.e. scale the gradient by some small constant ($c$) and repeat until the gradient falls below some threshold (Figure 3)
$$R_{n+1}=R_n-c\left(\frac{\partial E}{\partial R}\right)_{R_n}$$

Figure 3. Energy minimization by steepest descent.

This general approach is known as steepest descent, since the gradient tells you the direction in which the energy decreases (descends) the most. Notice that this means that minimizing the energy will find the minimum closest to the starting geometry. When steepest descent is used to find equilibrium geometries it is often combined with a so-called “line search”, which finds the lowest energy in the direction of the current gradient, but the general idea is still the same.
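For a quadratic PES $E = \tfrac{1}{2}k(R-R_e)^2$ the steepest descent recipe is a few lines of Python; the values of $k$, $R_g$, $c$, and the convergence threshold below are of course arbitrary choices for illustration:

```python
def gradient(R, k=2.0, Re=1.0):
    """Gradient of E = (k/2)(R - Re)^2."""
    return k * (R - Re)

# Steepest descent: step opposite the gradient, scaled by a small constant c,
# and repeat until the gradient falls below a threshold
R = 3.0       # guess geometry R_g
c = 0.1       # step scaling constant
steps = 0
while abs(gradient(R)) > 1e-6:
    R = R - c * gradient(R)
    steps += 1

print(R, steps)   # converges to Re = 1.0, but takes many small steps
```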

Another use of steepest descent is to connect a transition state with its two closest minima, since this tells you which two minima the transition state connects. Here the guess structure is the transition state structure displaced along the normal mode corresponding to the imaginary frequency (the transition state structure itself cannot be used because its gradient is zero). This path is the minimum energy path (MEP) between reactants and products, and the resulting collection of structures is known as the intrinsic reaction coordinate (IRC). An IRC is usually depicted as a plot of the potential energy vs the mass-weighted root-mean-square displacement of the Cartesian coordinates relative to some reference geometry (usually the transition state), usually denoted $s$ (Figure 4). When we draw a potential energy surface for a reaction we usually mean an IRC.

Figure 4. Depiction of an IRC or MEP: two steepest descent paths starting from the transition state

Clearly, the more steps we take, the higher the computational cost, and in general we want to take as few steps as possible. As I mentioned above, if we knew $k$ we could find the energy minimum in one step. For a quadratic surface $k$ is the second derivative of $E$ with respect to $R$ (or the first derivative of the gradient), so the expression for $R_e$ above becomes
$$ R_e=R_g-\left(\frac{\partial^2 E}{\partial R^2}\right)^{-1}_{R_g}\left(\frac{\partial E}{\partial R}\right)_{R_g}$$
Such a step is known as a Newton-Raphson step or quadratic step. This works fine if the surface is quadratic, which is a good approximation near the minimum but not far away. For really bad guesses, where the PES tends to be flat, the quadratic step can be too large (Figure 5).

Figure 5. Quadratic steps (a) close to and (b) far from the minimum.

Most algorithms will scale back quadratic steps that are considered unreasonably large, and even with quadratic energy minimizers many steps are needed:
$$R_{n+1}=R_n-\left(\frac{\partial^2 E}{\partial R^2}\right)^{-1}_{R_n}\left(\frac{\partial E}{\partial R}\right)_{R_n}$$
$$ \mathbf{q}_{n+1}=\mathbf{q}_{n}-c\mathbf{H}^{-1}_n\mathbf{g}_n$$
The second equation is simply the first one generalized to many dimensions: $\mathbf{q}_n$ are the current coordinates (for example Cartesian coordinates) for step number $n$ ($n = 0$ for the guess coordinates), $\mathbf{q}_{n+1}$ are the new coordinates, $\mathbf{g}_n$ is the gradient, and $\mathbf{H}_n$ is the Hessian, both evaluated at the current coordinates. Note that the Hessian must be computed at each step.
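The multidimensional Newton-Raphson step (with $c = 1$) can be tested on a quadratic model surface, where it reaches the minimum in a single iteration; the Hessian and equilibrium geometry below are made up for illustration:

```python
import numpy as np

# A two-dimensional quadratic PES: E(q) = 1/2 (q - qe)^T H (q - qe)
H  = np.array([[4.0, 1.0],
               [1.0, 3.0]])         # Hessian (positive definite)
qe = np.array([1.0, -2.0])          # equilibrium geometry

def grad(q):
    """Gradient of the quadratic PES at coordinates q."""
    return H @ (q - qe)

q = np.array([5.0, 5.0])            # guess coordinates q_0
q = q - np.linalg.inv(H) @ grad(q)  # one Newton-Raphson step

print(q)   # lands on qe, since the surface is exactly quadratic
```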

Friday, May 1, 2015

The table of contents for this issue features contributions from CCH editors Mario Barbatti, Steven Bachrach, and Jan Jensen:

Wednesday, April 1, 2015

The table of contents for this issue features contributions from CCH editors Tobias Schwabe, Martin Korth, Steven Bachrach, and Jan Jensen:

Sunday, March 1, 2015

Sunday, February 1, 2015

The table of contents for this issue features contributions from CCH editors Alán Aspuru-Guzik, Andres Cisneros, Steven Bachrach, David Bowler, and Jan Jensen:

Wednesday, January 7, 2015

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) has come close, with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in two previous posts (Part 1 and Part 2). In addition I'll talk about improvements that could potentially increase the accuracy further.

$$\Delta G^\circ_{\mathrm{bind}} = \Delta E_{\mathrm{gas}} + \Delta G^\circ_{\mathrm{gas,RRHO}} + \Delta \delta G^\circ_{\mathrm{solv}}$$
where $\Delta E_\mathrm{gas}$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{gas,RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximations, both evaluated in the gas phase. The vibrational frequencies are computed at a lower level of theory than that used to compute $\Delta E_{\mathrm{gas}}$. $\Delta \delta G^\circ_{\mathrm{solv}}$ is the change in solvation free energy computed using a continuum model of the solvent, such as COSMO.

Solvation Thermodynamics

Background. Most continuum models (CMs) of solvation compute the solvation free energy as the difference between the free energy in solution and the gas phase electronic energy,
$$\delta G\mathrm{^{\circ,CM}_{solv}(X)} = G\mathrm{^{\circ,CM}_{soln}(X)} - E\mathrm{_{gas}(X)}$$

$G\mathrm{^{\circ,CM}_{soln}(X)}$ typically contains energy terms describing the electrostatic interaction of the molecule with the continuum, as well as the van der Waals interactions with the solvent and the free energy required to create the molecular cavity in the solvent (cavitation). The electrostatic interaction with the solvent alters the molecular wavefunction and is computed self-consistently.

Some software packages automatically compute $G\mathrm{^{\circ,CM}_{soln}(X)}$ and $E\mathrm{_{gas}(X)}$ in one run, while others only compute $G\mathrm{^{\circ,CM}_{soln}(X)}$. Also, some programs just compute the electrostatic component of $G\mathrm{^{\circ,CM}_{soln}(X)}$ by default. However, the van der Waals and, especially, the cavitation component can make sizable contributions to the binding free energy and must be included for accurate results. It is worth noting that any hydrophobic contribution to binding will derive primarily from the change in cavitation energy.

The standard state for both $G\mathrm{^{\circ,exp}_{soln}(X)}$ and $G\mathrm{^{\circ,exp}_{gas}(X)}$ is generally 1 M. The latter is the reason why a 1 M reference state must also be used when computing $\Delta G^\circ_{\mathrm{RRHO}}$.
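A common pitfall here is the standard-state correction itself: quantum chemistry programs typically compute gas phase free energies for a 1 atm standard state (molar volume 24.46 L at 298.15 K), which must be shifted to 1 M. A sketch of the per-species correction:

```python
from math import log

R = 1.987204e-3   # gas constant in kcal/(mol K)
T = 298.15        # temperature in K

# Compressing an ideal gas from 24.46 L/mol (1 atm) to 1 L/mol (1 M)
# costs RT ln(24.46) per mole of gas
correction = R * T * log(24.46)
print(round(correction, 2))   # about 1.89 kcal/mol
```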

Atomic radii. The solvation energy is computed using a set of atomic radii that define the solute-solvent boundary surface. These radii are usually obtained by fitting to experimentally measured solvation energies. Accurate solvation energies should not be expected from methods that use iso-electron density surfaces or van der Waals radii without additional empirical fitting. When using fitted radii one should use the same level of theory for the solute as was used in the parameterization.

Ions. For neutral molecules solvation free energies can be measured with an accuracy of roughly 0.2 kcal/mol and reproduced theoretically to within roughly 0.5 kcal/mol. However, the solvation energies of ions cannot be directly measured and must be inferred indirectly relative to a standard (typically the solvation energy of the proton). The experimentally obtained solvation energies are typically accurate to within 3 kcal/mol and can be reproduced computationally with roughly the same accuracy. The solvation energies of ions are therefore an especially likely source of error in binding free energies - especially if the ionic regions of the molecules become significantly desolvated upon binding.

Gas phase vs solution optimization. The fitting of the radii described above is usually done using gas phase optimized structures only, i.e. any change in structure and the corresponding rotational and vibrational effects are "included" in the radii via the parameterization. However, for ionic species gas phase optimization can lead to significantly distorted structures or even proton transfer, and in these cases solution phase optimizations, and hence solution phase vibrational frequency calculations, tend to be used. Numerical instability in the continuum models can, however, make it necessary to loosen the geometry convergence criteria and can lead to more imaginary frequencies than in the gas phase. One option is to compute the vibrational contribution to $\Delta G^\circ_{\mathrm{RRHO}}$ using gas phase optimized structures.

When using solution phase geometries, the gas phase single point energies needed to evaluate $\delta G\mathrm{^\circ_{solv}(X)}$ represent an added computational expense, and it can be tempting to use solution phase free energies to evaluate the binding free energies.

One problem with this approach is that $\Delta G\mathrm{^{\circ,CM}_{soln}}$, unlike $\Delta E_{\mathrm{gas}}$, is not systematically improvable due to the empirical parameterization.

Cavities. The atomic radii and the corresponding cavity generation algorithms are parameterized for small molecules. For more complex molecules such as receptors this can lead to continuum solvation of regions, e.g. deep in the binding pocket, that are not accessible to the molecular solvent. Furthermore, any solvent molecule inside such a pocket is likely to be quite "un-bulk-like" and thus not well described by the bulk solvent model or remedied by the underlying parameterization. How large an error this introduces into the binding free energy is not really known, but certain models for the cavitation energy have been shown to give unrealistically large contributions to the binding free energy.

Explicit water molecules. Adding explicit solvent molecules to the receptor and/or ligand can potentially lead to more accurate results. For example, including explicit water molecules around ionic sites reduces the strong dependence of the solvation energy on the corresponding atomic radii. Also, "un-bulk-like" water molecules are now treated more naturally, and the risk of solvating non-solvent-accessible regions is reduced somewhat. However, adding explicit solvent molecules increases the computational cost by increasing the CPU time needed to compute energies, perform conformational searches, and compute vibrational frequencies.

There are several approaches to include the effect of explicit solvent molecules in the binding free energy. Bryantsev and co-workers suggest computing the solvation energy by

with "$\circ$" and "$liq$" referring to a standard state of 1 M and 55.34 M, respectively. The term $RT\ln(\mathrm{[H_2O]/n})$ is the free energy required to change the standard state of (H$_2$O)$_n$ from 1 M to 55.34/n M.
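The standard-state term quoted above is cheap to evaluate explicitly. A short sketch, using the 55.34 M concentration of neat liquid water given in the text:

```python
# The free-energy cost of changing the standard state of an n-water
# cluster (H2O)_n from 1 M to [H2O]/n = 55.34/n M.
import math

R = 1.987204e-3    # gas constant, kcal/(mol K)
T = 298.15         # K
H2O_CONC = 55.34   # mol/L, neat liquid water

def rtln_term(n):
    """RT ln([H2O]/n) in kcal/mol for an n-water cluster."""
    return R * T * math.log(H2O_CONC / n)

# rtln_term(1) is roughly 2.4 kcal/mol and shrinks as n grows,
# changing sign once n exceeds 55.34.
```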

Bryantsev et al. have shown that using this water cluster approach leads to a smooth convergence of the solvation free energy with respect to the cluster size $n$. The optimum choice of $n$ is the one where an additional water changes the solvation energy by less than a certain user-defined amount. One can thereby compute the optimum number of water molecules for the receptor ($n$), ligand ($m$) and receptor-ligand complex ($l$) and then compute the binding free energy as
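The convergence criterion can be sketched as a simple loop. The `solv_energy` callable and the demo values below are hypothetical placeholders, standing in for actual cluster-cycle calculations:

```python
# Sketch: pick the number of explicit waters n by checking when one
# additional water changes the solvation free energy by less than a
# user-defined tolerance (kcal/mol). The demo energies are made up.

def optimal_n(solv_energy, tol=0.5, n_max=10):
    """Smallest n for which |dG_solv(n+1) - dG_solv(n)| < tol."""
    for n in range(1, n_max):
        if abs(solv_energy(n + 1) - solv_energy(n)) < tol:
            return n
    return n_max

# Hypothetical, smoothly converging solvation energies (kcal/mol):
demo = {1: -80.1, 2: -84.3, 3: -86.0, 4: -86.4, 5: -86.5}
n_opt = optimal_n(lambda n: demo[n], tol=0.5)
```

In practice each `solv_energy(n)` call hides a conformational search plus gas-phase and continuum calculations on the cluster, so the loop is run by hand rather than automated.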

However, this is only approximately true in practice due to errors in the computed gas phase and solvation free energies. Furthermore, Reaction (2) does not really lead to any significant reduction in CPU time because the water cluster free energies only have to be computed once. However, if Reaction (2) is used then one must add an additional term correcting for the indistinguishability of water molecules

and similarly for the water clusters. Using Reaction (1) leads to a cancellation of this term and also maximizes error cancellation in the other energy terms. Similar considerations apply when using individual water molecules to balance the reaction instead of water clusters.

When using many explicit water molecules, the error in the continuum solvation energies can be reduced by ensuring that the continuum solvation energy of a single water molecule matches the experimental value of -6.32 kcal/mol at 298.15 K as closely as possible.
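The reason this matters can be made concrete with a rough back-of-the-envelope estimate (my assumption, not a result from the post): any per-water deviation from the experimental value accumulates in proportion to the net number of explicit waters that do not cancel between the two sides of the balanced reaction.

```python
# Rough error estimate: if the continuum model yields `computed` for a
# single water instead of the experimental -6.32 kcal/mol, and the
# balanced binding reaction leaves a net surplus of `net_waters` explicit
# waters on one side, the accumulated error is roughly their product.

EXP_WATER_SOLV = -6.32  # kcal/mol, experimental, 298.15 K

def accumulated_water_error(computed, net_waters):
    """Approximate error (kcal/mol) from imperfect single-water solvation."""
    return net_waters * (computed - EXP_WATER_SOLV)

err = accumulated_water_error(-7.1, net_waters=5)  # roughly -3.9 kcal/mol
```

Even a modest 0.8 kcal/mol per-water deviation thus grows to several kcal/mol for a handful of uncancelled waters, comparable to the entire target accuracy of a binding free energy.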

Thursday, January 1, 2015


Table of content for this issue features contributions from CCH editors Alán Aspuru-Guzik, Robert Paton, Steven Bachrach, David Bowler, and Jan Jensen: