Reading some papers, I have come across a curious observation that does not quite make sense to me.

In Quantum Chemistry, different methods exist to carry out geometry optimisation and energy calculations. Quantum chemists can have a huge field day arguing which method and basis set to use.

A common type of calculation used for larger systems (though "large" is relative to one's computational resources) is density functional theory (DFT), because it provides good results on a reasonable timescale.

Now here is where it gets weird:
Every method has its basis set - basis sets geared towards DFT and basis sets geared towards coupled cluster methods. It is generally recommended not to use a correlation consistent (cc) basis set with a DFT method (and I guess conversely, a basis set aimed at DFT should not be used with a coupled cluster method).

Why, then, would people benchmark methods using seemingly inappropriate basis sets, e.g. a cc basis set with an M06 functional?

$\begingroup$"Every method has its basis set" They do? Also, it is useful to implement a set of consistent basis sets when comparing methods in order to eliminate 'basis set effects'.$\endgroup$
– LordStryker Jun 29 '16 at 14:25


$\begingroup$@LordStryker Yes, in the sense that the basis set is geared towards a type of method and performs best with that class of methods.$\endgroup$
– DetlevCM Jun 29 '16 at 14:29


$\begingroup$I suspect the Sherrill group picked cc- basis sets to eliminate basis-set effects when comparing with their coupled-cluster calculations. But you can ask them.$\endgroup$
– Geoff Hutchison Jun 29 '16 at 21:40

2 Answers

It is generally recommended not to use a cc basis set with a DFT method (and I guess conversely, a basis set aimed at DFT should not be used with a coupled cluster method).

This statement glosses over some specifics that might be important.

There is nothing technically wrong with using correlation-consistent or ANO basis sets with DFT, unless the basis is deficient to begin with (cc-pVDZ). That is, using cc-pVQZ to optimize the geometry of a small organic molecule will not lead to more "incorrect" results than 6-31G* unless linear dependencies arise in the basis due to over-completeness.

However, it is most certainly inefficient to use these basis sets for general usage (SCF-type methods) due to using general contraction rather than segmented contraction. Most modern integral algorithms still have a difficult time dealing with highly-contracted basis sets. This is one reason to use the TURBOMOLE/Ahlrichs/Karlsruhe Def2 and Frank Jensen's segmented polarization-consistent (pcseg-n) basis sets.

The claim that "a basis set aimed at DFT should not be used with a coupled cluster method" is trickier. The TURBOMOLE basis sets are certainly used for MP2 and coupled cluster calculations; however, they are more appropriate for "single" calculations (not CBS extrapolation), and only with the largest basis one can afford. They were designed to capture the total mean-field energy, which decays exponentially, not the correlation energy, which decays like ~$(L+1)^{-3}$, where $L$ is the maximum angular momentum being considered. The correlation-consistent and atomic natural orbital basis sets are better for coupled cluster because they were designed from the beginning to recover the correlation energy.
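That inverse-cubic decay of the correlation energy is exactly what makes the correlation-consistent hierarchy convenient for CBS extrapolation. As a minimal sketch, here is the common two-point $L^{-3}$ scheme applied to the correlation energy; the energies and cardinal numbers below are made-up illustrative values, not results from any real calculation:

```python
# Two-point complete-basis-set (CBS) extrapolation of the correlation energy,
# assuming the standard model E(L) = E_CBS + A / L^3, where L is the cardinal
# number of the basis (3 for cc-pVTZ, 4 for cc-pVQZ, ...).

def extrapolate_correlation(e_small, e_large, l_small, l_large):
    """Solve E(L) = E_CBS + A / L^3 for E_CBS given two basis-set results."""
    numerator = e_large * l_large**3 - e_small * l_small**3
    denominator = l_large**3 - l_small**3
    return numerator / denominator

# Illustrative correlation energies in hartree for a TZ/QZ pair.
e_cbs = extrapolate_correlation(-0.300, -0.320, 3, 4)
print(f"E_corr(CBS) ~ {e_cbs:.4f} Eh")  # lies below both finite-basis values
```

The SCF component is normally extrapolated separately with an exponential form, precisely because of the different convergence behaviour described above.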

The use of Pople-style (STO-nG, 3-21G, 6-31G*, 6-311++G(d,p), etc.) basis sets would be incorrect with wavefunction-based methods, perhaps even for qualitative understanding depending on how "difficult" your system is. Like the TURBOMOLE basis sets, they are designed to capture the mean-field atomic energies, but many have constraints (identical s- and p-type exponents, the "SP" functions that are often seen) due to the computational resources available at the time of their creation. Many of the older basis sets (notably 6-31G and not 6-311G) were optimized to use Cartesian rather than pure d-type functions, leading to potential inconsistencies in results. It is also not as obvious how to systematically improve the quality of the basis sets, and none are larger than triple-$\zeta$. Ultimately, the virtual MO space created by these basis sets is not sufficient to recover the correlation energy, and the benefits of using coupled cluster are completely lost.

$\begingroup$Thanks for the response (I'll see which I'll accept eventually).$\endgroup$
– DetlevCM Jun 30 '16 at 8:01

$\begingroup$@DetlevCM, I think you should accept the other answer; I think it's more general, and it's the answer I wish I could have written. The only thing I might disagree with is extrapolation using Def2, but that's nitpicky.$\endgroup$
– pentavalentcarbon Jun 30 '16 at 14:35

$\begingroup$@pentavalentcarbon Thanks for your kind comment. I suggest taking a look at the Neese paper DOI: 10.1021/ct100396y . They found good results for def2 basis set extrapolation (on both the SCF and correlation steps). They even fit the corresponding parameters. It worked for me. Also, I like your point two.$\endgroup$
– user1420303 Jun 30 '16 at 23:43


$\begingroup$@user1420303 Thank you for the paper. I sing their praises and use them all the time (def2-QZVPP is my workhorse, with def2-SV(P) for exploratory stuff), both for DFT and MRCI, but I had no idea they were this good for CBS extrapolation of the correlation energy. I also forgot that Molcas was designed to be used with their ANO sets.$\endgroup$
– pentavalentcarbon Jul 1 '16 at 4:09

Because in fact it is appropriate. In most cases there is not a huge difference (in quality or efficiency) among basis set families. For example, Dunning's (cc) basis sets work reasonably well for DFT, and Ahlrichs's (def2) are fine for basis set extrapolations.

There could be many reasons for the choice:

Diffuse augmentation functions were designed together with the cc basis sets (aug-cc-pV(n+d)Z), so it is a good combination. I would trust such a combination more than mixing functions from different families; it also looks neater (yes, the latter matters).

The authors comment:

[...] with a polarized triple-ζ basis set yields a mean unsigned error of $0.82~\mathrm{kcal\, mol^{-1}}$, demonstrating that only minor improvements are obtained for DFT-D by increasing the basis set size [...]

So, this suggests that the results would not change significantly with another basis set of comparable size.

Each DFT functional is parameterized with a specific basis set, so the best basis set need not be the largest.

Some families are often chosen because they provide a clear hierarchy that simplifies the analysis of the results.

I do not think that the following applies to the paper you cited, but in some cases it can also simply be a poor choice.

It can be a matter of politeness. I remember that many years ago I wanted to try the ANO basis sets, because at the time I had problems obtaining CBS results and they had proven to behave very well for the kind of calculation I was doing. But I was ordered to stick with the cc basis sets, even though I obtained bad results with them. I never received an argument for their use, but I suspect the reason was "the good old Dunning basis sets, who will complain about their usage?"

$\begingroup$Thanks for the response (I'll see which I'll accept eventually).$\endgroup$
– DetlevCM Jun 30 '16 at 8:01


$\begingroup$@pH13-YetanotherPhilipp, yes we do! Or maybe not (I normally prefer Ahlrichs's def2 basis sets), but when in doubt one often employs the basis set one is used to, in the same way that one chooses a method or functional.$\endgroup$
– user1420303 Jun 30 '16 at 16:42

$\begingroup$So, which of the two good answers should I accept? Having given them some time now.$\endgroup$
– DetlevCM Jul 3 '16 at 18:07