Fast evaluation of solid harmonic Gaussian integrals for local resolution- of-the-identity methods and range-separated hybrid functionals Dorothea Golze,1,a) Niels Benedikter,2 Marcella Iannuzzi,1 Jan Wilhelm,1 and Ju¨rg Hutter1 1)Department of Chemistry, University of Zu¨rich, Winterthurerstrasse 190, CH-8057 Zu¨rich, Switzerland 2)QMath, Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, 2100 København, Denmark (Dated: 7 February 2017) Anintegralschemefortheefficientevaluationoftwo-centerintegralsovercontractedsolidharmonicGaussian functionsispresented. Integral expressionsarederivedfor local operatorsthat dependon thepositionvector 7 of one of the two Gaussian centers. These expressions are then used to derive the formula for three-index 1 overlap integrals where two of the three Gaussians are located at the same center. The efficient evaluation of 0 2 thelatterisessentialforlocalresolution-of-the-identitytechniquesthatemployanoverlapmetric. Wecompare the performance of our integral scheme to the widely used Cartesian Gaussian-based method of Obara and b Saika (OS). Non-local interaction potentials such as standard Coulomb, modified Coulomb and Gaussian- e F type operators, that occur in range-separated hybrid functionals, are also included in the performance tests. The speed-up with respect to the OS scheme is up to three orders of magnitude for both, integrals and 4 their derivatives. In particular, our method is increasingly efficient for large angular momenta and highly contracted basis sets. ] h p - I. INTRODUCTION approaches such as Hartree-Fock and KS-DFT. The effi- m cient computation of (ab) is not of major importance for e The rapid analytic evaluation of two-center Gaus- QM methods since their contribution to the total com- h putational cost is negligible. However, the efficient eval- sian integrals is important for many molecular simu- c uation of the three-index overlap integrals (aba˜), where . lation methods. For example, Gaussian functions are s two functions are located at the same center, is essential widelyusedasorbitalbasisinquantummechanical(QM) c for local RI approaches that use an overlap metric.37–39 i calculations and are implemented in many electronic- s structure codes.1–6 Gaussians are further used at lower Employing local RI in KS-DFT, the atomic pair densi- y ties are approximated by an expansion in atom-centered h level of theory to model charge distributions in molecu- p lar mechanics7–15 (MM), semi-empirical16–18 and hybrid auxiliary functions. In order to solve the RI equations, [ QM/MM methods.19–21 The Gaussian-based treatment it is necessary to calculate (aba˜) for each pair where a,b of the electrostatic interactions requires the evaluation referstoorbitalfunctionsatatomsAandBanda˜ tothe 2 auxiliaryfunctionatA.Theevaluationof(aba˜)iscompu- v of two-center Coulomb integrals. tationallyexpensivebecausetheauxiliarybasissetis3-5 8 The efficient evaluation of two-center integrals is also 8 important at the Kohn-Sham density functional theory timeslargerthantheorbitalbasisset. Arapidevaluation 5 (KS-DFT) level, in particular for hybrid density func- of (aba˜) is important to ensure that the computational 6 overheadoftheintegralcalculationisnotlargerthanthe tionals. In order to speed-up the evaluation of the 0 speed-up gained by the RI . Hartree-Fock exchange term, the exact evaluation of the . 1 four-centerintegralscanbereplacedbyresolution-of-the- Two-center integrals with the local operator r2n (n ∈ 0 identity (RI) approximations.22–25 Especially, when an N),wherer dependsonthecenterofoneoftheGaaussian 7 a overlap metric is employed, the efficient evaluation of functions, are required for special projection and expan- 1 two-center integrals is required. The interaction poten- sion techniques. For example, these integrals are used : v tial can take different functional forms dependent on the for projection of the primary orbital basis on smaller, Xi hybrid functionals.26 The most popular potential is the adaptive basis sets.40 standardCoulomboperatoremployedinwell-established r Numerous schemes for the evaluation of Gaus- a functionalssuchasPBE027–29 andB3LYP.30–32 Ashort- sian integrals have been proposed based on Cartesian rangeCoulombpotentialis,e.g.,employedfortheHSE06 Gaussian,41–47 HermiteGaussian48–51 andsolidorspher- functional,33–35 whereas a combination of long-range ical harmonic Gaussian functions.51–58 For a review of Coulomb and Gaussian-type potential is used for the Gaussian integral schemes see Ref. 59. A very popu- MCY3 functional.36 lar approach is the Obara-Saika (OS) scheme,42 which Gaussianoverlapintegrals,inthefollowingdenotedby employs a recursive formalism over primitive Cartesian (ab),arecomputedinsemi-empiricalmethods16 andQM Gaussian functions. However, electronic-structure codes utilize spherical harmonic Gaussians (SpHGs) since the number of SpHGs is equal or smaller than the number ofCartesianGaussians, i.e. forfixedangularmomentum a)[email protected] l, (2l+1) SpHGs compare to (l+1)(l+2)/2 Cartesian 2 Gaussians. Furthermore, Gaussian basis sets are often A. Definitions and notations constituted of contracted functions. Thus, the primitive Cartesian integrals obtained from the OS recursion are The notations used herein correspond to Refs. 56, 64, subsequently contracted and transformed to SpHGs. and 65 unless otherwise indicated. An unnormalized, In this work, we further develop an alternative inte- primitive SHG function is defined as gral scheme52–54,56 that employs contracted solid har- χ (α,r)=C (r)exp(−αr2) (1) monic Gaussians (SHGs). The latter are closely re- l,m l,m lated to SpHG functions and differ solely by a con- where the complex solid harmonics C (r), stant factor. The SHG integral scheme is based on l,m the application of the spherical tensor gradient operator (cid:114) 4π (STGO).60,61 The expressions resulting from Hobson’s C (r)= rlY (θ,φ), (2) l,m 2l+1 l,m theorem of differentiation62 contain an angular momen- tumtermthatisindependentontheexponentsandcon- are obtained by rescaling the spherical harmonics traction coefficients. This term is obtained by relatively Y (θ,φ). Contracted SpHG functions ϕ (r) are con- l,m l,m simple recursions. It can be pre-computed and re-used structedaslinearcombinationoftheprimitiveSHGfunc- multiple times for all functions in the basis set with the tions samelandmquantumnumber. Theintegralandderiva- (cid:88) tive evaluation requires the contraction of a set of aux- ϕ (r)=N c χ (α,r), (3) l,m l α l,m iliary integrals over s functions and their scalar deriva- α∈A tives. The same contracted quantity is re-used several where {c } are the contraction coefficients for the set of times for the evaluation of functions with the same set α exponentsA={α}andN isthenormalizationconstant of exponents and contraction coefficients, but different l given by66 angular dependency m. Unlike for Cartesian functions, subsequent transformation and contraction steps are not (cid:34) (cid:35)−1/2 required. N =K (cid:88) (cid:88) π1/2(2l+2)!cαcαˆ . (4) This work is based on Refs. 56 and 63, where the two- l l 22l+3(l+1)!(α+αˆ)l+3/2 α∈Aαˆ∈A index integral expressions for the overlap operator and general non-local operators are given. We extend the The factor SHG scheme to the local operator r2n and derive formu- (cid:114) a 2l+1 tlaasl ffoorrtthheesiunbtesgerqaulesn(tad|rea2rniv|ba)t.ioTnhoefltahtetetrhraeree-ifnudnedxamoveenr-- Kl = 4π (5) lap integral (aba˜). The performance of the SHG method isincludedinthenormalizationconstanttoconvertfrom is compared to the OS scheme. We also include inte- SHG to SpHG functions. grals with different non-local operators such as standard In the following, the absolute value of the m quantum Coulomb, modified Coulomb or Gaussian-type operators number is denoted by in our comparison. In the next section, the expressions for the integrals µ=|m|. (6) and their Cartesian derivatives are given followed by Furthermore, we use the notations, details on the implementation of the integral schemes. The performance of the SHG scheme is then discussed r =r−R , r =r−R , R2 =|R −R |2, (7) in terms of number of operations and empirical timings. a a b b ab a b The derivations of the expressions for (a|r2n|b) are given where R references the position of the Gaussian center a a in Appendix A and B. A and R the position of center B. The scalar derivative b of X(r2) with respect to r2 is denoted by (cid:18) ∂ (cid:19)k X(k)(r2)= X(r2). (8) ∂r2 II. INTEGRAL AND DERIVATIVE EVALUATION After introducing the relevant definitions and nota- B. Integrals (a|O|b) tions,wesummarizetheworkofGieseandYork56 inSec- tion IIB. The integral expressions of (a|r2n|b) and (aba˜) In this section, the expression to compute the two- a are then derived in Sections IIC and IID, respectively. center integral (a|O|b) is given which is defined as Subsequently, the formulas for the Cartesian derivatives (cid:90)(cid:90) aregiven(SectionIIE)aswellasthedetailsonthecom- (a|O|b)= ϕ (r −R )O(r −r ) putation of the angular-dependent term in the SHG in- la,ma 1 a 1 2 (9) tegral expressions (Section IIF). ×ϕ (r −R )dr dr . lb,mb 2 b 1 2 3 ϕ (r )andϕ (r )arecontractedSpHGfunctions where the prefactors are la,ma a lb,mb b asdefinedinEquation(3),whicharecenteredatR and a (cid:113) Rb,respectively. O(r)isanoperatorthatisexplicitlyin- A =(−1)m (2−δ )(l+m)!(l−m)! (17) l,m m,0 dependentonthepositionvectorsR orR . Suchopera- a b torsare,e.g.,thenon-localCoulomboperatorO(r)=1/r 1 or the local overlap O(r)=δ(r). Bj,κ = (2−δ )(j+κ)!(j−κ)!. (18) κ,0 The derivation for an efficient expression to compute (a|O|b)followsRef.56. ItisbasedonHobson’stheorem62 and n!! denotes the double factorial. The superscript on of differentiation, which states that O (R2 )inEquation(16)denotesthescalarderivative la,lb ab (cid:18) ∂ (cid:19)l with respect to Ra2b, see Equation (8) and (14), C (∇)f(r2)=2lC (r) f(r2), (10) l,m l,m ∂r2 O(k) (R2 )=N N (cid:88) (cid:88) cαcβ where the differential operator Cl,m(∇) is called STGO. la,lb ab la lbα∈Aβ∈B (2α)la(2β)lb The differential operator is obtained by replacing r in (19) the solid harmonic C (r) by ∇ = (∂/∂x,∂/∂y,∂/∂z). (cid:18) ∂ (cid:19)k l,m × (0 |O|0 ). The derivation of the (a|O|b) integrals starts by noting ∂R2 a b ab that exp(−αr2) is an eigenfunction of (∂/∂r2)l with the eigenvalue(−α)l. UsingEquation(10)andthedefinition Since s functions contain no angular dependency, ofprimitiveSHGsfromEquation(1), theprimitiveSHG (0a|O|0b) is a function of Rab (or equivalently, Ra2b), see at center R can be rewritten as Table S1 (SI). Therefore, the derivative in Equation (19) a C (∇ )exp(cid:0)−αr2(cid:1) is well-defined. The integral Ola,lb(Ra2b) can be inter- χ (α,r )= l,m a a , (11) preted as the monopole result of the expansion given in l,m a (2α)l Equation (16). where C (∇ ) acts on R . Inserting Equation (1) for TheexpressiongiveninEquation(16)dependsfurther l,m a a s functions, χ (α,r)=exp(cid:0)−αr2(cid:1), yields onQc/s,c/s (R ),whereµ=|m|. Positivemvalues 0,0 la,µa,lb,µb,j,κ ab refer to a cosine (c) component and negative m to a sine C (∇ )χ (α,r ) χ (α,r )= l,m a 0,0 a . (12) (s) component, i.e. l,m a (2α)l Inserting the STGO formulation of χ from Equa- Qcc (R ): m ,m ≥0 l,m a,b,j,κ ab a b tion (12) in Equation (9) gives Qcs (R ): m ≥0,m <0 a,b,j,κ ab a b (a|O|b)=Cla,ma(∇a)Clb,mb(∇b)Ola,lb(Ra2b). (13) Qsac,b,j,κ(Rab): ma <0,mb ≥0 (20) The contracted integral over s functions is denoted by Qss (R ): m ,m <0. a,b,j,κ ab a b O (R2 )=N N (cid:88) (cid:88) cαcβ (0 |O|0 ), la,lb ab la lb (2α)la(2β)lb a b Note that we used the abbreviation a,b for the indices α∈Aβ∈B (l ,µ ,l ,µ )inEquation(20). Detailsonthecalculation (14) a a b b where cα and cβ are the expansion coefficients of of Qca/,bs,,jc,/κs(Rab) can be found in Section IIF. ϕ (r ) and ϕ (r ), respectively, with correspond- la,ma a lb,mb b ingexponentsαandβ. Theintegral(0 |O|0 )overprim- a b itive s functions is given by C. Integrals (a|r2n|b) a (cid:90)(cid:90) (0 |O|0 )= χ (α,r −R )O(r −r ) a b 0,0 1 a 1 2 (15) The integrals (a|r2n|b), a ×χ (β,r −R )dr dr . 0,0 2 b 1 2 (cid:90) (a|r2n|b)= ϕ (r )r2nϕ (r )dr, (21) The analytic expressions of (0a|O|0b) for the overlap a la,ma a a lb,mb b and different non-local operators are given in Table S1, see Supplementary Information (SI). Application of the n ∈ N, are fundamental for the derivation of the over- product and differentiation rules for the STGO52–54,67 lap matrix elements (aba˜) with two Gaussians at center finally yields R , which are discussed in the next section. Since the a (a|O|b)=(−1)lbAla,maAlb,mb oapndera(1to6r)rca2annndoetpebnedasdoanptthedepboysirteipolnacRinag,E(0qu|Oat|i0on)sw(1it4h) a b ×min(cid:88)(la,lb)2la+lb−jOl(ala,l+blb−j)(Ra2b) (16) (in0gaS|ir(naa2cn|er|a20rnb2)|bn.)CdaeropenesdneedqrusiveoendntlyiRn, nt,heiwCs seexc(pt∇rioens)s.iiosnsacftoirngcoomnptuhte- j=0 a a l,m a product of χ (α,r ) and r2n, j l,m a a ×(2j−1)!!(cid:88)B Qc/s,c/s (R ), j,κ la,µa,lb,µb,j,κ ab χ (α,r )r2n =C (r )exp(cid:0)−αr2(cid:1)r2n. (22) κ=0 l,m a a l,m a a a 4 The expression of this product in terms of the STGO rule of differentiation to Equation (28) C (∇ ) is obtained using Hobson’s theorem, l,m a (0 |r2m|0 )(k) a a b n (cid:18) (cid:19) χl,m(α,ra)ra2n = Cl(,m2α(∇)la)(cid:88) nj ((ll+−j1−)!α1j)! =π3/2exp(−ρRa2b)min(cid:88)(m,k)(cid:18)k(cid:19)(−ρ)k−i j=0 2mcm+3/2 i (30) i=0 ×exp(cid:0)−αra2(cid:1)ra2(n−j). (cid:88)m (cid:18) ∂ (cid:19)i (23) × Iα,β,m(R2 ). ∂R2 j ab j=i ab The derivation of Equation (23) is given in Appendix A. Inserting Equations (12) and (23) in Equation (21) D. Overlap integrals (aba˜) yields (a|r2n|b)=C (∇ )C (∇ )T (R2 ), (24) The three-index overlap integral (aba˜) includes two a la,ma a lb,mb b la,lb ab functions at center R and is defined by a where T (R2 )=T(0) (R2 ) is again the monopole re- (cid:90) la,lb ab la,lb ab (aba˜)= ϕ (r )ϕ (r )ϕ (r )dr. (31) sultsfor theintegralgiveninEquation(25). Thederiva- la,ma a ˜la,m˜a a lb,mb b tion follows now the same procedure as for the integrals (a|O|b) and yields In traditional Cartesian Gaussian-based schemes, the product of the two Cartesian functions at center R is a (a|ra2n|b)=(−1)lbAla,maAlb,mb obtained by adding exponents and angular momenta of both Gaussians, respectively. The result is a new Carte- ×min(cid:88)(la,lb)2la+lb−jT(la+lb−j)(R2 ) tshiaennGasaufossriatnheattwRo-ai.ndTexheovinetrelagprailnetevgarlualasti(oanb)p.roInceethdes la,lb ab j=0 SHGschemeontheotherhand, theproductoftwoSHG j functions at the same center is obtained by a Clebsch- ×(2j−1)!!(cid:88)B Qc/s,c/s (R ), Gordan (CG) expansion of the spherical harmonics. In j,κ la,µa,lb,µb,j,κ ab the following, the expression of this expansion in terms κ=0 (25) of the STGO is derived and used to obtain the integral formula. where the scalar derivative of T (R2 ) with respect to Employing the definitions given in Equations (1) and R2 is la,lb ab (2), the product of two primitive SHG functions at Ra ab can be written as T(k) (R2 )=N N (cid:88) (cid:88) cαcβ la,lb ab la lb (2α)la(2β)lb χl,m(α,ra)χ˜l,m˜(α˜,ra) α∈Aβ∈B =C (r )exp(−αr2)C (r )exp(−α˜r2) (32) ×(cid:88)n (cid:18)n(cid:19)(la+j−1)!(0 |r2(n−j)|0 )(k). l,m a a ˜l,m˜ a a j (la−1)!αj a a b =λexp(cid:0)−α(cid:48)r2(cid:1)Y (θ,φ)Y (θ,φ)rl+˜l (33) j=0 a l,m ˜l,m˜ a (26) where α(cid:48) =α+α˜ and The integral over primitive s functions is 4π λ= . (34) (cid:113) (cid:90) (2l+1)(2˜l+1) (0 |r2m|0 )= χ (α,r )r2mχ (β,r )dr (27) a a b 0,0 a a 0,0 b π3/2exp(−ρR2 )(cid:88)m Theproductoftwosphericalharmonicscanbeexpanded = ab Iα,β,m(R2 ) (28) in terms of spherical harmonics, 2mcm+3/2 j ab j=0 Y (θ,φ)Y (θ,φ)= (cid:88)GL,l,˜l Y (θ,φ), (35) l,m ˜l,m˜ M,m,m˜ L,M with c=α+β and ρ=αβ/c and L,M (2m+1)!!(cid:0)m(cid:1)β2j where |l − ˜l| ≤ L ≤ l + ˜l. GL,l,˜l are the Gaunt Iα,β,m(R2 )=2j j R2j. (29) M,m,m˜ j ab (2j+1)!! cj ab coefficients68 which are proportional to a product of CG coefficients.69 The expansion given in Equation (35) is The proof of Equation (28) is similarly elaborate as for validsincethesphericalharmonicsformacompletesetof Equation (23) and is given in Appendix B. The deriva- orthonormalfunctions. Asimilarexpansionforsolidhar- tives of (0 |r2m|0 ) are obtained by applying the Leibniz monics C (r) is not possible because the latter are no a a b l,m 5 basis of L2(R3). Inserting the CG expansion into Equa- see Equation (28). The derivation proceeds as for the tion(33)[1],re-introducingsolidharmonics[2]asdefined (a|O|b) and (a|r2n|b) integrals yielding the final formula, a in Equation (2) and employing the definition given in Equation (1) [3] yields (aba˜)=(−1)lbAlb,mb (cid:88) GMLaa,,lma,˜ala,m˜aALa,Ma χ (α,r )χ (α˜,r ) La,Ma l,m [=1]λaexp˜l,(cid:0)m˜−α(cid:48)ra2a(cid:1)(cid:88)GML,l,m,˜l,m˜YL,M(θ,φ)ral+˜l (36) ×min(cid:88)(La,lb)2La+lb−jPL(La,al+a,l˜lba−,ljb)(Ra2b) j=0 L,M j [=2]λ(cid:88)GML,l,m,˜l,m˜KLCL,M(ra) ×(2j−1)!!(cid:88)Bj,κQcL/as,,|cM/sa|,lb,µb,j,κ(Rab), κ=0 L,M (43) ×exp(cid:0)−α(cid:48)r2(cid:1)rl+˜l−L (37) a a where the coefficients A and B are given in Equa- l,m j,κ [=3]λ(cid:88)GL,l,˜l K χ (α(cid:48),r )rl+˜l−L, (38) tions (17) and (18). See Section IIF for the expres- M,m,m˜ L L,M a a sions of Qc/s,c/s . The superscript (L −l −j) on L,M La,|Ma|,lb,µb,j,κ a b P indicates the derivative as defined in Equa- where K is defined in Equation (5). The L quantum La,la,˜la,lb L tion (8). numbers of the non-vanishing contributions in the CG The integral (aba˜) can be considered as a sum of expansion proceed in steps of two starting from L = min (a|r2n|b) integrals, introducing some modifications due |l−˜l| to L =l+˜l. Thus, l+˜l−L is even and we can a max to normalization and contraction. express χ (α(cid:48),r )rl+˜l−L in terms of the STGO using L,M a a Equation (23), E. Cartesian Derivatives χ (α,r )χ (α˜,r ) l,m a ˜l,m˜ a =λ(cid:88)GL,l,˜l K CL,M(∇a) Cartesianderivativesarerequiredforevaluatingforces M,m,m˜ L (2α(cid:48))L and stress in molecular simulations. The Cartesian L,M (39) derivatives of the integrals (a|O|b), (a|r2n|b) and (aba˜) ×(cid:88)p (cid:18)p(cid:19) (L+j−1)! exp(cid:0)−α(cid:48)r2(cid:1)r2(p−j) are obtained by applying the product ruale to the Ra2b- j (L−1)!(α(cid:48))j a a dependentcontractedquantities[Equations(19),(26)and j=0 (41)] and the matrix elements of Qc/s,c/s (R ) . la,µa,lb,µb,j,κ ab with p=(l+˜l−L)/2. The derivative of (a|O|b) [Equation (16)] with respect to The derivation of the integral expression for (aba˜) is Ra is analogous to the (a|O|b) integrals. Inserting the STGO ∂ formulations given in Equation (11) and Equation (39) (a|O|b) ∂R into Equation (31) yields a,i =2(R −R ) a,i b,i (aba˜)=L(cid:88)a,MaGLMaa,,lma,˜ala,m˜aCLa,Ma(∇a) (40) ×min(cid:88)(la,lb)Ol(ala,l+blb−j+1)(Ra2b)Q(cid:101)cla/,sµ,ca/,lsb,µb,j(Rab) ×C (∇ )P (R2 ) j=0 lb,mb b La,la,˜la,lb ab with +min(cid:88)(la,lb)O(la+lb−j)(R2 )∂Q(cid:101)cla/,sµ,ca/,lsb,µb,j(Rab) P (R2 ) j=0 la,lb ab ∂Ra,i La,la,˜la,lb ab (44) =λK N N N (cid:88) (cid:88) (cid:88) cαcβcα˜ La la lb ˜la (2α(cid:48))La(2β)lb with i = x,y,z and where we have introduced the nota- α∈Aβ∈Bα˜∈A˜ tion p (cid:18) (cid:19) ×(cid:88) pj ((LLaa−+1j)!−(α1(cid:48)))!j(0a(cid:48)|ra2(p−j)|0b), Q(cid:101)cla/,sµ,ca/,lsb,µb,j(Rab) j=0 (41) =(−1)lbAla,µaAlb,µb2la+lb−j(2j−1)!! (45) j where the dependence of P on R2 originates ×(cid:88)B Qc/s,c/s (R ). La,la,˜la,lb ab j,κ la,µa,lb,µb,j,κ ab from the integrals over primitive s functions, κ=0 (cid:90) The derivatives of (a|r2n|b) are obtained from Equa- (0 |r2m|0 )= χ (α(cid:48),r )r2mχ (β,r )dr, (42) a a(cid:48) a b 0,0 a a 0,0 b tion (44) by substituting O (R2 ) by T (R2 ). For la,lb ab la,lb ab 6 (aba˜), we replace O (R2 ) by P (R2 ) consid- The cosine and sine parts can be constructed by the fol- ering additionally thlae,lCb GaebxpansioLna.,laT,˜lah,elbderaibvatives of lowing recursion relations64,65 Q(cid:101)c/s,c/s are constructed from (l−1) terms, which is exlpa,lµaain,lbe,dµbi,nj detail in Section IIF. R0c0 =1, R0s0 =0 (54) xRc −yRs Rc =− ll ll (55) l+1,l+1 2l+2 F. Computation of Qc/s,c/s and its derivatives yRc +xRs a,b,j,κ Rs =− ll ll (56) l+1,l+1 2l+2 Qca/,bs,,jc,/κs, introduced in Equation (20), are elements of (2l+1)zRc/s −r2Rc/s the2×2matrixQ ,whichiscomputedfromthereal Rc/s = l,m l−1,m, 0≤m<l (57) a,b,j,κ l+1,m (l+m+1)(l−m+1) translation matrix W 64,65 l,m,j,κ (cid:18)Qcc Qcs (cid:19) where r = (x,y,z). The usage of c/s in the last recur- Q (R )= a,b,j,κ a,b,j,κ renceformulaindicatesthattherelationisusedforboth, a,b,j,κ ab Qsc Qss a,b,j,κ a,b,j,κ (Rab) Rlc,m(r) and Rls,m(r). The recursions are only valid for positive m. However, the regular scaled solid harmon- =W (−R )WT (−R ). la,µa,j,κ ab lb,µb,j,κ ab ics are also defined for negative indices and satisfy the (46) following symmetry relations Note that we abbreviate the indices (l ,µ ,l ,µ ) with Rc =(−1)mRc , Rs =−(−1)mRs . (58) a a b b l,−m l,m l,−m l,m (a,b)inQ asinEquation(20). Therealtranslation a,b,j,κ matrix is a 2×2 matrix with the elements Notethat thesesymmetry relationshave tobeemployed for the evaluation of Rc/s since µ−κ can be also (cid:18)Wcc Wcs (cid:19) l−j,µ−κ W (R )= l,m,j,κ l,m,j,κ . (47) negative. Furthermore,onlyelementswithl−j ≥|µ±κ| l,m,j,κ ab Wsc Wss l,m,j,κ l,m,j,κ (Rab) give non-zero contributions. The elements of the transformation matrix W l,m,j,κ The expressions for W are given by65, are also defined for negative m. The matrix elements of l,m,j,κ W obey the same symmetry relations with respect l,m,j,κ Wcc (R )= (cid:18)1(cid:19)δκ0(cid:2)Rc (−R ) to sign changes of m, l,m,j,κ ab 2 l−j,m−κ ab Wcc/cs =(−1)mWcc/cs +(−1)κRlc−j,m+κ(−Rab)(cid:3) (48) l,m,j,κ l,m,j,κ (59) Wsc/ss =−(−1)mWsc/ss Wcs (R )= −Rs (−R ) l,m,j,κ l,m,j,κ l,m,j,κ ab l−j,m−κ ab +(−1)κRs (−R ) (49) where we have used the notation m = −m. These l−j,m+κ ab symmetry relations are used for the derivatives of Wls,mc ,j,κ(Rab)= (cid:18)21(cid:19)δκ0(cid:2)Rls−j,m−κ(−Rab) QclTa/,sµh,cae/,lsbd,µerbjiv,κa.tives of Qc/s,c/s and equivalently of +(−1)κRls−j,m+κ(−Rab)(cid:3) (50) Q(cid:101)c/s,c/s from Equaltai,oµna,lb(,4µ5b,)j,κare obtained by em- la,µa,lb,µb,j Wss (R )=Rc (−R ) ploying the differentiation rules70 of the solid harmonics l,m,j,κ ab l−j,m−κ ab C (r). ThederivativesofC (r)arealinearcombina- −(−1)κRc (−R ). (51) l,m l,m l−j,m+κ ab tionof(l−1)solidharmonics. Therefore,thegradientsof Q(cid:101)c/s,c/s are also linear combinations of lower order Here, we introduced the regular scaled solid harmonics la,µa,lb,µb,j terms, R (r) which are defined as l,m R (r)= 1 C (r), (52) ∂Q(cid:101)cla/,sµ,ca/,lsb,µb,j = Ala,µa Q(cid:101)c/s,c/s l,m (cid:112)(l−m)!(l+m)! l,m ∂Ra,x Ala−1,µa+1 la−1,µa+1,lb,µb,j A where the definition of the complex solid harmonics − la,µa Q(cid:101)c/s,c/s Cl,m(r) from Equation (2) has been employed. The reg- Ala−1,µa−1 la−1,µa−1,lb,µb,j (60) ular scaled solid harmonics are also complex and can be A dpeacrotmasposed into a real (cosine) and an imaginary (sine) −Alb−l1b,,µµbb+1Q(cid:101)cla/,sµ,ca/,lsb−1,µb+1,j A Rl,m(r)=Rlc,m(r)+iRls,m(r). (53) +Alb−l1b,,µµbb−1Q(cid:101)cla/,sµ,ca/,lsb−1,µb−1,j, 7 ∂Q(cid:101)cla∂/,sRµ,caa/,,lsyb,µb,j = (±1)maAlaA−la1,,µµaa+1Q(cid:101)sla/−c,1c,/µsa+1,lb,µb,j ForCeaaclhcualtaotmeciconktirnadc:tionmatrix: Cla,α=Nlacα/(2α)la lmax=MAX(la,max,lb,max) +(±1)maAlaA−la1,,µµaa−1Q(cid:101)sla/−c,1c,/µsa−1,lb,µb,j FFoorrTaallallb00u≤≤latllea≤/Rbllcm≤,malx(aR:/ba,bm)aaxn:dRls,m(Rab) −(±1)mbAlbA−l1b,,µµbb+1Q(cid:101)cla/,sµ,sa/,lcb−1,µb+1,j CIfadlceurilvaatteivQeeclsa/r,sµe,cqa/u,slbir,eµdb,:j(Rab) A Calculate ∂R∂a,i Qcla/,sµ,ca/,slb,µb,j(Rab), i=x,y,z −(±1)mbAlb−l1b,,µµbb−1Q(cid:101)cla/,sµ,sa/,lcb−1,µb−1,j, Fornalmlasxet=sal/a,bm:axset+elb,maxset (61) Ifderivativesrequired: nmax=nmax+1 Forallexponentsinseta/b: ∂Q(cid:101)cla/,sµ,ca/lbs,µb,j = 2 Ala,µa Q(cid:101)c/s,c/s FCoarlcaulllasthee(l0lsai|nO|s0ebt)(ak/)b,: 0≤k≤nmax ∂Ra,z Ala−1,µa la−1,µa,lb,µb,j (62) Contract: Ol(ak,)lb(Ra2b)= α βCla,αClb,β(0a|O|0b)(k) A Forallshellsinseta/b: P P −2Albl−b,1µ,bµbQ(cid:101)cla/,sµ,ca/,lsb−1,µb. ForCalall−culala/tbe≤(am|Oa|/bb)≤=la/jb:Ol(ala,l+blb−j)(Ra2b)Qcla/,sµ,ca/,slb,µb,j(Rab) where (±1) = 1 if m ≥ 0 and (±1) = −1 if m < 0. IfderivativesrequirePd: m m Calculate ∂ (a b), i=x,y,ze Note that the cosine part of the y derivatives are con- ∂Ra,i |O| structed from the sine part and vice versa. Furthermore, the terms in Equations (60)-(62) with l −1 < 0 are FIG. 1. Pseudocode for the calculation of the (a|O|b) inte- a/b zero. A special case has to be considered for the x,y grals for an atom pair using a basis set with several sets of derivatives, when µ = 0. The matrix elements of the Gaussianfunctionsasinput. Allfunctionsthatbelongtoone type Q(cid:101)c/s,c/s and Q(cid:101)c/s,c/s are required for set share the same Gaussian exponents. Each set consists of la−1,−1,lb,µb,j la,µa,lb−1,−1,j shells characterized by the l quantum number and a set of the construction of the x and y derivatives if µ = 0, a/b contraction coefficients. see Equations (60) and (61). These matrix elements are nevercalculatedsinceµispositivebydefinition,butthey can be obtained using the symmetry relations given in Equation (59). For example if µa = 0, the following re- matrix elements Q(cid:101)cla/,sµ,ca/,lsb,µb,j and their Cartesian deriva- lations are used for the x-derivative tives do not depend on the exponents, they are com- puted only once for all l = 0,...,l , where l is the max max Q(cid:101)cc/cs =(−1)Q(cid:101)cc/cs (63) maximal l quantum number of the basis set. The ma- la−1,−1,lb,µb,j la−1,1,lb,µb,j trix elements Q(cid:101)c/s,c/s are used multiple times for and for the y derivative we employ the symmetry rela- all functions witlha,µtah,elb,sµabm,je l and m quantum number. tions: The integral and scalar derivatives (0 |O|0 )(k) are then a b calculated for each set of exponents and subsequently Q(cid:101)slac−/s1s,−1,lb,µb,j =Q(cid:101)slac−/s1s,1,lb,µb,j. (64) contracted in one step using matrix-matrix multiplica- tions. Thesamecontractedmonopoleanditsderivatives O(k) (R2 )areusedforallthosefunctionswiththesame III. IMPLEMENTATION DETAILS la,lb ab set of exponents and contraction coefficients, but differ- ent angular dependency m. Integrals of the type (a|O|b) have been implemented The only difference for the implementation of the for the overlap δ(r), Coulomb 1/r, long-range Coulomb (a|r2n|b) and (aba˜) integrals is the evaluation of the con- erf(ωr)/r, short-range Coulomb erfc(ωr)/r, Gaussian- a tracted monopole and its scalar derivatives. For the damped Coulomb exp(−ωr2)/r operator and the Gaus- three-index overlap integrals (aba˜) we have additionally sian operator exp(−ωr2), where r = |r −r |. The pro- 1 2 toconsidertheCGexpansion. Theexpansioncoefficients cedure for calculating these integrals differs only by the are independent on the position of the Gaussians and evaluation of the s-type integrals (0 |O|0 ) and their a b are precalculated only once for all (aba˜) integrals. The derivatives with respect to R2 . The expressions for the k-th derivatives (0a|O|0b)(ka)bhave been derived from Gaunt coefficients GLM,l,m,˜l,m˜ are obtained by multiplying Ref. 47 and are explicitly given in Table S1, see SI. Equation(35)byYL,M(θ,φ)andintegratingoverthean- The pseudocode for the implementation of the SHG gular coordinates φ and θ of the spherical polar system. integrals is shown in Figure 1. Our implementation is The allowed values for L range in steps of 2 from |l−˜l| optimized for the typical structure of a Gaussian basis to l+˜l. Note that not all terms with −L ≤ M ≤ L in set, where Gaussian functions that share the same prim- Equation (35) give non-zero contributions. For l,˜l ≤ 2, itive exponents are organized in so-called sets. Since the the product of two spherical harmonics is expanded in 8 For example, the specification (TESTBAS-L1, K=7) in- TABLE I. Specifications for the basis sets used for the per- dicates that we have five contracted p functions at both formance tests. Number of s,p,d,f,g,h and i functions and centers,whereeachcontractedfunctionisalinearcombi- their contraction length K. nation of seven primitive Gaussians. Furthermore, tim- basis set name functions K ings have been measured for basis sets of the MOLOPT TESTBAS-L0 5s 1,...,7 type74 that are widely used for DFT calculations with TESTBAS-L1 5p 1,...,7 CP2K, see SI for details. The MOLOPT basis sets con- TESTBAS-L2 5d 1,...,7 tain highly contracted functions with shared exponents, TESTBAS-L3 5f 1,...,7 i.e. they are so-called family basis sets. A full contrac- TESTBAS-L4 5g 1,...,7 tion over all primitive functions is used for all l quan- TESTBAS-L5 5h 1,...,7 tum numbers. For the (aba˜) integrals, we use for the H-DZVP-MOLOPT-GTH 2s1p 7 O-DZVP-MOLOPT-GTH 2s2p1d 7 second function at center Ra, ϕ˜la,m˜a, the corresponding O-TZV2PX-MOLOPT-GTH 3s3p2d1f 7 LRI-MOLOPT basis sets, see Table I. The latter is an aux- Cu-DZVP-MOLOPT-SR-GTH 2s2p2d1f 6 iliary basis set and contains uncontracted functions, as H-LRI-MOLOPT-GTH 10s9p8d6f 1 typically used for RI approaches. O-LRI-MOLOPT-GTH 15s13p12d11f9g 1 Cu-LRI-MOLOPT-SR-GTH 15s13p12d11f10g9h8i 1 V. RESULTS AND DISCUSSION no more than four terms. However, the number of terms ThissectioncomparestheefficiencyoftheSHGscheme increases with l+˜l. A detailed discussion of the proper- in terms of mathematical operations and empirical tim- tiesoftheGauntcoefficientscanbefoundinRef.71and ings to the widely used OS method. tabulated values for low-order expansions of real-valued spherical harmonics are given in Ref. 63. ToassesstheperformanceoftheSHGintegrals,anop- A. Comparison of the algorithms timized OS scheme42 has been implemented. In the OS scheme, we first compute the Cartesian primitive inte- Employing the OS scheme for the evaluation of SpHG grals recursively. Subsequently, the Cartesian integrals integrals, the most expensive step is typically the re- are contracted and transformed to SpHGs. An efficient cursive computation of the primitive Cartesian Gaussian sequenceofverticalandhorizontalrecursivestepsisused integrals. The recurrence procedure is increasingly de- to enhance the performance of the recurrence procedure. manding in terms of computational cost for large an- For the integrals (a|b), (a|ra2n|b) and (aba˜), the recursion gular momenta. The recursion depth is even increased canbeperformedseparatelyforeachCartesiandirection, when the gradients of the integrals are required, since whichdrasticallyreducesthecomputationalcostforhigh the derivatives of Cartesian Gaussian functions are con- angularmomenta. Thecontractionandtransformationis structed from higher-order angular terms (l + 1). In performedinonestepusingefficientmatrix-matrixoper- caseoftheTESTBAS-L5basisset,thecomputationalcost ations. The three-index overlap integrals (aba˜) are com- for evaluating both, the Coulomb integral (a|1/r|b) and puted as described in Section IID by combining the two its derivatives, is three times larger than for calculat- Cartesian Gaussian functions at center Ra into a new ing solely the integral. The integral matrix of primi- Cartesian function at Ra. tive Cartesian integrals (and their derivatives) has to be transformedtoprimitiveSpHGintegrals, whicharethen contracted. The contribution of the contraction step to IV. COMPUTATIONAL DETAILS the total computational cost is small for integrals with non-local operators. However, the OS recursion takes a The OS and SHG integral scheme have been imple- significantly smaller amount of time for local operators, mented in the CP2K2,72 program suite and are available when efficiently implemented, see Section III. Thus, the as separate packages. The measurements of the tim- contraction of the primitive SpHG integrals contributes ings have been performed on an Intel Xeon (Haswell) by up to 50% to the total timings for the integrals (ab), platform73 using the Gfortran Version 4.9.2 compiler (aba˜) and (a|r2n|b). The contraction step can be even a with highest possible optimization. Matrix-matrix mul- dominantwhenderivativesoftheseintegralsarerequired tiplications are efficiently computed using Intel(cid:13)R MKL sinceithastobeperformedforeachspatialdirection,i.e. LAPACK Version 11.2.1. we have to contract the x,y and z Cartesian derivatives Empiricaltimingshavebeenmeasuredfortheintegrals of the primitive integral matrix separately. Details on (a|O|b), (a|r2n|b) and (aba˜) using the basis sets specified thecontributionofthedifferentstepstotheoverallcom- a in Table I. The basis sets at centers R and R are cho- putational cost are displayed in Figures S1-S4 (a,b), SI. a b sen to be identical. The measurements have been per- The SHG method requires only recursive operations formed for a series of test basis sets with angular mo- for the evaluation of Rc/s [Equations (54)-(57)], which l,m menta L = 0,...,5 and contraction lengths K = 1,...,7. do not depend on the Gaussian exponents and can be 9 TABLE II. Number of matrix elements that need to be 900 (a) Integrals 10 contracted for two-index integrals comparing the OS and 600 (ab) SHG method for integral (Int.) and integral+derivative actor 300 00 1 2 (a|1/r|b) (inInTt.a+bDleeIv..) evaluation. Thebasissetspecificationsaregiven eed-up f 90 00 (b) 60 I+n tDeegrriavlast ives ((aa||eerrff(cω(ωr)r/)r/|rb|b)) Integral method H-DZVP O-DZVP Sp 600 (a|exp(-ωr2)|b) 0 Int. Int.+Dev. Int. Int.+Dev. 300 0 1 2 (a|exp(-ωr2)/r|b) OS 784 3136 3969 15876 0 SHG 147 196 245 294 0 1 2 3 4 5 L tabulated for all functions of the basis set. Furthermore, 100 (c) 20 Integrals (a|r2|b) a a deeper recursion is not required for the derivatives of the integrals because they are constructed from linear 50 00 1 2 (a|r4a|b) ctSiopomHnsbGin(,6aw0t)ieo-(nc6so2n)ot.fralcIontwsaetner-aodarudoxefirlicaaornnygturinalatcertginrtgaerlmeoasfc,hssefpuernimEctqiitouinvaes- p factor 10 00 (d) 20 I+n tDeegrriavlastives (((aaa|||rrr6a8a1|| 0bb |))b) u a and its scalar derivatives. The number of scalar deriva- d- 50 0 e 0 1 2 tives is linearly increasing with l. If the gradients are pe S 0 required, the increase in computational cost for the con- (e) 15 traction is marginal. We have to contract only one ad- 200 (ab~a) ditional scalar derivative of the auxiliary integral. As 100 0 0 1 2 showninTableII,thenumberofmatrixelements,which have to be contracted for the MOLOPT basis sets, is 1-2 0 0 1 2 3 4 5 orders of magnitude smaller for the SHG scheme. Note L thatthenumbersofSHGmatrixelementsrefertoourim- plementation, where actually more scalar derivatives of FIG.2. Speed-upfordifferenttwo-centerintegralsdependent (0 |O|0 ) and (0 |r2n|0 ) are contracted than necessary, onthelquantumnumberatthefixedcontractionlengthK = a b a a b 7. Thespeed-upfactorisdefinedastheratioOS/SHG.Speed- in order to enable library-supported matrix multiplica- up for (a,b) integrals (a|O|b), (c,d) (a|r2n|b) and (e) (aba˜). tions. a The solid line in (e) is the speed-up for the integrals and the Forbothmethods,wehavetocalculatethesamenum- dashed line the speed-up for both, integrals + derivatives. ber of fundamental integrals (0 |O|0 ) and their scalar a b derivativeswithrespecttoR2 (SHG)and−ρR2 (OS),47 ab ab where ρ = αβ/(α+β). The time for evaluating these gained by the SHG method is presented for the basis auxiliary integrals is approximately the same for both sets TESTBAS LX for a fixed contraction length. Gen- methods. In the SHG scheme, the evaluation of the lat- erally, the ratio of the timings OS/SHG increases with ter constitutes the major contribution to the total tim- increasing l. For the (a|O|b) integrals, we observe speed- ings for highly contracted basis sets with different sets ups between 40 and 400 for l = 5. For s functions, our of exponents. The remaining operations are orders of method can become up to a factor of two faster. The magnitudes faster than those in the OS scheme. Details smallest speed-up is obtained for the overlap integrals are given in Figures S1-S4 (c,d). The recursive proce- since the OS recursion can be spatially separated. The dure to obtain regular scaled solid harmonics is negligi- speed-up for the other operators depends on the compu- ble in terms of computational cost. The evaluation of tationalcostfortheevaluationoftheprimitiveGaussian Q(cid:101)cla/,sµ,ca/,lsb,µb,j [Equation (45)] from the pretabulated Rlc,/ms integrals (0a|O|0b). The SHG method outperforms the contributes increasingly for large angular momenta. The OS scheme by up to a factor of 1000 (l = 5) when also integrals (a|O|b) are finally constructed from the con- the derivatives of (a|O|b) are computed. tracted quantity O(k) [Equation (19)] and Q(cid:101)c/s,c/s The computational cost for calculating (a|r2n|b) inte- la,lb la,µa,lb,µb,j a as displayed Figure 1. This step becomes increasingly grals of h functions is up to two orders of magnitude expensive for large l quantum numbers and is in fact reduced compared to the OS scheme. The speed-up in- dominant for family basis sets, where the fundamental creases with n. The SHG method is beneficial for all integrals are calculated only for one set of exponents. l > 0 and also for l = 0 when n ≥ 3. The speed-up fac- tor is generally slightly larger when also the derivatives are required. However, the performance increase is not B. Speed-up with respect to the OS method as pronounced as for the derivatives of (a|O|b) which is again due to the efficient spatial separation of the OS Figure 2 displays the performance of the SHG scheme recurrence. as function of the l quantum number. The speed-up Theperformanceimprovementfor(aba˜)iscomparable 10 TABLE III. Speed-up for different two-center integrals. The speed-up is defined as the ratio of the timings OS/SHG. The basis set specifications are given in Table I. H-DZVP O-DZVP O-TZV2PX Cu-DZVP Integral type Int. Int.+Dev. Int. Int.+Dev. Int. Int.+Dev. Int. Int.+Dev. (ab) 2.4 2.4 6.2 5.5 11.4 10.3 8.9 8.3 (a|1/r|b) 1.8 6.2 5.9 18.4 16.8 31.6 14.6 26.0 (a|erf(ωr)/r|b) 1.7 6.0 5.8 18.4 16.6 31.7 14.4 26.0 (a|erfc(ωr)/r|b) 1.7 5.4 5.2 16.3 14.9 29.5 12.9 24.8 (a|exp(−ωr2)|b) 1.8 6.7 6.4 19.7 18.0 32.5 16.0 27.4 (a|exp(−ωr2)/r|b) 1.6 5.0 4.4 14.1 12.3 25.4 10.8 22.0 (a|r2|b) 2.6 2.7 9.7 8.8 22.9 18.6 19.7 15.8 a (a|r4|b) 4.0 4.0 16.0 14.0 39.4 29.3 34.7 25.2 a (a|r6|b) 6.6 6.3 25.3 21.6 59.5 44.3 56.1 38.9 a (a|r8|b) 9.1 8.1 34.7 29.6 79.3 61.4 73.4 54.6 a (a|r10|b) 11.8 10.5 44.7 36.7 105.2 79.9 97.5 72.2 a (aba˜) 7.0 7.6 10.1 8.7 7.5 7.2 7.2 10.5 15 (a) Integrals angular momenta. For instance, the derivatives of the h functions require the recursion up to l +l +1=11. 10 (ab) a a˜ ctor 5 (a|1/r|b) Figure 3 shows the performance of the SHG scheme up fa 0 (a|erf(ωr)/r|b) ainscrfueanscetsionwiothf tKhe fcoorntarlalctiniotneglreanlgttyhpKes.. TAhesastpuereadt-iuopn Speed- 60 (b) I+n tDeegrriavlast ives ((aa||eerxfpc((-ωωrr)2/r)||bb)) tisheob(sae|rOv|ebd) ainroteugnrdalsK, f=or6e,x7amfoprle(a(|ara2|1n/|br)|ba).ndTshoemereao-f 30 (a|exp(-ωr2)/r|b) son isthat the computation ofthe fundamental integrals 0 (0a|1/r|0b)(k) increasingly contributes with K to the to- 1 2 3 4 5 6 7 tal computational cost in the SHG scheme, whereas its Contraction length K relative contribution to the total time is approximately constantintheOSscheme,seeFigureS3(SI).ForK =7 (c) Integrals and l = 2, the evaluation of (0 |1/r|0 )(k) is with 70% 20 (ab~a) the predominant step in the SHGa schemb e. Since the ab- ctor 10 (a|r2a|b) solute time for calculating the fundamental integrals is d-up fa 0 (d) Integrals ((aa||rr4a6||bb)) tohffe. sTahmeesiantubroatthiosncheffemecetsi,stlheessinpcrroenaosuenincesdp,efeodr-euxpamlevpelels, Spee 20 + Derivatives (a|ra8a|b) fisorcotmhepuotvaetriloanpa(llayb)lesbseceaxupseenstihvee tehvaalnuafotrio(n0o|f1/(r0|a00b))((kk)). a b 10 (a|r1 0 |b) Its relative contribution to the total time in the SHG a 0 scheme is with 50% significantly smaller, see Figure S3 1 2 3 4 5 6 7 (d) for K = 7. However, the saturation for large K Contraction length K is hardly of practical relevance because the contraction lengthsofGaussianbasissetsistypicallynotlargerthan FIG.3. Speed-upfordifferenttwo-centerintegralsdependent K =7. onthecontractionlengthK. Thel quantumnumberisfixed and set to l =2. The speed-up factor is defined as the ratio The speed-up for separate operations in the integral OS/SHG.Speed-upfor(a,b)integrals(a|O|b),(c,d)(a|r2n|b) evaluationcanonlybeassessedforstepssuchasthecon- a and (aba˜). traction, which have an equivalent in the OS scheme. The SHG contraction is increasingly beneficial for large l quantum numbers, large contraction lengths and when to the (a|r2n|b) integrals. For the derivatives of (aba˜) also derivatives are computed, see Figure S5 (SI). a on the other hand, we get a significantly larger speed-up TableIIIpresentstheperformanceoftheSHGmethod duetothefactthatitincreasesmorethanlinearlywithl for the MOLOPT basis sets. We find that the SHG scheme andthattheOSrecurrencehastobeperformedforlarger is superior to the OS method for all two-center integrals