ebook img

Linear Scaling Solution of the Time-Dependent Self-Consistent-Field Equations PDF

0.18 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Linear Scaling Solution of the Time-Dependent Self-Consistent-Field Equations

LA-UR 09-06104 Linear Scaling Solution of the Time-Dependent Self-Consistent-Field Equations Matt Challacombe∗ Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545 A new approach to solving the Time-Dependent Self-Consistent-Field equations is developed based on the double quotient formulation of Tsiper [J. Phys. B, 34 L401 (2001)]. Dual channel, quasi-independent non-linear optimization of these quotients is found to yield convergence rates approaching those of the best case (single channel) Tamm-Dancoff approximation. This formulation is variational with respect to matrix truncation, admitting linear scaling solution of the matrix-eigenvalue problem, which isdemonstratedforbulkexcitonsinthe polyphenylenevinyleneoligomerandthe (4,3) carbon nanotube segment. 0 1 0 The Time-Dependent Self-Consistent-Field equations and developed a corresponding dual channel Lanczos 2 together with models that include some portion of the solver. Subspace solvers in this dual representationhave n Hartree-Fock(HF)exchangeadmitcontrolovertherange recently been surveyed by Tretiak, Isborne, Niklasson a ofself-interactionintheopticalresponse[1,7,9,16],and and Challacombe (TINC) [19], with comparative results J are related to new models of electron correlation based for semi-empirical models. 9 on the Random Phase Approximation (RPA) [5, 6, 8, 1 Anotherchallengeisdimensionalityandscaling. Writ- 18]. Solving the TD-SCF equations is challenging due ing Eq. (1) in the general form L·~v = ω~v, admitting ] to anunconventionalJ-symmetricstructureof the naive arbitrary representation, the superoperator matrix L is h p molecular orbital (MO) representation, a ∼N2×N2 tetradic, with N the number of basis func- - tions, assumed proportional to system size. In prac- t n tice the action of L onto ~v is carried out implicitly as a A B X~ =ω X~ , (1) L[v] = [F, v]+[G[v], P], using an existing framework u (cid:18)−B −A(cid:19)(cid:18)Y~(cid:19) (cid:18)Y~(cid:19) for construction of the effective Hamiltonian (Fockian) q [ F, where P is the one-particle reduced density matrix, where A and B are Hermitian blocks corresponding to G is a screening operator involving Coulomb, exchange 2 4th order tensors spanning transitions between occupied and/or exchange-correlation terms and the correspon- v and virtual sub-spaces, ω is the real excitation energy dence between superoperator and functional notation is 6 8 and~v = XY~~ is the corresponding transition density. By givenbyatensorialmappingbetweendiadicandmatrix, 5 construct(cid:0)ion(cid:1), the MO representation allows strict sepa- ~vN2×1⇔vN×N. 2 ration between the dyadic particle-hole (ph) and hole- Recent efforts have focused on addressing the prob- 1. particle (hp) solutions, X~ and Y~, for which specialized lem of dimensionality by employing linear scaling meth- 0 algorithms exist. Nevertheless, convergence of the naive ods that reduce the cost of L[·] within Density Func- 0 J-symmetric problem is typically much slower than the 1 tional Theory (DFT) to O(N). However, this remains corresponding Hermitian Tamm-Dancoff approximation : an open problem for the Hartree-Fock (HF) exchange, v (TDA),AX~ =ωX~,whichisofreduceddimensionalityin an ingredient in models that account for charge transfer i X the MO representation. in the dynamic and static response, including the Ran- r Several TD-SCF eigensolvers are based on the oscil- dom Phase Approximation (RPA) at the pure HF level a 0 K lator picture p~ = ω p~ , with K = A+B and of theory. Likewise, scaling of the TD-SCF eigenprob- (cid:18)T 0(cid:19) q~ q~ lem remains formidable due to associated costs of lin- T = A−BtheHermi(cid:0)tia(cid:1)npote(cid:0)nt(cid:1)ialandkineticmatrices, ear algebra, even when using powerful Krylov subspace and the dual {p~,~q} = X~ −Y~, X~ +Y~ corresponding methods. Underscoring this challenge, one of the most n o to position and momentum. This picture avoids the im- successful approaches to linear scaling TD-DFT avoids balance X~ ≫ Y~ whilstadmittingconventionalsolu- the matrix eigenproblem entirely through explicit time- tionsbas(cid:13)(cid:13)ed(cid:13)(cid:13)onth(cid:13)(cid:13)eH(cid:13)(cid:13)ermitianmatrixG=K·T,asshown evolution [21, 22]. by Tama(cid:13)ra(cid:13)and(cid:13)Ud(cid:13)agawa [17] and extended by Narita Linear scaling matrix methods exploit quantum local- and Shibuya with second order optimization of the quo- ity,manifestinapproximateexponentialdecayofmatrix tientω2[p~,~q]=~q·G·p~/|p~·~q| [10]. Morerecently,Tsiper elements expressed in a well posed, local basis; with the considered the quotients dropping of small elements below a threshold, τmtx, this decay leads to sparse matrices and O(N) complexity at the forfeit of full precision [3, 4, 11]. Likewise, linear p~·K·~p q·T·~q scaling methods for computing the HF exchange employ ω[~p,~q]= + , (2) 2 |p~·~q| 2 |p~·~q| an advanced form of direct SCF, exploiting this decay 2 in the rigorous screening of small exchange interactions 2 bellow the two-electron integral threshold τ2e [13]. The RPA [RQI] TDA [RQI] consequence of these linear scaling approximations is an r 0 RPA [QUIRQI] o inexact linear algebra that challenges Krylovsolversdue r r E to nested error accumulation, a subject of recent formal e -2 interest [14, 15]. Consistent with this view, TINC found v that matrix perturbation (a truncation proxy) disrupts elati-4 convergenceofKrylovsolverswithslowconvergence,i.e. R 0 Lanczos and Arnoldi for the RPA, but has less impact 1 g-6 on solvers with rapid convergence, i.e. generically for o L the TDA or Davidson for the RPA. Relative to semi- empiricalHamiltonians,theimpactofincompletenesson -8 subspace iteration may be amplified with first principles 0 500 1000 1500 models and large basis sets (ill-conditioning). Conjugate Gradient Iterations An alternative is Rayleigh Quotient Iteration (RQI), Figure 1: Convergence of RHF/3-21G TDA and RPA with which poses the eigenproblem as non-linear optimiza- theRQIandQUIRQIalgorithms for lineardecaene(C H ). 10 2 tion and is variational with respect to matrix pertur- Calculations were started from the same random guess, and bation. Narita and Shibuya [10] considered optimiza- tightnumericalthresholdswere usedthroughout. Intherep- tion of the quotient ω2[p~,~q] with second order methods, resentation independent scheme, the cost per iteration is the but these are beyond the capabilities of current linear same for TDA and RPA. scaling technologies and also, convergence is disadvan- taged by a power of 1/2. For semi-empirical Hamiltoni- This framework provides the freedom to work in any or- ans,TINCfoundthatoptimizationoftheThoulessfunc- tional ω[~v] = ~v · L ·~v/|~v·~v|, corresponding to the so- thogonal representation, and to switch between transi- tion density and oscillator duals with minimal cost. lution of Eq. (1), was significantly slower for the RPA QUIRQI is given in Algorithm 1. It begins with a relative to the TDA, and also compared to subspace guess for the transition density, which is then split into solvers. For first principles models and non-trivial basis itsdual(lines2-3). Thechoiceofinitialguessisdiscussed sets, this naive RQI can become pathologically slow as later. Lines 4-24 consist of the non-linear conjugate gra- shown in Fig. 1. On the other hand, the Tsiper formula- dient optimization of the nearly independent channels: tion exposes the underlying pseudo-Hermitian structure In eachstep, the flow of information proceeds from opti- of the TD-SCF equations. Here, this structure is ex- mizationof the duals to builds involvingthe density and ploited with QUasi-Independent Rayleigh Quotient Iter- back to the duals in a merge-annihilate-truncate-build- ation (QUIRQI), involving dual channel optimization of split-truncate (MATBST) sequence. For the variables v, the Tsiper quotients coupled only weakly through line pandqthissequenceiscomprisedbylines22-23and5-7, search. andlines15-19forthecorrespondingconjugategradients Ourdevelopmentbeginswithabriefreviewoftherep- h , h and h . Truncation is carried out with the filter v p q resentationindependentformulationdevelopedbyTINC, operation as described in Ref. [11], with cost and error which avoids the O(N3) cost of rotating into an ex- determined by the matrix threshold τ . mtx plicit p-h, h-p symmetry. Instead, this symmetry is The Tsiper functional is the sum of dual quotients ω p maintained implicitly via annihilation, x ← fa(x) = P andωq,determinedatline8,followedbythegradientsgp ·x·Q + Q·x·P, with P the first order reduced den- and g computed at line 9. After the first cycle, the cor- sity matrix and Q = I −P its compliment. Likewise, responqding relative error e (10) and maximum matrix rel the indefinite metric associated with the J-symmetry of elementofthegradientg (11)arecomputedandused max Eq. (1) is carried through the generalized norm hx,yi= as an exit criterion at line 4, along with non-variational tr{xT·[y,P]}. Introducing the operator equivalents, behavior ω >ωold. L[p] ⇔ K.p~ and L[q] ⇔ T.~q , the Tsiper functional Next,thePolak-Ribierevariantofnon-linearconjugate becomes ω[p,q] = hp,L[p]i + hq,L[q]i. Transformations gradients yields the search direction in each channel, h 2|hp,qi| 2|hp,qi| p between the transition density and the dual space in- and h (12-14). The action of L[·] on to h and h is q p q volves simple manipulations and minimal cost, allowing then computed, again with a MATBST sequence (15- Fock builds with the transitiondensity andoptimization 19),followedbyaself-consistentdualchannellinesearch in the dual space. The splitting operation is given by at line 20, as described below. With steps λ and λ in p q p = f (v) = P ·v ·Q+[Q·v·P]T and q = f (v) = hand,minimizingupdatesaretakenalongeachconjugate + − P ·v·Q−[Q·v·P]T. Likewise, L[p] = f (L[v]) and direction (22), and the cycle repeats with the MATBST − L[q]=f (L[v]). Thebacktransformation(merge)from sequence spanning lines 21-23 and 5-7. + dual to density is v = F(p,q) = p+q+[p−q]T /2. Optimization of the Tsipper functional ω[λ ,λ ] ≡ p q (cid:0) (cid:1) 3 Algorithm 1 QUIRQI controlled by the two-electron screening threshold τ 2e 1: procedure QUIRQI(ω,v) [13]. N-scalingsolutionoftheQUIRQImatrixequations 2: guess v is achieved with the sparse approximate matrix-matrix 3: p=f+(v), q=f−(v) multiply (SpAMM), with cost and accuracy determined 4: whileerel >ǫ and gmax>γ and ω<ωold do bythedroptoleranceτ [3,4,11]. Allcalculationswere 5: L[v]=[F, v]+[G[v], P] mtx carriedout with version4.3 of the gcc/gfortrancompiler 6: L[p]=f (L[v]), L[q]=f (L[v]) 7: filter(L−[p], L[q], τ ) + under version8.04 of the Ubuntu Linux distributionand mtx 8: ωp = h2p|h,pL,[qpi]|i, ωq = h2q|h,Lp,[qqi]|i, ω=ωp+ωq runFoornsyasftuelmlyslsotauddeided2tGoHdzatAeM, QDUQIRuQadI iOspfoteurnodnt8o35co0n.- 9: gp =qωq−L[p], gq =pωp−L[q] verge monotonically with rates comparable to the TDA 10: e = ωold−ω /ω rel (cid:0) (cid:1) as shown in Fig. 1. Based on the comparative perfor- 11: g =max g , g max i,j n(cid:2) p(cid:3)ij (cid:2) p(cid:3)ijo mance presentedby TINC, the TDA rateof convergence 12: βp = hghpg,opgldp,−gopglopdlidi,βq = hghqg,oqgldq,−goqgloqdlidi atopptehaerscotonvbeergaelnocweerrabtoe,unpderffoorrmRPanAcesoilsvesrtsr.onInglaydddiettieorn- 13: ωold←ω, gold ←g , gold ←g p p q q mined by the initial guess. The following results have 14: hp ←gp+βphp, hq ←gq+βqhq been obtained using the polarization response density 15: hv =F(hp,hq),hv ←fa(hv) 16: filter(hp,hq,hv, τmtx) along the polymer axis [19], which can be computed in 17: L[hv]=[F, hv]+[G[hv], P] O(N) by Perturbed Projection[20]. Also, a relativepre- 18: L[hp]=f−(L[hv]), L[hq]=f+(L[hv]) cisionof4digitsintheexcitationenergyistargetedwith 19: filter(L[hp], L[hq], τmtx) theconvergenceparametersǫ=10−4andγ =10−3,with 20: {λp,λq}=argminω[p+λphp,q+λqhq] exitfromtheoptimizationlooponviolationofmonotonic {λp,λq} convergence (ω > ωold due to precision limitations asso- 21: p←p+λphp, q←q+λqhq ciated with linear scaling approximations). 22: v←F(p,q), v←fa(v) 23: filter(p,q,v, τ ) In Fig. 2, linear scaling and convergence to the bulk mtx 24: endwhile limit are demonstrated for a series of polyphenylene 25: endprocedure vinylene (PPV) oligomers at the RHF/6-31G** level of theory for the threshold combinations {τ ,τ } = mtx 2e 10−4,10−5 and 10−5,10−6 . Significantly more con- ω[p+λ h ,q+λ h ] involves a two dimensional line- p p q q (cid:8)servative th(cid:9)reshold(cid:8)s have bee(cid:9)n used for the Coulomb search (line 20) corresponding to minimization of sums, which incur only minor cost. Convergence A +λ B +λ2C +A +λ B +λ2C is reached in 24 − 25 iterations, with the cost of p p p p p q q q q q ω[λp,λq]= , CoulombsummationviaQCTCcomparabletothecostof R +λ S +λ T +λ λ U pq p pq q pq p q pq SpAMM(τ =10−4). In Fig. 3, linear scaling and con- (3) mtx vergence to the bulk limit are demonstrated for a series with coupling entering through terms in the denomina- of (4,3) carbon nanotube segments at the RHF/3-21G tor such as U = hh ,h i. A minimum in Eq. (3) can pq p q leveloftheoryforthesamethresholdcombinations,again be found quickly to high precision by alternately substi- withconvergenceachievedinabout24-25cycles. Inboth tuting one-dimensionalsolutions one into the other until cases, tightening the pair {τ ,τ } leads to a system- self-consistency is reached. This semi-analytic approach mtx 2e aticallyimprovedresult. Whilethe 10−4,10−5 thresh- starts with a rough guess at the pair {λ ,λ } (eg. found p q olds that work well for PPV lead to(cid:8)a non-mono(cid:9)tone be- byacoarsescan)followedbyiterativesubstitution,where havior with respect to extent for the nanotube series, for example the p-channel update is dropping one more decade to 10−5,10−6 leads to a sharply improvedbehavior. Dro(cid:8)pping thresh(cid:9)olds further λ ← (2C R +2C λ S )2−4(C T +C λ U ) to 10−6,10−7 yieldsidenticalresultstowithinthecon- p p pq p q pq p pq p q pq nh ver(cid:8)gence criter(cid:9)ia (∼ four digits) across the series, also B R +B λ S − A +A +B λ +C λ2 T p pq p q pq q p q q q q pq scaling with N but at roughly twice the cost. (cid:2) (cid:0) 1/2(cid:1) These results demonstrate that QUIRQI can achieve − A λ +A λ +B λ2+C λ3 U (4) q q q q q q q q pq both systematic error control and linear scaling in solu- (cid:0) (cid:1) (cid:3)i tionofthe RPA eigenproblemforsystemswithextended −2C R −2C λ S [2C (T +λ U )] , p pq p q pq p pq q pq o(cid:14) conjugation. RelativetoPPV,thegreaternumericalsen- with an analogous update for the q-channel obtained by sitivityencounteredwiththenanotubeseriesisconsistent swappingsubscripts. Asthe solutiondecouples(S ,T withthegroundstateproblem,whereasmallerbandgap pq pq andU becomesmall)thestepsarefoundindependently. andgreateratomicconnectivitytypicallydemandtighter pq QUIRQI has been implemented in FreeON [2], which thresholds. employsthelinearscalingCoulombandHartree-Fockex- QUIRQI exploits decoupling of the Tsipper functional change kernels QCTC and ONX with cost and accuracy into nearly independent, pseudo-Hermitian quotients 4 leading to aggressiveconvergencerates equivalent to the 2.8 160 frτSitsuee2creslhnlpywτote2htacHeotr.vtueatzrIgrotmhiianc,mittaeiidanqoautunnrbeaaiexTlltitowsDtyyriArsut[ithn1,ge3comwra]re.ohatsiutipoliTseecncahert(lerelτsyrtmmeooitramxpti)bhnrp.ooeirnpuoHsgenvcordervtwdesiaeeersbnbviayieapntrsirgt,eoeidgQnspheaaUontlrnetIawnRmtoiiQnhtpehgeI-- on Energy (eV) 222...67755 Total Time (hours) 11124680240000000 SpAOMNMX((τmτ2tex==11QDDC--T65C)) portunities for more precise error control as suggested ti 2.6 0 ta 0 500 1000 1500 by Rubensson, Rudberg and Salek [12]. Further, these i Atoms c propertiesareexpectedtoholdevenforthemostgeneral Ex 2.55 A SCF models, with the only difference being an increas- RP 2.5 ipnrgelfyacltoocralwizietdh tarnanisnictrioenasidnegnsDitFyTmcaotrmixpoannedntla.rgFeirnaclolsyt, 2.45 ((ττmmttxx==11DD--45,,ττ22ee==11DD--56)) 4 8 12 16 20 24 a variational approach allows considerable flexibility in nm the path to solution, as errors due to approximationcan be overcome by optimization, offering opportunities for Figure3: Approachtothebulklimitofthefirstexcitedstate ofthe(4,3)carbonnanotubesegmentatthe3-21G/RPAlevel single precisionGPU acceleration,variablethresholding, of theory, with inset showing linear scaling cost for HF ex- incremental Fock builds as well as extrapolation tech- change(ONX),sparselinearalgebra(SpAMM)andCoulomb niques. sums (QCTC). 3.45 90 ONX(τ2e=1D-5) ) 3.4 rs) 7800 SSppAAOMMNMMX(((ττmmτ2ttexx===111DDD---645))) Kgle.rN, eCm.eJt.h,TAym. cMza.kN, .anNdikVla.ssWone,beAr., O20d0e9l,l, EF.reSecOhNwe(- (eV 3.35 hou 60 Sep-18-2009 ), Los Alamos National Laboratory (LA- gy e ( 50 CC-04-086, Copyright University of California)., URL ner 3.3 Tim 40 http://www.nongnu.org/freeon/. n E al 30 [3] Challacombe, M., 1999, J. Chem. Phys.110, 2332. tio 3.25 Tot 20 [4] Challacombe, M., 2000, Comp. Phys.Comm. 128, 93. ta 10 [5] Furche, F., 2008, J. Chem. Phys.129(11), 114105. i A Exc 3.2 0 0 500 Atoms 1000 [6] H05a6r4l,01J.., and G. Kresse, 2009, Phys. Rev. Lett. 103(5), P R 3.15 ((ττmmttxx==11DD--45,,ττ22ee==11DD--56)) [7] I2g0u0m7,eJn.shCchheemv,. PKh.yIs..,1S2.7T(1re1t)i,a1k1,4a9n0d2.V. Y. Chernyak, 3.1 [8] Lu, D., Y. Li, D. Rocca, and G. Galli, 2009, Phys. Rev. 0 10 20 30 40 50 60 Lett. 102(20), 206411. nm [9] Magyar, R. J., and S. Tretiak, 2007, J. Chem. Theor. Comp. 3, 976. Figure2: ApproachtothebulklimitofthePPVfirstexcited [10] Narita,S.,andT.Shibuya,1992,Can.J.Chem.70,296. stateatthe6-31G**/RPA leveloftheory,withinsetshowing [11] Niklasson, A. M. N., C. J. Tymczak, and M. Challa- linear scaling cost for HF exchange (ONX) and sparse lin- combe, 2003, J. Chem. Phys.118, 8611. earalgebra(SpAMM).ThecostofCoulombsumswithmuch [12] Rubensson, E. H., E. Rudberg, and P. Salek, 2008, J. tighter thresholds are comparable to those for theSpAMM. Math. Phys.49(3), 032103. [13] Schwegler, E., M. Challacombe, and M. Head-Gordon, This work was supported by the U.S. Department of 1997, J. Chem. Phys. 106, 9708. Energy and Los Alamos LDRD funds. Los Alamos Na- [14] Simoncini, V., 2005, SIAMJ. Num.Anal. 43(3), 1155 . tional Laboratory was operated by the Los Alamos Na- [15] Simoncini, V., and D. B. Szyld, 2007, Num. Lin. Alg. tional Security, LLC, for the National Nuclear Security App.14(1), 1 . Administration of the U.S. Department of Energy under [16] Song,J.-W.,M.A.Watson,andK.Hirao,2009,J.Chem. Phys. 131(14), 144108. Contract No. DE-AC52-06NA25396. Special acknowl- [17] Tamura, T., and T. Udagawa, 1964, Nuc. Phys.53, 33. edgmentsgotheInternationalTenBarCaféforscientific [18] Toulouse,J.,I.C.Gerber,G.Jansen,A.Savin,andJ.G. vlibations, and to Sergei Tretiak and Nicolas Bock for Ángyán,2009, Phys.Rev.Lett. 102(9), 096404. valuable input. [19] Tretiak, S., C. M. Isborn, A. M. N. Niklasson, and M. Challacombe, 2009, J. Chem. Phys. 130(5), 054111. [20] Weber, V., A. M. N. Niklasson, and M. Challacombe, 2004, Phys.Rev.Lett. 92, 193002. [21] Yam, C. Y., S. Yokojima, and G. Chen, 2003, J. Chem. ∗ Electronic address: [email protected] Phys. 119(17), 8794. [1] Autschbach,J., 2009, Chem. Phys. Chem. 10, 1757. [22] Yokojima, S., and G. H. Chen, 1999, Chem. Phys. Lett. [2] Bock, N., M. Challacombe, C. K. Gan, G. Henkelman, 300(5-6), 540.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.