Robust Mixing for Ab-Initio Quantum Mechanical Calculations

L. D. Marks¹ and D. R. Luke²,*

¹ Department of Materials Science and Engineering, Northwestern University, Evanston, IL 60201.
² Department of Mathematical Sciences, University of Delaware, Newark, DE 19716-2553, USA.

(Dated: version 0.98, January 10, 2008)
arXiv:0801.3098v1 [cond-mat.mtrl-sci] 20 Jan 2008

We study the general problem of mixing for ab-initio quantum-mechanical problems. Guided by general mathematical principles and the underlying physics, we propose a multisecant form of Broyden's second method for solving the self-consistent field equations of Kohn-Sham density functional theory. The algorithm is robust, requires relatively little fine-tuning, and appears to outperform the current state of the art, converging for cases that defeat many other methods. We compare our technique to the conventional methods for problems ranging from simple to nearly pathological.

PACS numbers: 71.15.-m, 02.70.-c, 31.15.-p, 31.15.ec

I. INTRODUCTION

We consider the problem of determining the electron density $\rho$ that satisfies the self-consistent field equations according to Kohn-Sham density functional theory [1,2]:

$$(H_0 + V_\rho)\phi_i = \epsilon_i \phi_i \tag{I.1a}$$

$$\rho(x) = \sum_i \left(1 + e^{\beta(\epsilon_i - \mu)}\right)^{-1} |\phi_i(x)|^2. \tag{I.1b}$$

Here $H_0$ is the single-particle noninteracting Hamiltonian and $V_\rho$ is an effective potential parameterized by the particle density $\rho$. The constant $\beta$ is $1/kT$, where $k$ is Boltzmann's constant and $T$ is temperature. The term $(1 + e^{\beta(\epsilon_i - \mu)})^{-1}$ is the Fermi-Dirac occupation, and the constant $\mu$ is determined by $\int \rho(x)\,dx = N$ for an $N$-body problem. Following [3], we let $H_\rho := H_0 + V_\rho$ denote the Kohn-Sham Hamiltonian and reformulate the above system of equations as a nonlinear fixed point problem: find $\rho$ such that

$$F(\rho)(x) := \left(1 + e^{\beta(H_\rho - \mu)}\right)^{-1}(x,x) = \rho(x) \tag{I.2}$$

where $\mu$ is the unique solution to $N = \operatorname{trace}\!\left(\left(1 + e^{\beta(H_\rho - \mu)}\right)^{-1}\right)$. We refer to the operator $F$ above as the self-consistent field (SCF) operator. We will not be concerned with the details of the SCF operator or its approximations, since these tend to be specific to the application. Also, we will work with the discretized version of the SCF operator, which we will call the SCF mapping since it is a real vector-valued mapping of the discretized density. Throughout this work, however, we will point to instances where the form of this mapping can cause problems for numerical procedures for solving Eq. (I.2).

Numerical algorithms for solving Eq. (I.2) abound; the representative examples we focus on here are [4,5,6,7,8,9,10,11]. These are iterative procedures, and the process of determining the desired density $\rho$ from previous estimates has come to be known as "mixing" in the physics literature. For ab-initio methods there is frequently a user-provided mixing term which, if improperly chosen, will lead to divergence of the iterations. In many cases the user has to learn by failure what the correct value is, expending a fair amount of computer resources in the process. We will show that many of the methods found in the physics literature have counterparts in the mathematical literature, thus providing a more systematic approach to the choice of algorithm parameters.
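To make the role of the user-provided mixing term concrete, here is a minimal sketch of the simplest such scheme, plain linear mixing of the input and output densities. The callable `scf_map` (standing in for the discretized SCF mapping $F$), the parameter names, and the defaults are illustrative assumptions, not part of the paper.

```python
import numpy as np

def linear_mixing(scf_map, rho0, alpha=0.1, tol=1e-8, max_iter=200):
    """Plain linear mixing for the fixed point F(rho) = rho.

    scf_map -- callable implementing the discretized SCF mapping F
    alpha   -- the user-chosen mixing parameter discussed above
    """
    rho = np.asarray(rho0, dtype=float).copy()
    for n in range(max_iter):
        residual = scf_map(rho) - rho        # F(rho_n) - rho_n
        if np.linalg.norm(residual) < tol:   # self-consistency reached
            return rho, n
        rho = rho + alpha * residual         # mix in a fraction of the output
    return rho, max_iter
```

Too large an `alpha` produces the divergence described above; too small an `alpha` wastes SCF evaluations. Removing this trial-and-error choice is precisely the motivation for the methods that follow.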
The goal of this work is the development of a method that does not require expert user input, is fast, and can handle many of the more complicated and poorly convergent problems, such as metallic surfaces or heterostructures, that can defeat a novice and sometimes an expert.

In the next section we provide detailed background on both the mathematical literature on methods for solving nonlinear equations and the physical literature on mixing. There are two major "branches" of numerical techniques, distinguished by the space in which they are derived. However, we show that all of the different methods are generated by solving an optimization problem of the same form. In Section III we detail our proposed algorithm for solving Eq. (I.2). We show in Section III that most algorithms are predicated upon fixed point mappings with a great deal of regularity, in particular monotone mappings. Motivated by the prospect that most systems of physical interest do not lead to convex SCF mappings, the principal insight that yields our improvements is to treat the prior steps as random samples in a higher-dimensional space rather than deterministic steps in a path to a fixed point. In effect, the Kohn-Sham Hamiltonian evaluated at an electron density far from equilibrium can vary in a semirandom and perhaps chaotic fashion. This viewpoint leads to a natural interpretation of the updates in terms of predicted and unpredicted components. Controls on the algorithm are exerted through parameter choices that primarily affect the unpredicted step. The rest of the section involves technical considerations for controlling step sizes and safeguarding against instabilities. We introduce the idea of using proximal mappings [12] to account for problems associated with near-linear dependencies and, at the same time, to safeguard against uncertainties in the model, analogous to a classic Wiener filter. This regularization has the advantage that it also acts similarly to a classic trust region technique in numerical optimization. In Section IV we present numerical results for both very easy problems and semi-pathological cases. The new approach outperforms existing algorithms in most cases, and does significantly better with poorly constructed Kohn-Sham mappings. The algorithm is also relatively insensitive to user input. We conclude with a discussion of some of the open issues.

II. ITERATIVE METHODS FOR SOLVING NONLINEAR EQUATIONS

We wish to determine the electron density $\rho_*$ with fixed atom locations. The density is a real-valued vector with $k$ elements. With an estimated density $\rho_n$ at the $n$-th step of an iterative procedure for determining $\rho_*$, we check whether our estimate satisfies the ab-initio self-consistent field (SCF) equations given by Eq. (I.2). Evaluation of the SCF mapping returns a modified density $\rho_n' := F(\rho_n)$, another real-valued vector with $k$ elements. The density we seek is a fixed point of $F$; i.e., we solve the system of nonlinear equations

$$F(\rho_*) - \rho_* = 0. \tag{II.1}$$

This suggests the usual Newton algorithm as a possible numerical solution strategy.

A. Newton's Method

Expressing the mapping in Eq. (II.1) by its Taylor series expansion centered on a fixed point $\rho_*$ yields

$$F(\rho) - \rho = (J(\rho_*) - I)(\rho - \rho_*) + O(|\rho - \rho_*|^2) \tag{II.2}$$

where $J(\rho_*)$ is the Jacobian of $F$ (supposing this is well-defined) at $\rho_*$ and $I$ is the identity mapping. The Jacobian $J$ is a $k$-by-$k$ real-valued matrix. Given a point $\rho_n$, Newton's method generates the next point $\rho_{n+1}$ by

$$\rho_{n+1} = \rho_n - (J(\rho_n) - I)^{-1}(F(\rho_n) - \rho_n). \tag{II.3}$$

Under standard assumptions, this iteration can be shown to converge quadratically in a neighborhood of a local solution [13]. The computational cost of calculating the Jacobian and inverting it is, however, prohibitive for high-dimensional problems such as density functional calculations.
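As a point of reference, the following sketch implements the Newton iteration Eq. (II.3) with a forward-difference Jacobian; `scf_map` and the difference step `h` are illustrative assumptions. It makes the cost argument explicit: each iteration needs $k$ extra evaluations of $F$ to build $J$, plus an $O(k^3)$ linear solve, which is hopeless when $k$ is the size of a discretized density.

```python
import numpy as np

def newton_fixed_point(scf_map, rho0, tol=1e-10, max_iter=50, h=1e-6):
    """Newton's method, Eq. (II.3), on g(rho) = F(rho) - rho."""
    rho = np.asarray(rho0, dtype=float).copy()
    k = rho.size
    for n in range(max_iter):
        g = scf_map(rho) - rho                 # residual g(rho_n)
        if np.linalg.norm(g) < tol:
            return rho, n
        f0 = g + rho                           # F(rho_n), reused below
        # forward-difference Jacobian of F: k extra SCF evaluations
        J = np.empty((k, k))
        for j in range(k):
            e = np.zeros(k); e[j] = h
            J[:, j] = (scf_map(rho + e) - f0) / h
        # rho_{n+1} = rho_n - (J - I)^{-1} g(rho_n), an O(k^3) solve
        rho = rho - np.linalg.solve(J - np.eye(k), g)
    return rho, max_iter
```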
B. Matrix Secant Methods

A classical approach to avoid computing and inverting the Jacobian is via solutions to the matrix secant equation, $(J(\rho_n) - I) \approx B_n$, where

$$B_n(\rho_n - \rho_{n-1}) = (F(\rho_n) - \rho_n) - (F(\rho_{n-1}) - \rho_{n-1}). \tag{II.4}$$

Introducing new variables, a conventional reduction, this is compactly represented as

$$B_n s_{n-1} = y_{n-1} \tag{II.5}$$

or

$$H_n y_{n-1} = s_{n-1} \tag{II.6}$$

where $H_n = B_n^{-1}$ and

$$s_{n-1} = \rho_n - \rho_{n-1} \quad\text{and}\quad y_{n-1} = (F(\rho_n) - \rho_n) - (F(\rho_{n-1}) - \rho_{n-1}). \tag{II.7}$$

The next density $\rho_{n+1}$ is then generated either by the recursion

$$\rho_{n+1} = \rho_n - B_n^{-1}(F(\rho_n) - \rho_n) \tag{II.8}$$

where $B_n$ satisfies Eq. (II.5), or by

$$\rho_{n+1} = \rho_n - H_n(F(\rho_n) - \rho_n) \tag{II.9}$$

where $H_n$ satisfies Eq. (II.6). Sequences generated by Eq. (II.8) or Eq. (II.9) are known as quasi-Newton methods. The unknowns in Eq. (II.5) and Eq. (II.6) are the matrices $B_n$ and $H_n$; hence these are systems of $k$ linear equations in $k^2$ unknowns. There are infinitely many solutions to the matrix secant equation, and each different solution leads to a different numerical method for finding the fixed point of $F$. A common approach in the literature on density functional calculations, due to Srivastava [9], is based on the Broyden rank one matrix updates [14]. Other common updates in the optimization literature are the symmetric rank one (SR1) and the BFGS updates [15]. Our focus in this study is on improvements in the context of Broyden updates; however, the basic principles outlined here extend more generally to other matrix secant methods.

C. Rank One Updates

A new matrix $B_{n+1}$ is obtained by updating $B_n$ in some fashion using the new data pair $(s_n, y_n)$ combined with the prior information $(s_0,y_0), (s_1,y_1), \dots, (s_{n-1},y_{n-1})$, subject to the constraint that $B_{n+1}$ satisfy Eq. (II.5). In his original paper, Broyden [14] looked at two approaches. The first, normally referred to as Broyden's "good" method (GB), finds the nearest matrix to $B_n$ with respect to the Frobenius norm (i.e. the square root of the sum of squares of the matrix entries) that solves Eq. (II.5) (this property, due to Dennis and Moré [16], is slightly different from Broyden's original interpretation). This is given explicitly by

$$B_{n+1} = B_n + \frac{(y_n - \beta_n B_n s_n)s_n^T}{\beta_n \|s_n\|^2}. \tag{II.10}$$

Here and throughout this work the norm $\|s\| = \sqrt{s^T s}$ is the Euclidean norm, and a vector (understood to be a column vector) or matrix raised to the power $T$ indicates the transpose. The scalar $\beta_n > 0$ is a step size parameter which has been retained for formal consistency, although it is normally taken to be unity. This method has a simple recursion for its inverse (needed in Eq. (II.8)):

$$B_{n+1}^{-1} = B_n^{-1} + \frac{\left(\beta_n s_n - B_n^{-1} y_n\right)s_n^T B_n^{-1}}{s_n^T B_n^{-1} y_n}. \tag{II.11}$$

Broyden's first method is shown in [16] (Theorem 5.2) to converge locally superlinearly under the standard assumptions that the Jacobian exists, is nonsingular, and is Lipschitz continuous at the solution.

The second method proposed by Broyden has the attractive feature that it is a recursion on the inverse Jacobian $H_n$, and is given by

$$H_{n+1} = H_n + \frac{(\beta_n s_n - H_n y_n)y_n^T}{\|y_n\|^2}. \tag{II.12}$$

Note that our sign convention differs from the usual sign in the literature, where $H_{n+1} = -B_{n+1}^{-1}$.
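In code, the two rank-one updates are essentially one-liners. A minimal sketch (array shapes assumed conformable, with the step size parameter $\beta$ as in the text):

```python
import numpy as np

def gb_update(B, s, y, beta=1.0):
    """Broyden's "good" update of B_n, Eq. (II.10)."""
    return B + np.outer(y - beta * (B @ s), s) / (beta * (s @ s))

def bb_update(H, s, y, beta=1.0):
    """Broyden's "bad" update of the inverse Jacobian H_n, Eq. (II.12),
    in the sign convention used in this paper."""
    return H + np.outer(beta * s - H @ y, y) / (y @ y)
```

In a quasi-Newton loop one would form $s_{n-1}$ and $y_{n-1}$ as in Eq. (II.7), update $H$, and step via Eq. (II.9), $\rho_{n+1} = \rho_n - H_n(F(\rho_n) - \rho_n)$.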
Update Eq. (II.12) was not recommended by Broyden and subsequently became known as Broyden's "bad" method (BB), primarily, in our opinion, because the test problems to which the methods were applied favored Eq. (II.10). Analogously to GB, the BB update finds the nearest matrix to $H_n$ with respect to the Frobenius norm that solves Eq. (II.6).

Generalizing the two methods suggested by Broyden, one could consider updates of the form

$$B_{n+1} = B_n + \frac{(y_n - B_n s_n)v_n^T}{v_n^T s_n}. \tag{II.13}$$

Barnes [17] and Gay and Schnabel [18] proposed an update of this form where $v_n$ is the projection of $s_n$ onto the orthogonal complement of $[s_{n-m}, s_{n-m+1}, \dots, s_{n-1}]$ ($0 \le m < n$). This strategy, known as projected updates, will be discussed in more detail in the next subsection.

A variation of Eq. (II.12) for the inverse recursion $H_n$ was proposed by Martinez and Zambaldi [19] that takes the form

$$H_{n+1} = H_n + \frac{(s_n - H_n y_n)e_{n,j}^T}{e_{n,j}^T y_n} \tag{II.14}$$

where $e_{n,j}$ is the $j$-th column of the identity matrix, chosen so that $|e_{n,j}^T y_n| = \|y_n\|_\infty$, where $\|y_n\|_\infty$ is the component of $y_n$ with largest absolute value. The matrix update then differs from the previous update only in the $j$-th column. Numerical experience with this update is very favorable [20,21,22].

Recall that the GB and BB updates are the matrices nearest to the previous matrices, with respect to the Frobenius norm, that satisfy the matrix secant equation. The main difference between the GB and BB methods is the space in which the "nearest" criterion is applied [23]. For GB the criterion is applied in the domain of the mapping, while for BB it is applied in the range, where the domain of the mapping is the space of the density differences $s_n$ and the range is the space of the residual differences $y_n$. We see from Eq. (II.8) that an ill-conditioned matrix update $B_n$ will lead to a large and numerically unstable estimate of the step $s_n$ since, to compute this, one needs to invert $B_n$. On the other hand, from Eq. (II.9) it is clear that a least change criterion in the space of the residual differences $y_n$ will lead to smaller steps, which could slow progress for well-conditioned problems. In other words, the BB algorithm is inherently more conservative than the GB algorithm. The advantage of this property is not apparent until one is faced with ill-conditioned or ill-posed problems; see Section III for a discussion in the context of the physics of a DFT problem. In this respect, the distinction between Broyden's first and second methods is analogous to the distinction between backward (implicit) and forward techniques for numerical solutions of stiff differential equations. We formulate these statements more precisely below.

The recursions Eq. (II.10)-(II.13) are all one-step recursions in terms of the most recent Jacobian estimate. Most practical implementations make use of multi-step recursions on the inverse Jacobian that allow one to avoid explicitly forming the matrix. In particular, the recursion for Eq. (II.11) with $\beta_j = 1$ ($j = 0,1,2,\dots$) can be written as [24] (Theorem 6.2)

$$B_{n+1}^{-1} = B_0^{-1} - \left(B_0^{-1}Y_n - S_n\right)\left(L_n + S_n^T B_0^{-1} Y_n\right)^{-1} S_n^T B_0^{-1} \tag{II.15}$$

where

$$S_n := [s_0, s_1, s_2, \dots, s_n], \quad Y_n := [y_0, y_1, y_2, \dots, y_n] \quad (k\text{-by-}(n+1)\text{ matrices}) \tag{II.16}$$

and

$$(L_n)_{i,j} := \begin{cases} -s_{i-1}^T s_{j-1} & \text{if } i > j \\ 0 & \text{else.} \end{cases}$$
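Eq. (II.15) is attractive precisely because the $k$-by-$k$ matrix never has to be formed: only the thin factors $S_n$, $Y_n$ and a small $(n+1)$-by-$(n+1)$ solve are needed to apply $B_{n+1}^{-1}$ to a residual, as in the step Eq. (II.8). A sketch under the assumption of a scalar initialization $B_0 = \sigma I$ (a common choice, and exactly the initialization question discussed below):

```python
import numpy as np

def apply_gb_inverse(g, S, Y, sigma=1.0):
    """Apply B_{n+1}^{-1} of Eq. (II.15) to a vector g, with B_0 = sigma*I.

    S, Y -- k-by-(n+1) arrays whose columns are s_j and y_j, Eq. (II.16).
    """
    StS = S.T @ S
    L = -np.tril(StS, -1)                 # (L_n)_{ij} = -s_{i-1}^T s_{j-1}, i > j
    StB0invY = (S.T @ Y) / sigma          # S_n^T B_0^{-1} Y_n
    w = np.linalg.solve(L + StB0invY, S.T @ g / sigma)
    return g / sigma - (Y / sigma - S) @ w
```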
For the multi-step recursion of $H_{n+1}$, an elementary calculation yields the following recursion for Eq. (II.12):

$$H_{n+1} = H_0 \prod_{j=0}^{n} W_j - \sum_{j=0}^{n} Z_j \prod_{i=j+1}^{n} W_i \tag{II.17}$$

where the products ascend from left to right, with the empty product defined as 1, and

$$W_j := I - \frac{y_j y_j^T}{\|y_j\|^2} \quad\text{and}\quad Z_j := \beta_j \frac{s_j y_j^T}{\|y_j\|^2}.$$

We prefer this recursion over the recursion proposed by Srivastava [9] because, as we shall see in the following sections, we gain valuable insight into the geometry of the operations for the same storage requirements. Srivastava's formulation was implemented for the LAPW code by [11], although extensive numerical experience in the Wien2k code indicates that $H_0$ needs to be adjusted dynamically, for reasons that will become clearer later. A sketch of how Eq. (II.17) can be applied without forming any $k$-by-$k$ matrices is given at the end of this subsection.

In the recursions Eq. (II.15) and Eq. (II.17) the initial matrix, $B_0$ and $H_0$ respectively, is crucial. Several authors have studied optimal initializations [23,25,26,27,28,29]. We explore scalings in greater detail in Subsection III C. A few papers have also explored strategies for combining the updates Eq. (II.15) and Eq. (II.17) in order to take advantage of the strengths of each. Ip and Todd [23] derive an update that is a convex combination of GB and BB, with the weighting determined explicitly so as to yield an optimally conditioned matrix. Their method is locally superlinearly convergent ([23], Corollary 11) under standard assumptions and the additional conditions that the initial approximate Jacobian $B_0$ is close to the true Jacobian at the solution and that the sequence of matrix updates and their inverses is uniformly bounded. Martinez [30] switches between GB and BB based on a simple criterion that estimates which method will give the best performance at a given step. Numerical experiments showed this to be a promising approach, but rates of convergence were not pursued.
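The promised sketch applies Eq. (II.17) to a vector by accumulating the suffix products $W_j W_{j+1}\cdots W_n\,g$ from right to left, so that, as with Eq. (II.15), no $k$-by-$k$ matrix is ever formed. The scalar initialization $H_0 = \sigma I$ and a constant $\beta_j = \beta$ are illustrative assumptions.

```python
import numpy as np

def apply_bb_inverse(g, S, Y, sigma=1.0, beta=1.0):
    """Apply H_{n+1} of Eq. (II.17) to a vector g, with H_0 = sigma*I.

    S, Y -- k-by-(n+1) arrays of the steps s_j and residual
            differences y_j; only O(k*n) storage is used.
    """
    u = np.asarray(g, dtype=float).copy()  # u = W_{j+1} ... W_n g
    z_sum = np.zeros_like(u)               # running sum of Z_j W_{j+1}...W_n g
    for j in range(Y.shape[1] - 1, -1, -1):
        yj, sj = Y[:, j], S[:, j]
        c = (yj @ u) / (yj @ yj)
        z_sum += beta * c * sj             # Z_j u = beta_j s_j (y_j^T u)/||y_j||^2
        u -= c * yj                        # u <- W_j u
    return sigma * u - z_sum               # H_0 (W_0...W_n g) - sum_j Z_j (...)
```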
D. Multisecant Methods

In the context of path following algorithms, it is natural to look only at local information in order to construct a step direction and length. To generate the $(n+1)$-th Jacobian approximation, the methods described above satisfy the matrix secant equation Eq. (II.5) or Eq. (II.6) for the current step $s_n$ and residual difference $y_n$. However, for highly nonlinear problems, where iterates may belong to domains of attraction of different solutions, it is perhaps more appropriate to view the previous data as samples of an unknown process in a high-dimensional space. In this context, updating the Jacobian based only on the most recent sample and ignoring the other sample points imposes a bias. Invoking some of the terminology used for stochastic optimization, searching for the nearest matrix that satisfies the matrix secant equation only for the most recent sample point is a greedy strategy without recourse. Multisecant techniques put the previous data on more equal footing with the most recent steps; that is, rather than satisfying the matrix secant equation for only the most recent step, one satisfies all matrix secant equations simultaneously:

$$Y_n = B_n S_n, \tag{II.18}$$

where $S_n = [s_{1,n}, s_{2,n}, \dots, s_{m,n}]$ and $Y_n = [y_{1,n}, y_{2,n}, \dots, y_{m,n}]$ are $k$-by-$m$ ($m \le \min\{n,k\}$) matrices whose columns are previous steps and residual differences respectively. Multisecant techniques have been studied by several authors in the mathematical literature over the last 48 years [15,17,18,24,31,32,33,34,35,36]. A few authors in the physical sciences [5,6,7,8,10] independently derived updates that, we will show, are simple relaxations of more conventional multisecant methods.

For our application the dimension of the problem, $k$, is much larger than the number of iterations, $n$, so we will simply consider $m$ matrix secant equations with $m < n$. We do not as yet specify the composition of $S_n$ and $Y_n$, since there are many options. The construction given in Eq. (II.7) is conventional for matrix secant methods; an alternative construction centers the steps on the initial point $\rho_0$ rather than on the previous step as in Eq. (II.7):

$$s_{j,n} = \rho_j - \rho_0 \quad\text{and}\quad y_{j,n} = (F(\rho_j) - \rho_j) - (F(\rho_0) - \rho_0), \quad (j = 0,1,\dots,n-1). \tag{II.19}$$

The method we propose in the following section is centered on the current point:

$$s_{j,n} = \rho_j - \rho_n \quad\text{and}\quad y_{j,n} = (F(\rho_j) - \rho_j) - (F(\rho_n) - \rho_n), \quad (j = 0,1,\dots,n-1). \tag{II.20}$$

Many multisecant methods are easily understood by formulating the underlying optimization problem that each of the approximate Jacobians (implicitly) solves. We consider first the optimization problem

$$\underset{X \in \mathbb{R}^{k\times k}}{\text{minimize}} \;\; \frac{\alpha}{2}\|A - X\|^2 + \iota_C(X) \tag{II.21}$$

where, throughout, the norm of a matrix is the Frobenius norm, $A$ is a real $k\times k$ matrix, and $D$, $G$ are real $k\times m$ matrices such that the set $C$ defined below is nonempty:

$$C := \left\{X \in \mathbb{R}^{k\times k} \text{ such that } XD = G\right\}. \tag{II.22}$$

The corresponding indicator function of $C$, $\iota_C$, is defined by

$$\iota_C(X) = \begin{cases} 0 & \text{if } X \in C \\ \infty & \text{else.} \end{cases}$$

The solution to the optimization problem Eq. (II.21), expressed as

$$\underset{X}{\operatorname{argmin}}\left\{\frac{\alpha}{2}\|A - X\|^2 + \iota_C(X)\right\}, \tag{II.23}$$

is the prox mapping [12,37] of the indicator function of $C$ at $A$:

$$\operatorname{prox}_{1/\alpha,\,\iota_C}(A) := \underset{X}{\operatorname{argmin}}\left\{\frac{\alpha}{2}\|A - X\|^2 + \iota_C(X)\right\}.$$

It is an elementary exercise (see, for instance, [38], Chapter 1, Section G) to show that

$$\operatorname{prox}_{1/\alpha,\,\iota_C}(A) = P_C(A) \quad (\text{for all } \alpha > 0)$$

where

$$P_C(A) = \underset{X \in C}{\operatorname{argmin}}\,\{\|A - X\|\}$$

is the projection of $A$ onto $C$. This projection has a simple closed form so long as the columns of $D$ are linearly independent:

$$P_C(A) = A + (G - AD)\left(D^T D\right)^{-1} D^T. \tag{II.24}$$

Specializing to multisecants, if $A = B_0$, the initial Jacobian approximation, $D = S_n \in \mathbb{R}^{k\times m}$, and $G = Y_n \in \mathbb{R}^{k\times m}$ ($1 \le m \le n$), whose columns are denoted $s_j$ and $y_j$ respectively ($j \in [0,n]$), then we arrive at the multisecant extension of Broyden's first update (MSGB), as studied by [17,18,32,35]:

$$B_{n+1} = P_C(B_0) = B_0 + (Y_n - B_0 S_n)\left(S_n^T S_n\right)^{-1} S_n^T. \tag{II.25}$$

Elementary calculations using the Sherman-Morrison-Woodbury formula yield the multi-step recursion for $B_{n+1}^{-1}$, analogous to Eq. (II.15):

$$B_{n+1}^{-1} = B_0^{-1} + \left(S_n - B_0^{-1}Y_n\right)\left((S_n^T S_n)^{-1} S_n^T B_0^{-1} Y_n\right)^{-1}(S_n^T S_n)^{-1} S_n^T B_0^{-1}. \tag{II.26}$$

For the update Eq. (II.13), $v_n$ is the projection of $s_n$ onto the orthogonal complement of $[s_{n-m}, s_{n-m+1}, \dots, s_{n-1}]$ ($0 \le m < n$), where the $s_j$ and $y_j$ are given by Eq. (II.7). Here the update $B_{n+1}$ will satisfy $m+1$ consecutive secant equations, $B_{n+1} s_j = y_j$ ($j = n-m, \dots, n$). Methods based on this approach are called projected secant updates. This idea, however, seems not to have benefited from the endorsement of prominent researchers at the time, and hence there is little numerical experience to recommend it. A notable exception is the work of Martinez and collaborators [22,34,39,40]. Sequences based on update Eq. (II.25) are shown in [35] (Theorem 2.5) to be locally q-superlinearly convergent if, in addition to other standard assumptions, the approximate Jacobians $B_n$ stay close to the behavior of the true Jacobian, and if the columns of $S_n$ are strongly linearly independent as measured by the condition number

$$\kappa(S_n) := \|S_n\|\,\left\|\left(S_n^T S_n\right)^{-1} S_n^T\right\|.$$
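The projection Eq. (II.24) is the computational core of all of the multisecant updates above. A minimal sketch, with the explicit inverse replaced by a solve, and with the strong linear independence measured by $\kappa(S_n)$ assumed:

```python
import numpy as np

def project_onto_C(A, D, G):
    """P_C(A) = A + (G - A D)(D^T D)^{-1} D^T, Eq. (II.24).

    Requires the columns of D to be linearly independent; near-linear
    dependence makes D^T D ill-conditioned, the failure mode that the
    condition number kappa(S_n) above quantifies.
    """
    M = np.linalg.solve(D.T @ D, D.T)      # (D^T D)^{-1} D^T
    return A + (G - A @ D) @ M

# MSGB, Eq. (II.25): B_next satisfies all m secant equations at once,
# i.e. B_next @ S_n == Y_n (up to round-off).
# B_next = project_onto_C(B0, S_n, Y_n)
```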
An alternative specialization of Eq. (II.24) leads to a multisecant form of Broyden's second method (MSBB) if we let $A = H_0$, $D = Y_n$ and $G = S_n$, so that

$$H_{n+1} = P_C(H_0) = H_0 + (S_n - H_0 Y_n)\left(Y_n^T Y_n\right)^{-1} Y_n^T. \tag{II.27}$$

To our knowledge, there are no published numerical comparisons of Eq. (II.27) to alternatives, nor is there any published convergence theory, though we believe this is only a minor modification of the theory for Eq. (II.25).

Yet another specialization of Eq. (II.24) is a simplex gradient technique [41] for vector-valued functions, proposed by Gragg and Stewart [33] and implemented in [34]. Here the Jacobian update is given by

$$B_{n+1} = P_C(0) = Y_n\left(S_n^T S_n\right)^{-1} S_n^T, \tag{II.28}$$

where $A = 0$, $D = S_n$ and $G = Y_n$, and where the steps $s_j$ and residual differences $y_j$ are centered on the initial point $\rho_0$ as in Eq. (II.19).

Independent studies appearing in the physics literature that parallel the mathematical literature have a different variational form. The various approaches can all be shown to be specializations of the following optimization problem

$$\underset{X \in \mathbb{R}^{k\times k}}{\text{minimize}} \;\; \frac{1}{2}\sum_{j=1}^{n} \alpha_j \operatorname{dist}^2_{C_j}(X) + \frac{\alpha_0}{2}\|A - X\|^2 \tag{II.29}$$

where each $C_j$ is a set of the form Eq. (II.22), $A \in \mathbb{R}^{k\times k}$, and $\operatorname{dist}_{C_j}(X)$ is the Euclidean distance from $X$ to the set $C_j$.
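For completeness, here is a sketch of the bare MSBB update Eq. (II.27), the specialization on which the method proposed in Section III builds. It uses the same solve-based projection as above; the regularization and safeguards the paper adds in Section III are omitted, and `H0` is assumed to be a full $k$-by-$k$ initial inverse approximation (e.g. a scaled identity).

```python
import numpy as np

def msbb_inverse(H0, S, Y):
    """MSBB, Eq. (II.27): H_{n+1} = H0 + (S - H0 Y)(Y^T Y)^{-1} Y^T."""
    M = np.linalg.solve(Y.T @ Y, Y.T)      # (Y^T Y)^{-1} Y^T
    return H0 + (S - H0 @ Y) @ M

# In a mixing loop the new density then follows Eq. (II.9):
#   rho_next = rho_n - H_next @ (scf_map(rho_n) - rho_n)
# with S, Y built from the current-point differences of Eq. (II.20).
```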
