ebook img

A Density Matrix-based Algorithm for Solving Eigenvalue Problems PDF

0.19 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview A Density Matrix-based Algorithm for Solving Eigenvalue Problems

A Density Matrix-based Algorithm for Solving Eigenvalue Problems Eric Polizzi∗ Department of Electrical and Computer Engineering, University of Massachusetts, Amherst (Dated: January 17, 2009) A new numerical algorithm for solving the symmetric eigenvalue problem is presented. The technique deviates fundamentally from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms) or other Davidson-Jacobi techniques, and takes its inspiration from the contour integration and density matrix representation in quantum mechanics. It will be shown that this new algorithm - named FEAST - exhibits high efficiency, robustness, accuracy andscalability onparallel architectures. Examplesfrom electronicstructurecalculations ofCarbon nanotubes(CNT) are presented, and numerical performances and capabilities are discussed. 9 PACSnumbers: 02.60.-x,02.70.Hm,02.70.-c,02.10.Ud,31.15.-p,71.15.Dx 0 Keywords: diagonalizationtechnique, contour integration,electronicstructurecalculations 0 2 I. INTRODUCTION FEAST- which deviates fundamentally from the tradi- n tional techniques above, and takes its inspiration from a the density matrix representation and contour integra- J The generalized eigenvalue problem, that is the de- tion in quantum mechanics. Section II summarizes the 7 termination of nontrivial solutions (λ,x) of Ax=λBx electronic structure and contour integration problems 1 with A and B square matrices, is a central topic in nu- merical linear algebra and arises from a wide range of which have motivated the development of the new al- ] gorithm. The FEAST algorithm is then described in de- applications in sciences and engineering. In electronic E tail in Section III, and numerical examples and perfor- structurecalculations,inparticular,theeigenvalueprob- C mance results are presented in Section IV. Finally, Sec- lem is one of the most challenging applied numerical . tionVpresentssomediscussionsregardingtheefficiency, s process - also called diagonalization procedure or spec- c robustness and scalability of the algorithm. tral decomposition. In these calculations, the electron [ density can be formally calculated by summation of the 1 amplitude square of the wave functions Ψm solution of v II. THE CONTOUR INTEGRATION 5 theSchr¨odinger-likeeigenvalueproblemHΨm =EmSΨm TECHNIQUE IN ELECTRONIC STRUCTURE 6 with different discrete energies Em (where H represents CALCULATIONS the Hamiltonian Hermitian matrix and S is a symmet- 6 2 ric positive matrix obtained using non-orthogonal basis Although new fast sparse solvers have allowed consid- . functions). Thisprocedurecanbe quitecomputationally 1 challengingforlarge-scalesimulationsofsystemscontain- erable time savingfor obtaining the eigenpairs(Em,Ψm) 0 inelectronicstructurecalculations,suchastheRayleigh- ing more than a hundred of atoms and/or where a large 9 quotient multigrid [3] developed for the MIKA package, 0 number of eigenpairs (Em,Ψm) are needed. Progress in or the parallel Chebyshev subspace iteration technique : electronic structure calculations as for other large-scale v developed for the PARSEC package [4, 5], these calcula- modern applications, are then much likely dependent on i tionsarestillconsideredcomputationallyextremelychal- X advances in diagonalizationmethods. lenging and linear scalability is not easily achievable. r In the past decades, the eigenvalue problem has led a An alternative approach to the Schr¨odinger picture to many challenging numerical questions and a central for obtaining the electron density consists in perform- problem[1]: how canwe compute eigenvalues andeigen- ing a contour integration of the diagonal elements of vectorsinanefficientmannerandhowaccuratearethey? theGreen’sfunctionmatrixG(Z)=(ZS−H)−1overthe Powerful tools have then been developed from Jacobi complex energy space [6]. At zero temperature, the re- method and power iterations, to iterative Krylov sub- sulting expression for the electron density in real-space space techniques (including Arnoldi, and Lanczos meth- is: ods), or other Davidson-Jacobi techniques [2]. Tradi- tional numerical algorithms and library packages are yet n(r)=− 1 dZ G(r,r,Z)= |Ψm(r)|2, (1) facing new challenges for addressing the current large- 2πıZ C Xm scale simulations needs for ever higher level of efficiency, accuracyandscalabilityinmodernparallelarchitectures. where the complex contour C includes all the eigenval- This article presents a new robust and scalable algo- ues Em below the Fermi level EF, and where the spin rithm design for solving the eigenvalue problem- named factor is not considered. It should be noted that at non zerotemperatures,thisexpressionwouldalsoincludethe contribution of the residues of all poles of the Fermi- Dirac distribution function on the imaginary axis at the ∗URL:http://www.ecs.umass.edu/ece/polizzi position of the Fermi level [7]. For transport problems 2 and open systems, in turn, the contour integration is of- inquantummechanics. Onecanshowthatthisfactoriza- ten used to compute the equilibrium partof the electron tioncanbeexpressedintermsoftheeigenvectorspresent density[8]whereself-energyboundaryconditionsneedto inside the contour as follows: be included in the Hamiltonian matrix H. The contour integrationtechniquerepresentsapriorianattractiveal- ternative approachto the traditionaleigenvalue problem 1 M ρ=− dZ G(Z)== |x ihx |. (3) for computing the electron density since the number of 2πıZ m m Green’s function to be calculated -typically ∼O(10) us- C mX=1 ing a Gauss quadratureprocedure-is independent of the Inmatrixnotationsthesecondtermoftheequationreads size of the system. In particular, an efficient linear scal- XXT where X = {x ,x ,..x } (M being the num- ing strategy CMB (CMB stands for Contour integration N×M 1 2 M ber of eigenvalue inside the contour and N the size of - Mode approach- Bandedsystemsolver),has been pro- G). It should be noted that the diagonal elements of ρ posed in [9, 10] for simulating nanowire-type structures representthe electron density in quantum mechanics (1) withinareal-spacemeshframeworkwhileovercomingthe discussed in Section II. impossiblenumericaltaskofinvertingalargesizematrix PostmultiplyingρbyasetofMlinearlyindependentran- at each point of the contour. For arbitrary systems (i.e. dom vectors Y = {y ,y ,..y }, the first expres- beyond nanowire structures), however, there are no nu- N×M 1 2 M sion in (3) leads to a new set of M independent vectors merical advantages of abandoning the traditional eigen- Q ={q ,q ,..q } obtained by solving linear sys- value problem in favor of the contour integration tech- N×M 1 2 M tems along the contour nique for computing the electron density. In addition, it is clear from equation (1) that the contour integration 1 technique does not provide a natural route for obtaining Q =− dZ G(Z)Y , (4) N×M 2πıZ N×M the individual eigenvectors but rather the summation of C their amplitudes square. In the following section, a new while the second expression in (3), implies these vectors numerical algorithm design FEAST is proposed for ob- Q can be formally generated by the eigenfunctions X tainingdirectlytheeigenpairssolutionsusingthedensity inside the contour matrixrepresentationandanumericallyefficientcontour integration technique. Q =X W with W =xTy . (5) N×M N×M M×M i,j i j In other words, each Q column vector obtained in (4) III. FEAST representsa different linearcombinationof unknownba- sis functions X in (5). Using a Rayleigh-Ritz proce- A. Introduction dure,theproblem(2)isnowequivalenttocomputingthe eigenpairs(ǫ ,Φ ) of the following reduced generalized m m Inthissection,anewalgorithmispresentedforsolving eigenvalue problem of size M: generalized eigenvalue problems of this form A Φ=ǫB Φ (6) Ax=λBx, (2) Q Q with A =QTAQ and B =QTBQ. (7) Q Q withinagiveninterval[λmin,λmax],whereAisrealsym- metric or Hermitian and B is a symmetric positive def- The Ritz values and vectors are then given by: inite (s.p.d). One common way to accelerate the con- vergence rate of traditional iterative techniques consists λ =ǫ , m=1,...,M (8) m m in performing a factorization of the Green’s function X =Q Φ (9) G(σ) = (σB −A)−1 for some reasonable shift σ close N×M N×M M×M to the eigenvalues in the search interval and which leads where Φ = {Φ ,Φ ,..Φ }. One can show that the to solving linear systems (i.e. shifting strategy). More M×M 1 2 M obtainedeigenvectorsXarenaturallyB-orthonormali.e. recently, Sakurai et al. [11, 12] have proposed a root xTBx =δ , if the eigenvectors of the reduced problem finding technique which consists of a contour integration i j i,j (6) are B -orthonormal i.e. ΦTB Φ =δ . of a projected Laurent series-type decomposition of the Q i Q j i,j Green’sfunction. Inprinciple,asetofcomplexmoments can be obtained by solving few linear systems along the B. Practical Considerations and Pseudocode contour,whichcangenerateanidenticalsubspace to the one spanned by the eigenvectors present inside the con- tour. In practice, however, robustness and accuracy are Inpractice,thevectorsQarecomputedbyperforming not easily achievable. In our approach, we avoid decom- a numericalintegrationof eachvectorsG(Z)Y (4) along posingdirectlythe Green’sfunctionandperforminstead the complex contour C. Let us consider a circle centered an exact mathematical factorization of its contour inte- inthemiddleofthesearchinterval[λmin,λmax],itshould gration - which represents the reduced density matrix ρ be noted that the expression of the contour integration 3 + (ii) Postmultiplying the density matrix (3) by M0 ran- C dom vectors (rather than M) where M0 is greater than M. The reduced dense generalized eigenvalue problem (6) of size M0 can be solved using standard eigenvalue LAPACK routines [13]. Since we do not perform the or- thogonalization of the vectors Q, one has to make sure thatBQM0,M0 issymmetricpositivedefinitei.e. M0 does λmin λ λmax not exceed an upper limit which can easily be obtained a posteriori. FIG.1: Schematicrepresentationofthecomplexcontourinte- graldefinedbythepositivehalfcircleC+. Inpractice,thevec- torsQarecomputedviaanumericalintegration(e.g. Gauss- 1- Select M0 >M random vectors YN×M0 ∈RN×M0 nLeeegdensdtroebqeusaodlvraetduraet)swpehceirfiecopnoliyntfsewZelianleoanrgstyhsteecmosntGou(Zr.)Y 2- Set Q=0 with Q∈RN×M0; r=(λmax−λmin)/2; For e=1,...Ne can be further simplified since G(Z¯) = G†(Z). Denot- compute θe=−(π/2)(xe−1), ing C+ the positive half circle of the complex contour, it compute Ze=(λmax+λmin)/2+rexp(ıθe), comes if A is Hermitian: solve (ZeB−A)Qe =Y to obtain Qe ∈CN×M0 ρ=− 1 dZ {G(Z)−G†(Z)}, (10) compute Q=Q−(ωe/2)ℜ{rexp(ıθe) Qe} 2πıZC+ End and if A is real symmetric: 3- Form AQM0×M0 =QTAQ and BQM0×M0 =QTBQ 1 4- SolveAQΦ=ǫBQΦ to obtain theM0 eigenvalue ǫm, ρ=−π ZC+dZ ℑ{G(Z)}, (11) and eigenvectors ΦM0×M0 ∈RM0×M0 where ℑ{} stands for the imaginary part. Using a Ne- 5- Set λm =ǫm and computeXN×M0 =QN×M0ΦM0×M0 pointGauss-Legendrequadratureonthepositivehalfcir- If λm ∈[λmin,λmax], λm is an eigenvalue solution cle C+ (see Figure 1), with xe the eth Gauss node asso- and its eigenvector is Xm (themth column of X). ciated with the weight ω , one obtains if A is Hermitian and Y,Q∈CN×M: e 6- Check convergence for thetrace of the eigenvalues λm If iterative refinement is needed, compute Y=BX Q=− Ne 1ωer exp(ıθe)G(Ze)+exp(−ıθe)G†(Ze) Y, and go back to step 2 4 Xe=1 (cid:0) (cid:1) (12) FIG. 2: FEAST pseudocode (sequential version) for solving with the generalized eigenvalue problem Ax=λBx, where A is realsymmetricandBiss.p.d.,andobtainingalltheMeigen- r = λmax−λmin, θ =−(π/2)(x −1), (13) pairswithinagiveninterval[λmin,λmax]. Thenumericalinte- e e 2 gration is performed using Ne-point Gauss-Legendre quadra- th λmax+λmin ture with xe the e Gauss node associated with the weight Ze = 2 +rexp(ıθe). (14) ωe. Forthe case Ne=8, one can use: (x1,ω1)=(0.183434642495649,0.362683783378361), IfAisrealsymmetric,Y,Q∈RN×M andonecanuse: (x3,ω3)=(0.525532409916328,0.313706645877887), (x5,ω5)=(0.796666477413626,0.222381034453374), Ne 1 (x7,ω7)=(0.960289856497536,0.101228536290376), Q=− 2ωeℜ{rexp(ıθe) G(Ze)Y}, (15) and (x2i,ω2i)i=1,...,4 =(−x2i−1,ω2i−1) Xe=1 where ℜ{} stands for the real part. The performances of the basic FEAST algorithm will In order to reduce the numerical quadrature error of then depend on a trade off between the choices of the the contour integral,one may consider the two following number of Gauss quadrature points Ne, the size of the improvements: subspace M0, and the number of outer refinement loops. (i)Performingouter-iterativerefinementsteps. Oncethe So far, using M0 ≥ 1.5M, Ne = 8, and with at most 2 eigenvectorsXareobtained(9),anewsetofinitialguess refinement loops, we have consistently obtained a rela- vectors Y = BX can be used. Postmultiplying the den- tive residual equal or smaller than 10−10 seeking up to sity matrix (3) by Y, one now obtains from (5) that Q 1000 eigenpairs for a variety of problems. The basic converges to X since XTBX = I (i.e. W = δ and pseudocode for the FEAST algorithm is given in Fig- i,j i,j then ρBX = X). A fast test for convergence can be ure 2 in the case of A real symmetric. In the case of obtained by checking the trace of the eigenvalues (8). A complex Hermitian, we note the following changes: 4 Y,Q ∈ CN×M0, Φ ∈ CM0×M0, and the construction of TEST-CNT ARPACK FEAST the vectors Q in step-2 of the pseudocode must be mod- N=12,450 Time(s) Resid. Time(s) Resid. ified to satisfy (12). M=100 12.2 2.0∗10−11 7.8 4.5∗10−10 M=200 31 2.0∗10−11 14 5.5∗10−10 M=400 86 1.4∗10−11 21 1.8∗10−10 IV. NUMERICAL EXPERIMENTS M=800 213 4.5∗10−9 58 3.4∗10−11 In this section, we propose to demonstrate the numer- ical stability, robustness and scalability of the FEAST TABLE I: Simulations times and relative residual algorithm using three examples derived from electronic maxi(||Axi − λiBxi||1/||Axi||1), obtained by the solver ARPACK and FEAST on the TEST-CNT system seeking structure calculations of Carbon nanotube (CNT). M (lowest) eigenpairs for different search intervals. The simulations are performed on a Intel Clovertown (8cores, 1 node, 2.66GHz, 16Gb). The shift-strategy has been used in A. Example I ARPACK to accelerate the convergence (the regular mode would give ∼ 300s for M = 100). The inner linear systems Let us first consider a family of eigenvalue problems, in ARPACK and FEAST are both solved using the direct Test-CNT,obtainedusinga2DFEMdiscretizationofthe parallel solver PARDISO [15] on 8 cores. Finally, the size DFT/Kohn-Shamequations at a given cross-sectionof a of the subspace has been chosen to be M0 = 1.5M for both algorithms, and the number of contour points for FEAST is (13,0) CNT – the 2D atomistic potential is derived from fixedat Ne =8. themodeapproachusedintheCMBstrategyforsolving the full 3D problem presented in [9]. In Test-CNT, A is realsymmetric,andB iss.p.d., the sizeofbothmatrices that only one loop is necessary to obtain the eigenvalues isN=12,450andtheirsparsitypatternisidenticalwith with an accuracy of ∼10−5 or below. a number of non-zero elements nnz=86,808. In Table I, we report the times and relative residual obtained by the public domain eigenvalue solver pack- TEST-CNT Relative error on theTrace age ARPACK [14] (using the shift-invert strategy) and N=12,450 1st loop 2nd loop 3rd loop the FEAST algorithm presented in Figure 2 for solving M=100 3.0∗10−6 2.9∗10−12 1.0∗10−15 the Test-CNT example seeking up to M = 800 (low- M=200 1.8∗10−5 4.8∗10−12 2.1∗10−14 M=400 2.4∗10−8 3.2∗10−16 est) eigenpairs. The inner linear systems in ARPACK M=800 1.8∗10−9 4.3∗10−16 andFEASTaresolvedusingtheshared-memoryparallel direct solver PARDISO [15]. It should be noted, how- ever,thatFEASTbenefitsmorethanARPACKfromthe TABLE II: Variation of the relative error on the trace of PARDISO solver, as the inner linear systems have mul- theeigenvaluesPmλm fordifferentsearch intervalswiththe tiple right-hand sides. Although both algorithms could number of iterative refinement loops. The convergence cri- benefit from a parallel distributed implementation (e.g. teria is set to 10−13 where the final relative residual on the using the P ARPACK package), the simulation runs are eigenpairs is reported in Table I. here restricted to a given node of a 8-cores Intel Clover- town system (16Gb,2.66GHz) where the linear systems The simulation results in Table I demonstrate very in FEAST are factorized and solved one after another. good scalability for FEAST while the search interval The performancesof ARPACK andFEAST can also de- keeps increasing but the number of contour points Ne pendonfinetuningsparameterssuchasthechoicesofthe stays identical (i.e. the number of numerical operations size of the subspace M0 (M0 = 1.5M here for both algo- stays the same for a given loop of FEAST with a fixed rithms),theinnersystemssolvers,thenumberofcontour Ne = 8 linear systems to solve). In addition, from Ta- pointsNe forFEAST,orthestoppingcriteriaforobtain- ble III, one can see how the robustness of the FEAST ing the residual. The simulation results in this section algorithmis affected while the number of contour points are then not intended to compare quantitatively the two Ne changes. In particular, Ne = 4 points along the con- solvers but rather to point out the potentialities of the tour did suffice to capture M = 100 eigenpairs with a FEAST algorithm. relatively small residual (decreasing the simulation time Inourexperiments,theconvergencecriteriaontherel- reportedinTable I for this case),while the caseNe =16 ative residual for FEAST is obtained when the relative pointsgeneratedaresidualsmallerthantheoneobtained erroronthetraceoftheeigenvalues mλm inthesearch by ARPACK (using M0 =1.5M). interval is smaller or equal to 10−13P. Table II shows the variationofthe relativeerroronthe tracewiththe num- ber of outer-iterative refinement for FEAST. These re- B. Example II sults demonstrate that only 2 to 3 refinement loops are necessary to obtain the small relative residuals for the In another set of numerical experiments, we intend to different cases reported in Table I. It should be noted demonstrate the robustness of FEAST in capturing the 5 C. Example III TEST-CNT FEAST M=100 Time(s) Resid. # loops Ne=4 7.0 8.3∗10−8 6 We have shown that FEAST can re-use the computed Ne=8 7.8 4.5∗10−10 4 subspace as suitable initial guess for performing itera- Ne=16 10.2 3.4∗10−12 3 tive refinements. This capability can also be of benefit to modern applications in science and engineering where it is often necessary to solve a series of eigenvalue prob- TABLEIII:PerformanceresultsobtainedbyFEASTseeking lems that are close one another. In bandstructure cal- M = 100 eigenpairs for different values of Ne. The conver- culations,inparticular,manyeigenvalueproblemsofthe gence is obtained when the error on the trace is equal or form(A+S )x =λ Bx needtobesolvedatdifferent smaller to 10−13. k k k k locationsinthe k-space(i.e. fordifferentvaluesofkand where S is Hermitian with S = I). Let us consider the 0 eigenvaluesparsesystemofsizeN=492,982obtainedfor multiplicity of the eigenvalues. We propose to create ar- a (5,5)metallic CNT using our in-houseDFT/real-space tificially new TEST-CNT systems called k(N,M) where meshtechniqueframeworkforbandstructurecalculations the matrices A and B are repeated k times along the ofnanowires-typestructure[16]. InFigure3,wepropose main diagonal (the new system matrix is block diagonal to solve this eigenvalue problem using the same search with k blocks). Physically, these systems can describe interval for the eigenvalues λ for different locations of k thecrosssectionofabundlecomposedbykCNTs,where where the subspace computed by FEAST at the point wedo notconsiderthe interactionsbetweenthe different k−1 is successively used as initial guess for the neigh- tubes such that each eigenvalue is now k times degener- boring point k. In addition, the inner linear systems in ate. If we keep the same search interval used to obtain M=100 eigenpairs for k =1 (where the size of the ma- 0 tricesAandBisN),100keigenpairsmustnowbefound -2 fork ≥1,whereeachoneofthemhavethemultiplicityk. In Table IV, we report the simulation times and relative -4 V) rke(sNid,uMa)lsToEbtSaTin-CedNuTsisnygstAeRmPs.ACFoKr athnedcFaEseA8S(TN,oMn)t,hefoser E (e-6 -8 example, the size of the new system matrix is 99,600 and the first 100 eigenvalues have all the multiplicity 8 -10 (so800eigenpairsarefoundintotal). Thesimulationre- -12 stAhuReltPssAysshCtoeKwmwlianhneedarerthstcheaenlanubumimlbitbeyerrpooefrffemoigrametnrapinxac-iversse.cwtIointrhcmothunelttrisapiszltiectaoo-f Eigenvalues12114068 11124680 # 12 12 tionsandlinearsystemsolveswouldkeepincreasingwith p 4 4 k,thenumberofoperationsinFEASTstaysthesamefor T loo 3 3 AS 2 2 all these cases. The scalability of the algorithm depends FE 1 1 # thenmainlyonthescalabilityofthelinearsystemsolver. 0 Γ k X0 FIG. 3: Bandstructure calculations of a (5,5) metallic CNT. The eigenvalue problems are solved successively for all the k points (from Γ to X), while the computed subspace of size TEST-CNT ARPACK FEAST M0 = 25 at the point k is used as initial guess for the point N=12,450 k+1. Thenumberofeigenvaluesfoundrangesfrom13to20, M=100 Time(s) Resid. Time(s) Resid. andbythethirdkpoint,theFEASTconvergenceisobtained (N,M) 12.2 2.0∗10−11 7.8 4.5∗10−10 using only one refinement loop. The convergence is obtained 2(N,M) 85 3.5∗10−11 27 7.7∗10−10 with the relative error on trace of the eigenvalues smaller or 4(N,M) 668 4.6∗10−11 109 8.8∗10−10 equal to 10−8, while the inner linear systems are solved us- 8(N,M) 5492 6.2∗10−11 523 6.5∗10−10 irneglatainveitreersaidtiuvaelsmoenthtohdeweiigtehnpanairasccruanragceyfroofm101−03−.3Tthoe10fi−n5a.l FEASTaresolvedusinganiterativemethodwithprecon- TABLE IV: Simulations times and relative residual ditioner where a modest relative residual of 10−3 is used maxi(||Axi − λiBxi||1/||Axi||1), obtained by the solver (e.g. a suitable banded preconditioner can be obtained ARPACK and FEAST on the k(N,M) TEST-CNT systems using a mode approach[9]). It should be noted that the which artificially reproduce k times the original TEST-CNT convergence criteria for the relative error on the trace system. The kM (lowest) eigenpairs are found where each of the eigenvalues is chosen much smaller at 10−8, while eigenvalue has a multiplicity of k. the eigenvectors are expected to be obtained within the same (or a smaller) order of accuracy that the one used 6 for the solutions of the inner systems. Figure 3 shows relative residual and obtaining the eigenvectors solution that 13 to 20 eigenvalues (i.e. energies)arefound within within the same order of accuracy. The resulting sub- the selected search interval along the different k points spacecouldalsobeusedasaverygoodinitialguessfora (from the Γ to the X point in the graph). Although the one step more accurate refinement procedure (i.e. using size of the subspace stays identical at M0 =25,after the more accurate relative residual for the inner systems). firstinitialpointatk=0(Γpointinthegraph)FEAST FEAST exhibits important potentialities for paral- converges within only one refinement loop for almost all lelism at three different levels: (i) many search interval the other k points. [λmin,λmax] can be run independently, (ii) each linear systems can be solvedsimultaneously(e.g. oneachnode ofparallelarchitecturewherethe factorizationofthelin- V. DISCUSSIONS ear system can be done only once for all the refinement loops), (iii) the linear system solver can be parallel (e.g. IncomparisontoiterativeKrylovsubspacetechniques, within a given node as in Section IV). Depending on FEAST can be cast as a ”direct” technique which is the parallel architecture at hand, the local memory of based on an exact mathematical derivation (3). FEAST a given node and the properties of the matrices of the does naturally then capture all the multiplicity and no- eigenvalue problems, one may preferably select one par- orthogonalizationprocedureis necessary(suchas Gram- allel option among the others, or just take advantage of Schmidtorthogonalizationprocess). Asdescribedabove, a combination of those. In particular, there will be a the main computationaltasks in FEAST consist of solv- trade off between how many search intervals to consider ing Ne independent linear systems along the contour and how many eigenpairs FEAST can handle by inter- with M0 right-hand-sides and a reduced dense general- vals. Forexample if M0 is morethan few thousands, one ized eigenvalue problem of size M0. Since FEAST has couldeither (i)solvethe obtainedreducedsystemofsize the ability to re-use the basis from the previously com- M0 using efficient dense parallel symmetric eigenvalue puted subspace, an outer-iterative refinement procedure solvers [18], or (ii) propose to divide the initial search isproposedtoimprovetheaccuracyofthesolutions. The interval into two or more to be processed in parallel. In capability to take advantage of suitable initial guess can addition, it shouldbe noted that the orthonormalization also be of benefit to modern applications in sciences and step is absentfrom FEAST whichwill drastically reduce engineering where it is often necessary to solve a series the communicationoverheadfor performing scalarprod- of eigenvalue problems that are close one another (e.g. ucts on high-end parallel architectures (the scalar prod- Bandstructure calculations in Example III Section IV). uct in step-3 in Fig. 2 has to be done only once per In one sense, the difficulty of solving an eigenvalue iterative refinement). Given the recent advances in par- problem has been replaced by the difficulty of solving allel architectures and parallel linear system solvers, it a linear system with multiple right-handsides. For large is reasonable to envision using FEAST in a near future sparse systems, this latter can be solved using either a for obtaining up to millions of eigenpairs of large sparse direct systemsolversuchas PARDISO [15] (as proposed symmetric eigenvalue problems. Finally the capabilities inSectionIV),oraniterativesystemsolverwithprecon- of FEAST could potentially be enhanced for addressing ditioner. In turn, for banded systems or banded precon- non-symmetric eigenvalue problems where the contour ditioner, FEAST can be seen as an outer-layer for the integrationwouldthen be performedin a givenregionof author’s SPIKE parallel system solver [17]. It should be the complex space. noted that the inner linear systems arising from stan- dard eigenvalue solvers (using the shift-strategy), need often to be solved highly accurately via direct methods. Acknowledgments Direct system solvers, however, are not always suited foraddressinglarge-scalemodernapplicationsbecauseof The author wishes to acknowledge helpful discussions memory requirements. In Example III of Section IV we with Dr. Ahmed Sameh and Dr. Massimo Fischetti. have shown that FEAST can take advantage of iterative This material is supported by NSF under Grant #CCF- solvers for solving the inner linear systems with modest 0635196. [1] G. Golub and H. A. van der Vorst, J. of Comput. and [4] Y. Saad, Y. Zhou, C. Bekas, M. L. Tiago, and J. Che- Appl.Math. 123, 35 (2000). likowsky, physica status solidi (b) 243, 2188 (2006). [2] Z.Bai,J.Demmel,J.Dongarra,A.Ruhe,andH.vander [5] Y.Zhou,Y.Saad,M.L.Tiago,andJ.Chelikowsky,Phys. Vorst,TemplatesforthesolutionofAlgebraicEigenvalue Rev. E. 74, 066704 (2006). Problems: A Practical Guide (Society for Industrial and [6] R.Zeller,J.Deutz,andP.Dederichs,Solidstatecommu- Applied Mathematics, Philadelphia, PA, 2000). nications 44, 993 (1982). [3] T. Torsti, M. Heiskanen,M. J.Puska, and R.M.Niemi- [7] J.Taylor,H.Guo,andJ.Wang,Phys.Rev.B.63,245407 nen,Int.J. QuantumChem 91, 171 (2003). (2001). 7 [8] M. Brandbyge, J.-L. Mozos, P. Ordejon, J. Taylor, and [14] R.B.Lehoucq,D.C.Sorensen,andC.Yang., ARPACK K.Stokbro, Phys.Rev.B. 65, 165401 (2002). Users Guide: Solution of Large Scale Eigenvalue Prob- [9] D.ZhangandE.Polizzi,J.Comput.Elec.7,427(2008). lems with ImplicitlyRestarted Arnoldi Methods. (Society [10] D. Zhang and E. Polizzi, 2008 NSTI Nanotechnology for Industrial and Applied Mathematics, Philadelphia, Conference and Trade Show. Technical Proceedings. 1, PA, 1998). 12 (2008). [15] O.SchenkandK.G¨artner,JournalofFutureGeneration [11] T. Sakurai and H. Sugiura, J. of Comput. and Appl. Computer Systems 20, 475 (2004). Math. 159, 119 (2003). [16] D. Zhang and E. Polizzi, in preparation (2009). [12] T. Sakurai, Y. Kodaki, H. Tadano, D. Takahashi, [17] E. Polizzi and A. Sameh, Parallel Computing 32, 177 M. Sato, and U. Nagashima, Future Generation Com- (2006). puterSystems24, 613 (2008). [18] L. Blackford, J. Choi, A.Cleary, E. D’Azevedo,J. Dem- [13] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Dem- mel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, mel, J. D. C. J. Dongarra, A. Greenbaum, S. Hammar- A. Petitet, et al., ScaLAPACK Users Guide (Society for ling, A. McKenney, and D. Sorensen, LAPACK Users Industrial and Applied Mathematics, Philadelphia, PA, Guide (Society for Industrial and Applied Mathematics, 1997). Philadelphia, PA,1999), third ed.ed.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.