Improving Polynomial-filtered Hybrid Monte Carlo With Hasenbusch

arXiv:1702.00124v1 [hep-lat] 1 Feb 2017

Taylor Haar∗
CSSM, Department of Physics, The University of Adelaide, Adelaide, SA, Australia 5005
E-mail: [email protected]

Waseem Kamleh
CSSM, Department of Physics, The University of Adelaide, Adelaide, SA, Australia 5005
E-mail: [email protected]

James Zanotti
CSSM, Department of Physics, The University of Adelaide, Adelaide, SA, Australia 5005
E-mail: [email protected]

Yoshifumi Nakamura
RIKEN Advanced Institute for Computational Science, Kobe, Hyogo 650-0047, Japan
E-mail: [email protected]

The predominant method for generating Lattice QCD configurations is Hybrid Monte Carlo (HMC). In order to speed up this generation, a wide range of preconditioning techniques that modify the lattice action have been devised. This work compares the performance of the well-known Hasenbusch preconditioning technique with the polynomial filtering technique on a small $16^3\times 32$ lattice with two flavours of Wilson fermions at a pion mass $M_\pi \sim 400$ MeV. We explore a novel method of combining polynomial and Hasenbusch filters, revealing a speedup when compared to the standard two Hasenbusch filters. This comes with the added advantage of simplified tuning.

The 26th International Nuclear Physics Conference
11-16 September, 2016
Adelaide, Australia

∗ Speaker.

© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). http://pos.sissa.it/

1. Introduction and Motivation

Lattice QCD is the non-perturbative method of choice when dealing with strong interactions. In order to measure observables on the lattice, we numerically evaluate path integrals via an ensemble average over a large number of configurations, each of which is described by the state of the gauge field $U$ and fermion field $\psi$.
In order to generate these configurations, we most commonly use Hybrid Monte Carlo (HMC), which involves repeated inversions of the Dirac matrix $M$ that describes the strong force between fermions at any two lattice sites. The large number of inversions required and the size of the matrix involved mean that HMC is very computationally intensive, making it difficult to simulate near physical quark masses.

A large variety of optimization techniques have been applied to HMC. These techniques reduce the number of matrix inversions required or improve the condition number of the fermion matrix such that the overall cost is reduced. In our work [1], we examine mass preconditioning [2] and polynomial filtering [3] on a $16^3\times 32$ lattice to compare their performance, before investigating a more novel technique where these two methods are combined. In section 2, we explain the formulation of both methods before the initial comparison in section 3. Then in section 4 we investigate the performance benefits of using polynomial filters on top of mass preconditioners.

2. Theory

The standard method for generating lattice configurations with dynamical fermions is Hybrid Monte Carlo (HMC). The starting point is the lattice action

$$S = S_G[U] + S_F[U,\psi,\bar\psi], \tag{2.1}$$

consisting of a gluonic part $S_G$ that depends purely on the gauge fields $U$ and a fermionic part $S_F$ that also depends on the fermion fields $\psi,\bar\psi$. We wish to find configurations $(U,\psi,\bar\psi)$ distributed according to $\exp(-S[U,\psi,\bar\psi])$. To do this, we first use Wick's theorem to convert the fermion fields $\psi$ to bosonic pseudofermion fields $\phi$ to make them computationally friendly, modifying the fermion action in the process. We then need to sample $\phi$ from the distribution $\exp[-S_F(U,\phi,\phi^\dagger)]$, which can be done in practice by relating $\phi$ to Gaussian noise vectors $\chi$ distributed according to $\exp[-\chi^\dagger\chi]$.
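The pseudofermion sampling step can be illustrated on a small dense matrix. The sketch below assumes the two-flavour action $S_F = \phi^\dagger (M^\dagger M)^{-1}\phi$ used later in this section and the common choice $\phi = M^\dagger\chi$, which the text does not spell out. The random matrix `M`, its size, and the function names `sample_pseudofermion` and `fermion_action` are illustrative stand-ins, not lattice objects or code from [1].

```python
import numpy as np

def sample_pseudofermion(M, rng):
    """Draw phi with weight exp(-phi^dagger (M^dagger M)^{-1} phi).

    Assumes the relation phi = M^dagger chi, where chi is complex Gaussian
    noise with weight exp(-chi^dagger chi).
    """
    n = M.shape[0]
    # Real and imaginary parts each have variance 1/2, so E[|chi_i|^2] = 1.
    chi = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    phi = M.conj().T @ chi
    return phi, chi

def fermion_action(M, phi):
    """Evaluate S_F = phi^dagger (M^dagger M)^{-1} phi with a dense solve."""
    K = M.conj().T @ M
    return np.vdot(phi, np.linalg.solve(K, phi)).real

rng = np.random.default_rng(1)
# Toy stand-in for the Dirac matrix: identity plus a small random complex part.
M = np.eye(8) + 0.1 * (rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8)))
phi, chi = sample_pseudofermion(M, rng)
print(fermion_action(M, phi), np.vdot(chi, chi).real)
```

With this choice the fermion action evaluated on the sampled field equals $\chi^\dagger\chi$, so drawing Gaussian noise is enough to obtain a correctly distributed $\phi$. A production code would replace the dense solve with an iterative solver.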
To generate correctly distributed gauge fields $U$, we introduce a fictitious conjugate momentum field $P$ distributed according to $\exp[-\sum \mathrm{tr}[P^2]]$, then produce configurations which preserve the Hamiltonian

$$H[U,\phi,\phi^\dagger] = \sum \mathrm{tr}[P^2] + S[U,\phi,\phi^\dagger] \tag{2.2}$$

by using Hamilton's equations, leading to integration steps

$$\hat T[\epsilon]: (P,U) \to (P,\, e^{i\epsilon P}U) \tag{2.3a}$$

and

$$\hat S[\epsilon]: (P,U) \to (P - \epsilon F,\, U), \tag{2.3b}$$

where $F = \partial S/\partial U$ is the force term. We combine a sequence of these steps into a trajectory to take an existing configuration $(P,U)$ and produce a new state $(P',U')$, such that we get to a new candidate configuration $(U',\phi')$. This candidate undergoes a Metropolis acceptance step to ensure detailed balance and hence that the process will converge to the target distribution $\exp[-S]$.

The main cost arises from the fermionic force term, which in the two-flavour degenerate case $S_F = \phi^\dagger (M^\dagger M)^{-1}\phi$ takes the form

$$\frac{\partial S_F}{\partial U} = \frac{\partial}{\partial U}\left[\phi^\dagger (M^\dagger M)^{-1}\phi\right] = -\phi^\dagger (M^\dagger M)^{-1}\,\frac{\partial (M^\dagger M)}{\partial U}\,(M^\dagger M)^{-1}\phi. \tag{2.4}$$

The issue here is the inversion of the fermion matrix $K \equiv M^\dagger M$. It is a very large, sparse matrix, so it takes iterative solvers (e.g. conjugate gradient) many matrix multiplications to invert. It can also give rise to large force terms, which necessitate a reduction in the step-size to keep a good Metropolis acceptance rate and thus increase the number of required inversions. Therefore, modern lattice simulations use a variety of HMC improvements to reduce the force terms (thus reducing the number of required inversions) or to make the matrix $K$ easier to invert.

A template for achieving a reduction in computational cost was proposed in [4], which considers splitting the action into two terms

$$S = S_{UV} + S_{IR}, \tag{2.5}$$

such that $S_{UV}$ captures the high-energy modes ($\sim$ large forces) of the action whilst $S_{IR}$ captures the low-energy modes ($\sim$ small forces), and $F_{UV}$ is relatively cheap to calculate compared to $F_{IR}$. Such terms can then be placed on different time-scales using a multiple time-scale integrator, which allows the expensive $F_{IR}$ to be evaluated less often and hence reduces the cost.

One of the standard HMC improvement techniques is mass preconditioning (MP) [2], where we factorize the fermion action $S_F$ into two terms as follows:

$$S_{MP} = \phi_1^\dagger J^{-1}\phi_1 + \phi_2^\dagger J K^{-1}\phi_2. \tag{2.6}$$

Here, $J$ is a fermion matrix just like $K$, but with a modified mass parameter $\kappa' < \kappa$ giving rise to a 'heavier' quark mass. The first action term $S_1$ captures the high-energy modes but has a cheaper force term than $S_2$, so we can use multiple time-scales to reduce the overall cost.

We compare this technique to polynomial filtering (PF) [3], whereby the fermion action is filtered via

$$S_{PF} = \phi_1^\dagger P(K)\phi_1 + \phi_2^\dagger [P(K)K]^{-1}\phi_2, \tag{2.7}$$

where $P(K)$ is a low-order polynomial approximating $K^{-1}$. In our work, we use Chebyshev polynomials such that the only parameter to tune is the polynomial order $p$. The polynomial filter term $S_1$ has a very cheap force term due to a lack of matrix inversions, and it captures the high-energy modes of the system by virtue of $P(K)$ approximating the inverse. Hence, we can put $S_1$ and $S_2$ on separate time-scales to achieve a cost reduction.

3. Comparison of filtering methods

3.1 Setup

Our initial study compares the computational cost of polynomial filtering against mass preconditioning. We use an $N_f = 2$, $16^3\times 32$ lattice with a Wilson fermion action with hopping parameter $\kappa = 0.15825$, and even-odd preconditioning. This lattice has a pion mass of $\sim 400$ MeV and a lattice spacing of $\sim 0.08$ fm.

The metric we use for the cost is

$$C = \frac{N_{mat}}{P_{acc}}, \tag{3.1}$$

where $N_{mat}$ is the number of fermion matrix $K$ multiplications per trajectory and $P_{acc}$ is the Metropolis acceptance rate. We average this quantity over at least 2000 trajectories for each displayed data point in the graphs that follow.

As noted in equation (2.5), we split the different action terms onto different integration time-scales.
By way of example, the single polynomial filter action (1PF) takes the form

$$S = S_G + \phi_1^\dagger P(K)\phi_1 + \phi_2^\dagger [P(K)K]^{-1}\phi_2, \tag{3.2}$$

and we set the step-sizes $h_0 = h_G < h_1 < h_2$ to reflect the relative cost and frequency scales. The corresponding number of steps $n_i$ at each scale is given by the relation $\tau = h_i n_i$, where $\tau$ is the trajectory length; we have fixed $\tau = 1$ as is standard. We use a generalized integration scheme (see section 3.2) that gives a very flexible choice of step-sizes, and the 'balanced forces' method to tune these step-sizes. See the paper [1] for further details.

3.2 Generalized multi-scale integration

To create multiple time-scales in a HMC integration, one typically uses a nested integration scheme. Finer integration scales are built by substituting integration schemes for the time updates in the original integration scheme. For example, consider the space-time-space leapfrog integrator

$$I_{LPF}[\tau] = \left(\hat S\!\left[\tfrac{h}{2}\right]\hat T[h]\,\hat S\!\left[\tfrac{h}{2}\right]\right)^{n}, \tag{3.3}$$

where $n = \tau/h$ and $\hat T$, $\hat S$ are as defined in (2.3a), (2.3b). To add a nested integration scale to this scheme, we can replace the time updates $\hat T[h]$ with $m$ leapfrog steps in the second term $S_2$:

$$I_2[h] = \left(\hat S_2\!\left[\tfrac{h}{2m}\right]\hat T\!\left[\tfrac{h}{m}\right]\hat S_2\!\left[\tfrac{h}{2m}\right]\right)^{m}, \tag{3.4}$$

such that

$$I_{nested}[\tau] = \left(\hat S_1\!\left[\tfrac{h_1}{2}\right] I_2[h_1]\,\hat S_1\!\left[\tfrac{h_1}{2}\right]\right)^{n}. \tag{3.5}$$

This ensures that $S_1$ is integrated with step-size $h_1 = \tau/n$ and $S_2$ is integrated with finer step-size $h_2 = h_1/m$. However, nested schemes force the step-sizes of each scale to evenly divide those on each coarser scale.

We have devised a generalized scheme for constructing multiple integration time-scales without any relative step-size restriction. The basic idea is to note that when we integrate some Hamiltonian $H = T + S = T + S_1 + S_2 + \dots$, we always have just one kind of time step $\hat T[h]$ that is applied in a uniform direction $h > 0$. It thus makes sense to parametrize the progress of time-steps via a time parameter $t$ that increases monotonically from $0$ to $\tau$.
We can then combine integration schemes for the Hamiltonians of each action term, $H_i = T + S_i$, into a generalized scheme for the full Hamiltonian $H = T + S$ by integrating with time steps from $0$ to $\tau$, inserting the action step updates $\hat S_i[h]$ at their respective 'times' $t$ in the composite integrators. For example, consider a two-step and a three-step leapfrog integrator for the two action terms:

$$I_1[\tau] = \hat S_1\!\left[\tfrac{\tau}{4}\right]\hat T\!\left[\tfrac{\tau}{2}\right]\hat S_1\!\left[\tfrac{\tau}{2}\right]\hat T\!\left[\tfrac{\tau}{2}\right]\hat S_1\!\left[\tfrac{\tau}{4}\right], \tag{3.6a}$$

and

$$I_2[\tau] = \hat S_2\!\left[\tfrac{\tau}{6}\right]\hat T\!\left[\tfrac{\tau}{3}\right]\hat S_2\!\left[\tfrac{\tau}{3}\right]\hat T\!\left[\tfrac{\tau}{3}\right]\hat S_2\!\left[\tfrac{\tau}{3}\right]\hat T\!\left[\tfrac{\tau}{3}\right]\hat S_2\!\left[\tfrac{\tau}{6}\right]. \tag{3.6b}$$

We can combine these two schemes via the generalized scheme by overlaying the space updates based on their position in 'time'. This produces the integrator

$$I[\tau] = \hat S_1\!\left[\tfrac{\tau}{4}\right]\hat S_2\!\left[\tfrac{\tau}{6}\right]\hat T\!\left[\tfrac{\tau}{3}\right]\hat S_2\!\left[\tfrac{\tau}{3}\right]\hat T\!\left[\tfrac{\tau}{6}\right]\hat S_1\!\left[\tfrac{\tau}{2}\right]\hat T\!\left[\tfrac{\tau}{6}\right]\hat S_2\!\left[\tfrac{\tau}{3}\right]\hat T\!\left[\tfrac{\tau}{3}\right]\hat S_2\!\left[\tfrac{\tau}{6}\right]\hat S_1\!\left[\tfrac{\tau}{4}\right]. \tag{3.7}$$

Further details are given in [1], where we also prove that this method produces an integration scheme that meets the requirements for detailed balance, given that the composite integrators also do so.

3.3 Results

Figure 1 shows the cost function $C$ (3.1) for a single polynomial filter and for a single mass preconditioner. As one can see, the mass preconditioned action performs much better: a cost of $C = 43{,}800 \pm 3{,}500$ at $\kappa' = 0.1545$ compared with $C = 87{,}500 \pm 7{,}400$ at $p = 10$. This suggests that a low-order polynomial of order 10 cannot capture as much of the dynamics as a mass preconditioner that requires 80 or more iterations to invert.

[Figure 1 plot: two panels, "Polynomial" (cost $C$ in units of $10^4$ versus polynomial order $p$) and "Mass prec." (cost $C$ in units of $10^4$ versus $\kappa'$).]

Figure 1: The simulation cost function (3.1) for (single filter) polynomial filtering and mass preconditioning.
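The overlay construction of section 3.2 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in [1]: it only builds the ordered list of updates, with exact rational step-sizes, and the names `leapfrog_kicks` and `overlay` are hypothetical. It relies on the fact that force updates falling at the same 'time' commute, since each only shifts $P$ by a function of $U$.

```python
from fractions import Fraction

def leapfrog_kicks(label, n_steps, tau=Fraction(1)):
    """Force updates (time, label, weight) of an n-step leapfrog for S_label."""
    h = tau / n_steps
    # Half-weight kicks at both ends of the trajectory, full-weight kicks between.
    return [(k * h, label, h / 2 if k in (0, n_steps) else h)
            for k in range(n_steps + 1)]

def overlay(schemes, tau=Fraction(1)):
    """Merge several kick lists into one sequence of S and T updates."""
    kicks = [kick for scheme in schemes for kick in scheme]
    # Kicks at equal times commute; this ordering only keeps the result palindromic.
    kicks.sort(key=lambda k: (k[0], k[1] if 2 * k[0] <= tau else -k[1]))
    sequence, now = [], Fraction(0)
    for time, label, weight in kicks:
        if time > now:
            # Advance the single shared time update up to the next kick.
            sequence.append(("T", time - now))
            now = time
        sequence.append((f"S{label}", weight))
    return sequence

# Two-step leapfrog for S_1 overlaid with a three-step leapfrog for S_2.
for name, step in overlay([leapfrog_kicks(1, 2), leapfrog_kicks(2, 3)]):
    print(name, step)
```

For the two-step and three-step leapfrogs of (3.6a) and (3.6b) with $\tau = 1$, the resulting sequence reproduces the ordering and step-sizes of (3.7). Because the step counts are independent integers, neither step-size has to divide the other.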
We can improve the performance of both methods by using two filters instead of one. In the case of polynomial filtering, we take two polynomials $P_1(K)$ and $P_2(K)$ with orders $p_2 > p_1$ that factorize into a polynomial $Q(K) = P_2(K)/P_1(K)$, such that we can use the action

$$S_{2PF} = \phi_1^\dagger P_1(K)\phi_1 + \phi_2^\dagger Q(K)\phi_2 + \phi_3^\dagger [P_2(K)K]^{-1}\phi_3 \tag{3.8}$$

and thus keep the cost of evaluating the intermediate force $F_2$ low. In the case of mass preconditioning, we use two modified mass parameters $\kappa_1 < \kappa_2 < \kappa$, and the action

$$S_{2MP} = \phi_1^\dagger J_1^{-1}\phi_1 + \phi_2^\dagger J_1 J_2^{-1}\phi_2 + \phi_3^\dagger J_2 K^{-1}\phi_3. \tag{3.9}$$

To keep the parameter space manageable, we fixed the cheapest term $S_1$ in both cases for our analysis, namely $p_1 = 4$ and $\kappa_1 = 0.145$.

Figure 2 shows the cost of these two methods side by side. In both cases we see improved results over the one-filter case. Two polynomial filters reduce the cost approximately as much as a single mass preconditioner: a cost of $C = 47{,}700 \pm 3{,}700$ at $p_2 = 54$. Two mass preconditioners reduce the cost further, to $C = 31{,}100 \pm 2{,}200$ at $\kappa_2 = 0.1555$.

[Figure 2 plot: two panels, "2PF" (cost $C$ in units of $10^4$ versus $p_2$) and "2MP" (cost $C$ in units of $10^4$ versus $\kappa_2$).]

Figure 2: The cost function (3.1) for 2 filter actions, comparing polynomial filtering (2PF) and mass-preconditioning (2MP).

4. Combined filters

4.1 Combining polynomial filtering and mass-preconditioning

It is informative to compare the two techniques' relative efficiency. Polynomial filtering works better at capturing the high-frequency (energy) modes of the system, because we can then use a low polynomial order. To capture lower frequency modes, we have to increase the polynomial order and the cost to evaluate the force terms increases dramatically. On the other hand, Hasenbusch filtering is best suited to small changes $\Delta\kappa = \kappa' - \kappa$ in the quark mass, because this implies that $JK^{-1} \approx I$ and hence the correction term is easier to invert. However, the filter term $\phi_1^\dagger J^{-1}\phi_1$ would then have a more expensive force term due to the lighter mass.
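The trade-off for the polynomial filter can be made concrete with a scalar toy model. The sketch below is only an illustration under stated assumptions: it treats the spectrum of $K$ as an interval $[\lambda_{min}, 1]$ and takes $P$ to be the Chebyshev interpolant of $1/x$ on that interval, which may differ from the exact Chebyshev construction used in [1]. The function name `max_residual` and the interval endpoints are hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def max_residual(order, lam_min, lam_max=1.0, n_grid=2001):
    """Largest |1 - x P(x)| on [lam_min, lam_max] for a filter of given order.

    P is the Chebyshev interpolant of 1/x on the interval; this particular
    construction is an assumption made for the sketch.
    """
    P = Chebyshev.interpolate(lambda x: 1.0 / x, order, domain=[lam_min, lam_max])
    x = np.linspace(lam_min, lam_max, n_grid)
    return np.max(np.abs(1.0 - x * P(x)))

# Compare a low and a higher order on a narrow and a wide spectral interval.
for lam_min in (0.1, 0.01):
    for order in (4, 20):
        print(f"lam_min={lam_min}, p={order}: residual {max_residual(order, lam_min):.3g}")
```

In this toy setting, an order-4 polynomial approximates the inverse reasonably well when only the upper part of the spectrum is covered, but extending the interval towards smaller eigenvalues at fixed order degrades the approximation, and a higher order is needed to recover it.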
To combine the benefits of both of these methods, we place a polynomial filter on top of a mass preconditioner, resulting in the action

$$S_{PF\text{-}MP} = \phi_1^\dagger P(J)\phi_1 + \phi_2^\dagger [P(J)J]^{-1}\phi_2 + \phi_3^\dagger J K^{-1}\phi_3, \tag{4.1}$$

where $J = K(\kappa')$, $\kappa' < \kappa$ and $P(J)$ is a polynomial. The idea here is that the high frequency modes are captured by the cheap polynomial $P(J)$, such that the relative mass of the Hasenbusch correction term $\Delta\kappa$ can be kept small.

We tested this on our lattice with the polynomial $P(J)$ fixed to order $p = 4$ and $\kappa'$ varying. The cost function for various $\kappa'$ is shown in Figure 3 in the second column, which can be compared to the 2MP action in the first column. As the graph shows, the cost for PF-MP is very similar to that of 2MP.

4.2 3-filter actions

Note that in the cost function for 2MP and PF-MP, the cost increases significantly depending on the choice of the intermediate term parameter $\kappa_2$ or $\kappa'$, so a degree of tuning is required to achieve a minimum. This is not a cheap procedure, as we need at least 500 trajectories to get a decent handle on the acceptance rate $P_{acc}$ for each set of parameters and hence the cost. This is complicated by the fact that the Hasenbusch parameter $\kappa_2$ or $\kappa'$ is a real parameter whose optimal value is heavily dependent on the quark mass $\kappa$, so we would have to tune it again for different quark masses. This makes it difficult to optimize an action with three mass preconditioners effectively, because there are too many parameters $\kappa_1, \kappa_2, \kappa_3$ which require fine tuning.

On the other hand, polynomial filtering only depends on a single integer parameter $p$ once a class of polynomials is chosen, and the same polynomial order should filter out similar proportions of the action regardless of $\kappa$. Hence, a polynomial filtering term requires very little to no tuning once a good set of polynomials is found. We can thus add more polynomial filters to our action without increasing the time it takes to tune.
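Adding filter terms in this way does not change the distribution being sampled, because the pseudofermion determinants telescope. The following sketch checks this numerically for the combined action (4.1), using the standard fact that a term $\phi^\dagger A \phi$ integrates to a weight proportional to $\det(A)^{-1}$. The small random matrices standing in for $K$ and $J$, the polynomial coefficients, and the helper names are all illustrative assumptions, not lattice quantities.

```python
import numpy as np

def random_spd(n, rng):
    """Random symmetric positive-definite matrix (toy stand-in for M^dagger M)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + np.eye(n)

def logdet(a):
    """log|det a|, computed via slogdet for numerical stability."""
    return np.linalg.slogdet(a)[1]

rng = np.random.default_rng(0)
n = 6
K = random_spd(n, rng)                           # stand-in for K at the target mass
J = K + 0.5 * np.eye(n)                          # stand-in for the heavier-mass J
P = np.eye(n) + 0.3 * J + 0.05 * J @ J           # arbitrary invertible polynomial P(J)

# Matrices sandwiched by the three pseudofermion fields in (4.1).
terms = [P, np.linalg.inv(P @ J), J @ np.linalg.inv(K)]

# Each term phi^dagger A phi contributes det(A)^{-1} to the weight.
log_weight = -sum(logdet(A) for A in terms)
print(log_weight, logdet(K))
```

The two printed numbers agree, showing that the product of the three factors reduces to $\det K$ for any invertible $P(J)$ and $J$. The choice of filters therefore affects only the cost and not the target distribution.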
We consider one polynomial filter on top of two mass preconditioners (1PF-2MP),

$$S_{1PF\text{-}2MP} = \phi_1^\dagger P(J_1)\phi_1 + \phi_2^\dagger [P(J_1)J_1]^{-1}\phi_2 + \phi_3^\dagger J_1 J_2^{-1}\phi_3 + \phi_4^\dagger J_2 K^{-1}\phi_4, \tag{4.2}$$

where we fix $p = 4$, $\kappa_1 = 0.145$; and two polynomial filters on top of a single mass preconditioner (2PF-1MP),

$$S_{2PF\text{-}1MP} = \phi_1^\dagger P_1(J)\phi_1 + \phi_2^\dagger Q(J)\phi_2 + \phi_3^\dagger [P_2(J)J]^{-1}\phi_3 + \phi_4^\dagger J K^{-1}\phi_4, \tag{4.3}$$

with $Q(J) = P_2(J)/P_1(J)$ as in (3.8), where we fix $p = 4$, $q = 20$. The cost functions for these two actions are shown in columns three and four respectively in Figure 3.

[Figure 3 plot: four panels, "2MP" (versus $\kappa_2$), "1PF-1MP" (versus $\kappa'$), "1PF-2MP" (versus $\kappa_2$) and "2PF-1MP" (versus $\kappa'$), each showing the cost $C$ in units of $10^4$.]

Figure 3: The cost function (3.1), comparing mass preconditioning (2MP) with polynomial filtered mass preconditioning (1PF-1MP, 2PF-1MP, 1PF-2MP).

As can be seen in Figure 3, the optimal costs for the two three-filter actions are very similar to those for PF-MP and 2MP. The extra polynomial filter in 1PF-2MP on top of 2MP has no effect on the performance. However, for 2PF-1MP the cost has a much lower dependence on the choice of the Hasenbusch parameter $\kappa'$, as can be seen at the $\kappa' = 0.1565$ data point. Since the polynomial filter orders $(p, q)$ do not require much tuning, this indicates that the 2PF-1MP action does not require much tuning to reach close-to-optimum computational costs. In fact, it requires even less tuning than 2MP.

5. Conclusion and Outlook

We have compared the performance of polynomial filtering and mass preconditioning on a modest $16^3\times 32$ lattice before combining the two techniques to try to achieve lower costs. Whilst we did not observe any significant improvements in cost over the 'standard' 2MP algorithm, we did find that 2PF-1MP costs much less to tune due to the cost's low dependence on $\kappa'$ and the fact that the polynomial filters do not require very much tuning. It remains to be seen if this behaviour still holds for larger lattices and for smaller quark masses.
References

[1] T. Haar, W. Kamleh, J. Zanotti and Y. Nakamura, Applying polynomial filtering to mass preconditioned Hybrid Monte Carlo, 1609.02652.

[2] M. Hasenbusch and K. Jansen, Speeding up the HMC: QCD with clover improved Wilson fermions, Nucl. Phys. Proc. Suppl. 119 (2003) 982–984, [hep-lat/0210036].

[3] W. Kamleh and M. Peardon, Polynomial Filtered HMC: An Algorithm for lattice QCD with dynamical quarks, Comput. Phys. Commun. 183 (2012) 1993–2000, [1106.5625].

[4] TrinLat collaboration, M. J. Peardon and J. Sexton, Multiple molecular dynamics time scales in hybrid Monte Carlo fermion simulations, Nucl. Phys. Proc. Suppl. 119 (2003) 985–987, [hep-lat/0209037].