Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation A.B.Duncan SchoolofMathematicalandPhysicalSciences UniversityofSussex Falmer and G.A.Pavliotis DepartmentofMathematics 7 ImperialCollegeLondon 1 LondonSW72AZ,UK 0 2 and n K.C.Zygalakis a SchoolofMathematics J UniversityofEdinburgh 6 1 Edinburgh,EH93FD,UK ] January17,2017 E M t. Abstract a t Foragiventargetdensityπ : Rd (cid:55)→ R, thereexistaninfinitenumberofdiffusionprocesseswhichhave s uniqueinvariantdensityπ. Asobservedinanumberofpapers[7,37,38]samplersbasedonnonreversibledif- [ fusionprocessescansignificantlyoutperformtheirreversiblecounterpartsbothintermsofasymptoticvariance 1 andrateofconvergencetoequilibrium. Inthispaper,wetakeadvantageofthisinordertoconstructefficient v samplingalgorithmsbasedontheLie-Trotterdecompositionofanonreversiblediffusionprocessintoreversible 7 andnonreversiblecomponents. Weshowthatsamplersbasedonthisschemecansignificantlyoutperformstan- 4 dardMCMCmethods, atthecostofintroducingsomecontrolledbias. Inparticular, weprovethatnumerical 2 integrators constructed according to this decomposition are geometrically ergodic and characterize fully their 4 asymptoticbiasandvariance, showingthatthesamplerinheritsthegoodmixingpropertiesoftheunderlying 0 . nonreversible diffusion. This is illustrated further with a number of numerical examples ranging from highly 1 correlatedlowdimensionaldistributions,tologisticregressionproblemsinhighdimensionsaswellasinference 0 forspatialmodelswithmanylatentvariables. 7 1 v: 1 Introduction i X Consider the problem of computing expectations with respect to a probability distribution with smooth density r a π(x),knownonlyuptoanormalizationconstant,i.e. wewishtoevaluate (cid:90) π(f)= f(x)π(x)dx. (1.1) Rd For high dimensional distributions, deterministic techniques are no longer tractable. On the other hand, proba- bilisticmethodsdonotsufferthesamecurseofdimensionalityandthusareoftenthemethodofchoice. Onesuch approachisMarkovChainMonteCarlo(MCMC)whichisbasedontheconstructionofaMarkovprocessonRd whoseuniqueinvariantdistributionisπ(x). Duetotheirsimplicityandwideapplicability,Markovchainsbased onMetropolis-Hastings(MH)transitionkernels[13,28]andtheirnumerousvariantsremainthemostwidelyused schemeforsamplingfromageneraltargetprobabilitydistribution,despitehavingbeenintroducedover60years ago. AsthereareinfinitelymanyMarkovprocesseswhichareergodicwithrespecttoagiventargetdistribution 1 π,anaturalquestioniswhetheraMarkovprocesscanbechosenwhichismoreefficient,intermsofconvergence toequilibriumandmixing. MetropolizedschemesarereversibleMarkovchainsbyconstruction,i.e. theysatisfy detailedbalance. Itisawelldocumentedfactthatnonreversiblechainsmightconvergencetoequilibriumfaster than reversible ones [34, 5, 32]. Various MCMC schemes have been proposed which are based on the general ideaofbreakingreversibilitybyintroducinganaugmentedtargetmeasureonanextendedstatespace,alongwith dynamicswhichareinvariantwithrespecttotheaugmentedtargetmeasure. Fordiscretestatespaces,thelifting method[5,15,47]isonesuchapproach,wheretheMarkovchainis“lifted”fromthestatespaceEtoE 1, 1 . ×{ − } ThetransitionprobabilitiesineachcopyofEaremodifiedtointroducetransitionsbetweenthecopiestopreserve theinvariantdistributionbutnowpromotethesamplertogeneratelongtrajectories. Forcontinuousstatespaces, analogous approaches involve augmenting the state space with a velocity/momentum variable and constructing Makoviandynamicswhichareabletomixmorerapidlyintheaugmentedstatespace. SuchmethodsincludeHy- bridMonteCarlo(HMC)methods,inspiredbyHamiltoniandynamics. WhilethestandardconstructionofHMC [6,35]isreversible,itisstraightforwardtoconstructdynamicsbasedontheGeneralizedHMCscheme[14]which willnotbereversible,seealso[36]andmorerecently[23]. Deferring issues of simulation until later, another candidate Markov process for sampling from π is the diffu- sion(X ) definedbythefollowingItôstochasticdifferentialequation(SDE): t t≥0 dX =b(X )dt+√2dW , (1.2) t t t whereW isastandardRd–valuedBrownianmotionandb:Rd Rdisasmoothvectorfieldwhichsatisfies t → b(x)= logπ(x)+γ(x), (π(x)γ(x))=0, (1.3) ∇ ∇· forsomesmoothvectorfieldγ onRd satisfyingsomemildassumptions(c.f. Proposition2.2). TheprocessX is t nonreversibleifandonlyifγ =0. BytheBirkhoffergodictheorem, (cid:54) 1 (cid:90) T lim f(X )ds=E [f], f L1(π), s π T→∞T 0 ∈ andthusonecanuse 1 (cid:90) T π (f):= f(X )ds T s T 0 asanestimatorforE [f],forT sufficientlylarge. Anaturalwaytomeasuretheefficiencyofsuchestimatoristhe π meansquareerror(MSE)givenby MSE(T):=Eπ (f) π(f)2. (1.4) T | − | UnderappropriateconditionsonX andf,theestimatorπ (f)willsatisfyacentrallimittheorem,i.e. t T lim √T (π (f) E [f])= (0,2σ2(f)), (1.5) T π T→+∞ − N whereσ2(f)istheasymptoticvarianceoftheestimatorπ (f)whichcanbeexpressedby T σ2(f):= φ,( )φ , (1.6) (cid:104) −L (cid:105)π where istheinfinitesimalgeneratorof(1.2)andφisthemeanzerosolutionofthefollowingPoissonequation onRd,L φ=f π(f). (1.7) −L − This relationship can be used to simplify the expression for the MSE (1.4) and decompose it in terms of bias µ (f)andvarianceσ2(f)asfollows T T Eπ (f) π(f)2 =(Eπ (f) π(f))2+E(π (f) Eπ (f))2 =(µ (f))2+σ2(f). | T − | T − T − T T T For large T, the variance satisfies σ2(f) T−1σ2(f), while µ (f)2 = o(T−1). Since γ(x) is not uniquely T (cid:39) T definedin(1.3),anaturalquestionishowitshouldbechosentoensurethatforagiventimeT,theMSEin(1.4) is as small as possible. This can be achieved in two manners, the first by maximising the L2(π)-spectral gap associatedwith(1.2)asstudiedin[20,48]andhenceincreasingthespeedwithwhichµ convergestozero. In T 2 general,maximisingtheL2(π)-spectralgapischallenging. Analternativeistochooseγ(x)insuchawaysoasto reducetheasymptoticvarianceσ2(f). Itshouldbeemphasisedthattheoptimalchoicewillbedifferentforeach case. In particular in [7, 37, 38], it was shown that the choice γ(x) = 0, which corresponds to using reversible dynamics,givesthemaximumvalueofasymptoticvarianceforagivenchoiceofdiffusiontensor. Inparticular, introducinganonreversibleperturbationwillneverdecreasetheperformanceofanestimatorbasedonLangevin dynamics,bothintermsofconvergencetoequilibriumandasymptoticvariance. In general (1.2) cannot be simulated exactly, and one typically resorts to a discretisation of the SDE, denoted byX(cid:98)∆t,inordertoapproximateπ(f). Inparticular,thefollowingergodicaverageisused n N 1 (cid:88) π(cid:98)T∆t(f):= N f(X(cid:98)k∆t), N∆t=T. (1.8) k=0 Extra caution has to be taken in order to ensure that the above quantity converges in the limit of T since → ∞ even if (1.2) is ergodic (or even exponentially ergodic), this will not necessarily be the case for its numerical discretisation[39,43,44]. Inaddition,evenwhenthenumericaldiscretizationisergodicandthus (cid:90) lim π (f)=π∆t(f)= f(x)π∆t(x)dx, (1.9) (cid:98)T (cid:98) (cid:98) T→∞ Rd itisnottrueingeneralthatπ∆t =π,sincetheunderlyingnumericaldiscretizationintroducesbiasintheestimation (cid:98) ofπ(f)(see[45,1,2]). OnewaytoeliminatesuchbiasisthroughMetropolization[42,46],i.e. theintroduction ofanaccept-rejectstepthatensuresthatthecorrespondingMarkovchainisergodicwithrespecttothetargetdis- tributionπ. However,suchbiaseliminationmightnotbeadvantageousinpracticesincetheMetropolisedchain willbereversiblebyconstruction,thuseliminatinganybenefitintroducedbythenonreversibleperturbationγ. When computing expectations of distributions with expensive likelihoods, it might be too costly to sample a longMarkovchaintrajectory. IfanappropriatenonreversibleLangevindynamics(1.2)canbeintroducedwhich doesgiverisetoadramaticreductioninasymptoticvariance,thenitmightbeadvantageoustopermitacontrolled amountofbiasinexchangeforneedingtosamplefareless.Thisbias-variancetradeoff,inthecontextofnumerical discretisations of (1.2) is the subject of study of this paper. In particular, we will consider discretizations based onaLie-Trottersplittingbetweenthereversibleandthenonreversiblepartofthedynamics. Morespecifically,we considerintegratorsoftheform X(cid:98)n∆+t1 =Θ∆t◦Φ∆t(X(cid:98)n∆t), (1.10) whereΦ (x)isaintegratorthatapproximatestheflowmapcorrespondingtothedeterministicdynamics ∆t dx t =γ(x ), (1.11) t dt andΘ (x)whichapproximatesthereversibledynamics ∆t dx = logπ(x )dt+√2dW . (1.12) t t t ∇ ThechoiceofΦ ,Θ hasafundamentalinfluenceonthebias,asymptoticvarianceandstabilityoftheresulting ∆t ∆t sampler. Inparticular,ifonechoosesΦ tobeaMetropolisedintegrator[3]then,similarlytotheresultin[2], ∆t the order of convergence of the deterministic integrator Φ provides a lower bound for the difference between ∆t expectations with respect to π∆t and π. However, this is not the case for the numerical asymptotic variance (cid:98) σ2 (f), since even though we can show that it is a perturbation of σ2(f) the difference will depend crucially (cid:98)∆t onthechoiceofΘ . Theseresultsareimportantastheyallowtochoosethecorrectcombinationofdynamics ∆t andnumericalschemethatdrasticallyreducesthecomputationalcostrequiredtoachieveagiventoleranceoferror. Insummary,themainofthecontributionsofthispaperare 1. proving geometric ergodicity for the Markov chain given by (1.10) for a variety of different numerical integratorsappliedtothereversiblepart; 2. acompletecharacterisationoftheasymptoticbiasof(1.10); 3 3. showingthat,bycompletelycharacterisingtheasymptoticvariance,numericalintegratorsofthetype(1.10) inherittheasymptoticvariancebenefitsofthenonreversibleSDE(1.2); 4. exhibiting the potential of using nonreversible integrators for sampling as illustrated from a number of differentnumericalexperimentsoninferenceforspatialmodelsaswellasrealdatasets. Therestofthepaperisorganisedasfollows. InSection2wedescribesomeknowntheoreticalresultsforthe SDE(1.2)whicharenecessaryforthedevelopmentofthispaper. InSection3weidentifitysufficientconditions to guarantee geometric ergodicity of the Lie-Trotter splitting scheme (1.10) on Rd. In Section 4 we study the asymptotic properties of a class of numerical integrators for (1.2) for which the Lie-Trotter scheme is a special case. Inparticularwederiveperturbativeexpansionsfortheasymptoticbiasandvariance. InSection5weapply these results to characterise the asymptotic bias and variance of the Lie-Trotter scheme on the bounded domain Td.InSection6,wefocusonthecasewherethetargetdistributionisGaussianandstudyanalyticallythetrade-off betweentheasymptoticbiasandasymptoticvarianceinthiscase. Todemonstratetheefficacyoftheseschemes, inSection7wepresentanumberofnumericalexperimentsoninferenceforspatialmodelsaswellasonBayesian logisticregression. ProofsofthemainresultsofthispaperaredeferredtoSection8aswellastheAppendices. Finally,adiscussionoftheresultspresentedinthispaperandpotentialfutureresearchdirectionscanbefoundin Section9. 2 Properties of Overdamped Langevin Diffusions Inthissectionwediscussdifferentknowntheoreticalresultsthatareusefulforunderstandingthemainresultsof thepaper. WestartbylistingtheassumptionsweshallmakeonπandtheSDE(1.2)toensureergodicity. Assumptions2.1. 1. The measure π possesses a positive smooth density π(x) > 0, known up to a normalizing constant, such that π L1(Rd). ∈ 2. Thedriftvectorb:Rd Rdof (1.2)issmoothandsatisfies → b(x)= logπ(x)+γ(x), (2.1) ∇ whereγ :Rd RdisasmoothvectorfieldwithcomponentsinL1(π)suchthat → (π(x)γ(x))=0. (2.2) ∇· Thefollowingresultprovidesnecessaryandsufficientconditionsonthecoefficientsof(1.2)toensurethatX t possessesauniquestationarydistributionπ. Proposition2.2. SupposethatAssumptions2.1hold. ThenthediffusionprocessX definedby(1.2)possessesa t stronglycontinuoussemigroup(P ) onL2(π)definedby t t≥0 P f(x)=E[f(X ) X =x]. (2.3) t t 0 | Theassociatedinfinitesimalgeneratorisananextensionof 1 = (π )+γ (2.4) L π∇· ∇· ·∇ withcoreC∞(Rd). Moreover,P hasuniqueinvariantdistributionπ. Conversely,givenadiffusionprocessofthe c t form(1.2)whichisinvariantwithrespecttoπ,thenthedriftbnecessarilysatisfies(2.1)and(2.2). Proof. The first part of this result is a direct application of [21, Thm 8.1.26]. The converse implication can be checkedusingintegrationbyparts. While many choices for γ are possible (see [22] for a more complete recipe) a natural family of vector fields is given by γ(x) = J Φ(π(x)), where Φ is a smooth function satisfying Φ(π()) L1(π) and J is d d ∇ ∇ · ∈ × skew-symmetricmatrix. Weshallfocusspecificallyonthefollowingthreechoices: 4 (cid:82) 1. Ifπsatisfies logπ(x)π(dx)< ,thenthevectorfield Rd|∇ | ∞ γ(x)=J logπ(x), J = J(cid:62), (2.5) ∇ − satisfiescondition(2.2). Thiswasthechoicewhichwasstudiedspecificallyin[7]. 2. If(cid:82) logπ(x)π1+α(dx) < forsomeα > 0thenanothernaturalchoiceforthevectorfieldisgiven Rd|∇ | ∞ by γ(x)=J πα(x), J = J(cid:62). (2.6) ∇ − Although(2.6)introducesanadditionaltuningparameterα,onemightpreferthischoiceasitcoincideswith theintuitionthatwhenfarawayfromthemodesthesamplershouldmovetowardsthemodesasquicklyas possible,andshouldonlyundergothesedeterministicmeandersinregionsofhighprobability. 3. LetΨ:R Rbeasmooth,compactlysupportedfunction. Then → γ(x)=J logπ(x)Ψ(π(x)), J = J(cid:62), andβ R, (2.7) ∇ − ∈ willalwayssatisfy(2.2). Moreover,ifπhascompactlevelsets,thenγwillalsobecompactlysupportedon Rd. Applyingtheresultsdetailedin[11,31],weshallassumethattheprocessX possessesaLyapunovfunction, t whichissufficienttoensuretheexponentialergodicityofX ,asdetailedinthesubsequentproposition. t Assumptions 2.3 (Foster–Lyapunov Criterion). There exists a function V : Rd R and constants c > 0 and b Rsuchthat → ∈ V(x) cV(x)+b1 , andV(x) 1, x Rd, (2.8) C L ≤− ≥ ∈ where1 istheindicatorfunctionoverapetiteset. C Forthedefinitionofapetitesetwereferthereaderto[30]. Forthegenerator correspondingtotheprocess L (1.2)compactsetsarealwayspetite.TheexponentialergodicityofX followsfromthefollowingproposition(see t also[24,30]). Proposition2.4. SupposethatAssumption2.3holds,thenthereexistconstantsC >0andλ>0suchthat: P f(x) π(f) CU(x)e−λt, x Rd, (2.9) t | − |≤ ∈ forallf satisfying f U. | |≤ Moreover,theFoster-LyapunovcriterionalsoprovidesasufficientconditionforthePoissonequation(1.7)to bewell-posed,andthusforthecentrallimittheorem(1.5)tohold. Proposition 2.5. Suppose that Assumption 2.3 holds and that π(U2) < , then for any function f such that ∞ f U, the central limit theorem (1.5) holds, i.e. √T(π (f) π(f)) converges weakly to a (0,σ2(f))– T | | ≤ − N distributedrandomvariable,with (cid:90) σ2(f)= φ(x)( )φ(x)π(x)dx, Rd −L whereφistheuniquemeanzerosolutiontothePoissonequation(1.7). Moreoverthesolutionφcanbeexpressed as (cid:90) ∞ φ= [P f π(f)] dt. t − 0 Thefollowinglemmaprovidesasufficientconditiononπ for(1.2)topossessaLyapunovfunction. Itisaslight generalisationofasimilarresultfrom[40],extendedtoapplyalsointhecaseofnonreversiblediffusionprocesses. Lemma2.6. [40,Theorem2.3]ConsidertheprocessX definedby(1.2)withdriftcoefficientbsatisfying(2.1). t Supposethatπisbounded,thereexists0<δ <1suchthat, liminf(cid:0)(1 δ) logπ(x)2+∆logπ(x)(cid:1)>0, (2.10) |x|→∞ − |∇ | andthevectorfieldγ satisfies γ(x)=0, x Rd, (2.11) ∇· ∈ thentheFoster–Lyapunovcriterionholdsfor(1.2)withU(x)=π−δ(x)andmoreoverπ(U)< . ∞ Remark2.7. Notethatwhenγ(x) = J Φ(π(x))equation(2.11)isautomaticallysatisfied. Hencethechoices ∇ ofchoicesofγ specifiedby(2.5),(2.6)and(2.7)allsatisfy(2.11). 5 3 Geometric ergodicity of the splitting scheme on Rd InthissectionweidentifysufficientconditionsunderwhichtheLie-TrotterschemeonRdisgeometricallyergodic withrespecttoaninvariantdistributionπ∆t whichwillbeaperturbationofπ. Ingeneral,adiscretizationofthe (cid:98) ergodicdiffusionprocess(1.2)neednottobeergodic,geometricorotherwise,see[40]. Forthesplittingscheme we shall show that provided the approximate nonreversible flow Φ is sufficiently weak away from the origin, ∆t theprocess(1.10)willinheritthegeometricergodicityfromthereversibledynamics. (cid:16) (cid:17) We follow the Meyn and Tweedie [30] recipe to demonstrate geometric ergodicity of X(cid:98)∆t . Consider n n∈N thereversibleprocessdefinedby Z∆t =Θ Z∆t, (3.1) n+1 ∆t n andP(cid:101)∆tbethecorrespondingtransitionsemigroup.WeshallassumethatthereversibledynamicsareaMetropolis- Hastingschain,withproposalkernelq ( x),morespecifically,givenx Rd,Θ (x)isconstructedasfollows ∆t ∆t ·| ∈ 1. Sampley q ( x). ∆t ∼ ·| 2. Withprobability (cid:18) (cid:19) π(y)q (xy) ∆t α(x,y)=min 1, | , π(x)q (y x) ∆t | setΘ x:=yotherwiseΘ x:=x. ∆t ∆t ItiswellknownthatthetargetdistributionπisinvariantunderthemapΘ [28,13]. ∆t Denote by P(cid:98)∆t(x, ) and P(cid:101)∆t(x, ) the transition distribution functions of the splitting scheme (1.10) and (3.1) · · respectively. Thenclearly P(cid:98)∆tf(x,A)=(P(cid:101)∆tf)(Φ∆t(x),A), A (Rd). ∈B Followingtheapproachof[26]wefirstshowthat(1.10)isaπ-irreducible,aperiodicMarkovchain. Moreover,we willshowthatallcompactsetsaresmall,i.e. foreverycompactsetC,thereexistsaδ >0andn>0suchthat P(cid:98)n (x, ) δν(), x C. ∆t · ≥ · ∈ Finally,wewillshowthatifaFoster-LyapunovconditionholdsforthereversibledynamicsP(cid:101)∆t,thenitalsoholds forP(cid:98)∆t. Tothisend,weshallmakethefollowingassumptions. Assumptions3.1. For∆tsufficientlysmall,weassumethat 1 Thereversiblechain(3.1)satisfiesaFoster-Lyapunovcondition,i.e. thereexistsacontinuousfunctionV 1,a compactsetC Rdandconstantsλ (0,1)andb 0suchthat ≥ ⊂ ∈ ≥ P(cid:101)∆tV(x) λV(x)+b1C(x), x Rd. (3.2) ≤ ∈ 2 ThenonreversibleflowmapΦ satisfiesthefollowingcondition, ∆t V(Φ (x)) V(x) 1 ∆t lim sup − < 1. (3.3) V(x) λ − |x|→∞ 3 ThepreimageΦ−1(C)isbounded. ∆t Themaintheoremofthissectionestablishesthegeometricergodicityof(1.10). Theorem 3.2. Suppose that Assumptions 3.1 hold, and that π and q (y x) are positive and continuous for all ∆t | x,y Rd. Thenfor∆tsufficientlysmall, theprocessX(cid:98)∆t isgeometricallyergodic, i.e. thereexistsρ (0,1) ∈ n ∈ andK >0suchthat (cid:12)(cid:90) (cid:12) sup (cid:12)(cid:12) g(y)(P∆t(x,y) π(y)) dy(cid:12)(cid:12) KV(x)ρn, n N. |g|≤V (cid:12) Rd − (cid:12)≤ ∈ 6 WenowfocusonthecasewhenthereversibledynamicsaresimulatedusingMALA(Metropolis-AdjustedLangevin Algorithm),i.e. usingaproposaloftheform q ( x) (x+ logπ(x)∆t,2∆t), (3.4) ∆t ·| ∼N ∇ forastepsize∆t>0. ThefollowingresultisanapplicationofTheorem3.2fortheproposal(3.4). Corollary 3.3 (Geometric Ergodicity of Lie-Trotter scheme with MALA dynamics). Consider the Lie-Trotter splitting scheme X(cid:98)∆t where the reversible dynamics (1.12) are simulated using a MALA scheme with proposal n definedby(3.4). Supposethattheconditionsonπandq specifiedin[40,Theorem4.1]holdandmoreoverthat ∆t lim (Φ (x) x)=0, (3.5) ∆t |x|→∞ | |−| | for∆tsufficientlysmall. ThenX(cid:98)∆tisgeometricallyergodic. n In particular, suppose that lim π(x) 0, and that, given α > 0, there exist positive constants α(cid:48), K |x|→∞ 1 → andK suchthat 2 πα(x) K πα(cid:48)(x), πα(x) K , x Rd, (3.6) |∇ |≤ 1 |∇∇ |max ≤ 2 ∈ where denotesthemaxnorm. Ifγ =J πα forJ antisymmetric,thencondition(3.5)willholdifΦ (x) max ∆t |·| ∇ issimulatedusinganexplicitEulerorRunge-Kuttascheme. Asimilarresultholdsforγ givenby(2.7). 4 Asymptotic Bias and Variance Estimates for general integrators Inthissectionweconsidertheasymptoticbehaviouroftheestimator(1.8)forπ(f),obtainedforageneralnumer- icalscheme(X(cid:98)k∆t)k≥0. Inparticular,weshallderiveestimatesfortheasymptoticbiasandasymptoticvarianceof theestimatorπ (f). ForsimplicityweshallfocusonthecasewherethedomainisTd,i.e. theunithypercube (cid:98)∆t withperiodicboundaryconditions. Asin[25]thisset-upgreatlysimplifiesthederivationofexpressionsforbias andvariance,particularlysinceremaindertermsarisingfromTaylorexpansionscanbeeasilycontrolled. Weex- pectthatextendingtheseresultstounboundeddomainsshouldbepossiblebyfollowinganalogousapproachesin [17]. Throughoutthissection,weshallassumethatthenumericalintegratorX(cid:98)∆tisergodic,withuniqueinvariant k distributionπ∆t. (cid:98) 4.1 Notation Wefirstintroducethenotationwhichwillbeusedinthissectionandtheremainderofthepaper. Givenaprobabil- itymeasureµon(Td, (Td))defineL2(µ)tobetheHilbertspaceofsquareintegrablefunctionsonTd,equipped B withinnerproduct(cid:104)·,·(cid:105)µandnorm(cid:107)·(cid:107)L2(π). ThesubspaceL20(µ)ofL2(µ)isdefinedtobe L2(µ)= f L2(µ) : µ(f)=0 , (4.1) 0 { ∈ } WedefineL∞(µ)(alsodenotedL∞(Td))tobetheBanachspaceofessentiallyboundedfunctionsonTdequipped withnorm(cid:107)·(cid:107)L∞(Td). ThesubspaceL∞0 (µ)ofL∞(µ)isdefinedanalogouslyto(4.1). Finally,givena(signed) measureν on(Td, (Td))wedenotethetotalvariationnormofν by ν . TV B (cid:107) (cid:107) 4.2 BackwarderroranalysisforODEs Backward error analysis is a powerful tool for the analysis of numerical integrators for differential equations [41,19,12]. Inparticular,itisthemainingredientfortheproofofthegoodenergyconservation(withoutdrift) ofsymplecticRunge-KuttamethodswhenappliedtodeterministicHamiltoniansystemsoverexponentiallylong timeintervals[12]. InourcontextitisusefultocharacterizetheinfinitesimalgeneratorofthenumericalflowΦ ∆t approximatingthesolutionoftheODE(1.11).Indeed,givenaconsistentintegratorz =Φ (z )fortheODE n+1 ∆t n dz(t) =f(z(t)), (4.2) dt 7 theideaofbackwarderroranalysisistosearchforamodifieddifferentialequationwrittenasaformalseriesin powersofthestepsize∆t, dz (cid:101) =f(z)+∆tf (z)+∆t2f (z)+..., z(0)=z (4.3) (cid:101) 1 (cid:101) 2 (cid:101) (cid:101) 0 dt suchthat(formally)z =z(t ),wheret =n∆t(intheabovedifferentialequation,weomitthetimevariablefor n (cid:101) n n brevity). Thenumericalsolutioncanthisbeinterpretedasahigherorderapproximationoftheexactsolutionofa modifiedODE.Forallreasonableintegrators,thevectorfieldsf canbeconstructedinductively[19,12],starting j fromf =f. Ingeneral,theseriesin(4.3)willdivergefornonlinearsystems,andthusneedstobetruncated. We 0 thusconsiderthetruncatedmodifiedODEatorders dz (cid:101) =f(z)+∆tf (z)+∆t2f (z)+...+∆tsf (z), z(0)=z (4.4) (cid:101) 1 (cid:101) 2 (cid:101) s (cid:101) (cid:101) 0 dt wehavezn = z(cid:101)(tn)+ (∆ts+1)for∆t 0forboundedtimestn = n∆t T. WenotethattheflowΦ(cid:101)∆t(z) O → ≤ ofthemodifieddifferentialequation(4.4)satisfies (cid:32)(cid:88)M ∆tk(cid:101)k (cid:33) φ Φ(cid:101)∆t = LD φ+ (∆tM+1), (cid:101)D =F0+∆tF1+∆t2F2+...+∆tsFs, (4.5) ◦ k! O L k=0 forallM 0,andsmoothtestfunctionsφ,andwhereF φ = f φ, j = 0,...,sandf = f. Notethatthe j j 0 ≥ ·∇ (∆tM+1)termsin(4.5)areindependentof∆t 0butdependonM,sandφ1. O → 4.3 Asymptoticbiasofnumericalintegrators Theaimofthissubsectionistodescribetheconditionsonanumericalintegratorfor(1.2)whicharesufficientfor the numerical invariant distribution π∆t to approximate π to order r in the weak sense. These conditions relate (cid:98) directly to the expansion of one-step numerical expectations in powers of ∆t. In particular, denote by P(cid:98)∆t the transitionsemigroupassociatedwithX(cid:98)∆t,i.e. (cid:104) (cid:105) P(cid:98)∆tf :=E f(X(cid:98)1∆t)|X0 =x . andassumethatthefollowingexpansionholds P(cid:98)∆tf =f +∆tA0f +...+∆tkAk−1f +∆tk+1Akf +∆tqQf,∆t, q >k+1 (4.6) where A ,i = 0,1, k are linear differential operators with coefficients depending smoothly on π(x) and its i ··· derivatives, as well as on the choice of the numerical integrator. In addition Q is a smooth remainder term f,∆t dependingbothonf and∆twhilebeinguniformlyboundedwithrespectto∆t. Thefollowingtheoremprovides sufficientconditionsforexpectationswithrespecttoπ∆ttoapproximateexpectationswithrespecttoπtoorderr. (cid:98) Theorem 4.1. Consider equation (1.2) solved by an numerical scheme which is ergodic with respect to some probabilitymeasureπ andsuchthat (cid:98)∆t A∗π =0, for j =1, ,r 1, (4.7) j ··· − whereq >r,thenoneobtains (cid:90) (cid:90) (cid:90) f(x)π∆t(dx)= f(x)π(dx)+∆tr A ( )−1(f π(f))π(dx)+∆tqR , (4.8) (cid:98) r f,∆t Td Td Td −L − wheretheremaindertermR isuniformlyboundedwithrespectto∆t,for∆tsufficientlysmall. f,∆t Proof. Theproofcanbefoundin[1]. Remark4.2. IntegratorsX(cid:98)∆t whichhaveweakerrororderr willautomaticallysatisfycondition(4.7)forj = n 0,...,r 1. However,theconverseisnotnecessarilytrue,see[1]forfurtherdiscussion. − AnimmediatecorollaryofTheorem(4.1)isthat, if(4.7)holds, thenfor∆tsufficientlysmall, theestimator π givenby(1.8)satisfies (cid:98)T (cid:90) lim π (f)=π(f)+∆tr A ( )−1(f π(f))π(dx). (cid:98)N∆t r N→∞ Td −L − 1Forall∆tsmallenough,thesumin(4.5)canbeshowntoconvergeforM →∞inthecaseofanalyticvectorfieldsfj(andanalytictest functionsφ),whichpermitstoremovetheOremainder. 8 4.4 Asymptoticvarianceofnumericalintegrators Theaimofthissubsectionistoderiveaperturbationexpansioninthesmalltimestepregimefortheasymptotic varianceofanarbitraryergodicnumericalintegratorforthedynamics(1.2). Tothisend,weconsideradiffusion X for which the central limit theorem (1.5) holds. Moreover, we shall make the following assumption, which t implies that the corresponding numerical scheme X(cid:98)∆t converges to equilibrium exponentially fast in L∞(Td), k withratewhichisuniformwithrespectto∆t. Assumptions4.3. ThereexistconstantsC >0andλ>0independentof∆tsuchthat,for∆tsufficientlysmall, (cid:13)(cid:13)(cid:13)P(cid:98)∆ktf −π(cid:98)∆t(f)(cid:13)(cid:13)(cid:13)L∞(Td) ≤Ce−λk∆t(cid:13)(cid:13)f −π(cid:98)∆t(f)(cid:13)(cid:13)L∞(Td), f ∈L∞(Td). Remark4.4. Thisconditionisnontrivialtoverifyingeneral. ForthespecificcaseoftheLie-Trotterintegrator (1.10),whenthereversiblecomponentofthedynamicsisintegratedusingMALA,inTheoremC.3weprovethat Assumption4.3holds. Givenanobservablef C∞(Td)weconsiderπ∆tasin(1.8). Wedefinetherescaledasymptoticvarianceof ∈ (cid:98)T theestimatorπ∆tasfollows (cid:98)T (cid:34) N−1 (cid:35) 1 (cid:88) σ(cid:98)∆2t(f)=∆tNl→im∞NVarπ(cid:98)∆t N f(X(cid:98)k∆t) . (4.9) k=0 Note here that we rescale the asymptotic variance with ∆t, to guarantee a well–defined limit when ∆t 0. → Assumption4.3impliesthatthereexistsaconstantK >0,independentof∆tsuchthat (cid:13) (cid:13) (cid:13)(cid:34) (cid:35)−1(cid:13) (cid:13) I P(cid:98)∆t (cid:13) (cid:13) − (cid:13) <K, (4.10) (cid:13) ∆t (cid:13) (cid:13) (cid:13) L∞0 (π(cid:98)∆t) for∆tsufficientlysmall. Inparticular,wecanexpress(4.9)as σ(cid:98)∆2t(f)=2∆t(cid:28)(cid:0)f −π(cid:98)∆t(f)(cid:1),(cid:16)I−P(cid:98)∆t(cid:17)−1(cid:0)f −π(cid:98)∆t(f)(cid:1)(cid:29) −∆tVarπ(cid:98)∆t[f]. (4.11) π(cid:98)∆t It should be clear from (4.11) that there will be two contributions to the error between σ2 (f) and σ2(f): one (cid:98)∆t arisingfromtheorderofweakconvergenceofthenumericalmethod, andonefromthetimediscretenessofthe processX(cid:98)∆t. Indeed,evenwhenoneconsiderstheexactdiscretetimedynamicsdefinedby k X∆t =X(n∆t), n N, n ∈ the error between the corresponding asymptotic variance σ2 (f) and σ2(f) will be non-zero, despite the fact ∆t that both discrete and continuous time Markov processes have the same invariant distribution. To isolate the differentsourcesoferror,wepresentfirstProposition4.5whichquantifiestheeffectofthetime-discretenesson theasymptoticvariance. InTheorem4.6wethenquantifytheerrorbetweentheasymptoticvariancesσ2 (f)and ∆t σ(cid:98)∆2t(f)ofXn∆tandX(cid:98)n∆t,respectively. Proposition4.5. Forallφ C∞(Td), suchthatπ(φ) = 0thereexistsasmoothfunctionR suchthatfor∆t φ ∈ sufficientlysmall, (cid:18)I P (cid:19)−1 ∆t(cid:18)I P (cid:19)−1 ∆t2 (cid:18)I P (cid:19)−1 − ∆t φ(x)=( )−1φ(x)+ − ∆t ( )φ(x) − ∆t ( )2φ+∆t3R , φ ∆t −L 2 ∆t −L − 6 ∆t −L (4.12) whereR isbounded,independentof∆t. Inparticular,forf C∞(Td), φ ∈ ∆t2 σ2 (f)=σ2(f)+ ( )(f π(f)),f π(f) +o(∆t2). ∆t 6 (cid:104) −L − − (cid:105)π Proof. TheproofcanbefoundinSection8.2. 9 DefinetheoperatorM tobetheprojectorontofunctionswithmeanzerowithrespecttoπ ,i.e. ∆t (cid:98)∆t (cid:90) M φ(x)=φ(x) φ(y)π (y)dy. ∆t (cid:98)∆t − Td Thefollowingtheoremcharacterisesthedifferencebetweentheasymptoticvariancearisingfromtheexactdiscrete timedynamicsX∆tandthenumericalintegratorX(cid:98)∆t. n n Theorem 4.6. Suppose that, for some k N, k 1, there exist operators A ,...,A on C∞(Td), bounded 0 k ∈ ≥ uniformly with respect to ∆t, where A = Li+1 ,i = 0, ,k 1 and such that for all ψ C∞(Td) the i (i+1)! ··· − ∈ semigroupP(cid:98)∆tsatisfies(4.6). Supposethatthecorrespondinginvariantdistributionπ(cid:98)∆tsatisfies (cid:90) (cid:90) ψ(x)π∆t(x)dx= ψ(x)π(x)dx+∆trR , (cid:98) ψ Td Td wherer >kandR isasmoothremainderterm,uniformlyboundedwithrespectto∆t. Moreover,supposethat ψ P(cid:98)∆tsatisfies(4.10). Thenforallf,g C∞(Td)suchthatπ(f)=π(g)=0,wehavetheexpansion ∈ (cid:42) (cid:18) (cid:19)−1 (cid:43) (cid:42) (cid:32) (cid:33)−1 (cid:43) g, I−P∆t f = M g, I−P(cid:98)∆t M f +∆tkR (f,g)+o(∆tk), (4.13) ∆t ∆t 1 ∆t ∆t π π(cid:98)∆t where (cid:42)(cid:32)I P(cid:98)∆t(cid:33)−1 (cid:18) k+1 (cid:19)(cid:18)I P∆t(cid:19)−1 (cid:43) R (f,g)= − M L A − f,M g . (4.14) 1 ∆t k ∆t ∆t (k+1)! − ∆t π(cid:98)∆t Inparticular σ2 (f)=σ2 (f)+2∆tkR (f,f)+o(∆tk). (4.15) (cid:98)∆t ∆t 1 Ifmoreover π(A ψ)=0, (4.16) k holdsforforallψ C∞(Td),thenwecanwrite ∈ (cid:28) (cid:18) k+1 (cid:19) (cid:29) R (f,g)= ( )−1 L A ( )−1f,g +o(∆tk). (4.17) 1 k −L (k+1)! − −L π Proof. TheproofcanbefoundatSection8.2. Remark4.7. ItisinterestingtonotethatcontrarytothecaseoftheasymptoticbiasinTheorem4.1,theorderof errorforthediscreteasymptoticvarianceinTheorem4.6dependscruciallyontheorderoftheweakconvergence oftheunderlyingnumericalintegrator. Furthermore, weseethatiftheweakorderoftheintegratoristhantwo thentheleadingordererrortermbetweenσ2 (f)andtheasymptoticvarianceofthecontinuousprocessσ2(f) (cid:98)∆t equalstotheleadingordertermofdifferencebetweenσ2 (f)andσ2(f). ∆t Tocompletethisanalysisweshallconsidertheasymptoticvariancearisingfromaperturbeddiffusionprocess X(cid:101)thavinginfinitesimalgenerator (cid:101)∆tsuchthat,for∆tsufficientlysmall L (cid:101)∆tf = f +∆tk kf +∆tq−1Rf, f C∞(Td), (4.18) L L L ∈ whereq >k+1. Weshallalsoassumethat(L(cid:101)∆t)−1isboundedinL∞0 (π(cid:98)∆t)uniformlywithrespectto∆t. More specificallythereexistsK >0,independentof∆tsuchthat (cid:13) (cid:13) (cid:13)(cid:16) (cid:17)−1(cid:13) (cid:13) (cid:101)∆t (cid:13) <K, (4.19) (cid:13) −L (cid:13) L∞0 (π(cid:98)∆t) for∆tsufficientlysmall. Thefollowingresultcharacterisestheinfluenceofthisperturbationontheasymptotic varianceforsmall∆t.FornumericalapproximationsofX forwhichamodifiedSDE[49]isknown,thefollowing t result combinedwith Proposition 4.5provide a convenientmeans of obtainingan expression forthe asymptotic varianceσ2 ofthenumericalschemeintermsofσ2(f). (cid:101)∆t 10