ebook img

Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation PDF

2.1 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation

Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation A.B.Duncan SchoolofMathematicalandPhysicalSciences UniversityofSussex Falmer and G.A.Pavliotis DepartmentofMathematics 7 ImperialCollegeLondon 1 LondonSW72AZ,UK 0 2 and n K.C.Zygalakis a SchoolofMathematics J UniversityofEdinburgh 6 1 Edinburgh,EH93FD,UK ] January17,2017 E M t. Abstract a t Foragiventargetdensityπ : Rd (cid:55)→ R, thereexistaninfinitenumberofdiffusionprocesseswhichhave s uniqueinvariantdensityπ. Asobservedinanumberofpapers[7,37,38]samplersbasedonnonreversibledif- [ fusionprocessescansignificantlyoutperformtheirreversiblecounterpartsbothintermsofasymptoticvariance 1 andrateofconvergencetoequilibrium. Inthispaper,wetakeadvantageofthisinordertoconstructefficient v samplingalgorithmsbasedontheLie-Trotterdecompositionofanonreversiblediffusionprocessintoreversible 7 andnonreversiblecomponents. Weshowthatsamplersbasedonthisschemecansignificantlyoutperformstan- 4 dardMCMCmethods, atthecostofintroducingsomecontrolledbias. Inparticular, weprovethatnumerical 2 integrators constructed according to this decomposition are geometrically ergodic and characterize fully their 4 asymptoticbiasandvariance, showingthatthesamplerinheritsthegoodmixingpropertiesoftheunderlying 0 . nonreversible diffusion. This is illustrated further with a number of numerical examples ranging from highly 1 correlatedlowdimensionaldistributions,tologisticregressionproblemsinhighdimensionsaswellasinference 0 forspatialmodelswithmanylatentvariables. 7 1 v: 1 Introduction i X Consider the problem of computing expectations with respect to a probability distribution with smooth density r a π(x),knownonlyuptoanormalizationconstant,i.e. wewishtoevaluate (cid:90) π(f)= f(x)π(x)dx. (1.1) Rd For high dimensional distributions, deterministic techniques are no longer tractable. On the other hand, proba- bilisticmethodsdonotsufferthesamecurseofdimensionalityandthusareoftenthemethodofchoice. Onesuch approachisMarkovChainMonteCarlo(MCMC)whichisbasedontheconstructionofaMarkovprocessonRd whoseuniqueinvariantdistributionisπ(x). Duetotheirsimplicityandwideapplicability,Markovchainsbased onMetropolis-Hastings(MH)transitionkernels[13,28]andtheirnumerousvariantsremainthemostwidelyused schemeforsamplingfromageneraltargetprobabilitydistribution,despitehavingbeenintroducedover60years ago. AsthereareinfinitelymanyMarkovprocesseswhichareergodicwithrespecttoagiventargetdistribution 1 π,anaturalquestioniswhetheraMarkovprocesscanbechosenwhichismoreefficient,intermsofconvergence toequilibriumandmixing. MetropolizedschemesarereversibleMarkovchainsbyconstruction,i.e. theysatisfy detailedbalance. Itisawelldocumentedfactthatnonreversiblechainsmightconvergencetoequilibriumfaster than reversible ones [34, 5, 32]. Various MCMC schemes have been proposed which are based on the general ideaofbreakingreversibilitybyintroducinganaugmentedtargetmeasureonanextendedstatespace,alongwith dynamicswhichareinvariantwithrespecttotheaugmentedtargetmeasure. Fordiscretestatespaces,thelifting method[5,15,47]isonesuchapproach,wheretheMarkovchainis“lifted”fromthestatespaceEtoE 1, 1 . ×{ − } ThetransitionprobabilitiesineachcopyofEaremodifiedtointroducetransitionsbetweenthecopiestopreserve theinvariantdistributionbutnowpromotethesamplertogeneratelongtrajectories. Forcontinuousstatespaces, analogous approaches involve augmenting the state space with a velocity/momentum variable and constructing Makoviandynamicswhichareabletomixmorerapidlyintheaugmentedstatespace. SuchmethodsincludeHy- bridMonteCarlo(HMC)methods,inspiredbyHamiltoniandynamics. WhilethestandardconstructionofHMC [6,35]isreversible,itisstraightforwardtoconstructdynamicsbasedontheGeneralizedHMCscheme[14]which willnotbereversible,seealso[36]andmorerecently[23]. Deferring issues of simulation until later, another candidate Markov process for sampling from π is the diffu- sion(X ) definedbythefollowingItôstochasticdifferentialequation(SDE): t t≥0 dX =b(X )dt+√2dW , (1.2) t t t whereW isastandardRd–valuedBrownianmotionandb:Rd Rdisasmoothvectorfieldwhichsatisfies t → b(x)= logπ(x)+γ(x), (π(x)γ(x))=0, (1.3) ∇ ∇· forsomesmoothvectorfieldγ onRd satisfyingsomemildassumptions(c.f. Proposition2.2). TheprocessX is t nonreversibleifandonlyifγ =0. BytheBirkhoffergodictheorem, (cid:54) 1 (cid:90) T lim f(X )ds=E [f], f L1(π), s π T→∞T 0 ∈ andthusonecanuse 1 (cid:90) T π (f):= f(X )ds T s T 0 asanestimatorforE [f],forT sufficientlylarge. Anaturalwaytomeasuretheefficiencyofsuchestimatoristhe π meansquareerror(MSE)givenby MSE(T):=Eπ (f) π(f)2. (1.4) T | − | UnderappropriateconditionsonX andf,theestimatorπ (f)willsatisfyacentrallimittheorem,i.e. t T lim √T (π (f) E [f])= (0,2σ2(f)), (1.5) T π T→+∞ − N whereσ2(f)istheasymptoticvarianceoftheestimatorπ (f)whichcanbeexpressedby T σ2(f):= φ,( )φ , (1.6) (cid:104) −L (cid:105)π where istheinfinitesimalgeneratorof(1.2)andφisthemeanzerosolutionofthefollowingPoissonequation onRd,L φ=f π(f). (1.7) −L − This relationship can be used to simplify the expression for the MSE (1.4) and decompose it in terms of bias µ (f)andvarianceσ2(f)asfollows T T Eπ (f) π(f)2 =(Eπ (f) π(f))2+E(π (f) Eπ (f))2 =(µ (f))2+σ2(f). | T − | T − T − T T T For large T, the variance satisfies σ2(f) T−1σ2(f), while µ (f)2 = o(T−1). Since γ(x) is not uniquely T (cid:39) T definedin(1.3),anaturalquestionishowitshouldbechosentoensurethatforagiventimeT,theMSEin(1.4) is as small as possible. This can be achieved in two manners, the first by maximising the L2(π)-spectral gap associatedwith(1.2)asstudiedin[20,48]andhenceincreasingthespeedwithwhichµ convergestozero. In T 2 general,maximisingtheL2(π)-spectralgapischallenging. Analternativeistochooseγ(x)insuchawaysoasto reducetheasymptoticvarianceσ2(f). Itshouldbeemphasisedthattheoptimalchoicewillbedifferentforeach case. In particular in [7, 37, 38], it was shown that the choice γ(x) = 0, which corresponds to using reversible dynamics,givesthemaximumvalueofasymptoticvarianceforagivenchoiceofdiffusiontensor. Inparticular, introducinganonreversibleperturbationwillneverdecreasetheperformanceofanestimatorbasedonLangevin dynamics,bothintermsofconvergencetoequilibriumandasymptoticvariance. In general (1.2) cannot be simulated exactly, and one typically resorts to a discretisation of the SDE, denoted byX(cid:98)∆t,inordertoapproximateπ(f). Inparticular,thefollowingergodicaverageisused n N 1 (cid:88) π(cid:98)T∆t(f):= N f(X(cid:98)k∆t), N∆t=T. (1.8) k=0 Extra caution has to be taken in order to ensure that the above quantity converges in the limit of T since → ∞ even if (1.2) is ergodic (or even exponentially ergodic), this will not necessarily be the case for its numerical discretisation[39,43,44]. Inaddition,evenwhenthenumericaldiscretizationisergodicandthus (cid:90) lim π (f)=π∆t(f)= f(x)π∆t(x)dx, (1.9) (cid:98)T (cid:98) (cid:98) T→∞ Rd itisnottrueingeneralthatπ∆t =π,sincetheunderlyingnumericaldiscretizationintroducesbiasintheestimation (cid:98) ofπ(f)(see[45,1,2]). OnewaytoeliminatesuchbiasisthroughMetropolization[42,46],i.e. theintroduction ofanaccept-rejectstepthatensuresthatthecorrespondingMarkovchainisergodicwithrespecttothetargetdis- tributionπ. However,suchbiaseliminationmightnotbeadvantageousinpracticesincetheMetropolisedchain willbereversiblebyconstruction,thuseliminatinganybenefitintroducedbythenonreversibleperturbationγ. When computing expectations of distributions with expensive likelihoods, it might be too costly to sample a longMarkovchaintrajectory. IfanappropriatenonreversibleLangevindynamics(1.2)canbeintroducedwhich doesgiverisetoadramaticreductioninasymptoticvariance,thenitmightbeadvantageoustopermitacontrolled amountofbiasinexchangeforneedingtosamplefareless.Thisbias-variancetradeoff,inthecontextofnumerical discretisations of (1.2) is the subject of study of this paper. In particular, we will consider discretizations based onaLie-Trottersplittingbetweenthereversibleandthenonreversiblepartofthedynamics. Morespecifically,we considerintegratorsoftheform X(cid:98)n∆+t1 =Θ∆t◦Φ∆t(X(cid:98)n∆t), (1.10) whereΦ (x)isaintegratorthatapproximatestheflowmapcorrespondingtothedeterministicdynamics ∆t dx t =γ(x ), (1.11) t dt andΘ (x)whichapproximatesthereversibledynamics ∆t dx = logπ(x )dt+√2dW . (1.12) t t t ∇ ThechoiceofΦ ,Θ hasafundamentalinfluenceonthebias,asymptoticvarianceandstabilityoftheresulting ∆t ∆t sampler. Inparticular,ifonechoosesΦ tobeaMetropolisedintegrator[3]then,similarlytotheresultin[2], ∆t the order of convergence of the deterministic integrator Φ provides a lower bound for the difference between ∆t expectations with respect to π∆t and π. However, this is not the case for the numerical asymptotic variance (cid:98) σ2 (f), since even though we can show that it is a perturbation of σ2(f) the difference will depend crucially (cid:98)∆t onthechoiceofΘ . Theseresultsareimportantastheyallowtochoosethecorrectcombinationofdynamics ∆t andnumericalschemethatdrasticallyreducesthecomputationalcostrequiredtoachieveagiventoleranceoferror. Insummary,themainofthecontributionsofthispaperare 1. proving geometric ergodicity for the Markov chain given by (1.10) for a variety of different numerical integratorsappliedtothereversiblepart; 2. acompletecharacterisationoftheasymptoticbiasof(1.10); 3 3. showingthat,bycompletelycharacterisingtheasymptoticvariance,numericalintegratorsofthetype(1.10) inherittheasymptoticvariancebenefitsofthenonreversibleSDE(1.2); 4. exhibiting the potential of using nonreversible integrators for sampling as illustrated from a number of differentnumericalexperimentsoninferenceforspatialmodelsaswellasrealdatasets. Therestofthepaperisorganisedasfollows. InSection2wedescribesomeknowntheoreticalresultsforthe SDE(1.2)whicharenecessaryforthedevelopmentofthispaper. InSection3weidentifitysufficientconditions to guarantee geometric ergodicity of the Lie-Trotter splitting scheme (1.10) on Rd. In Section 4 we study the asymptotic properties of a class of numerical integrators for (1.2) for which the Lie-Trotter scheme is a special case. Inparticularwederiveperturbativeexpansionsfortheasymptoticbiasandvariance. InSection5weapply these results to characterise the asymptotic bias and variance of the Lie-Trotter scheme on the bounded domain Td.InSection6,wefocusonthecasewherethetargetdistributionisGaussianandstudyanalyticallythetrade-off betweentheasymptoticbiasandasymptoticvarianceinthiscase. Todemonstratetheefficacyoftheseschemes, inSection7wepresentanumberofnumericalexperimentsoninferenceforspatialmodelsaswellasonBayesian logisticregression. ProofsofthemainresultsofthispaperaredeferredtoSection8aswellastheAppendices. Finally,adiscussionoftheresultspresentedinthispaperandpotentialfutureresearchdirectionscanbefoundin Section9. 2 Properties of Overdamped Langevin Diffusions Inthissectionwediscussdifferentknowntheoreticalresultsthatareusefulforunderstandingthemainresultsof thepaper. WestartbylistingtheassumptionsweshallmakeonπandtheSDE(1.2)toensureergodicity. Assumptions2.1. 1. The measure π possesses a positive smooth density π(x) > 0, known up to a normalizing constant, such that π L1(Rd). ∈ 2. Thedriftvectorb:Rd Rdof (1.2)issmoothandsatisfies → b(x)= logπ(x)+γ(x), (2.1) ∇ whereγ :Rd RdisasmoothvectorfieldwithcomponentsinL1(π)suchthat → (π(x)γ(x))=0. (2.2) ∇· Thefollowingresultprovidesnecessaryandsufficientconditionsonthecoefficientsof(1.2)toensurethatX t possessesauniquestationarydistributionπ. Proposition2.2. SupposethatAssumptions2.1hold. ThenthediffusionprocessX definedby(1.2)possessesa t stronglycontinuoussemigroup(P ) onL2(π)definedby t t≥0 P f(x)=E[f(X ) X =x]. (2.3) t t 0 | Theassociatedinfinitesimalgeneratorisananextensionof 1 = (π )+γ (2.4) L π∇· ∇· ·∇ withcoreC∞(Rd). Moreover,P hasuniqueinvariantdistributionπ. Conversely,givenadiffusionprocessofthe c t form(1.2)whichisinvariantwithrespecttoπ,thenthedriftbnecessarilysatisfies(2.1)and(2.2). Proof. The first part of this result is a direct application of [21, Thm 8.1.26]. The converse implication can be checkedusingintegrationbyparts. While many choices for γ are possible (see [22] for a more complete recipe) a natural family of vector fields is given by γ(x) = J Φ(π(x)), where Φ is a smooth function satisfying Φ(π()) L1(π) and J is d d ∇ ∇ · ∈ × skew-symmetricmatrix. Weshallfocusspecificallyonthefollowingthreechoices: 4 (cid:82) 1. Ifπsatisfies logπ(x)π(dx)< ,thenthevectorfield Rd|∇ | ∞ γ(x)=J logπ(x), J = J(cid:62), (2.5) ∇ − satisfiescondition(2.2). Thiswasthechoicewhichwasstudiedspecificallyin[7]. 2. If(cid:82) logπ(x)π1+α(dx) < forsomeα > 0thenanothernaturalchoiceforthevectorfieldisgiven Rd|∇ | ∞ by γ(x)=J πα(x), J = J(cid:62). (2.6) ∇ − Although(2.6)introducesanadditionaltuningparameterα,onemightpreferthischoiceasitcoincideswith theintuitionthatwhenfarawayfromthemodesthesamplershouldmovetowardsthemodesasquicklyas possible,andshouldonlyundergothesedeterministicmeandersinregionsofhighprobability. 3. LetΨ:R Rbeasmooth,compactlysupportedfunction. Then → γ(x)=J logπ(x)Ψ(π(x)), J = J(cid:62), andβ R, (2.7) ∇ − ∈ willalwayssatisfy(2.2). Moreover,ifπhascompactlevelsets,thenγwillalsobecompactlysupportedon Rd. Applyingtheresultsdetailedin[11,31],weshallassumethattheprocessX possessesaLyapunovfunction, t whichissufficienttoensuretheexponentialergodicityofX ,asdetailedinthesubsequentproposition. t Assumptions 2.3 (Foster–Lyapunov Criterion). There exists a function V : Rd R and constants c > 0 and b Rsuchthat → ∈ V(x) cV(x)+b1 , andV(x) 1, x Rd, (2.8) C L ≤− ≥ ∈ where1 istheindicatorfunctionoverapetiteset. C Forthedefinitionofapetitesetwereferthereaderto[30]. Forthegenerator correspondingtotheprocess L (1.2)compactsetsarealwayspetite.TheexponentialergodicityofX followsfromthefollowingproposition(see t also[24,30]). Proposition2.4. SupposethatAssumption2.3holds,thenthereexistconstantsC >0andλ>0suchthat: P f(x) π(f) CU(x)e−λt, x Rd, (2.9) t | − |≤ ∈ forallf satisfying f U. | |≤ Moreover,theFoster-LyapunovcriterionalsoprovidesasufficientconditionforthePoissonequation(1.7)to bewell-posed,andthusforthecentrallimittheorem(1.5)tohold. Proposition 2.5. Suppose that Assumption 2.3 holds and that π(U2) < , then for any function f such that ∞ f U, the central limit theorem (1.5) holds, i.e. √T(π (f) π(f)) converges weakly to a (0,σ2(f))– T | | ≤ − N distributedrandomvariable,with (cid:90) σ2(f)= φ(x)( )φ(x)π(x)dx, Rd −L whereφistheuniquemeanzerosolutiontothePoissonequation(1.7). Moreoverthesolutionφcanbeexpressed as (cid:90) ∞ φ= [P f π(f)] dt. t − 0 Thefollowinglemmaprovidesasufficientconditiononπ for(1.2)topossessaLyapunovfunction. Itisaslight generalisationofasimilarresultfrom[40],extendedtoapplyalsointhecaseofnonreversiblediffusionprocesses. Lemma2.6. [40,Theorem2.3]ConsidertheprocessX definedby(1.2)withdriftcoefficientbsatisfying(2.1). t Supposethatπisbounded,thereexists0<δ <1suchthat, liminf(cid:0)(1 δ) logπ(x)2+∆logπ(x)(cid:1)>0, (2.10) |x|→∞ − |∇ | andthevectorfieldγ satisfies γ(x)=0, x Rd, (2.11) ∇· ∈ thentheFoster–Lyapunovcriterionholdsfor(1.2)withU(x)=π−δ(x)andmoreoverπ(U)< . ∞ Remark2.7. Notethatwhenγ(x) = J Φ(π(x))equation(2.11)isautomaticallysatisfied. Hencethechoices ∇ ofchoicesofγ specifiedby(2.5),(2.6)and(2.7)allsatisfy(2.11). 5 3 Geometric ergodicity of the splitting scheme on Rd InthissectionweidentifysufficientconditionsunderwhichtheLie-TrotterschemeonRdisgeometricallyergodic withrespecttoaninvariantdistributionπ∆t whichwillbeaperturbationofπ. Ingeneral,adiscretizationofthe (cid:98) ergodicdiffusionprocess(1.2)neednottobeergodic,geometricorotherwise,see[40]. Forthesplittingscheme we shall show that provided the approximate nonreversible flow Φ is sufficiently weak away from the origin, ∆t theprocess(1.10)willinheritthegeometricergodicityfromthereversibledynamics. (cid:16) (cid:17) We follow the Meyn and Tweedie [30] recipe to demonstrate geometric ergodicity of X(cid:98)∆t . Consider n n∈N thereversibleprocessdefinedby Z∆t =Θ Z∆t, (3.1) n+1 ∆t n andP(cid:101)∆tbethecorrespondingtransitionsemigroup.WeshallassumethatthereversibledynamicsareaMetropolis- Hastingschain,withproposalkernelq ( x),morespecifically,givenx Rd,Θ (x)isconstructedasfollows ∆t ∆t ·| ∈ 1. Sampley q ( x). ∆t ∼ ·| 2. Withprobability (cid:18) (cid:19) π(y)q (xy) ∆t α(x,y)=min 1, | , π(x)q (y x) ∆t | setΘ x:=yotherwiseΘ x:=x. ∆t ∆t ItiswellknownthatthetargetdistributionπisinvariantunderthemapΘ [28,13]. ∆t Denote by P(cid:98)∆t(x, ) and P(cid:101)∆t(x, ) the transition distribution functions of the splitting scheme (1.10) and (3.1) · · respectively. Thenclearly P(cid:98)∆tf(x,A)=(P(cid:101)∆tf)(Φ∆t(x),A), A (Rd). ∈B Followingtheapproachof[26]wefirstshowthat(1.10)isaπ-irreducible,aperiodicMarkovchain. Moreover,we willshowthatallcompactsetsaresmall,i.e. foreverycompactsetC,thereexistsaδ >0andn>0suchthat P(cid:98)n (x, ) δν(), x C. ∆t · ≥ · ∈ Finally,wewillshowthatifaFoster-LyapunovconditionholdsforthereversibledynamicsP(cid:101)∆t,thenitalsoholds forP(cid:98)∆t. Tothisend,weshallmakethefollowingassumptions. Assumptions3.1. For∆tsufficientlysmall,weassumethat 1 Thereversiblechain(3.1)satisfiesaFoster-Lyapunovcondition,i.e. thereexistsacontinuousfunctionV 1,a compactsetC Rdandconstantsλ (0,1)andb 0suchthat ≥ ⊂ ∈ ≥ P(cid:101)∆tV(x) λV(x)+b1C(x), x Rd. (3.2) ≤ ∈ 2 ThenonreversibleflowmapΦ satisfiesthefollowingcondition, ∆t V(Φ (x)) V(x) 1 ∆t lim sup − < 1. (3.3) V(x) λ − |x|→∞ 3 ThepreimageΦ−1(C)isbounded. ∆t Themaintheoremofthissectionestablishesthegeometricergodicityof(1.10). Theorem 3.2. Suppose that Assumptions 3.1 hold, and that π and q (y x) are positive and continuous for all ∆t | x,y Rd. Thenfor∆tsufficientlysmall, theprocessX(cid:98)∆t isgeometricallyergodic, i.e. thereexistsρ (0,1) ∈ n ∈ andK >0suchthat (cid:12)(cid:90) (cid:12) sup (cid:12)(cid:12) g(y)(P∆t(x,y) π(y)) dy(cid:12)(cid:12) KV(x)ρn, n N. |g|≤V (cid:12) Rd − (cid:12)≤ ∈ 6 WenowfocusonthecasewhenthereversibledynamicsaresimulatedusingMALA(Metropolis-AdjustedLangevin Algorithm),i.e. usingaproposaloftheform q ( x) (x+ logπ(x)∆t,2∆t), (3.4) ∆t ·| ∼N ∇ forastepsize∆t>0. ThefollowingresultisanapplicationofTheorem3.2fortheproposal(3.4). Corollary 3.3 (Geometric Ergodicity of Lie-Trotter scheme with MALA dynamics). Consider the Lie-Trotter splitting scheme X(cid:98)∆t where the reversible dynamics (1.12) are simulated using a MALA scheme with proposal n definedby(3.4). Supposethattheconditionsonπandq specifiedin[40,Theorem4.1]holdandmoreoverthat ∆t lim (Φ (x) x)=0, (3.5) ∆t |x|→∞ | |−| | for∆tsufficientlysmall. ThenX(cid:98)∆tisgeometricallyergodic. n In particular, suppose that lim π(x) 0, and that, given α > 0, there exist positive constants α(cid:48), K |x|→∞ 1 → andK suchthat 2 πα(x) K πα(cid:48)(x), πα(x) K , x Rd, (3.6) |∇ |≤ 1 |∇∇ |max ≤ 2 ∈ where denotesthemaxnorm. Ifγ =J πα forJ antisymmetric,thencondition(3.5)willholdifΦ (x) max ∆t |·| ∇ issimulatedusinganexplicitEulerorRunge-Kuttascheme. Asimilarresultholdsforγ givenby(2.7). 4 Asymptotic Bias and Variance Estimates for general integrators Inthissectionweconsidertheasymptoticbehaviouroftheestimator(1.8)forπ(f),obtainedforageneralnumer- icalscheme(X(cid:98)k∆t)k≥0. Inparticular,weshallderiveestimatesfortheasymptoticbiasandasymptoticvarianceof theestimatorπ (f). ForsimplicityweshallfocusonthecasewherethedomainisTd,i.e. theunithypercube (cid:98)∆t withperiodicboundaryconditions. Asin[25]thisset-upgreatlysimplifiesthederivationofexpressionsforbias andvariance,particularlysinceremaindertermsarisingfromTaylorexpansionscanbeeasilycontrolled. Weex- pectthatextendingtheseresultstounboundeddomainsshouldbepossiblebyfollowinganalogousapproachesin [17]. Throughoutthissection,weshallassumethatthenumericalintegratorX(cid:98)∆tisergodic,withuniqueinvariant k distributionπ∆t. (cid:98) 4.1 Notation Wefirstintroducethenotationwhichwillbeusedinthissectionandtheremainderofthepaper. Givenaprobabil- itymeasureµon(Td, (Td))defineL2(µ)tobetheHilbertspaceofsquareintegrablefunctionsonTd,equipped B withinnerproduct(cid:104)·,·(cid:105)µandnorm(cid:107)·(cid:107)L2(π). ThesubspaceL20(µ)ofL2(µ)isdefinedtobe L2(µ)= f L2(µ) : µ(f)=0 , (4.1) 0 { ∈ } WedefineL∞(µ)(alsodenotedL∞(Td))tobetheBanachspaceofessentiallyboundedfunctionsonTdequipped withnorm(cid:107)·(cid:107)L∞(Td). ThesubspaceL∞0 (µ)ofL∞(µ)isdefinedanalogouslyto(4.1). Finally,givena(signed) measureν on(Td, (Td))wedenotethetotalvariationnormofν by ν . TV B (cid:107) (cid:107) 4.2 BackwarderroranalysisforODEs Backward error analysis is a powerful tool for the analysis of numerical integrators for differential equations [41,19,12]. Inparticular,itisthemainingredientfortheproofofthegoodenergyconservation(withoutdrift) ofsymplecticRunge-KuttamethodswhenappliedtodeterministicHamiltoniansystemsoverexponentiallylong timeintervals[12]. InourcontextitisusefultocharacterizetheinfinitesimalgeneratorofthenumericalflowΦ ∆t approximatingthesolutionoftheODE(1.11).Indeed,givenaconsistentintegratorz =Φ (z )fortheODE n+1 ∆t n dz(t) =f(z(t)), (4.2) dt 7 theideaofbackwarderroranalysisistosearchforamodifieddifferentialequationwrittenasaformalseriesin powersofthestepsize∆t, dz (cid:101) =f(z)+∆tf (z)+∆t2f (z)+..., z(0)=z (4.3) (cid:101) 1 (cid:101) 2 (cid:101) (cid:101) 0 dt suchthat(formally)z =z(t ),wheret =n∆t(intheabovedifferentialequation,weomitthetimevariablefor n (cid:101) n n brevity). Thenumericalsolutioncanthisbeinterpretedasahigherorderapproximationoftheexactsolutionofa modifiedODE.Forallreasonableintegrators,thevectorfieldsf canbeconstructedinductively[19,12],starting j fromf =f. Ingeneral,theseriesin(4.3)willdivergefornonlinearsystems,andthusneedstobetruncated. We 0 thusconsiderthetruncatedmodifiedODEatorders dz (cid:101) =f(z)+∆tf (z)+∆t2f (z)+...+∆tsf (z), z(0)=z (4.4) (cid:101) 1 (cid:101) 2 (cid:101) s (cid:101) (cid:101) 0 dt wehavezn = z(cid:101)(tn)+ (∆ts+1)for∆t 0forboundedtimestn = n∆t T. WenotethattheflowΦ(cid:101)∆t(z) O → ≤ ofthemodifieddifferentialequation(4.4)satisfies (cid:32)(cid:88)M ∆tk(cid:101)k (cid:33) φ Φ(cid:101)∆t = LD φ+ (∆tM+1), (cid:101)D =F0+∆tF1+∆t2F2+...+∆tsFs, (4.5) ◦ k! O L k=0 forallM 0,andsmoothtestfunctionsφ,andwhereF φ = f φ, j = 0,...,sandf = f. Notethatthe j j 0 ≥ ·∇ (∆tM+1)termsin(4.5)areindependentof∆t 0butdependonM,sandφ1. O → 4.3 Asymptoticbiasofnumericalintegrators Theaimofthissubsectionistodescribetheconditionsonanumericalintegratorfor(1.2)whicharesufficientfor the numerical invariant distribution π∆t to approximate π to order r in the weak sense. These conditions relate (cid:98) directly to the expansion of one-step numerical expectations in powers of ∆t. In particular, denote by P(cid:98)∆t the transitionsemigroupassociatedwithX(cid:98)∆t,i.e. (cid:104) (cid:105) P(cid:98)∆tf :=E f(X(cid:98)1∆t)|X0 =x . andassumethatthefollowingexpansionholds P(cid:98)∆tf =f +∆tA0f +...+∆tkAk−1f +∆tk+1Akf +∆tqQf,∆t, q >k+1 (4.6) where A ,i = 0,1, k are linear differential operators with coefficients depending smoothly on π(x) and its i ··· derivatives, as well as on the choice of the numerical integrator. In addition Q is a smooth remainder term f,∆t dependingbothonf and∆twhilebeinguniformlyboundedwithrespectto∆t. Thefollowingtheoremprovides sufficientconditionsforexpectationswithrespecttoπ∆ttoapproximateexpectationswithrespecttoπtoorderr. (cid:98) Theorem 4.1. Consider equation (1.2) solved by an numerical scheme which is ergodic with respect to some probabilitymeasureπ andsuchthat (cid:98)∆t A∗π =0, for j =1, ,r 1, (4.7) j ··· − whereq >r,thenoneobtains (cid:90) (cid:90) (cid:90) f(x)π∆t(dx)= f(x)π(dx)+∆tr A ( )−1(f π(f))π(dx)+∆tqR , (4.8) (cid:98) r f,∆t Td Td Td −L − wheretheremaindertermR isuniformlyboundedwithrespectto∆t,for∆tsufficientlysmall. f,∆t Proof. Theproofcanbefoundin[1]. Remark4.2. IntegratorsX(cid:98)∆t whichhaveweakerrororderr willautomaticallysatisfycondition(4.7)forj = n 0,...,r 1. However,theconverseisnotnecessarilytrue,see[1]forfurtherdiscussion. − AnimmediatecorollaryofTheorem(4.1)isthat, if(4.7)holds, thenfor∆tsufficientlysmall, theestimator π givenby(1.8)satisfies (cid:98)T (cid:90) lim π (f)=π(f)+∆tr A ( )−1(f π(f))π(dx). (cid:98)N∆t r N→∞ Td −L − 1Forall∆tsmallenough,thesumin(4.5)canbeshowntoconvergeforM →∞inthecaseofanalyticvectorfieldsfj(andanalytictest functionsφ),whichpermitstoremovetheOremainder. 8 4.4 Asymptoticvarianceofnumericalintegrators Theaimofthissubsectionistoderiveaperturbationexpansioninthesmalltimestepregimefortheasymptotic varianceofanarbitraryergodicnumericalintegratorforthedynamics(1.2). Tothisend,weconsideradiffusion X for which the central limit theorem (1.5) holds. Moreover, we shall make the following assumption, which t implies that the corresponding numerical scheme X(cid:98)∆t converges to equilibrium exponentially fast in L∞(Td), k withratewhichisuniformwithrespectto∆t. Assumptions4.3. ThereexistconstantsC >0andλ>0independentof∆tsuchthat,for∆tsufficientlysmall, (cid:13)(cid:13)(cid:13)P(cid:98)∆ktf −π(cid:98)∆t(f)(cid:13)(cid:13)(cid:13)L∞(Td) ≤Ce−λk∆t(cid:13)(cid:13)f −π(cid:98)∆t(f)(cid:13)(cid:13)L∞(Td), f ∈L∞(Td). Remark4.4. Thisconditionisnontrivialtoverifyingeneral. ForthespecificcaseoftheLie-Trotterintegrator (1.10),whenthereversiblecomponentofthedynamicsisintegratedusingMALA,inTheoremC.3weprovethat Assumption4.3holds. Givenanobservablef C∞(Td)weconsiderπ∆tasin(1.8). Wedefinetherescaledasymptoticvarianceof ∈ (cid:98)T theestimatorπ∆tasfollows (cid:98)T (cid:34) N−1 (cid:35) 1 (cid:88) σ(cid:98)∆2t(f)=∆tNl→im∞NVarπ(cid:98)∆t N f(X(cid:98)k∆t) . (4.9) k=0 Note here that we rescale the asymptotic variance with ∆t, to guarantee a well–defined limit when ∆t 0. → Assumption4.3impliesthatthereexistsaconstantK >0,independentof∆tsuchthat (cid:13) (cid:13) (cid:13)(cid:34) (cid:35)−1(cid:13) (cid:13) I P(cid:98)∆t (cid:13) (cid:13) − (cid:13) <K, (4.10) (cid:13) ∆t (cid:13) (cid:13) (cid:13) L∞0 (π(cid:98)∆t) for∆tsufficientlysmall. Inparticular,wecanexpress(4.9)as σ(cid:98)∆2t(f)=2∆t(cid:28)(cid:0)f −π(cid:98)∆t(f)(cid:1),(cid:16)I−P(cid:98)∆t(cid:17)−1(cid:0)f −π(cid:98)∆t(f)(cid:1)(cid:29) −∆tVarπ(cid:98)∆t[f]. (4.11) π(cid:98)∆t It should be clear from (4.11) that there will be two contributions to the error between σ2 (f) and σ2(f): one (cid:98)∆t arisingfromtheorderofweakconvergenceofthenumericalmethod, andonefromthetimediscretenessofthe processX(cid:98)∆t. Indeed,evenwhenoneconsiderstheexactdiscretetimedynamicsdefinedby k X∆t =X(n∆t), n N, n ∈ the error between the corresponding asymptotic variance σ2 (f) and σ2(f) will be non-zero, despite the fact ∆t that both discrete and continuous time Markov processes have the same invariant distribution. To isolate the differentsourcesoferror,wepresentfirstProposition4.5whichquantifiestheeffectofthetime-discretenesson theasymptoticvariance. InTheorem4.6wethenquantifytheerrorbetweentheasymptoticvariancesσ2 (f)and ∆t σ(cid:98)∆2t(f)ofXn∆tandX(cid:98)n∆t,respectively. Proposition4.5. Forallφ C∞(Td), suchthatπ(φ) = 0thereexistsasmoothfunctionR suchthatfor∆t φ ∈ sufficientlysmall, (cid:18)I P (cid:19)−1 ∆t(cid:18)I P (cid:19)−1 ∆t2 (cid:18)I P (cid:19)−1 − ∆t φ(x)=( )−1φ(x)+ − ∆t ( )φ(x) − ∆t ( )2φ+∆t3R , φ ∆t −L 2 ∆t −L − 6 ∆t −L (4.12) whereR isbounded,independentof∆t. Inparticular,forf C∞(Td), φ ∈ ∆t2 σ2 (f)=σ2(f)+ ( )(f π(f)),f π(f) +o(∆t2). ∆t 6 (cid:104) −L − − (cid:105)π Proof. TheproofcanbefoundinSection8.2. 9 DefinetheoperatorM tobetheprojectorontofunctionswithmeanzerowithrespecttoπ ,i.e. ∆t (cid:98)∆t (cid:90) M φ(x)=φ(x) φ(y)π (y)dy. ∆t (cid:98)∆t − Td Thefollowingtheoremcharacterisesthedifferencebetweentheasymptoticvariancearisingfromtheexactdiscrete timedynamicsX∆tandthenumericalintegratorX(cid:98)∆t. n n Theorem 4.6. Suppose that, for some k N, k 1, there exist operators A ,...,A on C∞(Td), bounded 0 k ∈ ≥ uniformly with respect to ∆t, where A = Li+1 ,i = 0, ,k 1 and such that for all ψ C∞(Td) the i (i+1)! ··· − ∈ semigroupP(cid:98)∆tsatisfies(4.6). Supposethatthecorrespondinginvariantdistributionπ(cid:98)∆tsatisfies (cid:90) (cid:90) ψ(x)π∆t(x)dx= ψ(x)π(x)dx+∆trR , (cid:98) ψ Td Td wherer >kandR isasmoothremainderterm,uniformlyboundedwithrespectto∆t. Moreover,supposethat ψ P(cid:98)∆tsatisfies(4.10). Thenforallf,g C∞(Td)suchthatπ(f)=π(g)=0,wehavetheexpansion ∈ (cid:42) (cid:18) (cid:19)−1 (cid:43) (cid:42) (cid:32) (cid:33)−1 (cid:43) g, I−P∆t f = M g, I−P(cid:98)∆t M f +∆tkR (f,g)+o(∆tk), (4.13) ∆t ∆t 1 ∆t ∆t π π(cid:98)∆t where (cid:42)(cid:32)I P(cid:98)∆t(cid:33)−1 (cid:18) k+1 (cid:19)(cid:18)I P∆t(cid:19)−1 (cid:43) R (f,g)= − M L A − f,M g . (4.14) 1 ∆t k ∆t ∆t (k+1)! − ∆t π(cid:98)∆t Inparticular σ2 (f)=σ2 (f)+2∆tkR (f,f)+o(∆tk). (4.15) (cid:98)∆t ∆t 1 Ifmoreover π(A ψ)=0, (4.16) k holdsforforallψ C∞(Td),thenwecanwrite ∈ (cid:28) (cid:18) k+1 (cid:19) (cid:29) R (f,g)= ( )−1 L A ( )−1f,g +o(∆tk). (4.17) 1 k −L (k+1)! − −L π Proof. TheproofcanbefoundatSection8.2. Remark4.7. ItisinterestingtonotethatcontrarytothecaseoftheasymptoticbiasinTheorem4.1,theorderof errorforthediscreteasymptoticvarianceinTheorem4.6dependscruciallyontheorderoftheweakconvergence oftheunderlyingnumericalintegrator. Furthermore, weseethatiftheweakorderoftheintegratoristhantwo thentheleadingordererrortermbetweenσ2 (f)andtheasymptoticvarianceofthecontinuousprocessσ2(f) (cid:98)∆t equalstotheleadingordertermofdifferencebetweenσ2 (f)andσ2(f). ∆t Tocompletethisanalysisweshallconsidertheasymptoticvariancearisingfromaperturbeddiffusionprocess X(cid:101)thavinginfinitesimalgenerator (cid:101)∆tsuchthat,for∆tsufficientlysmall L (cid:101)∆tf = f +∆tk kf +∆tq−1Rf, f C∞(Td), (4.18) L L L ∈ whereq >k+1. Weshallalsoassumethat(L(cid:101)∆t)−1isboundedinL∞0 (π(cid:98)∆t)uniformlywithrespectto∆t. More specificallythereexistsK >0,independentof∆tsuchthat (cid:13) (cid:13) (cid:13)(cid:16) (cid:17)−1(cid:13) (cid:13) (cid:101)∆t (cid:13) <K, (4.19) (cid:13) −L (cid:13) L∞0 (π(cid:98)∆t) for∆tsufficientlysmall. Thefollowingresultcharacterisestheinfluenceofthisperturbationontheasymptotic varianceforsmall∆t.FornumericalapproximationsofX forwhichamodifiedSDE[49]isknown,thefollowing t result combinedwith Proposition 4.5provide a convenientmeans of obtainingan expression forthe asymptotic varianceσ2 ofthenumericalschemeintermsofσ2(f). (cid:101)∆t 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.