Importance Sampling of Rare Events in Chaotic Systems

Jorge C. Leitão (1,2), João M. Viana Parente Lopes (3,4), and Eduardo G. Altmann (1,5)
(1) Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
(2) DTU Compute, Technical University of Denmark, Kgs. Lyngby, Denmark
(3) Department of Physics and Center of Physics, University of Minho, P-4710-057 Braga, Portugal
(4) Physics Engineering Department, Engineering Faculty of the University of Porto, 4200-465 Porto, Portugal
(5) School of Mathematics and Statistics, University of Sydney, 2006 NSW, Australia

arXiv:1701.06265v1 [nlin.CD] 23 Jan 2017

Finding and sampling rare trajectories in dynamical systems is a difficult computational task underlying numerous problems and applications. In this paper we show how to construct Metropolis-Hastings Monte Carlo methods that can efficiently sample rare trajectories in the (extremely rough) phase space of chaotic systems. As examples of our general framework we compute the distribution of finite-time Lyapunov exponents (in different chaotic maps) and the distribution of escape times (in transient-chaos problems). Our methods sample exponentially rare states in a polynomial number of samples (in both low- and high-dimensional systems). An open-source software that implements our algorithms and reproduces our results can be found in Ref. [1].

CONTENTS

I. Introduction
II. Review of problems
   A. Variability of trajectories' chaoticity
   B. Transient chaos
   C. Summary: numerical problems in the study of rare events in chaotic systems
III. Review of methods
   A. Importance Sampling
   B. Metropolis-Hastings algorithm
   C. Sampling distribution
   D. Characterisation of the efficiency
   E. Summary: challenges for the application of Monte Carlo methods to chaotic systems
IV. Proposal distribution
   A. Aim of the proposal distribution
   B. Propose correlated trajectories
   C. Guarantee local proposals
   D. Summary: how to propose
V. Numerical Tests
   A. Finite-time Lyapunov exponent
   B. Transient chaos
VI. Conclusions
   A. Summary of results
   B. Comparison to previous results
   C. Discussion
VII. Appendices
   A. Appendix 1: Dynamical Systems
   B. Appendix 2: Efficiency of the uniform proposal
   C. Appendix 3: Simplified proposals
   D. Appendix 4: Algorithmic description
References

I. INTRODUCTION

Extreme events play a crucial role in our society. Landslides, floods, meteorite collisions, solar flares, and earthquakes are all events that are rare but often lead to catastrophic consequences to our well-being [2]. Science often studies extreme events by recreating the process that generates them sufficiently many times. While in some cases the process can be reproduced experimentally, in many others (e.g., astronomical and weather events) the best we can do is to simulate physical models.

Physical models often contain non-linearities in the equations of motion that amplify fluctuations and give origin to extreme events. A famous effect of non-linearity is chaos, the extremely sensitive dependence on initial conditions. Chaos has a fascinating history dating back to the seminal work of Poincaré in 1890 [3] and is today a well-established field of research with applications in Biology, Geology, Economy, Chemistry, and Physics [4, 5]. Chaotic dynamics often hinders our ability to study the evolution of the system analytically, e.g. it forbids describing trajectories in a closed formula. In this situation, again, often the best we can do is to numerically simulate the model on a computer, a paradigm popularized by Lorenz since the 1960s [4, 6]. Extreme events are then studied statistically, over an ensemble of initial conditions.
In this paper we introduce a framework for performing numerical simulations in chaotic systems in such a way that rare trajectories are generated more likely than if they would be generated by chance. This is an importance-sampling [7, 8] (or rare-event-simulation [9]) strategy that builds on methods that have proven successful in the characterization of rare configurations in various problems. The distinguishing feature of our framework is that it considers general classes of observables [10] in deterministic chaotic [5, 11] systems, being therefore able to find and sample initial conditions leading to extreme events in different problems. We apply our framework to two traditional problems in the numerical exploration of chaotic systems:

• trajectories with high or low finite-time Lyapunov exponents;

• long-living trajectories in open systems.

The results shown in this paper are general, but simulations are done in simple dynamical systems (time-discrete maps with up to 16 dimensions). Our goal is to illustrate the generic computational challenges of sampling rare events in chaotic systems, in line with the tradition that simple systems often possess the basic mechanisms responsible for extreme events.

The importance of rare trajectories in chaotic systems, including methods designed to find and sample them, has been studied from different perspectives [10–20]. For example, the stagger-and-dagger method, used to find long-living trajectories of transiently chaotic systems, has been an important tool to characterize chaotic saddles [11, 12]. Lyapunov weighted dynamics is a population Monte Carlo method that was successfully applied to find trajectories with atypically low or high chaoticity [16, 21]. A similar genealogical particle analysis was applied to compute rare events in simple climate models [20].
Another powerful and widely used method within trajectory sampling is transition path sampling [14, 15], which has been used to sample rare trajectories (e.g. chemical reactions), typically influenced by thermal noise [15]. These are typically trajectories that transit from one stable configuration to another stable configuration [15, 22]. These different methods achieve their goal through different, often ingenious, solutions. Here we aim to construct methods that are applicable to different classes of problems and to quantitatively understand the impact of different parameters and choices on the efficiency of the algorithm. We then show that some of the existing methods can be derived from our construction under suitable approximations.

Our framework relies on Metropolis-Hastings (MH) importance sampling [7, 8], a well-established numerical technique that has been used in statistical physics to study rare events since the 1950s. MH produces a random walk x → x′ in the phase space of the system that generates initial conditions leading to extreme events (rare states) more often than they would be found by chance, consequently reducing the computational cost associated with their rareness. The flexibility of MH is confirmed by its success in numerous fields in Physics, Chemistry, Finance, among many others [7, 8]. While transition path sampling [15], Lyapunov weighted dynamics [16], and genealogical methods [20] already use importance sampling in deterministic chaotic systems, three fundamental questions remain largely open: 1. can MH be used to systematically sample rare trajectories of (deterministic) chaotic systems? If yes, 2. how, and 3. at what (computational) cost?

To answer these questions, we develop a systematic approach to apply MH in chaotic systems. The crucial step is the choice of the proposal distribution of the MH algorithm, the conditional probability of "trying" a state x′ in the phase space of the system, given the current state x. The question we have to answer in order for MH to work is: what proposal distribution guarantees that an observable of the trajectory starting at x′, e.g. its Lyapunov exponent, is similar to the same observable of the trajectory starting at x? Our main contribution is a methodology to answer this question for a broad class of chaotic systems and observables. More specifically, we show how incorporating properties of trajectories of chaotic systems in the proposal is a necessary condition to obtain an efficient MH algorithm. This methodology allows one to construct efficient MH algorithms to sample rare events in different problems and classes of chaotic systems. We expect the ideas and formalism presented here to find application in other problems of chaotic systems and in the study of extreme events more generally. Therefore, we expect our results to be useful both to those studying extreme events in non-linear systems and to those studying numerical techniques.

The paper is organized as follows: Sec. II reviews traditional numerical problems in chaotic systems and shows how they can be formulated as a problem of sampling rare trajectories; Sec. III introduces the MH algorithm as a method to perform importance-sampling simulations in chaotic systems, and shows that naive approaches do not lead to an efficient MH; Sec. IV shows how to incorporate general features of chaotic systems, such as exponential divergence or self-similarity of some of their properties, in the proposal distribution in order to achieve an efficient MH; Sec. V presents numerical tests of the general framework that confirm the applicability of Monte Carlo algorithms to sample rare events in different classes of chaotic systems; Sec. VI summarizes our results and discusses their implications.

II. REVIEW OF PROBLEMS

We consider dynamical systems whose states x in a phase space Ω, x ∈ Ω ⊂ R^D, evolve in time from an initial condition x = x_0 according to

x_{t+1} = F(x_t) = F^{t+1}(x_0),   (1)

where F(x) ∈ Ω and F^t is F composed t times, F^t(x) ≡ F(F(...F(x)...)). Such discrete-time systems can be obtained from continuous-time systems through a Poincaré surface of section, a stroboscopic projection, or by the time discretization of the numerical integration of differential equations. We are also interested in the dynamics in the tangent space, which quantifies the divergence between two initial conditions [5]. Specifically, the distance of a state displaced from x by h, x′ = x + h, to the original state x evolves in time according to

x′_t − x_t = F^t(x′) − F^t(x).   (2)

Expanding F^t(x′) around F^t(x) allows x′_t − x_t to be written as

x′_t − x_t = J_t(x)·h + (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} [∂²F^t(x)/(∂x_i ∂x_j)] h_i h_j + O(|h|³),   (3)

where J_t(x) ≡ dF^t(x)/dx is the Jacobian matrix of F^t, and ∂²F^t(x)/(∂x_i ∂x_j) is the (i,j) entry of the Hessian matrix of F.
The first term of Eq. 3 can be expanded using the derivative of the composition and be written as

D(x,h,t) ≡ J_t(x)·h = (∏_{i=t−1}^{0} J(x_i))·h = h_t,   (4)

where J ≡ J_1 and h evolves in the tangent space according to

h_0 = h;  h_{i+1} = J(x_i)·h_i.   (5)

For small |h|, the growth of x′_t − x_t is characterized by the eigenvalues and eigenvectors of J_t [23]. The largest finite-time Lyapunov exponent (FTLE) of a point x can be defined [24] as

λ_t(x) = log(µ_1)/t,   (6)

where µ_1 is the real part of the largest eigenvalue of J_t. Thus, when at least one direction is unstable (µ_1 > 1), x′_t − x_t increases exponentially with time, and at most by

x′_t − x_t = δ_0 e^{λ_t(x) t}.   (7)

When the system is one-dimensional, the "Jacobian matrix" is a single number, the product in Eq. 4 is a product of numbers, and the only "eigenvalue" is the result of this product. Thus, in this case Eq. 6 can be written as

λ_t(x) = (1/t) Σ_{i=0}^{t−1} log|dF(x_i)/dx|.   (8)
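For concreteness, Eq. 8 can be evaluated directly along a trajectory. The sketch below is only an illustration (it is not the software of Ref. [1]); the logistic map with r = 4 is an assumed stand-in for a one-dimensional chaotic map. For higher-dimensional maps the same role is played by iterating a tangent vector with Eq. 5 and extracting the growth rate of its norm.

```python
import numpy as np

def logistic(x):
    """Illustrative 1D chaotic map: the logistic map with r = 4."""
    return 4.0 * x * (1.0 - x)

def logistic_deriv(x):
    """dF/dx of the logistic map with r = 4."""
    return 4.0 * (1.0 - 2.0 * x)

def ftle_1d(x0, t_o, f=logistic, df=logistic_deriv):
    """Largest finite-time Lyapunov exponent of a 1D map, Eq. (8):
    lambda_{t_o}(x0) = (1/t_o) * sum_i log|dF(x_i)/dx|."""
    x, s = x0, 0.0
    for _ in range(t_o):
        s += np.log(abs(df(x)))
        x = f(x)
    return s / t_o
```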
We now describe two computational problems in chaotic dynamical systems.

FIG. 1. Two main characteristics of the FTLE. Upper panel: the intricate dependency of λ_{t_o=4}(x) (z-axis) on the state x (2D, x and y axes). Lower panel: the distribution of FTLE for different finite times t_o shows exponentially decaying tails in λ_{t_o}, which decay with t_o. The system used was the Standard Map (Eq. 65) with K = 6, over the full phase space, Γ = Ω = [0,1]². P(λ_{t_o}) was computed from 10^5 uniformly distributed initial conditions on Γ.

A. Variability of trajectories' chaoticity

For a chaotic system, λ_L ≡ λ_{t→∞} > 0 in Eq. 6. The variation of the (maximum) finite-time Lyapunov exponent λ_t (FTLE) across different initial conditions characterizes the variation of chaoticity of the system, yielding a distribution of the FTLE, P(λ_{t_o}), or, equivalently, the distribution of E ≡ t_o λ_{t_o}:

P(E) = ∫ δ(E − t_o λ_{t_o}(x)) U(x) dx,   (9)

where U(x) is a chosen probability distribution (e.g. uniform in the phase space). The FTLE and its distribution were introduced in the 80s [25–27] and have been used to study turbulent flows [28], Hamiltonian dynamics [29], and chimera states [30], and to characterize dynamical trapping [25, 31–34], among others [35, 36]. The distribution of FTLE is related to the generalized dimensions [37] and often follows a large-deviation principle, where t_o is the extensive parameter [18, 37]. In strongly chaotic systems, the distribution of FTLE is Gaussian [37], whereas for intermittent chaos and other weakly chaotic systems the distribution is typically non-Gaussian [38].

Figure 1 shows the phase-space dependency and the distribution of the FTLE in one chaotic system. It contains characteristic features observed in strongly chaotic systems: λ_{t_o} is a quantity that depends sensitively on the state x, and its distribution P(λ_{t_o}) decays exponentially to zero on both sides. In the limit t_o → ∞, P(λ_{t_o}) → δ(λ_{t_o} − λ_L).

The tails of the distribution P(E) play a significant role in the characterization of chaotic systems, as, for example, the higher moments of the distribution are related to higher q's in the generalized dimensions D_q of the attractor [37]. Furthermore, the regions of the phase space with small (large) finite-time Lyapunov exponent are associated with slow (fast) decay of correlations [27], and their characterization has been used to get insight on whether the system is ergodic or not [29]. Moreover, trajectories characterized by a low or high finite-time Lyapunov exponent can play a significant role in the dynamics of interfaces in chaotic flows [28] and others [16].

A typical analysis of the FTLE is to measure how a quantity, W(x), depends on the FTLE λ_{t_o} [27–29]. This requires estimating an integral of the form

W(λ_{t_o}) ≡ ∫_Γ W(x) δ(λ_{t_o} − λ_{t_o}(x)) U(x) dx,   (10)

where W(x) is the pre-selected quantity, Γ is a pre-selected sampling region (often Γ = Ω), and U(x) is the weight attributed to x (often uniform, U(x) = 1/|Γ|). Let us look at two examples. First, consider the problem of estimating the distribution P(λ_{t_o}), e.g. Refs. [16, 27, 29, 38]. The traditional technique is to sample M states x according to U(x) = 1/|Γ| and estimate P(λ_{t_o}) using the estimator M_λ/M, where M_λ is the number of samples with λ_{t_o}(x) ∈ [λ_{t_o}, λ_{t_o} + ∆λ_{t_o}]. Formally, this corresponds to W(λ_{t_o}) with W(x) = 1. The second example is retrieved from Ref. [27]. There, in order to evaluate the contribution of the algebraic region of the phase space to the power spectra, the authors decomposed it in two terms corresponding to the power spectra of trajectories with low FTLE and trajectories with high FTLE. This required estimating the power spectra from a set of trajectories conditioned to an interval of λ's on the tails of the distribution. Associating the power spectrum (S(f) in the ref.) with W(x) = W_f(x), where f is the frequency, the power spectra represented in Fig. 4 of Ref. [27] correspond to integrals (for different frequencies f) given by

E[W_f | λ_{t_o}] = (1/P(λ_{t_o})) ∫_Γ W_f(x) δ(λ_{t_o} − λ_{t_o}(x)) U(x) dx,   (11)

which contains integrals of the form of Eq. 10.

Numerically estimating the integral in Eq. 10 is challenging for two reasons: first, P(λ_{t_o}) decays with t_o (lower panel of Fig. 1): the distribution of FTLE often follows a large deviation principle, P(λ_{t_o}(x)) ∝ exp(t_o s(λ_{t_o})), where s is intensive with respect to t_o and is often concave [37]. Consequently, the traditional methodology of sampling states uniformly to find or sample states with increasing t_o requires an exponentially high number of initial conditions. Second, the dependency of λ_{t_o}(x) on x has multiple local minima and maxima (upper panel of Fig. 1). Such rough (fractal [39]) landscapes are known to challenge numerical techniques (e.g., simulations get trapped in local minima or maxima) [40].
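The traditional estimator M_λ/M described above (Eq. 10 with W(x) = 1) is a plain histogram over uniformly drawn initial conditions. A minimal sketch, reusing the illustrative ftle_1d function from the previous listing and an arbitrary bin width, makes the cost of the tails explicit: a bin whose probability is p needs of the order of 1/p samples before it is populated at all.

```python
import numpy as np

def ftle_distribution_uniform(t_o, n_samples=10**5, bin_width=0.05, seed=0):
    """Naive estimate of P(lambda_{t_o}) (Eq. 9) by uniform sampling of
    Gamma = [0, 1]: the estimator M_lambda / M for each bin."""
    rng = np.random.default_rng(seed)
    lams = np.array([ftle_1d(rng.random(), t_o) for _ in range(n_samples)])
    edges = np.arange(lams.min(), lams.max() + bin_width, bin_width)
    counts, edges = np.histogram(lams, bins=edges)
    return edges[:-1], counts / n_samples
```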
The problem of finding and sampling states with high or low λ has been addressed in the literature with numerical techniques that go beyond traditional uniform sampling [16–18]. Such techniques have been successfully applied to find [16] and sample [17, 18] states with extremal λ's in different chaotic systems. Refs. [16, 18] use a population Monte Carlo canonical ensemble where λ_t plays the role of the energy E to find or sample states with high or low FTLE. The method computes stochastic trajectories that, from the numerical tests performed, are indistinguishable from (deterministic) trajectories of the system. Ref. [17] proposes a flat-histogram simulation to find high or low chaotic states by developing an observable to quantify the chaoticity of the state.

B. Transient chaos

The best known examples of chaotic systems have a fractal attractor in the phase space [5], the Lorenz attractor being the most prominent example [6]. However, chaotic dynamics can appear also when the fractal invariant set in the phase space is not purely attracting, e.g. it may have stable and unstable directions (a saddle). Trajectories close to this set perform chaotic motion for an arbitrarily long (yet finite) time. This phenomenon appears in a variety of physical systems and is known as transient chaos [41, 42].

Numerical investigations of transiently-chaotic systems are computationally difficult because most trajectories quickly escape the vicinity of the chaotic saddle (on which the chaotic dynamics is properly defined) [12, 13, 43]. More precisely, the escape time of a state x in a pre-selected region x ∈ Γ ⊂ Ω is defined as the first passage time of its trajectory to a pre-selected exit set Λ ⊂ Ω:

t_e(x) ≡ min{t : F^t(x) ∈ Λ}.   (12)

Almost all trajectories start at Γ and eventually leave when x_t ∈ Λ, but they do so at different times t_e. Such variability is quantified by the distribution of escape time: the probability that a random initial condition x leaves at a given time t_e,

P(t_e) = ∫_Γ δ_{t_e, t_e(x)} U(x) dx,   (13)

where δ_{t_e, t_e(x)} is the Kronecker delta and U(x) is the probability density assigned to each state in Γ, which is often constant, U(x) = 1/∫_Γ dx ≡ 1/|Γ|. Figure 2 shows the typical features of t_e(x): it strongly depends on the initial state x, and its distribution P(t_e) decays exponentially to zero (i.e., it has a constant escape rate κ).

There are two main numerical techniques to study transiently chaotic systems. The first is to find one long-living state x with t_e(x) ≫ 1 and to compute an average over states of this trajectory [11, 12]. For large t_e(x), the trajectory between times [t_e(x)/2 − t_s, t_e(x)/2 + t_s], where t_s ≪ t_e(x)/2, is close to a trajectory on the chaotic saddle. When the saddle is ergodic, an average over this long trajectory corresponds to an average over the natural invariant density and therefore an average over these states characterizes invariant properties of the system. The second technique, on which we focus in this work, is to compute averages over an ensemble U(x) of initial conditions x that leave the system at time t_e(x) = t_e [11, 42]. For small t_e observations depend on the particular initial density U(x) (a point of interest in itself [42]), while for large t_e they characterize invariant properties of the system (like in the previous technique, the states F^{t_e/2}(x) with t_e → ∞ are independent samples of the natural invariant density).

A typical analysis within sampling transiently chaotic systems is to measure how a quantity, W(x), changes with increasing escape time t_e. Numerically, this can be written as

W(t_e) ≡ ∫_Γ W(x) δ_{t_e, t_e(x)} U(x) dx.   (14)

Let us enumerate 3 examples of computational problems that can be interpreted as numerical estimations of an integral of the form of W(t_e) in Eq. 14.

a. Compute the escape time distribution: Numerically, P(t_e) is often computed by drawing states from U(x) (e.g. uniform density) and counting the relative number of states that exited at escape time t_e [11]. This corresponds to W(x) = 1, in which case Eq. 14 reduces to Eq. 13.
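Example a. can be sketched in a few lines for an open one-dimensional map. The "open tent map" below (slope a > 2, with the exit set Λ taken as the points mapped outside the unit interval) is an assumed stand-in for illustration, not necessarily the system defined in Appendix VII A.

```python
import numpy as np

def open_tent(x, a=3.0):
    """Tent map with slope a > 2: part of [0, 1] escapes at every step."""
    return a * min(x, 1.0 - x)

def escape_time(x, t_max=10**4, f=open_tent):
    """Eq. (12): first time the trajectory enters the exit set
    (here: leaves the unit interval)."""
    for t in range(1, t_max + 1):
        x = f(x)
        if not 0.0 <= x <= 1.0:
            return t
    return t_max  # did not escape within t_max iterations

def escape_time_distribution(n_samples=10**5, seed=0):
    """Eq. (13) with uniform U(x): relative number of initial conditions
    leaving at each time t_e; the tail decays exponentially."""
    rng = np.random.default_rng(seed)
    times = np.array([escape_time(rng.random()) for _ in range(n_samples)])
    values, counts = np.unique(times, return_counts=True)
    return values, counts / n_samples
```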
b. Compute generalized dimensions of the chaotic saddle: The generalized dimensions are an important property of the chaotic saddle and their calculation is often performed by box counting [5, 11, 44]. Essentially, the phase space is divided in equal, non-overlapping, and space-filling boxes i = 1, ..., B(ε) of linear size ε (intervals in 1 dimension, squares in 2, etc.), and the generalized dimension D_q of exponent q is proportional to log Σ_{i=1}^{B(ε)} µ_i^q, where µ_i is the fraction of points of the saddle that belong to the box i [44]. Numerically, µ_i is estimated by first obtaining a set of points x_j in the saddle and then counting how many are in the particular box i. Such an estimate can be written as the expectation of an indicator function that tells whether a state in the saddle, F^{t_e/2}(x) for t_e(x) ≫ 0, is inside the box i, W(x) = δ_{F^{t_e/2}(x) ∈ i}. This expectation can be written as a conditional expectation of W(x) over states that leave at time t_e,

E[W | t_e] ≡ (1/P(t_e)) ∫_Γ W(x) δ_{t_e, t_e(x)} U(x) dx.   (15)

Computing this essentially requires computing integrals of the form of W(t_e) in Eq. 14.

FIG. 2. Main characteristics of the escape time of an open chaotic system. Upper panel: the dependency of t_e(x) on the state x shows an intricate landscape with multiple local and global maxima. The map is the 4-dimensional coupled Hénon map (defined in Appendix VII A), and the two dimensions are a surface of section on the plane x_2 = 0, y_2 = 0. Lower panel: the exponential decay of the distribution of escape time of (1) the coupled Hénon map with D = 4 and (2) the Standard Map (K = 6), both defined in Appendix VII A. P(t_e = t) was computed by uniformly drawing 10^6 states x ∈ Γ and measuring the relative number of times that t_e(x) = t.

c. Compute the distribution of FTLE on the chaotic saddle: The distribution of FTLE, P(λ), is another important property of the chaotic saddle [11, 44]. The FTLE λ = λ_t for a fixed t of an open system is computed for trajectories on the chaotic saddle. Like in the previous problem, each of these trajectories can be obtained by generating a state x according to U(x) (e.g. uniformly distributed in Γ) that has a large escape time, t_e(x) ≫ 1, and computing λ_t(F^{t_e/2}(x)) using Eq. 6 for a fixed t. The distribution of FTLE is then computed from an ensemble of these high-escape-time states by constructing different bins of the histogram, I_λ = [λ, λ + ∆λ], and numerically computing the relative number of states in each bin. Formally, this equates to computing the expected number of states with a given escape time t_e whose FTLE is in a bin, and thus corresponds to computing a conditional expectation of the form of Eq. 15 with W(x) = W_λ(x) = δ_{λ(F^{t_e/2}(x)) ∈ I_λ}.

C. Summary: numerical problems in the study of rare events in chaotic systems

To provide a unified treatment of the numerical problems in transient chaos and in computing the FTLE of closed systems, we use a common notation whenever possible. It indicates also how the methods can be generalized to different problems. Firstly, there is a quantity that we call "observable" and denote by E(x):

• E(x) = t_o λ_{t_o}(x) in FTLE of closed systems;

• E(x) = t_e(x) in strongly chaotic open systems.

Secondly, we use δ(E, E′) to denote both the Kronecker delta and the Dirac delta (δ(E − E′)), for the discrete and continuous cases respectively. This is needed because, even though in practice we will always consider E to be a discrete function due to the binning of the histograms, formally the observable E is either discrete (t_e for maps) or continuous (λ_t or t_e for flows). Thirdly, the observable E is a function of the phase space of the system but typically depends also on an external quantity N that parameterizes how high E can be:

• t_o in FTLE of closed systems;

• (a pre-selected) maximum escape time, t_max, in strongly chaotic open systems.

This parameter is important to us because, the higher it is, the rarer a state with the maximum or minimum possible E is. In this notation, the two general problems presented in the previous sections can be summarized as follows:

• the analysis is conditioned to a projection of x into a one-dimensional observable E(x);

• the probability distribution of E, P(E), decays exponentially with increasing E;

• the focus of the analysis is on rare states with respect to P(E): states with E in one of the tails of P(E);

• the rareness increases with N.
In transient chaos problems we are interested in states with increasing N = t_max that correspond to the tail of P(E) = P(t_e), P(t_max). In computations of the FTLE in closed systems we are interested in states with increasing N = t_o and on the tails of P(E). These problems share two distinct computational problems: find rare states [11–13, 16] and sample rare states [17, 18, 45, 46], which can be formalized as follows:

• Find rare states: minimize or maximize E(x) over a constraining region Γ, x ∈ Γ ⊂ Ω, for increasing N.

• Sample rare states: compute the integral of a function W(x) conditioned to a particular value of E and for increasing N, over an ensemble of states x distributed according to U(x) in a constraining region of the phase space x ∈ Γ ⊂ Ω:

W(E) = ∫_Γ W(x) δ(E, E(x)) U(x) dx.   (16)

The numerical challenges of these two numerical problems are common: the states x are exponentially difficult to find with increasing N (or E), the function E(x) contains multiple local and global minima embedded in fractal-like structures, and the dimensionality D of the phase space is potentially high (see Figs. 1 and 2). The goal of this paper is to develop a systematic approach to tackle the two numerical problems and these three challenges.

III. REVIEW OF METHODS

A. Importance Sampling

As described in the previous section, the traditional methodology to compute an integral of the form of Eq. 16 is to draw m samples {x_i} distributed according to U(x) from the relevant region Γ and approximate the integral W(E) by the estimator

W(E) ≡ (1/m) Σ_{i=1}^{m} δ(E, E(x_i)) W(x_i),   (17)

where δ(E, E(x_i)) = 1 when E(x_i) ∈ [E, E + ∆E[ and zero otherwise (when E is discrete, ∆E = 1). The relative distance of the estimator W(E) to E[W(E)] is quantified by the ratio ε(E) ≡ σ[W(E)]/E[W(E)], where σ[W(E)]² ≡ E[W(E)²] − E[W(E)]². For the estimator in Eq. 17, this is given by

ε(E) ∝ 1/√(m G(E)),   (18)

where G(E) is the density of states: the number of states per bin with an observable E.[47] Therefore, the number of samples m_* required to achieve a given precision ε_* for a given E_* is m_*(E_*) ∝ 1/G(E_*). The critical problem in sampling rare states is that, because G(E) decays exponentially with E, m_*(E) increases exponentially with E.

Importance sampling techniques aim to improve this scaling by drawing samples from a distribution π(x) ≠ U(x) on the phase space (e.g. non-uniformly in the phase space) [8]. Specifically, consider m independent samples {x_i} drawn from π(x), and the function π(x) to depend only on E, π(x) = π(E(x)) = π(E). Because the samples are not drawn from U(x), the estimator in Eq. 17 would be biased. Importance sampling uses an unbiased estimator for W(E) given by [8]

W(E) ≡ (1/m) Σ_{i=1}^{m} δ(E, E(x_i)) W(x_i) U(x_i)/π(x_i),   (19)

which reduces to Eq. 17 when π(x) = U(x). The advantage of importance sampling is that the relative error of the estimator in Eq. 19 is given by

ε(E) ∝ 1/√(m G(E) π(E)).   (20)

This is because, when sampling from π(x) = π(E(x)), the expected number of samples with a given E, m(E), is equal to

m(E) = m G(E) π(E).   (21)

Equation 20 implies that the function π(E) can be chosen to favour states x with observable E on the tails of G(E) and therefore improve the precision of the estimator on these tails.

The standard deviations in Eqs. 18, 20 were obtained assuming that the m samples were independent. In traditional methodologies such as uniform sampling, this is the case. However, in the algorithms discussed below, it is not. Therefore, it is necessary to modify Eq. 20 for the case where the samples {x_i} are drawn from π(x) and are also correlated. This modification is given by

ε(E) ∝ √[(1 + 2T(E)) / (m G(E) π(E))],   (22)

where T(E) is the autocorrelation time [8], which increases with the correlation of the samples.
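In code, the only difference between Eq. 17 and Eq. 19 is the per-sample weight U(x_i)/π(x_i). A minimal sketch follows; the callables E_of, W_of, U_of, and pi_of are hypothetical placeholders for the problem-specific quantities.

```python
import numpy as np

def importance_sampling_estimate(samples, E_of, W_of, U_of, pi_of, E_bins):
    """Unbiased estimator of Eq. (19): a histogram of W over the observable E,
    with each sample x_i (drawn from pi) reweighted by U(x_i)/pi(x_i)."""
    E_vals = np.array([E_of(x) for x in samples])
    weights = np.array([W_of(x) * U_of(x) / pi_of(x) for x in samples])
    W_E, _ = np.histogram(E_vals, bins=E_bins, weights=weights)
    return W_E / len(samples)
```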
Efficient importance-sampling [7–9] techniques have to address the following three steps:

1. choose a suitable π(x);
2. have a method to generate samples from π(x);
3. minimize the autocorrelation time T.

A defining point in our method is our choice of the Metropolis-Hastings algorithm to address point 2, differently from Refs. [16, 20], which address similar problems through a different choice for point 2 (cloning trajectories). Below we first discuss point 2, then point 1, and finally point 3.

B. Metropolis-Hastings algorithm

The Metropolis-Hastings (MH) algorithm asymptotically generates states x according to π(x) using a Markovian, ergodic, detailed-balance random walk in the sampling region Γ [8]. This random walk is initialized from a random state x ∈ Γ and evolves to a new state x′ ∈ Γ with a transition probability P(x′|x) chosen such that asymptotically the states x are drawn according to π(x). In Metropolis-Hastings, P(x′|x) is written as

P(x′|x) = g(x′|x) a(x′|x),

where g(x′|x) is the (proposal) distribution used to generate new states and a(x′|x) is the (acceptance) distribution used to select them. The random walk fulfills detailed balance because the acceptance probability is chosen as [8]

a(x′|x) = min(1, [g(x|x′) π(x′)] / [g(x′|x) π(x)]).   (23)

The Metropolis-Hastings algorithm is not the only way to achieve this. Another popular method to sample from π(x) is population Monte Carlo, which, instead of a random walk, uses multiple stochastic trajectories that are cloned and destroyed. See e.g. Ref. [18] for an application to the problem of the FTLE described above.

Algorithmically, the MH algorithm is implemented as follows: choose a region Γ and a random initial condition x ∈ Γ. Evolve the random walk in time according to:

1. Propose a state x′ drawn from g(x′|x);
2. Compute a(x′|x) by replacing x and x′ in Eq. 23;
3. Generate a random number r in [0,1]. If r < a(x′|x), make x′ the new x;
4. Store x and go to 1.

The set of sub-steps 1-4 brings the random walk from its current state x to the next state and is called a Markov step. After a transient number of steps in which the algorithm converges to the asymptotic distribution, the stored states x are (correlated) samples drawn from π(x), and can be directly used in Eq. 19.
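Steps 1-4 translate almost literally into code. The sketch below is a generic MH loop, not the specific proposal developed in Sec. IV; the callable propose is assumed to return the proposed state together with the two proposal densities g(x′|x) and g(x|x′).

```python
import numpy as np

def metropolis_hastings(x0, pi, propose, n_steps, seed=0):
    """Generic Metropolis-Hastings random walk (Sec. III B).
    `propose(x, rng)` must return (x_new, g_forward, g_backward), where
    g_forward = g(x_new|x) and g_backward = g(x|x_new)."""
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        x_new, g_fwd, g_bwd = propose(x, rng)                 # step 1
        a = min(1.0, (g_bwd * pi(x_new)) / (g_fwd * pi(x)))   # step 2, Eq. (23)
        if rng.random() < a:                                   # step 3
            x = x_new
        chain.append(x)                                        # step 4
    return chain
```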
C. Sampling distribution

d. Canonical ensemble. A sampling distribution π(x) often used is the canonical distribution [7]

π(x) = π(E(x)) ∝ e^{−βE(x)}.   (24)

In the context of rare states, the canonical ensemble is useful because the number of sampled states m(E) in Eq. 21 becomes

m(E) ∝ m G(E) e^{−βE} ∝ m e^{−βE + S(E)}.   (25)

In particular, the maximum of m(E) is at the E* solution of β = dS/dE(E*). Therefore, β tunes which value of the observable E is sampled the most. For example, the Lyapunov weighted dynamics in Ref. [18] uses this distribution (Eq. 15 of the ref. with α replaced by β).

e. Flat-histogram ensemble. Another distribution often used in the literature of Metropolis-Hastings [48–55] is the flat-histogram (or multicanonical) distribution, given by [53, 54]

π(x) = π(E(x)) ∝ 1/G(E(x)),  E ∈ [E_min, E_max],   (26)

for a given choice of E_min, E_max that defines the region of interest on the observable E. This is known as flat-histogram because replacing Eq. 26 in Eq. 21 leads to a constant average number of samples on each E,

m(E) = const.   (27)

Consequently, the dependence of the variance in Eq. 22 is only due to the autocorrelation T(E), which implies that the computational cost to draw a state on the tail of G(E) is no longer driven by the exponential decrease of G(E), but by the computational cost to draw uncorrelated samples from π(x).

The main limitation of the flat-histogram is that it requires knowing G(E) in advance, which is very often unknown, so in practice G(E) has to be estimated iteratively. The most well-known such method, which we use here, is the Wang-Landau algorithm [54], which modifies the Metropolis-Hastings algorithm to a non-Markovian chain that asymptotically converges to a flat-histogram Metropolis-Hastings. The Wang-Landau algorithm starts with an approximation of G(E), G_WL(E) = 1/|E_max − E_min|, and, on step 4 of the MH (see the algorithm in Sec. III B), it multiplies G_WL(E(x)) (x is the current state of the random walk) by a constant f > 1. After a given number of steps, f is reduced by a factor of 2 and this procedure is repeated until a final f_min ≃ 1 is reached [54]. The value of f_min and how f is reduced dictate how close G_WL(E) will be to G(E) [54, 56, 57].
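A schematic Wang-Landau loop is sketched below, assuming a symmetric proposal (g(x′|x) = g(x|x′)) and binned observable values; the initial modification factor, the stage length, and the schedule in which ln f is halved between stages are common choices, not prescriptions of this paper.

```python
import numpy as np

def wang_landau(x0, E_of, propose_sym, E_bins, log_f_final=1e-6,
                steps_per_stage=10**5, seed=0):
    """Schematic Wang-Landau run (Sec. III C): builds an estimate of log G(E)
    while sampling with pi(x) ~ 1/G_WL(E(x)); G_WL is updated at every step."""
    rng = np.random.default_rng(seed)
    n_bins = len(E_bins) - 1
    log_G = np.zeros(n_bins)              # flat initial guess for G_WL(E)
    log_f = 1.0                           # initial modification factor f = e

    def bin_of(E):
        return int(min(max(np.searchsorted(E_bins, E) - 1, 0), n_bins - 1))

    x, b = x0, bin_of(E_of(x0))
    while log_f > log_f_final:
        for _ in range(steps_per_stage):
            x_new = propose_sym(x, rng)
            b_new = bin_of(E_of(x_new))
            # acceptance of Eq. (23) for pi ~ 1/G_WL(E) and a symmetric proposal
            if rng.random() < np.exp(log_G[b] - log_G[b_new]):
                x, b = x_new, b_new
            log_G[b] += log_f             # multiply G_WL(E(x)) by f
        log_f /= 2.0                      # tighten f towards 1 between stages
    return log_G
```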
D. Characterisation of the efficiency

The relative error ε(E) depends on the value of N and on the particular value of E/N. To avoid discussing the dependency on E/N, the efficiency of the flat-histogram is often quantified in terms of the average round-trip [58–60], which is an upper bound for the number of Markov steps m (samples) required to obtain an uncorrelated sample from π(x) [60, 61]. The round-trip, τ, is the average number of steps required for the algorithm to go from a state x with E(x) = E_min to a state x with E(x) = E_max and back.[62] Ref. [45] shows how the autocorrelation time of a canonical ensemble is related to the autocorrelation time of a flat-histogram, and therefore we will use here the round-trip time of a flat-histogram simulation to quantify the efficiency for both distributions. Uniform sampling has a round-trip that increases exponentially with N: on average it takes 1/G(E_max) samples to get one sample with E_max, and 1/G(E_max) increases exponentially with N.

Importance sampling Monte Carlo is widely used in statistical physics because the computational cost often scales polynomially with N [8, 60], which is a dramatic improvement over uniform sampling. Under the hypotheses that (a) ∆E ≡ E(x′) − E(x) ≈ 1 ≪ N and (b) the correlation between the different E(x) of the random walk decays fast enough, it can be shown that the round-trip τ scales as [46, 60]

τ(N) ∼ N².   (28)

For example, consider the problem of sampling states of an open chaotic system with different escape times t_e, from t_e = 1 up to t_e = t_e^max. In this case, E_min = 1 and E_max = N = t_e^max. Thus, under the hypotheses (a) and (b) above, the round-trip is expected to scale as

τ(t_e^max) ∼ (t_e^max)².   (29)

There are known deviations from the scaling in Eq. 28 leading to a higher exponent, N^{2+z} with z > 0 [46, 58–60], and there are two common situations where this happens: (1) the acceptance rate decreases drastically with N (hypothesis (a) above is violated); (2) the autocorrelation drastically increases with N (hypothesis (b) above is violated) [46, 60]. Nevertheless, these do not qualitatively change the argument: Monte Carlo with flat-histogram has the potential of generating rare states polynomially with increasing N, while uniform sampling generates rare states exponentially with increasing N.

E. Summary: challenges for the application of Monte Carlo methods to chaotic systems

To achieve a MH algorithm that scales polynomially with N, the autocorrelation time needs to be low. A key ingredient for an efficient algorithm is a good proposal distribution g(x′|x) [63]. The ideal proposal distribution of a Metropolis-Hastings algorithm draws x′ independently of x according to π(x′), g(x′|x) = π(x′). This is because i) the acceptance in Eq. 23 is always one and ii) each step of the random walk generates an independent sample x, which implies that the error in Eq. 22 is minimal. The difficulty of sampling rare events in chaotic systems is that a useful π(x) to sample them is a difficult function to sample from. For concreteness, consider the problem of sampling high escape times in the open tent map defined in Appendix VII A 3 (Eq. 66), whose t_e(x) is represented in Fig. 12, and consider the canonical sampling distribution, π(x) ∝ exp(−β t_e(x)). In this example, π(x) ∝ exp(−β·1) on [1/a, 1 − 1/a], and so forth (∝ exp(−β t_e)) in the subsequent intervals. The number of intervals increases as 2^{t_e}. Therefore, sampling from π(x) would require enumerating every interval, sampling one at random according to the correct β-dependent distribution, and then sampling a uniform point within that interval. While this can be done in simple maps such as the open tent map, it is unfeasible in a general system where the t_e(x) dependency is unknown.

One way to approach the problem could be to consider g(x′|x) to be the uniform distribution over Γ. One could imagine that changing the sampling distribution π(x) could decrease the variance of the estimator, since this gives more preference to rarer states. However, this is not the case: changing the sampling distribution alone does not decrease the scaling of the variance of the estimator. This is because changing the sampling distribution (e.g. using a canonical ensemble) leads to an exponential increase of the autocorrelation time T(E) with E, see Appendix VII B, making it as efficient as traditional uniform sampling.

In summary, Metropolis-Hastings is an excellent candidate to approach the numerical challenges found in the study of rare events in chaotic systems. Firstly, because it is grounded in strong mathematical results such as the importance sampling theorem and the asymptotic convergence of Markov processes. Secondly, because it is formulated with very few assumptions about the system, the observable of interest, or the dynamics of the system, which gives enough freedom to adapt it to the specific aim (sampling or finding), observable, and system. Thirdly, because there seems to be no theoretical reason for the sampling to be exponential; Metropolis-Hastings is used to sample rare states in polynomial time in other problems of statistical physics. Finally, because the numerical problems found in chaotic systems can be re-written as problems that Metropolis-Hastings is suited for. On the other hand, the optimal proposal of Metropolis-Hastings is unfeasible in chaotic systems, and without any extra information about the system, Metropolis-Hastings is as efficient as the traditional uniform sampling.
IV. PROPOSAL DISTRIBUTION

The problem we address in this section is: how to incorporate general properties of chaotic systems into the proposal distribution in such a way that the Metropolis-Hastings algorithm becomes efficient? We first set an aim for the proposal distribution and we then show how this aim can be achieved in the different problems involving deterministic chaotic systems.

A. Aim of the proposal distribution

The goal of the proposal distribution g(x′|x) we construct here will be to bound the acceptance rate in Eq. 23 away from 0 and 1. Since a(x′|x) in Eq. 23 depends on x′, it is essential to look at its expectation over the proposal,

E[a(x′|x) | x] ≡ ∫_Γ a(x′|x) g(x′|x) dx′.   (30)

Our goal [64] is to construct a proposal distribution such that

E[a(x′|x) | x] = a⋆.   (31)

The motivation to set this as the starting point (and cornerstone) of our method is that it avoids the two typical origins of high correlation times T. When the acceptance is low, T increases because the method remains stuck in the same state x for long times. High acceptance typically indicates that x′ is too close to x (often E(x′) = E(x)), which implies that the simulation moves too slowly (in E and in Ω). The goal here is not that the acceptance is exactly a⋆, but rather that it remains bounded away from the extremes and that it does not strongly depend on N.

In general it is non-trivial to construct a proposal distribution that guarantees a constant acceptance. Thus, the next step is to approximate Eq. 31 by a simpler condition. Let us first notice that π(x) only depends on x through E_x ≡ E(x), π(x) = π(E_x). Therefore, at the very least, the proposal g(x′|x) should guarantee that x′ is generated from x in such a way that π(E_{x′}) is neither too close (high acceptance) nor too far (low acceptance) from π(E_x). Quantitatively, this can be written as

E[π(E_{x′})/π(E_x) | x] = a,   (32)

where 0 < a < 1 is a constant. When the proposal is able to achieve a small variation of E, π(E_{x′}) can be expanded in a Taylor series around E_{x′} = E_x, which allows one to write

π(E_{x′})/π(E_x) = 1 + [d log π(E_x)/dE] (E_{x′} − E_x).   (33)

[65] Inserting Eq. 33 in Eq. 32, an average constant acceptance is thus achieved when

E[E_{x′} − E_x | x] = (a − 1) / [d log π(E_x)/dE].   (34)

This equation, the main result of this section, is a condition that an efficient Metropolis-Hastings imposes on the proposal distribution in terms of the average difference in the observable E. This condition is non-trivial because it depends on the particular π, E, F, and x. For the sampling distributions discussed in the previous section, we obtain:

f. Canonical ensemble: when π(x) = e^{−βE_x}, the condition in Eq. 34 is given by

E[E_{x′} − E_x | x] = (1 − a)/β.   (35)

That is, the higher the β, the closer the proposed E_{x′} has to be to E_x.

g. Flat-histogram: when π(x) ∝ 1/G(E_x), the condition in Eq. 34 is given by

E[E_{x′} − E_x | x] = (1 − a) / [d log G(E_x)/dE].   (36)

When E_x is close to the maximum of G(E), the derivative of log G approaches 0 and E_{x′} can be arbitrarily distant from E_x. As E_x deviates from the maximum of G(E), smaller changes are necessary to achieve a constant acceptance. Note that Eq. 34 is valid for the Metropolis-Hastings algorithm in general and should be of interest also in other contexts (e.g., an arbitrarily large number of spins can be flipped close to the maximum of the density of states).
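Equations 34-36 prescribe only a target for the average change of the observable per proposal; turning this target into an actual proposal is the subject of the next subsections. As a trivial illustration (a hypothetical helper, not part of the authors' framework), the target can be evaluated from the sampling distribution in use:

```python
def target_delta_E(E_x, ensemble, a=0.5, beta=None, dlogG_dE=None):
    """Average E(x') - E(x) that keeps the expected acceptance near a (Eq. 34).
    Canonical ensemble, Eq. (35): (1 - a) / beta.
    Flat histogram, Eq. (36): (1 - a) / (d log G / dE evaluated at E_x)."""
    if ensemble == "canonical":
        return (1.0 - a) / beta
    if ensemble == "flat-histogram":
        return (1.0 - a) / dlogG_dE(E_x)
    raise ValueError("unknown ensemble: " + ensemble)
```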
High accep- of interest also in other contexts (e.g., arbitrarily large tance typically indicates that x(cid:48) is too close to x (often number of spins can be flipped close to the maximum of E(x(cid:48))=E(x)), which implies that the simulation moves the density of states). too slowly (in E and in Ω). The goal here is not that the acceptance is exactly a , but rather that it remains (cid:63) boundedfromtheextremesandthatitdoesnotstrongly B. Propose correlated trajectories depend on N. In general it is non-trivial to construct a proposal dis- The proposal distribution requires correlating the tra- tribution that guarantees a constant acceptance. Thus, jectorystartingatx(cid:48) withatrajectorystartingatxsuch 10 states that are close within ∆. y Correlation by Correlation by The average correlation between two states whose one closeness shift is drawn according to a proposal distribution is here de- fined by (cid:90) t (x)≡E[t (x,x(cid:48))|x]= g(x(cid:48)|x)t (x,x(cid:48))dx(cid:48) . (38) (cid:63) (cid:63) (cid:63) Γ Notice that this quantity does not depend on the partic- ular sampling distribution or algorithm; it is a function of the proposal distribution and the state x. The goal of the next sub-sections is to construct proposal distri- x butionsthatguaranteeagivenaveragecorrelationt(cid:63)(x). We will show, for example, that we can enforce an av- erage correlation t (x) if we use a normal distribution FIG. 3. Two trajectories of length t = 6 starting at x (cid:63) and x(cid:48) can be correlated for five steps (to(x,x(cid:48))=5) by two centered around x with a specific standard deviation as (cid:63) different mechanisms: (left panel) they start close from each our proposal distribution. otherandareindistinguishablewithin∆uptotimet ;(right (cid:63) panel) they start shifted (by t =1 here) from each other shift and are thus indistinguishable for t −t =t =5 steps. 1. Shift proposal o shift (cid:63) Oneproposalthatguaranteesthattrajectoriesarecor- thatEq.34holds. Fulfillingthisrequirementrequiresthe relatedbyt istheshiftproposal,originallyintroducedin abilitytocontrolE[Ex(cid:48) −Ex|x],whichrequiresaproce- Ref. [14]in(cid:63)the contextof sampling pathsof chemical re- dure to correlate the state x(cid:48) with x. The aim of this actions, and has also been used in Ref [22]. It consists in section is to introduce a quantification of the correlation proposing a state x(cid:48) that is a forward or backward itera- of two trajectories of finite-time to that can be related tionofx,x(cid:48) =Ftshift(x),wheret isafreeparameter. with E[Ex(cid:48) −Ex|x], and present two different proposal The relation between t = t sh(ifxt) and t = t (x) is shift shift (cid:63) (cid:63) distributions that propose a state x(cid:48) on which this cor- that a shift of ±t guarantees that t −t elements shift o shift relation is controlled. of the original trajectory are preserved. Therefore, this The observables E introduced in section IIC are all proposalguaranteesthatt elementsarepreservedwhen (cid:63) dependentnotonlyofx,butalsoofthefulltrajectoryof |t | = t −t , see right panel of Fig. 3. Because de- shift o (cid:63) lengthto startingx.[66]Onenaturalwaytoquantifythe tailed balance has to be enforceable, the proposal must similarityoftwotrajectoriesisthelengththetwotrajec- contain backward and forward shifts. A proposal that tories remain within a distance ∆ much smaller than a automaticallyfulfilsdetailedbalanceisoneonwhichthe characteristic length of Γ. 
1. Shift proposal

One proposal that guarantees that trajectories are correlated by t⋆ is the shift proposal, originally introduced in Ref. [14] in the context of sampling paths of chemical reactions, and also used in Ref. [22]. It consists in proposing a state x′ that is a forward or backward iteration of x, x′ = F^{t_shift}(x), where t_shift is a free parameter. The relation between t_shift = t_shift(x) and t⋆ = t⋆(x) is that a shift of ±t_shift guarantees that t_o − t_shift elements of the original trajectory are preserved. Therefore, this proposal guarantees that t⋆ elements are preserved when |t_shift| = t_o − t⋆, see the right panel of Fig. 3. Because detailed balance has to be enforceable, the proposal must contain backward and forward shifts. A proposal that automatically fulfils detailed balance is one in which the backward and forward shifts are equally likely:

g(x′|x) = (1/2) δ(x′ − F^{t_shift}(x)) + (1/2) δ(x′ − F^{−t_shift}(x)),   (39)

with t_shift = t_shift(x) = t_o − t⋆(x). Given the target average correlation t⋆(x), this proposal can be implemented as follows: generate a random number r ∈ [0,1]; if r < 0.5, make x′ = F^{t_o − t⋆(x)}(x); else, make x′ = F^{t⋆(x) − t_o}(x).

This proposal unfortunately has some disadvantages: i) a priori there is no guarantee that F^{t_shift}(x) ∈ Γ; it is applicable when Γ = Ω, which e.g. is not the case in open systems; ii) it requires the map to be invertible; iii) the proposal can only propose states that are forward or backward iterations of x. Consequently, for the random walk to be ergodic in the phase space, the map itself must be ergodic. iv) this proposal diffuses without drift on a trajectory passing through x, by shifting the starting point forward or backward. Thus, it will always sample fewer states than a time average of a trajectory starting at x. On the other hand, the main advantage of this proposal is that it performs non-local jumps in the phase space. That is, it allows to jump from a region of
