Generalized Ensemble and Tempering Simulations: A Unified View Walter Nadler1,∗ and Ulrich H. E. Hansmann1,2,† 1 Department of Physics, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931-1295, USA 7 2 John-von-Neumann Institute for Computing, Forschungszentrum Ju¨lich, D-52425 Ju¨lich, Germany 0 (Dated: February 6, 2008) 0 From the underlying Master equations we derive one-dimensional stochastic processes that de- 2 scribe generalized ensemble simulations as well as tempering (simulated and parallel) simulations. n The representations obtained are either in the form of a one-dimensional Fokker-Planck equation a or a hopping process on a one-dimensional chain. In particular, we discuss the conditions under J which these representations are valid approximate Markovian descriptions of the random walk in 2 order parameter or control parameter space. They allow a unified discussion of the stationary dis- 1 tribution on, as well as of the stationary flow across each space. We demonstrate that optimizing the flow is equivalent to minimizing the first passage time for crossing the space, and discuss the ] consequencesofourresultsforoptimizingsimulations. Finally,wepointoutthelimitationsofthese h representations underconditions of broken ergodicity. c e m I. INTRODUCTION tion of control parameter space. - t Increasing the flow through order parameter space in a t The effective simulation of complex thermal systems GE sampling as wellas throughcontrolparameterspace .s like proteins and glasses is a constant challenge in con- in PT was always an incentive. However, usually it was at temporary computational physics. Markov chain Monte discussedonly informally,andonly recently Trebstetal. m Carlo simulation techniques for these systems have un- [6, 7, 8, 9] have made an attempt to look at that prob- dergone remarkable advances in the last decades. Two lem systematically. Instead of concentrating on the sta- - d main classes that have evolved are generalized ensemble tionary distributions that arise from the sampling, they n and parallel tempering methods. concentratedonthestationaryflowacrossorderparame- o Inthegeneralizedensemble(GE)approach[1]thegoal terandcontrolparameterspace. Inordertooptimizethe c [ is to sample the state space of a physical system so that flowtheyderivedweightfunctionsandcontrolparameter particularly rare but important states, e.g. low energy discretization schemes. 3 or barrier states, are encountered frequently. A variety In this contribution we want to give that approach a v of weight functions have been tested, as well as methods morefundamentalbasis. Wewillfirstconcentrateonthe 8 for iteratively improving these functions [2]. underlyingMasterequationsdescribingGEandPTsim- 0 4 A persistent problem of such simulations is that re- ulations. From them we will derive in a systematic way 5 laxation is slow due to barriers and bottlenecks, and the one-dimensional stochastic equations that form the 0 for a long time it was not clear whether and how they basis for a flow analysis in order parameter and control 6 can be controlled by using particular weight functions parameterspace. Theseequationswillalsoallowustoin- 0 on the usual order parameter spaces. Paralleltempering vestigate connections between flow analysis and another / t (PT) - sometimes also called replica exchange method - concept describing the dynamics of stochastic processes, a m promised a way out of this dilemma [3, 4, 5]. Here sim- the first passage time (FPT). The one-dimensional rep- ulations are performed in parallelat different values of a resentations are valid approximations for the underlying d- control parameter, most often the temperature. At cer- simulationsonlyundercertainconditions. Ifthey arevi- n taintimesthecurrentconformationsofreplicasatneigh- olated, optimization schemes may still fail. For parallel o boringcontrolparametervaluesareexchangedaccording temperingwewillfindacriterionfromwhichthevalidity c to a generalized Metropolis rule. Thereby an individual can be determined. : v replica couldperforman additionalrandomwalk in con- We will focus in the next section on generalized en- i trolparameterspaceand-duetoshorterrelaxationtimes semble sampling, while paralleltempering is the focus of X in some control parameter regime - explore state space the third section. We will close with a discussion of the r a more evenly. effectsofbrokenergodicityonourresultsandanoutlook. However,theproblemofslowrelaxationarisesalsothis time in the form of a slow and possibly uneven random walk through control parameter space. At least part of II. GENERALIZED ENSEMBLE SAMPLING this problem is related to finding an efficient discretiza- MarkovchainMonteCarlosimulationsutilizeacertain move set in combination with an acceptance probability, ∗Electronicaddress: [email protected] most often of Metropolis form [10], to impose stochastic †Electronicaddress: [email protected] dynamics on a physical system. A move from state s to 2 s′ is accepted with the probability well,dependingonwhichparticularaspectofstatespace should be emphasized. We will assume in the following pM(s s′)=min[1,w(s′)/w(s)] . (1) thattheweightfunctionisbasedonlyontheenergyE(s) → of the state s, and we will use w(s) = w[E(s)] = w(E) The original choice for the weight function w(s) is the interchangeably. thermal or Boltzmann weight In order to get a deeper understanding of simulations w(s) exp[ βE(s)] , (2) basedonEq.(1)weshouldlookattheirdescriptionviaa ∝ − Masterequationinstate space. P(s,t)is the probability resulting in canonical sampling at inverse temperature to be in state s at time t, and its evolution in discrete β = 1/k T. However, other choices are possible as computer time is given by B P(s,t+1)= P(s′,t)W (s′ s)+P(s,t) 1 W (s s′) , (3) s s → − → s′6=s s′6=s X X where the sums are over all possible states. The transition probabilities W(s s′) in this equation are → ψ(s s′) W (s s′)= → p (s s′) , (4) s M → ψ(s s”) → s” → where ψ(s s′) is the characteristic function of thPe move set with ψ(s s′)=1 if the move from s to s′ is allowed → → and zero otherwise. Provided the move set is balanced and ergodic, the stationary distribution reached by this Markov chain is P (s) w(s) because of detailed balance. Note however that in addition to ψ(s s′)=ψ(s′ s), 0 for (s,s′) with ψ(s s∝′)=1 the property ψ(s s”)= ψ(s′ s”) must be fulfilled. Par→ticularlythe→latter → s” → s” → property, i.e. every state must connect to the same number of other states, is sometimes overlooked. P P Equation(3)isanexactdescriptionofthesimulationprocessanditisalsothebasisformorethoroughinvestigations in the mathematics of Metropolis simulations [11, 12, 13, 14]. However from a physicist’s point of view a reduced description in terms of slow order parameters is of more interest. Prominent among the order parameters chosen is the energyitself. Using adiabaticelimination of fastdegreesof freedom, see the Appendix A, anapproximateMaster equation in energy space can be formulated P(E,t+1)= P(E′,t)W (E′ E)+P(E,t) 1 W (E E′) . (5) E E → − → E′6=E E′6=E X X TheeffectivetransitionprobabilitiesW (E E′)canbederivedfromEqs.(3and(4),andaregiveninthatappendix, E → too. Note howeverthat, in contrastto Eq. (3), Eq. (5) is an approximation,valid only if all other degrees of freedom relaxate much faster than the energy. Nevertheless, even if relaxation orthogonal to the energy is slow, Eq. (5) can still be viewed as a Markovian approximation to the fully non-Markovian process. Due to the degeneracy of states with energy, the stationary distribution of Eq. (5) is now P (E) n(E)w(E) . (6) 0 ∝ Here, n(E) is the density of states and we have assumed that the weight function for the Metropolis algorithm is based on energies only, as mentioned above in the discussion of Eq. (1). Coarse graining time and energy leads to a form of the Master equation that is continuous in both variables ∂ P(E,t)= P(E′,t)R (E′ E) P(E,t) R (E E′) . (7) E E ∂t ZE′ → − ZE′ → where the transition probabilities have been replaced by readybeenintegrals. The continuousformofthe Master rates R (E E′). Note that for various systems state equationinenergyspaceisnowthestartingpointforour E spaceanden→ergyE couldhavebeencontinuousfromthe final approximation. If the transition rates R (E E′) E start, i.e. the sums in Eqs. (3) and (5) could have al- are strongly peaked around E′ E, a second orde→r par- ≈ 3 tial differential equation can be derived from Eq. (7). It was the important new step by Trebst et al. [6] to by various techniques, e.g. Kramers-Moyal expansion look systematically at the flow in energy space in simu- [15,16,17]. ThisFokker-Planckequation[16]forP(E,t) lations. Instead of monitoring the histogram H(E), cor- is given by responding to the stationary distribution, they added a label to the system and changed its value whenever it ∂ ∂ ∂ P(E,t)= D(E) F(E) P(E,t), (8) reached minimal and maximal values in the energy, i.e. ∂t ∂E ∂E − E and E . By counting just these labels at each (cid:20) (cid:21) min max energy E, the distributions of systems moving up and and we have written it already in a form that sepa- down in energy, denoted by n (E) and n (E), re- ratesstaticanddynamicproperties. D(E)istheenergy- up down spectively, can be monitored. The original histogram is dependent diffusion coefficient that describes the local recovered from H(E) = n (E)+n (E). However, mobility andF(E) is the drift term which in one dimen- up down in this way it is also possible to measure the fraction of sion can always be derived from a potential, F(E) = systems moving up, (d/dE)U(E). In particular the stationary distribution − of Eq. (8) is fully determined by this potential only, n (E) up f (E)= , (11) up P (E) exp[ U(E)] . (9) n (E)+n (E) 0 up down ∝ − It is important to emphasize again that the transi- and,correspondingly,thatofsystemsmovingdowninen- tion from the Master equation (7) to the Fokker-Planck ergy, f (E). Note that f (E)+f (E)=1. Both down up down equation (8) is possible only if the the transition rates distributions actually are the stationary distributions of RE(E E′), i.e. the underlying transition probabilities probability flow in the systems with boundary condi- WE(E→→ E′), are sufficiently local in the energy. Only tions fup(Emin)=1,fup(Emax)= 0 and fdown(Emin) = in that limit the Fokker-Planck equation is an effective 0,f (E )=1, respectively. down max description of the more general Eq. (7). Nevertheless, This flow in energy space can now be analyzed using eveniftherearenon-localcontributionstothetransition the above Fokker Planck equations. Equations. (8, 10) rates,Eq.(8)canbe viewedas the bestlocal approxima- areactuallycontinuityequationsfortheprobabilityflow, tion to Eq. (7). Eq. (8) can be written in a more compact form using ∂ ∂ P(E,t)= J(E,t) , (12) the stationary distribution P (E), 0 ∂t ∂E ∂ ∂ ∂ P(E,t)= D(E)P (E) P (E)−1 P(E,t). with J(E,t) the probability current. The current be- 0 0 ∂t ∂E ∂E tween E and E can now be determined as the (cid:20) (cid:21) min max (10) stationary solution of (12), In this form the fact that P (E) is the stationary dis- 0 tribution can be seen immediately from the vanishing of ∂ the rightmostderivative ontherhs if P(E,t) is replaced J = D(E)P0(E) P0(E)−1 PJ(E) const, (13) ∂E ≡ by P (E). This equationwillbe the basisforour further (cid:20) (cid:21) 0 analysis of distribution and flow in energy space. withP (E)beingthestationarydistributionfortheflow J The stationary distribution in energy space, P (E), 0 undertheaboveboundaryconditions. Notethatthesta- is actually the histogram H(E) of the energy distribu- tionary distribution P (E) discussed before is actually 0 tion that is observed in an actual simulation. It is still the solution of Eq. (13) for zero flow! Integrating that given by Eq. (6), i.e. by the Metropolis weight func- equation we obtain tion w(E) multiplied with the density of states n(E). By an appropriatechoice of the weightfunction any his- P (E) P (E ) J J min togram can be produced in the simulation. The usual = P (E) − P (E ) 0 0 min choice of Boltzmann weights leads to the canonical dis- tribution, H(E) n(E)exp( βE), while the choice J E dE′ 1 (14) w(E) ∝ 1/n(E) le∝ads to a flat −histogram H(E) = const ZEmin D(E′)P0(E′) [2, 18]. A flat histogram would be most appropriate to describe the properties of the system in question over Using any of the above boundary conditions the total a wide temperature range with equal accuracy,provided flow across energy space is then given by theMonteCarloerrorateachenergyisthesame. Various −1 methods have been discussed to actually obtain approx- −1 J = [D(E)P (E)] (15) 0 imations to n(E) by iteratively improving simulations | | D E [2,19]. However,allofthemarestillplaguedbytheprob- lem that equilibration in the system can be slow, in par- where we used the notation <.>= EmaxdE. Emin ticular when a wide energy range is considered [20, 21]. Inordertooptimizetheweightfunctionw(E)usedfor R Moreover,it turned out that – even if a flat histogramis thesimulationtoreachmaximalflowacrossenergyspace, reached – the error distribution is not flat at all [6, 22]. Trebstetal. maximizedEq.(15) under the constraintof 4 keepingthedistributionP (E)normalized,whichisdone holds. Both, f (E) as well as f (E) can be used for 0 up down by adding a Lagrange multiplier that purpose. However, some smoothening may have to beperformedtoobtainasmoothderivative,asdiscussed δ [D(E)P (E)]−1 −1+λ P (E) =0 (16) in [6]. Since P0,opt(E) = n(E)wopt(E), the optimized 0 0 δP (E) h i weight function is given by 0 (cid:20)D E (cid:21) Stationary flow is not the only concept that can be 1 w (E)= . (24) used to investigate the stochastic dynamics in a system. opt n(E) D(E) Anotheroftenusedconceptisthemeanfirstpassagetime (FPT) [23]. It is the average time a particle starting at p Usingthefactthatn(E)isobtainedfromtheactualsim- one end of a diffusion space needs to reach the other ulationbyn(E)=H(E)/w(E),wearriveattheiteration end for the first time. The total mean first passage time formula for crossing energy space from E to E in both min max directions, f′(E) w (E)=w(E) (25) opt τ =τ(E E )+τ(E E ) (17) sH(E) min max max min → → can be derived from Eqs (8,10) and is given by [24] Atthefixedpointofthisiterationf′(E)=H(E),leading to H(E)=1/ D(E) as required, see Eq. (22). τ = EmaxdE 1 E dE′P0(E′)+ In an actuapl simulation situation one iteration might D(E)P (E) not suffice. This is discussed in detail in [6, 7, 8]. In the ZEmin 0 ZEmin Emax 1 Emax followingwe willattempt to apply the same approachto dE dE′P (E′) tempering simulations. 0 D(E)P (E) ZEmin 0 ZE −1 = [D(E)P (E)] P (E) (18) 0 0 h i III. PARALLEL TEMPERING D E Note that P (E) does not have to be normalized here. 0 Minimization of τ with respect to P0(E) leads to Generalized ensemble sampling can be extended by adding movement in control parameter space. In addi- δ −1 tion to moves among different states of the system at a [D(E)P (E)] P (E) =0 (19) 0 0 δP0(E) h i particular value of the control parameter, moves along hD E i one or more control parameter directions are possible. Both variational equations, (16) and (19), lead to the The motivation behind such an extension is that relax- same solution ation in some control parameter regime may be faster, therebyfacilitatingrelaxationinthewholestate+control 1 P0,opt , (20) parameter space [3]. ∝ D(E) Usually,themotionalongcontrolparameterdirections p is not made continuous. Instead, a list of monotonically which is already derived in [6] from Eq. (16). This increasing or decreasing values β is chosen, thereby in- current-optimizedstationarydistributionleadstoasym- n troducing control parameter hopping. The most com- metric form of the Fokker Planck equation, monly used control parameter is temperature, although otherchoicesarepossible,too[25]. Simulationsatapar- ∂ ∂ ∂ P(E,t)= D(E) D(E) P(E,t). (21) ticular control parameter value are performed using the ∂t ∂E ∂E (cid:20) (cid:21) Metropolis criterion (1) with a weight function w(β,s), p p as before. Although Boltzmann weights are usually cho- In order to reach such a current optimized histogram, sen, leading to canonical simulations at each parameter H (E),inanactualsimulation,theweightfunctionhas opt value, in principle any weight function is possible. tobechosenaccordingly. Forthispurposeitisnecessary to obtainthe localdiffusioncoefficientfroma simulation Parallel tempering is the parallellized extension of a using some initial weight function w(E). Differentiating method called simulated tempering which we will sketch Eq. (14), D(E) is obtained from only briefly. In simulated tempering [26, 27], a single instance of the system is simulated only. After certain −1 times,anattemptismadetochangethecontrolparame- d P (E) D(E)= P0(E) J =[H(E)f′(E)]−1 (22) tervalue. Inordertoensurethatthestationarydistribu- dE P (E) (cid:20) 0 (cid:21) tion at each control parameter value is given by w(β,s), the Metropolis criterion, where the we used the fact that P (E) w(β′,s) f(E)= J (23) p [(β,s) (β′,s)]=min 1, , (26) M P (E) → w(β,s) 0 (cid:18) (cid:19) 5 isusedforthe acceptanceofacontrolparameterchange. in simulated tempering, simply drops out in the parallel In order to ensure appreciable exchange between differ- form. Thereby, the problem of determining the free en- ent control parameter values using Boltzmann weights ergyin orderto optimize the simulationvanishes. In the (2), the weight function has to be adjusted by a control case of Boltzmann weights (28) reduces to parameter dependent function g(β), p [(β,s) (β′,s′)]=min[1,exp(∆β∆E] (29) w(β,s)=exp[ βE(s)+g(β)] . (27) M → − This function g(β) actually determines the distribution with ∆β =β′ β and ∆E =E′ E. of the single system among the control parameter values − − We will concentrate on temperature hopping in the and – up to an additive constant – the optimal choice is following and choose the list of inverse temperatures thefreeenergyg(β)=βF(β)ofthesystemanalyzed[27]. β >β >...>β . Inaparallelimplementationsimula- However,theproblemofdeterminingg(β)hashampered 0 1 N tionsataparticularvalueβ areusuallyrunonapartic- this approach. n ular node of the parallel computer, conveniently labeled This problem is solved in parallel tempering. Here, n. Inordertosimplifythenotation,wewillthereforeab- copies of the system are simulated in parallel at each of breviate β by n whenever possible and also use ”node” the various control parameter values. At certain times n synonymously with ”control parameter value”. exchangesofreplicaswithneighboringcontrolparameter valuesareattempted. Inordertoensurethatthestation- It is sufficient to follow only a single replica through ary distribution at a control parameter value β is given state and control parameter space, since all replicas are by w(β,s), the generalized Metropolis criterion equivalent. Fortimes betweenreplicaexchanges,simula- tions are performedateachnode independently, and the w(β,s′)w(β′,s) time evolution of the distribution function P [(β ,s),t] p [(β,s) β′,s′)]=min 1, (28) n M → w(β,s)w(β′,s′) at a particular node n is described by the Master equa- (cid:18) (cid:19) tion (3) with the respective transition probabilities de- has to be used for the acceptance of such an attempt. It termined by the appropriate temperature β . For times n can be seen easily that any function g(β) in the weight t=mT, m=1,2,..., replica exchange is attempted. For function (27), in particular the one that was necessary this time step the Master equation in state and temper- to ensure equilibration among control parameter values ature space is P[(β ,s),t+1] = P [(β ,s′),t]W [(β ,s′) (β ,s)]+P[(β ,s′),t]W [(β ,s′) (β ,s)] + n n−1 s n−1 n n+1 s n+1 n { → → } s′ X P [(β ,s),t] 1 W [(β ,s) (β ,s′)] W [(β ,s) (β ,s′)] . (30) n s n n−1 s n n+1 { − → − → } s′ X The transition probabilities are given by 1 W [(β,s) (β′,s′)]= p [(β,s) (β′,s′)] for s=s′ . (31) s M → ΩN → 6 Here, Ω is the normalization by state space and reflects the fact that any conformations can be exchanged, which is different from the case of GE where the move set was restricting possible conformationchanges,see Eq.(4). N takes into account that just for one random neighboring pair of nodes a replica exchange is attempted at time t = mT. However,otherstrategiesarepossible,too,thatwouldleadtoadifferentnormalizationconstant. Duetotheexchange of replicas in parallel tempering the transition probabilities are symmetric, i.e. W [(β,s) (β′,s′)]=W [(β′,s′) (β,s)] . (32) s s → → Elimination of fast degrees of freedom orthogonalto the energy is possible in the same wayas it was discussedin the last section. This leaves us with P [(β ,E),t+1] = P [(β ,E′),t]W [(β ,E′) (β ,E)]+P [(β ,E′),t]W [(β ,E′) (β ,E)] + n n−1 E n−1 n n+1 E n+1 n { → → } E′ X P[(β ,E),t] 1 W [(β ,E) (β ,E′)] W [(β ,E) (β ,E′)] (33) n E n n−1 E n n+1 { − → − → } E′ X The symmetry (32) naturally leads to a similar symmetry for the transition rates between nodes in reduced, i.e. energy, space, W [(β,E) (β′,E′)]=W [(β′,E′) (β,E)] . (34) E E → → 6 In order to finally derive an effective Master equation for hopping in temperature space, we have to additionally assume fast relaxationin energyspace, i.e. on times scales t<T. This means that at any particularnode we assume to have reached the respective equilibrated distribution P (β,E). Using similar reasoning as in Appendix A for GE, 0 this last approximationleads to the final form of the Master equation in temperature space on a coarse grained time scale t t/TN, → P(β ,t+1) = P(β ,t)W (β β)+P(β ,t)W (β β)+ n n−1 β n−1 n+1 β n+1 → → P(β ,t)[1 W (β β ) W (β β )] (35) n β n−1 β n+1 − → − → In a way similar to the derivation of Eq. (A3), effective transition probabilities can be derived from the equilibrated distributions at a node, W (β β′)= dE dE′P (β,E)p [(E,β) (E′,β′)]P (β′,E′) . (36) β 0 M 0 → → Z Z We will discuss in Appendix B the properties of these P(β ,t)W (β β ) . (41) n β n n+1 → effective transition probabilities in particular situations. Finally, the symmetry of the transition probabilities Consequently, the stationary current J is determined from W (β β′)=W (β′ β) (37) β β → → J = P (β )W (β β ) holds analogously here, too. J n+1 β n+1 → n − Due to this last property, the stationary distribution PJ(βn)Wβ(βn βn+1) → for PT can be derived easily from Eq. (35) and is given = const , (42) simply by withP (β )beingthestationarydistributionforflowun- P (β)=const , (38) J n 0 der the above boundary conditions. Taking into account i.e. in an equilibrated simulation every single replica the symmetry properties of the transition probabilities, appears on each control parameter node with the same the stationary distribution for constant current between probability, 1/(N +1) with N +1 being the number of nodes β0 and βN is given by nodes. This result is an important simplification of the n−1 situation over the case of simulated annealing and over 1 P (β )=J−1 (43) GE, and is due to the construction of replica exchange. J n W (β β ) β i i+1 Flowbetweenthecontrolparameternodescannowbe Xi=0 → analyzed, too. Here, replicas reaching node 0 or N are with the current labelled and the respective distributions over the nodes can be monitored. Using the same notation as before, N−1 −1 1 n (i) being the number of replicas at node i that came J = (44) up W (β β ) from node 0, and ndown(i) being the number of replicas "Xi=0 β i → i+1 # at node i that came from node N, we can measure the Due to the simple form of the stationary distribution fraction of replicas moving up among nodes (38), the analytic forms for P (i) and J J nup(i) are considerably simpler than for GE. f (i)= , (39) up n (i)+n (i) Again, as before with GE, a concept different from up down stationaryflowcanalsobeanalyzed,the totalmeanfirst and a corresponding quantity for those moving down, passagetime to crossthe network ofnodes. For the gen- f (i). Both distributions are stationary distribu- down eral hopping process (35), this is given by [15, 17] tions of probability flow between temperature nodes, with boundary conditions f (0) = 1,f (N) = 0 and up up τ = τ(0 N)+τ(N 0) f (0) = 0,f (N) = 1, respectively. As before, → → down down N−1 i they can be analyzed by looking at the stationary solu- 1 = P (β )+ tionoftheunderlyingstochasticequation(35). Itcanbe P (β )W (β β ) 0 i 0 i β i i+1 written as Xi=0 → Xj=0 N N 1 P(βn,t+1)−P(βn,t)=J(βn,t)−J(βn−1,t) (40) P (β )W (β β ) P0(βi) 0 i β i i−1 whichisthediscreteformofthecontinuityequation(12). Xi=1 → Xj=i The discrete case current J(βn,t) is given by N−1 1 N = P (β ) (4.5) 0 i J(βn,t) = P(βn+1,t)Wβ(βn+1 →βn)− Xi=0 P0(βi)Wβ(βi →βi+1)! Xi=0 ! 7 This final result is practically a discrete version of Eq. (18). Taking into account that the stationary dis- tribution in PT is constant, we finally obtain a result X that is just the inverse of Eq. (44), N−1 1 τ = . (46) W (β β ) β i i+1 i=0 → X Here isanimportantdifference to GE.There the local diffusion coefficient was fixed by the move set and - as it turned out in actual simulations - mostly independent from the chosen weight function. The stationary distri- bution, however, was free to be chosen by varying the weight function. In the case of PT this is exactly the opposite. Here thestationarydistributionisfixedbyconstructionofthe E E E min max replicaexchangeprocess. However,byadjustingthecon- trol parameter intervals, the transition probabilities can be chosen relatively freely. Hence, we are interested in FIG. 1: Sketch of state space for generalized ensemble sam- obtainingtheoptimaldistributionoftransitionprobabil- pling (GE) in the case of broken ergodicity; X denotes any ities that maximizes the flow across the control param- degree of freedom orthogonal to the control parameter (en- eter space. Since we are mainly interested in local vari- ergy). ations of the optimized transition probabilities, i.e. in deviations from an average value (that will be actually determined afterwards), we keep the average transition probability constant via a Lagrangianmultiplier, i.e. −1 The iteration scheme used in Refs. 7, 8, 9 for as- N−1 δ 1 + signing temperatures to nodes appeared to exhibit δWβ(j j+1) Wβ(βi βi+1)! fast convergence to the optimal behavior of 49. We → i=0 → X can rephrase it here without having to recur to some N−1 intermediate local diffusivity (and we stick to our use of λ W (β β ) =0 (.47) β i → i+1 ! inverse temperatures as example control parameters): i=0 X (i) a particular set of control parameters β > β > 0 1 An equivalent equation results for optimizing τ. ... > betaN 1 > βN gives rise to a flow distribution − It is easily seen that optimizing the current as well as fup(0) = 1 fup(1) ... fup(N 1) fup(N) = 0; ≥ ≥ ≥ − ≥ themeanfirstpassagetimesimplygivesaconstanttran- (ii) these latter values give rise to stepwise de- sition probability between neighboring nodes the whole fined function g[f], with g[f(i)] = βi, in par- range of control parameter values, ticular g[1] = β0 and g[1] = βN, and lin- ear interpolation in between these values; Wopt =const (48) (iii) the new control parameter values are determined from this function by β′ = i asoptimalsolution. Consequently,from(43)wecancon- g[i/N],i = 1,...,N 1, keeping β and β fixed. 0 N cludethattheoptimalflowdistributionamongthenodes This procedure is ac−tually illustrated quite nicely in is linear in the node number. Fig. 2 of Ref. 7, and we can refrain from repeating it here. P (β )=n/N . (49) J n Therefore, the temperature spacing is optimal if such a It is important to note at this point that the above flowdistributiontogether with constanttransitionprob- results, in particular the equivalence of constant transi- abilities can be obtained in an actual simulation. The tionprobabilitiesandaflow distributionthatis linearin linear dependence of the flow distribution on the node the node number, depend on the validity of the under- number,Eq.(49),wasobtainedforPTalreadyin[7,8,9] lying one-dimensional representation of the simulation bymappingPTontotheFokker-PlanckequationforGE, process. It turns out that, while the flow distribution Eq (8), and assuming a particular temperature depen- Eq. 49 is readily attainable in actual simulations, this denceforthelocaldiffusioncoefficient. Herehowever,we is usually not accompanied by acceptance probabilities see that it follows directly from the hopping-description that are constant over the whole system [7, 8, 9]. In the of PT.In addition, we obtainits equivalence to constant followingfinalsectionwewilldiscusspossiblereasonsfor transition probabilities, Eq. (48). such a discrepancy. 8 [28]. Although in such a situation Eq. (8) may not be able to capture the full dynamics correctly, it neverthe- less is still able to identify local bottlenecks. Optimizing the flow according to the methods discussed can handle these local bottlenecks in the transition between neigh- boring energy values. However, slow relaxation orthogonal to the energy leads to a more complicated situation. Now it may not be possible anymore to reach all values of the additional degreesoffreedombymoveslocalintheenergy. Instead, detoursviaother-usuallyhighenergy-areasofthestate space have to be performed. This leads to the comb-like structure of the accessiblestate space sketchedin Fig. 1. It describes the situation that free energy basins at con- stant energy are disconnected. Actually, it is this feature of state space partitioning 0 node N that lead us to the requirement of large flow between low and high energy areas of the state space in the first place. Only for a large flow the ”teeth” of the comb-like FIG.2: Sketchofbi-andmulti-furcationsinthecasebroken structure in Fig. 1 can be sampled adequately. Never- ergodicity for parallel tempering (PT); for certain nodes the theless, it also leads to the situation that the effective system partitions into several disjoint free energy wells. one-dimensional Fokker-Planck equation (8) is only the bestMarkovianone-dimensionalapproximationofanun- derlying effectively higher-dimensional process. IV. SUMMARY, DISCUSSION, AND OUTLOOK This situation is even clearer in the case of PT. If the relaxation at a particular control parameter value is The goal of most recent advances in Markov chain fasterthanthetimescaleofhoppingincontrolparameter MonteCarlosamplingistoanalyzeandincreasetheflow space, then the requirements for the analysis performed through state space. To do so, heuristic equations have in the previous section are fulfilled. However, if that is been used to describe the flow in reduced state space not the case the state space at such a node partitions along a slow order parameter, e.g. the energy, in gen- into disjoint free energy basins that do not communi- eralized ensemble sampling (GE) and among nodes per- cate. Viewed over the whole control parameter range, forming simulations at various control parameter values we are dealing with a hierarchicalnetworkof free energy in parallel tempering (PT) [6, 7, 8]. In this contribution basins as sketched in Fig 2. Such a situation has been we have derived such one-dimensional stochastic equa- aptlytermedbroken ergodicity[29,30]andwasdiscussed tions forGE andPT samplingfromthe underlyingMas- in the field of glassy dynamics several years ago. ter equations. Using these stochastic equations, weight Inprinciple,thetopologyofthetree-likecontrol+state functions for GE and strategies for finding optimal con- spacedependsonthetime scaleofthecontrolparameter trolparametervaluesforPTcanbedevisedthatoptimize hopping. Ifrelaxationispossibleatallnodes,thennobi- the flow throughorderparameter andcontrolparameter furcationsoccurandthesystemisjustaone-dimensional space, respectively. We have also demonstrated that op- hoppingchainasitwasanalyzedinthe previoussection. timization of flow is equivalent to minimizing the first However, this is true only in the limit of infinite time passage time to cross the system. between replica exchange steps, T . Practically the →∞ All considerations in the previous sections were based topology of the branching will be the same over a wide on the assumptions that the Fokker-Planck (8) or hop- range of time scales and only the position of the branch- ping(35)equationsareavalidMarkovianrepresentation ing nodes may vary. of the underlying more complex dynamics. That, how- Foratrulyone-dimensionalsystemtheoptimizedtran- ever, is true only if the approximations discussed there sition probabilities are constant and – equivalently – apply. the optimal flow stationary distribution is f(n) n, ∝ Inthe caseofGEtheseapproximationsarethatrelax- Eq.(49). Underconditionsofbrokenergodicity,thesitu- ationsinthedegreesoffreedomorthogonaltotheenergy ation is more complicated. Now, several transition rates are fast, together with the locality of transitions. Since betweenneighboringnodesmayhavetobetakenintoac- the ultimate goal of deriving Eq (8) is the optimization count, describing exchange along the different branches of flow throughstate space, a violationof the latter con- depicted in Fig. 2. Moreover, observed acceptance rates dition is not detrimental. Non-locality in the energy of are weighted averages of these various transition proba- themovesetusuallyleadstofasterrelaxationsincestate bilities. Therefore, the equivalence of constant observed spaceisconnectedmoredensely. This isforexampleone acceptance ratios to a flow distribution linear in the reason for the success of the Swendsen-Wang algorithm node number may no longer hold. This discrepancy has 9 been observed already in Refs. 7, 8, 9. While satisfy- i.e. determining the flow matrix between all individual ing Eq. (49) was possible by an appropriate choice of nodes, would have to be performed. node temperatures, constant acceptance rates were not We believe that an additional advantage of PT is that obtained concomitantly. Such a result can be used as suchbranchingsituationscanbeanalyzeddirectly,with- a clear signature of broken ergodicity occurring in PT. out resorting to an actual system. In a way, it will be In contrast to GE, where such a clear criterion is not possible to simulate the simulations to investigate possi- available yet, PT has a particular advantage here. ble flow behaviors. By analyzing the Master equations Naturally, the question arises how to optimize flow in modelling the hierarchical broken ergodicity networks, PTunderconditionsofbrokenergodicity. Controlviathe conclusionsaboutthe behaviorof actualsimulations can choice of node temperatures is somewhatlimited in such be drawn [31] Thereby, the present results open the way a situation, since changes may affect different transition to investigate the effects of broken ergodicity in GE and probabilities differently. Nevertheless,eveninsuch a sit- PT in a more systematic way. uation the choice of a flow distribution f(n) n still as- ∝ suresthattheflowofreplicasalongthe mainbranch,i.e. between the lowest and the highest temperature node, Acknowledgments is still optimal. In contrast, the effect of making appar- ent acceptance ratios constant – if possible at all – is It is a pleasure to thank P. Grassberger and S. Trebst unclear and depends on a knowledge of the particular for a critical reading of the manuscript and the referees structure of ergodicity breaking. In the final analysis, for posing stimulating questions that helped to improve however, optimizing flow under such conditions means the presentation. This research was supported by NSF- that, in addition to the flow between the lowest and the grant No. CHE-0313618. highestnode,alsoflowamongsidebrancheshastobe be considered. In order to assure that flow among all side branchesisoptimized,too,amoredetailedflowanalysis, APPENDIX A: ADIABATIC ELIMINATION OF FAST DEGREES OF FREEDOM As a first step to eliminate the fast degrees of freedom in Eq. (3), we have to single out the slow degree of freedom by appropriatelabelling. Therefore,we replacethe statelabels bya moredetailedone thatconsists ofthe energyE, i.e. theslowdegreeoffreedom,andanadditionallabels thatdesignatesallmicrostateswithenergyE,s (E,s ). E E ≡ Equation (3) is then replaced by P[(E,sE),t+1] = P [(E′,sE′),t]Ws[(E′,sE′) (E,sE)]+ → EX′6=EXsE′ P [(E,sE),t] 1 Ws[(E,sE) (E′,sE′)] , (A1) − → EX′6=EXsE′ where the sums are over all possible energies E and states s for each energy. If relaxation among the microstates E for a particular energy is fast, each of these states will assume the same probability, and we can approximate 1 P [(E,s ),t] P(E,t) , (A2) E ≈ N(E) withN(E)beingthenumberofstates,i.e. =N(E),introducedtoensurecorrectnormalization, P(E,t)=1. sE E This approximation is the crucial step, since it allows us to sum over all microstates s for each energy in Eq. (A1). E P P After carrying out such a summation, we arrive at the Master equation for the energy, Eq. (5), with the transition probabilities W (E E′) given by E → 1 WE(E E′)= Ws[(E,sE) (E′,sE′)] . (A3) → N(E) → XsE XsE′ Note the asymmetry with respect to E and E′ that is due to the non-constant density of states. A more rigorous and systematic treatment, which would also allow the derivation of corrections, would involve projection operator techniques as they are used, e. g.. in Chap. 6.4 of Ref. 17. However, since the above approach suffices for our purposes, we refrain from embarking on such a more detailed elaboration. APPENDIX B: TRANSITION PROBABILITIES strongly on the – usually unknown – distributions or- FOR PARALLEL TEMPERING In GE, a general analysis of the effective transition probabilities W(E E′) is not easy since they depend → 10 thogonal to the energy, combined with a possibly sparse sparsemovesetdonotarise. Theonlyrequirementisthe moveset. IncontrasttoGE,PTallowsmoreinsightinto assumptionof equilibration ateach temperature. In this theeffectivetransitionprobabilitiesthatgovernthetem- limit,wecancalculatetheeffectivetransitionprobability peraturehopping. Since transitionsarepossiblebetween by any energy values, the complications due to a particular W(β β′)= dE dE′P (β,E)p [(E,β) (E′,β′)]P (β′,E′) , (B1) 0 M 0 → → Z Z with P (β,E) being the equilibrated distribution at β and p given by Eq. (28). 0 M There have been several approaches to evaluate that evant. Moreover, Eq. (B1) is anyhow only valid in the formula. Predescu et al. [32] and Kofke [33] emphasize limitoffastrelaxationatanode. Soitisfromtheoutset the importance of taking into account the asymmetry of onlyanapproximationtotheactuallyobservedtransition an actual distribution, having a low energy cutoff and rate. Since, in order to optimize the flow, large values of an exponential tail at high energies. Nevertheless, Kone the transitionprobabilitiesaresoughtfor,a quantitative and Kofke [34] later use an approximation based on a analysismakessenseonlyforthecaseswheretheoverlap Gaussian approximation, i.e. symmetric without cutoff is appreciable, i.e. for small temperature differences. and non-exponential tail, together with the assumption We therefore add here our approach to evaluate (B1) of constant specific heat over the entire range. All au- using the first order approximations to these distribu- thors limit themselves to unimodal distributions in their tions, i.e. Gaussians, analysis and do not question the peak is quadratic in energy. E E(β) 2 icaHlovwaeluveesr,othfethdeistcroibnutrtoiolnpsacrhaamnegteedr,raim.e.aticaatllfiyrasttcarnitd- P0(β,E)∝exp"−(cid:2) 2−σ2(β) (cid:3) # (B2) second order phase transitions. While at second order phasetransitionsthefunctionalformofthepeakchanges with E(β) the average energy and σ2(β) = E E(β) 2 − fromquadratictoquartic,atfirstorderphasetransitions theenergyfluctuationsatβ. However,weavoidtheunre- (cid:2) (cid:3) the energy distributions become bimodal. With respect alisticandverylimitingassumptionofaconstantspecific to the distribution tails, on the other hand, since the heat [34]. goal is anyhow to optimize the transition probabilities, Assumingβ <β′,usingastepfunctionapproximation one should try to avoid control parameter intervals so of error functions that result from the inner integrals, large that the explicit structure of the tails become rel- and performing a symmetric evaluation we obtain 1 1 ∆E+∆β(σ2+σ′2) ∆E+∆β(σ2+σ′2) W(β β′) exp ∆β∆E+ ∆β2(σ+σ′) 2+erf +erf + → ≈ 4 2 √2σ √2σ′ (cid:20) (cid:21)(cid:20) (cid:18) (cid:19) (cid:18) (cid:19)(cid:21) 1 ∆E ∆E erfc +erfc , (B3) 4 √2σ √2σ′ (cid:20) (cid:18) (cid:19) (cid:18) (cid:19)(cid:21) where we have used the abbreviations ∆β = β β′ and to obtain a relationship between the derivative of the ∆E = E(β) E(β′). We can now employ the −fact that average energy and the energy fluctuations − the specific heat is given by d 2 d d E(β)= E E(β) . (B6) c= E(T)= β2 E(β) , (B4) − dβ − dT − dβ (cid:2) (cid:3) Using as well as by c=β2 E E(β) 2 =β2σ2(β) (B5) E′(β)≡ ddβE(β)=−σ2(β) , (B7) − (cid:2) (cid:3)