Table Of Content

Konrad-Zuse-Zentrum Takustraße7 D-14195Berlin-Dahlem fu¨r Informationstechnik Berlin Germany CHRISTOF SCHU¨TTE, ADAM NIELSEN AND MARCUS WEBER Markov State Models and Molecular Alchemy ZIB-Report14-05(March2014,revisedOctober2014) Herausgegeben vom Konrad-Zuse-Zentrum fu¨r Informationstechnik Berlin Takustraße 7 D-14195 Berlin-Dahlem Telefon: 030-84185-0 Telefax: 030-84185-125 e-mail: [email protected] URL: http://www.zib.de ZIB-Report (Print) ISSN 1438-0064 ZIB-Report (Internet) ISSN 2192-7782 Markov State Models and Molecular Alchemy Christof Schu¨ttea,b∗∗, Adam Nielsenb∗∗†, Marcus Weberb∗∗∗‡ aInstitute for Mathematics, Freie Universität Berlin, Germany bZuse Institute Berlin (ZIB), Germany Abstract Inrecentyears,MarkovStateModels(MSMs)haveattractedaconsid- erableamountofattentionwithregardtomodellingconformationchanges and associated function of biomolecular systems. They have been used successfully, e.g., for peptides including time-resolved spectroscopic experiments, protein function and protein folding, DNA and RNA, and ligand-receptor interaction in drug design and more complicated multivalent scenarios. In this article, a novel reweighting scheme is introduced that allows to construct an MSM for certain molecular system out of an MSM for a similar system. This permits studying how molecular proper- ties on long timescales differ between similar molecular systems without performingfullmoleculardynamicssimulationsforeachsystemundercon- sideration. The performance of the reweighting scheme is illustrated for simpletestcases,includingonewherethemainwellsoftherespectiveen- ergy landscapes are located differently and an alchemical transformation of butane to pentane where the dimension of the state space is changed. 1 Introduction Applicationsinmodernbiochemistry,molecularmedicine,orpharmacydemand fornumericalsimulationsoflargebiomolecularsystemsinatomicrepresentation aimed at a detailed understanding of relations between molecular structures, dynamicalbehaviorandfunction. Theprocessesconstitutingmolecularfunction typically happen on timescales many orders of magnitude, say 10-15 orders, longerthanthetypicaltimestepsofthesimulation. Despiteallprogressinhigh performance, computing direct numerical simulation on such timescales often still is infeasible. Hence, there is an increasing need for coarse graining schemes to accurately capture the long-term kinetics of a molecular system. The last decadehasseenamanifoldofapproachestocoarsegrainingmoleculardynamics. One of these approach is Markov State Modelling in which the kinetics of a ∗∗ Email: [email protected] †∗∗ Email: [email protected] ‡∗∗∗ Email: [email protected] 1 molecular system is described by a Markov jump process or Markov chain with thedominantmetastableconformationsofamolecularsystemasMarkovstates [17, 18, 15]. In recent years Markov State Modelling has been applied with striking success to many different molecular systems like peptides including time-resolved spectroscopic experiments [3, 14, 9], proteins and protein folding [5, 11, 2], DNA [8], and ligand-receptor interaction in drug design [7, 4] and more complicated multivalent scenarios [23, 19]. Despite the tremendous progress in Markov State Models (MSMs) in recent years, one still has to put considerable effort into the construction of a MSM: a large number of short or medium-long molecular dynamics trajectories has to be generated for the specific molecular system and its precise environmental parameters (like temperature, solvent details, or pH value) of interest. How- ever, in many applications one is not only interested in a specific system or in specific environmental parameters. For example, when designing functional molecules like in ligand design, one wants to compare the kinetic behavior and related functions of a variety of molecules (screening), and/or one needs to understand the influence of environmental parameters on the binding affinity. In order to answer these questions with techniques presently available, one would havetoconstructindividualMSMsforeachmolecularspecies/parametervalue of interest which is prohibitive in most practical cases. In such cases, molecular alchemy [10] would be helpful: In silico molecular alchemy starts from a molecular system M of which a certain property P(M ) is known or has been A A computed preliminarily and computes the same property P(M ) for a different B molecular system, M , by successively transforming M into M during the B A B simulation or by reweighting the simulation for M due to the transformation A M →M . A B In this article, a computational scheme for in silico molecular alchemy in combination with MSMs is presented. We will assume that an MSM has been constructed from MD trajectory information for the specific molecular system M andpreciseenvironmentalparameters,p. Byusingexponentialreweighting A techniques in path space, we will show how to compute the MSM for another molecular system and/or changed environmental parameters (M ,p(cid:48)) solely B basedontheMDtrajectoriesofthe(M ,p)-simulations. InthisMSMreweight- A ingscheme,eachofthe(M ,p)-trajectoriesgetsanewweightwhichentersinto A thecomputationofthetransitionmatrixofthereweightedMSM.Intheory,the reweighting procedure is exact so that the (M ,p(cid:48))-MSM can be constructed B without a single (M ,p(cid:48))-simulation. In fact, we will demonstrate that this ap- B proach allows to handle even critical cases like differing numbers of dominant conformations, or different dimensions of state space. However, if the molecular system (M ,p) is ”too different” from (M ,p(cid:48)) then the reweighting will A B require very many original MD trajectories to converge due to inefficient sampling caused by large variance as typical for all reweighting schemes if initial and target structure are too far apart. Thisreweightingtechniquediffersessentiallyfromthemoretraditionalsam- pling approaches used in molecular simulation. Instead of sampling the sta- tionarydistributionbyclevertrajectoriesandreweightingafterwards, wetryto 2 avoid the sampling of new trajectories and obtain a new Markov State Model onlybyreweightingpre-assembledtrajectories,eventhecaseofotherdimensions ispossible(whichisnotthecaseforsophisticatedmethodslike,e.g.,nestedsam- pling[20]). Therefore,ourapproachisdifferentfromtheumbrellasampling[22] which uses penalty potentials in order to control the behavior of a trajectory, and is different from the Wang-Landau sampling which adjusts the potential stepwise by realizing new trajectories. Also our approach differs from the his- togram reweighting [1] and replica exchange [16] (and its recent optimization [13, 6]), because, despite the fact that we only use pre-assembled trajectories, we do not reweight the stationary distributions with space weights, but rather weights for each single trajectory. Even further, this approach does not only achieve an approximation of the stationary distribution, but an estimation of the complete Markov State Model. The article is structured as follows. First, the background of Markov State Modelling is introduced in Sec. 2. Next, the theory behind the reweighting scheme is discussed in Sec. 3. Finally, in Sec. 4, the algorithmic realization of the reweighting scheme is sketched and illustrative numerical experiments demonstrate the performance of the reweighting scheme for splitting conformations and state space different dimensions. 2 Markov State Models (MSMs) We consider diffusive molecular dynamics, dx =−∇ V(x )dt+σdB , σ2 =2β−1Id, (1) t q t t whereF =σB˙ denotesthe(interal)forcinggivenbya3N-dimensionalBrow- int t nian motion W , and V the engery landscape associated with the molecular t system under consideration. The process x admits a unique, positive invariant probability measure µ to t which it is ergodic. This invariant density is absolutely continuous wrt. the Lebesgue measure and given by the density ∝exp(−βV(x)) that we for conve- nience also denote by 1 (cid:90) µ(x)= exp(−βV(x)), Z = exp(−βV(x))dx, Z For an arbitrary complete decomposition {A ,...,A } of state space into m 1 m disjoint sets we define the transition matrix to be the m×m matrix T with entries T =P [x ∈A |x ∈A ], (2) ij µ t j 0 i where P indicates that X is distributed due to µ. It is a stochastic matrix µ 0 thatdescribesthetransitionprobabilitiesbetweenthesetsofthedecomposition on time scale t in equilibrium. Its entries can alternatively be computed by 1 (cid:16) (cid:17) T = E 1 (x )1 (x ) , ij µ(A ) µ Ai 0 Aj t i 3 where 1 denotes the indicator function of the set A [18]. In the following, we A assumethatwealreadyhaveidentifiedappropriatesets{A ,...,A }suchthat 1 m the associated transition matrix T forms a meaningful MSM and that we have already computed the entries of T up to sufficient accuracy by using sampling paths (i.e. trajectories) of the process {x }. t 3 MSM reweighting Now we are interested to understand the effect of a change of force in the SDE (1) on the entries of the MSM transition matrix T. This change of force can result from changing the environmental parameters of the simulation or from replacinggroupsofatomsofthemolecularsystembyotherones. Eventhecase of introducing additional atoms is included as will be illustrated in Sec. 4.3. Girsanov transformation Tothisend,letx =x (ω)andX =X (ω)bethesolutionsonsomeprobability t t t t space (Ω,Σ,P) of the stochastic differential equations dx =−∇V(x )dt+σdB (3a) t t t dX =−(∇V(X )+∇U(X ))dt+σdB (3b) t t t t and deterministic initial conditions x (ω)=X (ω)=x (almost surely). 0 0 Define ξ ∈Rn by t (cid:114) β ξ =σ−1∇U(x )= ·∇U(x ). t t 2 t It follows from the Girsanov theorem [12, Thm. 8.6.8], sometimes also called Cameron-Martin-Girsanov theorem [21] that for dQ:=M dP t with (cid:18) (cid:90) t 1(cid:90) t (cid:19) M :=exp − ξ ·dB − |ξ |2ds , (4) t s s 2 s 0 0 we get for any measurable set A P[X ∈A]=Q[x ∈A], t t which is identical to writing (cid:90) (cid:90) 1 (X (ω))dP = 1 (x (ω))dQ. A t A t 4 In particular, we obtain (cid:90) E[1 (X )]= 1 (X (ω))dP A t A t (cid:90) = 1 (x (ω))dQ A t (5) (cid:90) = 1 (x (ω))M (ω)dP A t t =E[M 1 (x )] t A t for any measurable set A. Updating MSM transition probabilities NowletTQ denotethetransitionmatrixassociatedwith{X }forthesamesets t {A ,...,A },andµ theassociatedinvariantmeasure. Then,ourresultyields 1 m Q that 1 (cid:16) (cid:17) TQ = E 1 (X )1 (X ) (6) ij µ (A ) µQ Ai 0 Aj t Q i 1 (cid:90) (cid:20) (cid:18) (cid:90) t 1(cid:90) t (cid:19)(cid:21) = 1 (x)·E 1 (x )exp − ξ ·dB − |ξ |2ds µ (x)dx. µ (A ) Ai x Aj t s s 2 s Q Q i 0 0 We have 1 (cid:16) (cid:17) Z µ (x) = exp −β(V(x)+U(x)) = µ(x)exp(−βU(x)), Q Z Z Q Q (cid:90) (cid:16) (cid:17) Z = exp −β(V(x)+U(x)) dx=Z·E (e−βU). Q µ such that 1 (cid:90) TQ = w (t,x)g(x)µ(x)dx, (7) ij µ (A ) j Q i Ai (cid:20) (cid:18) (cid:90) t 1(cid:90) t (cid:19)(cid:21) w (t,x) = E 1 (x )exp − ξ ·dB − |ξ |2ds (8) j x Aj t s s 2 s 0 0 (cid:114) β ξ = ·∇U(x ) s 2 s e−βU(x) g(x) = , E (e−βU) µ (cid:82) withµ (A )= g(x)µ(x)dx. Consequently,basedonthetrajectoryinforma- Q i Ai tion that was gained to compute T , we in principle can also compute TQ. ij ij Note that for (cid:90) C = w (t,x)g˜(x)µ(x)dx ij j Ai 5 with g˜(x)=e−βU(x) we obtain m (cid:90) (cid:88) C = g˜(x)µ(x)dx ij j=1 Ai and, therefore, C C 1 ij = ij c =TQ (cid:80)m C (cid:80)m C 1 ij j=1 ij j=1 ij c for c=E (e−βU) µ Linearization DenotebyTP thetransitionmatrixassociatedwith{x }forthesets{A ,...,A }. t 1 m Let us now consider the case of a small change in potential, i.e., U =εW with a small ε > 0. Then Taylor expansion around ε = 0 for the weighting factor w(t,x) yields the update formula TQ = TP +εLP +O(ε2), ij ij ij 1 (cid:114)β (cid:90) (cid:20) (cid:90) t (cid:21) LP = E 1 (x ) ∇W(x )dB g(x)µ(x)dx,(9) ij µ (A ) 2 x Aj t s s Q i Ai 0 (cid:90) µ (A ) = g(x)µ(x)dx. Q i Ai Here LQ still depends on ε via g and µ ; however, we do not linearize these ij Q factors for two reasons: (1) linearization may destroy the normalization of the measure and (2) these factors can be computed pointwise and thus rather effi- ciently. By exchanging groups of atoms in the molecular system under consideration one gets a specific update potential U. Then U =W and evaluation of CP gives the sensitivity of the MSM matrix to the exchange. Generalization NextweconsidertheLangevinequationwithpositionqandassociatedmomenta p: dq = M−1p (cid:104) (cid:105) dp = −∇V(q)+γp dt+σdB , t where γ is the friction coefficient. The state of the system is x = (q,p) with invariant measure 1 (cid:18) β(cid:104) (cid:105)(cid:19) 2γ µ(x)= exp − pTM−1p+V(q) , β = . Z 2 σ2 6 In this case the above reweighting argument for replacing V by V +U goes as before. This time the definition of ξ is (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) 0 0 ξ 0 q = . 0 σ ξ −∇U(q) p Thus our reweighting formula (7) remains unchanged, now with (cid:20) (cid:18) (cid:90) t 1(cid:90) t (cid:19)(cid:21) w (t,x) = E 1 (x )exp − ξ ·dB − |ξ |2ds j x Aj t p,s s 2 p,s 0 0 (cid:114) β ξ = ·∇U(q ). p,s 2 s Further generalization to other forms of molecular dynamics, like thermostat- ted MD, is possible by incorporating the respective environmental forces as an stochastic forcing. 4 Algorithmic Realization and Numerical Ex- periments In the following, we compare different approximations of the transition matrix TQ related to (3b). One form of approximation of TQ is through direct computation, i.e., based on (6) using trajectories of (3b); we will denote this approximation by TQ,dir. The other one results from the reweighting scheme based on trajectories of (3a), denoted by TQ,reweighted. We explain both approximation types now in detail. In the following, X ,x denote d-dimensional t t random variables. Direct computation TogainTQ,dir,wecomputealongtrajectory(X ) i i=0,...,n−1 fortheperturbeddynamics(3b)byperformingntimestepsofsizedtusing the Euler-Maruyama discretization √ X =X −(∇V(X )+∇U(X ))dt+σ dtη i+1 i i i i of (3b), where η = (η1,...,ηd) are independent d-dimensional random i i i variablesdistributedduetothestandardnormaldistribution. Thistrajec- toryischoppedintopiecesoflengthlyieldingM subtrajectories(Xk) := i i=1,...,l (X ,...,X )fork =0,...,M−1. Ifthetrajectoryislongenough, lk l(k+1)−1 itcanbeassumedthatthepointsX0,...,XM−1 aredistributedaccording 1 1 to µ and we can calculate TQ,dir by Q M−1 (cid:88) CD = 1 (Xk)1 (Xk) ij Ai 1 Aj l k=0 and CD TQ,dir = ij . (10) ij (cid:80)m CD i=1 ij 7 Reweighted computation To gain TQ,reweighted, we compute a long trajectory (x ) for the unperturbed dynamics (3a) by performing n i i=0,...,n−1 timesteps of size dt using the Euler-Maruyama discretization √ x =x −(∇V(x ))dt+σ dtη i+1 i i i of (3a), where η = (η1,...,ηd) are independent d-dimensional random i i i variablesdistributedduetothestandardnormaldistribution. Again, this trajectory is chopped into pieces of length l yielding M subtrajectories (xk) := (x ,...,x ) for k = 0,...,M − 1. If the trajec- i i=1,...,l lk l(k+1)−1 tory is long enough, it can be assumed that the points x0,...,xM−1 are 1 1 distributedaccordingtoµ. Now, wehavetoapproximateforeachsubtra- jectory(xk) thetermM (xk)witht=l·dt. Thisisdoneasfollows. i i=1,...,l t l First, for (cid:90) t 1(cid:90) t (cid:88)d (cid:18)(cid:90) t (cid:19) 1(cid:90) t R= ξ dB + |ξ |2ds= ξ (i)dBi + |ξ |2ds s s 2 s s s 2 s 0 0 i=1 0 0 (11) wehaveM =exp(−R),whereB =(cid:0)B1,...,Bd(cid:1)denotesthed-dimensional t s s s Brownian Motion with independent components. Each term (cid:82)tξ (i)dBi 0 s s can be approximated by using the Euler-Maruyama discretization by (cid:90) t ξ (i)dBi ≈ri−ri, s s l 0 0 where ri =xk 0 1 √ ri =ri +[ξ(ri)(i)]ηi dt j+1 j j (kl+j) and ξ(r)=σ−1∇U(r). Therefore, for each trajectory (xk) we calculate the weight w by i i=1,...,l k d l (cid:88) 1(cid:88) r = (ri−ri)+ |ξ(xk)|2dt k l 0 2 i i=1 i=1 w =exp(−r ). (12) k k Finally, we can compute TQ,reweighted by M−1 (cid:88) Cw = 1 (xk)1 (xk)w g˜(xk) (13) ij Ai 1 Aj l k 1 k=0 and Cw TQ,reweighted = ij . ij (cid:80)m Cw i=1 ij 8

Description:

ergy landscapes are located differently and an alchemical transformation of butane to pentane . space (Ω, Σ,P) of the stochastic differential equations.

PDF (ZIB-Report) PDF

22 Pages·2014·0.71 MB·English

Checking for file health...

Save to my drive

Quick download

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview PDF (ZIB-Report)

Description:

ergy landscapes are located differently and an alchemical transformation of butane to pentane . space (Ω, Σ,P) of the stochastic differential equations.

See more

The list of books you might like

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.