Local transition gradients determine the global attributes of protein energy landscapes Francesco Rao Laboratoire de Chimie Biophysique/ISIS, Universite de Strasbourg, 8, alle G. Monge, Strasbourg, France (Dated: January 20, 2010) The dynamical characterization of proteins is crucial to understand protein function. From a microscopic point of view, protein dynamics is governed by the local atomic interactions that, in turn, trigger the functional conformational changes. Unfortunately, the relationship between local atomicfluctuationsandglobalproteinrearrangementsisstillelusive. Here,atomisticmoleculardy- namicssimulationsinconjunctionwithcomplexnetworkanalysisshowthatfastpeptiderelaxations effectively build the backbone of the global free-energy landscape, providing a connection between localandglobalatomicrearrangements. Aminimum-spanning-treerepresentation,builtonthebase of transition gradients networks, results in a high resolution mapping of the system dynamics and thermodynamicswithoutrequiringanyaprioriknowledgeoftherelevantdegreesoffreedom. These 0 results suggest the presence of a local mechanism for the high communication efficiency generally 1 observed in complex systems. 0 2 Keywords: ProteinDynamics,MolecularSimulations,Free-energyLandscapes,ComplexNetworks n a J Incomplexsystemsthebehaviorofthewholeishardly different microstates and whose links correspond to di- 0 predictable from the fundamental laws of interactions of recttransitionsbetweenthem[13–15]. Thisapproachhas 2 the single components [1]. Indeed, it is hard to reveal been successfully applied to the study of peptide folding the intimate connection between the local properties of and structural transitions [14–19], as well as to interpret ] M a complex system and its global behavior. This problem electron transfer experiments [20] and time-resolved IR has been formalized in the context of complex networks measurements [21, 22]. B [2]asaquestionof“navigability”;i.e.,themechanismfor In this letter, the relation between the local proper- . the efficient flow of information when the single network ties of the free-energy landscape and its global architec- o i nodes do not have a global view of the overall topology. ture is investigated by MD simulations of a 4 residues b As a matter of fact, many collective dynamical processes peptide, (GlySer) , and complex network analysis . In - 2 q are driven by the presence of (usually hidden) local gra- particular, when the CSN of a fully-atomistic peptide is [ dients [3]. reducedtothesubgraphcontainingonlyonelinkpermi- 1 The study of protein dynamics involve a similar prob- crostate pointing towards the most probable transition v lem – the coupling between the fast atomic fluctuations, (i.e. following the transition gradient), the presence of 0 which are local, and the slow conformational changes, energy valleys and subvalleys and their equilibrium pop- 4 which are global [4]. Those dynamical aspects have been ulations is naturally extracted as well as the hierarchy 5 recently recognized to be crucial for protein function, of transitions between them. Hence, the fast local mo- 3 . playing an important role in signalling, allosteric path- tionsbuildupthebackboneoftheglobalcommunication. 1 waysandenzymaticreactions[5,6]. Moleculardynamics The observed coupling between local and global dynam- 0 (MD) simulations are playing an increasing role in com- ical properties is expected to occur in a large class of 0 1 plementing the experimental results [7–9] which supply complex systems. : useful , but limited, information to this question. The GlySerpeptideshavebeenusedforquitesometimeas v recent combination of computational and experimental flexible linkers (and are known to show poor secondary i X studiesofaproteinenzymehaspointedoutthatthefast structure)forpolypeptidedynamics[23,24]. MDsimula- r atomic fluctuations are partly correlated to the displace- tions using the Langevin algorithm [25] and the implicit a ments occuring in the catalytic reaction [10]. As yet, the solvation FACTS [26, 27] have been performed. A tra- couplingbetweenthefastnanosecondtimescalesandthe jectory of 280 ns at 340 K was obtained and snapshots functional relevant transitions occuring in the microsec- were saved every 140 steps for a total of 106 conforma- ond to millisecond range largely remains obscure. tions. During the simulation the peptide visits several Most descriptions of the free-energy surface governing different conformations characterized by an end-to-end protein dynamics have been rather qualitative because distancebetween3.1and12.7˚A(Fig.1)indicatinglarge of the lack of proper order parameters and the intrinsic structural fluctuations. multidimensionality of the problem [11, 12]. These lim- The peptide microstates are defined as the inherent itations have triggered the development of a completely structures (IS), i.e. the potential energy minima, of the newarsenaloftoolsinspiredbynetworktheory[13]. The system [28, 29]. They are calculated minimizing all the essential idea is to map the protein trajectory, obtained 106 snapshots along the trajectory, resulting in 3044 dif- by computer simulations or experiments, on a confor- ferent IS. The IS are a natural, physically meaningfull, mation space network (CSN), whose nodes represent the partition into microstates [30–32]. The conformation 2 14 5 12 4 10 ol] m [A]e 8 cal/ 3 e k d 6 F [ ∆ 2 V4 4 V1 V V3 2 2 1 0 0.25 0.5 0.75 1 0 0.2 0.4 0.6 0.8 1 Time [ns] Z FIG. 1. Timeseries window of the end-to-end distance of the FIG. 2. (Color online) Cut-based free-energy profile for the (GlySer) peptide. The peptide is extremely flexible, alter- (GlySer) peptide. Microstates assigned by the gradient- 2 2 nating between compact states (d ∼ 3.4 ˚A) and extended approach to the four most populated valleys are shown on ee conformations (d ∼10.8 ˚A). the profile in different colors. The CFEP represents the free- ee energysurfaceprojectedonthecumulativepartitionfunction reaction coordinate Z [34], relative to a given reference mi- space network (CSN) is built on top of this microstate crostate(inthiscasethemostpopulatedone). Intheinseta profile is calculated on a subgraph made by the microstates definition: the nodes and the links are the microstates belongingtoV (triangles)andasmallergradient-cluster(cir- (i.e. the IS) and their direct transitions observed during 4 cles). the MD trajectory, respectively. The obtained network is weighted and it is equivalent to a classical transition matrix when the columns of the adjacency matrix (i.e. shown. The profile reveals that the peptide free-energy the network links) are appropriately normalized to one. landscapeisrugged. Remarkably, theobtainedgradient- Therelationbetweenfastlocalmodesandglobalchain clusters represent either a valley or a subvalley of the rearrangements is investigated by constructing from the free-energylandscape(Fig.2). InTableIthepopulation fullCSN,anewnetworkwithareducednumberoflinks. of the four most relevant free-energy basins detected by For each network node, the transition with the highest the gradient-approach is compared against the popula- probability (excluding self interactions) is kept, and all tions calculated by the minimum-cut method [16], which others are deleted. These transitions define a gradient isoneofthemostaccurateapproachesforthistypeofcal- in the network dynamics and result in a partitioning of culation[13]. Theresultsindicatethatthepopulationsof the network into several disconnected minimum span- thegradient-clustersareaccurate,effectivelyreproducing ning trees [33], called here gradient-clusters for conve- thecorrectthermodynamicsofthesystem. Thisispartic- nience. (Gradient networks were originally introduced ularly relevant since the gradient-approach does not use by Toroczkai and Bassler to study jamming [3], though anyglobalpropertyofthesystem,neitherintermsofbar- in their case the gradient is defined on the nodes as a rier heights or microstates energies. On the other hand, quenchedscalarfield). Followingthepathwaydefinedby the CFEP and the minimum-cut method do perform a the most probable transitions leads to microstates lower global analysis of the network. These observations sug- and lower in free energy, resulting in a kind of “steepest gest an interesting application of the proposed approach descent pathway” on the free-energy surface. High en- toautomaticallydetectthepresenceofmetastablestates ergymicrostates,intheneighborofafree-energybarrier, sampledbymultipleshortMDrunswiththeaimofbuild- wouldconnecteithertoonevalleyoranother. Asamat- ing simplified Markov models [37]. teroffact,thenetworkcharacterizingthefree-energysur- The gradient-approach can be applied in an iterative, faceissplitintoasetofdisconnectedminimum-spanning- heirarchical fashion, when the gradient-clusters are con- trees representing the local attractors of the system dy- sidered themselves as nodes of a higher level CSN. In namics. this case, the nodes and the links of the network are the To better understand the nature of the gradient- gradient-clusters and the connections between them, re- clusters (161 in total), a cut-based free-energy profile spectively. The iteration can be applied recursively until (CFEP) [34] is calculated on the CSN and compared to all the microstates are merged into one cluster, which theoutputofthegradient-partition. TheCFEPisbased is represented as a minimum-spanning-forest. The tree on a network flux analysis following the idea that the structure obtained for the (GlySer) peptide is shown in 2 network regions of minimum flow correspond to transi- Fig. 3. Link widths represent at which iteration step the tion states [34–36]. In Fig. 2 the calculated CFEP is edgewasintroduced. Forexample, theV andV valleys 1 2 3 Valley Gradient Mincut <d > [˚A] ee population a population V 0.429 0.428 4.167 1 V 0.120 0.120 4.253 2 V 0.068 0.064 10.960 3 V 0.037 0.051 4.160 4 a Totakeintoaccountoftheentropiceffects,gradient-clusters separatedbyfree-energybarrierssmallerthankBT/10have beenmerged. TABLE I. Comparison of the populations of the found val- leyscalculatedbythegradient-approachorbytheminimum- cut method [16]. Populations are defined as the sum of the number of occurencies calculated along the trajectory of the microstatesassignedtoagivenvalley. Inthelastcolumnthe averagevalueoftheend-to-enddistanced insideavalleyis ee given. are merged at the first iteration, indicating that they in- terconvertrapidly. Ontheotherhand,V ismergedtoV 4 1 at the fourth iteration indicating that this is the slowest FIG. 3. (Color online) Gradient-minimum-spanning-forest of transition to V . The gradient-minimum-spanning-forest 1 the (GlySer) peptide energy landscape. This free-energy 2 represents, in an intrinsically multidimensional fashion, representationconsistentlyrepresentsthefastrelaxationsbe- the valleys of the free-energy landscape and their dy- tween the microstates inside a valley (star-like structures) as namical organization in fast and slow relaxations. The wellasthedynamicalseparationbetweendifferentvalleys(ex- iteration step at which two gradient-clusters are merged pressedbylinkthickness). Largerlinkwidthsindicateslower together is a kinetic mesaure of the dynamical distance relaxations (see main text for details). The size of the nodes is proportional to the microstate population (to avoid over- between the two valleys. crowiding only nodes with populations larger than 10−3 are Concluding,wefoundthat,foranall-atompeptideMD shown). simulation, the fast local relaxations produce a partition ofthe conformationalspaceintodisconnectedminimum- spanning-trees. This partition reproduces the organiza- [2] M. Bogun˜a´, D. Krioukov, and K. Claffy, Nature Physics tion of the free-energy landscape into valleys and sub- 5, 74 (2008). valleys with the correct populations. The iterative con- [3] Z. Toroczkai and K. Bassler, Nature, 716(2004). nection of those valleys into a minimum-spanning-forest [4] H. Frauenfelder, S. Sligar, and P. Wolynes, Science 254, recovers the global backbone architecture of the dynam- 1598 (Dec 1991). [5] D.KernandE.R.Zuiderweg,CurrOpinStructBiol13, icsoccuringonthefree-energylandscape. Asimilarcou- 748 (2003). pling between local and global dynamics is expected to [6] E.Z.Eisenmesser,O.Millet,W.Labeikovsky,D.M.Ko- take place in other complex systems. These results are rzhnev,M.Wolf-Watz,D.A.Bosco,J.J.Skalicky,L.E. relevant to investigate the still unclear relationship be- Kay, and D. Kern, Nature 438, 117 (2005). tween network structure and dynamics in transport pro- [7] D. D. Boehr, D. McElheny, H. J. Dyson, and P. E. cesses ranging from metabolic pathways to air-travelling Wright, Science 313, 1638 (2006). and the internet. [8] B. Schuler and W. A. Eaton, Curr Opin Struct Biol 18, 16 (2008). [9] J. P. Colletier, D. Bourgeois, B. Sanson, D. Fournier, J. L. Sussman, I. Silman, and M. Weik, Proc Natl Acad Acknowledgments Sci U S A 105, 11742 (2008). [10] K. A. Henzler-Wildman, M. Lei, V. Thai, S. J. Kerns, I am grateful to Prof. A. Caflisch, Prof. M. Karplus, M. Karplus, and D. Kern, Nature 450, 913 (2007). Dr S. Muff, Dr. S. Krivov, Dr. M. Cecchini for useful [11] R. Du, V. Pande, A. Grosberg, T. Tanaka, and comments and discussions and Prof. P. Hamm, Dr. S. E. Shakhnovich, J. Chem. Phys. 108, 334 (1998). [12] V. Pande, A. Grosberg, T. Tanaka, and D. Rokhsar, Garrett-Roe for a critical reading of the manuscript. Curr. Opin. Struct. Biol. 8, 68 (1998). [13] A. Caflisch, Curr Opin Struct Biol 16, 71 (2006). [14] F. Rao and A. Caflisch, J. Mol. Biol. 342, 299 (2004). [15] D.Gfeller,P.DeLosRios,A.Caflisch,andF.Rao,Proc Natl Acad Sci U S A 104, 1817 (2007). [1] P. Anderson, Science 177, 393 (1972). 4 [16] S. Krivov and M. Karplus, Proc. Natl. Acad. Sci. USA [27] B. R. Brooks, C. L. Brooks, 3rd, A. D. Mackerell, Jr, 101, 14766 (Oct 2004). L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archon- [17] G.SettanniandA.R.Fersht,BiophysJ94,4444(2008). tis,C.Bartels,S.Boresch,A.Caflisch,L.Caves,Q.Cui, [18] S. Yang and B. Roux, PLoS Computational Biology 4 A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, (2008). W.Im,K.Kuczera,T.Lazaridis,J.Ma,V.Ovchinnikov, [19] D.Prada-Gracia,J.Go´mez-Garden˜es,P.Echenique,and E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schae- F.Falo,PLoScomputationalbiology5,e1000415(2009). fer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, [20] C.B.Li,H.Yang,andT.Komatsuzaki,ProcNatlAcad W.Yang,D.M.York,andM.Karplus,JComputChem Sci U S A 105, 536 (2008). 30, 1545 (2009). [21] J. A. Ihalainen, J. Bredenbeck, R. Pfister, J. Helbing, [28] F. Stillinger and T. Weber, Phys. Rev. A 25, 978 (Feb L.Chi,I.H.vanStokkum,G.A.Woolley,andP.Hamm, 1982). Proc Natl Acad Sci U S A 104, 5383 (2007). [29] F. Stillinger and T. Weber, Physical Review A 28, 2408 [22] J. A. Ihalainen, B. Paoli, S. Muff, E. H. Backus, J. Bre- (1983). denbeck,G.A.Woolley,A.Caflisch,andP.Hamm,Proc [30] A.Baumketner,J.-E.Shea,andY.Hiwatari,Phys.Rev. Natl Acad Sci U S A 105, 9588 (2008). E 67, 011912 (Jan 2003). [23] O. Bieri, J. Wirz, B. Hellrung, M. Schutkowski, [31] J.KimandT.Keyes,J.Phys.Chem.B111,2647(2007). M. Drewello, and T. Kiefhaber, Proc. Nat. Acad. Sci. [32] F. Rao and M. Karplus, to be published(2010). USA 96, 9597 (1999). [33] S. Carmi, P. Krapivsky, and D. Ben-Avraham, Phys. [24] A.Mo¨glich,K.Joder,andT.Kiefhaber,Proc.Nat.Acad. Rev. E 78, 66111 (2008). Sci. 103, 12394 (2006). [34] S. V. Krivov and M. Karplus, J. Phys. Chem. B 110, [25] MD simulations, using the Langevin algorithm with a 12689 (2006), ISSN 1520-6106 (Print). frictioncoefficientequalto0.6ps−1,werecalculatedwith [35] S. V. Krivov, S. Muff, A. Caflisch, and M. Karplus, J the CHARMM program [27], the polar hydrogen energy Phys Chem B 112, 8701 (2008). function(PARAM19)wasused.Theeffectsofwaterhave [36] S. Muff and A. Caflisch, J. Chem. Phys. 130, 125104 beenincludedusingthegeneralizedbornFACTSimplicit (2009). solvation model [26]. SHAKE was employed so that an [37] J.Chodera,N.Singhal,V.Pande,K.Dill,andW.Swope, integration step of 2 fs could be used. J. Chem. Phys. 126, 155101 (2007). [26] U. Haberthur and A. Caflisch, J Comput Chem 29, 701 (2008).