
The limited role of non-native contacts in folding pathways of a lattice protein

Brian C. Gin (1,2,3), Juan P. Garrahan (4), and Phillip L. Geissler* (1,2)

(1) Department of Chemistry, University of California at Berkeley, Berkeley, California 94720
(2) Chemical Sciences and Physical Biosciences Divisions, Lawrence Berkeley National Lab, Berkeley, California 94720
(3) School of Medicine, University of California at San Francisco, San Francisco, California 94143
(4) School of Physics and Astronomy, University of Nottingham, Nottingham, NG7 2RD, U.K.

Models of protein energetics which neglect interactions between amino acids that are not adjacent in the native state, such as the Gō model, encode or underlie many influential ideas on protein folding. Implicit in this simplification is a crucial assumption that has never been critically evaluated in a broad context: Detailed mechanisms of protein folding are not biased by non-native contacts, typically imagined as a consequence of sequence design and/or topology. Here we present, using computer simulations of a well-studied lattice heteropolymer model, the first systematic test of this oft-assumed correspondence over the statistically significant range of hundreds of thousands of amino acid sequences, and a concomitantly diverse set of folding pathways. Enabled by a novel means of fingerprinting folding trajectories, our study reveals a profound insensitivity of the order in which native contacts accumulate to the omission of non-native interactions. Contrary to conventional thinking, this robustness does not arise from topological restrictions and does not depend on folding rate. We find instead that the crucial factor in discriminating among topological pathways is the heterogeneity of native contact energies. Our results challenge conventional thinking on the relationship between sequence design and free energy landscapes for protein folding, and help justify the widespread use of Gō-like models to scrutinize detailed folding mechanisms of real proteins.
] M Keywords: Go¯ Model, Non-Native Contacts, Lattice Model, Protein Folding, Principle of B Minimum Frustration, Energy Landscape . o i b I. INTRODUCTION The G¯o model was originally proposed as a schematic - but microscopic perspective on the stability and ki- q netic accessibility of proteins’ native states. It accord- [ Current understanding of protein folding has been inglyprovidedgenericinsightintoissuesofcooperativity, 2 strongly shaped by theoretical and computational stud- nucleation, and the relationship between sequence and v ies of simplified models1. Such models are typically con- structure1. Recent studies have ascribed a much more 1 structed by discarding fine details of molecular structure literal significance to the detailed dynamical pathways 3 or by making simplifying assumptions about the ener- definedbyG¯o-likemodels5. Inparticular,directcompar- 2 gies of interaction among amino acid residues. A spe- 2 isons have been drawn between folding mechanisms pre- cialclass ofmodels, basedonG¯o’s insights2, assertsthat . dicted by G¯o-like models for specific proteins and those 8 onlyasubsetofinteractions,thosebetweensegmentsofa suggested by experimental results16,17,18. However, it is 0 protein that contact one another in the native state, are 8 not clear to what extent such a detailed correspondence crucially important for folding. The G¯o model further 0 with G¯o-like models should be expected. General the- : assumes a unique energy scale for these native contacts. ories offer only rough guidance, and few computational v Here, we will focus on elaborated “G¯o-like” models that studies have compared folding pathways of G¯o-like mod- i X allow for a diversity of native contact energies. els and their “full” counterparts (in which non-native r Neglect of non-native contacts offers substantial com- contact energies are included) in a broad context19. 
a putational relief to numerical simulations, allowing thor- Veryfavorableinteractionsbetweensegmentsofapro- ough kinetic and thermodynamic studies to be per- teinthatarenotadjacentinthefoldedstategenerallyim- formed even for detailed molecular representations3,4,5,6. pedefolding. Theymightdosobyintroducingdetoursor It further establishes a basis for theories that fo- traps on the route to the native state, or simply by sta- cus on gaps in the spectrum of conformational bilizing the ensemble of unfolded conformations20,21,22. energies7,8 and the funnel-like nature of potential energy It is often imagined that the former possibility plagues a landscapes9,10,11,12,13. Corroboratedbyexperiment,con- vastmajorityofnon-naturalaminoacidsequences,which cepts intrinsic to and inspired by G¯o-like models now fold sluggishly if at all23,24. According to this picture, form a canon of widely accepted ideas about how pro- non-native contacts should feature prominently in the teins fold1,14,15. convoluted folding pathways of an undesigned sequence. Suchkineticfrustrationcouldposeseveralbiologicalrisks in vivo,whereaggregationandslowresponsecanbeseri- ous liabilities. Indeed, typical proteins taken from living ∗Correspondingauthor. E-mail: [email protected] organisms fold reliably and with relative efficiency25. 2 These notions and observations motivate a “principle comparison of fast-folding pathways in the two models of minimum frustration” asserting that natural amino would not be especially informative. In that case the acidsequenceshavebeen“designed”byevolutiontomin- sequence of events that advance a molecule toward the imize the disruptive influence of non-native contacts on native state (which we designate as its folding mecha- the dynamics of folding9. One might thus apply G¯o-like nism) would be exclusively a question of geometry and modelstothesedesignedsequenceswithconfidence,since local mobility. 
We have found, to the contrary, that a the omitted interactions are precisely the ones whose ef- wealth offolding mechanisms arepossible even fora sin- fects have been mitigated by natural selection. By con- gle native conformation. trast, one might expect G¯o-like models to poorly rep- Spanning a range of hundreds of thousands of se- resent folding mechanisms of slowly folding molecules, quences, with widely varying rates and mechanisms, the whose non-native interactions are presumably responsi- work reported in this paper constitutes a thorough test ble for hampering pathways to the native state21,22. ofcertainaspectsoftheprincipalofminimumfrustration Testingtheseideasofsequencedesignandkineticfrus- andaddressesatanewlevelofkineticdetailthedynam- tration is made difficult by several factors. Experimen- ical realism that can be expected from G¯o-like models. tally,microscopicdetailsoffoldingkineticscannotbere- Ourresultsforthelatticeheteropolymermodelevidence solved but only inferred from indirect observables or the aremarkablystrongmechanisticcorrespondencebetween effectsofmutations. Furthermore,themostconcretehy- full and G¯o-like models. Unexpectedly, this dynamical potheses stemming from the principle of minimum frus- conformity holds not only for fast-folding sequences but tration involve G¯o-like models, which cannot be real- also for the slowest sequences whose folding can be fol- ized in the laboratory. Computer simulations of detailed lowedinpractice. Closecorrespondenceinfoldingmech- molecular representations can generate, at great cost, anismsholdsaslongastheG¯o-likeapproximationretains dynamical information sufficient to determine a folding the heterogeneity in native contact energies of the full mechanism for only the smallest of natural proteins26. potential. 
These findings suggest a profound frustration Although the statistical dynamics of coarse-grained or invariance in the ensemble of trajectories that proceed schematic representations can be readily explored, biol- from deep within the unfolded state all the way to the ogy does not provide collections of fast-folding and slow- native structure. folding sequences to compare in these artificial contexts. Finally, even when appropriate ensembles of sequences and ensembles of folding trajectories are available, use- II. METHODS fulcomparisonofG¯o-likemodelsanditsfullcounterpart requires a compact way of characterizing the course of We focus on lattice heteropolymers, whose folding highly chaotic dynamics27. A general method for this properties have been studied extensively for specific ex- purpose is not available, though studies of nucleation amplesequences,structures,andchainlengths1,30. Here, as a rate-limiting fluctuation provide a useful starting a protein’s conformation is described by a self-avoiding point28,29. walk on a three dimensional lattice with spacing a (see This paper presents the first systematic, large-scale for example Fig 1a). Each vertex of this walk represents comparison of folding pathways within G¯o-like and full an amino acid monomer, which possesses no internal models. We focus on a schematic lattice representation structureandinteractsonlywith“contacting”monomers of proteins, well-suited for this task in several ways: (a) that occupy adjacent vertices. 
For a chain comprising N geometrically, because contacting segments of the chain monomers the energy of a particular configuration can can be unambiguously identified, (b) statistically, be- thus be written causerepresentativeensemblesoffoldingtrajectoriescan N−1 N N−3 N be generated for large numbers of amino acid sequences, (cid:88) (cid:88) (cid:88) (cid:88) E = u (r )+ B ∆(r −a), (1) core ij ij ij and (c) conceptually, because the essential competition i=1 j>i i=1 j=i+3 betweencontactenergeticsandchainconnectivitycanbe isolatedfromcomplicatingeffectsofsecondarystructure, where r = |r −r |. The hard-core potential u (r), ij i j core side-chainpacking,etc. Whiletheselattereffectsunques- which takes on values of ∞ for r = 0 and 0 for r > tionably bear in important ways on the folding of real 0, imposes the constraint of self-avoidance. Interaction proteins, it is nevertheless imperative to understand the energies B are determined by the sequence-dependent ij fundamental physical scenarios they enrich and modify. identities of monomers i and j according to the model of Indeed, much of biologists’ working intuition for protein MiyazawaandJernigan31 (MJ),andactonlyataspatial folding and design was developed in the context of sim- separation of one lattice spacing [∆(x) = 1 if x = 0 and ilarly schematic models. Our results challenge some of vanishes otherwise]. those notions. Thestandarddynamicalrulesforevolvingsuchachain It has been conjectured that well-designed lattice het- molecule proceed from a Metropolis Monte Carlo algo- eropolymers fold through mechanisms that are deter- rithm. Trial moves, in which one or two randomly se- mined solely by their native structures25. Were this lected monomers move in an “edge-flip” or “crankshaft” hypothesis correct, for both full and G¯o-like models, a fashion, are accepted with probabilities that generate a 3 FIG. 1: (a) 48-mer native structure of the lattice heteropolymer studied in this work. 
(b) Example of histograms of the order ofpermanentformationofnativecontacts(contactappearanceorder,orCAO)foreachoftheninenativecontactsofa12-mer latticestructure. Histogramsarecollectedfromthesetoffoldingtrajectoriesofagivenaminoacidsequence. (c)SameasFig. 1b but shown as a density map. (Right, Top) CAO maps of three fast folding sequences of the 48-mer (Fig. 1a), for both the full potential energy and the Go¯-like approximation (which disregards non-native contact energies, but maintains the original heterogeneityinnativecontactenergies). Theoverlapparameterq quantifiesthesimilarityofCAOmaps,andthustopological folding pathways. The overlap of CAO maps between full and Go¯-like potentials for each sequence is close to one, q ≈ 0.9, indicating the similarity of their folding mechanisms. In contrast, the overlap between CAO maps of different sequences is < muchsmaller,q∼0.2. (Right,Bottom)Sameasbeforebutnowforthreeslowfoldingsequences. Again,theCAOdistributions of full and Go¯-like potentials are very similiar, while those between different sequences are not. Boltzmann distribution at temperature T = 0.16(cid:15) /k , chains of modest length (say, N = 27), the number of 0 B where (cid:15) sets the energy scale of the MJ model. For ex- possible conformations is sufficiently immense to moti- 0 ample, the strongest attractive interaction (between two vate Levinthal’s paradox, i.e., it is not obvious that they cysteines) has an energy (cid:15) =−1.06(cid:15) ; for lysine-lysine should be able fold at all. Folding occurs in a coopera- CC 0 (cid:15) = 0.25(cid:15) . Folding trajectories are initiated from tive fashion, and occurs efficiently only for well-designed KK 0 swollen configurations drawn from a high-temperature sequences. 
For a given sequence certain residues figure (k T/(cid:15) = 100) equilibrium distribution in which con- much more prominently in folding kinetics than others; B 0 tact energies are negligible compared to typical thermal correspondingly, certain residues are more highly con- excitations. served than others in computer simulations of evolution- This caricature clearly lacks many of the chemical de- ary dynamics. tails underlying the function and secondary structure of real proteins. By capturing an essential interplay between diverse local interactions and constraints of polymer connectivity, it nonetheless recapitulates many non- The G¯o-like approximation of the model of Eq. (1) is trivial features of protein statistical mechanics: Even for constructedsimplybyignoringtheenergiesofnon-native 4 contacts, orderofacontact’sappearancecorrelatesstronglywitha statistical measure of commitment to folding at the time E˜ =N(cid:88)−1(cid:88)N u (r )+N(cid:88)−3 (cid:88)N N B ∆(r −a), (2) when that contact forms permanently. We use the pa- core ij ij ij ij rameter p , the probability that a trajectory initiated fold i=1 j>i i=1 j=i+3 fromagivenconfigurationwillreachthefoldedstatebe- fore first relaxing to a state with few native contacts33, where N = 1 if the monomers i and j are adjacent in ij to demonstrate this fact. Fig. 3c shows that the average the native configuration, and N = 0 otherwise. While ij value of p rises steadily with CAO, from a value well disregarding the energy contribution of non-native con- fold below p =1/2 up to p =1. tacts, the energy function E˜ of Eq. (2) retains the full fold fold Thepointatwhichp crosses1/2isoftenconsidered heterogeneity in native contacts energies of the original fold the transition state for folding. The set of contacts con- potential,Eq.(1). Wewillshowbelowthatitisacrucial sistenlypresentinsuchconfigurationsiscorrespondingly aspect of the G¯o-like models we study here. designated as the folding nucleus. 
We have confirmed Many studies previously suggested that lattice het- thatthenucleusidentifiedinthiswaycorrespondsclosely eropolymers of modest length fold via a nucleation mechanism28,29. Formation of a handful of key contacts with the set of contacts that have formed permanently whenp =1/2. Additionally,wehaveverifiedthatthe poises the system at a transition state, from which the fold CAO-identified nucleus of several sequences from Mirny chain can rapidly access the folded state or, with equal et al.25 are consistent with the nucleus identified in that probability,returntotheunfoldedstate. Thissetofcru- study. While this consistency check reflects favorably on cial contacts comprises a “folding nucleus” and serves as the soundness of exploring folding mechanisms by scru- a bare synopsis of dynamical pathways that lead to the tinizing CAOs, it does not imply that CAO analysis is native state. predicated on putative nucleation mechanisms for fold- A cogent comparison of folding mechanisms requires a ing. Regardless of whether the rate-determining steps meansofcharacterizingdynamicalpathwaysthatisboth in folding are uphill, downhill, or neutral in free energy; thorough and computationally inexpensive. Identifying regardless of whether folding is kinetically a two-state the folding nucleus satisfies neither or these necessities phenomenon; regardless of whether the progress of fold- well. In particular, locating configurations from which ing is plagued by long-lived kinetic traps, CAOs trace the folded and unfolded states are equally accessible in- a history of conformational change that emphasizes any volves propagation of many trajectories and, by itself, event with enduring topological consequences. doesnotdelineateroutestowardandawayfromthetran- sition state32. We have devised an alternative measure What CAOs do not resolve is the unproductive devel- that is not only succinct and computationally tractable, opment of native structure. 
Attention is focused solely but also characterizes the entire route from the unfolded onsegmentsoftimeevolutionthatbridgefoldedandun- to the folded state. Specifically, we record the order in foldedbasinsofattraction. Occasionalexcursionswithin which native contacts form permanently during a pro- the unfolded state amass an atypically large number of tein’s folding mechanism. Our parameters thus chroni- native contacts, but due either to topology or to the clelastingchangesinthechain’s“topology”,understood presence of interfering non-native contacts do not in fact in terms of linkages through the polymer backbone and makeprogresstowardfolding. CAOscontainnoinforma- through non-bonded contacts. tion about these excursions. In comparing full and G¯o- This contact appearance order (CAO) is a highly non- like models, we therefore make no statements about the trivial measure of the progress toward folding and pro- character of such non-folding dynamics. By exclusively vides a detailed characterization of mechanism in the examining accumulation of native contacts, we also lose sense we have defined. It is simple to calculate from direct information regarding the evolution of non-native the time-dependence of a trajectory spanning unfolded contacts. If the rupture of a particular non-native con- and folded states. Like persistence times34 in the contact were a crucial step in folding of a certain sequence, text of non-equilibdium systems, such as glasses, it is in- our methods would not detect its occurrence explicitly. trinsically a multi-time quantity; it can neither be com- Westress,however,thatsubstantialnon-nativestructure puted for a single configuration, nor can it be used to is present when the first permanent native contacts are build constrained ensembles whose statistics shed light formed. We could therefore indirectly detect the signifi- on the nature of reaction coordinates. 
But, also like per- cance of non-native contact dynamics through influences sistence times34, it focuses attention on key dynamical on the pattern of early topological changes. events with unmatched precision. For our purpose of di- Compiling the order of permanent contact formation agnosing the occurrence of lasting topological changes, over many folding trajectories of a given sequence, we CAOs serve almost ideally. For some other approaches, constructforeachnativecontactastatisticaldistribution e.g., surveying the free energy landscapes on which fold- ofCAO.Fig. 1b,cillustratehowthesetofresultingCAO ing takes place, CAOs would serve poorly. histograms form a visual fingerprint of a sequence’s fold- We have verified that the mechanistic meaning we as- ing mechanism. Because the dynamical events it chroni- cribetoCAOsisconsistentwithmoreconventionalchar- clesspanawiderangeofp , aCAOhistogramcharac- fold acterizations of reaction progress. Most importantly, the terizes not only the transition state for folding, but also 5 FIG. 2: (a) Distribution of CAO overlaps, P(q), between different sequences, and between full and Go¯-like potential, for 1000 sequenceschosenrandomlyoutof105 sequencesthatfoldtothe48-merstructureofFig. 1a. Thesequencesinthisdistribution were generated by a single high T evolutionary trajectory (see Appendix). The inset shows that the similarity between full ev and Go¯-like pathways for each sequence is independent of folding rate. Data for this inset was generated from 2000 sequences chosen randomly from 5 independent evolutionary runs (5×105 total sequences), all folding to the native 48-mer structure of √ Fig. 1a. (b)Distributionoftheroot-mean-squaredfluctuationsofcontactorder, δC,overthesetofG¯o-likesequences. CAOs in heterogeneous Go¯-like potentials vary less from one folding trajectory to another than in the homogeneous Go¯ model. 
It is the heterogeneity in native contact energies that selects specific folding pathways; this selectivity is absent in a homogeneous Go¯ potential. The inset shows the CAO map of the homogeneous Go¯ potential, cf. Fig. 1. (c) Average p as a function of fold number of permanent native contacts formed, for the full and Go¯-like potentials, for a fast and a slow folding sequence. In all casesp isclosetozerountilthefirstpermanentcontactsaremade,confirmingthatourCAOanalysiscapturestherelevant fold dynamical folding regime. p is the probability for a given conformation to reach the folded state before unfolding. For a fold givenfoldingtrajectory,wecalculatep accordingtothemethodofFaiscaetal.33,byrunningindependenttrajectoriesfrom fold configurations chosen at evenly-spaced time intervals. We regard a molecule as unfolded when the instantaneous number of native contacts drops to a value consistent with the average number of native contacts in the unfolded state. Additionally, we require that this threshold lie below any value found in equilibrium fluctuations of the native state. the dynamics of ascent to and descent from the transi- others have proposed that such variations are weak, i.e., tion state. The correspondence between an amino acid that topology of the folded structure prescribes a nearly sequence and its CAO histogram is as subtle as (if not unique topological route for folding. Using methods de- more so) the connection between sequence and native scribed in the Appendix, we have generated an unprece- conformation that defines some of the most challenging dentedly diverse set of sequences that fold to the same aspects of the protein folding problem. Most of the re- target structure within the full model. As shown in Fig. sults we will present concern a single native structure 1 variations in CAO statistics within this set are much (that shown in Fig. 1a for N = 48), removing a po- more substantial than previously thought. 
Any success tentiallytrivialagreementbetweenfullandG¯o-likemod- ofG¯o-likemodelsinreproducingfoldingpathwaysofthe els. Even for this unique structure, sequences of the full full model cannot be attributed simply to their sharing model differing by only a few point mutations can ex- a common native structure. hibit qualitatively different CAO histograms, reflecting We quantify similarity of CAO statistics (for two se- substantial changes in folding pathway. The distribution quences within the same model, or for full and G¯o-like of contact energies can thus play a critical and complex models with the same sequence) using an “overlap” pa- role in determining folding mechanism, over and above rameter q35. Inspired by the theory of spin glasses, we dictating its endpoint. Given this nontrivial relationship define q such that 0 ≤ q ≤ 1, with larger q representing itwouldbesurprisingifnon-nativecontactsdidnotgen- greater similarity. The analogy with spin glasses would erally act to shape or bias CAO statistics. assign an overlap q(α,β) between the CAO distributions for two sequences α and β proportional to TheprimarygoalofthispaperistocomparetheCAO statistics of sequences propagated using full and G¯o-like 1 n(cid:88)maxnm(cid:88)ax−1 models. In judging their similarities and differences, it is P(α)(C)P(β)(C), (3) essentialtoestablishforreferencehowsignificantlyCAO nmax n n n=1 C=1 histograms can vary, within either model, for sequences that fold to a common structure. As mentioned above, whereP(α)(C)istheprobabilitythatnativecontactnis n 6 made permanently at order C in a folding trajectory of structures, not between different sequences that adopt sequence α, and n is the total number of native con- them. In loose terms folding dynamics of the homoge- max tacts. An accurate numerical estimate of the quantity in neousG¯omodelresembleasuperpositionofthosewede- Eq. (3), however, is problematic to obtain, requiring the terminedfordiversesequencesofthefullmodel. 
Whereas generation of an inordinate number of folding trajecto- in the full model a typical set of contact energies selects ries. Asanalternative,wedefineq usingacloselyrelated a well-defined folding pathway, an egalitarian set of sta- quantity, bilizing energies permits broad sampling of routes to the native state. q(α,β) = 1 n(cid:88)max(cid:118)(cid:117)(cid:117)(cid:116)2(cid:32) σn(α)σn(β) (cid:33) enGer¯og-ielisk,ehmowodeveelsr,thcaaptteumrebrtahceetvoaproielotygicinalnpaattivhewcaoynstfaoclt- nmax n=1 (σn(α))2+(σn(β))2 lowed by their full model counterparts with striking ac-  (cid:16) (cid:17)2 curacy. CAO histograms obtained from full and G¯o-like (cid:104)C(cid:105)(α)−(cid:104)C(cid:105)(β) n n dynamics for any particular sequence can hardly be dis- ×exp− , (4)  (cid:16) (cid:17)2 (cid:16) (cid:17)2  tinguished, see Fig. 1. Not only are the average CAOs σ(α) + σ(β) n n of each contact nearly equivalent, but also fine details of CAO statistics are unaffected by neglect of non-native where (cid:104)C(cid:105)(nα) = (cid:80)nnm=a1xPn(α)(C)C is the average CAO contact energies. While previous work hypothesized a of contact #n for sequence α and (σ(α))2 = dynamical correspondence for fast folders, the topologi- n (cid:80)nmaxP(α)(C)(C −(cid:104)C(cid:105)(α))2 is its variance. Equations calconformityoffullandG¯o-likemechanismsweobserve n=1 n n for slow folders is highly unexpected. (3)and(4)arecompletelyequivalentinthecaseofGaus- sian distributed CAOs. Even for non-Gaussian statis- For sequences with folding rates <∼10−9, we are un- tics, q(α,β) remains a useful, computationally tractable, able to harvest folding trajectories in sufficient numbers and similarly bounded measure of how similarly two se- to construct CAO histograms. According to microscopic quences fold. reversibility, however, topological routes for folding are identical to time-reversed routes of unfolding. We have thereforeextendedouranalysisofcontactappearanceor- III. 
RESULTS AND DISCUSSION derforefficientlyfoldingsequencestooneofcontactdis- appearance order (CDO) for very sluggishly folding se- In the ensemble of sequences we generated, the fastest quences. The agreement between CDO histograms of folding sequences access the native state more than 1000 full and G¯o-like models is no less striking than that of times more rapidly than the slowest. CAO histograms the CAO histograms plotted in Fig. 1, even in cases weregeneratedforallsequences,eachoneevincingawell- where the “native” state is grossly unstable. These cal- defined topological pathway. Typically, the appearance culationsaresomewhatlessstraigthforward: theorderof order C of a given native contact varies from one trajec- first disappearance (CDO) is equivalent to the order of torytoanotherbyonlyafewpositions(seebelow). This permanent appearance (CAO), but only for trajectories regularity belies substantial conformational fluctuations reaching the unfolded state without revisiting the native attending each folding event, which exert little influence state. As such, they require specifying when a molecule on the formation of permanent contacts. Sharply peaked has unfolded. For this purpose, we regard a molecule as CAOhistogramsdonotindicatealackofcomplexity,but unfolded when the instantaneous number of native con- instead a successful characterization of forward progress tactsdropstoavalueconsistentwiththeaveragenumber along the reaction coordinate for folding. ofnativecontactsintheunfoldedstate. Additionally,we Figure 1 shows CAO histograms for several sequences require that this threshold lie below any value found in folding to this specific 48-mer structure (depicted in Fig. equilibrium fluctuations of the native state. We have 1a). Results are presented for dynamics propagated ac- verified that CAO and CDO histograms indeed match cording to both full and Go¯-like models. Comparing for sequences folding at moderate rates. 
these topological fingerprints across different sequences hints at the broad variety of possible folding pathways. Contacts essential to early stages of folding for one sequence can be irrelevant in the pathway taken by another. This finding contrasts strongly with the "one-structure one-nucleus" hypothesis, bolstering recent reports of dissimilar folding nuclei [29].

Strong variations in the topological folding pathways chosen from one sequence to another immediately indicate that the original homogeneous Gō model [27] cannot capture the folding behavior of a typical sequence. With a homogeneous set of native contact energies, that model can only discriminate between different native [...]

Quantitative measures of mechanistic diversity are presented in Fig. 2a. For each pair of sequences generated by our evolutionary simulation we computed the similarity parameter q between CAO histograms for the full model. The resulting distribution of q values is broadly peaked at q ≈ 0.4, signifying that there is a significant diversity of CAO pathways represented by the sequences in the ensemble. For each individual sequence we also quantified the relationship between CAO histograms generated by full and Gō-like models. These q values are distributed much more narrowly about a considerably higher average, q ≈ 0.9. Using sequence-to-sequence variation in CAO pathways as a yardstick, the irrelevance of non-native contacts for the topological folding pathway is beyond doubt. The inset to Fig. 2a emphasizes that this result has little to do with folding efficiency. Typical q values for the full/Gō-like comparison are just as high for the slowest folders examined as for the fastest.

Figure 2b quantifies the variation of CAO between folding trajectories. For each sequence we quantify the root mean-squared fluctuation in the contact order:

$$
\delta C = \frac{1}{n_{\max}} \sum_{n=1}^{n_{\max}} \sqrt{\langle C_n^2 \rangle - \langle C_n \rangle^2}. \tag{5}
$$

Fig. 2b shows the distribution of $\delta C$ among the ensemble of Gō-like sequences. It is peaked at a value of $\delta C \approx 7.5$. In contrast, for the homogeneous Gō model $\delta C \approx 12.5$, indicating that CAO values are much more broadly distributed between trajectories (see inset to Fig. 2b). The homogeneous Gō model indeed lacks the pathway specificity exhibited when contact energies are diverse, as in heterogeneous Gō-like models.

The relevance of CAOs for the folding dynamics is illustrated in Fig. 2c. For two sequences and their Gō-like approximations, it plots $p_{\mathrm{fold}}$ [32,36] as a function of the total number of permanent native contacts formed, averaged over 200 folding trajectories. $p_{\mathrm{fold}}$ gives the probability for trajectories initiated from a particular configuration to fold completely before visiting the unfolded state, and provides a standard basis for defining transition states in complex systems [32,36]. Fig. 2c shows that $p_{\mathrm{fold}} \ll 1$ when the first permanent contact is formed. Since $p_{\mathrm{fold}} = 1$ by definition when the last permanent contact is formed, CAO histograms chronicle nearly the entire course of folding dynamics, all the way from the unfolded basin of attraction ($p_{\mathrm{fold}} = 0$) to the native state ($p_{\mathrm{fold}} = 1$).

FIG. 3: (a) Number of native contacts as a function of time in a folding trajectory, illustrating the "prefolding" (blue) and "folding" (red) phases of the dynamics. The prefolding phase extends from the folding trajectory's start time until the time the first permanent native contact is formed. The folding phase extends from this time to the time when the native conformation is reached. The full (green) curve shows $p_{\mathrm{fold}}$, which only departs from zero after the folding phase has started (cf. Fig. 2). (b, right panels) Distribution of the duration of the prefolding and folding phases, in the full potential and its corresponding Gō-like approximation. For fast-folding sequences (top panel) the distributions for both folding and prefolding durations of the Gō-like model are close to those of the full potential. For slow-folding sequences (middle panel) the Gō-like model reproduces the distribution of folding duration, but underestimates the prefolding times. If the Gō-like potential of slow-folding sequences is supplemented by random non-native contact energies (bottom panel) the prefolding distributions can be made to match, without disrupting the correspondence in the folding phase distributions. (c) Ratio between full and Gō-like models' folding (red) and prefolding (blue) phase durations, for all sequences ordered according to folding rate; the full lines are the average ratios for each scatter plot. For fast folders, the average times as calculated from the full and Gō-like models are comparable, both for the folding and prefolding phases. For slow folders, the prefolding time in the Gō-like model is much smaller than that in the full potential, and this difference increases with decreasing folding rate.

FIG. 4: The CAO correspondence between the full potential and the Gō-like approximation is robust to changes in chain length or target native structure. (Left) CAO maps for a 12-mer folding to the structure shown in the figure. (Center) A sequence of the 48-mer of Fig. 1 which has a secondary stable configuration. Each target structure defines a Gō-like approximation from the set of its native contacts. Each Gō-like model predicts accurately the CAO map for folding to the corresponding structure. (Right) Correspondence of Gō-like/full CAO maps in a 64-mer.

Insensitivity of topological folding pathways to non-native contact energies by no means implies a complete dynamical equivalence of full and Gō-like models. For example, a sequence's mean first passage time for folding can differ by as much as three orders of magnitude between full and Gō-like models.
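As a rough illustration of Eq. (5), the sketch below computes $\delta C$ from a set of folding trajectories. It assumes, as an interpretation not spelled out in this section, that $C_n$ is the rank at which native contact $n$ becomes permanent in a given trajectory and that the angle brackets average over trajectories. The function name and input layout are hypothetical.

```python
import math


def contact_order_fluctuation(trajectories):
    """Estimate delta C of Eq. (5) from per-trajectory contact orders.

    trajectories[t][n] is assumed (hypothetically) to hold C_n, the rank at
    which native contact n became permanent in trajectory t.
    """
    n_traj = len(trajectories)
    n_max = len(trajectories[0])
    if any(len(traj) != n_max for traj in trajectories):
        raise ValueError("every trajectory must list the same native contacts")

    total = 0.0
    for n in range(n_max):
        values = [traj[n] for traj in trajectories]
        mean = sum(values) / n_traj
        mean_sq = sum(v * v for v in values) / n_traj
        # Clamp tiny negative variances produced by floating-point round-off.
        total += math.sqrt(max(mean_sq - mean * mean, 0.0))
    # Average the per-contact fluctuation over all n_max native contacts.
    return total / n_max


if __name__ == "__main__":
    # Two toy trajectories that form three contacts in opposite order.
    print(contact_order_fluctuation([[1, 2, 3], [3, 2, 1]]))
```

Under this reading, a sequence whose trajectories always form contacts in the same order gives $\delta C = 0$, and broader pathway diversity gives larger values.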
This discrepancy in mean first passage times is larger for sequences with slower folding rates. Such discrepancies may be due to the presence of off-pathway traps in the unfolded state, and possibly non-native stabilized intermediates along the folding pathway. However, our calculations suggest that such marked distinctions are largely limited to dynamics occurring before the value of the committor function $p_{\mathrm{fold}}$ increases significantly from zero, i.e., before significant progress has been made along the folding reaction coordinate.

As illustrated in Fig. 3a, we can divide each folding trajectory into a period before any permanent contacts are made (the "pre-folding phase") and the remaining period in which lasting native structure develops (the "folding phase"). Note that this division takes place well before a molecule commits to the folded state ($p_{\mathrm{fold}} > 1/2$); indeed, the number of non-native contacts at the beginning of the folding phase is typically comparable to that of the unfolded state. Fig. 3b shows the distributions of pre-folding and folding phases' durations for two sequences representative of fast and slow folders. In both cases the influence of non-native contacts on the folding phase dynamics is weak. Non-native contacts mildly extend the time required to complete folding after the first permanent contact is made, by less than an order of magnitude. By contrast, pre-folding dynamics of poorly designed sequences are quite sensitive to non-native contact energies. For the example shown in the middle panel of Fig. 3b, the waiting period prior to formation of a single permanent contact is roughly three orders of magnitude longer in the full model than in the Gō-like model. No such dilation is observed for sequences that fold quickly in the full model.

Because contact appearance order is a sensitive measure of approach to the dynamical bottleneck for folding, our division of pre-folding and folding phases is a kinetically meaningful one. Most importantly, $p_{\mathrm{fold}} \ll 1$ throughout pre-folding dynamics as seen in Fig. 3a, indicating that the system remains well within the unfolded basin of attraction. Only when permanent contacts are made does $p_{\mathrm{fold}}$ rise significantly, so that the folding phase encompasses the entire departure from the unfolded state and the transit to the native structure. It is remarkable that non-native contacts, which can substantially prolong dwell times in the unfolded state, exert no discernible influence on the topological folding order, and only a small effect on the duration of folding phase dynamics.

Our simulations suggest that progress toward the native state is essentially orthogonal to the formation and rupture of non-native contacts. A number of such contacts are certainly present over much of the course of folding, but they do little to decide what conformational rearrangements bring a chain closer to its transition state for folding. To further test this idea, we studied folding dynamics governed by potential energy functions that combine aspects of full and Gō-like models. Specifically, we selected a set of non-native contact energies at random from a Gaussian distribution (see Fig. 3b). The "frustrating" influence of these random energies matches precisely the behavior we have reported for the full model: CAO histograms are completely insensitive to the average strength and variance of non-native attractions, while overall folding rates decrease with increasing non-native attraction strength.

The observation of correspondence between dynamics of the full lattice model and that of a heterogeneous Gō-like approximation does not noticeably depend upon chain length or on details of native structure. We have generated sequences with a range of folding rates for several native conformations of chains with lengths 8, 12, 48, and 64. For the two shortest chains, we used each maximally compact lattice structure as a folded state. For the two longest chains, we studied several native structures varying significantly in compactness and in contact order [37]. Typical results shown in Fig. 4 highlight that the fidelity of Gō-like folding mechanisms is a very general feature of these lattice heteropolymers.

IV. CONCLUSIONS

Several arguments have been presented in the literature to justify the use of Gō models in studying the folding mechanisms of real proteins. Most commonly asserted (based on the principle of minimum frustration) is that evolutionary optimization of real sequences removes kinetic barriers and renders the energy landscape smoothly funneled and therefore Gō-like [11,15]. Biases due to topological features of the native state, unchanged in a protein's Gō-like representation, have also been invoked to justify mechanistic fidelity [38,39]. Our results demonstrate, however, that neither of these assumptions need hold for a Gō-like model to reproduce in fine detail the topological ordering of folding events of a lattice heteropolymer.

Robustness of the detailed mechanism for folding to omission of non-native contacts is not a consequence of sequence design within the schematic lattice models we have studied. It is a fundamental emergent feature of their statistical dynamics, independent of folding efficiency over the entire range accessible to our numerical simulations. Rather than introducing kinetic roadblocks that reshape transition states for folding, energetic diversions due to non-native contacts appear to strongly affect only physical properties of the unfolded state. Even the duration of trajectory segments that span folded and unfolded states is essentially determined by native energies alone, despite the fact that substantial non-native structure must be disrupted en route.

Lattice heteropolymers are perhaps the crudest representation of protein mechanics to which our analysis could be meaningfully applied. The correspondence between full and Gō-like folding mechanisms we have revealed might break down in more detailed models. For example, it has been reported that lattice heteropolymers do not exhibit glassy folding dynamics even at very low temperatures, while non-Arrhenius temperature dependence naturally arises in slightly elaborated models that describe side chain packing in addition to backbone conformation [24]. Gō-like energetics could alter folding pathways by abating the frustration underlying such glassy relaxation. This possibility, which merits further investigation, does not however negate the significance of our findings. Our primary purpose is not to justify the use of Gō-like models for detailed study of real proteins' folding mechanisms. It is instead to establish the influence of non-native interactions on dynamics intrinsic to the fundamental interplay between chain connectivity and heterogeneous contact interactions. That interplay, whose understanding is central to any instructive physical picture of protein folding, is not just present in simple lattice models; it is the exclusive source of their complexity. The results we have presented therefore establish an important point: Mechanistic aspects of protein folding that arise from the basic physics of heteropolymer freezing are remarkably insensitive to non-native structure.

Acknowledgments

We wish to thank D. Chandler, J. Chodera, K. DuBay, R. Jack, and S. Whitelam for useful discussions, and W. Eaton, E. Shakhnovich, and A. Szabo for critical readings of the manuscript.

This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This work was supported by the Director, Office of Science, Office of Basic Energy Sciences, Chemical Sciences and Physical Biosciences Divisions, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. In carrying out this work JPG was supported by EPSRC grant GR/S54074/01.

V. APPENDIX

Our method of sequence generation, which effects a biased random walk in the space of all possible sequences, is an extension of the method of Mirny et al. [25]. To generate ensembles of sequences folding to a specific native structure, we introduce random point mutations and accept them with a Metropolis probability

$$
P_{\mathrm{acc}} = \min\left[1, \exp\left(-\frac{\Delta F^{\ddagger}(\beta) - \Delta F^{\ddagger}(\alpha)}{T_{\mathrm{ev}}}\right)\right] \tag{6}
$$

that generates a Boltzmann-like distribution. Here, $\Delta F^{\ddagger}(\alpha)$ is an estimated activation free energy for folding of sequence $\alpha$, defined by $k(\alpha) = k_0 \exp(-\Delta F^{\ddagger}(\alpha)/k_B T)$. We estimate the folding rate constant $k(\alpha)$ for sequence $\alpha$, relative to the rate of basic microscopic motions $k_0$, by computing the fraction of trajectories $\langle h_{\mathrm{fold}} \rangle_{\tau} \approx 1 - \exp(-k(\alpha)\tau)$ that fold within a fixed amount of time $\tau$ (with $k(\alpha) \ll \tau^{-1} \ll k_0$). This strategy offers two distinct advantages: (1) the evolutionary temperature $T_{\mathrm{ev}}$, which governs the stringency of selection for efficient folding, can be controlled systematically; and (2) estimates of folding efficiency via $\langle h_{\mathrm{fold}} \rangle_{\tau}$ can converge much more rapidly than the mean first passage time calculations employed in Mirny et al. [25].

Our evolutionary simulations, conducted at moderate "temperature" $T_{\mathrm{ev}} = 0.05/k_B$, demonstrate that in fact many folding pathways can provide efficient access to a single native state. It is therefore not at all self-evident that a particular, well-designed amino acid sequence should arrive at its native structure via similar routes in full and Gō-like versions of the lattice heteropolymer model.

Using this method, we have generated hundreds of thousands of sequences which fold to given structures (for example that of Fig. 1a) through a variety of folding mechanisms. This is the ensemble of sequences we use in this paper. Further details of the evolutionary dynamics used to generate these large ensembles of sequences will be given in a forthcoming publication [40].

[1] Shakhnovich, E., 2006. Protein folding thermodynamics and dynamics: where physics, chemistry, and biology meet. Chem Rev 106:1559–1588.
[2] Go, N., 1983. Theoretical studies of protein folding. Annu Rev Biophys Bioeng 12:183–210.
[3] Shimada, J., A. V. Ishchenko, and E. I. Shakhnovich, 2000. Analysis of knowledge-based protein-ligand potentials using a self-consistent method. Protein Sci 9:765–775.
[4] Shimada, J., E. L. Kussell, and E. I. Shakhnovich, 2001. The folding thermodynamics and kinetics of crambin using an all-atom Monte Carlo simulation. J Mol Biol 308:79–95.
[5] Takada, S., 1999. Go-ing for the prediction of protein folding mechanisms. Proc Natl Acad Sci U S A 96:11698–11700.
[6] Shoemaker, B. A., and P. G. Wolynes, 1999. Exploring structures in protein folding funnels with free energy functionals: the denatured ensemble. J Mol Biol 287:657–674.
[7] Shakhnovich, E. I., and A. M. Gutin, 1990. Implications of thermodynamics of protein folding for evolution of primary sequences. Nature 346:773–775.
[8] Sali, A., E. Shakhnovich, and M. Karplus, 1994. Kinetics of protein folding. A lattice model study of the requirements for folding to the native state. J Mol Biol 235:1614–1636.
[9] Bryngelson, J. D., and P. G. Wolynes, 1987. Spin glasses and the statistical mechanics of protein folding. Proc Natl Acad Sci U S A 84:7524–7528.
[10] Onuchic, J. N., P. G. Wolynes, Z. Luthey-Schulten, and N. D. Socci, 1995. Toward an outline of the topography of a realistic protein-folding funnel. Proc Natl Acad Sci U S A 92:3626–3630.
[11] Onuchic, J. N., N. D. Socci, Z. Luthey-Schulten, and P. G. Wolynes, 1996. Protein folding funnels: the nature of the transition state ensemble. Fold Des 1:441–450.
[12] Onuchic, J. N., Z. Luthey-Schulten, and P. G. Wolynes, 1997. Theory of protein folding: the energy landscape perspective. Annu Rev Phys Chem 48:545–600.
[13] Socci, N. D., J. N. Onuchic, and P. G. Wolynes, 1998. Protein folding mechanisms and the multidimensional folding funnel. Proteins 32:136–158.
[14] Pande, V. S., A. Grosberg, T. Tanaka, and D. S. Rokhsar, 1998. Pathways for protein folding: is a new view needed? Curr Opin Struct Biol 8:68–79.
[15] Onuchic, J. N., and P. G. Wolynes, 2004. Theory of protein folding. Curr Opin Struct Biol 14:70–75.
[16] Karanicolas, J., and C. L. Brooks, 2003. Improved Go-like models demonstrate the robustness of protein folding mechanisms towards non-native interactions. J Mol Biol 334:309–325.
[17] Levy, Y., and J. N. Onuchic, 2006. Mechanisms of protein assembly: lessons from minimalist models. Acc Chem Res 39:135–142.
[18] Simler, B. R., Y. Levy, J. N. Onuchic, and C. R. Matthews, 2006. The folding energy landscape of the dimerization domain of Escherichia coli Trp repressor: a joint experimental and theoretical investigation. J Mol Biol 363:262–278.
[19] Clementi, C., and S. S. Plotkin, 2004. The effects of non-native interactions on protein folding rates: theory and simulation. Protein Sci 13:1750–1766.
[20] Sali, A., E. Shakhnovich, and M. Karplus, 1994. How does a protein fold? Nature 369:248–251.
[21] Paci, E., M. Vendruscolo, and M. Karplus, 2002. Validity of Gō models: comparison with a solvent-shielded empirical energy decomposition. Biophys J 83:3032–3038.
[22] Paci, E., M. Vendruscolo, and M. Karplus, 2002. Native and non-native interactions along protein folding and unfolding pathways. Proteins 47:379–392.
[23] Shakhnovich, E. I., and A. M. Gutin, 1993. Engineering of stable and fast-folding sequences of model proteins. Proc Natl Acad Sci U S A 90:7195–7199.
[24] Gutin, A., A. Sali, V. Abkevich, M. Karplus, and E. I. Shakhnovich, 1998. Temperature dependence of the folding rate in a simple protein model: Search for a "glass" transition. Journal of Chemical Physics 108:6466–6483.
[25] Mirny, L. A., V. I. Abkevich, and E. I. Shakhnovich, 1998. How evolution makes proteins fold quickly. Proc Natl Acad Sci U S A 95:4976–4981.
[26] Schaeffer, R. D., A. Fersht, and V. Daggett, 2008. Combining experiment and simulation in protein folding: closing the gap for small model systems. Curr Opin Struct Biol 18:4–9.
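The selection step described in the Appendix can be sketched in a few lines. This is only an illustration under stated assumptions: $\alpha$ is taken to be the current sequence and $\beta$ the mutated one, the activation free energy and $T_{\mathrm{ev}}$ are expressed in the same energy units, and all function and parameter names are hypothetical rather than taken from the authors' code.

```python
import math
import random


def activation_free_energy(folded_fraction, tau, k0=1.0, kT=1.0):
    """Invert <h_fold>_tau ~ 1 - exp(-k tau) and k = k0 exp(-dF / kT) for dF.

    folded_fraction is the fraction of trajectories that fold within time tau;
    k0 and kT are assumed inputs (microscopic rate and thermal energy).
    """
    if not 0.0 < folded_fraction < 1.0:
        raise ValueError("folded_fraction must lie strictly between 0 and 1")
    rate = -math.log(1.0 - folded_fraction) / tau
    return -kT * math.log(rate / k0)


def acceptance_probability(dF_current, dF_mutant, t_ev):
    """Metropolis probability in the form of Eq. (6), with T_ev in units of dF."""
    delta = dF_mutant - dF_current
    if delta <= 0.0:
        return 1.0  # mutations that lower the barrier are always accepted
    return math.exp(-delta / t_ev)


def accept_mutation(dF_current, dF_mutant, t_ev, rng=random):
    """Draw a uniform random number and apply the Metropolis criterion."""
    return rng.random() < acceptance_probability(dF_current, dF_mutant, t_ev)


if __name__ == "__main__":
    dF_a = activation_free_energy(0.10, tau=10.0)
    dF_b = activation_free_energy(0.12, tau=10.0)
    print(acceptance_probability(dF_a, dF_b, t_ev=0.05))
```

Lowering `t_ev` makes the walk more selective for fast folders, which is the tunable stringency the Appendix refers to.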
