ebook img

Inferring the Rate-Length Law of Protein Folding PDF

0.56 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Inferring the Rate-Length Law of Protein Folding

Inferring the Rate-Length Law of Protein Folding Thomas J. Lane and Vijay S. Pande ∗ Department of Chemistry, Stanford University (Dated: January 21, 2013) We investigate the rate-length scaling law of protein folding, a key undetermined scaling law in the analytical theory of protein folding. We demonstrate that chain length is a dominant factor determining folding times, and that the unambiguous determination of the way chain length corre- lateswithfoldingtimescouldprovidekeymechanisticinsightintothefoldingprocess. Fourspecific proposed laws (power law, exponential, and two stretched exponentials) are tested against one an- other, and it is found that the power law best explains the data. At the same time, the fit power lawresultsinratesthatareveryfast,nearlyunreasonablysoinabiologicalcontext. Weshowthat any of the proposed forms are viable, conclude that more data is necessary to unequivocally infer 3 therate-lengthlaw,andthatsuchdatacouldbeobtainedthroughasmallnumberofproteinfolding 1 experiments on large protein domains. 0 2 n INTRODUCTION der of magnitude, and that most studies of folding have a focused on small, well-behaved model systems. J 8 A deep understanding of protein folding must involve Further complicating the inference of the rate-length 1 a description of the general mechanisms involved. It scaling law is the fact that chain length is certainly not is reasonable to suspect this will consist of a simple theonlyfactoraffectingfoldingrates. Infact,ithasbeen ] model, basedonmicroscopicphysics, expressedinasim- arguedthatitisafairlyweakpredictorofthefoldingtime h p ple mathematical language. Such a model would show [5, 6]. For instance, there seems to be some correlation - how biological sequences are able to employ physics to offoldingtimeswithtopologicalcomplexityofthenative o spontaneously self-assemble into intricate molecular ma- state, such that if two proteins have the same number i b chines. of residues, but different folds, they may take different . amounts of time to fold [5, 7]. Moreover, even protein s Simple models of this sort often start by postulating a c mutantswiththesamenativestructurecanhaveatleast mechanism of folding, and then derive the consequences i s ofthatmechanism[1,2,9–18]. Thissuggestsitmightbe 3 orders of magnitude variation in their folding rates [8]. y Thus,weexpectthatexperimentaldataonthescalingof possible to infer the general mechanisms of protein fold- h folding time with chain length should be very noisy, and p ing by verifying the specific predictions of these models. difficult to statistically estimate. [ For a simple model of protein folding, however, there is Nonetheless, there does seem to be a significant corre- a limited set of general experimental trends that can be 1 lation between the number of residues (N) and folding readilypredicted. Onesuchexperimentaltrendisthelaw v times (τ). Many different forms of the scaling law have 2 governinghowfoldingtimesscalewithchainlength. This 0 is perhaps the simplest comparison of theory and exper- beenpredicted,butallfallintooneofthreebasicclasses. 3 iment possible, but has not yet been unambiguously in- Shaknovich [2], Cieplak [9, 10], and co-workers have pro- 4 posed a power-law, τ Nν. We recently constructed a ferreddespitenearlytwodecadesofactiveresearch[1,2]. 1. The rate-length law is also an interesting result in and model that suggested ∼exponential scaling, τ ∼ eαN [11], 0 consistentwithpredictionsmadebyZwanzig[12,13]. Fi- of itself. Such a law can be viewed as a statement of 3 nally,Thirumalai[1,14],Mun˜oz[15],Takada[16],Finkel- the computational complexity of protein folding – given 1 stein [17], and co-workers have suggested a stretched ex- : a problem of size N (residues), how does one expect the v time-to-solution(folding)toscale? Levinthalpointedout ponentials,τ eαNβ,withβ as1/2or2/3. Wolyneshas i ∼ X that an exhaustive search would result in exponential proposed the law may conditionally change between all r scaling,andsuggestedthatthiswouldresultinunreason- four suggested models [18]. a ablylargefoldingtimes[3]. Thus,inmanyways,aresolu- In what follows, we develop and apply two methods tiontoLevinthal’sparadoxislikelytobephraseddirectly forchoosingbetweenthesemodelsandevaluatehoweach asaratescalinglaw,eithernon-exponential(polynomial) proposed model performs. or exponential with an explicitly small exponential fac- tor. A number of issues complicate inferring such a law MODELING from experiment, most importantly the fact that avail- able kinetic data on protein folding spans a very limited Below, we outline two complementary methods for in- rangeofchainlengths-about30to300residues[4]. The ferringwhichproposedscalinglawisthemostreasonable. statistical power of the data is inherently limited by the First,wepresentamethodcapableoffittingeachmodel’s fact that protein domain sizes barely span a single or- parameters to the known data, and examining how well 2 eachmodelexplainsthedata. Next,weinvestigateasec- We have fit these parameters by maximizing the like- ond discriminatory method, which proposes that folding lihood . Model comparison can then be performed by L times must be below a certain threshold value to be bio- investigating the ratio of the likelihoods of two alterna- logically viable. It has been demonstrated that in the tive models (Table I). We have adopted the simple like- crowded milieu of the cell, proteins must fold rapidly lihood approach (versus a full-fledged Bayesian analy- to avoid aggregation or degradation. We suggest that sis) because the number of fit parameters are small and this implies that we can check the reasonableness of any equal for each model, the models are simple and low- model by seeing if its prediction for this threshold time dimensional, and we have little prior information about isreasonablewithwhathasbeenobservedempiricallyin theparameters. Seethesupplementalinformation,Fig.4 biology. and Table II for a Bayesian analysis and comparison. In this study we focus on single-domain globular pro- teins. Kinetic data for the folding times of proteins were takenfromtheKineticDB[4],whichreportsproteinfold- Indirect Method: Biological Limits ing times at zero denaturant, near room temperature, and under neutral pH. Other data sets exist [19–21], but We postulate that there exists a critical time, τ , that c were not consistent with one another - despite this, they placesabiologicalupperboundonfoldingtimes. Specifi- yielded very similar results (see SI, Fig. 2 and Table I). cally,ifaproteinfoldsslowerthanthistime(i.e. τ >τ ) c then that protein will be much more likely to aggregate during the course of folding, and therefore is evolution- Direct Method: Likelihood Maximization arily selected against. The majority of biologically observed proteins should Wewanttoestimatetheparametersforeachproposed have folding times less than τ , but we postulate that c form of the scaling law. In what follows, we adopt a someproteinswillhavegreater times. Theseproteinsare model that accounts not only for this scaling law, but those that receive help folding from chaperones or other allotherfactors(topology,experimentalconditions,etc.) cellular machinery. It has been estimated that about viaarandomGaussiancomponent. Thus,byfittingeach C 10% of proteins fall into this category [22]. model, we not only learn parameters for each proposed T≈ogether, these assumptions allow us to build a model model, but also get an estimate for the relative impor- for the predicted distribution of protein chain lengths. tanceoftheseotherfactorsindeterminingfoldingtimes. The size distribution of domains (SI Fig. 1) can be We assert the following model for the folding time, roughly approximated by a Gaussian with parameters µ and σ . In that case, logτ/τ =f(N)+X (1) N N 0 where ∞ 1 exp (N −µN)2 =C Zf−1(τc/τ0) 2πσN2 (cid:20) 2σN2 (cid:21) νlogN power-law f(N)=αN exponential wheref−1(τc/τ0)=pNc isthechainlengthcorresponding αN1/2 stretched exp. (1/2) taondτcCforisathspeepciefirccemntoadgeel o(pfopwreortelianws,wexitphonfoelndtiinagl,teitmc.e)s, αN2/3 stretched exp. (2/3) slower than τ . are the proposedfolding rate laws, X represents a ran- Thisframewcorkis,ofcourse,anapproximation. There are undoubtedly many other factors affecting the opti- dom variable distributed as a zero-mean Gaussian, X (0,σ2), and τ is a fit constant accounting for units o∼f mal sizes of proteins beyond merely their folding times. 0 N Metabolic efficiency, structural packing constraints [23, time. By adding X to the logarithm of the folding time 24],andthebehaviorofspecificproteinsintheirlocalcel- (1), we model random variation in relative terms, and it lularenvironmentscertainlyplayarole. Nonetheless,the enters as a multiplicative factor. concept of an upper limit to the folding times is reason- Equation (1) implies τ is distributed as a log-normal, able, and our aim here is to simply extract some general with location parameter f(N) and scale parameter σ. comments about the reasonableness of predicted folding The likelihood of the entire data set (assuming n inde- times, rather than make quantitatively accurate predic- pendent measurements) is tions. n 1 (logτ /τ f(N ))2 i 0 i = exp − (2) L τ √2πσ2 − 2σ2 iY=1 i (cid:20) (cid:21) RESULTS We have three parameters for each model, σ, τ , and α 0 or ν for the exponential and power-law families, respec- Direct fitting of all proposed models to the available tively. data yields reasonable results for each (Fig. 1). Each 3 Power-Law Exponential Str. Exp. (Beta 1/2) Str. Exp. (Beta 2/3) s) 102 ν=5.4 s) 102 s) 102 s) 102 e ( 100 σ=3.06 e ( 100 e ( 100 e ( 100 m m m m Ti10-2 Ti10-2 Ti10-2 Ti10-2 ng 10-4 ng 10-4 ng 10-4 ng 10-4 Foldi10-6 Foldi10-6 ασ==30..30346 Foldi10-6 ασ==31..1141 Foldi10-6 ασ==30..1397 10-8 10-8 10-8 10-8 101 102 100 200 300 100 200 300 100 200 300 Residues Residues Residues Residues y103 y103 y103 y103 c c c c n n n n ue102 ue102 ue102 ue102 q q q q e e e e Fr101 Fr101 Fr101 Fr101 100 100 100 100 10-3100103106109 10-3100103106109 10-3100103106109 10-3100103106109 Folding Time (s) Folding Time (s) Folding Time (s) Folding Time (s) FIG. 1. The predicted models for the folding rate law, overlaid with measurements of folding times (top), and the putative foldingtimedistributionsthesemodelsimply(bottom). Parametervaluesderivedfromamaximumlikelihoodfitaredisplayed, along with intervals indicating the spread in the fit probability distribution. Dark grey shading indicates a factor of eσ, while light grey indicates e2σ. model is better supported by the data, but only by a TABLE I. Likelihood Ratios of -Maximized Models L modest margin. Further, an attempt to fit the stretched Model a Pr. Law Exp. S. E. 1/2 S. E. 2/3 exponentialformwithβ asavariableparameterresulted Power Law 1.59 103 7.98 100 3.82 101 in an unreasonably small value of β along with a very · · · Exponential 6.30 10−4 5.03 10−3 2.41 10−2 largevalueofα,resultinginafitthatisveryclosetothe · · · S. E. 1/2 1.25 10−1 1.99 102 4.79 100 power law (SI Fig. 3). Finally, the power law model has · · · S. E. 2/3 2.61 10−2 4.15 101 2.09 10−1 the smallest fit σ, indicating that it explains the most · · · variation in the data, and attributes less to orthogonal a Primarymodelisontheleft,alternatemodelalongthetop- factors. thus,alargernumberfavorsthemodelintheleftmostcolumn. It is clear, however, that there is little difference be- tween the models in the range of available data. These model reports a scale parameter (σ) of approximately models diverge significantly only for very large proteins 3, which indicates that 68% of proteins will have fold- (Fig.2). Theyyieldsignificantlydifferentpredictionsfor ing times within a factor of eσ 20 from the time pre- thedistributionoffoldingtimes,generatedbytransform- dictedbytheratelaw,and95%≈willbewithinafactorof ing the known distribution of domain sizes into each of e2σ 400. Sincetheavailabledataspansfoldingtimesof the different models (Fig. 1). The most significant dif- mor≈ethan9ordersofmagnitude(between1.9 10 7 and ferences are in the tails of these distributions, where the − 9.9 102 seconds), this demonstrates that cha·in length exponential forms predict much longer folding times for capt·ures the majority of variation in folding times, since thelargestproteins(TableII).Thepowerlawmodelpre- orthogonal factors (topology, mutations, etc.) account dictsnoproteinsfoldintimeslongerthananhour,while for approximately 4002 105 orders of magnitude of the exponential forms show a significant number of pro- variation. ∼ teins with folding times longer than a day (Fig. 2). Do the data support any one model? The power-law An evaluation of the reasonableness of these folding model is slightly favored by comparing the likelihoods time distributions is provided by the critical time τ for c that each model generated the observed data (Table I). eachmodel(SITableIII).ForreasonablevaluesofC,the In such comparisons, typically a ratio of 102 or greater power law τ is on the order of minutes. The exponen- c is considered significant, and often models differ by hun- tial forms, on the other hand, predict τ is on the order c dreds of orders of magnitude [25] – thus, the power law of hours. Indeed, picking a reasonable value of τ and c 4 1011 34 3 0 109 1× 1 year y 36 me (s) 110057 1 day babilit 38 ding Ti 110013 11 mhoiunrute del Pro 40 Fol 1100--31 PESoxtrwp. oeEnrx epLn.a tw1ia/2l og Mo10 42 PESSox.. wpEEo..e n12r e//L23natwial 10-5 Str. Exp. 2/3 L 41400 101 102 103 104 100 200 300 400 500 600 700 τ (seconds) Domain Size (residues) c FIG. 2. The predicted folding times from each model in Fig- FIG. 3. The of probability of observing the observed domain ure1,inadirectcomparison. Intuitivetimescalesaredenoted sizes (SI Fig. 1) given a rate scaling law and a folding com- for clarity. plexity cutoff time, τc. Assumes the Gaussian model (with C =0.10)describedinthemaintext. Notethey-axisislog 10 anddividedby103 forclarity,sosmalldifferencesontheplot TABLEII.EstimatedFractionofProteinDomainswithFold- are actually quite large. ing Times Greater than Time Indicated Hour Day Month Year the case of the exponential form, since exponential scal- Power Law 0.41% 0.01% 0.00% 0.00% ing of the folding times has often been associated with Exponential 9.56% 5.70% 3.34% 2.46% Levinthal’s paradox. This study shows that exponential S. E. 1/2 5.48% 2.37% 0.95% 0.57% scaling is reasonable given current experimental data, so S. E. 2/3 7.49% 3.53% 1.74% 1.11% longastheexponentialscalingconstant(α)issufficiently small. Clear evidence for any one rate law remains missing, calculating the probability of observing the empirically however with a few clear examples of very large globular observeddomainsizedistribution(Fig.3)showsthatfor proteins (500 residues or larger) capable of folding unas- values of τc 10 seconds, the power law model is clearly sisted in vitro, it might be possible to discriminate be- ∼ the best. However, for any values of τc greater than 100 tween the models proposed here. Figure 2 clearly shows seconds, the exponential laws are much better models. thedivergencesbetweenpredictedfoldingtimesforlarge proteins, and shows how just a few data points in this extremeregimemightbeabletobegindifferentiatingbe- CONCLUSIONS tween the proposed models investigated here. Thus, we have a mixed conclusion. While the power lawmodelappearstobestexplaintheavailablerawdata, it results in very fast predicted folding times. The ex- ponential forms, while doing a marginally poorer job of ∗ [email protected] explaining the raw data, yield a distribution of folding [1] D. Thirumalai, J Phys I 5, 1457 (1995). [2] A. M. Gutin, V. I. Abkevich, and E. I. Shakhnovich, times much more in line with what we expect from bi- Phys. Rev. Lett. 77, 5433 (1996). ology. Given the current available data, no clear victor [3] C.Levinthal,MossbauerSpectroscopyinBiologicalSys- emerges. tems Proceedings of a meeting held at Allerton House , Previous theories have claimed that simply predicting 22 (1969). one of the four laws investigated here is strong evidence [4] N. S. Bogatyreva, A. A. Osypov, and D. N. Ivankov, insupportofthattheory. Thisismanifestlynotthecase Nucleic Acids Research 37, D342 (2009). – not only must the proposed law be reasonable, but it [5] K. W. Plaxco, K. T. Simons, and D. Baker, Journal of Molecular Biology 277, 985 (1998). must also predict reasonable parameter estimates, and [6] H. Nakamura and M. Takano, Phys. Rev. E 71 (2005). even then the supporting evidence the rate scaling law [7] D. N. Ivankov, S. O. Garbuzynskiy, E. Alm, K. W. can provide given current data is limited. Conversely, Plaxco, D. Baker, and A. V. Finkelstein, Protein Sci- the analytical theories mentioned here are not ruled out ence 12, 2057 (2003). by the current available data. This is most striking in [8] C. Lawrence, J. Kuge, K. Ahmad, and K. W. Plaxco, 5 Journal of Molecular Biology 403, 446 (2010). and O. V. Galzitskaya, Curr. Protein Pept. Sci. 8, 521 [9] M.CieplakandT.X.Hoang,Biophys.J.84,475(2003). (2007). [10] M. Cieplak, T. Hoang, and M. Li, Phys. Rev. Lett. 83, [18] P. Wolynes, Proc. Natl. Acad. Sci. 94, 6170 (1997). 1684 (1999). [19] Z.OuyangandJ.Liang,ProteinScience17,1256(2008). [11] T. J. Lane and V. S. Pande, J. Phys. Chem. B , [20] D.DeSanchoandV.Mun˜oz,Phys.Chem.Chem.Phys. 120411093425008 (2012). 13, 17030 (2011). [12] R.Zwanzig,A.Szabo, andB.Bagchi,Proc.Natl.Acad. [21] D. N. Ivankov and A. V. Finkelstein, J. Phys. Chem. B Sci. 89, 20 (1992). 114, 7930 (2010). [13] R. Zwanzig, Proc. Natl. Acad. Sci. 92, 9801 (1995). [22] F.U.HartlandM.Hayer-Hartl,NatStructMolBiol16, [14] M. S. Li, D. K. Klimov, and D. Thirumalai, J. Phys. 574 (2009). Chem. B 106, 8302 (2002). [23] D. Xu and R. Nussinov, Fold Des 3, 11 (1998). [15] A. N. Naganathan and V. Mun˜oz, J. Am. Chem. Soc. [24] M.-y. Shen, F. P. Davis, and A. Sali, Chemical Physics 127, 480 (2005). Letters 405, 224 (2005). [16] N. Koga and S. Takada, Journal of Molecular Biology [25] R.E.KassandA.E.Rafterty,JAmStatAssoc90,773 313, 171 (2001). (1995). [17] A. V. Finkelstein, D. N. Ivankov, S. O. Garbuzynskiy, Supplemental Information: Inferring the Rate-Length Law of Protein Folding Thomas J. Lane and Vijay S. Pande ∗ Department of Chemistry, Stanford University (Dated: January 21, 2013) 700 102 600 101 ) 100 500 s cy e ( 10-1 13 uen 400 Tim 10-2 0 eq 300 ng 10-3 2 Fr di 10-4 200 ol n F 10-5 KineticDB a 100 Liang J 10-6 Munoz Finkelstein 8 0 10-7 0 200 400 600 800 1000 0 50 100 150 200 250 300 1 Domain Size (Residues) Chain Length (residues) ] h p FIG. 1. The size distribution of non-homologous protein FIG. 2. The rate-length data from each source used, plotted - domainslistedinthePDB.Ofthe12,151sequencesreported, together. Even though there is some overlap in reported se- o i 39 are larger than 1000 residues (0.32%) - while not shown quences, many identical proteins have different chain lengths b here for clarity, they were included in subsequent analy- or folding times reported. We combined these data for our s. ses. Data were obtained from the NIH’s VAST algorithm analysis, eliminating only identical measurements. c (http://www.ncbi.nlm.nih.gov/Structure/VAST/nrpdb.html) si on 5/5/2012 with a dissimilarity p-value of 10−7. y TABLE I. Parameters Estimated from Different Datasets h p Pr. Law Expon. S. E. 1/2 S. E. 2/3 DIFFERENT DATASETS OF PROTEIN [ Dataset ν σ α σ α σ α σ FOLDING KINETIC DATA KineticDB 5.44 3.06 0.046 3.32 1.11 3.13 0.37 3.19 1 v KDB (mut) a 4.46 2.42 0.040 2.53 0.91 2.45 0.31 2.47 We are aware of four collections of protein folding ki- 2 Liang 4.75 2.54 0.041 2.90 0.94 2.70 0.31 2.77 0 netic data, here termed “KineticDB” [1], “Liang” [2], Munoz 4.79 3.05 0.054 3.14 1.04 3.09 0.37 3.11 3 “Mun˜oz” [3], “Finkelstein” [4]; note Finkelstein is re- 4 portedly a subset of KineticDB. Each is purported to Finkelstein 5.24 3.08 0.048 3.32 1.08 3.17 0.36 3.22 . 1 be extracted directly from the primary literature. Inter- a TheKineticDBdatasetwithmutantsincluded. Sincethereare 0 estingly, whilethereismuchoverlapofreportedproteins onlyafewproteinswithmutants,andtherearemanymutants 3 between each dataset, there are systematic inconsisten- forthesefewproteins,thisdatabasegivesartificiallymore 1 weighttothoseindividualproteins. cies between them. Most report that they extrapolate : v the folding times to zero denaturant - we suspect this is i X the origin of the discrepancy, but have not undertook a chain lengths and folding times in water. detailed investigation. Instead, we chose the KineticDB r a datasetbecauseitcontainedthemostproteins,appeared wellcurated,andrestrictedentriestoproteinsatzerode- STRETCHED EXPONENTIAL WITH β AS A naturant, near room temperature, and at neutral pH. FREE PARAMETER Interestingly, despite their discrepancies, the datasets appear to be similar on a coarse level (Fig. 2). Further, In the main text we investigated the specific stretched our fitting analysis run on each independently results in exponentialformsproposedbyextantsimplemodels(i.e. parameters that are not too far from one another (Table withβ as1/2and2/3). Thereisnoreasonwhyβ cannot I),andtherestofourresults(whichprimarilyfollowfrom be simply fit as a free parameter, however, and we have these parameters) are very similar between datasets. performed this fit (Fig. 3). Remarkably, the most likely Forthisstudy,wesimplyextractedallnon-mutanten- parameters are those with α 102 and β 10 2, which − ≈ ≈ tries from the KineticDB and employed their reported is far outside the range predicted by theory. With these 2 me (s) 1111100000-01231 αβσ===331...403765ee−+0022 g Prob. (Normalized)10 110460822 Power Law g Prob. (Normalized)10 108642000000 Exponential Ti 10-2 Lo 140 2 4 6 8 10 Lo 1200.0 0.2 0.4 0.6 0.8 1.0 1.2 g 10-3 ν α n Stretched Exponential 1/2 Stretched Exponential 2/3 Foldi 1111100000-----87654 g Prob. (Normalized)10 765432100000000 g Prob. (Normalized)10 108642000000 101 102 Lo 800 2 4 6 8 10 Lo 1200 1 2 3 4 5 6 7 8 α α Residues FIG.4. Theparameterposteriorsforeachmodelaresharply FIG.3. Theparameterfitforastretchedexponential,logτ peaked around their modes - plotted here is the maximum αNβ,withβ afreeparameter. Noticehowthefitisperfectl∝y (not the marginal value) of the posterior at various values of straightonalog-logplot,acharacteristictraitofpowerlaws. the key parameters α or ν. One can see that the likelihood has a sharply peaked value along this dimension. extremeparameters,themodelappearstobeapowerlaw TABLE II. Bayes Factors Comparing Datasets form over the range of fit data, and is linear on a log-log plottoabout1025 residues. Thissimplyverifiesthatthe Pr. Law Exp. S. E. 1/2 S. E. 2/3 dataarebestexplainedbyapowerlaw,anddemonstrates Power Law 1.32 105 3.55 101 4.74 102 that the power law fit with free β is sufficiently flexible Exponential 7.55 10−6 · 2.68 ·10−4 4.74·102 to capture this fact. · · · S. E. 1/2 2.81 10−2 3.73 103 1.33 101 · · · S. E. 2/3 2.11 10−3 2.80 102 7.50 10−2 · · · BAYESIAN MODEL COMPARISON Asmentionedinthemaintext,itisentirelypossibleto the main manuscript. This choice is justified because we pursue a full-fledged Bayesian analysis to determine the have found the posterior to be strongly peaked around parameters for each model. Here, we show the common- the mode, as seen in Figure 4. alities and differences between the maximum likelihood approach pursued in the main text and a Bayesian ap- The only other difference between a Bayesian analysis proach. We show that it makes little difference which is and the likelihood methods are the way models are com- chosen. pared. In contract to a likelihood ratio, Bayesian statis- We have little a priori information about the parame- tics recommends a Bayes’ factor [5], which compares F terswearetofitbesidesthefactthateach(α,ν,σ)must two models (M1, M2) begreaterthanzero. Thus,wechooseauniformpriorfor each over the interval [0, ], and write this distribution (D θ)π (θ) π(θ), where θ will stand i∞n for the vector of parameters 12 = θLM1 | M1 F (D θ)π (θ) relevant to a particular model. Then we can write the RθLM2 | M2 Bayesian posterior as R which explicitly includes information from all possible P(θ D) (D θ)π(θ) values of the parameters. The likelihood ratios used in | ∝L | themaintextareasaddleapproximationtotheintegrals, with the likelihood from the main text. Since π is L andthusthesetwomethodsmatchinthecasewherethe uniform, however, we can write posteriors are highly peaked. Table II shows the calcu- latedBayes’factorsforeachmodel. TheseBayes’factors P(θ D) (D θ) | ∝L | were calculated by also integrating over τ , the param- 0 as long as all the parameters are in the interval [0, ]. eter accounting for a constant offset (or units) in our ∞ Given the posteriorP(θ D), it is common to simply take fits. The domain of integration for τ was restricted to 0 | the mode as the best representative set of parameters - [ 50,10] for numerical stability - increasing it beyond − which are the maximum likelihood parameters used in this range resulted in little difference. 3 of values of C were chosen, and these implied values of TABLE III. Most Likely τ for Values of C (in seconds) c τ for each model, resulting in Table III. c C 0.1 0.05 0.01 0.001 Power Law 1.10 101 2.88 101 1.23 102 4.36 102 · · · · It is also possible to calculate the likelihood of observ- Exponential 3.03 102 5.85 103 1.51 106 7.64 108 · · · · ingtheknownempiricaldatagivenaspecificτ ,vialikeli- S. E. 1/2 9.99 101 6.54 102 1.53 104 3.47 105 c · · · · hoodmaximization. Theprocedureherewasstraightfor- S. E. 2/3 1.68 102 1.57 103 7.67 104 4.25 106 · · · · ward; first, C wassetto0.10, µN wasagainfixedat105, τ wasfixedatanarbitraryvalueforachosenmodel,and c σ was varied to maximize the likelihood of witnessing N DETAILS OF THE GAUSSIAN FIT MODEL the empirical distribution given that model. The results for a scan of τ for each model is reported in Figure 3 in c Here we go over some of the details of the Gaussian the main text. modelthatwasusedtopredictthefoldingtimedistribu- tion. Inthemaintext,wepresentedthecentralequation ∞ 1 (N µN)2 exp − =C Zf−1(τc/τ0) 2πσN2 (cid:20) 2σN2 (cid:21) [1∗] [email protected],dAu.A.Osypov, andD.N.Ivankov,Nu- from which all of opur subsequent analysis begins. Using cleic Acids Research 37, D342 (2009). [2] Z.OuyangandJ.Liang,ProteinScience17,1256(2008). this model, we can predict τ for a chosen model and c [3] D. De Sancho and V. Mun˜oz, Phys. Chem. Chem. Phys. chosen C. To do this, µ was fixed at 105, the empirical N 13, 17030 (2011). mode of the distribution of domain sizes (Fig. 1) – this [4] D. N. Ivankov and A. V. Finkelstein, J. Phys. Chem. B appeared to make subsequent fits more robust. The pa- 114, 7930 (2010). rameter σ was then fit, via likelihood maximization, to [5] R. E. Kass and A. E. Rafterty, J Am Stat Assoc 90, 773 N the same empirical distribution. Subsequently, a series (1995).

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.