ebook img

Progress and Prospects PDF

18 Pages·2005·0.16 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Progress and Prospects

P1:GDL March24,2001 12:13 AnnualReviews AR128-08 Annu.Rev.Biophys.Biomol.Struct.2001.30:173–89 Copyright(cid:176)c 2001byAnnualReviews.Allrightsreserved A I P S P : B NITIO ROTEIN TRUCTURE REDICTION Progress and Prospects g Richard Bonneau and David Baker or ws. DepartmentofBiochemistry,Box357350,UniversityofWashington,Seattle, e vi Washington,98195;e-mail:[email protected],[email protected] e ualrnly. no s.anuse KeyWords structuralgenomics,proteinfolding,scoringfunctions,foldprediction urnalonal n Abstract Considerablerecentprogresshasbeenmadeinthefieldofabinitiopro- os arjper tein structure prediction, as witnessed by the third Critical Assessment of Structure 9. Downloaded from GE on 03/24/05. For Ptwfwooroeueprdnrkrideo.cvadttiiuieoocwnne(stcChooAefnsrSfeiePscat3eetu)nn.rttIelpnysrsoropegfilrticeeausbosrlfreientanhbttihsaiepnbrfiiotieignoldritesiastornsu,dpcmrtsouuutrgcoehgcpoewrslestodpriiknrcotraimeonmnisaaiptnitnrgeosmt,dofipcorteorclttsothi.oehInnfiisgethflhdoliirsghfhwautsotuytrhrkeeet, 8E 3-1LL 7O 30:1L C CONTENTS 1.A 200DIC INTRODUCTION :::::::::::::::::::::::::::::::::::::::::::::::: 174 Struct. V ME RELDaUttiCceEDMoCdOelMsP:L::E:X:I:T:Y:M::O:D::E:L:S::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 117756 ol. UNI DiscreteStateOff-LatticeModels ::::::::::::::::::::::::::::::::::: 176 omL NarrowingtheSearchwithLocalStructurePrediction::::::::::::::::::::: 176 s. BiNEL SCORINGFUNCTIONSFORREDUCEDCOMPLEXITYMODELS :::::::::: 177 phyOR Solvation-BasedScores :::::::::::::::::::::::::::::::::::::::::: 178 Bioy C PairInteractions:::::::::::::::::::::::::::::::::::::::::::::::: 178 v. b SequenceIndependentTerms/SecondaryStructureArrangement ::::::::::::: 179 e nnu. R THhigehU-RseesoofluMtiuolntipSlteruScetuqrueenPcreedAicltiigonnm—ePntrsosinpeTcetsrtifaorryRPerfiendeicmtieonnt:::::::::::::::::::::::::: 118801 A THEROSETTAMETHOD ::::::::::::::::::::::::::::::::::::::::: 181 RELATIONSHIPTOPROTEINFOLDING?::::::::::::::::::::::::::::: 183 APPLICATIONSOFABINITIOSTRUCTUREPREDICTION ::::::::::::::: 183 GenomeAnnotation ::::::::::::::::::::::::::::::::::::::::::::: 183 HomologyModeling :::::::::::::::::::::::::::::::::::::::::::: 184 StructuresfromLimitedConstraintSets::::::::::::::::::::::::::::::: 184 CONCLUSION:::::::::::::::::::::::::::::::::::::::::::::::::: 185 1056-8700/01/0610-0173$14.00 173 P1:GDL March24,2001 12:13 AnnualReviews AR128-08 174 BONNEAU ¥ BAKER INTRODUCTION Thegoalofabinitiostructurepredictionissimple:Givenaprotein’saminoacid sequence predict the structure of its native state. It is generally assumed that a proteinsequencefoldstoanativeconformationorensembleofconformationsthat isatorneartheglobalfree-energyminimum.Thus,theproblemoffindingnative- likeconformationsforagivensequencecanbedecomposedintotwosubproblems: (a) developing an accurate potential and (b) developing an efficient protocol for g or searchingtheresultantenergylandscape. s. w Todate,themostsuccessfulmethodsforstructurepredictionhavebeenhomo- e evi logy-based comparative modeling and fold recognition (58). When homologous ualrnly. orweaklyhomologoussequencesofknownstructurearenotavailable,themost no s.anuse successful structure prediction methods have been those that predict secondary urnalonal stitmruect(u1r2e,a4n0d,4lo3c,4al6,s7tr4u)c.tuTrheismroetvifise;wthfeosceusmeesthoondcsuhrarevnetbmeeenthaovdasilfaobrlepfroedriscotimnge os arjper tertiarystructureintheabsenceofhomologytoaknownstructureanddiscusses 9. Downloaded from GE on 03/24/05. For ttophhfreeokMsapencraohlonowetycsela,eionladfpngdtrdehae-ettdbahimaecsbteeciatodonhnokosrdmcd(osPierntDtioahndtBogead)yfl.siubTtonhrnhcaalitrtysiiopeiinsrnnesofd,tofhitrchemmtecepaottrtrhnioaoottienednxisincnttagoshntfarstutebeucterststuiefoarofferurymaunsgsdatemrciuihnencinfnttutohesrreemolerppaaatrtreeirnoamdinmnipcgfeltraitoaoetpemnrss-. 8E fromthePDB.Inordertotesttheperformanceofanyoneoftheseapproaches,one 3-1LL must carefully remove sequences that are homologous to proteins in the test set 7O 30:1L C fromalldatabasesusedbythemethodinquestion.Anyerrorsoroversightsmade 1.A atthisstagecouldleadtooverestimatesofsuccess.Forthesereasons,theCritical 0C 20DI Assessment of Structure Prediction (CASP), a biannual, community-wide blind uct. ME testofpredictionmethods,wasconceivedandimplemented(4,6,27).Threesuch omol. StrL UNIV tTehstrsouhgavheouotccthuirsrerdevainedwt,hoeufroausrstehswsmaseunntdoefrtthaekepnertfhoisrmsuamncmeeorf(iv.aer.,io2u0s00m)e(t2h0o,d3s9i)s. BiEL influencedbytheresultsofthesecommunity-widetests,especiallyCASP3(58). ys. RN CASP1andCASP2showedthatinspiteofsomesuccessreportedintheliterature, ophCO littlesuccesswasseenforproteinsofthesizeandtypebeingsolvedbystructural Biy v. b biologists (50). The consensus after these first two experiments was that the ab Re initiostructurepredictionproblemwouldmostprobablybesolvedtoolatetobe nu. appliedtoanyrealbiologicalproblems(20,50).CASP3showedsomereversalof n A thisconsensus.Severalmethodsmadegoodpredictionsintheabinitiocategory, and some ab initio methods outperformed fold recognition methods for certain proteinsinthefoldrecognitioncategory(61,63,64). Inspiteofrecentprogress,manyissuesmuststillberesolvedifaconsistently reliableabinitiopredictionschemeistobedeveloped.Noonemethodperforms consistently across all classes of proteins (most methods perform worse on all- betaproteins),andallmethodsexaminedseemtofailtotallyonsequenceslonger than 150 residues in length (the longest contiguous blind predictions to date are »100residuesinlength).Inaddition,thesuccessfulpredictionmethods,asjudged P1:GDL March24,2001 12:13 AnnualReviews AR128-08 ABINITIOPREDICTION 175 byperformanceatCASP3andbyperformancereportedintheliterature,showa large diversity in their formulation. Our goal here is to review the common fea- turesoftheserecentmethodsinordertohighlightcurrentchallengesandfuture possibilitiesforthefieldofabinitiostructureprediction. Themostnaturalstartingpointforsimulatingproteinfoldingisstandardmolec- ular dynamics (MD) simulation (numerically integrating Newton’s equations of motionforthepolypeptidechain)usingaphysicallyreasonablepotentialfunction. Thisapproachhasalonghistoryandisstillpopular,asillustratedperhapsmost g or dramatically by IBM’s Blue Gene project, which will apparently be devoted, at s. w leastinpart,tosuchanendeavor.Thereareseveralratherobviousproblemsthat e vi havelimitedthesuccessofsuchapproachesthusfar.First,MDiscomputationally e ualrnly. expensive—withexplicitrepresentationofsufficientwatermoleculestominimally no urnals.anonal use staaonkldveasint»ecrt4eh0aes0efhsoolidnuirnasvgoacinlhaaabilcneu,craroenmnatpnsuoitnseegrclpeoonpwdreoMrcehDsasvsoeirm.cAounldasvtiidaonenrcaeobsflyiane1sx0itm0en-urdleaestdiidosuniemsputrrlaoattteiegoinyn os arjper times; for example, Kollman and coworkers carried out a microsecond simula- 89. Downloaded from EGE on 03/24/05. For rmamtHiesooqoosnwourleecioecrifvceauoedtaleermf,d3sosp6wiriu-nmaritetiuwhsnsligiaanMdttgeutiiDnlreme.gpaAfetore.hlelpteTdhttiifhhondoeeugeldgisutnihernsacaigdionnmensogqipdftuaio,aoarac1nctnoa0irdenn0ests-qpiripduenererisorhcriegaduasbruprermleseesnspomatrrhmooepatrootesehtuieanbsnnneetftreoisoinaorifxlutmshfsouue,arpnddctecyelertapr,icssoiotscnhomaoseflrfpfm»eoup1artiregsomsrnebsaticltiteciomulmrlndoedaes-. 3-1LL lackofconsensusastothebestcomputationallytractableyetphysicallyrealistic 7O 30:1L C model for water (a number of quite different models for water, both polarizable 1.A andnonpolarizable,havebeenusedincurrentsimulationmethods),andsomeun- 0C 20DI certainty exists as to the values of the parameters used in molecular mechanics uct. ME potentials(partialatomiccharges,Lennard-Joneswelldepthsandradii).Accurate omol. StrL UNIV rgerpereeosefnptoatliaornizoabfielilteyctorofswtaattiecrs,itshaelslaorgaecodnifsfiedreernacbeleinchthaellednigeleecgtirviecnptrhoepherigtihesdoe-f BiEL thesolventandtheprotein,andtheuncertaintiesinthemagnitudeandlocationof s. N atomicpartialcharges.Becausethefreeenergyofaproteinrepresentsadelicate yR ophCO balanceoflargeandopposingcontributions,theseproblemssignificantlyreduce v. Biby the likelihood that the native state will be found at the global free-energy mini- Re mumusingcurrentpotentials(2).ThebestcurrentuseofMDmethodsmaybein u. refininganddiscriminatingamongmodelsproducedbylower-resolutionmethods n n A (28a,49). REDUCEDCOMPLEXITYMODELS To overcome the sampling problems mentioned above, most methods for fold predictiontodatehaveinvolvedsomesignificantcomplexityreduction.Methods for reducing protein structure to discrete low-complexity models can be divided intotwomajorclasses:latticeandoff-latticemodels. P1:GDL March24,2001 12:13 AnnualReviews AR128-08 176 BONNEAU ¥ BAKER LatticeModels Latticemodelshavealonghistoryinthemodelingofpolymersduetotheiranalyt- icalandcomputationalsimplicity(22).Theevaluationofenergiesonalatticecan beachievedquiteefficiently(integermathcanbemadequitefast),andmethods involvingexhaustivesearchesoftheavailableconformationalspacebecomefea- sible (30,35,85). However, lattice methods have a somewhat restricted ability to represent subtle geometric considerations (strand twist, secondary structure g propensities, and packings) and can reproduce the backbone with accuracies no or s. greater than approximately half the lattice spacing (73). The most common sys- w vie tematicerrorobservedforavarietyoflatticemodelsistheirinabilitytoreproduce e ualrnly. helices, and most lattice models exhibit various degrees of secondary structure no bias(70).Giventheimportanceofregularsecondarystructureinproteins,thisis s.anuse clearlyaproblem.Recentsuccessfultests,however,haveshownthatthecompu- urnalonal tationaladvantagesoflatticemodelsmayoutweightheproblemsassociatedwith os arjper theirsystematicbiases(30,47,48,73). 9. Downloaded from GE on 03/24/05. For DiscrMaseinntodegsltaSelotlrfaofbt-toelaanmtOdteicflree,f-norLgertadhftuustc.ritecThdeehcreMotmmoooptdhsleeetxlcCsiotyflm,mmoorodtnoelpsornafiecxtioacrlelmsisiodrteoecclhiemaninittrodtihedegsrsepiedlsuesocfbhafarcieknebdtooonmea 3-18LLE atoms(82,87).Discretestatemodelsoftheproteinbackboneusuallyfixallside 7O chaindegreesoffreedomandlimitthebackbonetospecificPhi/Psipairs:Models 30:1L C containingfrom4to32Phi/Psistatesrepresentingvariousstrand,helix,andloop 1.A 0C conformationshavebeendescribedintheliterature(70).Properlyoptimizedsix- 20DI Struct. V ME satsatsetrmanoddse,lhse(lii.cee.,sm,aonddelcsatnhoantiaccacloluonotpfso)rclaoncarlefperaotduurecseonbasteivrevecdoinntapcrtost,epinressseurcvhe ol. UNI secondary structure, and fit the overall coordinates of the native state as well as omL 18-state lattice models that do not account for such protein-specific information BiEL (70). s. N yR phO oC Biy NarrowingtheSearchwithLocalStructurePrediction v. b e R Local structures excised from proteins can fold independent of the full protein, nu. demonstratingthatstronglocalstructuralbiasescanexistforshortsequenceseg- n A ments(8,15,54,60).Severalexamplesofexcisedfragmentshavinglittleobserv- able structure also exist, indicating that the strength and multiplicity of these localbiasesarehighlysequencedependent.Somesequencesareobservedtofoldto differentconformationsdependingontheirglobalsequencecontext,againdemon- strating the possible multiplicity of local structure biases (17,42). Bystroff et al developed a method that recognizes sequence motifs (ISITES) with strong ten- dencies to adopt a single local conformation that was used to make some good local structure predictions in CASP2 (12–14, 29). Despite the ambiguities in P1:GDL March24,2001 12:13 AnnualReviews AR128-08 ABINITIOPREDICTION 177 local sequence-structure relationships, secondary structure prediction methods havebeensteadilyimproving(63,64). Theabovementionedexperimentsandobservationssuggestthatanymethodat- temptingtouselocalsequence-structurebiasestoguidecomplexityreductionswill havetobeadaptivetothestrengthandtothemultiplicityofdifferentsequence- structure patterns. The majority of methods proving successful at CASP3 used secondarystructurepredictionsinonewayoranother.Inonecase,predictedsec- ondarystructureelementswerefittotheresultsofinitiallattice-basedexhaustive g or enumeration,thuserasinganypossiblesecondarystructurebiasintheinitiallat- s. w ticemodelpriortoall-atomrefinement(1).TheRosettamethodusedsecondary e vi structuretobiastheselectionoffragmentsofknownstructurefromthePDB.Yet e ualrnly. another paradigm is, given a secondary structure string, to reduce the problem no s.anuse ofpredictingthetertiaryfoldtotheproblemofhowtoassemblerigidsecondary urnalonal sintrduecptuenredeenletmoefnstesc(o2n4d).arAydsdtirtuiocntuarlempertehdoidctsiothnataldgeotreirtmhminse(lboycaclasltcruulcattuirnegbtihaeseses os arjper biasesduringthefoldingsimulation)havealsobeendescribed(86). 89. Downloaded from EGE on 03/24/05. For pT7cpor6rheen%eTddsiihb–iccsee7tttrsiie8ooetn%nnstil,e,symcwaloseinhnkutdihdeccolachayrednytssytasoskfotauerwbbluseicpnsitnuarguenicrtdhetioouicpipnttrmpihetoeeedenrritiasrhclctoif(itmda4oioi0nilnmtu,as7roule4ginsn)ott.totroahiAtceaahcccmmoacccuosoicnluuheutnnsartttfavo,ocefnwryoterhitolhrflnfieosobersne-eesalcrtotbrhaocotenaierndlpairartiacriontyocedtueussrrtcttaaorrtccuuiitmocceinottsauunkoorrseeeff. 3-1LL modelswithsecondarystructurepredictedmoreaccuratelythanispossiblewith 7O 30:1L C traditionalsecondarystructuremethods. 1.A 0C 20DI SCORINGFUNCTIONSFORREDUCED uct. ME COMPLEXITYMODELS StrV ol. UNI omL Onceamodelforrepresentingtheproteinischosenthatsufficientlyreducesthe BiEL complexityoftheconformationalsearch,ascoringorenergyfunctionthatworks ys. RN inthechosenlow-complexityspacemustbedeveloped.Theenergyfunctionmust ophCO adequatelyrepresenttheforcesresponsibleforproteinstructure:solvation,strand Biy v. b hydrogenbonding,etc.Giventhatmostlow-complexitymodelsdonotexplicitly e R represent all atoms and can reproduce even the native state backbone with only nu. limited accuracy, any energy function designed to work in the low-complexity n A regime must represent these forces in a manner robust to such systematic error (thesystematiclimitationsofthemodel).Last,thesefunctionsmustbecomputa- tionallyefficient,forduringtheinitialstagesofanyconformationalsearch,huge numbers of energy evaluations are necessary. Because of the shortcomings of molecularmechanics-basedpotentials,andtheconsiderationsabove,manymeth- odsdevelopedinthelasttenyearsutilizescoringfunctionsderivedfromtheprotein databasethatinessencefavorarrangementsofresiduesfrequentlyfoundinknown proteinstructuresanddisfavorrarelyseenarrangements. P1:GDL March24,2001 12:13 AnnualReviews AR128-08 178 BONNEAU ¥ BAKER Solvation-BasedScores Ithasbeenlongthoughtthatthehydrophobiceffectistheprincipaldrivingforce behindproteinfolding(3).Manydiversemethodsforjudgingthefitnessofcon- formationsbasedonsolvationorhydrophobicpackingexist,andthedebateover the proper functional form for representing solvation effects represents an open question of considerable importance and interest (69). A common approach is to classify sites in proteins according to their degree of solvent exposure (either g throughthesurfaceareaorthenumberofnearbyresidues)(11)andtodetermine or s. thefrequenciesofoccurrenceoftheaminoacidsineachtypeofsite.Theenergyor w vie scoreofanaminoacidatasiteisthentakentobethelogarithmoftheaminoacid’s e ualrnly. frequencyofoccurrenceatthattypeofsite.Thistypeofresidue-environmentterm no favorsplacementofhydrophobicaminoacidsatburiedpositionsandofhydrophilic s.anuse aminoacidsatexposedpositions. urnalonal Anadditionalcommonlyusedclassoffunctionalformsconsistsofglobalmea- os arjper suresofhydrophobicarrangement.Onesimpleglobalquantityisaresidue’sdis- 9. Downloaded from GE on 03/24/05. For tquturaeeussnrceiamnocdneggsttni,fhtariiiinozensemscteylnavutpnahodetaeliiunloveotgfeingoftsoaunitrunraesusrcucycttriotofouananrltcfgeh,oesocerrmoa(ih1ritaneyh,taedm3i–dor4obn)h(pa.’1ysshO1deco)rdne.boeniHptctepheurrrromaoaobndb,figicltmueocemsaotfsonoawsftl,lrdigwatuhyssshtrme,tiahcditanheilotlcchanoaabi.snmlopBvtbbhyoeeiapwn-ugeahisleteoeoilobdf&incatfalouwElfncuiipcastnhretlioccnoouttbinetloheiannrtetogessr 8E 3-1LL is that they assume that proteins are ideally spherical in shape when in actual- 30:17L CO iptryoancahtivuesepsraonteeilnlispseoxihdiablitapaprmouxcimhaltairogneorfrtahnegsehaopfesohfapthees.hAydrmopohreobfliecxciobrleetahpa-t 1.A 200DIC doesnotrequireasignificantincreaseincomputationandaidsintheselectionof uct. ME near-nativeconformationsfromdecoysetscontainingahighnumberofprotein- StrV likeyetincorrectcompactconformations(9).Theproblemassociatedwiththese ol. UNI functionsisthattheywillinevitablyexcludeasmallpercentageofproteinstruc- omL turesthatdeviatefromtheirassumptionsconcerningshapeandthusfailwhena s. BiNEL protein is divided into small subdomains or contains large invaginations (1HQI yR phO is a toroid). In spite of this potential downfall, they have demonstrated their oC Biy usefulnessinseveralmethodsowingtotheirabilitytorecognizethemajorityof v. b smallhydrophobiccores,theirsimplicity,andthespeedwithwhichtheycanbe e R u. computed. n n A PairInteractions Manylow-resolutionpotential/scoringfunctionsutilizeanempiricallyderivedpair potentialinplaceoforinadditiontotheresidue-environmenttermdescribedabove. Themostcommonofthesepotentialsarefunctionsofthepositionofasinglecenter perresidue(C ,C ,orcentroid/unitedatomcenter)andarethusquitecomputa- fi fl tionallyefficient;all-atomfunctionshavealsobeenused(77).Manyvariationsof pair terms have been developed, with the two main branches of methods being P1:GDL March24,2001 12:13 AnnualReviews AR128-08 ABINITIOPREDICTION 179 distancedependentandcontactbased(57,84).Liketheresidue-environmentterm mentionedabove,thesescoringfunctionsaresometimesjustifiedbypositingthat thearrangementsofresiduesinproteinsfollowaBoltzmandistribution:E(x) D kTlnP(x),wherexisafeaturesuchastheoccurrenceoftworesiduesseparated byadistancelessthanr.Alternatively,thesescoringfunctionsmaybeseenpurely asprobabilitydistributionfunctions(23,83).Intheformercase,theoptimization may be viewed as a search for the lowest energy configurations; in the latter, a search for the highest probability configurations. For most applications, there is g or little practical difference between the two viewpoints. The issue becomes more s. w substantive,however,whensuchdatabase-derivedscoringfunctionsarecombined e vi withphysics-basedpotentials,aswilllikelybecomeincreasinglyusefuloverthe e ualrnly. nextseveralyears. no s.anuse Severalproblemsareassociatedwithstatisticallyderivedpairpotentials.The urnalonal ainstseurmacptitoionnstihsantoftregeeneenrearlglyievsacliadnabcerorsesparellseinntteerdacbtyionsusmprmesinegntoivneprrcooteminpso,naenndt os arjper thus,thebasicfunctionalformmaynotbeadequatetorepresentthefreeenergyof 89. Downloaded from EGE on 03/24/05. For aia(mWs8npe9itorthn)mho.attsttTaehloitheonhfiueesctyslhocimeenaafnrftifeownebracdomettoisraocmetsonisuiroinorcdneafhuctt(eeht2aedses1d,,boaw5bythl3hyohe)incc.rywgohTd-nrhilrsodaaeepirntmgohigeovoeolnebysrirtinewcespg/lihpiugmteolnhlsliimeainfiorpaicnnptaaegaibnsrretitndttpihwtfliriseoouetsrbneeeinnlibenucumhgnety,idoowdwefnrsihhositrhiypoecdhpdhnroaotegbihprfiifvechpeeooecrsbenttessrivcniii(sdrtp8eiouaa2netlr)oss--. 3-1LL titioning,specificinteractionssuchaselectrostaticattractionbetweenoppositely 7O 30:1L C charged residues dominate the pair scoring/energy functions, and hydrophobic 1.A interactions make relatively modest contributions. The pair term, in this case, is 0C 20DI perhapsbestviewedasthesecondterminaseriesexpansionfortheresidue-residue uct. ME distributionsinthedatabaseinwhichtheresidue-environmentdistributionsarethe omol. StrL UNIV firsStotemrme.oftheearliestcomprehensivetestsofthediscriminatorypowerofthese BiEL pairpotentialsweredoneinthecontextofthreadingself-recognition(62).Later s. N workdemonstratedthattheself-recognitionproblemwasnotasufficientlychal- yR ophCO lengingtestofscoringfunctionsandfocusedontheperformanceofmultiplepair- v. Biby wiseenergyfunctionsonlarger,morediversesetsofconformations(24,38,41,68, Re 69). The performance of the various energy functions at recognizing native-like u. structuresinlargeensemblesofincorrectdecoysishighlydependentonthemeth- n n A odsusedtocreatethedecoysets,highlightingthefactthatanenergyfunctionthat works well in the context of one method will not necessarily work well given a decoysetcreatedusinganorthogonalmethod. SequenceIndependentTerms/SecondaryStructure Arrangement Manyfeaturesofproteins,suchastheassociationofbetastrandsintosheets,canbe described by sequence independent scoring functions. Several early approaches P1:GDL March24,2001 12:13 AnnualReviews AR128-08 180 BONNEAU ¥ BAKER to folding all-beta proteins were protocols dominated by initial low-resolution combinatorialsearchesofpossiblestrandarrangementsandwereconcernedonly withtheprobabilityofdifferentstrandarrangements(16,18,19,72).Theseearly methodsnarrowedtheconformationalspacebyconsideringsequencespecificef- fects only in the context of highly probable strand arrangements. Several of the relatively successful methods at CASP3 incorporated secondary structure pack- ing terms (51,81). Ortiz et al used an explicit hydrogen bond term in combi- nation with a flfifl and a flflfi chirality term to ensure protein-like secondary g or structureformation.Itcannotbeexpectedthatlow-complexitylatticemodelspro- s. w ducethecorrectchiralityorsubtlehigher-ordereffectssuchasstrandtwist,and e vi these rules sensibly correct for these expected shortcomings (65). The Rosetta e ualrnly. method used three terms that monitored strand-strand pairing, sheet forma- no s.anuse tion, and helix-strand interactions to ensure protein-like secondary structure urnalonal arrangements. os 89. Downloaded from arjEGE on 03/24/05. For per TinhTeeUrTrpceotrshpeiveeardaeriorlscaiyfeatringnPoMtecnsreuneampudlatmepiitctotpbhettleoreienrodnsstoSni.ifaenlCqhlmyooumruerrlienotclilhacpotelgeseodoAsuuelmsrqicgusueenetaqnomtucfioeeeinnnancflaetoignssrnmaomlyfatteseiinonstnsa[cv(uMoasinelStfaauAbcllstet)op]frowaerbdaasiciupntisriotoeindtoeaibsnsatrspfueaacdmrttuioorlnyef 3-1LL onerelativelysuccessfulmethodatCASP3(67).Theaccuracyofthesecovariance 7O 30:1L C methodsisnotgreat(63,64);variousmethods,includingthefittingofsecondary 1.A structuretoinitialconstraintsinordertoobtainadditionalconstraints,wereused 0C 20DI to increase the robustness of ab initio protocols based on such information to uct. ME expectederrorlevels(66).PreviousmethodshaveusedMSAstopredicttheburial omol. StrL UNIV ooffsveaqriuaetniocne(p5o,s7it)i.onsinthequerysequencebasedonhydrophobicityandpatterns BiEL AnotherwaytouseMSAsisintheselectionoflow-resolutiondecoysbyrequir- s. N ingthatalldecoysbeconsistentwithallsequencesinthefamily.Inoneprocedure, yR ophCO the aligned sequences are mapped onto decoys one at a time and the energy of v. Biby each decoy averaged over all aligned sequences is computed (2). In another ap- Re proach, simulations of multiple aligned homologs are coupled and carried out u. simultaneously(45).Inthesetwomethods,themodelswiththelowestscoreare n n A selected as the best models. In a recent method for applying MSA information to fold prediction, an independent simulation is carried out for each aligned se- quence.Onlyaftermultiplesimulationsforeachalignedsequencearecompleted is the multiple sequence alignment information utilized by simultaneously clus- tering all of the structures generated from the different aligned sequences. The largest and most diverse cluster often contains the best model (9). In all above describedcases,significantimprovementswereseenintheabilityofthepredic- tion methods to select good models using highly simplified representations of proteins. P1:GDL March24,2001 12:13 AnnualReviews AR128-08 ABINITIOPREDICTION 181 High-ResolutionStructurePrediction—Prospects forRefinement Reducedcomplexityapproaches,asdescribedabove,cannotbeexpectedtoconsis- tentlygeneratepredictionswithresolutionsofbetterthan3–7 A˚.Low-resolution methodsareperhapsbestviewedasnarrowingthepossibleconformationsfroman exponentiallylargenumbertoanumbersmallenoughthatmorecomputationally expensivemethodscanbeapplied.High-resolutionpotentialsmustbeimproved org in order for significant progress to be made in the field of ab initio protein fold ws. prediction. e vi Someprogresshasbeenmadeinmodelingtwoofthemoredifficulttermsin e ualrnly. the potential—solvation and electrostatics—in the context of full atom models. urnals.annonal use o Bwofeatceteanrudsaeerestchcreiobrterredalnaustsfeeidnr-gwfrsietuehrfesanocelevrgeainretes-aao–cfbcaesssmesiadblllmemesotuhlreofcdaucsl.eeWsarfeersaos,mosnonl&ovnatpEiooislnaernebnsoeerrlgvgeiefnostusanrtdoe os arjper thattheadditionofasolvent-accessiblesurfacearea–basedterm,theparameters 9. Downloaded from GE on 03/24/05. For fdaatbhonuyserdrniyimwaeedmdhinoliiatjccrunrohssospttwtuibrrcraeeefjralsteoeeoccwrelbtmvoaaetrsshrieteeeh(das3es–6uo(bc9)rnoa.f0asnwO)ecs.adeindPteaeeernrmorp-admvrpboaciilbprsheiiloacnedrramgigflpefrasesweorrstbeliiuvtunthilarcotitsesenioudbwenredxfetateprweceeereepmraeiawmlnsrwioeetthianhtoe–htisnbbf,mtratasehsotieeeanldebeepncidrmlueoizbrletgaeeytrydhincomomodcfemosoccrlhbhieesaia.cnrnuTtgihilnhceaagessrt 8E 3-1LL generalizedBornmodelremediesthisproblembytreatingthenonpolarinteractions 7O 1.30:1AL C wtoitdheaalsuwriftahcethaeredae–sboalsveadtiotenrmofwchhialerguessinagndagchenaregrealipzaaitrisoninofthtehepBrootrenineqinutaetriioonr 0C 20DI (37,52,88).Thismethodhasproventobeusefulformodelingloopsasdescribed uct. ME aboveandisfastenoughtobeincludedinalmostanydiscriminationorrefinement StrV procedure (71). An alternative implicit solvation model that is able to partially ol. UNI discriminatebetweennative-likeandincorrectdecoyswasdevelopedbyLazaridis BiomELL &Karplus(48a). s. N Fromtheabovediscussionitisapparentthatmostcurrentenergy/scoringfunc- yR phO tionsarebasedeitheronsmall-moleculeparameterizedfunctionswithformssug- oC Biy gested by physical chemistry or on protein database statistics. We believe that ev. b progresswillcomefromacombinationofthesetwoapproaches. R u. n n A THEROSETTAMETHOD Howcanabinitiostructurepredictionmethodsproducereasonablemodelsgiven theinadequaciesofcurrentpotentialfunctions?Onepossiblesolutionistolimit the conformational space searched to that compatible with the local sequence- structurepropensitiesoftheproteinsequence.Suchaprocedureisconsistentwith a view of the folding process in which local structure propensities influence the conformationssampledbyshortsequencesegments,andfoldinginvolvesasearch P1:GDL March24,2001 12:13 AnnualReviews AR128-08 182 BONNEAU ¥ BAKER through these local conformations for a stable tertiary structure simultaneously consistent with these local biases and with nonlocal constraints (compactness, electrostatics,hydrophobicburial,etc.).Localbiasesareinfluencedbyrelatively subtleinteractions(side-chain/main-chainhydrogenbonding,sidechainconfigu- rational entropy losses, etc.) as witnessed by the debate over the origins of sec- ondary structure propensities; current physical chemistry–based models do not capture all of the subtleties. To circumvent this problem, the Rosetta procedure makestheassumptionthatthedistributionofconfigurationssampledbyapeptide g or segment are reasonably well-approximated by the distribution of configurations s. w observedintheproteindatabase.TertiarystructuresaregeneratedusingaMonte e vi Carlo search of the possible combinations of likely local structures, minimizing e ualrnly. a scoring function that accounts for nonlocal interactions such as compactness, no s.anuse hydrophobic burial, specific pair interactions (disulfides and electrostatics), and urnalonal smtraannydspuabitrliyndgif(fseereenFtivgeurrseio1n)s. oWfiethsselinsttisalolfy2si5mfirlaagrmloecnatlsstpreurctsuerqeusemncaeybpeosrietpioren-, os arjper sented;thus,thenumberofstatesperpositioniseffectivelymuchlowerthan25. 9. Downloaded from GE on 03/24/05. For Ofsrtarpagtnimmdesin,zatatnsioedtnsootphfreonrdopunrclooetcseaislntir-nulitcketeruarfceetsaiotwunrsiethwsibtthuharitineadcroehn,yfbodyrrmocpoahntiosobtnriuacclrtseiposanidc,ueceodsne,fispinsaetierdendbtywbteihtthea 8E 3-1LL 7O 30:1L C 1.A 0C 20DI uct. ME StrV ol. UNI omL BiEL s. N yR phO oC Biy v. b e R u. n n A Figure1 VenndiagramillustratingtheconceptualbasisforRosetta.Near-nativestructuresare labeledN.

Description:
ular dynamics (MD) simulation (numerically integrating Newton's equations of motion for the polypeptide chain) using a Searles, Aleister J. Saunders, Dorothy A. Erie, Donald J. Winzor, Gary J. Pielak. 271. PHOTOSYSTEM II: The
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.