P1:GDL March24,2001 12:13 AnnualReviews AR128-08 Annu.Rev.Biophys.Biomol.Struct.2001.30:173–89 Copyright(cid:176)c 2001byAnnualReviews.Allrightsreserved A I P S P : B NITIO ROTEIN TRUCTURE REDICTION Progress and Prospects g Richard Bonneau and David Baker or ws. DepartmentofBiochemistry,Box357350,UniversityofWashington,Seattle, e vi Washington,98195;e-mail:[email protected],[email protected] e ualrnly. no s.anuse KeyWords structuralgenomics,proteinfolding,scoringfunctions,foldprediction urnalonal n Abstract Considerablerecentprogresshasbeenmadeinthefieldofabinitiopro- os arjper tein structure prediction, as witnessed by the third Critical Assessment of Structure 9. Downloaded from GE on 03/24/05. For Ptwfwooroeueprdnrkrideo.cvadttiiuieoocwnne(stcChooAefnsrSfeiePscat3eetu)nn.rttIelpnysrsoropegfilrticeeausbosrlfreientanhbttihsaiepnbrfiiotieignoldritesiastornsu,dpcmrtsouuutrgcoehgcpoewrslestodpriiknrcotraimeonmnisaaiptnitnrgeosmt,dofipcorteorclttsothi.oehInnfiisgethflhdoliirsghfhwautsotuytrhrkeeet, 8E 3-1LL 7O 30:1L C CONTENTS 1.A 200DIC INTRODUCTION :::::::::::::::::::::::::::::::::::::::::::::::: 174 Struct. V ME RELDaUttiCceEDMoCdOelMsP:L::E:X:I:T:Y:M::O:D::E:L:S::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 117756 ol. UNI DiscreteStateOff-LatticeModels ::::::::::::::::::::::::::::::::::: 176 omL NarrowingtheSearchwithLocalStructurePrediction::::::::::::::::::::: 176 s. BiNEL SCORINGFUNCTIONSFORREDUCEDCOMPLEXITYMODELS :::::::::: 177 phyOR Solvation-BasedScores :::::::::::::::::::::::::::::::::::::::::: 178 Bioy C PairInteractions:::::::::::::::::::::::::::::::::::::::::::::::: 178 v. b SequenceIndependentTerms/SecondaryStructureArrangement ::::::::::::: 179 e nnu. R THhigehU-RseesoofluMtiuolntipSlteruScetuqrueenPcreedAicltiigonnm—ePntrsosinpeTcetsrtifaorryRPerfiendeicmtieonnt:::::::::::::::::::::::::: 118801 A THEROSETTAMETHOD ::::::::::::::::::::::::::::::::::::::::: 181 RELATIONSHIPTOPROTEINFOLDING?::::::::::::::::::::::::::::: 183 APPLICATIONSOFABINITIOSTRUCTUREPREDICTION ::::::::::::::: 183 GenomeAnnotation ::::::::::::::::::::::::::::::::::::::::::::: 183 HomologyModeling :::::::::::::::::::::::::::::::::::::::::::: 184 StructuresfromLimitedConstraintSets::::::::::::::::::::::::::::::: 184 CONCLUSION:::::::::::::::::::::::::::::::::::::::::::::::::: 185 1056-8700/01/0610-0173$14.00 173 P1:GDL March24,2001 12:13 AnnualReviews AR128-08 174 BONNEAU ¥ BAKER INTRODUCTION Thegoalofabinitiostructurepredictionissimple:Givenaprotein’saminoacid sequence predict the structure of its native state. It is generally assumed that a proteinsequencefoldstoanativeconformationorensembleofconformationsthat isatorneartheglobalfree-energyminimum.Thus,theproblemoffindingnative- likeconformationsforagivensequencecanbedecomposedintotwosubproblems: (a) developing an accurate potential and (b) developing an efficient protocol for g or searchingtheresultantenergylandscape. s. w Todate,themostsuccessfulmethodsforstructurepredictionhavebeenhomo- e evi logy-based comparative modeling and fold recognition (58). When homologous ualrnly. orweaklyhomologoussequencesofknownstructurearenotavailable,themost no s.anuse successful structure prediction methods have been those that predict secondary urnalonal stitmruect(u1r2e,a4n0d,4lo3c,4al6,s7tr4u)c.tuTrheismroetvifise;wthfeosceusmeesthoondcsuhrarevnetbmeeenthaovdasilfaobrlepfroedriscotimnge os arjper tertiarystructureintheabsenceofhomologytoaknownstructureanddiscusses 9. Downloaded from GE on 03/24/05. For ttophhfreeokMsapencraohlonowetycsela,eionladfpngdtrdehae-ettdbahimaecsbteeciatodonhnokosrdmcd(osPierntDtioahndtBogead)yfl.siubTtonhrnhcaalitrtysiiopeiinsrnnesofd,tofhitrchemmtecepaottrtrhnioaoottienednxisincnttagoshntfarstutebeucterststuiefoarofferurymaunsgsdatemrciuihnencinfnttutohesrreemolerppaaatrtreeirnoamdinmnipcgfeltraitoaoetpemnrss-. 8E fromthePDB.Inordertotesttheperformanceofanyoneoftheseapproaches,one 3-1LL must carefully remove sequences that are homologous to proteins in the test set 7O 30:1L C fromalldatabasesusedbythemethodinquestion.Anyerrorsoroversightsmade 1.A atthisstagecouldleadtooverestimatesofsuccess.Forthesereasons,theCritical 0C 20DI Assessment of Structure Prediction (CASP), a biannual, community-wide blind uct. ME testofpredictionmethods,wasconceivedandimplemented(4,6,27).Threesuch omol. StrL UNIV tTehstrsouhgavheouotccthuirsrerdevainedwt,hoeufroausrstehswsmaseunntdoefrtthaekepnertfhoisrmsuamncmeeorf(iv.aer.,io2u0s00m)e(t2h0o,d3s9i)s. BiEL influencedbytheresultsofthesecommunity-widetests,especiallyCASP3(58). ys. RN CASP1andCASP2showedthatinspiteofsomesuccessreportedintheliterature, ophCO littlesuccesswasseenforproteinsofthesizeandtypebeingsolvedbystructural Biy v. b biologists (50). The consensus after these first two experiments was that the ab Re initiostructurepredictionproblemwouldmostprobablybesolvedtoolatetobe nu. appliedtoanyrealbiologicalproblems(20,50).CASP3showedsomereversalof n A thisconsensus.Severalmethodsmadegoodpredictionsintheabinitiocategory, and some ab initio methods outperformed fold recognition methods for certain proteinsinthefoldrecognitioncategory(61,63,64). Inspiteofrecentprogress,manyissuesmuststillberesolvedifaconsistently reliableabinitiopredictionschemeistobedeveloped.Noonemethodperforms consistently across all classes of proteins (most methods perform worse on all- betaproteins),andallmethodsexaminedseemtofailtotallyonsequenceslonger than 150 residues in length (the longest contiguous blind predictions to date are »100residuesinlength).Inaddition,thesuccessfulpredictionmethods,asjudged P1:GDL March24,2001 12:13 AnnualReviews AR128-08 ABINITIOPREDICTION 175 byperformanceatCASP3andbyperformancereportedintheliterature,showa large diversity in their formulation. Our goal here is to review the common fea- turesoftheserecentmethodsinordertohighlightcurrentchallengesandfuture possibilitiesforthefieldofabinitiostructureprediction. Themostnaturalstartingpointforsimulatingproteinfoldingisstandardmolec- ular dynamics (MD) simulation (numerically integrating Newton’s equations of motionforthepolypeptidechain)usingaphysicallyreasonablepotentialfunction. Thisapproachhasalonghistoryandisstillpopular,asillustratedperhapsmost g or dramatically by IBM’s Blue Gene project, which will apparently be devoted, at s. w leastinpart,tosuchanendeavor.Thereareseveralratherobviousproblemsthat e vi havelimitedthesuccessofsuchapproachesthusfar.First,MDiscomputationally e ualrnly. expensive—withexplicitrepresentationofsufficientwatermoleculestominimally no urnals.anonal use staaonkldveasint»ecrt4eh0aes0efhsoolidnuirnasvgoacinlhaaabilcneu,craroenmnatpnsuoitnseegrclpeoonpwdreoMrcehDsasvsoeirm.cAounldasvtiidaonenrcaeobsflyiane1sx0itm0en-urdleaestdiidosuniemsputrrlaoattteiegoinyn os arjper times; for example, Kollman and coworkers carried out a microsecond simula- 89. Downloaded from EGE on 03/24/05. For rmamtHiesooqoosnwourleecioecrifvceauoedtaleermf,d3sosp6wiriu-nmaritetiuwhsnsligiaanMdttgeutiiDnlreme.gpaAfetore.hlelpteTdhttiifhhondoeeugeldgisutnihernsacaigdionnmensogqipdftuaio,aoarac1nctnoa0irdenn0ests-qpiripduenererisorhcriegaduasbruprermleseesnspomatrrhmooepatrootesehtuieanbsnnneetftreoisoinaorifxlutmshfsouue,arpnddctecyelertapr,icssoiotscnhomaoseflrfpfm»eoup1artiregsomsrnebsaticltiteciomulmrlndoedaes-. 3-1LL lackofconsensusastothebestcomputationallytractableyetphysicallyrealistic 7O 30:1L C model for water (a number of quite different models for water, both polarizable 1.A andnonpolarizable,havebeenusedincurrentsimulationmethods),andsomeun- 0C 20DI certainty exists as to the values of the parameters used in molecular mechanics uct. ME potentials(partialatomiccharges,Lennard-Joneswelldepthsandradii).Accurate omol. StrL UNIV rgerpereeosefnptoatliaornizoabfielilteyctorofswtaattiecrs,itshaelslaorgaecodnifsfiedreernacbeleinchthaellednigeleecgtirviecnptrhoepherigtihesdoe-f BiEL thesolventandtheprotein,andtheuncertaintiesinthemagnitudeandlocationof s. N atomicpartialcharges.Becausethefreeenergyofaproteinrepresentsadelicate yR ophCO balanceoflargeandopposingcontributions,theseproblemssignificantlyreduce v. Biby the likelihood that the native state will be found at the global free-energy mini- Re mumusingcurrentpotentials(2).ThebestcurrentuseofMDmethodsmaybein u. refininganddiscriminatingamongmodelsproducedbylower-resolutionmethods n n A (28a,49). REDUCEDCOMPLEXITYMODELS To overcome the sampling problems mentioned above, most methods for fold predictiontodatehaveinvolvedsomesignificantcomplexityreduction.Methods for reducing protein structure to discrete low-complexity models can be divided intotwomajorclasses:latticeandoff-latticemodels. P1:GDL March24,2001 12:13 AnnualReviews AR128-08 176 BONNEAU ¥ BAKER LatticeModels Latticemodelshavealonghistoryinthemodelingofpolymersduetotheiranalyt- icalandcomputationalsimplicity(22).Theevaluationofenergiesonalatticecan beachievedquiteefficiently(integermathcanbemadequitefast),andmethods involvingexhaustivesearchesoftheavailableconformationalspacebecomefea- sible (30,35,85). However, lattice methods have a somewhat restricted ability to represent subtle geometric considerations (strand twist, secondary structure g propensities, and packings) and can reproduce the backbone with accuracies no or s. greater than approximately half the lattice spacing (73). The most common sys- w vie tematicerrorobservedforavarietyoflatticemodelsistheirinabilitytoreproduce e ualrnly. helices, and most lattice models exhibit various degrees of secondary structure no bias(70).Giventheimportanceofregularsecondarystructureinproteins,thisis s.anuse clearlyaproblem.Recentsuccessfultests,however,haveshownthatthecompu- urnalonal tationaladvantagesoflatticemodelsmayoutweightheproblemsassociatedwith os arjper theirsystematicbiases(30,47,48,73). 9. Downloaded from GE on 03/24/05. For DiscrMaseinntodegsltaSelotlrfaofbt-toelaanmtOdteicflree,f-norLgertadhftuustc.ritecThdeehcreMotmmoooptdhsleeetxlcCsiotyflm,mmoorodtnoelpsornafiecxtioacrlelmsisiodrteoecclhiemaninittrodtihedegsrsepiedlsuesocfbhafarcieknebdtooonmea 3-18LLE atoms(82,87).Discretestatemodelsoftheproteinbackboneusuallyfixallside 7O chaindegreesoffreedomandlimitthebackbonetospecificPhi/Psipairs:Models 30:1L C containingfrom4to32Phi/Psistatesrepresentingvariousstrand,helix,andloop 1.A 0C conformationshavebeendescribedintheliterature(70).Properlyoptimizedsix- 20DI Struct. V ME satsatsetrmanoddse,lhse(lii.cee.,sm,aonddelcsatnhoantiaccacloluonotpfso)rclaoncarlefperaotduurecseonbasteivrevecdoinntapcrtost,epinressseurcvhe ol. UNI secondary structure, and fit the overall coordinates of the native state as well as omL 18-state lattice models that do not account for such protein-specific information BiEL (70). s. N yR phO oC Biy NarrowingtheSearchwithLocalStructurePrediction v. b e R Local structures excised from proteins can fold independent of the full protein, nu. demonstratingthatstronglocalstructuralbiasescanexistforshortsequenceseg- n A ments(8,15,54,60).Severalexamplesofexcisedfragmentshavinglittleobserv- able structure also exist, indicating that the strength and multiplicity of these localbiasesarehighlysequencedependent.Somesequencesareobservedtofoldto differentconformationsdependingontheirglobalsequencecontext,againdemon- strating the possible multiplicity of local structure biases (17,42). Bystroff et al developed a method that recognizes sequence motifs (ISITES) with strong ten- dencies to adopt a single local conformation that was used to make some good local structure predictions in CASP2 (12–14, 29). Despite the ambiguities in P1:GDL March24,2001 12:13 AnnualReviews AR128-08 ABINITIOPREDICTION 177 local sequence-structure relationships, secondary structure prediction methods havebeensteadilyimproving(63,64). Theabovementionedexperimentsandobservationssuggestthatanymethodat- temptingtouselocalsequence-structurebiasestoguidecomplexityreductionswill havetobeadaptivetothestrengthandtothemultiplicityofdifferentsequence- structure patterns. The majority of methods proving successful at CASP3 used secondarystructurepredictionsinonewayoranother.Inonecase,predictedsec- ondarystructureelementswerefittotheresultsofinitiallattice-basedexhaustive g or enumeration,thuserasinganypossiblesecondarystructurebiasintheinitiallat- s. w ticemodelpriortoall-atomrefinement(1).TheRosettamethodusedsecondary e vi structuretobiastheselectionoffragmentsofknownstructurefromthePDB.Yet e ualrnly. another paradigm is, given a secondary structure string, to reduce the problem no s.anuse ofpredictingthetertiaryfoldtotheproblemofhowtoassemblerigidsecondary urnalonal sintrduecptuenredeenletmoefnstesc(o2n4d).arAydsdtirtuiocntuarlempertehdoidctsiothnataldgeotreirtmhminse(lboycaclasltcruulcattuirnegbtihaeseses os arjper biasesduringthefoldingsimulation)havealsobeendescribed(86). 89. Downloaded from EGE on 03/24/05. For pT7cpor6rheen%eTddsiihb–iccsee7tttrsiie8ooetn%nnstil,e,symcwaloseinhnkutdihdeccolachayrednytssytasoskfotauerwbbluseicpnsitnuarguenicrtdhetioouicpipnttrmpihetoeeedenrritiasrhclctoif(itmda4oioi0nilnmtu,as7roule4ginsn)ott.totroahiAtceaahcccmmoacccuosoicnluuheutnnsartttfavo,ocefnwryoterhitolhrflnfieosobersne-eesalcrtotbrhaocotenaierndlpairartiacriontyocedtueussrrtcttaaorrtccuuiitmocceinottsauunkoorrseeeff. 3-1LL modelswithsecondarystructurepredictedmoreaccuratelythanispossiblewith 7O 30:1L C traditionalsecondarystructuremethods. 1.A 0C 20DI SCORINGFUNCTIONSFORREDUCED uct. ME COMPLEXITYMODELS StrV ol. UNI omL Onceamodelforrepresentingtheproteinischosenthatsufficientlyreducesthe BiEL complexityoftheconformationalsearch,ascoringorenergyfunctionthatworks ys. RN inthechosenlow-complexityspacemustbedeveloped.Theenergyfunctionmust ophCO adequatelyrepresenttheforcesresponsibleforproteinstructure:solvation,strand Biy v. b hydrogenbonding,etc.Giventhatmostlow-complexitymodelsdonotexplicitly e R represent all atoms and can reproduce even the native state backbone with only nu. limited accuracy, any energy function designed to work in the low-complexity n A regime must represent these forces in a manner robust to such systematic error (thesystematiclimitationsofthemodel).Last,thesefunctionsmustbecomputa- tionallyefficient,forduringtheinitialstagesofanyconformationalsearch,huge numbers of energy evaluations are necessary. Because of the shortcomings of molecularmechanics-basedpotentials,andtheconsiderationsabove,manymeth- odsdevelopedinthelasttenyearsutilizescoringfunctionsderivedfromtheprotein databasethatinessencefavorarrangementsofresiduesfrequentlyfoundinknown proteinstructuresanddisfavorrarelyseenarrangements. P1:GDL March24,2001 12:13 AnnualReviews AR128-08 178 BONNEAU ¥ BAKER Solvation-BasedScores Ithasbeenlongthoughtthatthehydrophobiceffectistheprincipaldrivingforce behindproteinfolding(3).Manydiversemethodsforjudgingthefitnessofcon- formationsbasedonsolvationorhydrophobicpackingexist,andthedebateover the proper functional form for representing solvation effects represents an open question of considerable importance and interest (69). A common approach is to classify sites in proteins according to their degree of solvent exposure (either g throughthesurfaceareaorthenumberofnearbyresidues)(11)andtodetermine or s. thefrequenciesofoccurrenceoftheaminoacidsineachtypeofsite.Theenergyor w vie scoreofanaminoacidatasiteisthentakentobethelogarithmoftheaminoacid’s e ualrnly. frequencyofoccurrenceatthattypeofsite.Thistypeofresidue-environmentterm no favorsplacementofhydrophobicaminoacidsatburiedpositionsandofhydrophilic s.anuse aminoacidsatexposedpositions. urnalonal Anadditionalcommonlyusedclassoffunctionalformsconsistsofglobalmea- os arjper suresofhydrophobicarrangement.Onesimpleglobalquantityisaresidue’sdis- 9. Downloaded from GE on 03/24/05. For tquturaeeussnrceiamnocdneggsttni,fhtariiiinozensemscteylnavutpnahodetaeliiunloveotgfeingoftsoaunitrunraesusrcucycttriotofouananrltcfgeh,oesocerrmoa(ih1ritaneyh,taedm3i–dor4obn)h(pa.’1ysshO1deco)rdne.boeniHptctepheurrrromaoaobndb,figicltmueocemsaotfsonoawsftl,lrdigwatuhyssshtrme,tiahcditanheilotlcchanoaabi.snmlopBvtbbhyoeeiapwn-ugeahisleteoeoilobdf&incatfalouwElfncuiipcastnhretlioccnoouttbinetloheiannrtetogessr 8E 3-1LL is that they assume that proteins are ideally spherical in shape when in actual- 30:17L CO iptryoancahtivuesepsraonteeilnlispseoxihdiablitapaprmouxcimhaltairogneorfrtahnegsehaopfesohfapthees.hAydrmopohreobfliecxciobrleetahpa-t 1.A 200DIC doesnotrequireasignificantincreaseincomputationandaidsintheselectionof uct. ME near-nativeconformationsfromdecoysetscontainingahighnumberofprotein- StrV likeyetincorrectcompactconformations(9).Theproblemassociatedwiththese ol. UNI functionsisthattheywillinevitablyexcludeasmallpercentageofproteinstruc- omL turesthatdeviatefromtheirassumptionsconcerningshapeandthusfailwhena s. BiNEL protein is divided into small subdomains or contains large invaginations (1HQI yR phO is a toroid). In spite of this potential downfall, they have demonstrated their oC Biy usefulnessinseveralmethodsowingtotheirabilitytorecognizethemajorityof v. b smallhydrophobiccores,theirsimplicity,andthespeedwithwhichtheycanbe e R u. computed. n n A PairInteractions Manylow-resolutionpotential/scoringfunctionsutilizeanempiricallyderivedpair potentialinplaceoforinadditiontotheresidue-environmenttermdescribedabove. Themostcommonofthesepotentialsarefunctionsofthepositionofasinglecenter perresidue(C ,C ,orcentroid/unitedatomcenter)andarethusquitecomputa- fi fl tionallyefficient;all-atomfunctionshavealsobeenused(77).Manyvariationsof pair terms have been developed, with the two main branches of methods being P1:GDL March24,2001 12:13 AnnualReviews AR128-08 ABINITIOPREDICTION 179 distancedependentandcontactbased(57,84).Liketheresidue-environmentterm mentionedabove,thesescoringfunctionsaresometimesjustifiedbypositingthat thearrangementsofresiduesinproteinsfollowaBoltzmandistribution:E(x) D kTlnP(x),wherexisafeaturesuchastheoccurrenceoftworesiduesseparated byadistancelessthanr.Alternatively,thesescoringfunctionsmaybeseenpurely asprobabilitydistributionfunctions(23,83).Intheformercase,theoptimization may be viewed as a search for the lowest energy configurations; in the latter, a search for the highest probability configurations. For most applications, there is g or little practical difference between the two viewpoints. The issue becomes more s. w substantive,however,whensuchdatabase-derivedscoringfunctionsarecombined e vi withphysics-basedpotentials,aswilllikelybecomeincreasinglyusefuloverthe e ualrnly. nextseveralyears. no s.anuse Severalproblemsareassociatedwithstatisticallyderivedpairpotentials.The urnalonal ainstseurmacptitoionnstihsantoftregeeneenrearlglyievsacliadnabcerorsesparellseinntteerdacbtyionsusmprmesinegntoivneprrcooteminpso,naenndt os arjper thus,thebasicfunctionalformmaynotbeadequatetorepresentthefreeenergyof 89. Downloaded from EGE on 03/24/05. For aia(mWs8npe9itorthn)mho.attsttTaehloitheonhfiueesctyslhocimeenaafnrftifeownebracdomettoisraocmetsonisuiroinorcdneafhuctt(eeht2aedses1d,,boaw5bythl3hyohe)incc.rywgohTd-nrhilrsodaaeepirntmgohigeovoeolnebysrirtinewcespg/lihpiugmteolnhlsliimeainfiorpaicnnptaaegaibnsrretitndttpihwtfliriseoouetsrbneeeinnlibenucumhgnety,idoowdwefnrsihhositrhiypoecdhpdhnroaotegbihprfiifvechpeeooecrsbenttessrivcniii(sdrtp8eiouaa2netlr)oss--. 3-1LL titioning,specificinteractionssuchaselectrostaticattractionbetweenoppositely 7O 30:1L C charged residues dominate the pair scoring/energy functions, and hydrophobic 1.A interactions make relatively modest contributions. The pair term, in this case, is 0C 20DI perhapsbestviewedasthesecondterminaseriesexpansionfortheresidue-residue uct. ME distributionsinthedatabaseinwhichtheresidue-environmentdistributionsarethe omol. StrL UNIV firsStotemrme.oftheearliestcomprehensivetestsofthediscriminatorypowerofthese BiEL pairpotentialsweredoneinthecontextofthreadingself-recognition(62).Later s. N workdemonstratedthattheself-recognitionproblemwasnotasufficientlychal- yR ophCO lengingtestofscoringfunctionsandfocusedontheperformanceofmultiplepair- v. Biby wiseenergyfunctionsonlarger,morediversesetsofconformations(24,38,41,68, Re 69). The performance of the various energy functions at recognizing native-like u. structuresinlargeensemblesofincorrectdecoysishighlydependentonthemeth- n n A odsusedtocreatethedecoysets,highlightingthefactthatanenergyfunctionthat works well in the context of one method will not necessarily work well given a decoysetcreatedusinganorthogonalmethod. SequenceIndependentTerms/SecondaryStructure Arrangement Manyfeaturesofproteins,suchastheassociationofbetastrandsintosheets,canbe described by sequence independent scoring functions. Several early approaches P1:GDL March24,2001 12:13 AnnualReviews AR128-08 180 BONNEAU ¥ BAKER to folding all-beta proteins were protocols dominated by initial low-resolution combinatorialsearchesofpossiblestrandarrangementsandwereconcernedonly withtheprobabilityofdifferentstrandarrangements(16,18,19,72).Theseearly methodsnarrowedtheconformationalspacebyconsideringsequencespecificef- fects only in the context of highly probable strand arrangements. Several of the relatively successful methods at CASP3 incorporated secondary structure pack- ing terms (51,81). Ortiz et al used an explicit hydrogen bond term in combi- nation with a flfifl and a flflfi chirality term to ensure protein-like secondary g or structureformation.Itcannotbeexpectedthatlow-complexitylatticemodelspro- s. w ducethecorrectchiralityorsubtlehigher-ordereffectssuchasstrandtwist,and e vi these rules sensibly correct for these expected shortcomings (65). The Rosetta e ualrnly. method used three terms that monitored strand-strand pairing, sheet forma- no s.anuse tion, and helix-strand interactions to ensure protein-like secondary structure urnalonal arrangements. os 89. Downloaded from arjEGE on 03/24/05. For per TinhTeeUrTrpceotrshpeiveeardaeriorlscaiyfeatringnPoMtecnsreuneampudlatmepiitctotpbhettleoreienrodnsstoSni.ifaenlCqhlmyooumruerrlienotclilhacpotelgeseodoAsuuelmsrqicgusueenetaqnomtucfioeeeinnnancflaetoignssrnmaomlyfatteseiinonstnsa[cv(uMoasinelStfaauAbcllstet)op]frowaerbdaasiciupntisriotoeindtoeaibsnsatrspfueaacdmrttuioorlnyef 3-1LL onerelativelysuccessfulmethodatCASP3(67).Theaccuracyofthesecovariance 7O 30:1L C methodsisnotgreat(63,64);variousmethods,includingthefittingofsecondary 1.A structuretoinitialconstraintsinordertoobtainadditionalconstraints,wereused 0C 20DI to increase the robustness of ab initio protocols based on such information to uct. ME expectederrorlevels(66).PreviousmethodshaveusedMSAstopredicttheburial omol. StrL UNIV ooffsveaqriuaetniocne(p5o,s7it)i.onsinthequerysequencebasedonhydrophobicityandpatterns BiEL AnotherwaytouseMSAsisintheselectionoflow-resolutiondecoysbyrequir- s. N ingthatalldecoysbeconsistentwithallsequencesinthefamily.Inoneprocedure, yR ophCO the aligned sequences are mapped onto decoys one at a time and the energy of v. Biby each decoy averaged over all aligned sequences is computed (2). In another ap- Re proach, simulations of multiple aligned homologs are coupled and carried out u. simultaneously(45).Inthesetwomethods,themodelswiththelowestscoreare n n A selected as the best models. In a recent method for applying MSA information to fold prediction, an independent simulation is carried out for each aligned se- quence.Onlyaftermultiplesimulationsforeachalignedsequencearecompleted is the multiple sequence alignment information utilized by simultaneously clus- tering all of the structures generated from the different aligned sequences. The largest and most diverse cluster often contains the best model (9). In all above describedcases,significantimprovementswereseenintheabilityofthepredic- tion methods to select good models using highly simplified representations of proteins. P1:GDL March24,2001 12:13 AnnualReviews AR128-08 ABINITIOPREDICTION 181 High-ResolutionStructurePrediction—Prospects forRefinement Reducedcomplexityapproaches,asdescribedabove,cannotbeexpectedtoconsis- tentlygeneratepredictionswithresolutionsofbetterthan3–7 A˚.Low-resolution methodsareperhapsbestviewedasnarrowingthepossibleconformationsfroman exponentiallylargenumbertoanumbersmallenoughthatmorecomputationally expensivemethodscanbeapplied.High-resolutionpotentialsmustbeimproved org in order for significant progress to be made in the field of ab initio protein fold ws. prediction. e vi Someprogresshasbeenmadeinmodelingtwoofthemoredifficulttermsin e ualrnly. the potential—solvation and electrostatics—in the context of full atom models. urnals.annonal use o Bwofeatceteanrudsaeerestchcreiobrterredalnaustsfeeidnr-gwfrsietuehrfesanocelevrgeainretes-aao–cfbcaesssmesiadblllmemesotuhlreofcdaucsl.eeWsarfeersaos,mosnonl&ovnatpEiooislnaernebnsoeerrlgvgeiefnostusanrtdoe os arjper thattheadditionofasolvent-accessiblesurfacearea–basedterm,theparameters 9. Downloaded from GE on 03/24/05. For fdaatbhonuyserdrniyimwaeedmdhinoliiatjccrunrohssospttwtuibrrcraeeefjralsteoeeoccwrelbtmvoaaetrsshrieteeeh(das3es–6uo(bc9)rnoa.f0asnwO)ecs.adeindPteaeeernrmorp-admvrpboaciilbprsheiiloacnedrramgigflpefrasesweorrstbeliiuvtunthilarcotitsesenioudbwenredxfetateprweceeereepmraeiawmlnsrwioeetthianhtoe–htisnbbf,mtratasehsotieeeanldebeepncidrmlueoizbrletgaeeytrydhincomomodcfemosoccrlhbhieesaia.cnrnuTtgihilnhceaagessrt 8E 3-1LL generalizedBornmodelremediesthisproblembytreatingthenonpolarinteractions 7O 1.30:1AL C wtoitdheaalsuwriftahcethaeredae–sboalsveadtiotenrmofwchhialerguessinagndagchenaregrealipzaaitrisoninofthtehepBrootrenineqinutaetriioonr 0C 20DI (37,52,88).Thismethodhasproventobeusefulformodelingloopsasdescribed uct. ME aboveandisfastenoughtobeincludedinalmostanydiscriminationorrefinement StrV procedure (71). An alternative implicit solvation model that is able to partially ol. UNI discriminatebetweennative-likeandincorrectdecoyswasdevelopedbyLazaridis BiomELL &Karplus(48a). s. N Fromtheabovediscussionitisapparentthatmostcurrentenergy/scoringfunc- yR phO tionsarebasedeitheronsmall-moleculeparameterizedfunctionswithformssug- oC Biy gested by physical chemistry or on protein database statistics. We believe that ev. b progresswillcomefromacombinationofthesetwoapproaches. R u. n n A THEROSETTAMETHOD Howcanabinitiostructurepredictionmethodsproducereasonablemodelsgiven theinadequaciesofcurrentpotentialfunctions?Onepossiblesolutionistolimit the conformational space searched to that compatible with the local sequence- structurepropensitiesoftheproteinsequence.Suchaprocedureisconsistentwith a view of the folding process in which local structure propensities influence the conformationssampledbyshortsequencesegments,andfoldinginvolvesasearch P1:GDL March24,2001 12:13 AnnualReviews AR128-08 182 BONNEAU ¥ BAKER through these local conformations for a stable tertiary structure simultaneously consistent with these local biases and with nonlocal constraints (compactness, electrostatics,hydrophobicburial,etc.).Localbiasesareinfluencedbyrelatively subtleinteractions(side-chain/main-chainhydrogenbonding,sidechainconfigu- rational entropy losses, etc.) as witnessed by the debate over the origins of sec- ondary structure propensities; current physical chemistry–based models do not capture all of the subtleties. To circumvent this problem, the Rosetta procedure makestheassumptionthatthedistributionofconfigurationssampledbyapeptide g or segment are reasonably well-approximated by the distribution of configurations s. w observedintheproteindatabase.TertiarystructuresaregeneratedusingaMonte e vi Carlo search of the possible combinations of likely local structures, minimizing e ualrnly. a scoring function that accounts for nonlocal interactions such as compactness, no s.anuse hydrophobic burial, specific pair interactions (disulfides and electrostatics), and urnalonal smtraannydspuabitrliyndgif(fseereenFtivgeurrseio1n)s. oWfiethsselinsttisalolfy2si5mfirlaagrmloecnatlsstpreurctsuerqeusemncaeybpeosrietpioren-, os arjper sented;thus,thenumberofstatesperpositioniseffectivelymuchlowerthan25. 9. Downloaded from GE on 03/24/05. For Ofsrtarpagtnimmdesin,zatatnsioedtnsootphfreonrdopunrclooetcseaislntir-nulitcketeruarfceetsaiotwunrsiethwsibtthuharitineadcroehn,yfbodyrrmocpoahntiosobtnriuacclrtseiposanidc,ueceodsne,fispinsaetierdendbtywbteihtthea 8E 3-1LL 7O 30:1L C 1.A 0C 20DI uct. ME StrV ol. UNI omL BiEL s. N yR phO oC Biy v. b e R u. n n A Figure1 VenndiagramillustratingtheconceptualbasisforRosetta.Near-nativestructuresare labeledN.
Description: