Ancestral Sequence Reconstruction

EDITED BY
David A. Liberles
University of Wyoming, Laramie, WY, USA

Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide in

Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam

Oxford is a registered trademark of Oxford University Press in the UK and in certain other countries

Published in the United States by Oxford University Press Inc., New York

© Oxford University Press 2007

The moral rights of the author have been asserted
Database right Oxford University Press (maker)

First published 2007

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer

British Library Cataloguing in Publication Data
Data available

Library of Congress Cataloging in Publication Data
Data available

Typeset by Newgen Imaging Systems (P) Ltd.
Printed in Great Britain on acid-free paper by Antony Rowe, Chippenham, Wiltshire

ISBN 9780199299188

10 9 8 7 6 5 4 3 2 1

Contents

Foreword and introduction
Introduction to the meeting in Kristineberg, Sweden
Contributors

I Introductory scientific overview

1 The early days of paleogenomics: connecting molecules to the planet
Steven A. Benner
2 Ancestral sequence reconstruction as a tool to understand natural history and guide synthetic biology: realizing and extending the vision of Zuckerkandl and Pauling
Eric A. Gaucher
3 Linking sequence to function in drug design with ancestral sequence reconstruction
Janos T. Kodra, Marie Skovgaard, Dennis Madsen, and David A. Liberles

II Computational methodology and concerns

4 Probabilistic models and their impact on the accuracy of reconstructed ancestral protein sequences
Tal Pupko, Adi Doron-Faigenboim, David A. Liberles, and Gina M. Cannarozzi
5 Probabilistic ancestral sequences based on the Markovian model of evolution: algorithms and applications
Gina M. Cannarozzi, Adrian Schneider, and Gaston H. Gonnet
6 Estimating the history of mutations on a phylogeny
Jonathan P. Bollback, Paul P. Gardner, and Rasmus Nielsen
7 Coarse projections of the protein-mutational fitness landscape
F. Nicholas Braun
8 Dealing with uncertainty in ancestral sequence reconstruction: sampling from the posterior distribution
David D. Pollock and Belinda S. W. Chang
9 Evolutionary properties of sequences and ancestral state reconstruction
Lesley J. Collins and Peter J. Lockhart
10 Reconstructing the ancestral eukaryote: lessons from the past
Mary J. O’Connell and James O. McInerney

III Computational applications of ancestral sequence reconstruction

11 Using ancestral sequence inference to determine the trend of functional divergence after gene duplication
Xun Gu, Ying Zheng, Yong Huang, and Dongping Xu
12 Reconstruction of ancestral proteomes
Toni Gabaldón and Martijn A. Huynen
13 Computational reconstruction of ancestral genomic regions from evolutionarily conserved gene clusters
Etienne G. J. Danchin, Eric A. Gaucher, and Pierre Pontarotti

IV Experimental methodology and concerns

14 Experimental resurrection of ancient biomolecules: gene synthesis, heterologous protein expression, and functional assays
Eric A. Gaucher
15 Dealing with model uncertainty in reconstructing ancestral proteins in the laboratory: examples from archosaur visual pigments and coral fluorescent proteins
Belinda S. W. Chang, Mikhail V. Matz, Steven F. Field, Johannes Müller, and Ilke van Hazel

V Experimental synthesis of ancestral proteins to test biological hypotheses

16 Using ancestral gene resurrection to unravel the evolution of protein function
Joseph W. Thornton and Jamie T. Bridgham
17 A thermophilic last universal ancestor inferred from its estimated amino acid composition
Dawn J. Brooks and Eric A. Gaucher
18 The resurrection of ribonucleases from mammals: from ecology to medicine
Slim O. Sassi and Steven A. Benner
19 Evolution of specificity and diversity
Denis C. Shields, Catriona R. Johnston, Iain M. Wallace, and Richard J. Edwards

Conclusions and a way forward
David A. Liberles
Index

Foreword and introduction

With the realization that the combination of computational reconstruction of ancestral protein sequences and the experimental synthesis of these proteins could be used to test specific molecular, biomedical, ecological, and evolutionary hypotheses, this methodological combination has been used with increasing popularity. Because a number of scientific issues surrounding the use of ancestral sequence reconstruction need to be fleshed out, a scientific meeting was organized to discuss them. Beyond procedures and pitfalls, a number of new applications of ancestral sequence reconstruction have begun to emerge, and a presentation of several of these was deemed valuable.

With funding from the European Science Foundation (ESF), Vetenskapsrådet (the Swedish Research Council), and the Linnaeus Centre for Bioinformatics (Uppsala University, Sweden), David Ardell (Uppsala University, Sweden), Giorgio Matassi (University of Paris VI, France), and I organized a meeting entitled “Using Ancestral Sequence Reconstruction to Understand Protein Function” in Kristineberg, Sweden, on 30–31 March 2005. The meeting consisted of 38 participants from 12 different countries attending 18 scientific presentations. Following the meeting and the vibrant discussion, it was decided that a book involving chapters by those attending the meeting and others in the field would be worthwhile, which was the origin of this project.

One philosophical discussion that emerged was on the true meaning of homology and what a homologous site is when a sequence slides through a structure, generating diverging alignments from sequence-based and structure-based methods. David Ardell (in his lecture in Sweden) presented examples of cases where the two diverged and recommended sequence analysis using a DNA-based view of homology, where a single position within a gene represented the homologous site, where substitution models should then be applied to characterize its evolution. This is the traditional view of homology as embodied in the vast literatures of molecular evolution and population genetics. However, Richard Goldstein (who has in the past generated substitution matrices that characterize substitution differentially between different structural elements) and David Pollock took a structural perspective on homology, arguing that a homologous position might sometimes be better defined by the structural attributes constraining it in a three-dimensional structure rather than the position within a gene sequence. For example, position 3 in an α-helix could be aligned with position 3 in the homologous α-helix of another protein, even if they represent positions 65 and 68 in the gene sequence and no insertion or deletion events have occurred. This latter view requires the use of different types of substitution models than the former view, so the divergence of opinion has practical as well as philosophical concerns.

Another active area of discussion involved sources of bias and an ongoing discussion of the validity of using maximum-likelihood or maximum-parsimony ancestral sequence reconstructions compared with a sampling from the posterior distribution of a Bayesian ancestral sequence reconstruction. The discussion at the meeting (in addition to Chapter 8 in this volume) has spawned an active discussion in the peer-reviewed scientific literature. The argument is that rare variants, such as hydrophobic residues on the surface, are under-represented in the maximum-likelihood or maximum-parsimony ancestral sequence, which ultimately attributes overly stable or overly active properties to the reconstructed ancestor. It is argued that this is avoided by sampling from the posterior distribution, even if sampling from the posterior results in less accurate reconstruction at the sequence level. The experimental implications of this proposal are presented in both Chapters 8 and 15. A brief rebuttal to this view and defense of maximum likelihood is presented by Eric Gaucher in Chapter 2. Further analysis and discussions of this topic are sure to appear in the literature over the coming years.

A third topic raised for discussion at the meeting by Giorgio Matassi was, “Are all proteins reconstructable?” Clearly some proteins, like the green fluorescent protein-like proteins worked on by Mikhail Matz and colleagues (and presented in Chapter 15), are more amenable to experimental study than other proteins. However, functional assays in vitro or in vivo are indeed available for a great many proteins. Chapter 17 presents a reconstruction back to the last universal ancestor, and other chapters deal with various complexities in sequence evolution that will enable more accurate reconstruction.

The first two chapters provide a historical and scientific overview of ancestral sequence reconstruction, and Chapter 3 extends the use of the technique to applications of drug design and mentions the companion technique of substitutional mapping. A discussion of standard approaches for ancestral sequence reconstruction is presented in Chapters 4 and 5, with Chapter 6 presenting a method (with a companion software package) for substitutional mapping.

Chapters 7 and 8 present some of the limitations and considerations that should go into computationally reconstructing ancestors, including methodological sources of bias and biophysical implications. Chapter 9 presents a discussion of covarion or heterotachous processes, where sites shift rates due to intra- or intermolecular coevolution, and their effects on ancestral sequence reconstruction. Chapter 10 analyzes some controversies in our knowledge of the reference species tree and how different topologies can affect reconstructed ancestral sequences. The covarion processes discussed in Chapter 9, while sometimes neutral, are also sometimes linked to functional shifts. Chapter 11 discusses methodology for linking this process to functional shifts after gene duplication using ancestral sequences. Chapters 12 and 13 present computational strategies and applications of using ancestral sequence data to reconstruct entire proteomes. In work not presented in this book, David Haussler and colleagues have extended this type of approach to reconstructing the entire genome of the last common ancestor of mammals. The thoughtful introduction by Emile Zuckerkandl proposes further extension of the analysis from entire proteomes to interactomes, and the field, although not there yet, will surely move in this direction.

Moving to experimental work to test computational hypotheses, Chapters 14 and 15 present strategies for converting computationally reconstructed ancestral sequences to proteins resurrected in the laboratory. Chapter 15 includes an expanded discussion of how to accommodate the controversial computational strategy suggested in Chapter 8. Chapters 16–19 then address various biological questions using ancestral sequence reconstruction and resurrection, across different evolutionary depths and drawing on widely different scientific disciplines.

Rather than presenting a view that is consistent from chapter to chapter, several contradictory views are presented differently by different authors to give readers a chance to appreciate ongoing debates in the field and formulate their own opinions. In the concluding section, I provide a list of several software packages that are available to perform different analyses described in the book. I also attempt to tie together some of the discussion to present the experimental molecular biologist with a potential way forward in attempting these methods in their own laboratory.

The image of crocodilians on the book cover was generated with the enthusiastic help of John Brueggen at the St. Augustine Alligator Farm Zoological Park (http://www.alligatorfarm.us). The picture shows all 23 extant species of crocodilians. As a Postdoctoral Researcher at the University of Florida, I always enjoyed visiting the alligator farm and comparing the species in my mind. While I have never worked with crocodilians, one of the constant battles that my lab has faced is the search for DNA from different closely related species. Ultimately in this process, we are interested in addressing the question, “What were the molecular events that made each species unique from its closest relatives?” So, as you look at the crocodilians on the cover of the book, ask yourself how these species are different, what the molecular underpinnings of this are, what the selective forces that drove this were, and how the techniques described in this book can help us answer these questions.

I am grateful to the external reviewers of chapters for this book, notably Aoife McLysaght (Trinity College, Ireland), Arthur Lesk (Pennsylvania State University, USA), my research group (especially Alexander Churbanov and Steven Massey), and my wife Jessica, as well as to authors who took the time to review other chapters in this effort (special thanks for extra effort go to Eric Gaucher, Tal Pupko, and Denis Shields). I also need to thank Ian Sherman and Stefanie Gehrig at Oxford University Press for their patience. Thank you for your interest in the growing research field.

David A. Liberles
University of Wyoming, Laramie, WY, USA