
Parameter Advising for Multiple Sequence Alignment PDF

156 Pages·2017·6.77 MB·English

Preview Parameter Advising for Multiple Sequence Alignment

Computational Biology

Dan DeBlasio · John Kececioglu
Parameter Advising for Multiple Sequence Alignment

Computational Biology, Volume 26

Editors-in-Chief
Andreas Dress, CAS-MPG Partner Institute for Computational Biology, Shanghai, China
Michal Linial, Hebrew University of Jerusalem, Jerusalem, Israel
Olga Troyanskaya, Princeton University, Princeton, NJ, USA
Martin Vingron, Max Planck Institute for Molecular Genetics, Berlin, Germany

Editorial Board
Robert Giegerich, University of Bielefeld, Bielefeld, Germany
Janet Kelso, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Gene Myers, Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
Pavel A. Pevzner, University of California, San Diego, CA, USA

Advisory Board
Gordon Crippen, University of Michigan, Ann Arbor, MI, USA
Joe Felsenstein, University of Washington, Seattle, WA, USA
Dan Gusfield, University of California, Davis, CA, USA
Sorin Istrail, Brown University, Providence, RI, USA
Thomas Lengauer, Max Planck Institute for Computer Science, Saarbrücken, Germany
Marcella McClure, Montana State University, Bozeman, MT, USA
Martin Nowak, Harvard University, Cambridge, MA, USA
David Sankoff, University of Ottawa, Ottawa, ON, Canada
Ron Shamir, Tel Aviv University, Tel Aviv, Israel
Mike Steel, University of Canterbury, Christchurch, New Zealand
Gary Stormo, Washington University in St. Louis, St. Louis, MO, USA
Simon Tavaré, University of Cambridge, Cambridge, UK
Tandy Warnow, University of Illinois at Urbana-Champaign, Champaign, IL, USA
Lonnie Welch, Ohio University, Athens, OH, USA

The Computational Biology series publishes the very latest, high-quality research devoted to specific issues in computer-assisted analysis of biological data. The main emphasis is on current scientific developments and innovative techniques in computational biology (bioinformatics), bringing to light methods from mathematics, statistics and computer science that directly address biological problems currently under investigation.
The series offers publications that present the state-of-the-art regarding the problems in question; show computational biology/bioinformatics methods at work; and finally discuss anticipated demands regarding developments in future methodology. Titles can range from focused monographs, to undergraduate and graduate textbooks, and professional text/reference works.

More information about this series at http://www.springer.com/series/5769

Dan DeBlasio • John Kececioglu
Parameter Advising for Multiple Sequence Alignment

Dan DeBlasio, Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
John Kececioglu, Department of Computer Science, The University of Arizona, Tucson, Arizona, USA

ISSN 1568-2684
Computational Biology
ISBN 978-3-319-64917-7
ISBN 978-3-319-64918-4 (eBook)
https://doi.org/10.1007/978-3-319-64918-4
Library of Congress Control Number: 2017955035

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To all my friends and family
DD

To Dimitri, Lorene, and Zoe
JK

Preface

While multiple sequence alignment is essential to many biological analyses, its standard formulations are all NP-complete. Due to both its practical importance and computational difficulty, a plethora of heuristic multiple sequence aligners are in use in bioinformatics. Each of these tools has a multitude of parameters which must be set, and that greatly affect the quality of the output alignment. How to choose the best parameter setting for a user’s input sequences is a basic question, and most users simply rely on the aligner’s default setting, which may produce a low-quality alignment of their specific sequences.

In this monograph, we present a new general approach called parameter advising for finding a parameter setting that produces a high-quality alignment for a given set of input sequences. In this framework, a parameter advisor is a procedure that automatically chooses a parameter setting for the aligner, and has two main ingredients: (a) the set of parameter choices considered by the advisor, and (b) an estimator of alignment accuracy used to rank alignments produced by the aligner. On coupling a parameter advisor with an aligner, once the advisor is trained in a learning phase, the user simply inputs sequences to align and receives an output alignment from the aligner, where the advisor has automatically selected the parameter setting.

The book is organized in two parts: the first lays out the foundations of parameter advising, and the second provides applications and extensions of advising. The content examines formulations of parameter advising and their computational complexity, develops methods for learning good accuracy estimators, presents approximation algorithms for finding good sets of parameter choices, and assesses software implementations of advising that perform well on real biological data.
Also explored are applications of parameter advising to adaptive local realignment, where advising is performed on local regions of the sequences to automatically adapt to varying mutation rates; and ensemble alignment, where advising is applied to an ensemble of aligners to effectively yield a new aligner of higher quality than the individual aligners in the ensemble. Finally, future directions in advising research are offered.

This work arose from a series of joint research papers by the coauthors, that initiated and developed the theory and practice of parameter advising, and that formed the basis of the first author’s doctoral dissertation.

Parameter advising is a general technique, with the potential to be of broad utility beyond sequence alignment. We hope this monograph encourages others to explore this fruitful area of investigation.

Dan DeBlasio, Pittsburgh, Pennsylvania
John Kececioglu, Tucson, Arizona
October 2017

Acknowledgements

The authors gratefully acknowledge funding from the US National Science Foundation, through Grant IIS-1217886 to John Kececioglu, and by a PhD fellowship to Dan DeBlasio from the University of Arizona IGERT Program in Comparative Genomics through Grant DGE-0654435, which made this research possible. While a postdoctoral fellow at Carnegie Mellon University, Dan DeBlasio also received support from Carl Kingsford through Gordon and Betty Moore Foundation Grant GBMF4554, NSF Grant CCF-1256087, and NIH Grant R01HG007104.

Contents

1 Introduction and Background
  1.1 Multiple Sequence Alignment
  1.2 Parameter Advising
  1.3 Related Approaches
    1.3.1 Accuracy Estimation
    1.3.2 A Priori Advising
    1.3.3 Meta-alignment
    1.3.4 Column Confidence Scoring
    1.3.5 Realignment Methods
  1.4 Background on Protein Structure
  1.5 Overview

Part I Foundations of Parameter Advising

2 Alignment Accuracy Estimation
  2.1 Constructing Estimators from Feature Functions
  2.2 Learning the Estimator from Examples
    2.2.1 Fitting to Accuracy Values
    2.2.2 Fitting to Accuracy Differences

3 The Facet Accuracy Estimator
  3.1 Feature Functions of an Alignment
  3.2 Secondary Structure Blockiness
  3.3 Secondary Structure Agreement
  3.4 Gap Coil Density
  3.5 Gap Extension Density
  3.6 Gap Open Density
  3.7 Gap Compatibility
  3.8 Substitution Compatibility
  3.9 Amino Acid Identity
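
As a rough illustration of the advising loop the preface describes (a set of parameter choices plus an accuracy estimator used to rank the alignments the aligner produces), here is a minimal Python sketch. It is not the authors' implementation: `advise`, `toy_aligner`, `toy_estimator`, and the `"pad_left"`/`"pad_right"` settings are hypothetical names made up for this example, and the toy estimator is a stand-in that has nothing to do with the Facet estimator listed in the contents.

```python
from typing import Callable, Iterable, List, Tuple

Alignment = List[str]  # one gapped row per input sequence


def advise(
    sequences: List[str],
    aligner: Callable[[List[str], str], Alignment],
    estimator: Callable[[Alignment], float],
    parameter_choices: Iterable[str],
) -> Tuple[str, Alignment, float]:
    """Run the aligner under each candidate setting and keep the
    alignment whose estimated accuracy is highest."""
    best = None
    for params in parameter_choices:
        alignment = aligner(sequences, params)
        score = estimator(alignment)
        if best is None or score > best[2]:
            best = (params, alignment, score)
    if best is None:
        raise ValueError("parameter_choices must not be empty")
    return best


def toy_aligner(sequences: List[str], params: str) -> Alignment:
    """Hypothetical stand-in for a real aligner: pads every sequence
    with gaps on the left or on the right, depending on the setting."""
    width = max(len(s) for s in sequences)
    if params == "pad_left":
        return [s.rjust(width, "-") for s in sequences]
    return [s.ljust(width, "-") for s in sequences]


def toy_estimator(alignment: Alignment) -> float:
    """Hypothetical stand-in for an accuracy estimator: the fraction of
    columns in which every row holds the same non-gap character."""
    columns = list(zip(*alignment))
    identical = sum(
        1 for col in columns if len(set(col)) == 1 and col[0] != "-"
    )
    return identical / len(columns)


params, alignment, score = advise(
    ["ACGT", "CGT"], toy_aligner, toy_estimator, ["pad_right", "pad_left"]
)
print(params, alignment, score)
```

In this sketch the user only supplies sequences; the choice among the candidate settings is made by the estimator's ranking, which mirrors the division of labor the preface outlines. The learning phase mentioned there (training the estimator and choosing the set of parameter settings) is outside what this example covers.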

Description:
This book develops a new approach called parameter advising for finding a parameter setting for a sequence aligner that yields a quality alignment of a given set of input sequences. In this framework, a parameter advisor is a procedure that automatically chooses a parameter setting for the input, an
