
Multiple Biological Sequence Alignment: Scoring Functions, Algorithms and Evaluation

244 Pages·2016·2.68 MB·English

Preview Multiple Biological Sequence Alignment: Scoring Functions, Algorithms and Evaluation

MULTIPLE BIOLOGICAL SEQUENCE ALIGNMENT

Wiley Series on Bioinformatics: Computational Techniques and Engineering
A complete list of the titles in this series appears at the end of this volume.

MULTIPLE BIOLOGICAL SEQUENCE ALIGNMENT
Scoring Functions, Algorithms and Applications

KEN NGUYEN
XUAN GUO
YI PAN

Copyright © 2016 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our website at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Names: Nguyen, Ken, 1975- author. | Guo, Xuan, 1987- author. | Pan, Yi, 1960- author.
Title: Multiple biological sequence alignment : scoring functions, algorithms and applications / Ken Nguyen, Xuan Guo, Yi Pan.
Description: Hoboken, New Jersey : John Wiley & Sons, 2016. | Includes bibliographical references and index.
Identifiers: LCCN 2016004186 | ISBN 9781118229040 (cloth) | ISBN 9781119273752 (epub)
Subjects: LCSH: Sequence alignment (Bioinformatics)
Classification: LCC QH441 .N48 2016 | DDC 572.8–dc23
LC record available at http://lccn.loc.gov/2016004186

Cover image courtesy of Getty Images/Oktal Studio
Typeset in 10/12pt Times LT Std by SPi Global, Chennai, India
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

CONTENTS

Preface

1 Introduction
  1.1 Motivation
  1.2 The Organization of this Book
  1.3 Sequence Fundamentals
    1.3.1 Protein
    1.3.2 DNA/RNA
    1.3.3 Sequence Formats
    1.3.4 Motifs
    1.3.5 Sequence Databases

2 Protein/DNA/RNA Pairwise Sequence Alignment
  2.1 Sequence Alignment Fundamentals
  2.2 Dot-Plot Matrix
  2.3 Dynamic Programming
    2.3.1 Needleman–Wunsch’s Algorithm
    2.3.2 Example
    2.3.3 Smith–Waterman’s Algorithm
    2.3.4 Affine Gap Penalty
  2.4 Word Method
    2.4.1 Example
  2.5 Searching Sequence Databases
    2.5.1 FASTA
    2.5.2 BLAST

3 Quantifying Sequence Alignments
  3.1 Evolution and Measuring Evolution
    3.1.1 Jukes and Cantor’s Model
    3.1.2 Measuring Relatedness
  3.2 Substitution Matrices and Scoring Matrices
    3.2.1 Identity Scores
    3.2.2 Substitution/Mutation Scores
  3.3 Gaps
    3.3.1 Sequence Distances
    3.3.2 Example
  3.4 Scoring Multiple Sequence Alignments
    3.4.1 Sum-of-Pair Score
  3.5 Circular Sum Score
  3.6 Conservation Score Schemes
    3.6.1 Wu and Kabat’s Method
    3.6.2 Jores’s Method
    3.6.3 Lockless and Ranganathan’s Method
  3.7 Diversity Scoring Schemes
    3.7.1 Background
    3.7.2 Methods
  3.8 Stereochemical Property Methods
    3.8.1 Valdar’s Method
  3.9 Hierarchical Expected Matching Probability Scoring Metric (HEP)
    3.9.1 Building an AACCH Scoring Tree
    3.9.2 The Scoring Metric
    3.9.3 Proof of Scoring Metric Correctness
    3.9.4 Examples
    3.9.5 Scoring Metric and Sequence Weighting Factor
    3.9.6 Evaluation Data Sets
    3.9.7 Evaluation Results

4 Sequence Clustering
  4.1 Unweighted Pair Group Method with Arithmetic Mean – UPGMA
  4.2 Neighborhood-Joining Method – NJ
  4.3 Overlapping Sequence Clustering

5 Multiple Sequences Alignment Algorithms
  5.1 Dynamic Programming
    5.1.1 DCA
  5.2 Progressive Alignment
    5.2.1 Clustal Family
    5.2.2 PIMA: Pattern-Induced Multisequence Alignment
    5.2.3 PRIME: Profile-Based Randomized Iteration Method
    5.2.4 DIAlign
  5.3 Consistency and Probabilistic MSA
    5.3.1 POA: Partial Order Graph Alignment
    5.3.2 PSAlign
    5.3.3 ProbCons: Probabilistic Consistency-Based Multiple Sequence Alignment
    5.3.4 T-Coffee: Tree-Based Consistency Objective Function for Alignment Evaluation
    5.3.5 MAFFT: MSA Based on Fast Fourier Transform
    5.3.6 AVID
    5.3.7 Eulerian Path MSA
  5.4 Genetic Algorithms
    5.4.1 SAGA: Sequence Alignment by Genetic Algorithm
    5.4.2 GA and Self-Organizing Neural Networks
    5.4.3 FAlign
  5.5 New Development in Multiple Sequence Alignment Algorithms
    5.5.1 KB-MSA: Knowledge-Based Multiple Sequence Alignment
    5.5.2 PADT: Progressive Multiple Sequence Alignment Based on Dynamic Weighted Tree
  5.6 Test Data and Alignment Methods
  5.7 Results
    5.7.1 Measuring Alignment Quality
    5.7.2 RT-OSM Results

6 Phylogeny in Multiple Sequence Alignments
  6.1 The Tree of Life
  6.2 Phylogeny Construction
    6.2.1 Distance Methods
    6.2.2 Character-Based Methods
    6.2.3 Maximum Likelihood Methods
    6.2.4 Bootstrapping
    6.2.5 Subtree Pruning and Re-grafting
  6.3 Inferring Phylogeny from Multiple Sequence Alignments

7 Multiple Sequence Alignment on High-Performance Computing Models
  7.1 Parallel Systems
    7.1.1 Multiprocessor
    7.1.2 Vector
    7.1.3 GPU
    7.1.4 FPGA
    7.1.5 Reconfigurable Mesh
  7.2 Exiting Parallel Multiple Sequence Alignment
  7.3 Reconfigurable-Mesh Computing Models – (R-Mesh)
  7.4 Pairwise Dynamic Programming Algorithms
    7.4.1 R-Mesh Max Switches
    7.4.2 R-Mesh Adder/Subtractor
    7.4.3 Constant-Time Dynamic Programming on R-Mesh
    7.4.4 Affine Gap Cost
    7.4.5 R-Mesh On/Off Switches
    7.4.6 Dynamic Programming Backtracking on R-Mesh
  7.5 Progressive Multiple Sequence Alignment on R-Mesh
    7.5.1 Hierarchical Clustering on R-Mesh
    7.5.2 Constant Run-Time Sum-of-Pair Scoring Method
    7.5.3 Parallel Progressive MSA Algorithm and Its Complexity Analysis

8 Sequence Analysis Services
  8.1 EMBL-EBI: European Bioinformatics Institute
  8.2 NCBI: National Center for Biotechnology Information
  8.3 GenomeNet and Data Bank of Japan
  8.4 Other Sequence Analysis and Alignment Web Servers
  8.5 SeqAna: Multiple Sequence Alignment with Quality Ranking
  8.6 Pairwise Sequence Alignment and Other Analysis Tools
  8.7 Tool Evaluation

9 Multiple Sequence for Next-Generation Sequences
  9.1 Introduction
  9.2 Overview of Next Generation Sequence Alignment Algorithms
    9.2.1 Alignment Algorithms Based on Seeding and Hash Tables
    9.2.2 Alignment Algorithms Based on Suffix Tries
  9.3 Next-Generation Sequencing Tools

10 Multiple Sequence Alignment for Variations Detection
  10.1 Introduction
  10.2 Genetic Variants
  10.3 Variation Detection Methods Based on MSA
  10.4 Evaluation Methodology
    10.4.1 Performance Metrics
    10.4.2 Simulated Sequence Data
    10.4.3 Real Sequence Data
  10.5 Conclusion and Future Work

Description:
Covers the fundamentals and techniques of multiple biological sequence alignment and analysis, and shows readers how to choose the appropriate sequence analysis tools for their tasks. This book describes the traditional and modern approaches in biological sequence alignment and homology search. Thi