Integer Linear Programming in Computational and Systems Biology: An Entry-Level Text and Course

432 Pages·2019·12.736 MB·English


Integer Linear Programming in Computational and Systems Biology

Integer linear programming (ILP) is a versatile modeling and optimization technique that is increasingly used in nontraditional ways in biology, with the potential to transform biological computation. However, few biologists know about it. This how-to and why-do text introduces ILP through the lens of computational and systems biology. It uses in-depth examples from genomics, phylogenetics, RNA and protein folding, network analysis, cancer, ecology, co-evolution, DNA sequencing and sequence analysis, pedigree and sibling inference, haplotyping, clustering, and more to establish the power of ILP. This book aims to teach the logic of modeling and solving problems with ILP, and to teach the practical “workflow” involved in using ILP in biology.

Written for a wide audience, with no biological or computational prerequisites, this book is appropriate for both entry-level and advanced courses aimed at biological and computational students, and as a source for specialists. Numerous exercises and accompanying software (in Python and Perl) demonstrate the concepts.

Dan Gusfield is Distinguished Professor of Computer Science at the University of California, Davis (UCD), and a fellow of the IEEE, the ACM, and the International Society for Computational Biology (ISCB). His previous books include The Stable Marriage Problem (with Robert W. Irving), Algorithms on Strings, Trees and Sequences, and ReCombinatorics. He served as chair of the computer science department at UCD (2000–2004) and was the founding editor-in-chief of the IEEE/ACM Transactions on Computational Biology and Bioinformatics until January 2009. He has been instrumental in the definition and development of the intersection between computer science and computational biology.
INTEGER LINEAR PROGRAMMING IN COMPUTATIONAL AND SYSTEMS BIOLOGY
An Entry-Level Text and Course
DAN GUSFIELD
University of California, Davis

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108421768
DOI: 10.1017/9781108377737

© Dan Gusfield 2019

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2019

Printed and bound in Great Britain by Clays Ltd, Elcograf S.p.A.

A catalogue record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Gusfield, Dan, author.
Title: Integer linear programming in computational and systems biology: an entry-level text and course / Dan Gusfield, University of California, Davis.
Description: Cambridge, United Kingdom; New York, NY: Cambridge University Press, 2019. | Includes bibliographical references and index.
Identifiers: LCCN 2018061722 | ISBN 9781108421768 (hardback: alk. paper)
Subjects: LCSH: Computational biology – Mathematical models. | Systems biology – Mathematical models. | Linear programming.
Classification: LCC QH324.2 .G87 2019 | DDC 570.285–dc23
LC record available at https://lccn.loc.gov/2018061722

ISBN 978-1-108-42176-8 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Dedicated to Dick Karp, mentor and role model to me for 45 years and counting. Thank you Dick for all your wisdom, support, and friendship.

Contents

Introduction to the Book and Course
  I.1 Why Integer Programming?
  I.2 About This Book and Course
  I.3 Tasks and Skills
  I.4 Why Learn from Me?
  I.5 The Organization of the Book
  I.6 No Math
  I.7 Acknowledgments

Part I
1 A Flyover Introduction to Integer Linear Programming
  1.1 Linear Programming (LP) and Its Use
  1.2 Integer Linear Programming (ILP)
  1.3 Expressibility of Integer Linear Formulations
2 Biological Networks, Graphs, and High-Density Subgraphs
  2.1 Biological Graphs and Networks
  2.2 The Maximum-Clique Problem and Its Solution Using ILP
  2.3 Bounds and Gurobi Progress Reporting
3 Maximum Character Compatibility in Phylogenetics
  3.1 Basic Definitions
  3.2 Phylogenetic Trees
  3.3 Perfect Phylogeny and Cancer
  3.4 Character Removal in the Perfect-Phylogeny Model
  3.5 Minimum Character Removal (MCR) in the Study of Cancer
4 Near Cliques, Dense Subgraphs, and Motifs in Biological Networks
  4.1 Near Cliques
  4.2 Inverting the Near-Clique Problem
  4.3 A Short Interruption: Our First ILP Idioms
  4.4 Return to Near Cliques
  4.5 Finally: The Largest High-Density Subgraph Problem
  4.6 Motif Searching via Clique Finding
5 Convergent and Maximum Parsimony Problems in Phylogenetics
  5.1 Phylogenetics via Maximum Parsimony
  5.2 Improving the Practicality
  5.3 Software
  5.4 Concerted Convergent Evolution: Cliques Again!
  5.5 Catching Infeasibility Errors Using an IIS in Gurobi
6 The RNA-Folding Problem
  6.1 A Crude First Model of RNA Folding
  6.2 More Complex Biological Enhancements
  6.3 Fold Prediction Using a Known RNA Structure
7 Protein Problems Solved by Integer Programming
  7.1 The Protein Side-Chain Positioning Problem
  7.2 Protein Folding via the HP Model
  7.3 Predicting Domain-Domain Interaction in Proteins
8 Tanglegrams and Coevolution
  8.1 Introduction to Coevolution
  8.2 Tree Drawings, Subtree Exchanges, and the Tanglegram Problem
  8.3 Logic for Solving the Tanglegram Problem
  8.4 An ILP Formulation
  8.5 Software for the Tanglegram Problem
  8.6 The If-XOR Idiom for Binary Variables
9 Traveling Salesman Problems in Genomics
  9.1 The Traveling Salesman Problem (TSP) in Genomics
  9.2 The Traveling Salesman Problem (TSP)
  9.3 TS Problems in DNA Sequencing and Assembly
  9.4 Marker Ordering: A Different Fragment Layout Problem
  9.5 Finding Signaling Pathways in Cervical Cancer
  9.6 An ILP Solution to the TS Tour Problem on G
  9.7 Efficiency and Alternative Formulations
  9.8 The Subtour-Elimination Approach to Solving TS Problems
  9.9 Extended Modeling Exercises
10 Integer Programming in Molecular Sequence Analysis
  10.1 The Importance of Sequence Analysis
  10.2 The String Site-Removal Problem (SSRP)
  10.3 Representative Sequences
  10.4 The Longest Common-Subsequence (LCS) Problem
  10.5 Optimal Pathogen and Species Barcoding
11 Metabolic Networks and Metabolic Engineering
  11.1 Boolean Networks
  11.2 Extending Boolean Networks, with ILP
  11.3 Fantasy Network Analysis
  11.4 Time: The Final Frontier
12 ILP Idioms
  12.1 General If-Then Idioms for Linear Functions with Binary Variables
  12.2 General Only-If Idioms for Linear Functions and Binary Variables
  12.3 Exploiting the Idioms
  12.4 The Key to the Idioms

Part II
13 Communities, Cuts, and High-Density Subgraphs
  13.1 Community Detection in a Network
  13.2 Cuts: Max, Min, and Multi
  13.3 High-Density Subgraphs: A Refinement for Large, Sparse Graphs
14 Character Compatibility with Corrupted Data and Generalized Phylogenetic Models
  14.1 Handling Missing and Corrupted Data in Character Compatibility Problems
  14.2 Handling Both Missing and Corrupted Data
  14.3 Back to the Artifact Problem in Pancreatic Cancer
  14.4 An Extension of Perfect Phylogeny to Less Restricted Models
15 More Tanglegrams, More Trees, More ILPs
  15.1 Minimizing Subtree Exchanges Within an Optimal Solution
  15.2 A Distance-Based Objective Function for Tanglegrams
  15.3 An ILP Formulation
  15.4 Rooted Subtree-Prune-and-Regraft (rSPR) Distance
16 Return to Steiner Trees and Maximum Parsimony
  16.1 The Steiner-Tree Problem and Extensions
  16.2 iPoint: Deducing Protein Pathways
  16.3 Maximum Parsimony and Ductal Carcinoma Progression
17 Exploiting and Leveraging Protein Networks
  17.1 Example 1: Exploiting PPI Networks to Find Disease-Related Proteins
  17.2 Example 2: Leveraging PPI Knowledge Across Species
  17.3 Example 3: Identifying Driver Genes in Cancer
18 More String and Sequence Problems Solved by ILP
  18.1 The LCS Problem for Multiple Strings
  18.2 Transforming Gene Order by Reversals
