Analysis and Enumeration: Algorithms for Biological Graphs


Atlantis Studies in Computing, Volume 6
Series editors: Jan A. Bergstra, Amsterdam, The Netherlands · Michael W. Mislove, New Orleans, USA

Andrea Marino
Analysis and Enumeration
Algorithms for Biological Graphs

Aims and Scope of the Series

The series aims at publishing books in the areas of computer science, computer and network technology, IT management, information technology and informatics from the technological, managerial, theoretical/fundamental, social or historical perspective. We welcome books in the following categories:

Technical monographs: these will be reviewed as to timeliness, usefulness, relevance, completeness and clarity of presentation.
Textbooks.
Books of a more speculative nature: these will be reviewed as to relevance and clarity of presentation.

For more information on this series and our other book series, please visit our website at: www.atlantis-press.com/publications/books

Atlantis Press
29, avenue Laumière
75019 Paris, France

More information about this series at http://www.springer.com/series/10530

Andrea Marino
Dipartimento di Informatica
Milan
Italy

ISSN 2212-8557
ISSN 2212-8565 (electronic)
Atlantis Studies in Computing
ISBN 978-94-6239-096-6
ISBN 978-94-6239-097-3 (eBook)
DOI 10.2991/978-94-6239-097-3
Library of Congress Control Number: 2015933151

© Atlantis Press and the authors 2015
This book, or any parts thereof, may not be reproduced for commercial purposes in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system known or to be invented, without prior permission from the Publisher.
Printed on acid-free paper

To My Parents, Maria, Giovanna, Marco, and Alessandro, Lucilla.

Foreword 1

The Italian Chapter of the EATCS (European Association for Theoretical Computer Science) was founded in 1988, and aims at facilitating the exchange of ideas and results among Italian theoretical computer scientists, and at stimulating cooperation between the theoretical and the applied communities in Italy.

One of the major activities of this Chapter is to promote research in theoretical computer science, stimulating scientific excellence by supporting and encouraging the very best and creative young Italian theoretical computer scientists. This is done also by sponsoring a prize for the best Ph.D. thesis. An interdisciplinary committee selects the best two Ph.D. theses, among those defended in the previous year, one on the themes of Algorithms, Automata, Complexity and Game Theory and the other on the themes of Logics, Semantics and Programming Theory.

In 2012 we started a cooperation with Atlantis Press so that the selected Ph.D. theses would be published as volumes in the Atlantis Studies in Computing. The present volume contains one of the two theses selected for publication in 2014: Type Disciplines for Systems Biology by Livio Bioglio (supervisor: Prof. Mariangiola Dezani, University of Torino, Italy) and Algorithms for Biological Graphs: Analysis and Enumeration by Andrea Marino (supervisor: Prof. Pierluigi Crescenzi, University of Firenze, Italy).

The scientific committee which selected these theses was composed of Profs. Franco Barbanera (University of Catania), Arturo Carpi (University of Perugia) and Rossella Petreschi (Sapienza University of Rome).
They gave the following reasons to justify the assignment of the award to the thesis by Andrea Marino:

The Ph.D. dissertation “Algorithms for biological graphs: analysis and enumeration” by Andrea Marino deals with efficient algorithms for enumeration problems on graphs. The main application fields for these algorithms are biological and social networks, for which data can be conveniently modeled as graphs. This thesis presents both deep theoretical results and extensive experimental implementations. Moreover, in Chap. 2, an overview of basic techniques used for enumeration algorithms is reported. Namely, in this thesis it is possible to find algorithms for enumerating:

• all diametral and radial vertices;
• all maximal directed acyclic sub-graphs of which sources and targets belong to a predefined subset of the vertices (stories);
• all cycles and/or paths in an undirected graph;
• all pairs of (s, t)-paths sharing only nodes s and t ((s, t)-bubbles).

Summarizing, this thesis contains several important contributions in the area of graph algorithms and can be considered an important reference for all the researchers that have to work with enumeration problems.

I would like to thank the members of the scientific committee, and I hope that this initiative will further contribute to strengthen the sense of belonging to the same community of all the young researchers that have accepted the challenges posed by any branch of theoretical computer science.

Rome, January 2015
Tiziana Calamoneri
President of the Italian Chapter of EATCS

Foreword 2

The development of algorithms for enumerating all possible solutions of a specific combinatorial problem has a long history, which dates back to at least the 1960s, when the problem of enumerating some specific graph-theoretic structures (such as shortest paths and cycles) was first attacked.
As already observed by David Eppstein in 1997, these enumeration problems have several applications, such as (1) looking for structures which satisfy some additional constraints which are hard to optimize, (2) evaluating the quality of a model for a specific problem, in terms of the number of incorrect structures, (3) computing how sensitive the structures are to variation of some of the problem's parameters and (4) examining not just the optimal structures, but a larger class of structures, to gain a better understanding of the problem. As a matter of fact, in the last 50 years a large variety of enumeration problems have been considered in the literature, ranging from geometry problems to graph and hypergraph problems, from order and permutation problems to logic problems, and from set problems to string problems. A very recent compendium has been compiled by Kunihiro Wasa, which includes 350 combinatorial problems and more than 230 references. Nevertheless, the research area of enumeration algorithms is still very active and still includes many interesting open problems. This is where this book comes into play, by first presenting an overview of the main computational issues related to the design and analysis of enumeration algorithms, and by then contributing to this research area with several significant results, both theoretical and experimental.

Although the emphasis of the book is on enumeration problems, it is worth noting that the original main application area of the thesis of Andrea Marino has been computational biology. Indeed, in the previous years, biologists have accumulated a huge amount of information, at different levels of observation, from the molecular level to the population one. This information usually describes interactions or relationships among entities of biological nature, and it is often represented by means of networks (or, equivalently, graphs). Graphs allow researchers to abstract from the specific individual information: the complexity of a biological entity is enclosed into a vertex of the network and the complex interaction mechanisms between two entities are simply described by means of an arc. Clearly, the biological application determines the meaning of the nodes and of the arcs and influences the network topology: typical networks at the molecular level are gene regulation networks, protein interaction networks and metabolic networks, while typical networks at the macroscopic level are, instead, phylogenetic networks and ecological networks. Reducing problems arising in biology to the analysis of networks allows us to take advantage of the many results and algorithmic techniques that have been developed in graph theory and, more recently, in the analysis of complex networks. In other words, the observation of biological phenomena is turned into the observation of the network, of its structure and of its properties. The network becomes a tool to investigate the macromolecular interactions at the level of genes, metabolites and proteins to extract the cellular phenotypes, or the conglomerate of several cellular processes resulting from the expression of the genes and of the proteins.

The main goal of this book is the application of algorithm design and complexity analysis techniques to the analysis of biological (and, more generally, of complex) networks, by focusing mainly on topological property computation and subnetwork extraction tasks. Several quantifiable tools of network theory offer unforeseen possibilities to understand biological network organization and evolution. Some well-known examples of these tools are measures like the degree distribution, the diameter (that is, the longest shortest path) and the clustering coefficient.
These topological properties of biological networks can be seen as the result of a network evolution process: hence, one can formulate evolving network models for biological networks which produce networks consistent with the above topological properties. This implies that efficient algorithms have to be designed in order to compute these properties in very little time and (maybe more importantly) space (note that, sometimes, even polynomial-time/space algorithms might turn out to be too expensive if a massive experimentation has to be done and/or if the size of the network is quite large).

As for the second task, that is, subnetwork extraction, observe that, in general terms, this task consists in extracting a subgraph that best explains the relationships between a given set of nodes of interest in a graph. A typical example of such a problem in communication networks is the Steiner tree problem, which consists in finding the lightest tree connecting a specific subset of vertices of the network. Subnetwork extraction is a common tool when studying biological networks: for example, in 2010, Faust et al. investigated six different approaches, all based on subnetwork extraction, to extract relevant pathways from metabolic networks. One of the main issues with the subgraph extraction approach is to determine the kind of subgraph to be extracted, which clearly has to be meaningful from a biological point of view. After that, even in this case it turns out that most of the time the extraction of the desired subgraphs is a computationally difficult problem. Finally, as is common in the bioinformatics research area, finding one subgraph is usually not enough: no clear optimization criterion is usually known, so that the problem becomes even more difficult since it requires enumerating all possible subgraphs.
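
As an illustration of the three topological measures named in this foreword (degree distribution, diameter and clustering coefficient), here is a minimal Python sketch for a small undirected, connected graph stored as an adjacency dictionary. The function names, the graph representation and the toy graph are assumptions made for this example; the diameter is computed by the textbook approach of one breadth-first search per vertex, which is not claimed to be the method developed in the book and would be too slow for the large networks the foreword has in mind.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Map each degree value to the number of vertices that have it."""
    return Counter(len(neighbors) for neighbors in adj.values())


def eccentricity(adj, source):
    """Largest BFS distance from source (the graph is assumed connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Longest shortest path: the maximum eccentricity over all vertices."""
    return max(eccentricity(adj, v) for v in adj)


def clustering_coefficient(adj, v):
    """Fraction of pairs of neighbors of v that are themselves adjacent."""
    neighbors = list(adj[v])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy network: a triangle a-b-c with a pendant vertex d on c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(dict(degree_distribution(graph)))
print(diameter(graph))
print(clustering_coefficient(graph, "c"))
```

The all-pairs BFS costs time proportional to the number of vertices times the number of edges, which is the kind of polynomial cost the foreword warns can still be too expensive on large networks.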
