Methods in Molecular Biology 1883 Guido Sanguinetti Vân Anh Huynh-Thu Editors Gene Regulatory Networks Methods and Protocols M M B E T H O D S I N O L E C U L A R I O L O G Y SeriesEditor JohnM.Walker SchoolofLifeandMedicalSciences UniversityofHertfordshire Hatfield,Hertfordshire,AL109AB,UK Forfurthervolumes: http://www.springer.com/series/7651 Gene Regulatory Networks Methods and Protocols Editedby Guido Sanguinetti SchoolofInformatics,UniversityofEdinburgh,Edinburgh,UK Vân Anh Huynh-Thu DepartmentofElectricalEngineeringandComputerScience,UniversityofLiège,Liège,Belgium Editors GuidoSanguinetti VânAnhHuynh-Thu SchoolofInformatics DepartmentofElectricalEngineering UniversityofEdinburgh andComputerScience Edinburgh,UK UniversityofLiège Liège,Belgium ISSN1064-3745 ISSN1940-6029 (electronic) MethodsinMolecularBiology ISBN978-1-4939-8881-5 ISBN978-1-4939-8882-2 (eBook) https://doi.org/10.1007/978-1-4939-8882-2 LibraryofCongressControlNumber:2018962962 ©SpringerScience+BusinessMedia,LLC,partofSpringerNature2019 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartofthematerialis concerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation,broadcasting,reproduction onmicrofilmsorinanyotherphysicalway,andtransmissionorinformationstorageandretrieval,electronicadaptation, computersoftware,orbysimilarordissimilarmethodologynowknownorhereafterdeveloped. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevantprotectivelawsand regulationsandthereforefreeforgeneraluse. Thepublisher,theauthorsandtheeditorsaresafetoassumethattheadviceandinformationinthisbookarebelieved tobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsortheeditorsgiveawarranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.Thepublisherremainsneutralwithregardtojurisdictionalclaimsinpublishedmapsandinstitutionalaffiliations. This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC part of SpringerNature. Theregisteredcompanyaddressis:233SpringStreet,NewYork,NY10013,U.S.A. Preface High-throughput technologies have brought about a revolution in molecular biology. Over the last two decades, the research paradigm has moved from a characterization of individualgenesandtheirfunctionincreasinglytowardasystems-levelappreciationofhow the complex interactions between multiple genes shape the dynamics and functions of biological systems. At the same time, the computational and statistical challenges posed by the interpretation of such data have motivated an exciting cross-fertilization between the disciplines of biology and the mathematical and computational sciences, leading to the birthoftheinterdisciplinaryfieldofsystemsbiology. Acrucialcomputationaltaskinsystemsbiologyistheso-calledreverseengineeringtask: givenobservationsofmultiplebiologicalfeatures(e.g.,proteinlevels)acrossdifferenttime points/conditions, determine computationally the interaction structure (the network) that best explains the data. Within the context of modeling gene expression, this is the task of inferringgeneregulatorynetworks(GRNs)fromdata. GRN inference has been a major challenge in systems biology for nearly two decades, and, while challenges still abound, it is rapidly reaching maturity both in terms of the concepts involved, and in terms of the software tools available. In this book, we aim to take stock of the situation, providing an overview of methods that cover the majority of recentdevelopments,aswellasindicatingthepathforwardforfuturedevelopments. The book opens with a tutorial overview of the main biological and mathematical concepts and a survey of the current software landscape. This is meant to be an entry level chapter, which the interested, graduate-level practitioner (either computational or biological) can consult as a rough guide to the concepts described more in detail in the more technical chapters in the book. The next two chapters then focus on Bayesian methods to infer networks from time varying data, while Chapters 4 and 5 describe how to attempt to extract causal information (as opposed to purely correlative) from biological data. Chapters 6 and 7 describe network inference techniques in the presence of multiple heterogeneous data sets, while Chapters 8 and 9 focus on nonparametric and hybrid statistical methods for network inference. The following five Chapters 10–14 focus on the idea of inference of different (but related) networks, arising either from intrinsic heterogeneity(suchasinsingle-celldata)orduetomultipleconditionsbeingassayed,and furtherexploreconceptsofdifferentialnetworksandnetworkstability.Finally, thelasttwo chapters focus more on a mechanistic view of the biological process, covering methods for exploringnetworkswithinlarge,mechanisticmodelsofbiologicaldynamics. As most books, this volume presents an incomplete snapshot of an evolving field, and, given the considerable research activity in this area, it is clear that we can look forward to considerable progress within the next decades. Our hope is that this collection will be instrumental in assessing the current state of the art and in focusing research on the commonchallengesfacedbythefield. Edinburgh,UK GuidoSanguinetti Liège,Belgium VânAnhHuynh-Thu v Contents Preface.................................................................................... v Contributors.............................................................................. ix 1 GeneRegulatoryNetworkInference:AnIntroductorySurvey................... 1 VânAnhHuynh-ThuandGuidoSanguinetti 2 StatisticalNetworkInferenceforTime-VaryingMolecularData withDynamicBayesianNetworks.................................................. 25 FrankDondelingerandSachMukherjee 3 OverviewandEvaluationofRecentMethodsforStatisticalInference ofGeneRegulatoryNetworksfromTimeSeriesData............................ 49 MarcoGrzegorczyk,AndrejAderholdandDirkHusmeier 4 Whole-TranscriptomeCausalNetworkInferencewithGenomic andTranscriptomicData........................................................... 95 LingfeiWangandTomMichoel 5 CausalQueriesfromObservationalDatainBiologicalSystemsviaBayesian Networks:AnEmpiricalStudyinSmallNetworks ................................ 111 AlexWhiteandMatthieuVignes 6 A Multiattribute Gaussian Graphical Model for Inferring Multiscale RegulatoryNetworks:AnApplicationinBreastCancer .......................... 143 JulienChiquet,GuillemRigaill,andMartinaSundqvist 7 Integrative Approaches for Inference of Genome-Scale Gene Regulatory Networks............................................................................ 161 AlirezaFotuhiSiahpirani,DeborahChasman,andSushmitaRoy 8 Unsupervised Gene Network Inference with Decision Trees and Random Forests .............................................................................. 195 VânAnhHuynh-ThuandPierreGeurts 9 Tree-Based Learning of Regulatory Network Topologies and Dynamics withJump3 ......................................................................... 217 VânAnhHuynh-ThuandGuidoSanguinetti 10 NetworkInferencefromSingle-CellTranscriptomicData........................ 235 HelenaTodorov,RobrechtCannoodt,WouterSaelens,andYvanSaeys 11 InferringGeneRegulatoryNetworksfromMultipleDatasets.................... 251 ChristopherA.Penfold,IuliaGherman,AnastasiyaSybirna, andDavidL.Wild 12 UnsupervisedGRNEnsemble ..................................................... 283 PauBellot,PhilippeSalembier,NgocC.Pham,andPatrickE.Meyer 13 Learning Differential Module Networks Across Multiple Experimental Conditions.......................................................................... 303 PauErola,EricBonnet,andTomMichoel vii viii Contents 14 StabilityinGRNInference......................................................... 323 GiuseppeJurman,MicheleFilosi,RobertoVisintainer, SamanthaRiccadonna,andCesareFurlanello 15 GeneRegulatoryNetworks:APrimerinBiologicalProcessesandStatistical Modelling........................................................................... 347 OliviaAngelin-Bonnet,PatrickJ.BiggsandMatthieuVignes 16 ScalableInferenceofOrdinaryDifferentialEquationModels ofBiochemicalProcesses........................................................... 385 FabianFröhlich,CarolinLoos,andJanHasenauer Index ..................................................................................... 423 Contributors ANDREJADERHOLD (cid:129) SchoolofMathematicsandStatistics,UniversityofGlasgow,Glasgow, UK OLIVIA ANGELIN-BONNET (cid:129) Institute of Fundamental Sciences, Palmerston North, New Zealand PAUBELLOT (cid:129) CentreforResearchinAgriculturalGenomics(CRAG),CSIC-IRTA-UAB- UBConsortium,Bellaterra,Barcelona,Spain PATRICK J. BIGGS (cid:129) Institute of Fundamental Sciences, Palmerston North, New Zealand; SchoolofVeterinaryScience,MasseyUniversity,PalmerstonNorth,NewZealand ERIC BONNET (cid:129) Centre Nationale de Recherche en Genomique Humaine, Institut de BiologieFrancoisJacob,DirectiondelaRechercheFondamentale,CEA,Evry,France ROBRECHT CANNOODT (cid:129) Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium; Center for Medical Genetics, Ghent University Hospital,Ghent,Belgium DEBORAH CHASMAN (cid:129) WisconsinInstituteforDiscovery,UniversityofWisconsin-Madison, Madison,WI,USA JULIEN CHIQUET (cid:129) UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris,France FRANK DONDELINGER (cid:129) LancasterMedicalSchool,LancasterUniversity,Lancaster,UK PAU EROLA (cid:129) Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh,Midlothian,Scotland,UK MICHELE FILOSI (cid:129) CIBIO,UniversityofTrento,Trento,Italy FABIAN FRÖHLICH (cid:129) Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany; Center for Mathematics, Technische Universität München, Garch- ing,Germany CESARE FURLANELLO (cid:129) FondazioneBrunoKessler,Trento,Italy PIERREGEURTS (cid:129) DepartmentofElectricalEngineeringandComputerScience,University ofLiège,Liège,Belgium IULIA GHERMAN (cid:129) Warwick Integrative Synthetic Biology Centre, School of Engineering, UniversityofWarwick,Coventry,UK MARCOGRZEGORCZYK (cid:129) JohannBernoulliInstitute,UniversityofGroningen,Groningen, TheNetherlands JAN HASENAUER (cid:129) Institute of Computational Biology, Helmholtz Zentrum Muenchen, Neuherberg, Germany; Center for Mathematics, Technische Universität München, Garch- ing,Germany DIRK HUSMEIER (cid:129) School of Mathematics and Statistics, University of Glasgow, Glasgow, UK VÂN ANH HUYNH-THU (cid:129) Department of Electrical Engineering and Computer Science, UniversityofLiège,Liège,Belgium GIUSEPPE JURMAN (cid:129) FondazioneBrunoKessler,Trento,Italy CAROLIN LOOS (cid:129) Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany; Center for Mathematics, Technische Universität München, Garching,Germany ix x Contributors PATRICKE.MEYER (cid:129) BioinformaticsandSystemsBiology(BioSys)Unit,UniversitédeLiège, Liège,Belgium TOM MICHOEL (cid:129) Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian, Scotland, UK; Computational Biology Unit, Department of Informatics,UniversityofBergen,Bergen,Norway SACH MUKHERJEE (cid:129) German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany CHRISTOPHER A. PENFOLD (cid:129) Wellcome/CRUK Gurdon Institute, University of Cam- bridge,Cambridge,UK NGOC C. PHAM (cid:129) Bioinformatics and Systems Biology (BioSys) Unit, Universite de Liège, Liège,Belgium SAMANTHA RICCADONNA (cid:129) FondazioneEdmundMach,SanMicheleall’Adige,Italy GUILLEM RIGAILL (cid:129) Institute of Plant Sciences Paris-Saclay, UMR 9213/UMR1403, CNRS, INRA, Université Paris -Sud, Université d’Evry, Université Paris-Diderot, SorbonneParis-Cité,Paris,France;LaboratoiredeMathématiquesetModélisationd’Evry (LaMME), Université d’Evry, Val d’Essonne, UMR CNRS 8071, ENSIIE, USC INRA, Paris,France SUSHMITA ROY (cid:129) Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA; Department of Biostatistics and Medical Informatics, University ofWisconsin-Madison,Madison,WI,USA WOUTERSAELENS (cid:129) DataMiningandModellingforBiomedicine,VIBCenterforInflam- mationResearch,Ghent,Belgium;DepartmentofAppliedMathematics,ComputerScience andStatistics,GhentUniversity,Ghent,Belgium YVAN SAEYS (cid:129) DataMiningandModellingforBiomedicine,VIBCenterforInflammation Research, Ghent, Belgium; Department of Applied Mathematics, Computer Science and Statistics,GhentUniversity,Ghent,Belgium PHILIPPE SALEMBIER (cid:129) UniversitatPolitecnicadeCatalunya,Barcelona,Spain GUIDO SANGUINETTI (cid:129) SchoolofInformatics,UniversityofEdinburgh,Edinburgh,UK ALIREZAFOTUHISIAHPIRANI (cid:129) WisconsinInstituteforDiscovery,UniversityofWisconsin- Madison,Madison,WI,USA;DepartmentofComputerSciences,UniversityofWisconsin- Madison,Madison,WI,USA MARTINA SUNDQVIST (cid:129) UMR MIA-Paris, AgroTechParis, INRA, Université Paris- Saclay,Paris,France ANASTASIYA SYBIRNA (cid:129) Wellcome/CRUK Gurdon Institute, University of Cambridge, Cambridge, UK; Wellcome/MRC Cambridge Stem Cell Institute, University of Cam- bridge,Cambridge,UK;Physiology,DevelopmentandNeuroscienceDepartment,Univer- sityofCambridge,Cambridge,UK HELENA TODOROV (cid:129) Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium; Department of Applied Mathematics, Com- puter Science and Statistics, Ghent University, Ghent, Belgium; Centre International de Recherche en Infectiologie, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308,ÉcoleNormaleSupérieuredeLyon,UnivLyon,Lyon,France MATTHIEU VIGNES (cid:129) Institute of Fundamental Sciences, Massey University, Palmerston North,NewZealand ROBERTO VISINTAINER (cid:129) CIBIO,UniversityofTrento,Trento,Italy LINGFEI WANG (cid:129) Division of Genetics and Genomics, The Roslin Institute, The University ofEdinburgh,Midlothian,Scotland,UK Contributors xi ALEX WHITE (cid:129) Institute of Fundamental Sciences, Massey University, Palmerston North, NewZealand DAVID L. WILD (cid:129) Department of Statistics and Systems Biology Centre, University of Warwick,Coventry,UK