ETH Library

Estimating Causal Networks from Multivariate Observational Data

Doctoral Thesis

Author(s): Nowzohour, Christopher
Publication date: 2015
Permanent link: https://doi.org/10.3929/ethz-a-010564540
Rights / license: In Copyright - Non-Commercial Use Permitted

Diss. ETH No. 22930

Estimating Causal Networks from Multivariate Observational Data

A dissertation submitted to ETH ZURICH for the degree of Doctor of Sciences

presented by
CHRISTOPHER NOWZOHOUR
Master of Science, University of Oxford
born July 2, 1986
citizen of Germany

accepted on the recommendation of
Prof. Dr. Peter Bühlmann, examiner
Prof. Dr. Marloes Maathuis, co-examiner

2015

To my family

Acknowledgements

First, I would like to thank my supervisor, Prof. Peter Bühlmann, who was ever encouraging and set an example balancing an unbelievable number of personal and professional commitments. I am also grateful to Prof. Marloes Maathuis, who co-supervised me during the second part of my doctoral studies and whose enthusiasm and attention to detail made our collaboration very enjoyable. Without the encouragement of Prof. Nicolai Meinshausen while I was a student at Oxford, I would never have considered a PhD in statistics.

I would not have gotten through my PhD without the copious amounts of chocolate and good vibes generously supplied by my two amazing officemates Anna and Ewa. I won't forget the climbs done with Alain, the salsa dancing with Sophie, or the rounds of Crazy Dog played with Jan, Anna, Jonas, Ruben, and all the others! I am thankful to all SfS colleagues for the great atmosphere at our institute, be it at work or outside.

Finally, I am very grateful to my parents and my sister for their lasting support and encouragement.

Contents

Abstract
Zusammenfassung
1. Introduction
   1.1. Background
        1.1.1. Causal Models
        1.1.2. Structure Learning
   1.2. Outline of Thesis
2. Score-based Causal Learning in Additive Noise Models
   2.1. Introduction
   2.2. The Method
        2.2.1. Notation and Definitions
        2.2.2. Penalized maximum likelihood estimation
   2.3. Theoretical Results
   2.4. Numerical Results
        2.4.1. Identifiability depending on Linearity and Gaussianity
        2.4.2. Random Edge Functions
        2.4.3. Larger Networks and Thresholding
   2.5. Real Data
   2.6. Conclusions
   2.A. Consistency Proof
3. Structure Learning with Bow-free Acyclic Path Diagrams
   3.1. Introduction
   3.2. Model and Estimation
        3.2.1. Graph Terminology
        3.2.2. The Model
        3.2.3. Penalized Maximum Likelihood
        3.2.4. Equivalence Properties
   3.3. Greedy Search
        3.3.1. Score Decomposition
        3.3.2. Uniformly Random Restarts
        3.3.3. Greedy Equivalence Class Construction
        3.3.4. Implementation
   3.4. Empirical Results
        3.4.1. Causal Effects Discovery on Simulated Data
        3.4.2. Genomic Data
   3.5. Conclusions
   3.A. Distributional Equivalence
   3.B. Likelihood Separation
4. Conclusions and Future Work
   4.1. Specific Extensions for DAG learning
   4.2. Specific Extensions for BAP learning
A. 3-node DAGs
   A.1. Full
   A.2. Double Sink
   A.3. Double Source
   A.4. Chain
   A.5. Single Edge
   A.6. Empty
B. Equivalence Classes for 3-node BAPs
   B.1. Finding Equivalent BAPs
   B.2. Finding Equivalence Classes
C. Algorithms
   C.1. DAG Learning Algorithms
   C.2. BAP Learning Algorithms
Bibliography

Abstract

The field of statistical causal inference is concerned with estimating cause-effect relationships between some variables from i.i.d. observations. This is impossible in general; e.g., one cannot distinguish X → Y from X ← Y without making further assumptions. However, when more variables are involved or certain structural or distributional assumptions are made, causal inference becomes possible. This is relevant for applications where randomized experiments to test causal hypotheses are not feasible (econometrics) or where the large number of hypotheses requires some kind of pre-screening (genomics). This thesis is about structure learning, which means estimating the underlying causal graph from data.
Specifically, the focus of interest is on score-based methods, which assign every possible causal graph a numeric score (depending on the observed data) and then try to find the graph maximizing this score. The two main challenges are:

1. Defining a meaningful score that is maximized only by the true underlying graph and is at the same time easy to compute.

2. Solving the combinatorial optimization problem of maximizing the score over all possible graphs.

An important class of causal models are directed acyclic graphs (DAGs), where there are no cyclic relations and no hidden variables. DAGs (also known as Bayesian networks) encode conditional independencies in the joint distribution and are in general only identifiable up to their equivalence class (there is generally more than one DAG encoding the same set of conditional independencies). When the model is restricted to additive noise, the independence of the noise terms can be used to identify DAGs completely, unless the model is linear and Gaussian (in the continuous case). This thesis presents a score-based method for continuous identifiable additive noise models, based on penalized maximum likelihood estimation.
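The two points above can be illustrated with a minimal sketch (not the method developed in this thesis): score each candidate two-node graph by a penalized Gaussian likelihood (a BIC-type score) and pick the maximizer. In the linear Gaussian case the sketch also exhibits the non-identifiability mentioned above, since the models X → Y and Y → X attain exactly the same maximal likelihood. All function names and the toy data below are illustrative assumptions, not part of the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # toy data from the linear Gaussian SCM X -> Y

def gauss_loglik(resid):
    """Maximized Gaussian log-likelihood of residuals (variance set to its MLE)."""
    var = resid.var()  # ddof=0, i.e. the MLE of the variance
    return -0.5 * len(resid) * (np.log(2.0 * np.pi * var) + 1.0)

def bic_score(cause, effect, n_edges):
    """BIC-type score of the graph cause -> effect: Gaussian marginal of the
    cause plus a linear least-squares regression of the effect on the cause,
    minus a complexity penalty per edge."""
    c = cause - cause.mean()
    e = effect - effect.mean()
    b = (c @ e) / (c @ c)  # OLS slope
    return (gauss_loglik(c) + gauss_loglik(e - b * c)
            - 0.5 * n_edges * np.log(len(cause)))

s_xy = bic_score(x, y, n_edges=1)  # score of X -> Y
s_yx = bic_score(y, x, n_edges=1)  # score of Y -> X
s_empty = gauss_loglik(x - x.mean()) + gauss_loglik(y - y.mean())  # no edge

print(s_xy - s_empty > 0)  # the true edge is worth its penalty
print(abs(s_xy - s_yx))    # ~0: both directions score equally (non-identifiable)
```

Both factorizations of a bivariate Gaussian yield the same maximized likelihood, which is why the two directed graphs tie here; for nonlinear or non-Gaussian additive noise models this symmetry breaks, which is what makes score-based identification of the full DAG possible.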