
Causal Models and Intelligent Data Management PDF

192 Pages·1999·12.987 MB·English

Preview Causal Models and Intelligent Data Management

Causal Models and Intelligent Data Management
Alex Gammerman (Ed.)
With 27 Figures and 13 Tables
Springer-Verlag Berlin Heidelberg GmbH

Editor
Alex Gammerman
Department of Computer Science, University of London, Royal Holloway, Surrey, TW20 0EX, UK
E-mail: [email protected]

Library of Congress Cataloging-in-Publication data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Causal models and intelligent data management / Alex Gammerman (ed.). - Berlin; Heidelberg; New York; Barcelona; Budapest; Hong Kong; London; Milan; Paris; Singapore; Tokyo: Springer, 1999

ISBN 978-3-642-63682-0
ISBN 978-3-642-58648-4 (eBook)
DOI 10.1007/978-3-642-58648-4

ACM Subject Classification (1998): I.2, H.3, J.2, F.4.1, E.1

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1999
Originally published by Springer-Verlag Berlin Heidelberg New York in 1999
Softcover reprint of the hardcover 1st edition 1999

The use of general descriptive names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typesetting: Camera-ready by the editor
Cover Design: design + production GmbH, Heidelberg
Printed on acid-free paper
SPIN 10706454 06/3142SR - 5 4 3 2 1 0

Preface

This book describes several major directions of research in Computer Science: the development of new intelligent data analysis methods and tools, and research in the field of inference, in particular causal inference. Data analysis and inference have traditionally been the research areas of statistics, but the need to store, manipulate and analyse very large scale high-dimensional data (such as in the human genome project) requires new methods and new tools. It requires the development of new types of database, new efficient algorithms, data structures, etc. - in effect new computational methods.

The main aim of this book is to draw attention to several recent developments on the boundary between computing and statistics, and to outline the new possibilities they provide for researchers in other disciplines as well as in industry and in commerce.

The first part of the book describes new research in the field of causal inference. Recent advances, and in particular the development over the last decade or so of powerful graphical models, provide a quantitative basis for describing causality. The use of quantitative models opens new and exciting perspectives for researchers from different disciplines such as economics, social sciences and epidemiology that frequently require the use of causal models. After many centuries of philosophical discussion but no great advances in this field, there is now an opportunity to use a well-developed language of graphical models for the analysis of causal relations and inference. The major contributors to the topic are Glenn Shafer, Judea Pearl, Phil Dawid and Nancy Cartwright. The reader may find it interesting to see a discussion between the authors on the topic of how much we can rely on the inference methods developed when we use counterfactuals.
The second part of the book is concerned with different intelligent tools and techniques for handling the "information-rich" environment. The contributors to this part present and discuss different techniques loosely connected under the general title of intelligent data analysis. For example, a newly developed method, called the Support Vector Machine, makes it possible to avoid the "curse of dimensionality" problem, and therefore to process very high-dimensional data sets. Naturally these types of method are vitally important for applications in computer vision, biological databases, data visualisation, etc., that often require the processing of millions or even billions of attributes.

The two parts of the book are of course just a "subset" of the current work in this field but in our view, they fairly reflect the major tendencies, and future directions.

The topics Causal Models and Intelligent Data Management were discussed at a series of research seminars organised by UNICOM and held in London in 1996, 1997 and 1998. Given the considerable interest expressed by researchers in the field, it was decided to publish a volume devoted to the current state of the art in these topics. We hope that this book will stimulate further exploration of modern methods of causal inference and intelligent analysis of multidimensional data.

Acknowledgments

This work was partially supported by EPSRC GR/L35812 and GR/M15972 grants, and also EU INTAS-93-725-ext grant.

We would like to thank the authors who worked hard to meet the deadlines, and our referees who helped to improve the quality of the papers, and make the book more accessible to a wider audience.

I am also grateful to my brother Misha who inspires me in all my work, and to my wife Sue and our children Yasha, Anya and Sonia for their patience.

February 1999
Alexander Gammerman
University of London

Table of Contents

Part I. Causal Models

1. Statistics, Causality, and Graphs
   J. Pearl
   1.1 A Century of Denial
   1.2 Researchers in Search of a Language
   1.3 Graphs as a Mathematical Language
   1.4 The Challenge
   References

2. Causal Conjecture
   Glenn Shafer
   2.1 Introduction
   2.2 Variables in a Probability Tree
   2.3 Causal Uncorrelatedness
   2.4 Three Positive Causal Relations
   2.5 Linear Sign
   2.6 Causal Uncorrelatedness Again
   2.7 Scored Sign
   2.8 Tracking
   References

3. Who Needs Counterfactuals?
   A. P. Dawid
   3.1 Introduction
       3.1.1 Decision-Theoretic Framework
       3.1.2 Unresponsiveness and Insensitivity
   3.2 Counterfactuals
   3.3 Problems of Causal Inference
       3.3.1 Causes of Effects
       3.3.2 Effects of Causes
   3.4 The Counterfactual Approach
       3.4.1 The Counterfactual Setting
       3.4.2 Counterfactual Assumptions
   3.5 Homogeneous Population
       3.5.1 Experiment and Inference
   3.6 Decision-Analytic Approach
   3.7 Sheep and Goats
       3.7.1 ACE
       3.7.2 Neyman and Fisher
       3.7.3 Bioequivalence
   3.8 Causes of Effects
       3.8.1 A Different Approach?
   3.9 Conclusion
   References

4. Causality: Independence and Determinism
   Nancy Cartwright
   4.1 Introduction
   4.2 Conclusion
   References

Part II. Intelligent Data Management

5. Intelligent Data Analysis and Deep Understanding
   David J. Hand
   5.1 Introduction
   5.2 The Question: The Strategy
   5.3 Diminishing Returns
   5.4 Conclusion
   References

6. Learning Algorithms in High Dimensional Spaces
   A. Gammerman and V. Vovk
   6.1 Introduction
   6.2 SVM for Pattern Recognition
       6.2.1 Dual Representation of Pattern Recognition
   6.3 SVM for Regression Estimation
       6.3.1 Dual Representation of Regression Estimation
       6.3.2 SVM Applet and Software
   6.4 Ridge Regression and Least Squares Methods in Dual Variables
   6.5 Transduction
   6.6 Conclusion
   References

7. Learning Linear Causal Models by MML Sampling
   Chris S. Wallace and Kevin B. Korb
   7.1 Introduction
   7.2 Minimum Message Length Principle
   7.3 The Model Space
   7.4 The Message Format
   7.5 Equivalence Sets
       7.5.1 Small Effects
       7.5.2 Partial Order Equivalence
       7.5.3 Structural Equivalence
       7.5.4 Explanation Length
   7.6 Finding Good Models
   7.7 Sampling Control
   7.8 By-products
   7.9 Prior Constraints
   7.10 Test Results
   7.11 Remarks on Equivalence
       7.11.1 Small Effect Equivalence
       7.11.2 Equivalence and Causality
   7.12 Conclusion
   References

8. Game Theory Approach to Multicommodity Flow Network Vulnerability Analysis
   Y. E. Malashenko, N. M. Novikova and O. A. Vorobeichikova
   References

9. On the Accuracy of Stochastic Complexity Approximations
   Petri Kontkanen, Petri Myllymäki, Tomi Silander, and Henry Tirri
   9.1 Introduction
   9.2 Stochastic Complexity and Its Applications
   9.3 Approximating the Stochastic Complexity in the Incomplete Data Case
   9.4 Empirical Results
       9.4.1 The Problem
       9.4.2 The Experimental Setting
       9.4.3 The Algorithms
       9.4.4 Results
   9.5 Conclusion
   References

10. AI Modelling for Data Quality Control
   Xiaohui Liu
   10.1 Introduction
   10.2 Statistical Approaches to Outliers
   10.3 Outlier Detection and Analysis
   10.4 Visual Field Test
   10.5 Outlier Detection
       10.5.1 Self-Organising Maps (SOM)
       10.5.2 Applications of SOM
   10.6 Outlier Analysis by Modelling 'Real Measurements'
   10.7 Outlier Analysis by Modelling Noisy Data
       10.7.1 Noise Model I: Noise Definition
       10.7.2 Noise Model II: Construction
       10.7.3 Noise Elimination
   10.8 Concluding Remarks
   References

11. New Directions in Text Categorization
   Richard S. Forsyth
   11.1 Introduction
   11.2 Machine Learning for Text Classification
   11.3 Radial Basis Functions and the Bard
   11.4 An Evolutionary Algorithm for Text Classification
   11.5 Text Classification by Vocabulary Richness
   11.6 Text Classification with Frequent Function Words
   11.7 Do Authors Have Semantic Signatures?
   11.8 Syntax with Style
   11.9 Intermezzo
   11.10 Some Methods of Textual Feature-Finding
       11.10.1 Progressive Pairwise Chunking
       11.10.2 Monte Carlo Feature Finding
       11.10.3 How Long Is a Piece of Substring?
       11.10.4 Comparative Testing
   11.11 Which Methods Work Best? - A Benchmarking Study
   11.12 Discussion
       11.12.1 In Praise of Semi-Crude Bayesianism
       11.12.2 What's So Special About Linguistic Data?
   References
