
Robust Diagnostic Regression Analysis PDF

341 Pages·2000·6.199 MB·English

Preview Robust Diagnostic Regression Analysis

Springer Series in Statistics

Advisors: P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, S. Zeger

Springer Science+Business Media, LLC

Andersen/Borgan/Gill/Keiding: Statistical Models Based on Counting Processes.
Atkinson/Riani: Robust Diagnostic Regression Analysis.
Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition.
Bolfarine/Zacks: Prediction Theory for Finite Populations.
Borg/Groenen: Modern Multidimensional Scaling: Theory and Applications.
Brockwell/Davis: Time Series: Theory and Methods, 2nd edition.
Chen/Shao/Ibrahim: Monte Carlo Methods in Bayesian Computation.
Efromovich: Nonparametric Curve Estimation: Methods, Theory, and Applications.
Fahrmeir/Tutz: Multivariate Statistical Modelling Based on Generalized Linear Models.
Farebrother: Fitting Linear Relationships: A History of the Calculus of Observations 1750-1900.
Federer: Statistical Design and Analysis for Intercropping Experiments, Volume I: Two Crops.
Federer: Statistical Design and Analysis for Intercropping Experiments, Volume II: Three or More Crops.
Fienberg/Hoaglin/Kruskal/Tanur (Eds.): A Statistical Model: Frederick Mosteller's Contributions to Statistics, Science and Public Policy.
Fisher/Sen: The Collected Works of Wassily Hoeffding.
Good: Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses, 2nd edition.
Gourieroux: ARCH Models and Financial Applications.
Grandell: Aspects of Risk Theory.
Haberman: Advanced Statistics, Volume I: Description of Populations.
Hall: The Bootstrap and Edgeworth Expansion.
Härdle: Smoothing Techniques: With Implementation in S.
Hart: Nonparametric Smoothing and Lack-of-Fit Tests.
Hartigan: Bayes Theory.
Hedayat/Sloane/Stufken: Orthogonal Arrays: Theory and Applications.
Heyde: Quasi-Likelihood and its Application: A General Approach to Optimal Parameter Estimation.
Huet/Bouvier/Gruet/Jolivet: Statistical Tools for Nonlinear Regression: A Practical Guide with S-PLUS Examples.
Kolen/Brennan: Test Equating: Methods and Practices.
Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume I.
Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume II.
Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume III.
Küchler/Sørensen: Exponential Families of Stochastic Processes.
Le Cam: Asymptotic Methods in Statistical Decision Theory.
Le Cam/Yang: Asymptotics in Statistics: Some Basic Concepts, 2nd edition.
Longford: Models for Uncertainty in Educational Testing.
Miller, Jr.: Simultaneous Statistical Inference, 2nd edition.
Mosteller/Wallace: Applied Bayesian and Classical Inference: The Case of the Federalist Papers.
Parzen/Tanabe/Kitagawa: Selected Papers of Hirotugu Akaike.
Politis/Romano/Wolf: Subsampling.
(continued after index)

Anthony Atkinson
Marco Riani

Robust Diagnostic Regression Analysis

With 192 Illustrations

Springer

Anthony Atkinson
Department of Statistics
London School of Economics
London WC2A 2AE
UK
[email protected]

Marco Riani
Dipartimento di Economia (Sezione di Statistica)
Università di Parma
43100 Parma
Italy
mriani@unipr.it

Library of Congress Cataloging-in-Publication Data
Atkinson, A.C. (Anthony Curtis)
Robust diagnostic regression analysis / Anthony Atkinson, Marco Riani.
p. cm.-(Springer texts in statistics)
Includes bibliographical references and indexes.
ISBN 978-1-4612-7027-0 ISBN 978-1-4612-1160-0 (eBook)
DOI 10.1007/978-1-4612-1160-0
1. Regression analysis. 2. Robust statistics. I. Riani, Marco. II. Title. III. Series.
QA278.2.A85 2000 519.5'36-dc21 00-026154

Printed on acid-free paper.

© 2000 Springer Science+Business Media New York
Originally published by Springer-Verlag New York, Inc. in 2000
Softcover reprint of the hardcover 1st edition 2000

All rights reserved.
This work may not be translated or copied in whole or in part without the written permission of the publisher Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may be accordingly used freely by anyone.

Production managed by A. Orrantia; manufacturing supervised by Jerome Basma. Electronic copy prepared from the authors' LaTeX2e files by Bartlett Press, Inc., Marietta, GA.

9 8 7 6 5 4 3 2 1

ISBN 978-1-4612-7027-0

Dla Basi a Fabia

Preface

This book is about using graphs to understand the relationship between a regression model and the data to which it is fitted. Because of the way in which models are fitted, for example, by least squares, we can lose information about the effect of individual observations on inferences about the form and parameters of the model. The methods developed in this book reveal how the fitted regression model depends on individual observations and on groups of observations. Robust procedures can sometimes reveal this structure, but downweight or discard some observations. The novelty in our book is to combine robustness and a "forward" search through the data with regression diagnostics and computer graphics. We provide easily understood plots that use information from the whole sample to display the effect of each observation on a wide variety of aspects of the fitted model.

This bald statement of the contents of our book masks the excitement we feel about the methods we have developed based on the forward search.
We are continuously amazed, each time we analyze a new set of data, by the amount of information the plots generate and the insights they provide. We believe our book uses comparatively elementary methods to move regression in a completely new and useful direction.

We have written the book to be accessible to students and users of statistical methods, as well as for professional statisticians. Because statistics requires mathematics, computing and data, we give an elementary outline of the theory behind the statistical methods we employ. The programming was done in GAUSS, with graphs for publication prepared in S-Plus. We are now developing S-Plus functions and have set up a web site http://stat.econ.unipr.it/riani/ar which includes programs and the data. As our work on the forward search grows, we hope that the material on the website will grow in a similar manner.

The first chapter of this book contains three examples of the use of the forward search in regression. We show how single and multiple outliers can be identified and their effect on parameter estimates determined. The second chapter gives the theory of regression, including deletion diagnostics, and describes the forward search and its properties. Chapter Three returns to regression and analyzes four further examples. In three of these a better model is obtained if the response is transformed, perhaps by regression with the logarithm of the response, rather than with the response itself. The transformation of a response to normality is the subject of Chapter Four which includes both theory and examples of data analysis. We use this chapter to illustrate the deleterious effect of outliers on methods based on deletion of single observations. Chapter Four ends with an example of transforming both sides of a regression model. This is one example of the nonlinear models that are the subject of Chapter Five. The sixth chapter is concerned with generalized linear models.
Our methods are thus extended to the analysis of data from contingency tables and to binary data.

The theoretical material is complemented by exercises. We give references to the statistical literature, but believe that our book is reasonably self contained. It should serve as a textbook for courses on applied regression and generalized linear models, even if the emphasis in such courses is not on the forward search.

This book is concerned with data in which the observations are independent and in which the response is univariate. A companion volume, coauthored with Andrea Cerioli and tentatively called Robust Diagnostic Data Analysis, is under active preparation. This will cover topics in the analysis of multivariate data including regression, transformations, principal components analysis, discriminant analysis, clustering and the analysis of spatial data.

The writing of this book, and the research on which it is based, has been both complicated and enriched by the fact that the authors are separated by half of Europe. Our travel has been supported by the Italian Ministry for Scientific Research, by the Staff Research Fund of the London School of Economics and, also at the LSE, by STICERD (The Suntory and Toyota International Centres for Economics and Related Disciplines). The development of S-Plus functions was supported by Doug Martin of MathSoft Inc. Kjell Konis helped greatly with the programming.

We are grateful to our numerous colleagues for their help in many ways. In England we especially thank Dr Martin Knott at the London School of Economics, who has been an unfailingly courteous source of help with both statistics and computing. In Italy we thank Professor Sergio Zani of the University of Parma for his insightful comments and continuing support and Dr Aldo Corbellini of the same university who has devoted time, energy and skill to the creation of our web site.
Luigi Grossi and Fabrizio Laurini read the text with great care and found some mistakes. We would like to be told about any others.

Anthony Atkinson's visits to Italy have been enriched by the warm hospitality of Giuseppina and Luigi Riani. To all our gratitude and thanks.

Anthony Atkinson
a.c.atkinson@lse.ac.uk
www.lse.ac.uk/experts/

Marco Riani
mriani@unipr.it
stat.econ.unipr.it/riani

London and Parma, February 2000

Contents

Preface

1 Some Regression Examples
  1.1 Influence and Outliers
  1.2 Three Examples
    1.2.1 Forbes' Data
    1.2.2 Multiple Regression Data
    1.2.3 Wool Data
  1.3 Checking and Building Models

2 Regression and the Forward Search
  2.1 Least Squares
    2.1.1 Parameter Estimates
    2.1.2 Residuals and Leverage
    2.1.3 Formal Tests
  2.2 Added Variables
  2.3 Deletion Diagnostics
    2.3.1 The Algebra of Deletion
    2.3.2 Deletion Residuals
    2.3.3 Cook's Distance
  2.4 The Mean Shift Outlier Model
  2.5 Simulation Envelopes
  2.6 The Forward Search
    2.6.1 General Principles
    2.6.2 Step 1: Choice of the Initial Subset
    2.6.3 Step 2: Adding Observations During the Forward Search
    2.6.4 Step 3: Monitoring the Search
    2.6.5 Forward Deletion Formulae
  2.7 Further Reading
  2.8 Exercises
  2.9 Solutions

3 Regression
  3.1 Hawkins' Data
  3.2 Stack Loss Data
  3.3 Salinity Data
  3.4 Ozone Data
  3.5 Exercises
  3.6 Solutions

4 Transformations to Normality
  4.1 Background
  4.2 Transformations in Regression
    4.2.1 Transformation of the Response
    4.2.2 Graphics for Transformations
    4.2.3 Transformation of an Explanatory Variable
  4.3 Wool Data
  4.4 Poison Data
  4.5 Modified Poison Data
  4.6 Doubly Modified Poison Data: An Example of Masking
  4.7 Multiply Modified Poison Data - More Masking
    4.7.1 A Diagnostic Analysis
    4.7.2 A Forward Analysis
    4.7.3 Other Graphics for Transformations
  4.8 Ozone Data
  4.9 Stack Loss Data
  4.10 Mussels' Muscles: Transformation of the Response
  4.11 Transforming Both Sides of a Model
  4.12 Shortleaf Pine
  4.13 Other Transformations and Further Reading
  4.14 Exercises
  4.15 Solutions

5 Nonlinear Least Squares
  5.1 Background
    5.1.1 Nonlinear Models
    5.1.2 Curvature
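
The three steps listed under Section 2.6 of the contents (choice of the initial subset, adding observations, monitoring the search) can be illustrated with a small sketch. This is not the authors' GAUSS or S-Plus code, which the preview does not show. It is a toy version for a straight-line model in plain Python, and every name in it (`fit_line`, `squared_residuals`, `forward_search`) is hypothetical. Two simplifying assumptions are made for illustration only: the initial subset is the pair of observations whose exact line has the smallest median squared residual over all the data, and each later subset consists of the observations with the smallest squared residuals from the current least squares fit.

```python
from itertools import combinations
from statistics import median


def fit_line(points):
    """Least-squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def squared_residuals(data, intercept, slope):
    """Squared residuals of every observation from the given line."""
    return [(y - intercept - slope * x) ** 2 for x, y in data]


def forward_search(data):
    """Run a toy forward search; return one record per subset size m.

    Each record is (m, subset_indices, intercept, slope).
    Assumes all x values are distinct, so every fit is well defined.
    """
    n = len(data)

    # Step 1 (assumed rule): pick the pair of observations whose exact
    # line gives the smallest median squared residual over all n points.
    best_crit, subset = None, None
    for pair in combinations(range(n), 2):
        a, b = fit_line([data[i] for i in pair])
        crit = median(squared_residuals(data, a, b))
        if best_crit is None or crit < best_crit:
            best_crit, subset = crit, list(pair)

    records = []
    for m in range(2, n + 1):
        # Step 3: monitor the fit obtained from the current subset.
        a, b = fit_line([data[i] for i in subset])
        records.append((m, list(subset), a, b))
        if m < n:
            # Step 2: the next subset holds the m + 1 observations with
            # the smallest squared residuals from the current fit.
            r2 = squared_residuals(data, a, b)
            order = sorted(range(n), key=lambda i: r2[i])
            subset = sorted(order[: m + 1])
    return records
```

Called on points that lie on a straight line apart from one gross outlier, the sketch keeps the outlier out of the subset until the final step, and the monitored slope changes sharply only when it enters. Plotting the recorded estimates against m gives the kind of display the preface describes, in which the effect of individual observations on the fitted model becomes visible.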
