Texts in Statistical Science
Generalized Additive Models
An Introduction with R

CHAPMAN & HALL/CRC
Texts in Statistical Science Series

Series Editors
Bradley P. Carlin, University of Minnesota, USA
Chris Chatfield, University of Bath, UK
Martin Tanner, Northwestern University, USA
Jim Zidek, University of British Columbia, Canada

Analysis of Failure and Survival Data
  Peter J. Smith
The Analysis and Interpretation of Multivariate Data for Social Scientists
  David J. Bartholomew, Fiona Steele, Irini Moustaki, and Jane Galbraith
The Analysis of Time Series — An Introduction, Sixth Edition
  Chris Chatfield
Applied Bayesian Forecasting and Time Series Analysis
  A. Pole, M. West and J. Harrison
Applied Nonparametric Statistical Methods, Third Edition
  P. Sprent and N.C. Smeeton
Applied Statistics — Handbook of GENSTAT Analysis
  E.J. Snell and H. Simpson
Applied Statistics — Principles and Examples
  D.R. Cox and E.J. Snell
Bayes and Empirical Bayes Methods for Data Analysis, Second Edition
  Bradley P. Carlin and Thomas A. Louis
Bayesian Data Analysis, Second Edition
  Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin
Beyond ANOVA — Basics of Applied Statistics
  R.G. Miller, Jr.
Computer-Aided Multivariate Analysis, Fourth Edition
  A.A. Afifi and V.A. Clark
A Course in Categorical Data Analysis
  T. Leonard
A Course in Large Sample Theory
  T.S. Ferguson
Data Driven Statistical Methods
  P. Sprent
Decision Analysis — A Bayesian Approach
  J.Q. Smith
Elementary Applications of Probability Theory, Second Edition
  H.C. Tuckwell
Elements of Simulation
  B.J.T. Morgan
Epidemiology — Study Design and Data Analysis, Second Edition
  M. Woodward
Essential Statistics, Fourth Edition
  D.A.G. Rees
Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models
  Julian J. Faraway
A First Course in Linear Model Theory
  Nalini Ravishanker and Dipak K. Dey
Generalized Additive Models: An Introduction with R
  Simon N. Wood
Interpreting Data — A First Course in Statistics
  A.J.B. Anderson
An Introduction to Generalized Linear Models, Second Edition
  A.J. Dobson
Introduction to Multivariate Analysis
  C. Chatfield and A.J. Collins
Introduction to Optimization Methods and Their Applications in Statistics
  B.S. Everitt
Large Sample Methods in Statistics
  P.K. Sen and J. da Motta Singer
Linear Models with R
  Julian J. Faraway
Markov Chain Monte Carlo — Stochastic Simulation for Bayesian Inference
  D. Gamerman
Mathematical Statistics
  K. Knight
Modeling and Analysis of Stochastic Systems
  V. Kulkarni
Modelling Binary Data, Second Edition
  D. Collett
Modelling Survival Data in Medical Research, Second Edition
  D. Collett
Multivariate Analysis of Variance and Repeated Measures — A Practical Approach for Behavioural Scientists
  D.J. Hand and C.C. Taylor
Practical Data Analysis for Designed Experiments
  B.S. Yandell
Practical Longitudinal Data Analysis
  D.J. Hand and M. Crowder
Practical Statistics for Medical Research
  D.G. Altman
Probability — Methods and Measurement
  A. O’Hagan
Problem Solving — A Statistician’s Guide, Second Edition
  C. Chatfield
Randomization, Bootstrap and Monte Carlo Methods in Biology, Second Edition
  B.F.J. Manly
Readings in Decision Analysis
  S. French
Sampling Methodologies with Applications
  Poduri S.R.S. Rao
Statistical Analysis of Reliability Data
  M.J. Crowder, A.C. Kimber, T.J. Sweeting, and R.L. Smith
Statistical Methods for Spatial Data Analysis
  Oliver Schabenberger and Carol A. Gotway
Statistical Methods for SPC and TQM
  D. Bissell
Statistical Methods in Agriculture and Experimental Biology, Second Edition
  R. Mead, R.N. Curnow, and A.M. Hasted
Statistical Process Control — Theory and Practice, Third Edition
  G.B. Wetherill and D.W. Brown
Statistical Theory, Fourth Edition
  B.W. Lindgren
Statistics for Accountants
  S. Letchford
Statistics for Epidemiology
  Nicholas P. Jewell
Statistics for Technology — A Course in Applied Statistics, Third Edition
  C. Chatfield
Statistics in Engineering — A Practical Approach
  A.V. Metcalfe
Statistics in Research and Development, Second Edition
  R. Caulcutt
Survival Analysis Using S — Analysis of Time-to-Event Data
  Mara Tableman and Jong Sung Kim
The Theory of Linear Models
  B. Jørgensen

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2006 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Version Date: 2011928
International Standard Book Number-13: 978-1-4200-1040-4 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Texts in Statistical Science
Generalized Additive Models
An Introduction with R
Simon N. Wood
Boca Raton London New York
Contents

Preface xv
1 Linear Models 1
  1.1 A simple linear model 2
      Simple least squares estimation 3
    1.1.1 Sampling properties of $\hat\beta$ 3
    1.1.2 So how old is the universe? 5
    1.1.3 Adding a distributional assumption 7
      Testing hypotheses about $\beta$ 7
      Confidence intervals 9
  1.2 Linear models in general 10
  1.3 The theory of linear models 12
    1.3.1 Least squares estimation of $\beta$ 12
    1.3.2 The distribution of $\hat\beta$ 13
    1.3.3 $(\hat\beta_i - \beta_i)/\hat\sigma_{\hat\beta_i} \sim t_{n-p}$ 14
    1.3.4 F-ratio results 15
    1.3.5 The influence matrix 16
    1.3.6 The residuals, $\hat\epsilon$, and fitted values, $\hat\mu$ 16
    1.3.7 Results in terms of X 17
    1.3.8 The Gauss Markov Theorem: What’s special about least squares? 17
  1.4 The geometry of linear modelling 18
    1.4.1 Least squares 19
    1.4.2 Fitting by orthogonal decompositions 20
    1.4.3 Comparison of nested models 21
  1.5 Practical linear modelling 22
    1.5.1 Model fitting and model checking 23
    1.5.2 Model summary 28
    1.5.3 Model selection 30
    1.5.4 Another model selection example 31
      A follow-up 35
    1.5.5 Confidence intervals 36
    1.5.6 Prediction 36
  1.6 Practical modelling with factors 37
    1.6.1 Identifiability 38
    1.6.2 Multiple factors 39
    1.6.3 ‘Interactions’ of factors 40
    1.6.4 Using factor variables in R 41
  1.7 General linear model specification in R 44
  1.8 Further linear modelling theory 45
    1.8.1 Constraints I: General linear constraints 46
    1.8.2 Constraints II: ‘Contrasts’ and factor variables 46
    1.8.3 Likelihood 48
    1.8.4 Non-independent data with variable variance 49
    1.8.5 AIC and Mallow’s statistic 51
    1.8.6 Non-linear least squares 53
    1.8.7 Further reading 55
  1.9 Exercises 55
2 Generalized Linear Models 59
  2.1 The theory of GLMs 60
    2.1.1 The exponential family of distributions 62
    2.1.2 Fitting generalized linear models 63
    2.1.3 The IRLS objective is a quadratic approximation to the log-likelihood 66
    2.1.4 AIC for GLMs 68
    2.1.5 Large sample distribution of $\hat\beta$ 69
    2.1.6 Comparing models by hypothesis testing 69
      Deviance 70
      Model comparison with unknown $\phi$ 71
    2.1.7 $\hat\phi$ and Pearson’s statistic 71
    2.1.8 Canonical link functions 72
    2.1.9 Residuals 73
      Pearson residuals 73
      Deviance residuals 73
    2.1.10 Quasi-likelihood 74
  2.2 Geometry of GLMs 76
    2.2.1 The geometry of IRLS 77
    2.2.2 Geometry and IRLS convergence 78
  2.3 GLMs with R 81
    2.3.1 Binomial models and heart disease 81
    2.3.2 A Poisson regression epidemic model 87
    2.3.3 Log-linear models for categorical data 93
    2.3.4 Sole eggs in the Bristol channel 97
  2.4 Likelihood 102
    2.4.1 Invariance 102
    2.4.2 Properties of the expected log-likelihood 103
    2.4.3 Consistency 106
    2.4.4 Large sample distribution of $\hat\theta$ 107
    2.4.5 The generalized likelihood ratio test (GLRT) 108
    2.4.6 Derivation of $2\lambda \sim \chi^2_r$ under $H_0$ 109
    2.4.7 AIC in general 111
    2.4.8 Quasi-likelihood results 113
  2.5 Exercises 115