Model selection and model averaging PDF

332 Pages·2010·2.615 MB·English

Preview Model selection and model averaging

Model Selection and Model Averaging

Given a data set, you can fit thousands of models at the push of a button, but how do you choose the best? With so many candidate models, overfitting is a real danger. Is the monkey who typed Hamlet actually a good writer?

Choosing a suitable model is central to all statistical work with data. Selecting the variables for use in a regression model is one important example. The past two decades have seen rapid advances both in our ability to fit models and in the theoretical understanding of model selection needed to harness this ability, yet this book is the first to provide a synthesis of research from this active field, and it contains much material previously difficult or impossible to find. In addition, it gives practical advice to the researcher confronted with conflicting results.

Model choice criteria are explained, discussed and compared, including Akaike’s information criterion AIC, the Bayesian information criterion BIC and the focused information criterion FIC. Importantly, the uncertainties involved with model selection are addressed, with discussions of frequentist and Bayesian methods. Finally, model averaging schemes, which combine the strengths of several candidate models, are presented.

Worked examples on real data are complemented by derivations that provide deeper insight into the methodology. Exercises, both theoretical and data-based, guide the reader to familiarity with the methods. All data analyses are compatible with open-source R software, and data sets and R code are available from a companion website.

Gerda Claeskens is Professor in the OR & Business Statistics and Leuven Statistics Research Center at the Katholieke Universiteit Leuven, Belgium.

Nils Lid Hjort is Professor of Mathematical Statistics in the Department of Mathematics at the University of Oslo, Norway.
CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC MATHEMATICS

Editorial Board
R. Gill (Department of Mathematics, Utrecht University)
B. D. Ripley (Department of Statistics, University of Oxford)
S. Ross (Department of Industrial and Systems Engineering, University of Southern California)
B. W. Silverman (St. Peter’s College, Oxford)
M. Stein (Department of Statistics, University of Chicago)

This series of high-quality upper-division textbooks and expository monographs covers all aspects of stochastic applicable mathematics. The topics range from pure and applied statistics to probability theory, operations research, optimization, and mathematical programming. The books contain clear presentations of new developments in the field and also of the state of the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the books also contain applications and discussions of new techniques made possible by advances in computational practice.

Already published
1. Bootstrap Methods and Their Application, by A. C. Davison and D. V. Hinkley
2. Markov Chains, by J. Norris
3. Asymptotic Statistics, by A. W. van der Vaart
4. Wavelet Methods for Time Series Analysis, by Donald B. Percival and Andrew T. Walden
5. Bayesian Methods, by Thomas Leonard and John S. J. Hsu
6. Empirical Processes in M-Estimation, by Sara van de Geer
7. Numerical Methods of Statistics, by John F. Monahan
8. A User’s Guide to Measure Theoretic Probability, by David Pollard
9. The Estimation and Tracking of Frequency, by B. G. Quinn and E. J. Hannan
10. Data Analysis and Graphics using R, by John Maindonald and John Braun
11. Statistical Models, by A. C. Davison
12. Semiparametric Regression, by D. Ruppert, M. P. Wand, R. J. Carroll
13. Exercises in Probability, by Loic Chaumont and Marc Yor
14. Statistical Analysis of Stochastic Processes in Time, by J. K. Lindsey
15. Measure Theory and Filtering, by Lakhdar Aggoun and Robert Elliott
16. Essentials of Statistical Inference, by G. A. Young and R. L. Smith
17. Elements of Distribution Theory, by Thomas A. Severini
18. Statistical Mechanics of Disordered Systems, by Anton Bovier
19. The Coordinate-Free Approach to Linear Models, by Michael J. Wichura
20. Random Graph Dynamics, by Rick Durrett
21. Networks, by Peter Whittle
22. Saddlepoint Approximations with Applications, by Ronald W. Butler
23. Applied Asymptotics, by A. R. Brazzale, A. C. Davison and N. Reid
24. Random Networks for Communication, by Massimo Franceschetti and Ronald Meester
25. Design of Comparative Experiments, by R. A. Bailey

Model Selection and Model Averaging

Gerda Claeskens
K.U. Leuven

Nils Lid Hjort
University of Oslo

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521852258

© G. Claeskens and N. L. Hjort 2008

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2008

ISBN-13 978-0-511-42410-6 eBook (NetLibrary)
ISBN-13 978-0-521-85225-8 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To Maarten and Hanne-Sara – G.C.
To Jens, Audun and Stefan – N.L.H.

Contents

Preface
A guide to notation

1 Model selection: data examples and introduction
  1.1 Introduction
  1.2 Egyptian skull development
  1.3 Who wrote ‘The Quiet Don’?
  1.4 Survival data on primary biliary cirrhosis
  1.5 Low birthweight data
  1.6 Football match prediction
  1.7 Speedskating
  1.8 Preview of the following chapters
  1.9 Notes on the literature

2 Akaike’s information criterion
  2.1 Information criteria for balancing fit with complexity
  2.2 Maximum likelihood and the Kullback–Leibler distance
  2.3 AIC and the Kullback–Leibler distance
  2.4 Examples and illustrations
  2.5 Takeuchi’s model-robust information criterion
  2.6 Corrected AIC for linear regression and autoregressive time series
  2.7 AIC, corrected AIC and bootstrap-AIC for generalised linear models*
  2.8 Behaviour of AIC for moderately misspecified models*
  2.9 Cross-validation
  2.10 Outlier-robust methods
  2.11 Notes on the literature
  Exercises

3 The Bayesian information criterion
  3.1 Examples and illustrations of the BIC
  3.2 Derivation of the BIC
  3.3 Who wrote ‘The Quiet Don’?
  3.4 The BIC and AIC for hazard regression models
  3.5 The deviance information criterion
  3.6 Minimum description length
  3.7 Notes on the literature
  Exercises

4 A comparison of some selection methods
  4.1 Comparing selectors: consistency, efficiency and parsimony
  4.2 Prototype example: choosing between two normal models
  4.3 Strong consistency and the Hannan–Quinn criterion
  4.4 Mallows’s C_p and its outlier-robust versions
  4.5 Efficiency of a criterion
  4.6 Efficient order selection in an autoregressive process and the FPE
  4.7 Efficient selection of regression variables
  4.8 Rates of convergence*
  4.9 Taking the best of both worlds?*
  4.10 Notes on the literature
  Exercises

5 Bigger is not always better
  5.1 Some concrete examples
  5.2 Large-sample framework for the problem
  5.3 A precise tolerance limit
  5.4 Tolerance regions around parametric models
  5.5 Computing tolerance thresholds and radii
  5.6 How the 5000-m time influences the 10,000-m time
  5.7 Large-sample calculus for AIC
  5.8 Notes on the literature
  Exercises

6 The focussed information criterion
  6.1 Estimators and notation in submodels
  6.2 The focussed information criterion, FIC
  6.3 Limit distributions and mean squared errors in submodels
  6.4 A bias-modified FIC
  6.5 Calculation of the FIC
  6.6 Illustrations and applications
  6.7 Exact mean squared error calculations for linear regression*
