
Model selection and model averaging PDF

330 Pages·2008·2.523 MB·English

Preview Model selection and model averaging

P1:SFK/UKS P2:SFK/UKS QC:SFK/UKS T1:SFK CUUK244-Claeskens 978-0-521-85225-8 January13,2008 8:41

Model Selection and Model Averaging

Given a data set, you can fit thousands of models at the push of a button, but how do you choose the best? With so many candidate models, overfitting is a real danger. Is the monkey who typed Hamlet actually a good writer?

Choosing a suitable model is central to all statistical work with data. Selecting the variables for use in a regression model is one important example. The past two decades have seen rapid advances both in our ability to fit models and in the theoretical understanding of model selection needed to harness this ability, yet this book is the first to provide a synthesis of research from this active field, and it contains much material previously difficult or impossible to find. In addition, it gives practical advice to the researcher confronted with conflicting results.

Model choice criteria are explained, discussed and compared, including Akaike’s information criterion AIC, the Bayesian information criterion BIC and the focused information criterion FIC. Importantly, the uncertainties involved with model selection are addressed, with discussions of frequentist and Bayesian methods. Finally, model averaging schemes, which combine the strength of several candidate models, are presented.

Worked examples on real data are complemented by derivations that provide deeper insight into the methodology. Exercises, both theoretical and data-based, guide the reader to familiarity with the methods. All data analyses are compatible with open-source R software, and data sets and R code are available from a companion website.

Gerda Claeskens is Professor in the OR & Business Statistics and Leuven Statistics Research Center at the Catholic University of Leuven, Belgium.

Nils Lid Hjort is Professor of Mathematical Statistics in the Department of Mathematics at the University of Oslo, Norway.
CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC MATHEMATICS

Editorial Board
R. Gill (Department of Mathematics, Utrecht University)
B. D. Ripley (Department of Statistics, University of Oxford)
S. Ross (Department of Industrial and Systems Engineering, University of Southern California)
B. W. Silverman (St. Peter’s College, Oxford)
M. Stein (Department of Statistics, University of Chicago)

This series of high-quality upper-division textbooks and expository monographs covers all aspects of stochastic applicable mathematics. The topics range from pure and applied statistics to probability theory, operations research, optimization, and mathematical programming. The books contain clear presentations of new developments in the field and also of the state of the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the books also contain applications and discussions of new techniques made possible by advances in computational practice.

Already published
1. Bootstrap Methods and Their Application, by A. C. Davison and D. V. Hinkley
2. Markov Chains, by J. Norris
3. Asymptotic Statistics, by A. W. van der Vaart
4. Wavelet Methods for Time Series Analysis, by Donald B. Percival and Andrew T. Walden
5. Bayesian Methods, by Thomas Leonard and John S. J. Hsu
6. Empirical Processes in M-Estimation, by Sara van de Geer
7. Numerical Methods of Statistics, by John F. Monahan
8. A User’s Guide to Measure Theoretic Probability, by David Pollard
9. The Estimation and Tracking of Frequency, by B. G. Quinn and E. J. Hannan
10. Data Analysis and Graphics using R, by John Maindonald and John Braun
11. Statistical Models, by A. C. Davison
12. Semiparametric Regression, by D. Ruppert, M. P. Wand, R. J. Carroll
13. Exercises in Probability, by Loic Chaumont and Marc Yor
14. Statistical Analysis of Stochastic Processes in Time, by J. K. Lindsey
15. Measure Theory and Filtering, by Lakhdar Aggoun and Robert Elliott
16. Essentials of Statistical Inference, by G. A. Young and R. L. Smith
17. Elements of Distribution Theory, by Thomas A. Severini
18. Statistical Mechanics of Disordered Systems, by Anton Bovier
20. Random Graph Dynamics, by Rick Durrett
21. Networks, by Peter Whittle
22. Saddlepoint Approximations with Applications, by Ronald W. Butler
23. Applied Asymptotics, by A. R. Brazzale, A. C. Davison and N. Reid
24. Random Networks for Communication, by Massimo Franceschetti and Ronald Meester
25. Design of Comparative Experiments, by R. A. Bailey

Model Selection and Model Averaging

Gerda Claeskens
K.U. Leuven

Nils Lid Hjort
University of Oslo

cambridge university press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521852258

© G. Claeskens and N. L. Hjort 2008

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2008

Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data

ISBN 978-0-521-85225-8 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To Maarten and Hanne-Sara – G. C.
To Jens, Audun and Stefan – N. L. H.
Contents

Preface
A guide to notation

1 Model selection: data examples and introduction
  1.1 Introduction
  1.2 Egyptian skull development
  1.3 Who wrote ‘The Quiet Don’?
  1.4 Survival data on primary biliary cirrhosis
  1.5 Low birthweight data
  1.6 Football match prediction
  1.7 Speedskating
  1.8 Preview of the following chapters
  1.9 Notes on the literature

2 Akaike’s information criterion
  2.1 Information criteria for balancing fit with complexity
  2.2 Maximum likelihood and the Kullback–Leibler distance
  2.3 AIC and the Kullback–Leibler distance
  2.4 Examples and illustrations
  2.5 Takeuchi’s model-robust information criterion
  2.6 Corrected AIC for linear regression and autoregressive time series
  2.7 AIC, corrected AIC and bootstrap-AIC for generalised linear models*
  2.8 Behaviour of AIC for moderately misspecified models*
  2.9 Cross-validation
  2.10 Outlier-robust methods
  2.11 Notes on the literature
  Exercises

3 The Bayesian information criterion
  3.1 Examples and illustrations of the BIC
  3.2 Derivation of the BIC
  3.3 Who wrote ‘The Quiet Don’?
  3.4 The BIC and AIC for hazard regression models
  3.5 The deviance information criterion
  3.6 Minimum description length
  3.7 Notes on the literature
  Exercises

4 A comparison of some selection methods
  4.1 Comparing selectors: consistency, efficiency and parsimony
  4.2 Prototype example: choosing between two normal models
  4.3 Strong consistency and the Hannan–Quinn criterion
  4.4 Mallows’s Cp and its outlier-robust versions
  4.5 Efficiency of a criterion
  4.6 Efficient order selection in an autoregressive process and the FPE
  4.7 Efficient selection of regression variables
  4.8 Rates of convergence*
  4.9 Taking the best of both worlds?*
  4.10 Notes on the literature
  Exercises

5 Bigger is not always better
  5.1 Some concrete examples
  5.2 Large-sample framework for the problem
  5.3 A precise tolerance limit
  5.4 Tolerance regions around parametric models
  5.5 Computing tolerance thresholds and radii
  5.6 How the 5000-m time influences the 10,000-m time
  5.7 Large-sample calculus for AIC
  5.8 Notes on the literature
  Exercises

6 The focussed information criterion
  6.1 Estimators and notation in submodels
  6.2 The focussed information criterion, FIC
  6.3 Limit distributions and mean squared errors in submodels
  6.4 A bias-modified FIC
  6.5 Calculation of the FIC
  6.6 Illustrations and applications
  6.7 Exact mean squared error calculations for linear regression*
  6.8 The FIC for Cox proportional hazard regression models
  6.9 Average-FIC
  6.10 A Bayesian focussed information criterion*
  6.11 Notes on the literature
  Exercises

7 Frequentist and Bayesian model averaging
  7.1 Estimators-post-selection
  7.2 Smooth AIC, smooth BIC and smooth FIC weights
  7.3 Distribution of model average estimators
  7.4 What goes wrong when we ignore model selection?
  7.5 Better confidence intervals
  7.6 Shrinkage, ridge estimation and thresholding
  7.7 Bayesian model averaging
  7.8 A frequentist view of Bayesian model averaging*
  7.9 Bayesian model selection with canonical normal priors*
  7.10 Notes on the literature
  Exercises

8 Lack-of-fit and goodness-of-fit tests
  8.1 The principle of order selection
  8.2 Asymptotic distribution of the order selection test
  8.3 The probability of overfitting*
  8.4 Score-based tests
  8.5 Two or more covariates
  8.6 Neyman’s smooth tests and generalisations
  8.7 A comparison between AIC and the BIC for model testing*
  8.8 Goodness-of-fit monitoring processes for regression models*
  8.9 Notes on the literature
  Exercises

9 Model selection and averaging schemes in action
  9.1 AIC and BIC selection for Egyptian skull development data
  9.2 Low birthweight data: FIC plots and FIC selection per stratum
  9.3 Survival data on PBC: FIC plots and FIC selection
  9.4 Speedskating data: averaging over covariance structure models
  Exercises

10 Further topics
  10.1 Model selection in mixed models
  10.2 Boundary parameters
  10.3 Finite-sample corrections*
  10.4 Model selection with missing data
  10.5 When p and q grow with n
  10.6 Notes on the literature

Overview of data examples
References
Author index
Subject index
