Springer Series in Statistics

Advisors: P. Bickel, P. Diggle, S. Fienberg, U. Gather, I. Olkin, S. Zeger

For further volumes: http://www.springer.com/series/692

P.P.B. Eggermont · V.N. LaRiccia

Maximum Penalized Likelihood Estimation
Volume II: Regression

P.P.B. Eggermont
Department of Food and Resource Economics
University of Delaware
Newark, DE 19716
USA
[email protected]

V.N. LaRiccia
Department of Food and Resource Economics
University of Delaware
Newark, DE 19716
USA
[email protected]

ISBN 978-0-387-40267-3
e-ISBN 978-0-387-68902-9
DOI 10.1007/b12285
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2001020450

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To Jeanne and Tyler
To Cindy

Preface

This is the second volume of a text on the theory and practice of maximum penalized likelihood estimation. It is intended for graduate students in statistics, operations research, and applied mathematics, as well as researchers and practitioners in the field. The present volume was supposed to have a short chapter on nonparametric regression but was intended to deal mainly with inverse problems. However, the chapter on nonparametric regression kept growing to the point where it is now the only topic covered. Perhaps there will be a Volume III.
It might even deal with inverse problems. But for now we are happy to have finished Volume II.

The emphasis in this volume is on smoothing splines of arbitrary order, but other estimators (kernels, local and global polynomials) pass review as well. We study smoothing splines and local polynomials in the context of reproducing kernel Hilbert spaces. The connection between smoothing splines and reproducing kernels is of course well-known. The new twist is that letting the inner product depend on the smoothing parameter opens up new possibilities: It leads to asymptotically equivalent reproducing kernel estimators (without qualifications) and thence, via uniform error bounds for kernel estimators, to uniform error bounds for smoothing splines and, via strong approximations, to confidence bands for the unknown regression function. It came as somewhat of a surprise that reproducing kernel Hilbert space ideas also proved useful in the study of local polynomial estimators. Throughout the text, the reproducing kernel Hilbert space approach is used as an “elementary” alternative to methods of metric entropy. It reaches its limits with least-absolute-deviations splines, where it still works, and total-variation penalization of nonparametric least-squares problems, where we miss the optimal convergence rate by a power of log n (for sample size n).

The reason for studying smoothing splines of arbitrary order is that one wants to use them for data analysis. The first question then is whether one can actually compute them. In practice, the usual scheme based on spline interpolation is useful for cubic smoothing splines only. For splines of arbitrary order, the Kalman filter is the bee’s knees. This, in fact, is the traditional meeting ground between smoothing splines and reproducing kernel Hilbert spaces, by way of the identification of the standard smoothing problem for Gaussian processes having continuous sample paths with “generalized” smoothing spline estimation in nonparametric regression problems.
We give a detailed account, culminating in the Kalman filter algorithm for spline smoothing.

The second question is how well smoothing splines of arbitrary order work. We discuss simulation results for smoothing splines and local and global polynomials for a variety of test problems. (We avoided the usual pathological examples but did include some nonsmooth examples based on the Cantor function.) We also show some results on confidence bands for the unknown regression function based on undersmoothed quintic smoothing splines with remarkably good coverage probabilities.

Acknowledgments

When we wrote the preface for Volume I, we had barely moved to our new department, Food and Resource Economics, in the College of Agriculture and Natural Resources. Having spent the last nine years here, the following assessment is time-tested: Even without the fringe benefits of easy parking, seeing the U.S. Olympic skating team practice, and enjoying (the first author, anyway) the smell of cows in the morning, we would be fortunate to be in our new surroundings. We thank the chair of the department, Tom Ilvento, for his inspiring support of the Statistics Program, and the faculty and staff of FREC for their hospitality.

As with any intellectual endeavor, we were influenced by many people and we thank them all. However, six individuals must be explicitly mentioned: first of all, Zuhair Nashed, who keeps our interest in inverse problems alive; Paul Deheuvels, for his continuing interest and encouragement in our project; Luc Devroye, despite the fact that this volume hardly mentions density estimation; David Mason, whose influence on critical parts of the manuscript speaks for itself; Randy Eubank, for his enthusiastic support of our project and subtly getting us to study the extremely effective Kalman filter; and, finally, John Kimmel, editor extraordinaire, for his continued reminders that we repeatedly promised him we would be done by next Christmas. This time around, we almost made that deadline.

Newark, Delaware
Paul Eggermont and Vince LaRiccia
January 14, 2009

Contents

Preface
Notations, Acronyms and Conventions

12. Nonparametric Regression
    1. What and why?
    2. Maximum penalized likelihood estimation
    3. Measuring the accuracy and convergence rates
    4. Smoothing splines and reproducing kernels
    5. The local error in local polynomial estimation
    6. Computation and the Bayesian view of splines
    7. Smoothing parameter selection
    8. Strong approximation and confidence bands
    9. Additional notes and comments

13. Smoothing Splines
    1. Introduction
    2. Reproducing kernel Hilbert spaces
    3. Existence and uniqueness of the smoothing spline
    4. Mean integrated squared error
    5. Boundary corrections
    6. Relaxed boundary splines
    7. Existence, uniqueness, and rates
    8. Partially linear models
    9. Estimating derivatives
    10. Additional notes and comments

14. Kernel Estimators
    1. Introduction
    2. Mean integrated squared error
    3. Boundary kernels
    4. Asymptotic boundary behavior
    5. Uniform error bounds for kernel estimators
    6. Random designs and smoothing parameters
    7. Uniform error bounds for smoothing splines
    8. Additional notes and comments

15. Sieves
    1. Introduction
    2. Polynomials
    3. Estimating derivatives
    4. Trigonometric polynomials
    5. Natural splines
    6. Piecewise polynomials and locally adaptive designs
    7. Additional notes and comments

16. Local Polynomial Estimators
    1. Introduction
    2. Pointwise versus local error
    3. Decoupling the two sources of randomness
    4. The local bias and variance after decoupling
    5. Expected pointwise and global error bounds
    6. The asymptotic behavior of the error
    7. Refined asymptotic behavior of the bias
    8. Uniform error bounds for local polynomials
    9. Estimating derivatives
    10. Nadaraya-Watson estimators
    11. Additional notes and comments

17. Other Nonparametric Regression Problems
    1. Introduction
    2. Functions of bounded variation
    3. Total-variation roughness penalization
    4. Least-absolute-deviations splines: Generalities
    5. Least-absolute-deviations splines: Error bounds
    6. Reproducing kernel Hilbert space tricks
    7. Heteroscedastic errors and binary regression
    8. Additional notes and comments

18. Smoothing Parameter Selection
    1. Notions of optimality
    2. Mallows’ estimator and zero-trace estimators
    3. Leave-one-out estimators and cross-validation
    4. Coordinate-free cross-validation (GCV)
    5. Derivatives and smooth estimation
    6. Akaike’s optimality criterion
    7. Heterogeneity
    8. Local polynomials
    9. Pointwise versus local error, again
    10. Additional notes and comments

19. Computing Nonparametric Estimators
    1. Introduction
    2. Cubic splines
    3. Cubic smoothing splines
    4. Relaxed boundary cubic splines
    5. Higher-order smoothing splines
    6. Other spline estimators
    7. Active constraint set methods
    8. Polynomials and local polynomials
    9. Additional notes and comments

20. Kalman Filtering for Spline Smoothing
    1. And now, something completely different
    2. A simple example
    3. Stochastic processes and reproducing kernels
    4. Autoregressive models
    5. State-space models
    6. Kalman filtering for state-space models
    7. Cholesky factorization via the Kalman filter
    8. Diffuse initial states
    9. Spline smoothing with the Kalman filter
    10. Notes and comments

21. Equivalent Kernels for Smoothing Splines
    1. Random designs
    2. The reproducing kernels
    3. Reproducing kernel density estimation
    4. L2 error bounds
    5. Equivalent kernels and uniform error bounds
    6. The reproducing kernels are convolution-like
    7. Convolution-like operators on Lp spaces
    8. Boundary behavior and interior equivalence
    9. The equivalent Nadaraya-Watson estimator
    10. Additional notes and comments

22. Strong Approximation and Confidence Bands
    1. Introduction
    2. Normal approximation of iid noise
    3. Confidence bands for smoothing splines
    4. Normal approximation in the general case
    5. Asymptotic distribution theory for uniform designs
    6. Proofs of the various steps
    7. Asymptotic 100% confidence bands
    8. Additional notes and comments

23. Nonparametric Regression in Action
    1. Introduction
    2. Smoothing splines
    3. Local polynomials
    4. Smoothing splines versus local polynomials
    5. Confidence bands
    6. The Wood Thrush Data Set
    7. The Wastewater Data Set
    8. Additional notes and comments

Appendices

4. Bernstein’s inequality
5. The TVDUAL implementation
6. Solutions to Some Critical Exercises
    1. Solutions to Chapter 13: Smoothing Splines
    2. Solutions to Chapter 14: Kernel Estimators
    3. Solutions to Chapter 17: Other Estimators
    4. Solutions to Chapter 18: Smoothing Parameters
    5. Solutions to Chapter 19: Computing
    6. Solutions to Chapter 20: Kalman Filtering
    7. Solutions to Chapter 21: Equivalent Kernels

References
Author Index
Subject Index