
Smoothing Techniques: With Implementation in S PDF

266 Pages·1991·4.955 MB·English

Preview Smoothing Techniques: With Implementation in S

Springer Series in Statistics

Advisors: J. Berger, S. Fienberg, J. Gani, K. Krickeberg, I. Olkin, B. Singer

Andrews/Herzberg: Data: A Collection of Problems from Many Fields for the Student and Research Worker.
Anscombe: Computing in Statistical Science through APL.
Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition.
Brémaud: Point Processes and Queues: Martingale Dynamics.
Brockwell/Davis: Time Series: Theory and Methods, 2nd edition.
Daley/Vere-Jones: An Introduction to the Theory of Point Processes.
Dzhaparidze: Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series.
Farrell: Multivariate Calculation.
Fienberg/Hoaglin/Kruskal/Tanur (Eds.): A Statistical Model: Frederick Mosteller's Contributions to Statistics, Science, and Public Policy.
Goodman/Kruskal: Measures of Association for Cross Classifications.
Härdle: Smoothing Techniques: With Implementation in S.
Hartigan: Bayes Theory.
Heyer: Theory of Statistical Experiments.
Jolliffe: Principal Component Analysis.
Kres: Statistical Tables for Multivariate Analysis.
Leadbetter/Lindgren/Rootzén: Extremes and Related Properties of Random Sequences and Processes.
Le Cam: Asymptotic Methods in Statistical Decision Theory.
Le Cam/Yang: Asymptotics in Statistics: Some Basic Concepts.
Manoukian: Modern Concepts and Theorems of Mathematical Statistics.
Miller, Jr.: Simultaneous Statistical Inference, 2nd edition.
Mosteller/Wallace: Applied Bayesian and Classical Inference: The Case of The Federalist Papers.
Pollard: Convergence of Stochastic Processes.
Pratt/Gibbons: Concepts of Nonparametric Theory.
Read/Cressie: Goodness-of-Fit Statistics for Discrete Multivariate Data.
Reiss: Approximate Distributions of Order Statistics: With Applications to Nonparametric Statistics.
Ross: Nonlinear Estimation.
Sachs: Applied Statistics: A Handbook of Techniques, 2nd edition.
Seneta: Non-Negative Matrices and Markov Chains.
Siegmund: Sequential Analysis: Tests and Confidence Intervals.
Tong: The Multivariate Normal Distribution.
Vapnik: Estimation of Dependences Based on Empirical Data.
West/Harrison: Bayesian Forecasting and Dynamic Models.
Wolter: Introduction to Variance Estimation.
Yaglom: Correlation Theory of Stationary and Related Random Functions I: Basic Results.
Yaglom: Correlation Theory of Stationary and Related Random Functions II: Supplementary Notes and References.

Wolfgang Härdle
Smoothing Techniques: With Implementation in S
With 87 Illustrations
Springer-Verlag: New York Berlin Heidelberg London Paris Tokyo Hong Kong Barcelona

Wolfgang Härdle
Center for Operations Research and Econometrics
Université Catholique de Louvain
B-1348 Louvain-la-Neuve, Belgium

Mathematics Subject Classification (1980): 62-07, 62G05, 62G25

Library of Congress Cataloging-in-Publication Data
Härdle, Wolfgang.
Smoothing techniques : with implementation in S / Wolfgang Härdle.
p. cm. Includes index.
1. Smoothing (Statistics)  2. Mathematical statistics--Data processing.  I. Title.
QA278.H348 1990    519.5--dc20    90-47545 CIP

Printed on acid-free paper.

© 1991 Springer-Verlag New York Inc.
Softcover reprint of the hardcover 1st edition 1991

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis.
Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Photocomposed using the author's TeX files. Printed and bound by R. R. Donnelley and Sons, Harrisonburg, Virginia.

9 8 7 6 5 4 3 2 1

ISBN-13: 978-1-4612-8768-1
e-ISBN-13: 978-1-4612-4432-5
DOI: 10.1007/978-1-4612-4432-5

For Renate, Nora, Viola, and Adrian

Preface

The aim of this book is two-pronged: it addresses two different groups of readers. One group is more theoretically oriented, interested in the mathematics of smoothing techniques. The other comprises statisticians leaning more towards the applied aspects of a new methodology. Since I want to address both sides with this book, this duality is also in the title: smoothing techniques are presented both in theory and in a modern computing environment on a workstation, namely the new S. I hope to satisfy both groups of interest by an almost parallel development of theory and practice in S.

It is my belief that this monograph can be read at different levels. In particular, it can be used in undergraduate teaching since most of it is at the introductory level. I wanted to offer a text that provides a nontechnical introduction to the area of nonparametric density and regression function estimation. However, a statistician more trained in parametric statistics will find many bridges between the two apparently different worlds presented here. Students interested in applying the methods may employ the included S code for the estimators and techniques shown. In particular, I have put emphasis on the computational aspects of the algorithms.

Smoothing in high dimensions confronts the problem of data sparseness. A principal feature of smoothing, the averaging of data points in a prescribed neighborhood, is not really practicable in dimensions greater than three if we have just 100 data points. Additive models point a way out of this dilemma but require, for their interactiveness and recursiveness, highly effective algorithms. For this purpose, the method of WARPing is described in great detail. WARPing does not mean that the data are warped. Rather, it is an abbreviation of Weighted Averaging using Rounded Points. This technique is based on first discretizing the data into a finite grid of bins and then smoothing the binned data. The computational effectiveness lies in the fact that the smoothing (weighted averaging) is now performed on a much smaller number of (rounded) data points.
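To make that description concrete, here is a minimal sketch of the WARPing idea in S-style code. It is not the book's own routine: the function name warp.density, the triangle weighting of neighboring bins, and the parameters delta (binwidth) and M (bandwidth expressed in bins, so that h = M * delta) are illustrative assumptions.

    ## Minimal WARPing sketch (illustrative, not the book's routine):
    ## bin the data, then smooth the bin counts instead of the raw points.
    warp.density <- function(x, delta, M)
    {
      n   <- length(x)
      h   <- M * delta                       # effective bandwidth
      bin <- floor(x / delta)                # step 1: round each point to a bin index
      lo  <- min(bin) - M                    # pad the grid so the estimate can spill
      hi  <- max(bin) + M                    #   over the edges of the data range
      cnt <- tabulate(bin - lo + 1, nbins = hi - lo + 1)   # counts of the binned data
      nb  <- length(cnt)
      fhat <- numeric(nb)
      for (i in -(M - 1):(M - 1)) {          # step 2: weighted average of neighboring bins
        j <- max(1, 1 + i):min(nb, nb + i)
        fhat[j] <- fhat[j] + (1 - abs(i) / M) * cnt[j - i]
      }
      list(x = ((lo:hi) + 0.5) * delta,      # bin centres
           y = fhat / (n * h))               # density estimate at the bin centres
    }

For instance, warp.density(rnorm(100), delta = 0.2, M = 5) returns bin centres and density values that can be passed directly to plot(). The point of the construction is that the loop runs over bins rather than over the raw observations, so the cost of the weighted averaging grows with the number of bins, not with the sample size.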
This text has evolved out of several classes that I taught at different universities: Bonn, Dortmund, and Santiago de Compostela. I would like to thank the students at these institutions for their cooperation and criticisms that formed and shaped the essence of the book. In particular, I would like to thank Klaus Breuer and Andreas Krause from the University of Dortmund for their careful typing and preparation of the figures and the S and C programs. This book would not have been possible without the generous support of Friedhelm Eicker.

The technique of WARPing was developed jointly with David Scott; his view on computer-assisted smoothing methods helped me in combining theory and S. Discussions with Steve Marron shaped my sometimes opaque opinion about selection of smoothing parameters. I would like to thank these people for their collaboration and the insight they provided into the field of smoothing techniques. Finally, I gratefully acknowledge the financial support of the Deutsche Forschungsgemeinschaft (Sonderforschungsbereich 303) and of the Center for Operations Research and Econometrics (CORE).

Louvain-la-Neuve, January 1990
Wolfgang Härdle

Contents

Preface  vii

I. Density Smoothing  1

1. The Histogram  3
   1.0 Introduction  3
   1.1 Definitions of the Histogram  4
       The Histogram as a Frequency Counting Curve  6
       The Histogram as a Maximum Likelihood Estimate  9
       Varying the Binwidth  10
   1.2 Statistics of the Histogram  10
   1.3 The Histogram in S  18
   1.4 Smoothing the Histogram by WARPing  27
       WARPing Algorithm  31
       WARPing in S  34
   Exercises  41

2. Kernel Density Estimation  43
   2.0 Introduction  43
   2.1 Definition of the Kernel Estimate  44
       Varying the Kernel  47
       Varying the Bandwidth  48
   2.2 Kernel Density Estimation in S  49
       Direct Algorithm  49
       Implementation in S  50
   2.3 Statistics of the Kernel Density  54
       Speed of Convergence  62
       Confidence Intervals and Confidence Bands  62
   2.4 Approximating Kernel Estimates by WARPing  67
   2.5 Comparison of Computational Costs  69
   2.6 Comparison of Smoothers Between Laboratories  72
       Keeping the Kernel Bias the Same  72
       Keeping the Support of the Kernel the Same  72
       Canonical Kernels  72
   2.7 Optimizing the Kernel Density  76
   2.8 Kernels of Higher Order  78
   2.9 Multivariate Kernel Density Estimation  79
       Same Bandwidth in Each Component  79
       Nonequal Bandwidths in Each Component  81
       A Matrix of Bandwidths  83
   Exercises  83

3. Further Density Estimators  85
   3.0 Introduction  85
   3.1 Orthogonal Series Estimators  85
   3.2 Maximum Penalized Likelihood Estimators  88
   Exercises  88

4. Bandwidth Selection in Practice  90
   4.0 Introduction  90
   4.1 Kernel Estimation Using Reference Distributions  90
   4.2 Plug-In Methods  92
   4.3 Cross-Validation  92
       4.3.1 Maximum Likelihood Cross-Validation  93
             Direct Algorithm  95
       4.3.2 Least-Squares Cross-Validation  95
             Direct Algorithm  99
       4.3.3 Biased Cross-Validation  100
             Algorithm  101
   4.4 Cross-Validation for WARPing Density Estimation  103
       4.4.1 Maximum Likelihood Cross-Validation  103
       4.4.2 Least-Squares Cross-Validation  103
             Algorithm  104
             Implementation in S  106
       4.4.3 Biased Cross-Validation  111
             Algorithm  113
             Implementation in S  114
   Exercises  118

II. Regression Smoothing  121

5. Nonparametric Regression  123
   5.0 Introduction  123
   5.1 Kernel Regression Smoothing  126
       5.1.1 The Nadaraya-Watson Estimator  126
             Direct Algorithm  127
             Implementation in S  128
       5.1.2 Statistics of the Nadaraya-Watson Estimator  132
       5.1.3 Confidence Intervals  135
       5.1.4 Fixed Design Model  136
       5.1.5 The WARPing Approximation  137
             Basic Algorithm  138
             Implementation in S  139
   5.2 k-Nearest Neighbor (k-NN)  143
       5.2.1 Definition of the k-NN Estimate  143
       5.2.2 Statistics of the k-NN Estimate  146
   5.3 Spline Smoothing  147
   Exercises  149

6. Bandwidth Selection  151
   6.0 Introduction  151
   6.1 Estimates of the Averaged Squared Error  152
       6.1.0 Introduction  152
       6.1.1 Penalizing Functions  156
       6.1.2 Cross-Validation  158
             Direct Algorithm  159
   6.2 Bandwidth Selection with WARPing  160
       Penalizing Functions  161
       Cross-Validation  162
       Basic Algorithm  162
       Implementation in S  163
       Applications  168
   Exercises  171

7. Simultaneous Error Bars  173
   7.1 Golden Section Bootstrap  173
       Algorithm for Golden Section Bootstrapping  174
       Implementation in S  177
   7.2 Construction of Confidence Intervals  186
   Exercises  195

Appendix  197
Tables  199
Solutions  207
List of Used S Commands  247
Symbols and Notation  250
References  253
Subject Index  257
