Springer Series in Statistics

Advisors: P. Bickel, P. Diggle, S. Fienberg, U. Gather, I. Olkin, S. Zeger

Alho/Spencer: Statistical Demography and Forecasting
Andersen/Borgan/Gill/Keiding: Statistical Models Based on Counting Processes
Atkinson/Riani: Robust Diagnostic Regression Analysis
Atkinson/Riani/Cerioli: Exploring Multivariate Data with the Forward Search
Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition
Borg/Groenen: Modern Multidimensional Scaling: Theory and Applications, 2nd edition
Brockwell/Davis: Time Series: Theory and Methods, 2nd edition
Bucklew: Introduction to Rare Event Simulation
Cappé/Moulines/Rydén: Inference in Hidden Markov Models
Chan/Tong: Chaos: A Statistical Perspective
Chen/Shao/Ibrahim: Monte Carlo Methods in Bayesian Computation
Coles: An Introduction to Statistical Modeling of Extreme Values
Devroye/Lugosi: Combinatorial Methods in Density Estimation
Diggle/Ribeiro: Model-based Geostatistics
Dudoit/Van der Laan: Multiple Testing Procedures with Applications to Genomics
Efromovich: Nonparametric Curve Estimation: Methods, Theory, and Applications
Eggermont/LaRiccia: Maximum Penalized Likelihood Estimation, Volume I: Density Estimation
Fahrmeir/Tutz: Multivariate Statistical Modeling Based on Generalized Linear Models, 2nd edition
Fan/Yao: Nonlinear Time Series: Nonparametric and Parametric Methods
Ferraty/Vieu: Nonparametric Functional Data Analysis: Theory and Practice
Ferreira/Lee: Multiscale Modeling: A Bayesian Perspective
Fienberg/Hoaglin: Selected Papers of Frederick Mosteller
Frühwirth-Schnatter: Finite Mixture and Markov Switching Models
Ghosh/Ramamoorthi: Bayesian Nonparametrics
Glaz/Naus/Wallenstein: Scan Statistics
Good: Permutation Tests: Parametric and Bootstrap Tests of Hypotheses, 3rd edition
Gouriéroux: ARCH Models and Financial Applications
Gu: Smoothing Spline ANOVA Models
Györfi/Kohler/Krzyżak/Walk: A Distribution-Free Theory of Nonparametric Regression
Haberman: Advanced Statistics, Volume I: Description of Populations
Hall: The Bootstrap and Edgeworth Expansion
Härdle: Smoothing Techniques: With Implementation in S
Harrell: Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis
Hart: Nonparametric Smoothing and Lack-of-Fit Tests
Hastie/Tibshirani/Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Hedayat/Sloane/Stufken: Orthogonal Arrays: Theory and Applications
Heyde: Quasi-Likelihood and its Application: A General Approach to Optimal Parameter Estimation
Huet/Bouvier/Poursat/Jolivet: Statistical Tools for Nonlinear Regression: A Practical Guide with S-PLUS and R Examples, 2nd edition
Ibrahim/Chen/Sinha: Bayesian Survival Analysis
Jiang: Linear and Generalized Linear Mixed Models and Their Applications
Jolliffe: Principal Component Analysis, 2nd edition
Knottnerus: Sample Survey Theory: Some Pythagorean Perspectives
Konishi/Kitagawa: Information Criteria and Statistical Modeling
Kosorok: Introduction to Empirical Processes and Semiparametric Inference
Küchler/Sørensen: Exponential Families of Stochastic Processes
Kutoyants: Statistical Inference for Ergodic Diffusion Processes
Lahiri: Resampling Methods for Dependent Data
Lavallée: Indirect Sampling
Le/Zidek: Statistical Analysis of Environmental Space-Time Processes
Le Cam: Asymptotic Methods in Statistical Decision Theory
Le Cam/Yang: Asymptotics in Statistics: Some Basic Concepts, 2nd edition
Liese/Miescke: Statistical Decision Theory: Estimation, Testing, Selection
Liu: Monte Carlo Strategies in Scientific Computing
Manski: Partial Identification of Probability Distributions
Mielke/Berry: Permutation Methods: A Distance Function Approach, 2nd edition
Molenberghs/Verbeke: Models for Discrete Longitudinal Data
Mukerjee/Wu: A Modern Theory of Factorial Designs
Nelsen: An Introduction to Copulas, 2nd edition
Pan/Fang: Growth Curve Models and Statistical Diagnostics
Politis/Romano/Wolf: Subsampling
Ramsay/Silverman: Applied Functional Data Analysis: Methods and Case Studies
Ramsay/Silverman: Functional Data Analysis, 2nd edition
Reinsel: Elements of Multivariate Time Series Analysis, 2nd edition
Rosenbaum: Observational Studies, 2nd edition
Rosenblatt: Gaussian and Non-Gaussian Linear Time Series and Random Fields
Särndal/Swensson/Wretman: Model Assisted Survey Sampling
Santner/Williams/Notz: The Design and Analysis of Computer Experiments
Schervish: Theory of Statistics
Shaked/Shanthikumar: Stochastic Orders
Shao/Tu: The Jackknife and Bootstrap
Simonoff: Smoothing Methods in Statistics
Song: Correlated Data Analysis: Modeling, Analytics, and Applications
Sprott: Statistical Inference in Science
Stein: Interpolation of Spatial Data: Some Theory for Kriging
Taniguchi/Kakizawa: Asymptotic Theory for Statistical Inference for Time Series
Tanner: Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, 3rd edition
Tillé: Sampling Algorithms
Tsiatis: Semiparametric Theory and Missing Data
van der Laan/Robins: Unified Methods for Censored Longitudinal Data and Causality
van der Vaart/Wellner: Weak Convergence and Empirical Processes: With Applications to Statistics
Verbeke/Molenberghs: Linear Mixed Models for Longitudinal Data
Weerahandi: Exact Statistical Methods for Data Analysis

Michael R. Kosorok

Introduction to Empirical Processes and Semiparametric Inference

Michael R. Kosorok
Department of Biostatistics
University of North Carolina
3101 McGavran-Greenberg Hall
Chapel Hill, NC 27599-7420
USA
[email protected]

ISBN 978-0-387-74977-8
e-ISBN 978-0-387-74978-5
Library of Congress Control Number: 2007940955

© 2008 Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper.

springer.com

To
My Parents, John and Eleanor
My Wife, Pamela
My Daughters, Jessica and Jeanette
and
My Brothers, David, Andy and Matt

Preface

The goal of this book is to introduce statisticians, and other researchers with a background in mathematical statistics, to empirical processes and semiparametric inference. These powerful research techniques are surprisingly useful for studying large sample properties of statistical estimates from realistically complex models as well as for developing new and improved approaches to statistical inference.

This book is more of a textbook than a research monograph, although a number of new results are presented. The level of the book is more introductory than the seminal work of van der Vaart and Wellner (1996). In fact, another purpose of this work is to help readers prepare for the mathematically advanced van der Vaart and Wellner text, as well as for the semiparametric inference work of Bickel, Klaassen, Ritov and Wellner (1997).
These two books, along with Pollard (1990) and Chapters 19 and 25 of van der Vaart (1998), formulate a very complete and successful elucidation of modern empirical process methods. The present book owes much by way of inspiration, concept, and notation to these previous works. What is perhaps new is the gradual, yet rigorous, and unified way this book introduces the reader to the field.

The book consists of three parts. The first part is an overview that concisely covers the basic concepts in both empirical processes and semiparametric inference, while avoiding many technicalities. The second part is devoted to empirical processes, while the third part is devoted to semiparametric efficiency and inference. In each of the last two parts, the chapter following the introductory chapter is devoted to the relevant mathematical concepts and technical background needed for the remainder of the part. For example, an overview of metric spaces, which are necessary to the study of weak convergence, is included in Chapter 6. Thus the book is largely self-contained. In addition, a chapter devoted to case studies is included at the end of each of the three parts of the book. These case studies explore in detail practical examples that illustrate applications of theoretical concepts.

The impetus for this work came from a course the author gave in the Department of Statistics at the University of Wisconsin-Madison, during the Spring semester of 2001. Accordingly, the book is designed to be used as a text in a one- or two-semester sequence in empirical processes and semiparametric inference. In a one-semester course, most of Chapters 1–10 and 12–18 can be covered, along with Sections 19.1 and 19.2 and parts of Chapter 22. Parts of Chapters 3 and 4 may need to be skipped or glossed over and other content judiciously omitted in order to fit everything in. In a two-semester course, one can spend the first semester focusing on empirical processes and the second semester focusing more on semiparametric methods.
In the first semester, Chapters 1 and 2, Sections 4.1–4.3, Chapters 5–10, 12–14 and parts of Chapter 15 could be covered, while in the second semester, Chapter 3, Sections 4.4–4.5, the remainder of Chapter 15, and Chapters 16–22 could be covered in some detail. The instructor can utilize those parts of Chapter 11 and elsewhere as deemed appropriate. It is good to pick and choose what is covered within every chapter presented, so that the students are not given too much material to digest.

The book can also be used for self-study and can be pursued in a basically linear format, with the reader omitting deeper concepts the first time through. For some sections, such as Chapter 3, it is worth skimming through to get an outline of the main ideas first without worrying about verifying the math. In general, this kind of material is learned best when homework problems are attempted. Students should generally have had at least half a year of graduate level probability as well as a year of graduate level mathematical statistics before working through this material.

Some of the research components presented in this book were partially supported by National Institutes of Health Grant CA075142. The author thanks the editor, John Kimmel, and numerous anonymous referees who provided much needed guidance throughout the process of writing. Thanks also go to numerous colleagues and students who provided helpful feedback and corrections on many levels. A partial list of such individuals includes Moulinath Banerjee, Hongyuan Cao, Guang Cheng, Sang-Hoon Cho, Kai Ding, Jason Fine, Minjung Kwak, Bee Leng Lee, Shuangge Ma, Rajat Mukherjee, Andrea Rotnitzky, Rui Song, Anand Vidyashankar, Jon Wellner, Donglin Zeng, and Songfeng Zheng.

Chapel Hill
October 2007

Contents

Preface

I Overview

1 Introduction

2 An Overview of Empirical Processes
  2.1 The Main Features
  2.2 Empirical Process Techniques
    2.2.1 Stochastic Convergence
    2.2.2 Entropy for Glivenko-Cantelli and Donsker Theorems
    2.2.3 Bootstrapping Empirical Processes
    2.2.4 The Functional Delta Method
    2.2.5 Z-Estimators
    2.2.6 M-Estimators
  2.3 Other Topics
  2.4 Exercises
  2.5 Notes

3 Overview of Semiparametric Inference
  3.1 Semiparametric Models and Efficiency
  3.2 Score Functions and Estimating Equations
  3.3 Maximum Likelihood Estimation