AN INTRODUCTION TO BOOTSTRAP METHODS WITH APPLICATIONS TO R

Michael R. Chernick
Lankenau Institute for Medical Research, Wynnewood, PA
Thomas Jefferson University, Philadelphia, PA

Robert A. LaBudde
Least Cost Formulations Ltd., Norfolk, VA
Old Dominion University, Norfolk, VA

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate.
Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Chernick, Michael R.
An introduction to bootstrap methods with applications to R / Michael R. Chernick, Robert A. LaBudde.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-46704-6 (hardback)
1. Bootstrap (Statistics) 2. R (Computer program language) I. LaBudde, Robert A., 1947– II. Title.
QA276.8.C478 2011
519.5'4–dc22
2011010972

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

CONTENTS

PREFACE
ACKNOWLEDGMENTS
LIST OF TABLES

1 INTRODUCTION
1.1 Historical Background
1.2 Definition and Relationship to the Delta Method and Other Resampling Methods
1.2.1 Jackknife
1.2.2 Delta Method
1.2.3 Cross-Validation
1.2.4 Subsampling
1.3 Wide Range of Applications
1.4 The Bootstrap and the R Language System
1.5 Historical Notes
1.6 Exercises
References

2 ESTIMATION
2.1 Estimating Bias
2.1.1 Bootstrap Adjustment
2.1.2 Error Rate Estimation in Discriminant Analysis
2.1.3 Simple Example of Linear Discrimination and Bootstrap Error Rate Estimation
2.1.4 Patch Data Example
2.2 Estimating Location
2.2.1 Estimating a Mean
2.2.2 Estimating a Median
2.3 Estimating Dispersion
2.3.1 Estimating an Estimate’s Standard Error
2.3.2 Estimating Interquartile Range
2.4 Linear Regression
2.4.1 Overview
2.4.2 Bootstrapping Residuals
2.4.3 Bootstrapping Pairs (Response and Predictor Vector)
2.4.4 Heteroscedasticity of Variance: The Wild Bootstrap
2.4.5 A Special Class of Linear Regression Models: Multivariable Fractional Polynomials
2.5 Nonlinear Regression
2.5.1 Examples of Nonlinear Models
2.5.2 A Quasi-Optical Experiment
2.6 Nonparametric Regression
2.6.1 Examples of Nonparametric Regression Models
2.6.2 Bootstrap Bagging
2.7 Historical Notes
2.8 Exercises
References

3 CONFIDENCE INTERVALS
3.1 Subsampling, Typical Value Theorem, and Efron’s Percentile Method
3.2 Bootstrap-t
3.3 Iterated Bootstrap
3.4 Bias-Corrected (BC) Bootstrap
3.5 BCa and ABC
3.6 Tilted Bootstrap
3.7 Variance Estimation with Small Sample Sizes
3.8 Historical Notes
3.9 Exercises
References

4 HYPOTHESIS TESTING
4.1 Relationship to Confidence Intervals
4.2 Why Test Hypotheses Differently?
4.3 Tendril DX Example
4.4 Klingenberg Example: Binary Dose–Response
4.5 Historical Notes
4.6 Exercises
References

5 TIME SERIES
5.1 Forecasting Methods
5.2 Time Domain Models
5.3 Can Bootstrapping Improve Prediction Intervals?
5.4 Model-Based Methods
5.4.1 Bootstrapping Stationary Autoregressive Processes
5.4.2 Bootstrapping Explosive Autoregressive Processes
5.4.3 Bootstrapping Unstable Autoregressive Processes
5.4.4 Bootstrapping Stationary ARMA Processes
5.5 Block Bootstrapping for Stationary Time Series
5.6 Dependent Wild Bootstrap (DWB)
5.7 Frequency-Based Approaches for Stationary Time Series
5.8 Sieve Bootstrap
5.9 Historical Notes
5.10 Exercises
References

6 BOOTSTRAP VARIANTS
6.1 Bayesian Bootstrap
6.2 Smoothed Bootstrap
6.3 Parametric Bootstrap
6.4 Double Bootstrap
6.5 The m-Out-of-n Bootstrap
6.6 The Wild Bootstrap
6.7 Historical Notes
6.8 Exercises
References

7 CHAPTER SPECIAL TOPICS
7.1 Spatial Data
7.1.1 Kriging
7.1.2 Asymptotics for Spatial Data
7.1.3 Block Bootstrap on Regular Grids
7.1.4 Block Bootstrap on Irregular Grids
7.2 Subset Selection in Regression
7.2.1 Gong’s Logistic Regression Example
7.2.2 Gunter’s Qualitative Interaction Example
7.3 Determining the Number of Distributions in a Mixture
7.4 Censored Data
7.5 P-Value Adjustment
7.5.1 The Westfall–Young Approach
7.5.2 Passive Plus Example
7.5.3 Consulting Example
7.6 Bioequivalence
7.6.1 Individual Bioequivalence
7.6.2 Population Bioequivalence
7.7 Process Capability Indices
7.8 Missing Data
7.9 Point Processes
7.10 Bootstrap to Detect Outliers
7.11 Lattice Variables
7.12 Covariate Adjustment of Area Under the Curve Estimates for Receiver Operating Characteristic (ROC) Curves
7.13 Bootstrapping in SAS
7.14 Historical Notes
7.15 Exercises
References

8 WHEN THE BOOTSTRAP IS INCONSISTENT AND HOW TO REMEDY IT
8.1 Too Small of a Sample Size
8.2 Distributions with Infinite Second Moments
8.2.1 Introduction
8.2.2 Example of Inconsistency
8.2.3 Remedies
8.3 Estimating Extreme Values
8.3.1 Introduction
8.3.2 Example of Inconsistency
8.3.3 Remedies
8.4 Survey Sampling
8.4.1 Introduction
8.4.2 Example of Inconsistency
8.4.3 Remedies
8.5 m-Dependent Sequences
8.5.1 Introduction
8.5.2 Example of Inconsistency When Independence Is Assumed
8.5.3 Remedy