
Computer Intensive Methods in Statistics (PDF)

227 pages · 2020 · 15.758 MB · English

Preview Computer Intensive Methods in Statistics

Computer Intensive Methods in Statistics

Silvelyn Zwanzig, Uppsala University
Behrang Mahjani, Icahn School of Medicine at Mount Sinai

CRC Press, Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2020 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed on acid-free paper

International Standard Book Number-13: 978-0-367-19423-9 (Paperback), 978-0-367-19425-3 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface
Introduction

1 Random Variable Generation
  1.1 Basic Methods
    1.1.1 Congruential Generators
    1.1.2 The KISS Generator
    1.1.3 Beyond Uniform Distributions
  1.2 Transformation Methods
  1.3 Accept-Reject Methods
    1.3.1 Envelope Accept-Reject Methods
  1.4 Problems

2 Monte Carlo Methods
  2.1 Independent Monte Carlo Methods
    2.1.1 Importance Sampling
    2.1.2 The Rule of Thumb for Importance Sampling
  2.2 Markov Chain Monte Carlo
    2.2.1 Metropolis-Hastings Algorithm
    2.2.2 Special MCMC Algorithms
    2.2.3 Adaptive MCMC
    2.2.4 Perfect Simulation
    2.2.5 The Gibbs Sampler
  2.3 Approximate Bayesian Computation Methods
  2.4 Problems

3 Bootstrap
  3.1 General Principle
    3.1.1 Unified Bootstrap Framework
    3.1.2 Bootstrap and Monte Carlo
    3.1.3 Conditional and Unconditional Distribution
  3.2 Basic Bootstrap
    3.2.1 Plug-in Principle
    3.2.2 Why is Bootstrap Good?
    3.2.3 Example where Bootstrap Fails
  3.3 Bootstrap Confidence Sets
    3.3.1 The Pivotal Method
    3.3.2 Bootstrap Pivotal Methods
      3.3.2.1 Percentile Bootstrap Confidence Interval
      3.3.2.2 Basic Bootstrap Confidence Interval
      3.3.2.3 Studentized Bootstrap Confidence Interval
    3.3.3 Transformed Bootstrap Confidence Intervals
    3.3.4 Prepivoting Confidence Set
    3.3.5 BCa Confidence Interval
  3.4 Bootstrap Hypothesis Tests
    3.4.1 Parametric Bootstrap Hypothesis Test
    3.4.2 Nonparametric Bootstrap Hypothesis Test
    3.4.3 Advanced Bootstrap Hypothesis Tests
  3.5 Bootstrap in Regression
    3.5.1 Model-Based Bootstrap
    3.5.2 Parametric Bootstrap Regression
    3.5.3 Casewise Bootstrap in Correlation Model
  3.6 Bootstrap for Time Series
  3.7 Problems

4 Simulation-Based Methods
  4.1 EM Algorithm
  4.2 SIMEX
  4.3 Variable Selection
    4.3.1 F-Backward and F-Forward Procedures
    4.3.2 FSR-Forward Procedure
    4.3.3 SimSel
  4.4 Problems

5 Density Estimation
  5.1 Background
  5.2 Histogram
  5.3 Kernel Density Estimator
    5.3.1 Statistical Properties
    5.3.2 Bandwidth Selection in Practice
  5.4 Nearest Neighbor Estimator
  5.5 Orthogonal Series Estimator
  5.6 Minimax Convergence Rate
  5.7 Problems

6 Nonparametric Regression
  6.1 Background
  6.2 Kernel Regression Smoothing
  6.3 Local Regression
  6.4 Classes of Restricted Estimators
    6.4.1 Ridge Regression
    6.4.2 Lasso
  6.5 Spline Estimators
    6.5.1 Base Splines
    6.5.2 Smoothing Splines
  6.6 Wavelet Estimators
    6.6.1 Wavelet Base
    6.6.2 Wavelet Smoothing
  6.7 Choosing the Smoothing Parameter
  6.8 Bootstrap in Regression
  6.9 Problems

References
Index

Preface

This textbook arose from the lecture notes of a graduate course on computer intensive statistics offered in the Department of Mathematics at Uppsala University in Sweden.
The first version of the script was written in 2001, when the course was first introduced. Since then, the course has been taught continuously, and the script has been updated and developed over the past 18 years. The course contains 50 lectures, each about 45 minutes long. Several interactive R codes are presented during the lectures. The course requires that students submit four assignments in teams; these assignments may involve solving problems by writing R code. At the end of the course, the students give short presentations on a particular research question. As part of this task, the students identify an interesting data set to analyze with the methods taught during the course. This is the part of the course that I like the most!

Since its inception in 2001, students from a variety of fields have participated in this course. Many have backgrounds in mathematics or mathematical statistics, but the course has also seen PhD students from other departments (e.g., biology, IT, physics, and pharmaceutics). PhD students frequently relate their final presentations to their theses.

In 2011, Behrang Mahjani, then a PhD student, came to me and offered to collaborate in teaching this course. In the following year, he took over the introduction to R, the random number generation, and the Monte Carlo methods. In 2016, after earning his PhD, Behrang offered to assist me in finalizing this book. Today, we share the title of "author", and for this, I am thankful. While Behrang and I are the primary authors of this book, we are also joined by a collaborator. This book contains many cartoons illustrating data material or the main ideas underlying the presented mathematical procedures. The signature on these illustrations, AH, stands for Annina Heinrich, my daughter, a biologist. Based on my explanations and very rough sketches, she drew the cartoons. (My personal favorite pictures are Figures 4.1 and 4.2 about the embedding principle, and Figure 6.2 about the smoothing principle.)
Last but not least, I would like to thank all of the students who attended this course and gave us feedback at different levels. I am most delighted and proud when, long after the course, I meet former students and hear that it has been useful to them.

Silvelyn Zwanzig
Uppsala, June 2019
