
Resampling Methods: A Practical Guide to Data Analysis

280 Pages·1999·8.642 MB·English


Phillip I. Good
Resampling Methods
A Practical Guide to Data Analysis
Springer Science+Business Media, LLC

Phillip I. Good
205 West Utica
Huntington Beach, CA 92648
USA
[email protected]

Library of Congress Cataloging-in-Publication Data
Good, Phillip I.
Resampling methods : a practical guide to data analysis / Phillip I. Good.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-4757-3051-7
ISBN 978-1-4757-3049-4 (eBook)
DOI 10.1007/978-1-4757-3049-4
1. Resampling (Statistics) I. Title.
QA278.8.G66 1999
519.5-dc21 98-26978 CIP

AMS Subject Classifications: 62G

Printed on acid-free paper.

© 1999 Springer Science+Business Media New York
Originally published by Birkhäuser Boston in 1999
Softcover reprint of the hardcover 1st edition 1999

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

CART™ is a registered trademark of California Statistical Software, Inc.
EXP™ is a registered trademark of Brooks/Cole Publishing Company.
SAS™ is a registered trademark of SAS Institute Inc.
S-PLUS™ is a registered trademark of MathSoft, Inc.
Stata™ is a registered trademark of Stata Corporation.
StatXact™ is a registered trademark of Cytel Software.

ISBN 978-1-4757-3051-7
Formatted from the author's EXP files.
9 8 7 6 5 4 3 2 1

Contents

Preface

1 Descriptive Statistics
1.0. Statistics
1.1. Reporting Your Results
1.2. Picturing Data
1.2.1. Graphs
1.2.2. From Observations to Questions
1.2.3. Multiple Variables
1.2.4. Contingency Tables
1.2.5. Types of Data
1.3. Measures of Location
1.3.1. Median and Mode
1.3.2. Arithmetic Mean
1.3.3. Geometric Mean
1.4. Measures of Dispersion
1.5. Sample versus Population
1.5.1. Statistics and Parameters
1.5.2. Estimating Population Parameters
1.5.3. Precision of an Estimate
1.5.4. Caveats
1.6. Summary
1.7. To Learn More
1.8. Exercises
1.8.1. Suggestions for Self-Study and Course Review
1.8.2. Exercises

2 Cause and Effect
2.1. Picturing Relationships
2.2. Unpredictable Variation
2.2.1. Building a Model
2.3. Two Types of Populations
2.3.1. Predicting the Unpredictable
2.4. Binomial Outcomes
2.4.1. Permutations and Combinations
2.4.2. Back to the Binomial
2.4.3. Probability
2.5. Independence
2.6. Selecting the Sample
2.6.1. A Young Woman Tasting Herbal Tea
2.6.2. Random Sampling and Representative Samples
2.7. Summary
2.8. To Learn More
2.9. Exercises

3 Testing Hypotheses
3.1. Two-Sample Comparison
3.1.1. "I Lost the Labels"
3.2. Five Steps to a Permutation Test
3.2.1. Analyze the Problem
3.2.2. Choose a Test Statistic
3.2.3. Compute the Test Statistic
3.2.4. Rearrange the Observations
3.2.5. Draw a Conclusion
3.3. A Second Example
3.3.1. More General Hypotheses
3.4. Comparing Variances
3.5. Pitman Correlation
3.5.1. Effect of Ties
3.6. Bivariate Dependence
3.7. One-Sample Tests
3.7.1. The Bootstrap
3.7.2. Permutation Test
3.7.3. Automating the Process
3.8. Transformations
3.9. Matched Pairs
3.9.1. An Example: Pre- and Post-Treatment Levels
3.10. Summary
3.11. To Learn More
3.12. Exercises

4 When the Distribution Is Known
4.1. Binomial Distribution
4.1.1. Properties of Independent Observations
4.1.2. Testing Hypotheses
4.2. Poisson: Events Rare in Time and Space
4.2.1. Applying the Poisson
4.2.2. Comparing Two Poissons
4.2.3. Exponential Distribution
4.3. Normal Distribution
4.3.1. Tests for Location
4.3.2. Tests for Scale
4.4. Distributions
4.5. Summary and Further Readings
4.6. Exercises

5 Estimation
5.1. Point Estimation
5.2. Interval Estimation
5.2.1. An Untrustworthy Friend
5.2.2. Confidence Bounds for a Sample Median
5.2.3. A Large-Sample Approximation
5.2.4. Confidence Intervals and Rejection Regions
5.3. Regression
5.3.1. Prediction Error
5.3.2. Correcting for Bias
5.4. Which Model?
5.5. Bivariate Correlation
5.6. Smoothing the Bootstrap
5.7. Bias-Corrected Bootstrap
5.8. Iterated Bootstrap
5.9. Summary
5.10. To Learn More
5.11. Exercises

6 Power of a Test
6.1. Fundamental Concepts
6.1.1. Two Types of Error
6.1.2. Losses
6.1.3. Significance Level and Power
6.1.4. Exact, Unbiased Tests
6.1.5. Dollars and Decisions
6.1.6. What Significance Level Should I Use?
6.2. Assumptions
6.3. How Powerful Are Our Tests?
6.3.1. One-Sample
6.3.2. Matched Pairs
6.3.3. Regression
6.3.4. Two Samples
6.4. Which Test?
6.5. Summary
6.6. To Learn More
6.7. Exercises

7 Categorical Data
7.1. Fisher's Exact Test
7.1.1. One-Tailed and Two-Tailed Tests
7.1.2. The Two-Tailed Test
7.2. Odds Ratio
7.2.1. Stratified 2 × 2's
7.3. Exact Significance Levels
7.4. Unordered r × c Contingency Tables
7.5. Ordered Statistical Tables
7.5.1. Ordered 2 × c Tables
7.5.2. More than Two Rows and Two Columns
7.6. Summary
7.7. To Learn More
7.8. Exercises

8 Experimental Design and Analysis
8.1. Noise in the Data
8.1.1. Blocking
8.1.2. Measuring Factors We Can't Control
8.1.3. Randomization
8.2. Balanced Designs
8.2.1. Main Effects
8.2.2. Testing for Interactions
8.3. Designing an Experiment
8.3.1. Latin Square
8.4. Another Worked-Through Example
8.5. Determining Sample Size
8.6. Unbalanced Designs
8.6.1. Bootstrap to the Rescue
8.6.2. Missing Combinations
8.7. Summary
8.8. To Learn More
8.9. Exercises

9 Multiple Variables and Multiple Hypotheses
9.1. Hotelling's T²
9.2. Two-Sample Multivariate Comparison
9.3. The Generalized Quadratic Form
9.3.1. Mantel's U
9.3.2. An Example in Epidemiology
9.3.3. Further Generalization
9.3.4. The MRPP Statistic
9.4. Multiple Hypotheses
9.5. Summary
9.6. To Learn More
9.7. Exercises

10 Classification and Discrimination
10.0. Introduction
10.1. Classification
10.1.1. A Bit of Detection
10.2. Density Estimation
10.2.1. More Detective Work
10.2.2. Bump Hunting
10.2.3. Detective Story: A 3-D Solution
10.3. Block Clustering
10.4. CART
10.4.1. Refining the Discrimination Scheme
10.5. Other Discrimination Methods
10.6. Validation
10.6.1. A Multidimensional Model
10.7. Cross-Validation
10.8. Summary and Further Readings
10.9. Exercises

11 Survival Analysis and Reliability
11.0. Introduction
11.1. Data Not Censored
11.1.1. Savage Scores
11.2. Censored Data
11.3. Type I Censoring
11.4. Type II Censoring
11.4.1. Bootstrap
11.4.2. Comparing Two Survival Curves
11.4.3. Hazard Function
11.4.4. An Example
11.5. Summary and Further Readings
11.6. Exercises

12 Which Statistic Should I Use?
12.1. Parametric versus Nonparametric
12.2. But Is It a Normal Distribution?
12.3. Which Hypothesis?
12.4. A Guide to Selection
12.4.1. Data in Categories
12.4.2. Discrete Observations
12.4.3. Continuous Data
12.5. Quick Key
12.5.1. Categorical Data
12.5.2. Discrete Observations
12.5.3. Continuous Data
12.6. Classification

Appendix 1. Program Your Own Resampling Statistics
Appendix 2. C++, SC, and Stata Code for Permutation Tests
Appendix 3. SAS and S-PLUS Code for Bootstraps
Appendix 4. Resampling Software

Bibliographical References
Index

Preface

Intended for class use or self-study, this text aspires to introduce statistical methodology (estimation, hypothesis testing, and classification) to a wide audience, simply and intuitively, through resampling from the data at hand. The resampling methods (permutations, cross-validation, and the bootstrap) are easy to learn and easy to apply. They require no mathematics beyond introductory high-school algebra, yet they apply across an exceptionally broad range of subject areas. These methods were introduced in the 1930s, but their numerous, albeit straightforward and simple, calculations were beyond the capabilities of the primitive calculators then in use; they were soon displaced by less powerful, less accurate approximations that relied on tables. Today, with a powerful computer on every desktop, resampling methods have resumed their dominant role and table lookup is an anachronism.
Physicians and physicians in training, nurses and nursing students, business persons, business majors, research workers, and students in the biological and social sciences will find here a practical guide to descriptive statistics, classification, estimation, and testing hypotheses. For advanced students in biology, dentistry, medicine, psychology, sociology, and public health, this text can provide a first course in statistics and quantitative reasoning. For industrial statisticians, statistical consultants, and research workers, this text provides an introduction and day-to-day guide to the power, simplicity, and versatility of the bootstrap, cross-validation, and permutation tests. I hope all readers will find that my objectives are the same as theirs: to use quantitative methods to characterize, review, report on, test, estimate, and classify findings. If you're just starting to use statistics in your work, begin by reading chapters 1-5, which cover descriptive statistics, cause and effect, sampling, hypothesis test-
