
Statistical Methods for Environmental Epidemiology with R: A Case Study in Air Pollution and Health

151 Pages·2008·8.32 MB·English

Preview Statistical Methods for Environmental Epidemiology with R: A Case Study in Air Pollution and Health

Use R!
Series Editors: Robert Gentleman, Kurt Hornik, Giovanni Parmigiani

Albert: Bayesian Computation with R
Bivand/Pebesma/Gómez-Rubio: Applied Spatial Data Analysis with R
Claude: Morphometrics with R
Cook/Swayne: Interactive and Dynamic Graphics for Data Analysis: With R and GGobi
Hahne/Huber/Gentleman/Falcon: Bioconductor Case Studies
Nason: Wavelet Methods in Statistics with R
Paradis: Analysis of Phylogenetics and Evolution with R
Peng/Dominici: Statistical Methods for Environmental Epidemiology with R: A Case Study in Air Pollution and Health
Pfaff: Analysis of Integrated and Cointegrated Time Series with R, 2nd edition
Sarkar: Lattice: Multivariate Data Visualization with R
Spector: Data Manipulation with R

Roger D. Peng · Francesca Dominici
Statistical Methods for Environmental Epidemiology with R
A Case Study in Air Pollution and Health

Roger D. Peng
Francesca Dominici
Johns Hopkins Bloomberg School of Public Health
615 N. Wolfe St.
Johns Hopkins University
Baltimore MD 21205-2179
USA
[email protected]
[email protected]

Series Editors:

Robert Gentleman
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue, N. M2-B876
Seattle, Washington 98109
USA

Kurt Hornik
Department of Statistik and Mathematik
Wirtschaftsuniversität Wien
Augasse 2-6
A-1090 Wien
Austria

Giovanni Parmigiani
The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University
550 North Broadway
Baltimore, MD 21205-2011
USA

Library of Congress Control Number: 2008928295
ISBN 978-0-387-78166-2
e-ISBN 978-0-387-78167-9
DOI: 10.1007/978-0-387-78167-9

©2008 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis.
Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper.
9 8 7 6 5 4 3 2 1
springer.com

Preface

As an area of statistical application, environmental epidemiology and more specifically, the estimation of health risk associated with the exposure to environmental agents, has led to the development of several statistical methods and software that can then be applied to other scientific areas. The statistical analyses aimed at addressing questions in environmental epidemiology have the following characteristics. Often the signal-to-noise ratio in the data is low and the targets of inference are inherently small risks. These constraints typically lead to the development and use of more sophisticated (and potentially less transparent) statistical models and the integration of large high-dimensional databases. New technologies and the widespread availability of powerful computing are also adding to the complexities of scientific investigation by allowing researchers to fit large numbers of models and search over many sets of variables. As the number of variables measured increases, so do the degrees of freedom for influencing the association between a risk factor and an outcome of interest.

We have written this book, in part, to describe our experiences developing and applying statistical methods for the estimation for air pollution health effects. Our experience has convinced us that the application of modern statistical methodology in a reproducible manner can bring to bear substantial benefits to policy-makers and scientists in this area.
We believe that the methods described in this book are applicable to other areas of environmental epidemiology, particularly those areas involving spatial–temporal exposures.

In this book, we use the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) and Medicare Air Pollution Study (MCAPS) datasets and describe the R packages for accessing the data. Chapters 4, 5, 6, and 7 describe the features of the data, the statistical concepts involved, and many of the methods used to analyze the data. Chapter 8 then shows how to bring all of the methods together to conduct a multi-site analysis of seasonally varying effects of PM10 on mortality.

A principal goal of this book is to disseminate R software and promote reproducible research in epidemiological studies and statistical research. As a case study we use data and methods relevant to investigating the health effects of ambient air pollution. Researching the health effects of air pollution presents an excellent example of the critical need for reproducible research because it involves all of the features already mentioned above: inherently small risks, significant policy implications, sophisticated statistical methodology, and very large databases linked from multiple sources. The complexity of the analyses involved and the policy relevance of the targets of inference demand transparency and reproducibility.

Throughout the book, we show how R can be used to make analyses reproducible and to structure the analytic process in a modular fashion. We find R to be a very natural tool for achieving this goal. In particular, for the production of this book, we have made use of the tools described in Chapter 3.

All of the data described in the book are provided in the NMMAPSlite and MCAPS R packages that can be downloaded from CRAN.¹ We have developed R packages for implementing the statistical methodology as well as for handling the databases.
Packages that are not available from CRAN can be downloaded from the book's website.²

We would like to express our deepest appreciation to the many collaborators and students who have worked with us on various projects, short courses, and workshops that we have developed over the years. In particular, Aidan McDermott, Scott Zeger, Luu Pham, Jon Samet, Tom Louis, Leah Welty, Michelle Bell, and Sandy Eckel were all central to the development of the software, databases, exercises, and analyses presented in this book. Several anonymous reviewers provided helpful comments that improved the presentation of the material in the book. In addition, we would like to thank Duncan Thomas for many useful suggestions regarding an early draft of the manuscript. Finally, this work was supported in part by grant ES012054-03 from the National Institute of Environmental Health Sciences.

Baltimore, Maryland, April 2008
Roger Peng
Francesca Dominici

¹ http://cran.r-project.org/
² http://www.biostat.jhsph.edu/~rpeng/useRbook/

Contents

Preface

1 Studies of Air Pollution and Health
  1.1 Introduction
  1.2 Time Series Studies
  1.3 Case-Crossover Studies
  1.4 Panel Studies
  1.5 Cohort Studies
  1.6 Design Comparisons

2 Introduction to R and Air Pollution and Health Data
  2.1 Starting Up R
  2.2 The National Morbidity, Mortality, and Air Pollution Study
  2.3 Organization of the NMMAPSlite Package
    2.3.1 Reading city-specific data
    2.3.2 Pollutant data detrending
    2.3.3 Mortality age categories
    2.3.4 Metadata
    2.3.5 Configuration options
  2.4 MCAPS Data

3 Reproducible Research Tools
  3.1 Introduction
  3.2 Distributing Reproducible Research
  3.3 Getting Started
  3.4 Exploring a Cached Analysis
  3.5 Verifying a Cached Analysis
  3.6 Caching a Statistical Analysis
  3.7 Distributing a Cached Analysis
  3.8 Summary

4 Statistical Issues in Estimating the Health Effects of Spatial–Temporal Environmental Exposures
  4.1 Introduction
  4.2 Time-Varying Environmental Exposures
  4.3 Estimation Versus Prediction
  4.4 Semiparametric Models
    4.4.1 Overdispersion
    4.4.2 Representations for f
    4.4.3 Estimation of β
    4.4.4 Choosing the degrees of freedom for f
  4.5 Combining Information and Hierarchical Models

5 Exploratory Data Analyses
  5.1 Introduction
  5.2 Exploring the Data: Basic Features and Properties
    5.2.1 Pollutant data
    5.2.2 Mortality data
  5.3 Exploratory Statistical Analysis
    5.3.1 Timescale decompositions
    5.3.2 Example: Timescale decompositions of PM10 and mortality
    5.3.3 Correlation at different timescales: A look at the Chicago data
    5.3.4 Looking at more detailed timescales
  5.4 Exploring the Potential for Confounding Bias
  5.5 Summary
  5.6 Reproducibility Package
  5.7 Problems

6 Statistical Models
  6.1 Introduction
  6.2 Models for Air Pollution and Health
  6.3 Semiparametric Models
    6.3.1 GAMs in R
  6.4 Pollutants: The Exposure of Interest
    6.4.1 Single versus distributed lag
    6.4.2 Mortality displacement
  6.5 Modeling Measured Confounders
  6.6 Accounting for Unmeasured Confounders
    6.6.1 Using GAMs for air pollution and health
    6.6.2 Computing standard errors for parametric terms in GAMs
    6.6.3 Choosing degrees of freedom from the data
    6.6.4 Example: Semiparametric model for Detroit
    6.6.5 Smoothers
  6.7 Multisite Studies: Putting It All Together
  6.8 Summary
  6.9 Reproducibility Package
  6.10 Problems

7 Pooling Risks Across Locations and Quantifying Spatial Heterogeneity
  7.1 Hierarchical Models for Multisite Time Series Studies of Air Pollution and Health
    7.1.1 Two-stage hierarchical model
    7.1.2 Three-stage hierarchical model
    7.1.3 Spatial correlation model
    7.1.4 Sensitivity analyses to the adjustment for confounders
  7.2 Example: Examining Sensitivity to Prior Distributions
  7.3 Reproducibility Package
  7.4 Problems

8 A Reproducible Seasonal Analysis of Particulate Matter and Mortality in the United States
  8.1 Introduction
  8.2 Methods
    8.2.1 Combining information across cities
  8.3 Results
    8.3.1 Sensitivity analyses
  8.4 Comments
  8.5 Reproducibility Package

References

Index

Description:
Advances in statistical methodology and computing have played an important role in allowing researchers to more accurately assess the health effects of ambient air pollution. The methods and software developed in this area are applicable to a wide array of problems in environmental epidemiology. Thi