
Linear Regression Models: Applications in R PDF

437 Pages·2021·3.671 MB·English

Preview Linear Regression Models: Applications in R

Linear Regression Models

Statistics in the Social and Behavioral Sciences Series

Series Editors
Jeff Gill, Steven Heeringa, Wim J. van der Linden, Tom Snijders

Recently Published Titles

Multivariate Analysis in the Behavioral Sciences, Second Edition
Kimmo Vehkalahti and Brian S. Everitt

Analysis of Integrated Data
Li-Chun Zhang and Raymond L. Chambers

Multilevel Modeling Using R, Second Edition
W. Holmes Finch, Joselyn E. Bolin, and Ken Kelley

Modelling Spatial and Spatial-Temporal Data: A Bayesian Approach
Robert Haining and Guangquan Li

Handbook of Automated Scoring: Theory into Practice
Duanli Yan, André A. Rupp, and Peter W. Foltz

Interviewer Effects from a Total Survey Error Perspective
Kristen Olson, Jolene D. Smyth, Jennifer Dykema, Allyson Holbrook, Frauke Kreuter, and Brady T. West

Measurement Models for Psychological Attributes
Klaas Sijtsma and Andries van der Ark

Big Data and Social Science: Data Science Methods and Tools for Research and Practice, Second Edition
Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter and Julia Lane

Understanding Elections through Statistics: Polling, Prediction, and Testing
Ole J. Forsberg

Analyzing Spatial Models of Choice and Judgment, Second Edition
David A. Armstrong II, Ryan Bakker, Royce Carroll, Christopher Hare, Keith T. Poole and Howard Rosenthal

Introduction to R for Social Scientists: A Tidy Programming Approach
Ryan Kennedy and Philip Waggoner

Linear Regression Models: Applications in R
John P. Hoffmann

Mixed-Mode Surveys: Design and Analysis
Jan van den Brakel, Bart Buelens, Madelon Cremers, Annemieke Luiten, Vivian Meertens, Barry Schouten and Rachel Vis-Visschers

For more information about this series, please visit: https://www.routledge.com/Chapman--HallCRC-Statistics-in-the-Social-and-Behavioral-Sciences/book-series/CHSTSOBESCI

Linear Regression Models
Applications in R

John P. Hoffmann

First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

© 2022 John P. Hoffmann

CRC Press is an imprint of Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.
ISBN: 9780367753689 (hbk)
ISBN: 9780367753665 (pbk)
ISBN: 9781003162230 (ebk)

Typeset in Palatino by Deanta Global Publishing Services, Chennai, India

Contents

Preface
Acknowledgments
Author Biography

1 Introduction
    Our Doubts are Traitors and Make Us Lose the Good We Oft Might Win
    Best Statistical Practices
    Statistical Software

2 Review of Elementary Statistical Concepts
    Measures of Central Tendency
    Measures of Dispersion
    Samples and Populations
    Sampling Error and Standard Errors
    Significance Tests
    Unbiasedness and Efficiency
    The Standard Normal Distribution and Z-Scores
    Covariance and Correlation
    Comparing Means from Two Groups
    Examples Using R
    Chapter Summary
    Chapter Exercises

3 Simple Linear Regression Models
    Assumptions of Simple LRMs
    An Example of an LRM Using R
    Formulas for the Slope Coefficient and Intercept
    Hypothesis Tests for the Slope Coefficient
    Chapter Summary
    Chapter Exercises

4 Multiple Linear Regression Models
    An Example of a Multiple LRM
    Comparing Slope Coefficients
    Assumptions of Multiple LRMs
    Some Important Characteristics of Multiple LRMs
    Chapter Summary
    Chapter Exercises

5 The ANOVA Table and Goodness-of-Fit Statistics
    Another Example of a Multiple LRM
    Chapter Summary
    Chapter Exercises

6 Comparing Linear Regression Models
    The Partial F-Test and Multiple Partial F-Test
    Evaluating Model Fit with Information Criterion Measures
    Confounding Variables
    Chapter Summary
    Chapter Exercises

7 Indicator Variables in Linear Regression Models
    Indicator Variables in Multiple LRMs
    LRMs with Indicator and Continuous Explanatory Variables
    Chapter Summary
    Chapter Exercises

8 Independence
    Determining Dependence
    Example of Adjustment for Clustering
    LRM with No Adjustment for Clustering
    LRM That Adjusts for Clustering
    Serial Correlation
    Linear Regression Model
    Solutions for Serial Correlation
    Linear Regression Model (OLS)
    Prais–Winsten Regression Model
    Generalized Estimating Equations for Longitudinal Data
    Linear Regression Model (OLS)
    General Estimating Equation (GEE) Model with AR(1) Pattern
    General Estimating Equation (GEE) Model with Unstructured Pattern
    Spatial Autocorrelation
    Chapter Summary
    Chapter Exercises

9 Homoscedasticity
    Assessing Homoscedasticity in Multiple LRMs
    What to Do About Heteroscedasticity
    Chapter Summary
    Chapter Exercises

10 Collinearity and Multicollinearity
    Multicollinearity
    How to Detect Collinearity and Multicollinearity
    What to Do About Collinearity and Multicollinearity
    Chapter Summary
    Chapter Exercises

11 Normality, Linearity, and Interaction Effects
    Are the Errors of Prediction Normally Distributed?
    Nonlinearities
    Testing for Nonlinearities in LRMs
    Incorporating Nonlinear Associations in LRMs
    Interaction Effects
    Interaction Effects with Continuous Explanatory Variables
    Classification and Regression Trees (CART)
    A Cautionary Note about Interaction Effects
    Chapter Summary
    Chapter Exercises

12 Model Specification
    Variable Selection
    Overfitting—or the Case of Irrelevant Variables
    Underfitting—or the Case of the Absent Variables
    Endogeneity Bias
    Selection Bias
    How Do We Assess Specification Error and What Do We Do about It?
    What to Do about Selection Bias?
    Cross-Validation
    Variable Selection Procedures
    Chapter Summary
    Chapter Exercises

13 Measurement Errors
    The Outcome Variable Is Measured with Error
    The Explanatory Variables Are Measured with Error
    What Should We Do about Measurement Error?
    Latent Variables as a Solution to Measurement Error
    Chapter Summary
    Chapter Exercises

14 Influential Observations: Leverage Points and Outliers
    Detecting Influential Observations
    An Example of Using Diagnostic Methods to Identify Influential Observations
    What to Do about Influential Observations
    Chapter Summary
    Chapter Exercises

15 Multilevel Linear Regression Models
    The Basics of Multilevel Regression Models
    The Multilevel LRM
    Examining Assumptions of the Model
    Group-Level Variables and Cross-Level Interactions
    Chapter Summary
    Chapter Exercises

16 A Brief Introduction to Logistic Regression
    An Alternative to the LRM: Logistic Regression
    Extending the Logistic Regression Model
    Chapter Summary
    Chapter Exercises

17 Conclusions
    Sampling Weights
    Establishing Causal Associations
    Final Words
    Linear Regression Modeling: A Summary

Appendix A: Data Management
Appendix B: Using Simulations to Examine Assumptions of Linear Regression Models
Appendix C: Selected Formulas
Appendix D: User-Written R Packages Employed in the Examples
References
Index

Preface

I wrote this book for researchers, both novice and experienced, whose goal is to investigate statistical associations among quantitative variables. Although several empirical models are available to help researchers meet this goal, I describe one of the oldest and best understood methods: the method of least squares or the linear regression model (LRM). LRMs are designed to account for or predict the values of a single continuous outcome variable with information from one or more explanatory variables. For example, in the U.S., many jurisdictions, including more than half the states, now allow marijuana—which the federal government continues to classify as having no currently accepted medical use and a high potential for abuse [1]—to be prescribed for medical conditions. A concern of public health officials is that the proliferation of medical marijuana prescriptions will motivate more marijuana use among young people. Suppose a research team wishes to test the notion that use has increased. They might compare the prevalence of marijuana use among youth in states that permit medical marijuana to those that do not. In fact, one research group completed such a comparison and, using an LRM to consider several potential influences in addition to medical marijuana availability, found that allowing physicians to prescribe marijuana had no effect on youth marijuana use. [2]

Some quantitative researchers, unfortunately, dismiss the LRM because more recently developed statistical models are considered more rigorous, complex, or even in vogue.
In some situations, it might be necessary to choose another statistical model, but it is unwise to ignore the LRM because it remains a valuable tool for studying associations among variables, making predictions, and even, in certain circumstances, helping to identify causal associations. In addition, it remains popular because (1) it is relatively easy to use and understand; (2) statistical software to estimate LRMs is widely available; and (3) LRMs offer a flexible and powerful method for conducting important types of analyses.

Even though many researchers use LRMs because they offer an effective set of tools for understanding the association between two or more variables, they are sometimes misused by those who fail to take into account the assumptions that underlie the models. For instance, an LRM might provide unstable results when two or more of the explanatory variables (if you don’t know this term, you will; just keep reading) have high correlations

[1] https://www.dea.gov/drug-scheduling
[2] Arthur Robin Williams et al. (2017). “Loose Regulation of Medical Marijuana Programs Associated with Higher Rates of Adult Marijuana Use but Not Cannabis Use Disorder,” Addiction 112(11): 1985–1991.
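
The two ideas in the preface excerpt above (an LRM predicts a continuous outcome from explanatory variables, and its results can become unstable when explanatory variables are highly correlated) can be illustrated with a small R sketch. This is not taken from the book: the data are simulated, and every variable name, coefficient, and sample size below is an assumption chosen only for illustration.

```r
# Illustrative sketch only: simulated data, hypothetical variable names.
set.seed(2022)
n  <- 500
x1 <- rnorm(n)

# Two versions of a second explanatory variable:
x2_indep <- rnorm(n)                   # unrelated to x1
x2_corr  <- x1 + rnorm(n, sd = 0.1)    # nearly a copy of x1 (high correlation)

# Continuous outcomes generated from the same assumed linear model.
y_indep <- 1 + 2 * x1 - 0.5 * x2_indep + rnorm(n)
y_corr  <- 1 + 2 * x1 - 0.5 * x2_corr  + rnorm(n)

# Fit each LRM by ordinary least squares.
fit_indep <- lm(y_indep ~ x1 + x2_indep)
fit_corr  <- lm(y_corr  ~ x1 + x2_corr)

# Standard error of the x1 slope in each model.
se_indep <- summary(fit_indep)$coefficients["x1", "Std. Error"]
se_corr  <- summary(fit_corr)$coefficients["x1", "Std. Error"]

# Compare the two: the slope for x1 is estimated far less precisely
# when x2 is almost collinear with it.
c(independent = se_indep, correlated = se_corr)
```

Running `summary(fit_corr)` alongside `summary(fit_indep)` shows the same contrast in the full coefficient tables. The noise level `sd = 0.1` is an arbitrary choice; increasing it weakens the correlation between the two explanatory variables and narrows the gap between the standard errors.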
