C4800_Prelims.fm Page iii Wednesday, October 18, 2006 9:37 AM

LINEAR MIXED MODELS
A Practical Guide Using Statistical Software

Brady T. West
Kathleen B. Welch
Andrzej T. Gałecki

with contributions from Brenda W. Gillespie

© 2007 by Taylor & Francis Group, LLC

Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2007 by Taylor & Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-10: 1-58488-480-0 (Hardcover)
International Standard Book Number-13: 978-1-58488-480-4 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

Dedication

To Laura
To all of my teachers, especially my parents and grandparents
—B.T.W.

To Jim, Tracy, and Brian
To the memory of Fremont and June
—K.B.W.

To Viola, Paweł, Marta, and Artur
To my parents
—A.T.G.

Preface

The development of software for fitting linear mixed models was propelled by advances in statistical methodology and computing power in the late 20th century. These developments, while providing applied researchers with new tools, have produced a sometimes confusing array of software choices. At the same time, parallel development of the methodology in different fields has resulted in different names for these models, including mixed models, multilevel models, and hierarchical linear models. This book provides a reference on the use of procedures for fitting linear mixed models available in five popular statistical software packages (SAS, SPSS, Stata, R/S-plus, and HLM). The intended audience includes applied statisticians and researchers who want a basic introduction to the topic and an easy-to-navigate software reference.
Several existing texts provide excellent theoretical treatment of linear mixed models and the analysis of variance components (e.g., McCulloch and Searle, 2001; Searle, Casella, and McCulloch, 1992; Verbeke and Molenberghs, 2000); this book is not intended to be one of them. Rather, we present the primary concepts and notation, and then focus on the software implementation and model interpretation. This book is intended to be a reference for practicing statisticians and applied researchers, and could be used in an advanced undergraduate or introductory graduate course on linear models.

Given the ongoing development and rapid improvements in software for fitting linear mixed models, the specific syntax and available options will likely change as newer versions of the software are released. The most up-to-date versions of selected portions of the syntax associated with the examples in this book, in addition to many of the data sets used in the examples, are available at the following Web site:

http://www.umich.edu/~bwest/almmussp.html

The Authors

Brady West is a senior statistician and statistical software consultant at the Center for Statistical Consultation and Research (CSCAR) at the University of Michigan–Ann Arbor. He received a B.S. in statistics (2001) and an M.A. in applied statistics (2002) from the University of Michigan–Ann Arbor. Mr. West has developed short courses on statistical analysis using SPSS, R, and Stata, and regularly consults on the use of procedures in SAS, SPSS, R, Stata, and HLM for the analysis of longitudinal and clustered data.

Kathy Welch is a senior statistician and statistical software consultant at the Center for Statistical Consultation and Research (CSCAR) at the University of Michigan–Ann Arbor. She received a B.A. in sociology (1969), an M.P.H. in epidemiology and health education (1975), and an M.S. in biostatistics (1984) from the University of Michigan (UM). She regularly consults on the use of SAS, SPSS, Stata, and HLM for analysis of clustered and longitudinal data, teaches a course on statistical software packages in the University of Michigan Department of Biostatistics, and teaches short courses on SAS software. She has also co-developed and co-taught short courses on the analysis of linear mixed models and generalized linear models using SAS.

Andrzej Gałecki is a research associate professor in the Division of Geriatric Medicine, Department of Internal Medicine, and Institute of Gerontology at the University of Michigan Medical School, and has a joint appointment in the Department of Biostatistics at the University of Michigan School of Public Health. He received an M.Sc. in applied mathematics (1977) from the Technical University of Warsaw, Poland, and an M.D. (1981) from the Medical Academy of Warsaw. In 1985 he earned a Ph.D. in epidemiology from the Institute of Mother and Child Care in Warsaw, Poland. Since 1990, Dr. Gałecki has collaborated with researchers in gerontology and geriatrics. His research interests lie in the development and application of statistical methods for analyzing correlated and overdispersed data. He developed the SAS macro NLMEM for nonlinear mixed-effects models, specified as a solution of ordinary differential equations. In a 1994 paper, he proposed a general class of covariance structures for two or more within-subject factors. Examples of these structures have been implemented in SAS Proc Mixed.

Brenda Gillespie is the associate director of the Center for Statistical Consultation and Research (CSCAR) at the University of Michigan in Ann Arbor. She received an A.B. in mathematics (1972) from Earlham College in Richmond, Indiana, an M.S. in statistics (1975) from The Ohio State University, and a Ph.D. in statistics (1989) from Temple University in Philadelphia, Pennsylvania. Dr. Gillespie has collaborated extensively with researchers in health-related fields, and has worked with mixed models as the primary statistician on the Collaborative Initial Glaucoma Treatment Study (CIGTS), the Dialysis Outcomes Practice Pattern Study (DOPPS), the Scientific Registry of Transplant Recipients (SRTR), the University of Michigan Dioxin Study, and at the Complementary and Alternative Medicine Research Center at the University of Michigan.

Acknowledgments

First and foremost, we wish to thank Brenda Gillespie for her vision and the many hours she spent on making this project a reality. Her contributions have been invaluable.

We sincerely wish to thank Caroline Beunckens at the Universiteit Hasselt in Belgium, who has patiently and consistently reviewed our chapters, providing her guidance and insight. We also wish to acknowledge, with sincere appreciation, the careful reading of our text and invaluable suggestions for its improvement provided by Tomasz Burzykowski at the Universiteit Hasselt in Belgium; Oliver Schabenberger at the SAS Institute; Douglas Bates and José Pinheiro, co-developers of the lme() and gls() functions in R; Sophia Rabe-Hesketh, developer of the gllamm procedure in Stata; Shu Chen and Carrie Disney at the University of Michigan–Ann Arbor; and John Gillespie at the University of Michigan–Dearborn.

We would also like to thank the technical support staff at SAS and SPSS for promptly responding to our inquiries about the mixed modeling procedures in those software packages. We also thank the anonymous reviewers provided by Chapman & Hall/CRC Press for their constructive suggestions on our early draft chapters. The Chapman & Hall/CRC Press staff has consistently provided helpful and speedy feedback in response to our many questions, and we are indebted to Kirsty Stroud for her support of this project in its early stages.
We especially thank Rob Calver at Chapman & Hall/CRC Press for his support and enthusiasm for this project, and his deft and thoughtful guidance throughout.

We thank our colleagues at the University of Michigan, especially Myra Kim and Julian Faraway, for their perceptive comments and useful discussions. Our colleagues at the University of Michigan Center for Statistical Consultation and Research (CSCAR) have been wonderful, particularly CSCAR's director, Ed Rothman, who has provided encouragement and advice. We are very grateful to our clients who have allowed us to use their data sets as examples. We are thankful to the participants of the 2006 course on mixed-effects models organized by statistics.com for careful reading and comments on the manuscript of our book. In particular, we acknowledge Rickie Domangue from James Madison University, Robert E. Larzelere from the University of Nebraska, and Thomas Trojian from the University of Connecticut. We also gratefully acknowledge support from the Claude Pepper Center Grants AG08808 and AG024824 from the National Institute on Aging.

We are especially indebted to our families and loved ones for their patience and support throughout the preparation of this book. It has been a long and sometimes arduous process that has been filled with hours of discussions and many late nights. The time we have spent writing this book has been a period of great learning and has developed a fruitful exchange of ideas that we have all enjoyed.

Brady, Kathy, and Andrzej

Contents

Chapter 1 Introduction .... 1
  1.1 What Are Linear Mixed Models (LMMs)? .... 1
    1.1.1 Models with Random Effects for Clustered Data .... 2
    1.1.2 Models for Longitudinal or Repeated-Measures Data .... 2
    1.1.3 The Purpose of this Book .... 3
    1.1.4 Outline of Book Contents .... 4
  1.2 A Brief History of LMMs .... 5
    1.2.1 Key Theoretical Developments .... 5
    1.2.2 Key Software Developments .... 7

Chapter 2 Linear Mixed Models: An Overview .... 9
  2.1 Introduction .... 9
    2.1.1 Types and Structures of Data Sets .... 9
      2.1.1.1 Clustered Data vs. Repeated-Measures and Longitudinal Data .... 9
      2.1.1.2 Levels of Data .... 10
    2.1.2 Types of Factors and their Related Effects in an LMM .... 11
      2.1.2.1 Fixed Factors .... 12
      2.1.2.2 Random Factors .... 12
      2.1.2.3 Fixed Factors vs. Random Factors .... 12
      2.1.2.4 Fixed Effects vs. Random Effects .... 13
      2.1.2.5 Nested vs. Crossed Factors and their Corresponding Effects .... 13
  2.2 Specification of LMMs .... 15
    2.2.1 General Specification for an Individual Observation .... 15
    2.2.2 General Matrix Specification .... 16
      2.2.2.1 Covariance Structures for the D Matrix .... 19
      2.2.2.2 Covariance Structures for the R_i Matrix .... 20
      2.2.2.3 Group-Specific Covariance Parameter Values for the D and R_i Matrices .... 21
    2.2.3 Alternative Matrix Specification for All Subjects .... 21
    2.2.4 Hierarchical Linear Model (HLM) Specification of the LMM .... 22
  2.3 The Marginal Linear Model .... 22
    2.3.1 Specification of the Marginal Model .... 22
    2.3.2 The Marginal Model Implied by an LMM .... 23
  2.4 Estimation in LMMs .... 25
    2.4.1 Maximum Likelihood (ML) Estimation .... 25
      2.4.1.1 Special Case: Assume θ is Known .... 26
      2.4.1.2 General Case: Assume θ is Unknown .... 27
    2.4.2 REML Estimation .... 28
    2.4.3 REML vs. ML Estimation .... 28
  2.5 Computational Issues .... 30
    2.5.1 Algorithms for Likelihood Function Optimization .... 30
    2.5.2 Computational Problems with Estimation of Covariance Parameters .... 31
  2.6 Tools for Model Selection .... 33
    2.6.1 Basic Concepts in Model Selection .... 34
      2.6.1.1 Nested Models .... 34
      2.6.1.2 Hypotheses: Specification and Testing .... 34
    2.6.2 Likelihood Ratio Tests (LRTs) .... 34
      2.6.2.1 Likelihood Ratio Tests for Fixed-Effect Parameters .... 35
      2.6.2.2 Likelihood Ratio Tests for Covariance Parameters .... 35
    2.6.3 Alternative Tests .... 36
      2.6.3.1 Alternative Tests for Fixed-Effect Parameters .... 37
      2.6.3.2 Alternative Tests for Covariance Parameters .... 38
    2.6.4 Information Criteria .... 38
  2.7 Model-Building Strategies .... 39
    2.7.1 The Top-Down Strategy .... 39
    2.7.2 The Step-Up Strategy .... 40
  2.8 Checking Model Assumptions (Diagnostics) .... 41
    2.8.1 Residual Diagnostics .... 41
      2.8.1.1 Conditional Residuals .... 41
      2.8.1.2 Standardized and Studentized Residuals .... 42
    2.8.2 Influence Diagnostics .... 42
    2.8.3 Diagnostics for Random Effects .... 43
  2.9 Other Aspects of LMMs .... 43
    2.9.1 Predicting Random Effects: Best Linear Unbiased Predictors .... 43
    2.9.2 Intraclass Correlation Coefficients (ICCs) .... 45
    2.9.3 Problems with Model Specification (Aliasing) .... 46
    2.9.4 Missing Data .... 48
    2.9.5 Centering Covariates .... 49
  2.10 Chapter Summary .... 49

Chapter 3 Two-Level Models for Clustered Data: The Rat Pup Example .... 51
  3.1 Introduction .... 51
  3.2 The Rat Pup Study .... 51
    3.2.1 Study Description .... 51
    3.2.2 Data Summary .... 54
  3.3 Overview of the Rat Pup Data Analysis .... 58
    3.3.1 Analysis Steps .... 58
    3.3.2 Model Specification .... 60
      3.3.2.1 General Model Specification .... 60
      3.3.2.2 Hierarchical Model Specification .... 62
    3.3.3 Hypothesis Tests .... 63
  3.4 Analysis Steps in the Software Procedures .... 66
    3.4.1 SAS .... 66
    3.4.2 SPSS .... 74
    3.4.3 R .... 77
    3.4.4 Stata .... 82
    3.4.5 HLM .... 85
      3.4.5.1 Data Set Preparation .... 85
      3.4.5.2 Preparing the Multivariate Data Matrix (MDM) File .... 86
  3.5 Results of Hypothesis Tests .... 90
    3.5.1 Likelihood Ratio Tests for Random Effects .... 90
    3.5.2 Likelihood Ratio Tests for Residual Variance .... 91
    3.5.3 F-tests and Likelihood Ratio Tests for Fixed Effects .... 91
  3.6 Comparing Results across the Software Procedures .... 92
    3.6.1 Comparing Model 3.1 Results .... 92
    3.6.2 Comparing Model 3.2B Results .... 94
    3.6.3 Comparing Model 3.3 Results .... 95
  3.7 Interpreting Parameter Estimates in the Final Model .... 96
    3.7.1 Fixed-Effect Parameter Estimates .... 96
    3.7.2 Covariance Parameter Estimates .... 97
  3.8 Estimating the Intraclass Correlation Coefficients (ICCs) .... 98
  3.9 Calculating Predicted Values .... 100
    3.9.1 Litter-Specific (Conditional) Predicted Values .... 100
    3.9.2 Population-Averaged (Unconditional) Predicted Values .... 101
  3.10 Diagnostics for the Final Model .... 102
    3.10.1 Residual Diagnostics .... 102
      3.10.1.1 Conditional Residuals .... 102
      3.10.1.2 Conditional Studentized Residuals .... 104
    3.10.2 Influence Diagnostics .... 106
      3.10.2.1 Overall and Fixed-Effects Influence Diagnostics .... 106
      3.10.2.2 Influence on Covariance Parameters .... 107
  3.11 Software Notes .... 108
    3.11.1 Data Structure .... 108
    3.11.2 Syntax vs. Menus .... 109
    3.11.3 Heterogeneous Residual Variances for Level 2 Groups .... 109
    3.11.4 Display of the Marginal Covariance and Correlation Matrices .... 109
    3.11.5 Differences in Model Fit Criteria .... 109
    3.11.6 Differences in Tests for Fixed Effects .... 110
    3.11.7 Post-Hoc Comparisons of LS Means (Estimated Marginal Means) .... 111
    3.11.8 Calculation of Studentized Residuals and Influence Statistics .... 112
    3.11.9 Calculation of EBLUPs .... 112
    3.11.10 Tests for Covariance Parameters .... 112
    3.11.11 Reference Categories for Fixed Factors .... 112

Chapter 4 Three-Level Models for Clustered Data: The Classroom Example .... 115
  4.1 Introduction .... 115
  4.2 The Classroom Study .... 117
    4.2.1 Study Description .... 117
    4.2.2 Data Summary .... 118
      4.2.2.1 Data Set Preparation .... 119
      4.2.2.2 Preparing the Multivariate Data Matrix (MDM) File .... 119
  4.3 Overview of the Classroom Data Analysis .... 122
    4.3.1 Analysis Steps .... 122
    4.3.2 Model Specification .... 125
      4.3.2.1 General Model Specification .... 125
      4.3.2.2 Hierarchical Model Specification .... 126
    4.3.3 Hypothesis Tests .... 128
  4.4 Analysis Steps in the Software Procedures .... 130
    4.4.1 SAS .... 130
    4.4.2 SPSS .... 136
    4.4.3 R .... 141
    4.4.4 Stata .... 144
    4.4.5 HLM .... 147
  4.5 Results of Hypothesis Tests .... 153
    4.5.1 Likelihood Ratio Test for Random Effects .... 153
    4.5.2 Likelihood Ratio Tests and t-Tests for Fixed Effects .... 154
  4.6 Comparing Results across the Software Procedures .... 155
    4.6.1 Comparing Model 4.1 Results .... 155
    4.6.2 Comparing Model 4.2 Results .... 156
    4.6.3 Comparing Model 4.3 Results .... 157
    4.6.4 Comparing Model 4.4 Results .... 159
  4.7 Interpreting Parameter Estimates in the Final Model .... 159
    4.7.1 Fixed-Effect Parameter Estimates .... 159
    4.7.2 Covariance Parameter Estimates .... 161
  4.8 Estimating the Intraclass Correlation Coefficients (ICCs) .... 162
  4.9 Calculating Predicted Values .... 165
    4.9.1 Conditional and Marginal Predicted Values .... 165
    4.9.2 Plotting Predicted Values Using HLM .... 166
  4.10 Diagnostics for the Final Model .... 167
    4.10.1 Plots of the EBLUPs .... 167
    4.10.2 Residual Diagnostics .... 169
  4.11 Software Notes .... 171
    4.11.1 REML vs. ML Estimation .... 171
    4.11.2 Setting up Three-Level Models in HLM .... 171
    4.11.3 Calculation of Degrees of Freedom for t-Tests in HLM .... 171
    4.11.4 Analyzing Cases with Complete Data .... 172
    4.11.5 Miscellaneous Differences .... 173

Chapter 5 Models for Repeated-Measures Data: The Rat Brain Example .... 175
  5.1 Introduction .... 175
  5.2 The Rat Brain Study .... 176
    5.2.1 Study Description .... 176
    5.2.2 Data Summary .... 178
  5.3 Overview of the Rat Brain Data Analysis .... 180
    5.3.1 Analysis Steps .... 180
    5.3.2 Model Specification .... 182
      5.3.2.1 General Model Specification .... 182
      5.3.2.2 Hierarchical Model Specification .... 184
    5.3.3 Hypothesis Tests .... 185
  5.4 Analysis Steps in the Software Procedures .... 187
    5.4.1 SAS .... 187
    5.4.2 SPSS .... 190
    5.4.3 R .... 193
    5.4.4 Stata .... 195
    5.4.5 HLM .... 198
      5.4.5.1 Data Set Preparation .... 198
      5.4.5.2 Preparing the MDM File .... 199
  5.5 Results of Hypothesis Tests .... 203
    5.5.1 Likelihood Ratio Tests for Random Effects .... 203
    5.5.2 Likelihood Ratio Tests for Residual Variance .... 203
    5.5.3 F-Tests for Fixed Effects .... 204