Springer Texts in Statistics
Advisors: George Casella, Stephen Fienberg, Ingram Olkin

Ralph O. Mueller

Basic Principles of Structural Equation Modeling
An Introduction to LISREL and EQS

With 25 Illustrations

Springer

Ralph O. Mueller, PhD
Department of Educational Leadership
Graduate School of Education and Human Development
The George Washington University
Washington, DC 20052
USA
[email protected] edu

Editorial Board
George Casella, Biometrics Unit, Cornell University, Ithaca, NY 14853-7801, USA
Stephen Fienberg, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
Ingram Olkin, Department of Statistics, Stanford University, Stanford, CA 94305, USA

On the cover: The artwork is based on an illustration of the basic matrices for structural equation models.

Library of Congress Cataloging-in-Publication Data
Mueller, Ralph O.
Basic principles of structural equation modeling: an introduction to LISREL and EQS / Ralph O. Mueller.
p. cm. - (Springer texts in statistics)
Includes bibliographical references (pp. 216-221) and index.
ISBN-13: 978-1-4612-8455-0
e-ISBN: 978-1-4612-3974-1
DOI: 10.1007/978-1-4612-3974-1
1. LISREL. 2. EQS (Computer file) 3. Path analysis-Data processing. 4. Social sciences-Statistical methods. I. Title. II. Series.
QA278.3.M84 1996
519.5'35-dc20
95-15043

Printed on acid-free paper.

© 1996 Springer-Verlag New York, Inc.
Softcover reprint of the hardcover 1st edition 1996

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis.
Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production coordinated by Publishing Network and managed by Francine McNeill; manufacturing supervised by Jeffrey Taub. Typeset by Asco Trade Typesetting Ltd., Hong Kong.

9 8 7 6 5 4 3 2 (Corrected second printing, 1999)

To Mama and Papa; to Dan, Paula; and to Rob: My family of choice

Preface

During the last two decades, structural equation modeling (SEM) has emerged as a powerful multivariate data analysis tool in social science research settings, especially in the fields of sociology, psychology, and education. Although its roots can be traced back to the first half of this century, when Spearman (1904) developed factor analysis and Wright (1934) introduced path analysis, it was not until the 1970s that the works by Karl Jöreskog and his associates (e.g., Jöreskog, 1977; Jöreskog and Van Thillo, 1973) began to make general SEM techniques accessible to the social and behavioral science research communities. Today, with the development and increasing availability of SEM computer programs, SEM has become a well-established and respected data analysis method, incorporating many of the traditional analysis techniques as special cases. State-of-the-art SEM software packages such as LISREL (Jöreskog and Sörbom, 1993a,b) and EQS (Bentler, 1993; Bentler and Wu, 1993) handle a variety of ordinary least squares regression designs as well as complex structural equation models involving variables with arbitrary distributions.
Unfortunately, many students and researchers hesitate to use SEM methods, perhaps due to the somewhat complex underlying statistical representation and theory. In my opinion, social science students and researchers can benefit greatly from acquiring knowledge and skills in SEM since the methods, applied appropriately, can provide a bridge between the theoretical and empirical aspects of behavioral research. That is, interpretations of SEM analyses can assist in understanding aspects of social and behavioral phenomena if (a) a "good" initial model is conceptualized based on a sound underlying substantive theory; (b) appropriate data are collected to estimate the unknown population parameters; (c) the fit of those data to the a priori hypothesized model is assessed; and (d) if theoretically justified, the initial model is modified appropriately should evidence of lack-of-fit and model misspecification arise. Structural equation modeling thus should be understood as a research process, not as a mere statistical technique.

Many social science graduate programs now offer introductions to SEM within their quantitative methods course sequences. Several currently available textbooks (e.g., Blalock, 1964; Bollen, 1989; Duncan, 1975; Hayduk, 1987; Kenny, 1979; Loehlin, 1992) reflect the rapidly increasing interest and advances in SEM methods. However, most treatments of SEM are either outdated and do not provide information on recent statistical advances and modern computer software, or they are of an advanced nature, sometimes making the initial study of SEM techniques unattractive and cumbersome to potential users. The present book was written to bridge this gap by addressing former deficiencies in a readily accessible introductory text. The foci of the book are basic concepts and applications of SEM within the social and behavioral sciences.
As such, students in any of the social and behavioral sciences with a background equivalent to a standard two-quarter or two-semester sequence in quantitative research methods (general ANOVA and linear regression models and basic concepts in applied measurement) should encounter no difficulties in mastering the content of this book. The minimal knowledge of matrix algebra that is required to understand the basic statistical foundations of SEM may be reviewed in this text (Appendix C) or acquired from any elementary linear algebra textbook. The exercises and selected references at the end of each chapter serve to test and strengthen the understanding of presented topics and provide the reader with a starting point into the current literature on SEM and related topics.

Central to the book is the development of SEM techniques by sequentially presenting linear regression and recursive path analysis, confirmatory factor analysis, and more general structural equation models. To illustrate statistical concepts and techniques, I have chosen to use the computer programs LISREL (Jöreskog and Sörbom, 1993a,b) and EQS (Bentler, 1993) to analyze data from selected research in the fields of sociology and counseling psychology. The book can be used as an introduction to the LISREL and EQS programs themselves, although certainly it is not intended to be a substitute for the program manuals (LISREL: Jöreskog and Sörbom, 1993a,b; EQS: Bentler, 1993; Bentler and Wu, 1993). While other computer programs could have been utilized [e.g., Steiger's (1989) EzPATH, Muthén's (1988) LISCOMP, or SAS's PROC CALIS (SAS Institute Inc., 1990)], the choice of introducing LISREL and EQS was based on my perception that they are the most widely used, known, and accepted structural equation programs to date.
The notation used to present statistical concepts is consistent with that in other treatments of SEM (e.g., Bollen, 1989; Hayduk, 1987) and follows the Jöreskog-Keesling-Wiley approach to represent systems of structural equations (e.g., Jöreskog, 1973, 1977; Jöreskog and Sörbom, 1993a; Keesling, 1972; Wiley, 1973). Since all versions of LISREL are based on this approach, the program's matrix-based syntax is easily explained and understood once some underlying statistical theory has been introduced. However, the LISREL package also features an alternative command language, SIMPLIS (Jöreskog and Sörbom, 1993b), which allows the researcher to analyze structural equation models without fully understanding and using matrix representations. While LISREL examples throughout the chapters use the matrix-based syntax to illustrate the statistical foundation and representation of SEM, their equivalents using SIMPLIS are explained in Appendix A.

The EQS language, on the other hand, is built around the Bentler-Weeks model (e.g., Bentler, 1993; Bentler and Weeks, 1979, 1980); this alternative way to represent structural equation systems is not used in this book. Like SIMPLIS, the EQS syntax does not involve matrix specifications and, hence, knowledge of the matrix form of the Bentler-Weeks model is not required to understand and use the EQS program. The package's MS Windows version (Bentler and Wu, 1993) includes the Build_EQS option that allows the analyst to create an EQS input file without actually having to write the input program but, instead, by "clicking" on certain options within various dialogue boxes. I feel that knowledge of the program's syntax for model specification is of overall educational value and, thus, discuss actual input files rather than emphasizing the Build_EQS option.
Initially, any SEM programming language that does not depend on an understanding of the matrix representation of structural equation models might seem easier than a traditional matrix-based syntax. To acquire an "aesthetic" appreciation of SEM, however, it is necessary to grasp some of the underlying statistical principles and formulations, be it via the approach used here or the one proposed in the Bentler-Weeks model. Getting acquainted with the syntax of the matrix- and nonmatrix-based command languages of LISREL and EQS will aid in this endeavor.

In summary, this book was written with four goals in mind: (1) to help users of contemporary social science research methods gain an understanding and appreciation of the main goals and advantages of "doing" SEM; (2) to explicate some of the fundamental statistical theory that underlies SEM; (3) to illustrate the use of the software packages LISREL and EQS in the analysis of basic recursive structural equation models; and, most importantly, (4) to stimulate the reader's curiosity to consult more advanced treatments of SEM.

Organization and Historical Perspective

The book contains three chapters, largely reflecting the historical development of what is known now as the general area of SEM: from a discussion of classical path analysis (Chapter 1) and an introduction to confirmatory factor analysis (Chapter 2) to an exploration of basic analyses of more general structural equation models (Chapter 3). In addition, the book contains five appendices: Appendix A presents all LISREL examples from the chapters in the SIMPLIS command language and serves as an introduction to this alternative LISREL syntax. Appendix B briefly reviews the statistical concepts of distributional location, dispersion, and variable association mainly to explain some of the notation used in this book; some elementary concepts of matrix algebra are presented in Appendix C.
Finally, Appendices D and E contain coding information and summary statistics for all variables used in the LISREL and EQS example analyses.

Chapter 1 introduces the language and notation of SEM through a discussion of classical recursive path analysis. Most modern treatments trace the beginnings of SEM to the development of path analysis by Sewall Wright during the 1920s and 1930s (Wright, 1921, 1934). However, it took almost half a century before the method was introduced to the social sciences with articles by Duncan (1966) in sociology, Werts and Linn (1970) in psychology, and Anderson and Evans (1974) in education, among others. Early path analyses were simply ordinary least squares (OLS) regression applications to sets of variables that were structurally related. In many ways, SEM techniques still can be seen as nothing more than the analysis of simultaneous regression equations. In fact, if the statistical assumptions underlying linear regression are met, standard OLS estimation (as available in computer programs such as SPSS, SAS, or BMDP) can be used to estimate the parameters in some structural equation models.

Throughout the review of univariate simple and multiple linear regression in Chapter 1, standard regression notation is replaced by that commonly used in SEM (the Jöreskog-Keesling-Wiley notation system), providing an easy transition into the basic concepts of SEM. The introduction of path diagrams and structural equations leads to the First Law of Path Analysis, which allows for the decomposition of covariances between any two variables in a recursive path model. In turn, such decompositions lead to the definitions of the direct, indirect, and total effects among structurally ordered variables within a specific model. Finally, a discussion and demonstration of the identification problem in recursive path models is included to sensitize the reader to a complex issue that demands attention before SEM analyses can be conducted.
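The idea that a recursive path model can be estimated one equation at a time by OLS, and that effects then decompose into direct and indirect parts, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not an analysis from the book: the three-variable model (X affects Y1, and both X and Y1 affect Y2), the simulated data, the path values 0.5, 0.3, and 0.4, and the helper names cov, ols_one, and ols_two are all hypothetical. The names gamma and beta loosely follow the convention of writing paths from exogenous variables as gamma and paths among endogenous variables as beta.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def ols_one(y, x):
    """OLS slope from regressing y on a single predictor x."""
    return cov(x, y) / cov(x, x)


def ols_two(y, x1, x2):
    """OLS slopes from regressing y on x1 and x2 (2x2 normal equations)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical simulated data for the recursive model
#   Y1 = 0.5*X + error,   Y2 = 0.3*X + 0.4*Y1 + error
rng = random.Random(0)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
y1 = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y2 = [0.3 * xi + 0.4 * y1i + rng.gauss(0, 1) for xi, y1i in zip(x, y1)]

# One OLS regression per structural equation.
gamma1 = ols_one(y1, x)              # path X -> Y1
gamma2, beta21 = ols_two(y2, x, y1)  # paths X -> Y2 and Y1 -> Y2

# Effect decomposition for X on Y2.
direct = gamma2
indirect = gamma1 * beta21
total = direct + indirect

print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

With OLS estimates, the total effect computed this way coincides with the slope from regressing Y2 on X alone, and the sample covariance of X and Y2 equals the total effect times the variance of X, which is one simple instance of a covariance being decomposed along the paths of a recursive model.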
Throughout the chapter, theoretical concepts are illustrated by annotated LISREL and EQS analyses of a data set taken from the sociological literature.

In Chapter 2, the inclusion of latent, unobserved variables in structural equation models is introduced from a confirmatory factor analysis (CFA) perspective. Based on Spearman's (1904) discovery of exploratory factor analysis (EFA), several individuals (e.g., Jöreskog, 1967, 1969; Lawley, 1940, 1967; Thurstone, 1947) advanced the statistical theory to represent relationships among observed variables in terms of a smaller number of underlying theoretical constructs, that is, latent variables. It was not until the mid-1950s that a formal distinction was made between EFA and CFA: While the former method is used to search data for an underlying structure, the latter technique is utilized to confirm (strictly speaking, disconfirm) an a priori hypothesized, theory-derived structure with collected data. In this text, only CFA models are discussed; books by Gorsuch (1983), Mulaik (1972), or McDonald (1985) can be consulted for detailed introductions to EFA.

Based on the assumption that most measured variables are imperfect indicators of certain underlying, latent constructs or factors, CFA allows the researcher to cluster observed variables in certain prespecified, theory-driven ways, that is, to specify a priori the latent construct(s) that the observed variables are intended to measure. After estimating the factor loadings (structural coefficients in the regressions of observed on the latent variables) from a variance/covariance matrix of the observed variables, the investigator can and should assess whether or not the collected data "fit" the hypothesized factor structure.
Since several issues surrounding the question of how best to assess data-model fit still are unresolved and current answers remain controversial, a detailed discussion of some of the currently utilized data-model fit indices is presented in Chapter 2. The different approaches to data-model fit, as well as other concepts introduced in Chapter 2 (e.g., model modification), are illustrated by (a) an extended analysis of the data set introduced in Chapter 1, and (b) a LISREL and EQS CFA to assess the validity and reliability of a behavior assessment instrument taken from the counseling psychology literature.

Finally, Chapter 3 treats the most general recursive structural equation model presented in this book. The conceptual integration of the classical path analysis model (Chapter 1) and the CFA model (Chapter 2) is used to develop the general structure. During the 1970s, several researchers (e.g., Keesling, 1972; Jöreskog, 1977; Wiley, 1973) succeeded in developing the statistical theory and computational procedures for combining path analysis and CFA into a single statistical framework. The various versions of LISREL, developed by Jöreskog and associates dating back to 1970, continuously incorporated up-to-date statistical advances in SEM. Similarly, EQS (Bentler, 1993) is regarded as one of the most sophisticated and respected software packages for the analysis of a wide variety of structural equation models, including path analytical, CFA, and more general models. Its MS Windows version (Bentler and Wu, 1993) also includes many exploratory data analytical tools, such as outlier analysis and other descriptive statistics with various plot options, as well as inferential techniques, such as dependent and independent t-tests, regression, and ANOVA.
After discussing the specification and identification of general structural equation models that involve latent variables in Chapter 3, the estimation of direct, indirect, and total effect components introduced in Chapter 1 is simplified by introducing a matrix algorithm to estimate the various effect components. Also, two of the most common iterative methods for estimating parameters in general structural equation models are discussed: maximum likelihood (Jöreskog, 1969, 1970; among others) and generalized least squares (Goldberger, 1971; Jöreskog and Goldberger, 1972). While these techniques have important advantages over more traditional methods such as OLS, both methods still depend on the multivariate normality assumption; for