
Multivariate Data Analysis PDF

739 Pages·2013·10.916 MB·English

Preview Multivariate Data Analysis

Multivariate Data Analysis
Seventh Edition
Joseph F. Hair Jr., William C. Black, Barry J. Babin, Rolph E. Anderson

ISBN 10: 1-292-02190-X
ISBN 13: 978-1-292-02190-4

Pearson Education Limited
Edinburgh Gate, Harlow, Essex CM20 2JE, England
and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsoned.co.uk

© Pearson Education Limited 2014

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Printed in the United States of America

PEARSON CUSTOM LIBRARY

Table of Contents

All chapters: Joseph F. Hair, Jr./William C. Black/Barry J. Babin/Rolph E. Anderson

1. Overview of Multivariate Methods
2. Examining Your Data
3. Exploratory Factor Analysis
4.
Multiple Regression Analysis
5. Multiple Discriminant Analysis
6. Logistic Regression: Regression with a Binary Dependent Variable
7. Conjoint Analysis
8. Cluster Analysis
9. Multidimensional Scaling
10. Analyzing Nominal Data with Correspondence Analysis
11. Structural Equations Modeling Overview
12. Confirmatory Factor Analysis
13. Testing Structural Equations Models
14. MANOVA and GLM
Index

Overview of Multivariate Methods

LEARNING OBJECTIVES

Upon completing this chapter, you should be able to do the following:

• Explain what multivariate analysis is and when its application is appropriate.
• Discuss the nature of measurement scales and their relationship to multivariate techniques.
• Understand the nature of measurement error and its impact on multivariate analysis.
• Determine which multivariate technique is appropriate for a specific research problem.
• Define the specific techniques included in multivariate analysis.
• Discuss the guidelines for application and interpretation of multivariate analyses.
• Understand the six-step approach to multivariate model building.
CHAPTER PREVIEW

This chapter presents a simplified overview of multivariate analysis. It stresses that multivariate analysis methods will increasingly influence not only the analytical aspects of research but also the design and approach to data collection for decision making and problem solving. Although multivariate techniques share many characteristics with their univariate and bivariate counterparts, several key differences arise in the transition to a multivariate analysis. To illustrate this transition, this chapter presents a classification of multivariate techniques. It then provides general guidelines for the application of these techniques as well as a structured approach to the formulation, estimation, and interpretation of multivariate results. The chapter concludes with a discussion of the databases utilized throughout the text to illustrate application of the techniques.

KEY TERMS

Before starting the chapter, review the key terms to develop an understanding of the concepts and terminology used. Throughout the chapter, the key terms appear in boldface. Other points of emphasis in the chapter are italicized. Also, cross-references within the key terms appear in italics.

Alpha (α): See Type I error.

Beta (β): See Type II error.

Bivariate partial correlation: Simple (two-variable) correlation between two sets of residuals (unexplained variances) that remain after the association of other independent variables is removed.

From Chapter 1 of Multivariate Data Analysis, 7/e. Joseph F. Hair, Jr., William C. Black, Barry J. Babin, Rolph E. Anderson. Copyright © 2010 by Pearson Prentice Hall. All rights reserved.

Bootstrapping: An approach to validating a multivariate model by drawing a large number of subsamples and estimating models for each subsample.
Estimates from all the subsamples are then combined, providing not only the “best” estimated coefficients (e.g., means of each estimated coefficient across all the subsample models), but their expected variability and thus their likelihood of differing from zero; that is, are the estimated coefficients statistically different from zero or not? This approach does not rely on statistical assumptions about the population to assess statistical significance, but instead makes its assessment based solely on the sample data.

Composite measure: See summated scales.

Dependence technique: Classification of statistical techniques distinguished by having a variable or set of variables identified as the dependent variable(s) and the remaining variables as independent. The objective is prediction of the dependent variable(s) by the independent variable(s). An example is regression analysis.

Dependent variable: Presumed effect of, or response to, a change in the independent variable(s).

Dummy variable: Nonmetrically measured variable transformed into a metric variable by assigning a 1 or a 0 to a subject, depending on whether it possesses a particular characteristic.

Effect size: Estimate of the degree to which the phenomenon being studied (e.g., correlation or difference in means) exists in the population.

Independent variable: Presumed cause of any change in the dependent variable.

Indicator: Single variable used in conjunction with one or more other variables to form a composite measure.

Interdependence technique: Classification of statistical techniques in which the variables are not divided into dependent and independent sets; rather, all variables are analyzed as a single set (e.g., factor analysis).

Measurement error: Inaccuracies of measuring the “true” variable values due to the fallibility of the measurement instrument (i.e., inappropriate response scales), data entry errors, or respondent errors.
Metric data: Also called quantitative data, interval data, or ratio data, these measurements identify or describe subjects (or objects) not only on the possession of an attribute but also by the amount or degree to which the subject may be characterized by the attribute. For example, a person’s age and weight are metric data.

Multicollinearity: Extent to which a variable can be explained by the other variables in the analysis. As multicollinearity increases, it complicates the interpretation of the variate because it is more difficult to ascertain the effect of any single variable, owing to their interrelationships.

Multivariate analysis: Analysis of multiple variables in a single relationship or set of relationships.

Multivariate measurement: Use of two or more variables as indicators of a single composite measure. For example, a personality test may provide the answers to a series of individual questions (indicators), which are then combined to form a single score (summated scale) representing the personality trait.

Nonmetric data: Also called qualitative data, these are attributes, characteristics, or categorical properties that identify or describe a subject or object. They differ from metric data by indicating the presence of an attribute, but not the amount. Examples are occupation (physician, attorney, professor) or buyer status (buyer, nonbuyer). Also called nominal data or ordinal data.

Power: Probability of correctly rejecting the null hypothesis when it is false; that is, correctly finding a hypothesized relationship when it exists. Determined as a function of (1) the statistical significance level set by the researcher for a Type I error (α), (2) the sample size used in the analysis, and (3) the effect size being examined.

Practical significance: Means of assessing multivariate analysis results based on their substantive findings rather than their statistical significance.
Whereas statistical significance determines whether the result is attributable to chance, practical significance assesses whether the result is useful (i.e., substantial enough to warrant action) in achieving the research objectives.

Reliability: Extent to which a variable or set of variables is consistent in what it is intended to measure. If multiple measurements are taken, the reliable measures will all be consistent in their values. It differs from validity in that it relates not to what should be measured, but instead to how it is measured.

Specification error: Omitting a key variable from the analysis, thus affecting the estimated effects of included variables.

Summated scales: Method of combining several variables that measure the same concept into a single variable in an attempt to increase the reliability of the measurement through multivariate measurement. In most instances, the separate variables are summed and then their total or average score is used in the analysis.

Treatment: Independent variable the researcher manipulates to see the effect (if any) on the dependent variable(s), such as in an experiment (e.g., testing the appeal of color versus black-and-white advertisements).

Type I error: Probability of incorrectly rejecting the null hypothesis; in most cases, it means saying a difference or correlation exists when it actually does not. Also termed alpha (α). Typical levels are 5 or 1 percent, termed the .05 or .01 level, respectively.

Type II error: Probability of incorrectly failing to reject the null hypothesis; in simple terms, the chance of not finding a correlation or mean difference when it does exist. Also termed beta (β), it is inversely related to Type I error. The value of 1 minus the Type II error (1 − β) is defined as power.

Univariate analysis of variance (ANOVA): Statistical technique used to determine, on the basis of one dependent measure, whether samples are from populations with equal means.
Validity: Extent to which a measure or set of measures correctly represents the concept of study, that is, the degree to which it is free from any systematic or nonrandom error. Validity is concerned with how well the concept is defined by the measure(s), whereas reliability relates to the consistency of the measure(s).

Variate: Linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher.

WHAT IS MULTIVARIATE ANALYSIS?

Today businesses must be more profitable, react quicker, and offer higher-quality products and services, and do it all with fewer people and at lower cost. An essential requirement in this process is effective knowledge creation and management. There is no lack of information, but there is a dearth of knowledge. As Tom Peters said in his book Thriving on Chaos, “We are drowning in information and starved for knowledge” [7].

The information available for decision making exploded in recent years, and will continue to do so in the future, probably even faster. Until recently, much of that information just disappeared. It was either not collected or discarded. Today this information is being collected and stored in data warehouses, and it is available to be “mined” for improved decision making. Some of that information can be analyzed and understood with simple statistics, but much of it requires more complex, multivariate statistical techniques to convert these data into knowledge.

A number of technological advances help us to apply multivariate techniques. Among the most important are the developments in computer hardware and software. The speed of computing equipment has doubled every 18 months while prices have tumbled. User-friendly software packages brought data analysis into the point-and-click era, and we can quickly analyze mountains of complex data with relative ease.
Indeed, industry, government, and university-related research centers throughout the world are making widespread use of these techniques.

We use the generic term researcher when referring to a data analyst within either the practitioner or academic communities. We feel it inappropriate to make any distinction between these two areas, because research in both relies on theoretical and quantitative bases. Although the research objectives and the emphasis in interpretation may vary, a researcher within either area must address all of the issues, both conceptual and empirical, raised in the discussions of the statistical methods.

MULTIVARIATE ANALYSIS IN STATISTICAL TERMS

Multivariate analysis techniques are popular because they enable organizations to create knowledge and thereby improve their decision making. Multivariate analysis refers to all statistical techniques that simultaneously analyze multiple measurements on individuals or objects under investigation. Thus, any simultaneous analysis of more than two variables can be loosely considered multivariate analysis.

Many multivariate techniques are extensions of univariate analysis (analysis of single-variable distributions) and bivariate analysis (cross-classification, correlation, analysis of variance, and simple regression used to analyze two variables). For example, simple regression (with one predictor variable) is extended in the multivariate case to include several predictor variables. Likewise, the single dependent variable found in analysis of variance is extended to include multiple dependent variables in multivariate analysis of variance. Some multivariate techniques (e.g., multiple regression and multivariate analysis of variance) provide a means of performing in a single analysis what once took multiple univariate analyses to accomplish.
Other multivariate techniques, however, are uniquely designed to deal with multivariate issues, such as factor analysis, which identifies the structure underlying a set of variables, or discriminant analysis, which differentiates among groups based on a set of variables.

Confusion sometimes arises about what multivariate analysis is because the term is not used consistently in the literature. Some researchers use multivariate simply to mean examining relationships between or among more than two variables. Others use the term only for problems in which all the multiple variables are assumed to have a multivariate normal distribution. To be considered truly multivariate, however, all the variables must be random and interrelated in such ways that their different effects cannot meaningfully be interpreted separately. Some authors state that the purpose of multivariate analysis is to measure, explain, and predict the degree of relationship among variates (weighted combinations of variables). Thus, the multivariate character lies in the multiple variates (multiple combinations of variables), and not only in the number of variables or observations. For our present purposes, we do not insist on a rigid definition of multivariate analysis. Instead, multivariate analysis will include both multivariable techniques and truly multivariate techniques, because we believe that knowledge of multivariable techniques is an essential first step in understanding multivariate analysis.

SOME BASIC CONCEPTS OF MULTIVARIATE ANALYSIS

Although the roots of multivariate analysis lie in univariate and bivariate statistics, the extension to the multivariate domain introduces additional concepts and issues of particular relevance.
These concepts range from the need for a conceptual understanding of the basic building block of multivariate analysis (the variate) to specific issues dealing with the types of measurement scales used and the statistical issues of significance testing and confidence levels. Each concept plays a significant role in the successful application of any multivariate technique.

The Variate

As previously mentioned, the building block of multivariate analysis is the variate, a linear combination of variables with empirically determined weights. The variables are specified by the researcher, whereas the weights are determined by the multivariate technique to meet a specific objective. A variate of n weighted variables ($X_1$ to $X_n$) can be stated mathematically as:

$$\text{Variate value} = w_1 X_1 + w_2 X_2 + w_3 X_3 + \dots + w_n X_n$$

where $X_n$ is the observed variable and $w_n$ is the weight determined by the multivariate technique.

The result is a single value representing a combination of the entire set of variables that best achieves the objective of the specific multivariate analysis. In multiple regression, the variate is determined in a manner that maximizes the correlation between the multiple independent variables and the single dependent variable. In discriminant analysis, the variate is formed so as to create scores for each observation that maximally differentiate between groups of observations. In factor analysis, variates are formed to best represent the underlying structure or patterns of the variables as represented by their intercorrelations.

In each instance, the variate captures the multivariate character of the analysis. Thus, in our discussion of each technique, the variate is the focal point of the analysis in many respects. We must understand not only its collective impact in meeting the technique’s objective but also each separate variable’s contribution to the overall variate effect.
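To make the weighted sum that defines a variate concrete, the following is a minimal Python sketch that computes a variate value for each observation. The function name `variate_value`, the weights, and the observations are hypothetical and chosen only for illustration; in an actual analysis the weights would be estimated by the multivariate technique rather than supplied by hand.

```python
from typing import Sequence


def variate_value(weights: Sequence[float], values: Sequence[float]) -> float:
    """Return w_1*X_1 + w_2*X_2 + ... + w_n*X_n for a single observation."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical weights (assumed, not estimated from any data).
weights = [0.5, 2.0, -1.0]

# Hypothetical observations: one row per subject, one column per variable.
observations = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 1.0],
]

# One variate value (a single score) per observation.
scores = [variate_value(weights, obs) for obs in observations]
print(scores)
```

Each observation's set of variables is reduced to a single score. What distinguishes regression, discriminant analysis, and factor analysis is the objective used to choose the weights, not the form of this sum.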
Measurement Scales

Data analysis involves the identification and measurement of variation in a set of variables, either among themselves or between a dependent variable and one or more independent variables. The key word here is measurement because the researcher cannot identify variation unless it can be measured. Measurement is important in accurately representing the concept of interest and is instrumental in the selection of the appropriate multivariate method of analysis. Data can be classified into one of two categories, nonmetric (qualitative) and metric (quantitative), based on the type of attributes or characteristics they represent.

The researcher must define the measurement type (nonmetric or metric) for each variable. To the computer, the values are only numbers. As we will see in the following section, defining data as either metric or nonmetric has substantial impact on what the data can represent and how they can be analyzed.

NONMETRIC MEASUREMENT SCALES

Nonmetric data describe differences in type or kind by indicating the presence or absence of a characteristic or property. These properties are discrete in that by having a particular feature, all other features are excluded; for example, if a person is male, he cannot be female. An “amount” of gender is not possible, just the state of being male or female. Nonmetric measurements can be made with either a nominal or an ordinal scale.

Nominal Scales. A nominal scale assigns numbers as a way to label or identify subjects or objects. The numbers assigned to the objects have no quantitative meaning beyond indicating the presence or absence of the attribute or characteristic under investigation. Therefore, nominal scales, also known as categorical scales, can only provide the number of occurrences in each class or category of the variable being studied. For example, in representing gender (male or female) the researcher might assign numbers to each category (e.g., 2 for females and 1 for males).
With these values, however, we can only tabulate the number of males and females; it is nonsensical to calculate an average value of gender.

Nominal data only represent categories or classes and do not imply amounts of an attribute or characteristic. Commonly used examples of nominally scaled data include many demographic attributes (e.g., individual’s sex, religion, occupation, or political party affiliation), many forms of behavior (e.g., voting behavior or purchase activity), or any other action that is discrete (happens or not).

Ordinal Scales. Ordinal scales are the next “higher” level of measurement precision. In the case of ordinal scales, variables can be ordered or ranked in relation to the amount of the attribute possessed. Every subject or object can be compared with another in terms of a “greater than” or “less than” relationship. The numbers utilized in ordinal scales, however, are really nonquantitative because they indicate only relative positions in an ordered series. Ordinal scales provide no measure of the actual amount or magnitude in absolute terms, only the order of the values. The researcher knows the order, but not the amount of difference between the values.
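The difference between nominal and ordinal scales can be illustrated with a short Python sketch. The gender codes follow the example in the text (1 for males, 2 for females); the sample of respondents, the brand names, and the preference ranks are hypothetical and assumed only for illustration.

```python
from collections import Counter

# Nominal scale: numbers are labels only (coding taken from the text's example).
labels = {1: "male", 2: "female"}
gender_codes = [1, 2, 2, 1, 2]  # hypothetical respondents

# The meaningful summary of nominal data is a count per category.
# Averaging the codes would be arithmetically possible but nonsensical.
counts = Counter(labels[code] for code in gender_codes)
print(counts["male"], counts["female"])

# Ordinal scale: hypothetical preference ranks (1 = most preferred).
ranks = {"brand_a": 2, "brand_b": 1, "brand_c": 3}

# Order comparisons ("greater than" or "less than") are meaningful...
ordered = sorted(ranks, key=ranks.get)
print(ordered)

# ...but the gap between ranks is not: rank 1 versus rank 2 says nothing
# about how much more brand_b is preferred than brand_a.
prefers_b_over_a = ranks["brand_b"] < ranks["brand_a"]
print(prefers_b_over_a)
```

The sketch restricts itself to the operations each scale supports: tabulation for nominal codes and ordering for ordinal ranks. The computer would happily compute a mean of either, which is why the researcher, not the software, must define the measurement type.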
