Multivariate Statistical Modeling in Engineering and Management

Jhareswar Maiti

The book focuses on problem-solving for practitioners and model-building for academicians under multivariate situations. This book helps readers in understanding the issues, such as knowing variability, extracting patterns, building relationships, and making objective decisions. A large number of multivariate statistical models are covered in the book. The readers will learn how a practical problem can be converted to a statistical problem and how the statistical solution can be interpreted as a practical solution.

Key features:
• Provides conceptual models/approaches linking theories and practices in multivariate domain
• Provides step-by-step procedure for estimating parameters of developed models
• Provides blueprint for data-driven decision-making
• Includes practical examples and case studies relevant for intended audiences

The book will help everyone involved in data-driven problem-solving, modeling, and decision-making.

First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2023 Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-4665-6436-7 (hbk)
ISBN: 978-1-032-30018-4 (pbk)
ISBN: 978-1-003-30306-0 (ebk)

DOI: 10.1201/9781003303060

Typeset in Times by Newgen Publishing UK

Dedication

To my father, Late Prabodh Chandra Maiti

Contents

Foreword
Preface
Acknowledgments
Author

PART I Prerequisites

Chapter 1 Introduction
  1.1 Data-driven Decision-making
  1.2 Variable and Data Types
    1.2.1 Random Variable
    1.2.2 Measurement Scale and Data Types
    1.2.3 Data Sources
  1.3 Models and Modeling
  1.4 Statistical Approaches to Model-Building
    1.4.1 Step 1: Define Problem
    1.4.2 Step 2: Develop Conceptual Model
    1.4.3 Step 3: Design Study
    1.4.4 Step 4: Collect Data
    1.4.5 Step 5: Examine Data
    1.4.6 Step 6: Select a Suitable Model
    1.4.7 Step 7: Estimate Parameters
    1.4.8 Step 8: Verify Model
    1.4.9 Step 9: Validate Model
    1.4.10 Step 10: Interpret Results
  1.5 Multivariate Models
  1.6 Illustrative Problems
  1.7 Case Descriptions
    1.7.1 Case 1: Job Stress Assessment among Employees in Coke Plant
    1.7.2 Case 2: Job Demand Analysis of Underground Coal Mine Workers
    1.7.3 Case 3: Study of the Process and Quality Characteristics and Their Relationships in Worm Gear Manufacturing
    1.7.4 Case 4: Study of the Process and Quality Characteristics in Cast Iron Melting Process
    1.7.5 Case 5: Employees Safety Practices in Mines
    1.7.6 Case 6: Modeling Causal Relationships of Job Risk Perception of EOT Crane Operators
  1.8 Aims of the Book
  1.9 Organization of the Book
  Exercises

Chapter 2 Basic Univariate Statistics
  2.1 Population and Parameter
  2.2 Defining Population: The Probability Distributions
    2.2.1 Univariate Normal Distribution
  2.3 Sample and Statistics
    2.3.1 Measures of Central Tendency
    2.3.2 Measures of Dispersion
  2.4 Sampling Distribution
    2.4.1 Standard Normal Distribution
    2.4.2 Chi-square Distribution
    2.4.3 t-distribution
    2.4.4 F-distribution
  2.5 Central Limit Theorem
  2.6 Estimation
    2.6.1 Confidence Interval for Single Population Mean (µ)
    2.6.2 Confidence Interval for Single Population Variance (σ²)
    2.6.3 Confidence Interval for the Difference Between Two-population Means
    2.6.4 Confidence Interval for the Ratio of Two-population Variances
  2.7 Hypothesis testing
    2.7.1 Hypothesis Testing Concerning Single Population Mean
    2.7.2 Hypothesis Testing Concerning Single Population Variance
    2.7.3 Hypothesis Testing Concerning Equality of Two-population Means
    2.7.4 Hypothesis Testing Concerning Equality of Two-population Variances
  2.8 Learning Summary
  Exercises

Chapter 3 Basic Computations
  3.1 Matrix Algebra
    3.1.1 Data as a Matrix
    3.1.2 Row and Column Vectors
    3.1.3 Orthogonal Vectors
    3.1.4 Linear Dependency of a Set of Vectors
    3.1.5 The Gram-Schmidt Orthogonalization Process
    3.1.6 Projection of One Vector on Another
    3.1.7 Basic Matrices
    3.1.8 Basic Matrix Operations
    3.1.9 Determinants
    3.1.10 Rank of a Matrix
    3.1.11 Inverse of a Matrix
    3.1.12 Eigenvalues and Eigenvectors
    3.1.13 Spectral Decomposition
    3.1.14 Singular Value Decomposition (SVD)
    3.1.15 Positive Definite Matrices
  3.2 Methods of Least Squares
    3.2.1 Ordinary Least Squares (OLS)
    3.2.2 Weighted Least Squares (WLS)
    3.2.3 Iteratively Reweighted Least Squares (IRLS)
    3.2.4 Generalized Least Squares (GLS)
  3.3 Maximum Likelihood Method
    3.3.1 Probability Function
    3.3.2 Likelihood Function
    3.3.3 Maximum Likelihood Estimation
  3.4 Generation of Random Variable
    3.4.1 Generation of Univariate Normal Observations
    3.4.2 Generating Multivariate Normal Observations
  3.5 Resampling Methods
    3.5.1 Jackknife
    3.5.2 Bootstrap
  3.6 Learning Summary
  Exercises

PART II Foundations of Multivariate Statistics

Chapter 4 Multivariate Descriptive Statistics
  4.1 Multivariate Observations
  4.2 Mean Vectors
  4.3 Covariance Matrix
  4.4 Correlation Matrix
  4.5 Types of Correlation
    4.5.1 Correlation between Two Ordinal Variables
    4.5.2 Correlation between Two Nominal Variables
    4.5.3 Correlation between One Continuous and One Ordinal Variable
    4.5.4 Correlation between One Continuous and One Nominal Variable
    4.5.5 Correlation between One Ordinal and Another Nominal Variable
  4.6 Correlation with Dependence Structure
    4.6.1 Part Correlation
    4.6.2 Partial Correlation
  4.7 Learning Summary
  Exercises

Chapter 5 Multivariate Normal Distribution
  5.1 Statistical Distance
  5.2 Bivariate Normal Density Function
  5.3 Multivariate Normal Density Function
  5.4 Constant Density Contours
  5.5 Properties of Multivariate Normal Density Function
  5.6 Assessing Multivariate Normality
    5.6.1 Tests of Univariate Normality
    5.6.2 Tests of Multivariate Normality
    5.6.3 Remedy to Violation of Multivariate Normality
  5.7 Learning Summary
  Exercises

Chapter 6 Multivariate Inferential Statistics
  6.1 Estimation of Parameters of Multivariate Normal Distribution
  6.2 Sampling Distribution of x̄ and S
  6.3 Multivariate Central Limit Theorem
  6.4 Hotelling’s T² Distribution
  6.5 Inference about Single Population Mean Vector
    6.5.1 Confidence Region
    6.5.2 Simultaneous Confidence Intervals
    6.5.3 Hypothesis Testing
  6.6 Inference About Equality of Two-population Mean Vectors
    6.6.1 Confidence Region
    6.6.2 Simultaneous Confidence Intervals
    6.6.3 Hypothesis Testing
  6.7 Confidence Region and Hypothesis Testing for Covariance Matrix Σ
    6.7.1 Confidence Region for Σ
    6.7.2 Hypothesis Testing for Σ
  6.8 Sampling from Non-normal Population
  6.9 Learning Summary
  Exercises

PART III Multivariate Models

Chapter 7 Multivariate Analysis of Variance
  7.1 Analysis of Variance (ANOVA)
    7.1.1 Conceptual Model
    7.1.2 Assumptions
    7.1.3 Total Sum Squares Decomposition
    7.1.4 Hypothesis Testing
    7.1.5 Estimation of Parameters
    7.1.6 Model Adequacy Tests
    7.1.7 Interpretation of Results
  7.2 Multivariate Analysis of Variance (MANOVA)
    7.2.1 Conceptual Model
    7.2.2 Assumptions
    7.2.3 Total Sum Squares and Cross Product (SSCP) Decomposition
    7.2.4 Hypothesis Testing
    7.2.5 Estimation of Parameters
    7.2.6 Model Adequacy Tests
    7.2.7 Test of Assumptions
  7.3 Two-Way MANOVA
    7.3.1 Two-Way ANOVA
    7.3.2 Two-Way MANOVA
    7.3.3 Hypothesis Testing
  7.4 Case Study
  7.5 Learning Summary
  Exercises