Statistics
Texts in Statistical Science

Introduction to Multivariate Analysis: Linear and Nonlinear Modeling shows how multivariate analysis is widely used for extracting useful information and patterns from multivariate data and for understanding the structure of random phenomena. Along with the basic concepts of various procedures in traditional multivariate analysis, the book covers nonlinear techniques for clarifying phenomena behind observed multivariate data. It primarily focuses on regression modeling, classification and discrimination, dimension reduction, and clustering.

The text thoroughly explains the concepts and derivations of the AIC, BIC, and related criteria and includes a wide range of practical examples of model selection and evaluation criteria. To estimate and evaluate models with a large number of predictor variables, the author presents regularization methods, including the L1 norm regularization that gives simultaneous model estimation and variable selection.

Features
• Explains how to use linear and nonlinear multivariate techniques to extract information from data and understand random phenomena
• Includes a self-contained introduction to theoretical results
• Presents many examples and figures that facilitate a deep understanding of multivariate analysis techniques
• Covers regression, discriminant analysis, Bayesian classification, support vector machines, principal component analysis, and clustering
• Incorporates real data sets from engineering, pattern recognition, medicine, and more

For advanced undergraduate and graduate students in statistical science, this text provides a systematic description of both traditional and newer techniques in multivariate analysis and machine learning.
It also introduces linear and nonlinear statistical modeling for researchers and practitioners in industrial and systems engineering, information science, life science, and other areas.

Sadanori Konishi

K16322

CHAPMAN & HALL/CRC
Texts in Statistical Science Series

Series Editors
Francesca Dominici, Harvard School of Public Health, USA
Julian J. Faraway, University of Bath, UK
Martin Tanner, Northwestern University, USA
Jim Zidek, University of British Columbia, Canada

Texts in Statistical Science

Introduction to Multivariate Analysis
Linear and Nonlinear Modeling

Sadanori Konishi
Chuo University
Tokyo, Japan

TAHENRYO KEISEKI NYUMON: SENKEI KARA HISENKEI E by Sadanori Konishi
© 2010 by Sadanori Konishi
Originally published in Japanese by Iwanami Shoten, Publishers, Tokyo, 2010. This English language edition published in 2014 by Chapman & Hall/CRC, Boca Raton, FL, U.S.A., by arrangement with the author c/o Iwanami Shoten, Publishers, Tokyo.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2014 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20140508
International Standard Book Number-13: 978-1-4665-6729-0 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources.
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

List of Figures
List of Tables
Preface

1 Introduction
  1.1 Regression Modeling
    1.1.1 Regression Models
    1.1.2 Risk Models
    1.1.3 Model Evaluation and Selection
  1.2 Classification and Discrimination
    1.2.1 Discriminant Analysis
    1.2.2 Bayesian Classification
    1.2.3 Support Vector Machines
  1.3 Dimension Reduction
  1.4 Clustering
    1.4.1 Hierarchical Clustering Methods
    1.4.2 Nonhierarchical Clustering Methods

2 Linear Regression Models
  2.1 Relationship between Two Variables
    2.1.1 Data and Modeling
    2.1.2 Model Estimation by Least Squares
    2.1.3 Model Estimation by Maximum Likelihood
  2.2 Relationships Involving Multiple Variables
    2.2.1 Data and Models
    2.2.2 Model Estimation
    2.2.3 Notes
    2.2.4 Model Selection
    2.2.5 Geometric Interpretation
  2.3 Regularization
    2.3.1 Ridge Regression
    2.3.2 Lasso
    2.3.3 L1 Norm Regularization

3 Nonlinear Regression Models
  3.1 Modeling Phenomena
    3.1.1 Real Data Examples
  3.2 Modeling by Basis Functions
    3.2.1 Splines
    3.2.2 B-splines
    3.2.3 Radial Basis Functions
  3.3 Basis Expansions
    3.3.1 Basis Function Expansions
    3.3.2 Model Estimation
    3.3.3 Model Evaluation and Selection
  3.4 Regularization
    3.4.1 Regularized Least Squares
    3.4.2 Regularized Maximum Likelihood Method
    3.4.3 Model Evaluation and Selection

4 Logistic Regression Models
  4.1 Risk Prediction Models
    4.1.1 Modeling for Proportional Data
    4.1.2 Binary Response Data
  4.2 Multiple Risk Factor Models
    4.2.1 Model Estimation
    4.2.2 Model Evaluation and Selection
  4.3 Nonlinear Logistic Regression Models
    4.3.1 Model Estimation
    4.3.2 Model Evaluation and Selection

5 Model Evaluation and Selection
  5.1 Criteria Based on Prediction Errors
    5.1.1 Prediction Errors
    5.1.2 Cross-Validation
    5.1.3 Mallows’ Cp
  5.2 Information Criteria
    5.2.1 Kullback-Leibler Information
    5.2.2 Information Criterion AIC
    5.2.3 Derivation of Information Criteria
    5.2.4 Multimodel Inference
  5.3 Bayesian Model Evaluation Criterion
    5.3.1 Posterior Probability and BIC
    5.3.2 Derivation of the BIC
    5.3.3 Bayesian Inference and Model Averaging

6 Discriminant Analysis
  6.1 Fisher’s Linear Discriminant Analysis
    6.1.1 Basic Concept
    6.1.2 Linear Discriminant Function
    6.1.3 Summary of Fisher’s Linear Discriminant Analysis
    6.1.4 Prior Probability and Loss
  6.2 Classification Based on Mahalanobis Distance
    6.2.1 Two-Class Classification
    6.2.2 Multiclass Classification
    6.2.3 Example: Diagnosis of Diabetes
  6.3 Variable Selection
    6.3.1 Prediction Errors
    6.3.2 Bootstrap Estimates of Prediction Errors
    6.3.3 The .632 Estimator
    6.3.4 Example: Calcium Oxalate Crystals
    6.3.5 Stepwise Procedures
  6.4 Canonical Discriminant Analysis
    6.4.1 Dimension Reduction by Canonical Discriminant Analysis

7 Bayesian Classification
  7.1 Bayes’ Theorem
  7.2 Classification with Gaussian Distributions
    7.2.1 Probability Distributions and Likelihood
    7.2.2 Discriminant Functions
  7.3 Logistic Regression for Classification
    7.3.1 Linear Logistic Regression Classifier
    7.3.2 Nonlinear Logistic Regression Classifier
    7.3.3 Multiclass Nonlinear Logistic Regression Classifier

8 Support Vector Machines
  8.1 Separating Hyperplane
    8.1.1 Linear Separability
    8.1.2 Margin Maximization