Applied Multivariate Methods for Data Analysts PDF

584 Pages·1998·55.854 MB·English

Preview Applied Multivariate Methods for Data Analysts

Applied Multivariate Methods for Data Analysts
Dallas E. Johnson
Kansas State University

Duxbury Press
An Imprint of Brooks/Cole Publishing Company
ITP® An International Thomson Publishing Company
Pacific Grove • Albany • Belmont • Bonn • Boston • Cincinnati • Detroit • Johannesburg • London • Madrid • Melbourne • Mexico City • New York • Paris • Singapore • Tokyo • Toronto • Washington

Sponsoring Editor: Carolyn Crockett
Marketing Team: Marcy Perman, Margaret Parks
Editorial Assistant: Kimberly Raburn
Production Editor: Nancy Velthaus
Manuscript Editor: Lorretta Palagi
Permissions Editor: May Clark
Cover Design: Roy R. Neuhaus
Typesetting: Spectrum Publisher Services
Cover Printing: Courier
Printing and Binding: Courier

COPYRIGHT © 1998 by Brooks/Cole Publishing Company
A Division of International Thomson Publishing Inc.
The ITP logo is a registered trademark used herein under license.
Duxbury Press and the leaf logo are trademarks used herein under license.

For more information, contact:

BROOKS/COLE PUBLISHING COMPANY, 511 Forest Lodge Road, Pacific Grove, CA 93950, USA
International Thomson Editores, Seneca 53, Col. Polanco, 11560 Mexico, D.F., Mexico
International Thomson Publishing GmbH, Königswinterer Strasse 418, 53227 Bonn, Germany
International Thomson Publishing Europe, Berkshire House 168-173, High Holborn, London WC1V 7AA, England
International Thomson Publishing Asia, 221 Henderson Road, #05-10 Henderson Building, Singapore 0315
Thomas Nelson Australia, 102 Dodds Street, South Melbourne, 3205, Victoria, Australia
International Thomson Publishing Japan, Hirakawacho Kyowa Building, 3F, 2-2-1 Hirakawacho, Chiyoda-ku, Tokyo 102, Japan
Nelson Canada, 1120 Birchmount Road, Scarborough, Ontario, Canada M1K 5G4

All rights reserved. No part of this work may be reproduced, stored in a retrieval system, or transcribed, in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise) without the prior written permission of the publisher, Brooks/Cole Publishing Company, Pacific Grove, California 93950.

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

Library of Congress Cataloging-in-Publication Data
Johnson, Dallas E., [date]
Applied multivariate methods for data analysts / Dallas E. Johnson.
p. cm.
Includes bibliographical references and index.
ISBN 0-534-23796-7
1. Multivariate analysis. I. Title.
QA278.J615 1998
519.5'35-dc21
97-47133 CIP

Contents

1. APPLIED MULTIVARIATE METHODS
1.1 An Overview of Multivariate Methods
  Variable- and Individual-Directed Techniques
  Creating New Variables
  Principal Components Analysis
  Factor Analysis
  Discriminant Analysis
  Canonical Discriminant Analysis
  Logistic Regression
  Cluster Analysis
  Multivariate Analysis of Variance
  Canonical Variates Analysis
  Canonical Correlation Analysis
  Where to Find the Preceding Topics
1.2 Two Examples
  Independence of Experimental Units
1.3 Types of Variables
1.4 Data Matrices and Vectors
  Variable Notation
  Data Matrix
  Data Vectors
  Data Subscripts
1.5 The Multivariate Normal Distribution
  Some Definitions
  Summarizing Multivariate Distributions
  Mean Vectors and Variance-Covariance Matrices
  Correlations and Correlation Matrices
  The Multivariate Normal Probability Density Function
  Bivariate Normal Distributions
1.6 Statistical Computing
  Cautions About Computer Usage
  Missing Values
  Replacing Missing Values by Zeros
  Replacing Missing Values by Averages
  Removing Rows of the Data Matrix
  Sampling Strategies
  Data Entry Errors and Data Verification
1.7 Multivariate Outliers
  Locating Outliers
  Dealing with Outliers
  Outliers May Be Influential
1.8 Multivariate Summary Statistics
1.9 Standardized Data and/or Z Scores
  Exercises

2. SAMPLE CORRELATIONS
2.1 Statistical Tests and Confidence Intervals
  Are the Correlations Large Enough to Be Useful?
  Confidence Intervals by the Chart Method
  Confidence Intervals by Fisher's Approximation
  Confidence Intervals by Ruben's Approximation
  Variable Groupings Based on Correlations
  Relationship to Factor Analysis
2.2 Summary
  Exercises

3. MULTIVARIATE DATA PLOTS
3.1 Three-Dimensional Data Plots
3.2 Plots of Higher Dimensional Data
  Chernoff Faces
  Star Plots and Sun-Ray Plots
  Andrews' Plots
  Side-by-Side Scatter Plots
3.3 Plotting to Check for Multivariate Normality
  Summary
  Exercises

4. EIGENVALUES AND EIGENVECTORS
4.1 Trace and Determinant
  Examples
4.2 Eigenvalues
4.3 Eigenvectors
  Positive Definite and Positive Semidefinite Matrices
4.4 Geometric Descriptions (p = 2)
  Vectors
  Bivariate Normal Distributions
4.5 Geometric Descriptions (p = 3)
  Vectors
  Trivariate Normal Distributions
4.6 Geometric Descriptions (p > 3)
  Summary
  Exercises

5. PRINCIPAL COMPONENTS ANALYSIS
5.1 Reasons for Using Principal Components Analysis
  Data Screening
  Clustering
  Discriminant Analysis
  Regression
5.2 Objectives of Principal Components Analysis
5.3 Principal Components Analysis on the Variance-Covariance Matrix Σ
  Principal Component Scores
  Component Loading Vectors
5.4 Estimation of Principal Components
  Estimation of Principal Component Scores
5.5 Determining the Number of Principal Components
  Method 1
  Method 2
5.6 Caveats
5.7 PCA on the Correlation Matrix P
  Principal Component Scores
  Component Correlation Vectors
  Sample Correlation Matrix
  Determining the Number of Principal Components
5.8 Testing for Independence of the Original Variables
5.9 Structural Relationships
5.10 Statistical Computing Packages
  SAS's PRINCOMP Procedure
  Principal Components Analysis Using Factor Analysis Programs
  PCA with SPSS's FACTOR Procedure
  Summary
  Exercises

6. FACTOR ANALYSIS
6.1 Objectives of Factor Analysis
6.2 Caveats
6.3 Some History of Factor Analysis
6.4 The Factor Analysis Model
  Assumptions
  Matrix Form of the Factor Analysis Model
  Definitions of Factor Analysis Terminology
6.5 Factor Analysis Equations
  Nonuniqueness of the Factors
6.6 Solving the Factor Analysis Equations
6.7 Choosing the Appropriate Number of Factors
  Subjective Criteria
  Objective Criteria
6.8 Computer Solutions of the Factor Analysis Equations
  Principal Factor Method on R
  Principal Factor Method with Iteration
6.9 Rotating Factors
  Examples (m = 2)
  Rotation Methods
  The Varimax Rotation Method
6.10 Oblique Rotation Methods
6.11 Factor Scores
  Bartlett's Method or the Weighted Least-Squares Method
  Thompson's Method or the Regression Method
  Ad Hoc Methods
  Summary
  Exercises

7. DISCRIMINANT ANALYSIS
7.1 Discrimination for Two Multivariate Normal Populations
  A Likelihood Rule
  The Linear Discriminant Function Rule
  A Mahalanobis Distance Rule
  A Posterior Probability Rule
  Sample Discriminant Rules
  Estimating Probabilities of Misclassification
  Resubstitution Estimates
  Estimates from Holdout Data
  Cross-Validation Estimates
7.2 Cost Functions and Prior Probabilities (Two Populations)
7.3 A General Discriminant Rule (Two Populations)
  A Cost Function
  Prior Probabilities
  Average Cost of Misclassification
  A Bayes Rule
  Classification Functions
  Unequal Covariance Matrices
  Tricking Computing Packages
7.4 Discriminant Rules (More than Two Populations)
  Basic Discrimination
7.5 Variable Selection Procedures
  Forward Selection Procedure
  Backward Elimination Procedure
  Stepwise Selection Procedure
  Recommendations
  Caveats
7.6 Canonical Discriminant Functions
  The First Canonical Function
  A Second Canonical Function
  Determining the Dimensionality of the Canonical Space
  Discriminant Analysis with Categorical Predictor Variables
7.7 Nearest Neighbor Discriminant Analysis
7.8 Classification Trees
  Summary
  Exercises

8. LOGISTIC REGRESSION METHODS
8.1 Logistic Regression Model
8.2 The Logit Transformation
  Model Fitting
8.3 Variable Selection Methods
8.4 Logistic Discriminant Analysis (More Than Two Populations)
  Logistic Regression Models
  Model Fitting
  Another SAS LOGISTIC Analysis
  Exercises

9. CLUSTER ANALYSIS
9.1 Measures of Similarity and Dissimilarity
  Ruler Distance
  Standardized Ruler Distance
  A Mahalanobis Distance
  Dissimilarity Measures
9.2 Graphical Aids in Clustering
  Scatter Plots
  Using Principal Components
  Andrews' Plots
  Other Methods
9.3 Clustering Methods
  Nonhierarchical Clustering Methods
  Hierarchical Clustering
  Nearest Neighbor Method
  A Hierarchical Tree Diagram
  Other Hierarchical Clustering Methods
  Comparisons of Clustering Methods
  Verification of Clustering Methods
  How Many Clusters?
  Beale's F-Type Statistic
  A Pseudo Hotelling's T² Test
  The Cubic Clustering Criterion
  Clustering Order
  Estimating the Number of Clusters
  Principal Components Plots
  Clustering with SPSS
  SAS's FASTCLUS Procedure
9.4 Multidimensional Scaling
  Exercises

10. MEAN VECTORS AND VARIANCE-COVARIANCE MATRICES
10.1 Inference Procedures for Variance-Covariance Matrices
  A Test for a Specific Variance-Covariance Matrix
  A Test for Sphericity
