Table Of Content'I AppMluiletiMdve athrioadtes
fDoarAta n alysts
DalEl.aJ so hnson
KansSatsaU tnei versity
DuxburyP ress
AnI mproifBn rto oksP/uCbolliseCh oimnpga ny
®
I(I)P
AnI nternaTthioomnsaPolun b lishing Company
PaciGfirco v•eA lba•nB ye lmo•n Bto nn•B ost•oC ni ncin•n Daettir• oJ ioth anne•sL bounrdgo n
Madr•i Mde lbou•rn Mee xiCciot •y NeYwo r•k P ar•iS si ngap•To orkey •oT orontWoa sh•i ngton
SponsorinCga rEoCdlriyotnco kre:t t PermiEsdsiitoMonarsyC: l ark
MarkeTteianMmga: r Pceyr mMaanr,g /a'raertk s CovDeers iRgonRy:. N euhaus
EditAosrsiiaslKt iamnbteR:ra bl1y1 m TypeseStpteicntPgru:ub mlS ieshrevri ces
ProduEcdtiitoNonar n:Vc eyl tlraus Cover PCroiunrtiienrg :
ManusEcdriitpLotor r:rP eatltaag i PrinatniBdni gn dCionugr:i er
COPYRIGH1T9©9b 8yB rooks/ColCeo mPpuabnlyi shing
A DiviosfIi notne rnaTthioomnsPaoulnb liIsnhci.n g
Id)P ThIeT lPo giaosr egisttreardeeudms aehrdek r uenidnle irc ense.
DuxbPurreyas nstd h lee laofga ort er adeumsaehrdek rsue nidnle irc ense.
Fomro rien formcaotnitoanc,t :
BROOKS/CPOULBEL ISHING InternTahtoimosEnodanil t ores
COMPANY Sene5c3a
51F1o rLeosdtRg oea d CoPlo.l anco
PacGirfiocv CeA9, 3 950 115M6e0x iDc.Fo .,M, e xico
USA
InternTahtoimosPnouanbl l iGsmhbiHn g
InternaTthioomnsPaoulnb liEsuhrionpge KonigswSitnrta4es1rs8ee r
BerksHhoiur1se6e 8 -173 532B2o7n n
HigHho lborn Germany
LondWoCnl AVA 7
England International TAhsoimas on Publishing
22H1e ndeRrosaodn
ThomNaesl sAouns tralia #05-H1e0n deBrusiolnd ing
10D2o dSdts reet Singa0p3o1r5e
SouMtehl bou3rn2e0,5
VictAoursitar,a lia InternTahtoimosPnouanbl l iJsahpianng
HirakaKwyaocwhao B3uFi lding,
NelsCoann ada 2-2H-i1r akawacho
112B0i rchmount Road ChiyodTao-kky1uo0, 2
ScarboOrnotuagrhi,o Japan
CanaMdlaK5 G4
Alrli grhetsse Nrovp eador.ftt hwiosr mka by er eprodsutcoierndaer , de trsiyesvtoaerlm ,
transcirnai nfbyoe rdom,rb ya nmye ans-elemcetcrhoapnnhiioccta,lo ,c orepcyoirndgi,n g,
oro therwiset-hpweri itwohrroi utptte ernm iosfts hipeou nb liBsrhoeork,sP /uCbollies hing
CompaPnayc,Gi rfiocv Cea,l ifornia 93950.
Prinitnte hUden itSetdao tfAe mse rica.
10 9 8 7 6 5 4 3 2 1
LibroafCr oyn grCeastsa Joging-Diant-aP ublication
JohnDsaolnEl,.a (,sd ate]
Applmiueldt ivarifaotdrea tamane atlIhyD osadtlsEsl . a s
Johnson.
p.cm.
Inclbuidbelsi ogrreafpehriecniacnled se xa.n d
ISB0 -534-23796-7
1.MultivaanrailayItsT.eii st.l e.
QA278.1J969185
5!9.'553-dc21 974-7133
CIP
Contents
1. APPLIED MULTIVARIATE METHODS
1
1.1 An Overview of Multivariate Methods 1
2
Variable- and Individual-Directed Techniques
Creating New Variables 2
Principal Components Analysis 3
Factor Analysis 3
Discriminant Analysis 4
Canonical Discriminant Analysis 5
Logistic Regression 5
Cluster Analysis 5
Multivariate Analysis of Variance 6
Canonical Variates Analysis 7
Canonical Correlation Analysis 7
Where to Find the Preceding Topics 7
1.2 Two Examples 8
Independence of Experimental Units 11
1.3 Types of Variables 11
u
1.4 Data Matrices and Vectors
Variable Notation 13
Data Matrix 13
Data Vectors 13
Data Subscripts 14
1.5 The Multivariate Normal Distribution 15
Some Definitions 15
Summarizing Multivariate Distributions 16
Mean Vectors and Variance-Covariance Matrices 16
Correlations and Correlation Matrices 17
The Multivariate Normal Probability Density Function 19
Bivariate Normal Distributions 19
iii
iv Contents
1.6S tatiCsotmipcuatli2 n2g
Cautions About Computer Usage 22
Missing Values 22
Replacing Missing Values by Zeros 23
Replacing Missing Values by Averages 23
Removing Rows of the Data Matrix 23
Sampling Strategies 24
Data Entry Errors and Data Verification 24
1.7M ultivOaurtilait2ee5r s
Locating Outliers
25
Dealing with Outliers
25
Outliers May Be Influential
26
1.8M ultivSaurmimaSattreay t is2t6i cs
1.9S tandaDradtaianz deZ/dS o cro re2s7
Exercises
28
2. SAMPLE CORRELATIONS 35
2.1S tatiTsetsaitncsCd ao ln fidIenntceer v3a5l s
Are the Correlations Large Enough to Be Useful? 36
Confidence Intervals by the Chart Method 36
Confidence Intervals by Fisher's Approximation 38
Confidence Intervals by Ruben's Approximation 39
Variable Groupings Based on Correlations 40
Relationship to Factor Analysis 46
2.2S ummar4y6
Exercises 47
3. MULTIVARIATE DATA PLOTS 55
3.1T hree-DiDmaetnPasl ioot5ns5a l
3.2P lootfHs i ghDeirm ensDiaotnaa5 l9
Chernoff Faces 61
Star Plots and Sun-Ray Plots 63
Contents V
AndrePwlso't 6s5
Side-bSyc-aStPitldeoert 6 s6
3.3 Plotting to Check for Multivariate Normality 67
Summar7y3
Exerci7s3e s
4. EIGENVALUES AND EIGENVECTORS 77
4.1 Trace and Determinant 77
Exampl7e8s
4.2 Eigenvalues 78
4.3 Eigenvectors 79
PosiDteifivneai ntPdeo siSteimvied eMfiantirtiec8 e0s
4.4 Geometric Descriptions (p = 2) 82
Vecto8r2s
BivarNioartmDeai ls tribu8t3i ons
4.5 Geometric Descriptions (p = 3) 87
Vector8s7
TrivaNroiramDtaiels tribu8t7i ons
4.6 Geometric Descriptions (p > 3)
90
Summar9y1
Exerci9s1e s
5. PRINCIPAL COMPONENTS ANALYSIS 93
5.1 Reasons for Using Principal Components Analysis 93
DatSac reen9i3n g
Cluster9i5n g
DiscriAmnianlayns9ti5 s
Regress9i5o n
5.2 Objectives of Principal Components Analysis 96
5.3 Principal Components Analysis on the Variance-Covariance
Matrix I 96
PrinCcoimppaoln Secnotr e9s8
ComponLeonatdV iencgt or9s8
Vi Contents
5.4 Estimation of Principal Components 99
EstimoafPt riionnCc oimppaolSn ceonrte 9s9
5.5 Determining the Number of Principal Components 99
Meth1o d1 00
Meth2o d1 00
5.6 Caveats 107
5.7 PCA on the Correlation Matrix P 109
PrinCcoimppaolSn ceonrte 1s1 0
CompoCnoernrte Vleacttioo1rn1s 0
SampCloer reMlaattriio1xn1 0
DetermtihNneui mnbgoe fPr r inCcoimppaoln e1n1t0s
5.8 Testing for Independence of the Original Variables 111
5.9 Structural Relationships 111
5.10 Statistical Computing Packages 112
SARS Proced1u1r2e
PRINCOMP
PrinCcoimppaolnA ennatlUsys siiFnsag c Atnoarl ysis
Progra1m1s8
PCAw itShP SS's Proced1u2r4e
FACTOR
Summar1y4 2
Exerci1s4e2s
6. FACTOR ANALYSIS 147
6.1 Objectives of Factor Analysis 147
6.2 Caveats 148
6.3 Some History of Factor Analysis 148
6.4 The Factor Analysis Model 150
Assumpt1i5o0n s
MatFroirxom ft hFea cAtnoarl Myosdiesl1 51
DefiniotfFi aocnAtsno arl Tyesrimsi no1l5o1g y
6.5 Factor Analysis Equations 151
Nonuniqoufte hnFeea scst o1r5s2
6.6 Solving the Factor Analysis Equations 153
Contents VII
6.7 ChoostihnAegp propNruimabtoeefrF actor1s5 5
Subjective Criteria l56
Objective Criteria 156
6.8 CompuStoelru toifto hnFesa ctAonra lyEsqiusa tio1n5s7
Principal Factor Method on R 158
Principal Factor Method with Iteration 159
6.9 RotatFiancgt or1s7 0
Examples (m= 2) 171
Rotation Methods 172
The Varimax Rotation Method 173
6.10O bliRqoutea tMieotnh od1s7 4
6.11F actSocro re1s8 0
Bartlett's Method or the Weighted Least-Squares Method 181
Thompson's Method or the Regression Method 181
Ad Hoc Methods 181
Summary 212
Exercises 213
7.D ISCRIMAINNAALNYTS IS 217
7.1 DiscrimfionTraw toMi uolnt ivNaorrimaPatolep ulati2o1n7s
A Likelihood Rule 218
The Linear Discriminant Function Rule 218
A Mahalanobis Distance Rule 218
A Posterior Probability Rule 218
Sample Discriminant Rules 219
Estimating Probabilities of Misclassification 220
Resubstitution Estimates 220
Estimates from Holdout Data 220
Cross-Validation Estimates 221
7.2 CosFtu nctainoPdnr si Porro bab(iTlwiPoto ipeusl ati2o2n9s )
7.3 A GeneDriaslc rimRiunl(aeTn wtPo o pulati2o3n1s )
A Cost Function 232
Prior Probabilities 232
Viii Contents
Average Cost of Misclassification 232
A Bayes Rule 233
Classification Functions 233
Unequal Covariance Matrices 233
Tricking Computing Packages 234
7.4 Discriminant Rules (More than Two Populations) 235
Basic Discrimination 238
7.5 Variable Selection Procedures 245
Forward Selection Procedure 245
Backward Elimination Procedure 246
Stepwise Selection Procedure 246
Recommendations 247
Caveats 247
7.6 Canonical Discriminant Functions 255
The First Canonical Function 256
A Second Canonical Function 257
Determining the Dimensionality of the Canonical Space 260
Discriminant Analysis with Categorical Predictor Variables 273
7.7 Nearest Neighbor Discriminant Analysis 275
7.8 Classification Trees 283
Summary 283
Exercises 283
8. LOGISTIC REGRESSION METHODS
287
8.1 Logistic Regression Model 287
8.2 The Logit Transformation 287
Model Fitting 288
8.3 Variable Selection Methods 296
8.4 Logistic Discriminant Analysis (More Than Two
Populations) 301
Logistic Regression Models 301
Model Fitting 302
I
Another SAS LOGISTIC Analysis 314
Exercises 316
ConteinXt s
CLUSTER ANALYSIS 319
9.
9.1 Measures of Similarity and Dissimilarity 319
Ruler Distance 319
Standardized Ruler Distance 320
A Mahalanobis Distance 320
Dissimilarity Measures 320
9.2 Graphical Aids in Clustering
321
Scatter Plots 321
Using Principal Components 322
Andrews' Plots 322
Other Methods 322
9.3 Clustering Methods
322
Nonhierarchical Clustering Methods 323
Hierarchical Clustering 323
Nearest Neighbor Method 323
A Hierarchical Tree Diagram 325
Other Hierarchical Clustering Methods 326
Comparisons of Clustering Methods 327
Verification of Clustering Methods 327
How Many Clusters? 327
Beale's F-TyStpatiestic 328
A Pseudo Hotelling's T2T est 329
The Cubic Clustering Criterion 329
Clustering Order 334
Estimating the Number of Clusters 339
Principal Components Plots 348
Clustering with SPSS 355
SAS's Procedure 369
FASTCLUS
9.4 Multidimensional Scaling
385
Exercises 395
10. MEAN VECTORS AND VARIANCE-COVARIANCE MATRICES 397
10.1 Inference Procedures for Variance-Covariance Matrices 397
A Test for a Specific Variance-Covariance Matrix 398
A Test for Sphericity 400