Multivariate Statistics with R
Paul J. Hewson
March 17, 2009
© Paul Hewson
Contents
1 Multivariate data 1
1.1 The nature of multivariate data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The role of multivariate investigations . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Summarising multivariate data (presenting data as a matrix, mean vectors, covariance
matrices) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3.1 Data display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Graphical and dynamic graphical methods . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.1 Chernoff’s Faces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.2 Scatterplots, pairwise scatterplots (draftsman plots) . . . . . . . . . . . . . . 5
1.4.3 Optional: 3d scatterplots . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4.4 Other methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Animated exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Matrix manipulation 11
2.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Vector multiplication; the inner product . . . . . . . . . . . . . . . . . . . . 12
2.1.2 Outer product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.3 Vector length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.4 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.5 Cauchy-Schwarz Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.6 Angle between vectors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 Transposing matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Some special matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Equality and addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.4 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Crossproduct matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.1 Powers of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.2 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.3 Rank of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Matrix inversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Eigen values and eigen vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.7 Extended Cauchy-Schwarz Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.8 Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Measures of distance 33
3.1 Mahalanobis Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.1 Distributional properties of the Mahalanobis distance . . . . . . . . . . . . . 35
3.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3 Distance between points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3.1 Quantitative variables - Interval scaled . . . . . . . . . . . . . . . . . . . . . 38
3.3.2 Distance between variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.3 Quantitative variables: Ratio Scaled . . . . . . . . . . . . . . . . . . . . . . 42
3.3.4 Dichotomous data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.5 Qualitative variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.6 Different variable types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4 Properties of proximity matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4 Cluster analysis 51
4.1 Introduction to agglomerative hierarchical cluster analysis . . . . . . . . . . . . . . . 54
4.1.1 Nearest neighbour / Single Linkage. . . . . . . . . . . . . . . . . . . . . . . 54
4.1.2 Furthest neighbour / Complete linkage . . . . . . . . . . . . . . . . . . . . . 55
4.1.3 Group average link . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.1.4 Alternative methods for hierarchical cluster analysis . . . . . . . . . . . . . . 58
4.1.5 Problems with hierarchical cluster analysis . . . . . . . . . . . . . . . . . . . 59
4.1.6 Hierarchical clustering in R . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2 Cophenetic Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3 Divisive hierarchical clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 K-means clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.4.1 Partitioning around medoids . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.4.2 Hybrid Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.5 K-centroids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.6 Further information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Multidimensional scaling 71
5.1 Metric Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.1.1 Similarities with principal components analysis . . . . . . . . . . . . . . . . . 73
5.2 Visualising multivariate distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.3 Assessing the quality of fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.3.1 Sammon Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6 Multivariate normality 79
6.1 Expectations and moments of continuous random functions . . . . . . . . . . . . . . 79
6.3 Multivariate normality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.5.1 R estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.6 Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7 Inference for the mean 85
7.1 Two sample Hotelling’s T2 test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.2 Constant Density Ellipses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.3 Multivariate Analysis of Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8 Discriminant analysis 95
8.1 Fisher discrimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.2 Accuracy of discrimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.3 Importance of variables in discrimination . . . . . . . . . . . . . . . . . . . . . . . . 99
8.4 Canonical discriminant functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.5 Linear discrimination - a worked example . . . . . . . . . . . . . . . . . . . . . . . . 100
9 Principal component analysis 101
9.1 Derivation of Principal Components . . . . . . . . . . . . . . . . . . . . . . . . . . 103
9.1.1 A little geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.1.2 Principal Component Stability . . . . . . . . . . . . . . . . . . . . . . . . . 108
9.2 Some properties of principal components . . . . . . . . . . . . . . . . . . . . . . . . 110
9.8 Illustration of Principal Components . . . . . . . . . . . . . . . . . . . . . . . . . . 112
9.8.1 An illustration with the Sydney Heptathlon data . . . . . . . . . . . . . . . . 112
9.8.2 Principal component scoring . . . . . . . . . . . . . . . . . . . . . . . . . . 113
9.8.3 Prepackaged PCA function 1: princomp() . . . . . . . . . . . . . . . . . . 114
9.8.4 Inbuilt functions 2: prcomp() . . . . . . . . . . . . . . . . . . . . . . . . . 115
9.9 Principal Components Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
9.10 “Model” criticism for principal components analysis . . . . . . . . . . . . . . . . . . 117
9.10.1 Distribution theory for the Eigenvalues and Eigenvectors of a covariance matrix . . 118
9.13 Sphericity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
9.15.1 Partial sphericity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
9.22 How many components to retain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
9.22.1 Data analytic diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
9.23.1 Cross validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.23.2 Forward search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
9.23.3 Assessing multivariate normality . . . . . . . . . . . . . . . . . . . . . . . . 138
9.25 Interpreting the principal components . . . . . . . . . . . . . . . . . . . . . . . . . 141
9.27 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
10 Canonical Correlation 143
10.1 Canonical variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
10.2 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
10.3 Computer example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
10.3.1 Interpreting the canonical variables . . . . . . . . . . . . . . . . . . . . . . . 147
10.3.2 Hypothesis testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
11 Factor analysis 149
11.1 Role of factor analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
11.2 The factor analysis model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
11.2.1 Centred and standardised data . . . . . . . . . . . . . . . . . . . . . . . . . 152
11.2.2 Factor indeterminacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11.2.3 Strategy for factor analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11.3 Principal component extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11.3.1 Diagnostics for the factor model . . . . . . . . . . . . . . . . . . . . . . . . 158
11.3.2 Principal Factor solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
11.4 Maximum likelihood solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
11.5 Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
11.6 Factor scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Bibliography 170
Books
Many of the statistical analyses encountered to date consist of a single response variable and one
or more explanatory variables. In the latter case, multiple regression, we regressed a single response
(dependent) variable on a number of explanatory (independent) variables. This is occasionally referred
to as “multivariate regression”, which is all rather unfortunate. There isn’t an entirely clear “canon” of
what is a multivariate technique and what isn’t (one could argue that discriminant analysis involves a
single dependent variable). However, we are going to consider the simultaneous analysis of a number
of related variables. We may approach this in one of two ways. The first group of problems relates
to classification, where attention is focused on individuals who are more alike. In unsupervised
classification (cluster analysis) we are concerned with a range of algorithms that at least try to
identify individuals who are more alike, if not to distinguish clear groups of individuals. There are also
a wide range of scaling techniques which help us visualise these differences in lower dimensionality. In
supervised classification (discriminant analysis) we already have information on group membership,
and wish to develop rules from the data to classify future observations.
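This distinction can be sketched in R with the built-in iris data: clustering ignores the known species labels, whereas discriminant analysis uses them to build a classification rule. The particular choices below (average linkage, three clusters, lda() from the MASS package) are illustrative only; both techniques are developed properly in Chapters 4 and 8.

```r
## Unsupervised classification: cluster the four iris measurements
## without reference to the known species labels
d  <- dist(scale(iris[, 1:4]))       # Euclidean distances on standardised data
hc <- hclust(d, method = "average")  # agglomerative, group average linkage
groups <- cutree(hc, k = 3)          # cut the dendrogram into three clusters

## Supervised classification: use the species labels to derive a rule
## that could classify future observations
library(MASS)                        # lda() is supplied by the MASS package
fit <- lda(Species ~ ., data = iris)
table(Predicted = predict(fit)$class, Actual = iris$Species)
```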
The other group of problems concerns inter-relationships between variables. Again, we may be
interested in lower-dimensional representations that help us visualise a given dataset. Alternatively,
we may be interested to see how one group of variables is correlated with another group of variables.
Finally, we may be interested in models for the inter-relationships between variables.
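As a small foretaste of Chapter 10, the correlation between one group of variables and another can be examined with cancor() from the stats package; splitting the iris measurements into sepal and petal variables here is an arbitrary illustration.

```r
## Canonical correlation between two groups of variables
x <- as.matrix(iris[, 1:2])  # group 1: sepal length and width
y <- as.matrix(iris[, 3:4])  # group 2: petal length and width
cc <- cancor(x, y)           # canonical correlation analysis (Chapter 10)
cc$cor                       # canonical correlations between the two sets
```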
This book is still a work in progress. Currently it contains material used as notes to support a
module at the University of Plymouth, where we work in conjunction with Johnson and Wichern
(1998). It covers a reasonably established range of multivariate techniques. There isn’t however a
clear “canon” of multivariate techniques, and some of the following books may also be of interest:
Other Introductory level books:
• Afifi and Clark (1990)
• Chatfield and Collins (1980)
• Dillon and Goldstein (1984)
• Everitt and Dunn (1991)
• Flury and Riedwyl (1988)
• Johnson (1998)
• Kendall (1975)
• Hair et al. (1995)
• et al. (1998)
• Manly (1994)
Intermediate level books:
• Flury (1997) (My personal favourite)
• Gnanadesikan (1997)
• Harris (1985)
• Krzanowski (2000); Krzanowski and Marriott (1994b)
• Rencher (2002)
• Morrison (2005)
• Seber (1984)
• Timm (1975)
More advanced books:
• Anderson (1984)
• Bilodeau and Brenner (1999)
• Giri (2003)
• Mardia et al. (1979)
• Muirhead (1982)
• Press (1982)
• Srivastava and Carter (1983)