
The Geometry of Multivariate Statistics

174 Pages·1994·9.743 MB·English


The Geometry of Multivariate Statistics

Thomas D. Wickens
University of California, Los Angeles

Psychology Press
Taylor & Francis Group
NEW YORK AND LONDON

First published 1995 by Lawrence Erlbaum Associates, Inc.

Published 2014 by Psychology Press
711 Third Avenue, New York, NY 10017
and by Psychology Press
27 Church Road, Hove, East Sussex, BN3 2FA

Psychology Press is an imprint of the Taylor & Francis Group, an informa business

Copyright © 1995 by Lawrence Erlbaum Associates, Inc.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Wickens, Thomas D., 1942-
The Geometry of Multivariate Statistics / Thomas D. Wickens.
p. cm.
Includes index.
1. Multivariate analysis. 2. Vector Analysis. I. Title
QA278.W53 1994
519.5’35-dc20
94-4654 CIP

ISBN 13: 978-0-805-81656-3 (hbk)

Publisher’s Note
The publisher has gone to great lengths to ensure the quality of this reprint but points out that some imperfections in the original may be apparent.

Contents

1 Variable space and subject space
2 Some vector geometry
  2.1 Elementary operations on vectors
  2.2 Variables and vectors
  2.3 Vector spaces
  2.4 Linear dependence and independence
  2.5 Projection onto subspaces
3 Bivariate regression
  3.1 Selecting the regression vector
  3.2 Measuring goodness of fit
  3.3 Means and the regression intercept
  3.4 The difference between two means
4 Multiple regression
  4.1 The geometry of prediction
  4.2 Measuring goodness of fit
  4.3 Interpreting a regression vector
5 Configurations of regression vectors
  5.1 Linearly dependent predictors
  5.2 Nearly multicollinear predictors
  5.3 Orthogonal predictors
  5.4 Suppressor variables
6 Statistical tests
  6.1 The effect space and the error space
  6.2 The population regression model
  6.3 Testing the regression effects
  6.4 Parameter restrictions
7 Conditional relationships
  7.1 Partial correlation
  7.2 Conditional effects in multiple regression
  7.3 Statistical tests of conditional effects
8 The analysis of variance
  8.1 Representing group differences
  8.2 Unequal sample sizes
  8.3 Factorial designs
  8.4 The analysis of covariance
9 Principal-component analysis
  9.1 Principal-component vectors
  9.2 Variable-space representation
  9.3 Simplifying the variables
  9.4 Factor analysis
10 Canonical correlation
  10.1 Angular relationships between spaces
  10.2 The sequence of canonical triplets
  10.3 Test statistics
  10.4 The multivariate analysis of variance

Preface

In simple terms, this little book is designed to help its reader think about multivariate statistics. I say “think” here because I have not written about how one programs the computer or calculates the test statistics. Instead I hope to help the reader understand in a broad and intuitive sense what the multivariate procedures do and how their results are interpreted.

There are many ways to develop multivariate statistical theory. The traditional approach is algebraic. Sets of observations are represented by matrices, linear combinations are formed from these matrices by multiplying them by coefficient matrices, and useful statistics are found by imposing various criteria of optimization on these combinations. Matrix algebra is the vehicle for these calculations.

A second approach is computational. Many users of multivariate statistics find that they do not need to know the mathematical basis of the techniques as long as they can transform data into results. The computation can be done by a package of computer programs that somebody else has written.
An approach to multivariate statistics from this perspective emphasizes how the computer packages are used, and is usually coupled with rules that allow one to extract the most important numbers from the output and interpret them.

Useful as both approaches are, particularly when combined, they overlook an important aspect of multivariate analysis. To apply it correctly, one needs a way to conceptualize the multivariate relationships among the variables. To some extent, the equations help. A linear combination explicitly defines a new variable, and a correlation matrix accurately expresses the pattern of association among the members of a set of variables. However, I have never found these descriptions sufficient, either for myself or when teaching others. Problems that involve many variables require a deeper understanding than is typically provided by the formal equations or the computer programs. Although knowing the algebra is helpful and a powerful computer program is almost essential, neither is sufficient without a good way to picture the variables.

Fortunately, a tool to develop this understanding is available. Multivariate statistical theory is fundamentally an application of the theory of linear algebra, and linear algebra has a strong geometric flavor. This spatial interpretation carries over to multivariate statistics and gives a concrete and pictorial form to multivariate relationships. The geometry lets one describe, more or less easily, the complex pattern of relationships among a set of variables. It gives a metaphor for the way that variables are combined. With a bit of practice, one develops an intuitive feel for how the multivariate methods work. However, rather unfortunately, I believe, this approach is ignored in the conventional treatment of multivariate statistics.
To be sure, geometric references appear as asides in many texts and the metaphor motivates the terminology in several places—the use of the word “orthogonal” to mean “uncorrelated” is an example. However, except in a few domains, such as factor analysis (and even there not consistently), the understanding that a geometric representation gives is not exploited.

This book presents most important procedures of multivariate statistics geometrically. I have tried to develop the theory entirely this way. Even when computational equations are presented, they derive from the geometry instead of the algebra. I hope that this emphasis will give the reader a coherent picture into which all the multivariate techniques fit.

In the interests of presenting a unified approach and to keep this book short, I have not covered either the algebraic basis of the methods or the computer tools that are available to carry it out. This omission does not indicate that I think that either algebra or computation is unimportant. I have concentrated on one aspect of multivariate statistics and have left the more mechanical parts to other sources. I expect that the book will often be used in tandem with either an algebraic or a computational study of the techniques, whichever is more compelling to a reader’s needs and tastes. In this spirit, the book is an adjunct to, but not a substitute for, a more conventional treatment.

One feature of this book may seem curious. I do not refer to other books and readings. Two classes of references might have been expected. The first group contains references to other geometric treatments of multivariate statistics. This work is widely spread throughout many sources, but I have found none that directly follows on from this book. Geometric ideas pervade most treatments of linear algebra and some treatments of multivariate statistics, but are often given implicitly. The second missing class of citations is to multivariate statistics in nongeometric form.
There are many such books, written at many different levels. References to all of them would be excessive, and a reference to one or two would be both arbitrary and restrictive. Moreover, I have noticed that, particularly with technical matter, the book that is the most understandable is the book one has used before. To send readers to a new source is often more confusing than helpful. Since my goal here is to present a way of thinking, and since I expect it to be used in combination with other approaches, I have not pinned things down tightly. A reader wishing to pursue these ideas can do so, with some thought, in any of the dozens of multivariate texts or hundreds of texts on linear algebra. The best start is to consult whichever of these books is familiar.

Finally, I want to acknowledge the many friends, colleagues, and students (in all permutations and combinations) who have read drafts of sections of this book. I have drawn freely on their comments and suggestions, although, perhaps foolishly, I have not followed them all. I am no less grateful for their efforts if my memory is too poor and this preface is too short to list them all individually.

Chapter 1

Variable space and subject space

Multivariate statistics concerns the analysis of data in which several variables are measured on each of a series of individuals or subjects. The goal of the analysis is to examine the interrelationships among the variables: how they vary together or separately and what structure underlies them. These relationships are typically quite complex, and their study is made easier if one has a way to represent them graphically or pictorially. There are two complementary graphical representations, each of which contributes different insights. This chapter describes these two ways to view a set of multivariate data.

Any description of multivariate data starts with a representation of the observations and the variables.
Consider an example. Suppose that one has ten observations of two variables, X and Y, as shown in Figure 1.1. For the $i$th subject, denote the scores by $X_i$ and $Y_i$. Summary statistics for these data give the two means as $\bar{X} = 4.00$ and $\bar{Y} = 12.00$, their standard deviations as $s_X = 2.06$ and $s_Y = 4.55$, and their correlation as 0.904. The first way to picture these data is as a scatterplot. One variable, here X, is assigned to the horizontal axis and the other variable, here Y, is assigned to the vertical axis. Each subject’s scores are plotted as a point; thus, the first subject is plotted at the point (1, 4), the second subject at
