
Methods for Statistical Data Analysis of Multivariate Observations, Second Edition PDF

370 Pages·1997·6.689 MB·English

Preview Methods for Statistical Data Analysis of Multivariate Observations, Second Edition

Methods for Statistical Data Analysis of Multivariate Observations

WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: Vic Barnett, Ralph A. Bradley, Nicholas I. Fisher, J. Stuart Hunter, J. B. Kadane, David G. Kendall, David W. Scott, Adrian F. M. Smith, Jozef L. Teugels, Geoffrey S. Watson
A complete list of the titles in this series appears at the end of this volume.

Methods for Statistical Data Analysis of Multivariate Observations
Second Edition
R. GNANADESIKAN
Rutgers University
New Brunswick, New Jersey
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York · Chichester · Weinheim · Brisbane · Singapore · Toronto

This text is printed on acid-free paper.
Copyright © 1977 by Bell Telephone Laboratories, Inc.
Copyright © 1997 by Bell Communications Research Inc. (Bellcore).
Published by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.
Reproduction or translation of any part of this work beyond that permitted by Section 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012.

Library of Congress Cataloging in Publication Data:
Gnanadesikan, Ramanathan, 1932-
Methods for statistical data analysis of multivariate observations. -- 2nd ed. / R. Gnanadesikan.
p. cm. -- (Wiley series in probability and statistics. Applied probability and statistics)
"A Wiley-Interscience publication."
Includes index.
ISBN 0-471-16119-5 (cloth : alk. paper)
1. Multivariate analysis. I. Title. II. Series.
QA278.G6 1997 96-30766 519.5'35--dc20 CIP
10 9 8 7 6 5 4 3 2 1

To the family of my childhood and the family of my parenthood

In the manner of the Upanishads, I invoke the blessings of the Vedic Lord of Prayers upon you, my good reader, upon myself and upon my collaborators. I also take this opportunity to pay my obeisance to my ancestors and my predecessors in my field.

Contents

Preface to the Second Edition
Preface to the First Edition
1. Introduction
2. Reduction of Dimensionality
   2.1. General
   2.2. Linear Reduction Techniques
      2.2.1. Principal Components Analysis
      2.2.2. Factor Analysis
   2.3. Nonmetric Methods for Nonlinear Reduction of Dimensionality
   2.4. Nonlinear Singularities and Generalized Principal Components Analysis
      2.4.1. Nonlinear Singularities
      2.4.2. Generalized Principal Components Analysis
   References
3. Development and Study of Multivariate Dependencies
   3.1. General
   3.2. Internal Dependencies
   3.3. External Dependencies
   References
4. Multidimensional Classification and Clustering
   4.1. General
   4.2. Classification
      4.2.1. Distance Measures
      4.2.2. Classification Strategies for Large Numbers of Groups
      4.2.3. Classification in the Presence of Possible Systematic Changes among Replications
   4.3. Clustering
      4.3.1. Inputs
      4.3.2. Clustering Algorithms
         4.3.2a. Hierarchical Clustering Procedures
         4.3.2b. Nonhierarchical Clustering Procedures
      4.3.3. Outputs
   References
5. Assessment of Specific Aspects of Multivariate Statistical Models
   5.1. General
   5.2. Estimation and Tests of Hypotheses
      5.2.1. Location and Regression Parameters
      5.2.2. Dispersion Parameters
      5.2.3. Robust Estimation of Location and Dispersion
   5.3. Data-Based Transformations
   5.4. Assessment of Distributional Properties
      5.4.1. Methods for Evaluating Similarity of Marginal Distributions
      5.4.2. Methods for Assessing Normality
      5.4.3. Elliptical Distributions
   References
6. Summarization and Exposure
   6.1. General
   6.2. Study of an Unstructured Multiresponse Sample
   6.3. Comparison of Several Multiresponse Samples
      6.3.1. Graphical Internal Comparisons among Single-Degree-of-Freedom Contrast Vectors
      6.3.2. Graphical Internal Comparisons among Equal-Degree-of-Freedom Groupings
   6.4. Multidimensional Residuals and Methods for Detecting Multivariate Outliers
      6.4.1. Analysis of Multidimensional Residuals
      6.4.2. Other Methods for Detecting Multivariate Outliers
   References
References
Appendix Software
Author Index
Subject Index

Preface to the Second Edition

Since the publication of the first edition a number of developments have had major effects on the current state of the art in multiresponse data analysis. These include not only significant augmentation of the technology, such as enhanced computing power including numerics and graphics, but also major statistical methodological developments stimulated in part by real-world problems and needs. Diagnostic aids which tend to be mostly graphical, robust/resistant methods whose results are not sensitive to deviant behavior of real-world data, and cluster analysis techniques for pattern recognition, are just a few examples of such methodological developments over the past two decades. The scope, structure and applied emphasis of the first edition provide a natural setting for many of the new methods. The main objective of the second edition is to expand the coverage of methods while retaining the framework of the first edition.

The recent decade has also seen a fundamental change in the paradigm of data analysis.
While there are differences in the details depending on the specific problem at hand, there are some general features of the new paradigm that can be contrasted with the more classical approach. For example, the newer developments tend to cast solutions in terms of "fitting" functions that are not globally parametrized (such as planes fitted to points in p-dimensional space) but instead are more "locally" focused and then "pieced together" in some fashion (e.g., low-order splines, or even planes), with some form of trade-off between "accuracy" and "smoothness" involved in the fitting algorithms. The flexibility gained in the richness of the relationships that can be handled, including the ability to accommodate nonlinear ones, is often at the expense of iterative fitting of several local relationships, and the lack of succinct or parsimonious descriptions that are features of the more classical approaches to the same problems. Also, distributional models that play a role in statistical assessment and inferences in the more classical approaches, tend to be deemphasized in the new paradigm. The reliance is on more data-dependent and computer-intensive tools, such as resampling (e.g., jackknife, bootstrap, cross validation), to provide the basis for inferences and assessments of performance.

The methods based on the new paradigm have a great deal of appeal but yet, like most things in the context of the complexities of the real world, they are not a panacea. With a considerably broadened base of experience, and inevitable modifications and adaptations of them, the newer methods will eventually perhaps replace the classical techniques. However, for both pedagogy and practice, the classical methods will probably be around for quite a while. Widely accessible software implementations of the classical techniques, as well as the comfort of the familiarity of their conceptual underpinnings, suggest that this will be so.

For the immediate purposes of this second edition, it was tempting to incorporate all of the newer developments and integrate them with the more classical methods. However, although many of the methods developed since the first edition may adopt the classical paradigm rather than the new one, because of their number and the wide relevance of their conceptual underpinnings, a decision was made to include the details of just some of the newer methods that adopt the classical paradigm, and only briefly mention (with appropriate references) specific approaches which fall under the new paradigm. Among other things, this has enabled a more manageable increase in the volume of material to be included in the second edition. Currently, there are a few books available that are concerned with methods based on the new paradigm and addressed to specific topics of multivariate analysis. Hopefully, one or more of the people who have played a central role in the development of multivariate data analysis techniques with the new framework, will soon write a comprehensive book on these methods.

New material appears in virtually every chapter of this edition. However, there are heavier concentrations in some more than in others. A major expansion, reflecting the vigorous development of methods as well as applications in the field of pattern recognition, is the material on cluster analysis. New sections, focused on issues of inputs to clustering algorithms and on the critical need for aids in interpreting the results of cluster analysis, have been added. Other new material in this edition pertains to useful summarization and exposure techniques in Chapter 6 that did not exist at the time of the first edition. For instance, descriptions have been added of new graphical methods for assessing the separations amongst the eigenvalues of a correlation matrix and for comparing sets of eigenvectors.
Topics that have been enlarged on, largely due to the increased experience with some of the techniques that were relatively new at the time of publication of the first edition, include robust estimation and a class of distributional models that is slightly broader than the multivariate normal in Chapter 5. A new appendix on software, with particular reference to the functions available in two widely-used systems, S (or Splus) and SAS, is included for help with statistical computing aspects.

In the light of the decision regarding the second edition, the intended audience for it is the same one identified in the preface to the first edition. In the years since the first edition was published, the author has been gratified to hear from many people in fields of application of multivariate statistical
