A Survey of Multivariate Methods for Systematics

257 Pages·1980·104.797 MB·English

A SURVEY OF MULTIVARIATE METHODS FOR SYSTEMATICS
Nancy A. Neff, American Museum of Natural History, and City College of the City University of New York
Leslie F. Marcus, American Museum of Natural History, and Queens College of the City University of New York
for a workshop: Numerical Methods in Systematic Mammalogy, American Society of Mammalogists Annual Meeting, 9 June 1980
Printed at the American Museum of Natural History, New York
Supported by National Science Foundation Research Grant DEB 79-17382
Copyright © 1980 by Nancy A. Neff and Leslie F. Marcus
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the copyright owners.
Nancy A. Neff, Dept. Vertebrate Paleontology
Leslie F. Marcus, Dept. Invertebrates
American Museum of Natural History
Central Park West at 79th Street
New York, New York 10024 USA
This publication may be cited: Neff, N. A. and L. F. Marcus. 1980. A Survey of Multivariate Methods for Systematics. New York: privately published.
This manual was prepared for distribution at a workshop on Numerical Methods in Systematic Mammalogy, held on 9 June 1980 during the 1980 Annual Meeting of the American Society of Mammalogists in Kingston, Rhode Island. If additional copies are still available, they may be obtained by sending: 1. a self-addressed mailing label, and 2. a check or money order for $1.00 made out to the American Museum of Natural History, for postage and handling, to one of the authors at the above address.
TABLE OF CONTENTS
Preface
Acknowledgments
Table of Abbreviations
Table of Symbols
INTRODUCTION
  Background for Use of this Manual
  Design of Study
    Assumptions
      1) Independence and random sampling
      2) Multivariate normality
      3) Central Limit Theorem
      4) Transformations
      5) Homogeneity of variability
      6) Robustness of estimators and tests
      7) Validity of estimates and jackknife estimators
      8) Randomization tests
    Sample size
    Missing Data
  Implementation
    Errors and accuracy
    Iteration
  Interpretation and publication
METHODS
  Principal Components Analysis
  Principal Coordinates Analysis
  Other Principal Components Related Methods
    Correspondence Analysis
    Biplot
  Nonmetric Multidimensional Scaling
  Factor Analysis
  Andrews Plots
  Multiple Regression
  Canonical Correlation Analysis
  Multidimensional Contingency Tables
  Discriminant Analysis
    Multivariate Analysis of Variance
    Canonical Variates Analysis
    Discrimination
    Summary of terminology and recommendations
PURPOSES
  Data Screening
  Data Reduction
  Exploration: Looking for Structure
  Cluster Analysis, Numerical Cladistics, and Tree Analysis
  Size and Shape
    Biorthogonal transformation grids
    Fourier analysis
  Statistical Inference
BIBLIOGRAPHY
Appendix I: Publication of the Results of Multivariate Studies, by Michael A. Bogan
Appendix II: Statistical Packages and Computer Programs, with David J. Schmidly and Mark D. Engstrom
PREFACE
That the biological world is multivariate has been widely recognized. The importance of quantifying one's observations for analysis and for communication has also been recognized. These facts, plus widespread computer availability, have led to the near-exponential growth in recent years in the applications of multivariate analyses in systematics and other areas of biology.
Multivariate analyses are numerical techniques used to study and describe the covariation between variables, individuals, or both together. In systematics, the variables are usually characters measured on a set of individuals. The goals are varied: some possible purposes include data or dimension reduction, searching for structure, testing the fit of one's data to a model, or discriminating between populations. The strength of multivariate analysis lies in the ability it confers on the user to examine many variates simultaneously and quantitatively.
Multivariate methods have not yet fulfilled all expectations, and will not without further development of the statistical methodology. Demands by the practitioners for more relevant methodology will encourage the statisticians. Also, there is much for most of us to learn about the potential of the methods already developed. Not all of the questions have been asked, and not all informative structures and models have yet been explored.
Our manual, and the American Society of Mammalogists workshop which it accompanied, was conceived in response to the growth and interest in this methodology. We trust that this initial effort will shortly be made obsolete by more complete discussions reflecting the results of continued growth and exploration.
Although the increase in use of numerical methods is sometimes cited as evidence of increasing objectivity in systematics, the use of measurements for data, or mathematical models for hypotheses, does not obviate a continuing need for close observation of the specimens themselves and clear thinking about the biological hypotheses being investigated. Numerical methods are tools only: powerful tools, yes, but only means to an end.
We have written this manual in the hope that concentration on how numerical methods can be used in systematics will help the systematists be demanding of the methodology and search for the numerical methods to fit the biological questions, and never vice versa. We believe that numerical methods, even in their present state of development, have a great potential for usefulness in systematics. The rapid increase in the employment of multivariate studies indicates that others are of this opinion also. As interaction between systematists and statisticians increases, we expect even more useful design of methods, permitting closer tailoring to the biological structures and relationships we study, and fitting more exactly the models and hypotheses which systematists wish to test.
Just as drawing a specimen usually increases the amount one sees on it, so defining and taking measurements can often be instructive because of the time spent thinking hard about and closely looking at the specimens themselves. Unexpected relationships among values or individuals should lead one back to the specimens for a fresh look. In sum, numerical methods are certainly not an alternative to looking at the specimens themselves!
This manual is not about numerical taxonomy. We have discussed methods of multivariate analyses with respect to a diversity of goals in systematics. We have written from a fairly strongly held philosophical position about how scientific studies are most effectively done, but not with a narrow definition of appropriate goals or topics of study. Numerical taxonomy, in our use of the term, includes the philosophical positions well discussed in Sneath and Sokal [1973]; in contrast, we wished to survey methodology in a neutral fashion, divorced from a specific choice of goals.
Also, because of our interest in morphology, we have emphasized ordination and descriptive techniques, rather than clustering techniques which are well covered by several books to date.
One final point about our approach may be made explicit: what was included in this survey, and the nature of the discussion, was largely motivated by initially strong feelings on the part of at least one of us about how numerical methods are most wisely used in systematics. The research and writing have not substantially changed the major tenets of our philosophy, although some specific details of belief have certainly changed. However, we have come to realize that this methodology is in fact complex, and difficult to present and discuss from the orientation of biologists or systematists rather than statisticians. We are not attempting to set standards for numerical studies in systematics. We do urge anyone, inexperienced or experienced, to practice a healthy scepticism towards the literature and the recommendations therein, including this text.
ACKNOWLEDGMENTS
We have been aided in the preparation of this manual by numerous and diverse people. Dr. F. James Rohlf, Department of Ecology and Evolution, SUNY at Stony Brook, gave generously of his time and expertise. He critically read early drafts of most sections and greatly improved the product. We are grateful for both his criticism and his encouragement.
The National Science Foundation supported the production of this manual, and the American Society of Mammalogists workshop at which it was distributed, through a research grant DEB 79-17332 to the authors. We thank the American Museum of Natural History, most especially Ms. Diane Menditto, for help with and review of the grant application, and subsequent administration of the grant. The staff of the museum library was, as ever, cheerfully helpful.
The Department of Invertebrates, in which one of us (LFM) is a Research Associate, was extremely helpful in supplying workspace and use of a CRT and Diablo terminals; we especially thank Ms. Julia Golden of that department for permitting us incredible license in the use of her office. The Department of Vertebrate Paleontology also supplied space, supplies and support. The Print Shop of the American Museum, in the person of Mr. Vincent Tumillo, performed heroically in printing this manual under too short a time estimate from two inexperienced publishers!
This manual will accompany a workshop on 9 June 1980 at the Annual Meeting of the American Society of Mammalogists. We thank Drs. F. J. Rohlf, M. A. Bogan, D. J. Schmidly, D. A. Schlitter, and G. D. Schnell for agreeing to serve as discussants on a panel at this workshop. We appreciate ASM's interest, and especially the support and help of Dr. J. Mary Taylor. Drs. Duane A. Schlitter and Sydney Anderson provided early comments and encouragement. We thank the City University of New York, University Computer Center for making available to us text-editing (WYLBUR) and word-processing (Waterloo SCRIPT) capabilities.
ABBREVIATIONS USED IN THIS TEXT
ANOVA   analysis of variance
BMDP    Biomedical Computer Programs, P-series
CCA     canonical correlation analysis
CVA     canonical variates analysis
MANOVA  multivariate analysis of variance
ML      maximum likelihood (factor analysis)
MST     minimum spanning tree
NMDS    nonmetric multidimensional scaling
NT      numerical taxonomy
NTSYS   Numerical Taxonomic System of Multivariate Statistical Programs
OTU     operational taxonomic unit
PCA     principal components analysis
PCORD   principal coordinates analysis
PFA     principal factor analysis; PFA(iter) refers to the iterative solution to PFA
SAS     Statistical Analysis System
SPSS    Statistical Package for the Social Sciences
VIF     variance inflation factor
