Multivariate Data Analysis PDF

224 Pages·1987·7.113 MB·English

Preview Multivariate Data Analysis

MULTIVARIATE DATA ANALYSIS

ASTROPHYSICS AND SPACE SCIENCE LIBRARY
A SERIES OF BOOKS ON THE RECENT DEVELOPMENTS OF SPACE SCIENCE AND OF GENERAL GEOPHYSICS AND ASTROPHYSICS PUBLISHED IN CONNECTION WITH THE JOURNAL SPACE SCIENCE REVIEWS

Editorial Board
R.L.F. BOYD, University College, London, England
W. B. BURTON, Sterrewacht, Leiden, The Netherlands
L. GOLDBERG, Kitt Peak National Observatory, Tucson, Ariz., U.S.A.
C. DE JAGER, University of Utrecht, The Netherlands
J. KLECZEK, Czechoslovak Academy of Sciences, Ondřejov, Czechoslovakia
Z. KOPAL, University of Manchester, England
R. LÜST, European Space Agency, Paris, France
L. I. SEDOV, Academy of Sciences of the U.S.S.R., Moscow, U.S.S.R.
Z. SVESTKA, Laboratory for Space Research, Utrecht, The Netherlands

MULTIVARIATE DATA ANALYSIS
by
FIONN MURTAGH
ST-ECF/European Southern Observatory, Munich, F.R.G. and Space Science Department, ESTEC, Noordwijk, The Netherlands
and
ANDRE HECK
C.D.S., Observatoire Astronomique, Strasbourg, France

D. REIDEL PUBLISHING COMPANY
A MEMBER OF THE KLUWER ACADEMIC PUBLISHERS GROUP
DORDRECHT / BOSTON / LANCASTER / TOKYO

Library of Congress Cataloging in Publication Data
Murtagh, Fionn.
Multivariate data analysis.
(Astrophysics and space science library)
Bibliography: p.
Includes index.
1. Statistical astronomy. 2. Multivariate analysis. I. Heck, A. (Andre) II. Title.
QB149.M87 1987 520'.1'519535 86-29821
ISBN-13: 978-90-277-2426-7
e-ISBN-13: 978-94-009-3789-5
DOI: 10.1007/978-94-009-3789-5

Published by D. Reidel Publishing Company, P.O. Box 17, 3300 AA Dordrecht, Holland.
Sold and distributed in the U.S.A. and Canada by Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, MA 02061, U.S.A.
In all other countries, sold and distributed by Kluwer Academic Publishers Group, P.O. Box 322, 3300 AH Dordrecht, Holland.

All Rights Reserved
© 1987 by D. Reidel Publishing Company, Dordrecht, Holland
No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.

To Gladys and Sorcha.
To Marianne.

Contents

List of Figures
List of Tables
Foreword

1 Data Coding and Initial Treatment
  1.1 The Problem
  1.2 Mathematical Description
    1.2.1 Introduction
    1.2.2 Distances
    1.2.3 Similarities and Dissimilarities
  1.3 Examples and Bibliography
    1.3.1 Selection of Parameters
    1.3.2 Example 1: Star-Galaxy Separation
    1.3.3 Example 2: Galaxy Morphology Classification
    1.3.4 General References

2 Principal Components Analysis
  2.1 The Problem
  2.2 Mathematical Description
    2.2.1 Introduction
    2.2.2 Preliminaries - Scalar Product and Distance
    2.2.3 The Basic Method
    2.2.4 Dual Spaces and Data Reduction
    2.2.5 Practical Aspects
    2.2.6 Iterative Solution of Eigenvalue Equations
  2.3 Examples and Bibliography
    2.3.1 General Remarks
    2.3.2 Artificial Data
    2.3.3 Examples from Astronomy
    2.3.4 General References
  2.4 Software and Sample Implementation
    2.4.1 Program Listing
    2.4.2 Input Data
    2.4.3 Sample Output

3 Cluster Analysis
  3.1 The Problem
  3.2 Mathematical Description
    3.2.1 Introduction
    3.2.2 Hierarchical Methods
    3.2.3 Agglomerative Algorithms
    3.2.4 Minimum Variance Method in Perspective
    3.2.5 Minimum Variance Method: Mathematical Properties
    3.2.6 Minimal Spanning Tree
    3.2.7 Partitioning Methods
  3.3 Examples and Bibliography
    3.3.1 Examples from Astronomy
    3.3.2 General References
  3.4 Software and Sample Implementation
    3.4.1 Program Listing: Hierarchical Clustering
    3.4.2 Program Listing: Partitioning
    3.4.3 Input Data
    3.4.4 Sample Output

4 Discriminant Analysis
  4.1 The Problem
  4.2 Mathematical Description
    4.2.1 Multiple Discriminant Analysis
    4.2.2 Linear Discriminant Analysis
    4.2.3 Bayesian Discrimination: Quadratic Case
    4.2.4 Maximum Likelihood Discrimination
    4.2.5 Bayesian Equal Covariances Case
    4.2.6 Non-Parametric Discrimination
  4.3 Examples and Bibliography
    4.3.1 Practical Remarks
    4.3.2 Examples from Astronomy
    4.3.3 General References
  4.4 Software and Sample Implementation
    4.4.1 Program Listing: Linear Discriminant Analysis
    4.4.2 Program Listing: Multiple Discriminant Analysis
    4.4.3 Program Listing: K-NNs Discriminant Analysis
    4.4.4 Input Data
    4.4.5 Sample Output: Linear Discriminant Analysis
    4.4.6 Sample Output: Multiple Discriminant Analysis
    4.4.7 Sample Output: K-NNs Discriminant Analysis

5 Other Methods
  5.1 The Problems
  5.2 Correspondence Analysis
    5.2.1 Introduction
    5.2.2 Properties of Correspondence Analysis
    5.2.3 The Basic Method
    5.2.4 Axes and Factors
    5.2.5 Multiple Correspondence Analysis
  5.3 Principal Coordinates Analysis
    5.3.1 Description
    5.3.2 Multidimensional Scaling
  5.4 Canonical Correlation Analysis
  5.5 Regression Analysis
  5.6 Examples and Bibliography
    5.6.1 Regression in Astronomy
    5.6.2 Regression in General
    5.6.3 Other Techniques

6 Case Study: IUE Low Dispersion Spectra
  6.1 Presentation
  6.2 The IUE Satellite and its Data
  6.3 The Astrophysical Context
  6.4 Selection of the Sample
  6.5 Definition of the Variables
    6.5.1 The Continuum Asymmetry Coefficient
    6.5.2 The Reddening Effect
  6.6 Spectral Features
    6.6.1 Generalities
    6.6.2 Objective Detection of the Spectral Lines
    6.6.3 Line Intensities
    6.6.4 Weighting Line Intensities
  6.7 Multivariate Analyses
    6.7.1 Principal Components Analysis
    6.7.2 Cluster Analysis
    6.7.3 Multiple Discriminant Analysis

7 Conclusion: Strategies for Analysing Data
  7.1 Objectives of Multivariate Methods
  7.2 Types of Input Data
  7.3 Strategies of Analysis

General References
Index

List of Figures

1.1 Two records (x and y) with three variables (Seyfert type, magnitude, X-ray emission) showing disjunctive coding.
2.1 Points and their projections onto axes.
2.2 Some examples of PCA of centred clouds of points.
2.3 Projection onto an axis.
3.1 Construction of a dendrogram by the single linkage method.
3.2 Differing representations of a hierarchic clustering on 6 objects.
3.3 Another approach to constructing a single linkage dendrogram.
3.4 Five points, showing NNs and RNNs.
3.5 Alternative representations of a hierarchy with an inversion.
3.6 Three binary hierarchies: symmetric, asymmetric and intermediate.
3.7 Minimal spanning tree of a point pattern (non-unique).
3.8 Differing point patterns.
4.1 Two sets of two groups. Linear separation performs adequately in (a) but non-linear separation is more appropriate in (b).
4.2 The assignment of a new sample a to one of two groups of centres y1 and y2.
5.1 Table in complete disjunctive form and associated Burt table.
5.2 Horseshoe pattern in principal plane of Correspondence Analysis.
6.1 The International Ultraviolet Explorer (IUE).
6.2 Illustration of terms in the asymmetry coefficient S (see text).
6.3 Illustration of the reddening effect on a normalized spectrum.
6.4 Test used for objectively detecting potential lines in the spectra.
6.5 Weighting of the line intensities by the asymmetry coefficient through the Variable Procrustean Bed technique.
6.6 Dwarf and supergiant stars in the plane of discriminant factors 1 and 2.
