Graphical Exploratory Data Analysis

322 pages, English

Springer Texts in Statistics
Advisors: Stephen Fienberg, Ingram Olkin

Springer Texts in Statistics

Probability and Statistical Inference: Volume 1: Probability, by J.G. Kalbfleisch
Probability and Statistical Inference: Volume 2: Statistical Inference, by Nathan Keyfitz
Graphical Exploratory Data Analysis, by S. du Toit, et al.
Counting for Something: An Historical View of Statistics, by William Peters

S.H.C. du Toit
A.G.W. Steyn
R.H. Stumpf

Graphical Exploratory Data Analysis

With 180 Graphical Representations

Springer-Verlag
New York Berlin Heidelberg London Paris Tokyo

S.H.C. du Toit
Institute for Statistical Research
Human Sciences Research Council
Pretoria
South Africa

A.G.W. Steyn
Contract Researcher
Institute for Statistical Research
Human Sciences Research Council
Pretoria
South Africa

R.H. Stumpf
Contract Researcher
Institute for Statistical Research
Human Sciences Research Council
Pretoria
South Africa

Editorial Board

Stephen Fienberg
Department of Statistics
Carnegie-Mellon University
Pittsburgh, PA 15213
U.S.A.

Ingram Olkin
Department of Statistics
Stanford University
Stanford, CA 94305
U.S.A.

AMS Classification: 62H99

Library of Congress Cataloging in Publication Data
du Toit, S.H.C.
Graphical exploratory data analysis.
(Springer texts in statistics)
Bibliography: p.
Includes index.
1. Statistics-Graphic methods. I. Steyn, A.G.W. II. Stumpf, R.H. III. Title. IV. Series.
QA276.3.D778 1986 001.4'226 86-4009

© 1986 by Springer-Verlag New York Inc.
Softcover reprint of the hardcover 1st edition 1986

All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag, 175 Fifth Avenue, New York, New York 10010, U.S.A.

The use of general descriptive names, trade names, trademarks, etc. in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Typeset by Asco Trade Typesetting, Ltd., Hong Kong.
Printed and bound by R.R. Donnelley and Sons, Harrisonburg, Virginia.
Printed in the United States of America.

9 8 7 6 5 4 3 2 1

ISBN-13: 978-1-4612-9371-2
e-ISBN-13: 978-1-4612-4950-4
DOI: 10.1007/978-1-4612-4950-4

Preface

Portraying data graphically certainly contributes toward a clearer and more penetrative understanding of data and also makes sophisticated statistical data analyses more marketable. This realization has emerged from many years of experience in teaching students, in research, and especially from engaging in statistical consulting work in a variety of subject fields.

Consequently, we were somewhat surprised to discover that a comprehensive, yet simple presentation of graphical exploratory techniques for the data analyst was not available. Generally, books on the subject were either too incomplete, stopping at a histogram or pie chart, or were too technical and specialized and not linked to readily available computer programs. Many of these graphical techniques have furthermore only recently appeared in statistical journals and are thus not easily accessible to the statistically unsophisticated data analyst.

This book, therefore, attempts to give a sound overview of most of the well-known and widely used methods of analyzing and portraying data graphically. Throughout the book the emphasis is on exploratory techniques. Realizing the futility of presenting these methods without the necessary computer programs to actually perform them, we endeavored to provide working computer programs in almost every case. Graphic representations are illustrated throughout by making use of real-life data. Two such data sets are frequently used throughout the text.

In realizing the aims set out above, we avoided intricate theoretical derivations and explanations, but we are nevertheless convinced that this book will be of inestimable value even to a trained statistician.

We certainly do not wish to claim that this book represents an exhaustive treatment of the topic of graphical exploratory techniques. Because of the many graphical techniques currently in existence and those developed recently, we were forced to be selective in our choice of techniques presented here. While acknowledging the existence of those techniques not included in this book, we nevertheless believe that our choice of techniques presents a good cross section which should be of great benefit to every data analyst, whichever discipline he may represent.

The origin of this book can be traced back to a series of lectures on graphical techniques presented in 1983 to researchers at the Human Sciences Research Council (HSRC) in the Republic of South Africa. These lectures eventually led to a seminar on graphical techniques and finally to an HSRC report in 1984. The HSRC graciously provided some financial support so that this report in turn, after much editing, changing, and new research, led to the writing of this book.

In an undertaking of this nature various people and organizations usually play an indispensable role. First, we would like to thank the HSRC and the University of Pretoria for placing their computers at our disposal. Second, we would like to thank Dr. Nico Crowther, Director of the Institute for Statistical Research at the HSRC, for his constant encouragement and advice in completing this project. Furthermore, Terry Shaw and Arien Strasheim were of great help in editing the many computer programs, as were Antoinette van der Merwe and Jacques Pieterse in helping to photocopy, check data, proofread, and so forth. Dr. Trevor Hastie from the Institute for Biostatistics at the Medical Research Council in Cape Town was nice enough to write the initial draft on scatterplot smoothing. We, however, accept full responsibility for the final version of this section.

We would also like to thank Professor Stephen Fienberg, advisor to Springer-Verlag, who was of great help to us during his recent visit to South Africa. His mature insight and stimulating suggestions led to many improvements in the final form of the book. Lynette Hearne and Mynie Stobbe did an excellent job in preparing the many figures, while Trisia Badenhorst and Christa de Bruin typed this difficult manuscript.

Furthermore, we wish to express our gratitude to the Biometrika Trust for permission to publish Table 28 from Biometrika Tables for Statisticians (1976) and to Professor W.J. Serfontein and Mr. I.B. Ubbink from the University of Pretoria for supplying us with the fitness/cholesterol data. Many of the other data sets in this book are used with the kind permission of the HSRC.

Finally, and most important of all, we would like to thank our wives Dorothy, Jeanetta, and Adie. Without their understanding and support we would not have been able to complete this book.

STEPHEN DU TOIT
GERT STEYN
ROLF STUMPF

Contents

Preface

CHAPTER 1
The Role of Graphics in Data Exploration
1. Introduction
2. Historical Background
3. Content of the Book
4. Central Data Sets
5. Different Types of Data
6. Computer Programs

CHAPTER 2
Graphics for Univariate and Bivariate Data
1. Introduction
2. Graphics for Univariate Data
3. Stem-and-Leaf Plots
4. Graphics for Bivariate Data
5. Graphical Perception

CHAPTER 3
Graphics for Selecting a Probability Model
1. Introduction
2. Discrete Models
3. Continuous Models
4. General

CHAPTER 4
Visual Representation of Multivariate Data
1. Introduction
2. "Scatterplots" in More Than Two Dimensions
3. Profiles
4. Star Representations
5. Glyphs
6. Boxes
7. Andrews' Curves
8. Chernoff Faces
9. General

CHAPTER 5
Cluster Analysis
1. Introduction
2. The Probability Approach
3. Measures of Distance and Similarity
4. Hierarchical Cluster Analysis
5. Computer Programs for Hierarchical Cluster Analysis
6. Digraphs
7. Spanning Trees
8. Cluster Analysis of Variables
9. Application of Cluster Analysis to Fitness/Cholesterol Data
10. Other Graphical Techniques of Cluster Analysis
11. General

CHAPTER 6
Multidimensional Scaling
1. Introduction
2. The Biplot
3. Principal Component Analysis
4. Correspondence Analysis
5. Classical (Metric) Scaling
6. Non-Metric Scaling
7. Three-Way Multidimensional Scaling (INDSCAL)
8. Guttman's Techniques
9. Facet Theory
10. Partial Order Scalogram Analysis
11. General

CHAPTER 7
Graphical Representations in Regression Analysis
1. Introduction
2. The Scatterplot
3. Residual Plots
4. Mallows' Ck-Statistic
5. Confidence and Forecast Bands
6. The Ridge Trace
7. General

CHAPTER 8
CHAID and XAID: Exploratory Techniques for Analyzing Extensive Data Sets
1. Introduction
2. CHAID: An Exploratory Technique for Analyzing Categorical Data
3. Applying a CHAID Analysis
4. XAID: An Exploratory Technique for Analyzing a Quantitative Dependent Variable with Categorical Predictors
5. Application of XAID Analysis
6. General

CHAPTER 9
Control Charts
1. Introduction
2. Process Capability
3. Control Charts for Items with Quantitative Characteristics
4. Control Charts for Dichotomous Measurements (P-Chart)
5. Cumulative Sum Charts
6. Cumulative Sine Charts
7. General

CHAPTER 10
Time Series Representations
1. Representations in the Time Domain
2. Representations in the Frequency Domain

CHAPTER 11
Further Useful Graphics
1. Graphics for the Two-Sample Problem
2. Graphical Techniques in Analysis of Variance
3. Four-Fold Circular Display of 2 x 2 Contingency Tables

References

Index
