New Approaches in Classification and Data Analysis PDF

694 Pages·1994·23.094 MB·English

Preview New Approaches in Classification and Data Analysis

Studies in Classification, Data Analysis, and Knowledge Organization

Managing Editors
H. H. Bock, Aachen
O. Opitz, Augsburg
M. Schader, Mannheim

Editorial Board
W. H. E. Day, St. John's
E. Diday, Paris
A. Ferligoj, Ljubljana
W. Gaul, Karlsruhe
J. C. Gower, Harpenden
D. J. Hand, Milton Keynes
P. Ihm, Marburg
J. Meulmann, Leiden
S. Nishisato, Toronto
F. J. Radermacher, Ulm
R. Wille, Darmstadt

Titles in the Series
H.-H. Bock and P. Ihm (Eds.): Classification, Data Analysis, and Knowledge Organization
M. Schader (Ed.): Analyzing and Modeling Data and Knowledge
O. Opitz, B. Lausen, and R. Klar (Eds.): Information and Classification
H.-H. Bock, W. Lenski, and M. M. Richter (Eds.): Information Systems and Data Analysis

E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, B. Burtschy (Eds.)
New Approaches in Classification and Data Analysis
With 152 Figures
Springer-Verlag Berlin Heidelberg GmbH

Prof. Edwin Diday, Institut National de Recherche en Informatique et en Automatique (INRIA) - Rocquencourt, F-75150 Le Chesnay, France
Prof. Yves Lechevallier, Institut National de Recherche en Informatique et en Automatique (INRIA) - Rocquencourt, F-75150 Le Chesnay, France
Prof. Dr. Martin Schader, Universität Mannheim, Lehrstuhl für Wirtschaftsinformatik III, Schloß, D-68131 Mannheim, FRG
Prof. Patrice Bertrand, Universite Paris IX Dauphine, Pl. du Marechal de Lattre de Tassigny, F-75775 Paris Cedex 16, France
Dr. Bernard Burtschy, TELECOM-Paris, 46, rue Barrault, F-75013 Paris, France

ISBN 978-3-540-58425-4
ISBN 978-3-642-51175-2 (eBook)
DOI 10.1007/978-3-642-51175-2

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks.
Duplication of this publication or parts thereof is only permitted under the provisions of the German Copyright Law of September 9, 1965, in its version of June 24, 1985, and a copyright fee must always be paid. Violations fall under the prosecution act of the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1994
Originally published by Springer-Verlag Berlin Heidelberg in 1994.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

43/2202-543210 - Printed on acid-free paper

Preface

This book brings together a wide range of topics and perspectives in the growing field of Classification and related methods of Exploratory and Multivariate Data Analysis. It gives a broad view on the state of the art, useful for those in the scientific community who gather data and seek tools for analyzing and interpreting large sets of data. As it presents a wide field of applications, this book is not only of interest for data analysts, mathematicians and statisticians, but also for scientists from many areas and disciplines concerned with real data, e.g., medicine, biology, astronomy, image analysis, pattern recognition, social sciences, psychology, marketing, etc.

It contains 79 invited or selected and refereed papers presented during the Fourth Biennial Conference of the International Federation of Classification Societies (IFCS'93) held in Paris. Previous conferences were held at Aachen (Germany), Charlottesville (USA) and Edinburgh (U.K.).
The conference at Paris emerged from the close cooperation between the eight members of the IFCS: British Classification Society (BCS), Classification Society of North America (CSNA), Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Jugoslovenska Sekcija za Klasifikacije (JSK), Societe Francophone de Classification (SFC), Società Italiana di Statistica (SIS), Vereniging voor Ordinatie en Classificatie (VOC), and was organized by INRIA ("Institut National de Recherche en Informatique et en Automatique"), Rocquencourt and the "Ecole Nationale Superieure des Telecommunications," Paris.

A software exhibition provided the opportunity for industrial companies and research laboratories to show their programs and data analysis systems. Various prototypes from research laboratories reflected the growing activity in this field.

We gratefully acknowledge the effort made by many colleagues who selected and reviewed papers, or chaired sessions during the conference. Also, we are very grateful to all members of the International Scientific Committee for their sponsorship and their advices and to the Program Committee members for their help and support. We appreciate the active collaboration of all participants and authors coming from more than twenty nations which all rendered possible the scientific success of IFCS'93.

Our thanks are extended to several industrial companies for sponsoring the technical organization of the conference: Acknosoft, Cisia, EDF, TELECOM, Uniware. Furthermore, we are very indebted to the Public Relations team of INRIA and TELECOM for their great devotion all along the conference with special mention to Madame M.C. Sance (INRIA) for her constant and indispensable help during the two years of preparation. Finally, we thank Springer-Verlag, and, especially, Dr.
Peter Schuster for excellent cooperation and for the opportunity to publish this volume in the series "Studies in Classification, Data Analysis, and Knowledge Organization."

Rocquencourt and Mannheim, July 1994
Edwin Diday
Yves Lechevallier
Martin Schader
Patrice Bertrand
Bernard Burtschy

Contents

General aspects in classification and data analysis

Classification and Clustering: Problems for the Future
    H.H. Bock
From classifications to cognitive categorization: the example of the road lexicon
    D. Dubois, D. Fleury
A review of graphical methods in Japan - from histogram to dynamic display
    M. Mizuta
New Data and New Tools: A Hypermedia Environment for Navigating Statistical Knowledge in Data Science
    N. Ohsumi
On the logical necessity and priority of a monothetic conception of class, and on the consequent inadequacy of polythetic accounts of category and categorization
    J.P. Sutcliffe
Research and Applications of Quantification Methods in East Asian Countries
    Y. Tanaka, T. Tarumi, M.-H. Huh

Section 1: Methodological aspects of classification

1.1 Dissimilarity analysis, hierarchical and tree-like classification

Algorithms for a geometrical P.C.A. with the L1-norm
    M. Benayade, B. Fichet
Comparison of hierarchical classifications
    T. Benkaraache, B. Van Cutsem
On quadripolar Robinson dissimilarity matrices
    F. Critchley
An Ordered Set Approach to Neutral Consensus Functions
    G.D. Crown, M.F. Janowitz, R.C. Powers
From Apresjan Hierarchies and Bandelt-Dress Weak hierarchies to Quasi-hierarchies
    J. Diatta, B. Fichet
Spanning trees and average linkage clustering
    A. Guenoche
Adjustments of tree metrics based on minimum spanning trees
    B. Leclerc
The complexity of the median procedure for binary trees
    F.R. McMorris, M.A. Steel

1.2 Probabilistic and statistical approaches for clustering

A multivariate analysis of a series of variety trials with special reference to classification of varieties
    T. Calinski, S. Czajka, Z. Kaczmarek
Quality control of mixture. Application: The grass
    P. Trcourt
Mixture Analysis with Noisy Data
    M.P. Windham, A. Cutter
Locally optimal tests on spatial clustering
    W. Vach

1.3 Assessment of classifications and the number of clusters

Choosing the Number of Clusters, Subset Selection of Variables, and Outlier Detection in the Standard Mixture-Model Cluster Analysis
    H. Bozdogan
An examination of procedures for determining the number of clusters in a data set
    A. Hardy
The gap test: an optimal method for determining the number of natural classes in cluster analysis
    J.-P. Rasson, T. Kubushishi

1.4 Clustering methods and computational aspects

Mode detection and valley seeking by binary morphological analysis of connectivity for pattern classification
    C. Botte-Lecocq, J.-G. Postaire
Interactive Class Classification Using Types
    C. Capponi
K-means clustering in a low-dimensional Euclidean space
    G. De Soete, J.D. Carroll
Complexity relaxation of dynamic programming for cluster analysis
    Y. Dodge, T. Gafner
Partitioning Problems in Cluster Analysis: A Review of Mathematical Programming Approaches
    P. Hansen, B. Jaumard, E. Sanlaville
Clusters and factors: neural algorithms for a novel representation of huge and highly multidimensional data sets
    A. Lelu
Graphs and structural similarities
    M. Liquiere
A generalisation of the diameter criterion for clustering
    P. Prea
Percolation and multimodal data structuring
    R.C. Tremolieres

1.5 Discrimination and learning

Classification and Discrimination Techniques Applied to the Early Detection of Business Failure
    M. Bardos
Recursive Partition and Symbolic Data Analysis
    A. Ciampi, E. Diday, J. Lebbe, R. Vignes
Interpretation Tools For Generalized Discriminant Analysis
    A. Famj
Inference about rejected cases in discriminant analysis
    D.J. Hand, W.E. Henley
Structure Learning of Bayesian Networks by Genetic Algorithms
    P. Larrañaga, M. Poza
On the representation of observational data used for classification and identification of natural objects
    J. Le Renard, N. Conruyt
Alternative strategies and CATANOVA testing in two-stage binary segmentation
    F. Mola, R. Siciliano

Section 2: Data-specific approaches

2.1 Sequence analysis in molecular biology

Alignment, Comparison and Consensus of Molecular Sequences
    W.H.E. Day, F.R. McMorris
An Empirical Evaluation of Consensus Rules for Molecular Sequences
    W.H.E. Day, A.D. Gordon
A Probabilistic Approach To Identifying Consensus In Molecular Sequences
    A.D. Gordon
Applications of Distance Geometry to Molecular Conformation
    T.L. Hayden
Classification of aligned biological sequences
    I.C. Lerman, J. Nicolas, B. Tallur, P. Peter

2.2 Symbolic data for classification and data analysis

Use of Pyramids in Symbolic Data Analysis
    P. Brito
Proximity Coefficients between Boolean symbolic objects
    F. de A. T. de Carvalho
Conceptual Clustering in Structured Domains: A Theory Guided Approach
    F. Esposito
Automatic Aid to Symbolic Cluster Interpretation
    M. Gettler Summa, E. Perinel, J. Ferraris
Symbolic Clustering Algorithms using Similarity and Dissimilarity Measures
    K. Chidananda Gowda, E. Diday
Feature Selection for Symbolic Data Classification
    M. Ichino
Towards extraction method of knowledge founded by symbolic objects
    S. Smadhi
One Method of Classification based on an Analysis of the Structural Relationship between Independent Variables
    S. Takakura
The Integration of Neural Networks with Symbolic Knowledge Processing
    A. Ultsch

2.3 Uncertainty handling and fuzzy data

Ordering of Fuzzy k-Partitions
    S. Bodjanova
On the Extension of Probability Theory and Statistics to the Handling of Fuzzy Data
    R. Kruse
Fuzzy Regression
    W. Näther
Clustering and Aggregation of Fuzzy Preference Data: Agreement vs. Information
    J.W. Owsiński
Rough Classification with Valued Closeness Relation
    R. Słowiński, J. Stefanowski

Section 3: Multivariate analysis and statistical methods

3.1 Visual Representation of data

Representing proximities by network models
    K.C. Klauer
An Eigenvector Algorithm to Fit lp-Distance Matrices
    R. Meyer

3.2 Analysis of contingency tables

A non linear approach to Non Symmetrical Data Analysis
    J.-F. Durand, Y. Escoufier
An Algorithmic Approach to Bilinear Models for Two-Way Contingency Tables
    A. de Falguerolles, B. Francis

3.3 Statistical methods

New Approaches Based on Rankings in Sensory Evaluation
    Y. Baba
Estimating failure times distributions from censored systems arranged in series
    M. Bacha, G. Celeux, J. Diebolt, E. Idee
Calibration Used as a Nonresponse Adjustment
    F. Dupont
Least Squares Smoothers and Additive Decomposition
    U. Halekoh, P.O. Degens

Section 4: Applications and information processing

4.1 Knowledge-based systems and textual data

High Dimensional Representations and Information Retrieval
    G.F. Furnas
Experiments of Textual Data Analysis at Electricite de France
    G. Hebrail, J. Marsais
Conception of a Data Supervisor in the Prospect of Piloting Management Quality of Service and Marketing
    M. Jambu
Discriminant Analysis Using Textual Data
    L. Lebart, C. Callant
Recent Developments in Case Based Reasoning: Improvements of Similarity Measures
    M.M. Richter

4.2 Medical data and image analysis

Contiguity in discriminant factorial analysis for image clustering
    L. Abdessemed, B. Escofier
Exploratory and Confirmatory Discrete Multivariate Analysis in a Probabilistic Approach for Studying the Regional Distribution of Aids in Angola
    H. Bacelar-Nicolau, F.C. Nicolau
Factor Analysis of Medical Image Sequences (FAMIS): Fundamental principles and applications
    H. Benali, I. Buvat, F. Frouin, J.P. Bazin, J. Chabriais, R. di Paola
Multifractal Segmentation of Medical Images
    J.-P. Berroir, J. Levy Vehel
The Human Organism - a Place to Thrive for the Immuno-Deficiency Virus
    A.W.M. Dress, R. Wetzel
Comparability and usefulness of newer and classical data analysis techniques. Application in medical domain classification
    E. Krusinska, J. Stefanowski, J.-E. Strömberg

4.3 Astronomy

The Classification of IRAS Point Sources
    J.A.D.L. Blommaert, W.E.C.J. van der Veen, H.J. Habing
Astronomical classification of the Hipparcos input catalogue
    M. Hernández-Pajares, R. Cubarsi, J. Floris
