Otto Opitz (Editor)
In Cooperation with the Chairmen W. Gaul, University of Karlsruhe; H. Schnelling, University of Konstanz; P. O. Degens, University of Dortmund

Conceptual and Numerical Analysis of Data

Proceedings of the 13th Conference of the Gesellschaft für Klassifikation e.V., University of Augsburg, April 10-12, 1989

Springer-Verlag Berlin Heidelberg New York London Paris Tokyo Hong Kong

Prof. Dr. Otto Opitz
Lehrstuhl für Mathematische Methoden der Wirtschaftswissenschaften
Universität Augsburg, Memminger Straße 14
D-8900 Augsburg, FRG

With 72 Figures

ISBN-13: 978-3-540-51641-5
e-ISBN-13: 978-3-642-75040-3
DOI: 10.1007/978-3-642-75040-3

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. Duplication of this publication or parts thereof is only permitted under the provisions of the German Copyright Law of September 9, 1965, in its version of June 24, 1985, and a copyright fee must always be paid. Violations fall under the prosecution act of the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1989

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

ACKNOWLEDGEMENTS

We thank the following institutions and companies for their financial and technical support:

Universität Augsburg
Stadt Augsburg
Industrie- und Handelskammer für Augsburg und Schwaben
Gesellschaft für Mathematik und Datenverarbeitung mbH - Projektträger Fachinformation
European Research Office, U.S. Army Research
Bayerische Landesbank, München
BAYWA AG, München
BMW AG, München
Daimler-Benz AG, Stuttgart
Henkel KGaA, Düsseldorf
IBM Deutschland GmbH, Stuttgart
Vereinte Krankenversicherung AG, München
Vereinte Lebensversicherung AG, München
VW Stifterverband, Wolfsburg
Augsburger Aktienbank AG
Bayer. Hypotheken- und Wechsel-Bank AG, Augsburg
Bayerische Vereinsbank, Augsburg
BÖWE GmbH, Augsburg
Digital Equipment International GmbH, Kaufbeuren
Goetze Friedberg GmbH
Grob-Werke GmbH & Co. KG, Mindelheim
Hoechst AG, Werk Bobingen
Kernkraftwerke Gundremmingen Betriebsgesellschaft mbH
Kreissparkasse Augsburg
Grundstücksgesellschaft Kroll & Nill, Augsburg
Lech-Elektrizitätswerke AG, Augsburg
MAN Roland Druckmaschinen AG, Augsburg
NCR GmbH, Augsburg
Raiffeisenbank Augsburg eG
Renk AG, Augsburg
Siemens AG, Augsburg
Stadtsparkasse Augsburg

SCIENTIFIC PROGRAM COMMITTEE

SECTION 1: Data Analysis and Classification: Basic Concepts and Methods
Chairman: W. Gaul, Universität Karlsruhe
in cooperation with
H.H. Bock, RWTH Aachen
E. Diday, INRIA, Le Chesnay Cedex/France
L. Fahrmeir, Universität Regensburg
G. Herden, Universität-Gesamthochschule Essen
R. Mathar, Universität Augsburg

SECTION 2: Applications in Library Sciences, Documentation and Information Sciences
Chairman: H. Schnelling, Universität Konstanz
in cooperation with
I. Dahlberg, INDEKS-Verlag, Frankfurt/Main
R. Fugmann, Idstein
H. Leclercq, Katholieke Universiteit Leuven/Belgium
U. Schulz, Universitätsbibliothek Oldenburg

SECTION 3: Applications in Economics and Social Sciences
Chairman: O. Opitz, Universität Augsburg
in cooperation with
K. Ambrosi, Universität Hildesheim
Th. Bausch, Universität Augsburg
J. Krauth, Universität Düsseldorf
G. Ronning, Universität Konstanz
M. Schader, Universität der Bundeswehr Hamburg
H.-J. Schek, ETH Zürich/Switzerland

SECTION 4: Applications in Natural Sciences and Computer Sciences
Chairman: P.O. Degens, Universität Dortmund
in cooperation with
W. Erdelen, Universität des Saarlandes
W. Ludwig, Technische Universität München
J. W. Munch, Universität-Gesamthochschule Siegen

PREFACE

The 13th conference of the Gesellschaft für Klassifikation e.V. took place at the Universität Augsburg from April 10 to 12, 1989, with local organization by the Lehrstuhl für Mathematische Methoden der Wirtschaftswissenschaften.

The broad subject of the conference, Conceptual and Numerical Analysis of Data, was meant to reflect the variety of concepts of data and information as well as the many methods of analysing and structuring them. Based on the papers submitted, four sections were arranged:

1. Data Analysis and Classification: Basic Concepts and Methods
2. Applications in Library Sciences, Documentation and Information Sciences
3. Applications in Economics and Social Sciences
4. Applications in Natural Sciences and Computer Sciences

This classification does not separate the contributions strictly, but it shows that theoretical and applied researchers from very different disciplines were willing to present papers. In 60 survey and special lectures the speakers reported on developments in theory and applications, encouraging an interdisciplinary dialogue among all participants. This volume contains 42 selected papers grouped according to the four sections. A brief overview of the presented papers follows.

The 18 papers of Section 1 consider several problems of concept analysis, cluster analysis, data analysis and multivariate statistics.

The geometric representation of a concept lattice is a collection of figures in the plane corresponding to the given concepts in such a way that the subconcept-superconcept relation corresponds to the containment relation between the figures. R. Wille discusses in which situations geometric representations are useful, or even more appropriate than line diagrams, which have proven to be a universal and successful tool for representing concept lattices. W.
Kollewe shows, by means of a survey among nine managers, how methods of concept analysis can be applied to the evaluation of the results. The original data should be represented without loss of information; therefore special attention is focused on conceptual scaling.

Cluster analysis provides a large number of mathematical and statistical methods for subdividing a given set of objects into a suitable number of homogeneous classes. In a survey paper, H.H. Bock reviews a series of probabilistic models and approaches used in cluster analysis for specifying classification structures, deriving clustering criteria and strategies, constructing and optimizing clustering algorithms, testing for the existence of a clustering structure in the data, checking the relevance of calculated classifications, and other aspects. He also reflects on some applications to internal problems of probability theory and statistics.

J. Krauth gives a modification of the Rand index for comparing partitions. He describes how the Hubert/Arabie modification can be altered in analogy to a modification of the kappa coefficient that he proposed earlier, and how the result can be used to extend the Hubert/Arabie correction to non-disjoint partitions.

The problem of overlapping clustering is how to limit the resulting overlaps. O. Opitz and R. Wiedemann develop a new agglomerative algorithm that restricts the number of classes in such a way that maximal cliques of smaller size, and therefore of smaller single linkage homogeneity, are avoided. They also discuss the best-case and worst-case behaviour of their algorithm.

The following papers are concerned with hierarchical clustering and graph-theoretical methods in cluster analysis. K. Wolf and P.O.
Degens present some suggestions for analysing methods that construct additive trees, and they show that existing methods differ significantly with respect to these suggestions and to the quality of the results achieved. W. Vach focuses on the problem of least squares approximation by additive trees with a fixed tree structure and sketches implications for the approximation problem with unknown tree structure. M. Tittel and P.O. Degens use isotonic regression to construct a monotonically indexed hierarchy such that the associated ultrametric approximates the given distance matrix. They distinguish between the large average linkage problem (no knowledge about the hierarchy structure) and the small average linkage problem (fixed hierarchy).

G. Herden addresses the requirement that every goodness-of-fit criterion should be compatible with the transformations or homomorphisms used to represent the quality of qualitative data. He discusses conditions for, and consequences of, the situation in which exactly one uniquely determined criterion exists for heterogeneity functions based on dissimilarity coefficients.

The aim of the survey paper by E. Diday and M.P. Brito is to introduce symbolic cluster analysis and to show that the symbolic approach extends cluster analysis to more complex data, which may be closer to multidimensional reality. They define order, union and intersection between symbolic objects and show that these objects are organized in an inheritance lattice. The authors further study several notions of quality for symbolic objects, for classes, and for classifications of symbolic objects.
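To make the operations on symbolic objects in the Diday and Brito paragraph more concrete, the following Python sketch uses a deliberately simplified model. It assumes that every symbolic object is a conjunction of assertions of the form [variable ∈ set of values] over the same categorical variables. Under that assumption, union (generalization) and intersection are taken variablewise on the value sets, and one object precedes another in the order when each of its value sets is contained in the other's. The class `SymbolicObject` and its method names are hypothetical illustrations, not the authors' formal definitions, which also cover other variable types.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class SymbolicObject:
    """Hypothetical simplified symbolic object: a conjunction of assertions
    [variable in allowed values], one assertion per variable."""
    assertions: Dict[str, FrozenSet[str]]

    def _check_same_variables(self, other: "SymbolicObject") -> None:
        # The sketch assumes both objects are defined on the same variables.
        if self.assertions.keys() != other.assertions.keys():
            raise ValueError("objects must be defined on the same variables")

    def union(self, other: "SymbolicObject") -> "SymbolicObject":
        # Generalization: variablewise union of the admissible value sets.
        self._check_same_variables(other)
        return SymbolicObject(
            {v: self.assertions[v] | other.assertions[v] for v in self.assertions})

    def intersection(self, other: "SymbolicObject") -> "SymbolicObject":
        # Specialization: variablewise intersection of the admissible value sets.
        self._check_same_variables(other)
        return SymbolicObject(
            {v: self.assertions[v] & other.assertions[v] for v in self.assertions})

    def is_more_specific_than(self, other: "SymbolicObject") -> bool:
        # Order: self <= other if every value set of self is contained
        # in the corresponding value set of other.
        self._check_same_variables(other)
        return all(self.assertions[v] <= other.assertions[v] for v in self.assertions)


# Usage with two small, made-up objects.
a = SymbolicObject({"colour": frozenset({"red"}), "size": frozenset({"small", "medium"})})
b = SymbolicObject({"colour": frozenset({"red", "blue"}), "size": frozenset({"medium"})})
join = a.union(b)
meet = a.intersection(b)
print(a.is_more_specific_than(join), meet.is_more_specific_than(a), a.is_more_specific_than(b))
# True True False
```

In this simplified setting, union and intersection act on each variable independently, so the objects form a lattice under the inclusion order, with union as join and intersection as meet; this is one elementary way to read the lattice structure mentioned in the paragraph above.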