
Cluster analysis: Survey and evaluation of techniques

121 Pages·1973·4.005 MB·English


Tilburg Studies on Sociology 1

Edited by the Institute for Labour Studies of the Tilburg School of Economics, Social Sciences and Law

Members of the Board
R. de Moor, Chairman
F. van Dooren
J. Godefroy
F. Grunfeld
H. Loevendie
J. Stalpers
Ph. C. Stouthard

Director of Research
A. Vermeulen

A study on Research Methods

Cluster analysis
Survey and evaluation of techniques

E.J. Bijnen
Lecturer in Statistics at the Social Faculty of the Tilburg School of Economics, Social Sciences and Law

Foreword by Ph. C. Stouthard
Professor of Statistics and Research Methods, Tilburg School of Economics, Social Sciences and Law, Department of Psychology

1973
Tilburg University Press
The Netherlands

Translated by C.E. Brand-Maher

Copyright © 1973 by Nijgh-Wolters-Noordhoff Universitaire Uitgevers B.V.
Softcover reprint of the hardcover 1st edition 1973
No part of this book may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.
Distributors: Academic Book Services Holland, P.O. Box 66, Groningen, the Netherlands
ISBN-13: 978-94-011-6784-0
e-ISBN-13: 978-94-011-6782-6
DOI: 10.1007/978-94-011-6782-6

Foreword

During the last years the number of applications of cluster analysis in the social sciences has increased very rapidly. One of the reasons for this is the growing awareness that the assumption of homogeneity implicit in the application of such techniques as factor analysis and scaling is often violated by social science data; another is the increased interest in typologies and the construction of types. Dr. Bijnen has done an extremely useful job by putting together and evaluating attempts to arrive at better and more elegant techniques of cluster analysis from such diverse fields as the social sciences, biology and medicine.
His presentation is very clear and concise, reflecting his intention not to write a 'cookery-book' but a text for scholars who need a reliable guide to pilot them through an extensive and widely scattered literature.

Ph. C. Stouthard

Preface

This book contains a survey of a number of techniques of clustering analysis. The merits and demerits of the procedures described are also discussed so that the research worker can make an informed choice between them.

These techniques have been published in a very great number of journals which are not all easily accessible to the sociologist. This difficulty is compounded because developments in the different disciplines have occurred almost entirely independently from each other; reference is made only sporadically in a piece of literature to the literature of other disciplines. This is one of the reasons why the survey contained in this book cannot be complete. In addition to this the techniques described have been selected according to their relevance to sociological (and psychological) research.

As far as classification of the techniques is concerned, a classification following the field of application of the methods has been opted for (for other classification criteria see, for example, Williams and Dale, 1965 and Lance and Williams, 1967a). After the general introduction, chapter one goes on to discuss a number of coefficients which indicate relative similarity between objects since most clustering techniques take their point of departure from a matrix of similarity coefficients. In the second chapter attention is paid to a number of methods which can be used for clustering variables as well as objects, and the third chapter treats methods for clustering just objects. A special chapter has been devoted to the methods of McQuitty because of their systematic development and very characteristic point of departure. Finally, the fifth chapter demonstrates via an illustrative example the application of some methods.
In writing this book I have received valuable remarks on several aspects of the text from Prof. dr. Ph. C. Stouthard, for which I am very grateful. The responsibility for any faults or deficiencies is, of course, entirely mine.

E.J. Bijnen

Contents

INTRODUCTION

1. COEFFICIENTS FOR DEFINING THE DEGREE OF SIMILARITY BETWEEN OBJECTS
  1.1. Introduction
  1.2. The slope method of Du Mas
  1.3. Cattell's r_p coefficient of pattern similarity
  1.4. The D-coefficient
  1.5. Cohen's r_c coefficient
  1.6. Zubin's index and its variants
    1.6.1. Zubin's index
    1.6.2. The similarity index of Jaccard
    1.6.3. The index of Rogers and Tanimoto
    1.6.4. The G-index of Holley and Guilford
  1.7. Hyvarinen's coefficient
  1.8. Smirnov's coefficient
  1.9. Goodall's probabilistic similarity index
  1.10. The distance measure of Williams, a.o.
  1.11. Conclusion

2. METHODS DEVELOPED FOR FORMING CLUSTERS OF VARIABLES OR OBJECTS
  2.1. Introduction
  2.2. The matrix diagonal method
  2.3. Methods for re-ordering a socio-matrix
    2.3.1. Method of Beum and Brundage
    2.3.2. Method of Coleman and MacRae
    2.3.3. Method of Weiss
    2.3.4. Method of Spilerman
  2.4. Ramifying linkage analysis
  2.5. The Gengerelli method
  2.6. The approximate delimitation method
  2.7. The B-coefficient of Holzinger and Harman
  2.8. Iterative factor analysis
    2.8.1. Wherry and Gaylord
    2.8.2. Bass
    2.8.3. Boon van Ostade
    2.8.4. Conclusion
  2.9. Sneath's single linkage method
  2.10. Sørensen's complete linkage method
  2.11. Wishart's method
  2.12. The method of Michener and Sokal
  2.13. Bridges' method
  2.14. The King method
  2.15. Tryon's cluster analysis
  2.16. Conclusion

3. METHODS OF FORMING CLUSTERS FOR OBJECTS
  3.1. Introduction
  3.2. Thorndike's method
  3.3. The method of Sawrey, Keller and Conger
  3.4. Ward's method
  3.5. Johnson's hierarchical clustering scheme
  3.6. Hierarchical representation of similarity matrices by trees
  3.7. Cluster analysis according to Constantinescu
  3.8. The method of Rogers and Tanimoto
  3.9. Hyvarinen's method
  3.10. Bonner's methods
    3.10.1. On the basis of dichotomous variables
    3.10.2. On the basis of variables on interval level
  3.11. Boolean cluster search method
  3.12. Gengerelli's method
  3.13. Mattson and Dammann's method
  3.14. The methods of Edwards, a.o.
  3.15. Conclusion

4. METHODS FOR THE CONSTRUCTION OF TYPES FOLLOWING MCQUITTY
  4.1. Introduction
  4.2. Agreement analysis
  4.3. Elementary linkage analysis
  4.4. Elementary factor analysis
  4.5. Hierarchical linkage analysis
  4.6. Hierarchical syndrome analysis
  4.7. Multiple rank order typal analysis
  4.8. Classification by reciprocal pairs
  4.9. Intercolumnar correlational analysis
  4.10. Nominee-selectee analysis
  4.11. Multiple agreement analysis
  4.12. Criticism

5. SOME APPLICATIONS
  5.1. Introduction
  5.2. Thorndike's method
  5.3. The method of Sawrey, Keller and Conger
  5.4. Ward's method
  5.5. McQuitty's syndrome analysis
    5.5.1. On election results
    5.5.2. On latent class analysis data
  5.6. Factor analysis
    5.6.1. On the basis of correlation coefficients
    5.6.2. On the basis of product sums
  5.7. Comparison of the applications

CONCLUSION

BIBLIOGRAPHY

Introduction

Cluster analysis constitutes one of those techniques of analysis which, to judge from the many publications of recent times, is receiving increasing interest. This development has been occurring simultaneously over a number of disciplines, such as psychology, psychiatry, biology, sociology, the medical sciences, economics and archeology. It is especially to the first three named of these disciplines that we must look for the beginning of this development; these are also the areas in which the greatest progress has occurred.
In the processing of sociological data, cluster analysis is applied in forming 'homogeneous' groups of research variables, such as behavior, attitudes, opinions etc. Sometimes one wishes to check which variables are strongly related in order to reduce the number of variables and, via this reduction, to gain greater control over the assembled data. In such a case one takes it for granted that these joins can occur without great loss of relevant information. In other cases again one wishes to determine (as Johnson, 1967, p. 241, has pointed out) whether there is any structure in a great mass of data which has been collected in a rather haphazard manner. This last, however, constitutes a less attractive field of application because of the lack of a good research design.

The most important area of application for clustering techniques is in forming groups of objects, such as persons, companies, associations etc., on the basis of the total (relevant) data collected for each object. It makes sense to analyse the data concerning an object as a whole, because patterns can provide information which separate variables do not. Meehl (1950) has shown the possibility of two dichotomous test variables, each having a fifty-fifty answer distribution on the dichotomous criterion variable, being useful in the prediction of the criterion variable when use is made of the configuration of answers on both test-variables (Meehl's paradox). This possibility occurs if the correlations between both test variables in the separate criterion groups differ strongly from each other.
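
Meehl's paradox can be made concrete with a small numerical sketch. The Python fragment below uses invented toy data, not Meehl's own example: 100 hypothetical respondents answer two dichotomous test variables, and the criterion is assumed to equal 1 exactly when the two answers agree. All names (records, hit_rate, criterion_rate, group_correlation) are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy data: each record is (test_a, test_b, criterion).
# Assumption for illustration: criterion = 1 when both answers agree.
records = [(a, b, int(a == b)) for a, b in product((0, 1), repeat=2)] * 25


def hit_rate(records, predict):
    """Share of records whose criterion is predicted correctly."""
    return sum(predict(a, b) == c for a, b, c in records) / len(records)


def criterion_rate(records, index, value):
    """Share of criterion = 1 among records where one test variable equals value."""
    subset = [r for r in records if r[index] == value]
    return sum(r[2] for r in subset) / len(subset)


def group_correlation(records, group):
    """Pearson (phi) correlation between the two test variables within one criterion group."""
    xs = [a for a, _, c in records if c == group]
    ys = [b for _, b, c in records if c == group]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Each test variable on its own: the criterion rate is the same for both answers.
for index, name in ((0, "test_a"), (1, "test_b")):
    print(name, criterion_rate(records, index, 0), criterion_rate(records, index, 1))

# Prediction from one variable versus prediction from the answer pattern.
print("single variable:", hit_rate(records, lambda a, b: a))
print("answer pattern: ", hit_rate(records, lambda a, b: int(a == b)))

# Correlation between the test variables inside each criterion group.
print("r in group 1:", group_correlation(records, 1))
print("r in group 0:", group_correlation(records, 0))
```

In this constructed case neither variable alone separates the criterion groups, while the configuration of both answers predicts the criterion perfectly, and the within-group correlations have opposite signs. Real data would show the effect in a weaker form, to the degree that the two within-group correlations differ.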
