ebook img

Typologies and taxonomies: An introduction to classification techniques PDF

95 Pages·1994·3.181 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Typologies and taxonomies: An introduction to classification techniques

Series/Number 07-102 TYPOLOGIES AND TAXONOMJES An Introduction to Classification Techniques KENNETH D. BAILEY University of California, Los Angeles SAGE PUBLICATIONS International Educational and Professional Publisher Thousand Oaks London New Delhi Copyright <!} 1994 by Sugc Publicutions, Inc. All rights reserved. No purt of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any informa tion storage and retrieval system, without permission in writing from the publisher. For information address: SAGE Publications, Inc. 2455 Teller Road Thousand Oaks, California 91320 SAGE Publications Ltd. 6 Bonhill Street London EC2A 4PU United Kingdom SAGE Publications India Pvt. Ltd. M-32 Market Greater Kailash I New Delhi 110 048 India Printed in the United States of America Library of Congress Catalog Curd No. 89-043409 Bailey, Kenneth D. Typologies aod taxonomies: An introduction to classification •techniques I Kenneth D. Bailey. p. em.-(Quaotitative applications in the social sciences ; 07-102) Includes bibliographical references. ISBN 0-8039-5259-7 (pb) I. Social sciences-Classification. 2. Social sciences Methodology. I. Title. II. Series: Sage university papers series. Quantitative applications in the social sciences; no. 07-102. H61.2.B34 1994 ' 300'.12-dc20 94-7960 94 95 96 97 98 10 9 8 7 6 5 4 3 2 1 Sage Production Editor: Astrid Virding When citing a university paper, please use the proper form. Remember to cite the current Sage University Paper series title and include the paper number. One of the following formats can be adapted (depending on the style manual used): (1) BAILEY, KENNETH D. (1994) Typologies and Taxonomies: An Introduction to Classification Techniques. Sage University Paper series on Quantitative Applications in the Social Sciences, 07-102. Thousand Oaks, CA: Sage. OR (2) Bailey, K. D. (1994). Typologies and taxonomies: An introduction to classification techniques (Sage University Paper series on Quantitative Applications in the Social Sciences, series no. 07-102). Thousand Oaks, CA: Sage. CONTENTS Series Editor's Introduction v 1. Typologies and Taxonomies in Social Science 1 The History of Classification in Social Science 10 Advantages and Disadvantages of Classification 11 2. Classical Typology Construction (Precomputer) 17 Substruction 24 Reduction 26 Advantages and Disadvantages of Conceptual Typologies 33 3. Numerical Taxonomy and Cluster Analysis 34 Criteria for Evaluating Clustering Methods 40 A Typology of Clustering Techniques 48 Examples of Clustering 50 Divisive Methods 57 Evaluation of Clusters 60 The Interpretation of Clusters 61 Advantages and Disadvantages of Clustering 63 4. Relations Among Techniques 66 Cluster Analysis and Factor Analysis 68 Cluster Analysis and Multidimensional Scaling 71 Cluster Analysis and Multiple Discriminant Analysis 73 Classification and Systems Analysis 75 5. Summary and Conclusions 76 Classification ~esources 77 Classification Applications 78 Concluding Remarks 83 References 84 About the Author 90 SERIES EDITOR'S INTRODUCTION Classification is basic to the social sciences but seldom receives methodo logical exposition, perhaps because it is so ingrained in research practice. Professor Bailey's volume thus affords a rare opportunity for systematic reflection on this vital set of techniques. Classification involves the or dering of cases in terms of their similarity and can be broken down into two essential approaches: typology and taxonomy. The former is primarily conceptual, the latter empirical. Construction of a typology requires conceptualization along at least two dimensions. To illustrate, consider the following hypothetical example. Dr. Alice Auburn, political scientist, develops a typology of nations, in terms of a political system dimension-Democratic versus Authoritar ian-and an economic system dimension-Market versus Controlled. She then has a four-cell table, with each cell labeled a "type" of nation; for example, those in the Authoritarian/Controlled cell called a Totalitarian type, those in the Authoritarian/Market cell called a Traditional type. Since this is the simplest of typologies, she has little trouble naming all the cells. If she decides to add dimensions (or categories within the current dimensions), however, she will quickly multiply the number of "types" and so may wish to resort to a "reduction" procedure, carefully described by Professor Bailey. A taxonomy begins empirically, rather thin conceptually, with the goal of classifying cases according to their measured similarity on observed variables. The principal technique here is cluster analysis. Suppose, to continue our hypothetical example, Pr. Auburn focuses on a sample of 16 Latin American nations (cases) measured on 100 variables taken from U.N. Statistical Yearbooks. She applies Ward's hierarchial clustering method (a widely used agglomerative, objective, average linkage proce dure) and finds that the nations group themselves into two distinct clus ters. Conceptually, what do these clusters represent? Perhaps the more democratic nations are in one group, the more authoritarian in another. However, as Professor Bailey sagely points out, the cluster solution does vi not speak to the conceptual meaning of the clusters, but instead confines itself to demonstration of their empirical presence. There are so many varieties of cluster analysis that the novice, not to mention the expert, risks bewilderment. Fortunately, Professor Bailey patiently describes each, along with advantages and disadvantages. In a masterful example of classification and summation, he offers a "Typology of Clustering Methods" (see Table 3.4), reducing most of them to two dimensions (form of linkage, type of similarity level) in six cells. Besides this valuable synthesis, he relates cluster analysis to factor analysis, multidimensional scaling, multiple discriminant analysis, and systems analysis. For the different classification procedures, conceptual or empiri cal, Professor Bailey provides comparative, thorough coverage, always in a writing style clear to student and faculty alike. All social science re searchers will learn from this unique volume. -MichaelS. Lewis-Beck -Series Editor TYPOLOGIES AND TAXONOMIES An Introduction to Classification Techniques KENNETH D. BAILEY University of California, Los Angeles 1. TYPOLOGIES AND TAXONOMIES IN SOCIAL SCIENCE This monograph is about methods of classification. Classification is a very central process in all facets of our lives. It is so ubiquitous that not only do we generally fail to analyze it, we often even fail to recognize its very existence. Classification is arguably one of the most central and generic of all our conceptual exercises. It is the foundation not only for concep tualization, language, and speech, but also for mathematics, statistics, and data analysis in general. Without classification, there could be no ad vanced conceptualization, reasoning, language, data analysis or, for that matter, social science research. Yet, as central as this process is, it is often poorly understood. It is almost the methodological equivalent of electric ity-we use it every day, yet often consider it to be rather mysterious. It is one of those things that we all use without knowing very much about how it works. In its simplest form, classification is merely defined as the ordering of entities into groups or classes on the basis of their similarity. Statistically speaking, we generally seek to minimize within-group variance, while maximizing between-group variance. This means that we arrange a set of entities into groups, so that each group is as different as possible from all other groups, but each group is internally as homogeneous as possible. By maximizing both within-group homogeneity and between-group hetero geneity, we make groups that are as distinct (nonoverlapping) as possible, with all members within a group being as alike as possible. These are general goals that specific classification techniques may alter somewhat. 2 As just defined, classification is both a process and an end result. We may thus speak both of the process of classification and of a classification so formed. Almost everything is classified to some degree in everyday life, from chewing gum (bubble and nonbubble), to people (men and women), to animals, to vegetables, to minerals. Grouping objects by similarity, how ever, is not quite as simple as it sounds. Imagine that we throw a mixture of 30 knives, forks, and spoons into a pile on a table and ask three people to group them by "similarity." Imagine our surprise when three different classifications result. One person classifies into two groups of utensils, the long and the short. Another classifies into three classes-plastic, wooden, and silver. The third person classifies into three groups-knives, forks, and spoons. Whose classification is "best"? To belabor the example, suppose we secure survey data from a sample of 100 persons. We ask three people to classify this sample by "physi cal characteristics." One person produces the expected grouping into women and men. The second groups by height, producing one class of persons 5 feet 8 inches tall or more (mostly men but some women), and another class of persons under 5 feet 8 inches tall (mostly women, but some men). The third person produces a similar group based on weight, with the class of persons over 160 pounds containing both men and women (but mostly men), while the class under 160 pounds also contains both men and women (but mostly women). Again, which classification is the "best"? The lesson here should be obvious-a classification is no better than the dimensions or variables on which it is based. If you follow the rules of classification perfectly but classify on trivial dimensions, you will produce a trivial classification. As a case in point, a classification that they have four legs or two legs may produce a four-legged group consisting of a giraffe, a dining-room table, and a dancing couple. Is this what we really want? One basic secret to successful classification, then, is the ability to ascertain the key or fundamental characteristics on which the classifica tion is to be based. A person who classifies mixtures of lead and gold on the basis of weight alone will probably be sadder but wiser. It is crucial that the fundamental or defining characteristics of the phenomenon be identified. Unfortunately, there is no specific formula for identifying key characteristics, whether the task is theory construction, classification, or statistical analysis. In all of these diverse cases, prior knowledge and theoretical guidance are required in order to make the right decisions. 3 Assuming that the fundamental dividing characteristics can be identi fied, the classification process often becomes very simplistic. It does not take a genius to separate white beans from black beans. Often the classi fication process merely consists of such simple divisions into groups. In such cases it can be accomplished by almost anyone, simply by observing the similar objects and grouping them together. The problem is that empirical classification problems are seldom so simple. The world is very complex. The difficulty of grouping by similarity grows exponentially with the number of objects to be classified and the number of dimensions on which they are being grouped. Thus while classifying 10 beans by color is simplicity itself, classifying .5 billion people on the basis of two thousand variables is another matter entirely. Even such a complex prob lem is relatively straightforward. We could conceivably simply match each of the 5 billion persons, two at a time, on all two thousand variables. This is not intellectually taxing, but is tremendously time consuming and likely overwhelms even the largest computer. Thus, for very large samples and many variables, some shorthand methods such as clustering algo rithms or formulas need to be devised. Another problem with classification procedures is that, like theory and data analysis, they can encompass at least three levels of analysis. These are the conceptual (where only concepts are classified), the empirical (where only empirical entities are classified), and the combined concep tuaVempiricallevel (called the operational or indicator level by Bailey, 1990), where both are combined. In the latter, a conceptual classification is first devised, and then empirical examples of some or all of the cells are subsequently identified. Basic Concepts The generic classification process, as defined above, is quite simple. The only basic rule is that the classes formed must be both exhaustive and mutually exclusive. This means that if N persons are to be classified, there must be an appropriate class for each (exhaustivity), but only one correct class for each, with no case being a member of two classes (mutual exclusivity). Thus, there must be one class (but only one) for each of the Npersons. Although the basic definition of classification is simple, the complexity of the cases to which it is addressed makes it complicated, as noted above. This leads to a plethora of definitions, many of them quite arcane. It is crucial that the student of classification not get so confused by all of the 4 terms that he or she is unable to see the forest for the trees. The following terms are the minimum set of basic concepts. Classification. Already defined, classification is the general process of grouping entities by similarity. Classification can either be unidimen sional, being based solely on a single dimension or characteristic, or multidimensional, being based on a number of dimensions. When multi dimensional, the dimensions are generally thought to be correlated or related, as in the four rational dimensions (compensatory rewards, spe cialization, performance emphasis, and segmental participation) dis cussed in Chapter 2. Unrelated dimensions generally would not be combined in a classification, but could be. Dimensions are generally categorical data, such as nominal or ordinal variables. Interval and ratio variables can be used as well, however, if the researcher first categorizes them. Further, quantified cluster methods can use variables of all levels nominal, ordinal, interval, or ratio-as discussed in Chapter 3. Typology. Typology is another term for a classification. Two charac teristics distinguish typologies from generic classifications. A typology is generally multidimensional and conceptual. Typologies generally are characterized by labels or names in their cells. As a hypothetical example, let us use two dimensions to construct a classification. These dimensions are intelligence (dichotomized as intelligent/unintelligent) and motiva tion (dichotomized as motivated/unmotivated). Combining these two dimensions creates a fourfold typology; as shown in Table 1.1. These four categories can be defined as cells in the table. In this case, they are types, or type concepts. A motivated and intelligent person can be labeled as successful; an intelligent but unmotivated person is likely to be an under achiever; while a motivated but unintelligent person is an overachiever; and one who lacks both intelligence and motivation is likely doomed to failure. A problem is that ifthe number of dimensions is large, and the number of categories in each dimension is also large, the typology may contain a great many cells or types. For example, even if all dimensions are dichoto mies, the formula for determining the number of cells is 2M, where M is the number of dimensions. Thus, for five dichotomous dimensions the typology will contain only 25 or 32 cells, but for 12 dichotomous dimen sions the number of cells is 212 or 4,096. If the dimensions are polytomous rather than dichotomous, as is often the case, the number of cells expands much more rapidly. Because the number of types can be so large, re-

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.