Cluster Analysis PDF

348 Pages·2010·2.79 MB·English

Preview Cluster Analysis

Cluster Analysis, 5th Edition
Brian S. Everitt, Sabine Landau, Morven Leese and Daniel Stahl
King’s College London, UK

Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.

This 5th edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.

Real life examples are used throughout to demonstrate the application of the theory, and figures are used extensively to illustrate graphical techniques. The book is comprehensive yet relatively non-mathematical, focusing on the practical aspects of cluster analysis.

Key Features:
• Presents a comprehensive guide to clustering techniques, with focus on the practical aspects of cluster analysis.
• Provides a thorough revision of the fourth edition, including new developments in clustering longitudinal data and examples from bioinformatics and gene studies.
• Updates the chapter on mixture models to include recent developments and presents a new chapter on mixture modelling for structured data.

Practitioners and researchers working in cluster analysis and data analysis will benefit from this book.

WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: David J. Balding, Noel A.C. Cressie, Garrett M. Fitzmaurice, Harvey Goldstein, Geert Molenberghs, David W. Scott, Adrian F.M. Smith, Ruey S. Tsay, Sanford Weisberg

Editors Emeriti: Vic Barnett, Ralph A. Bradley, J. Stuart Hunter, J.B. Kadane, David G. Kendall, Jozef L. Teugels

A complete list of the titles in this series can be found on http://www.wiley.com/WileyCDA/Section/id-300611.html.

This edition first published 2011
© 2011 John Wiley & Sons, Ltd

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data
Everitt, Brian.
Cluster Analysis / Brian S. Everitt. – 5th ed.
p. cm. – (Wiley series in probability and statistics; 848)
Summary: “This edition provides a thorough revision of the fourth edition which focuses on the practical aspects of cluster analysis and covers new methodology in terms of longitudinal data and provides examples from bioinformatics. Real life examples are used throughout to demonstrate the application of the theory, and figures are used extensively to illustrate graphical techniques. This book includes an appendix of getting started on cluster analysis using R, as well as a comprehensive and up-to-date bibliography.” – Provided by publisher.
Includes bibliographical references and index.
ISBN 978-0-470-74991-3 (hardback)
1. Cluster analysis. I. Title.
QA278.E9 2011
519.5’3–dc22
2010037932

A catalogue record for this book is available from the British Library.

Print ISBN: 978-0-470-74991-3
ePDF ISBN: 978-0-470-97780-4
oBook ISBN: 978-0-470-97781-1
ePub ISBN: 978-0-470-97844-3

Set in 10.25/12pt Times Roman by Thomson Digital, Noida, India

To Joanna, Rachel, Hywel and Dafydd
Brian Everitt

To Premjit
Sabine Landau

To Peter
Morven Leese

To Charmen
Daniel Stahl

Contents

Preface
Acknowledgement

1 An introduction to classification and clustering
  1.1 Introduction
  1.2 Reasons for classifying
  1.3 Numerical methods of classification – cluster analysis
  1.4 What is a cluster?
  1.5 Examples of the use of clustering
    1.5.1 Market research
    1.5.2 Astronomy
    1.5.3 Psychiatry
    1.5.4 Weather classification
    1.5.5 Archaeology
    1.5.6 Bioinformatics and genetics
  1.6 Summary

2 Detecting clusters graphically
  2.1 Introduction
  2.2 Detecting clusters with univariate and bivariate plots of data
    2.2.1 Histograms
    2.2.2 Scatterplots
    2.2.3 Density estimation
    2.2.4 Scatterplot matrices
  2.3 Using lower-dimensional projections of multivariate data for graphical representations
    2.3.1 Principal components analysis of multivariate data
    2.3.2 Exploratory projection pursuit
    2.3.3 Multidimensional scaling
  2.4 Three-dimensional plots and trellis graphics
  2.5 Summary

3 Measurement of proximity
  3.1 Introduction
  3.2 Similarity measures for categorical data
    3.2.1 Similarity measures for binary data
    3.2.2 Similarity measures for categorical data with more than two levels
  3.3 Dissimilarity and distance measures for continuous data
  3.4 Similarity measures for data containing both continuous and categorical variables
  3.5 Proximity measures for structured data
  3.6 Inter-group proximity measures
    3.6.1 Inter-group proximity derived from the proximity matrix
    3.6.2 Inter-group proximity based on group summaries for continuous data
    3.6.3 Inter-group proximity based on group summaries for categorical data
  3.7 Weighting variables
  3.8 Standardization
  3.9 Choice of proximity measure
  3.10 Summary

4 Hierarchical clustering
  4.1 Introduction
  4.2 Agglomerative methods
    4.2.1 Illustrative examples of agglomerative methods
    4.2.2 The standard agglomerative methods
    4.2.3 Recurrence formula for agglomerative methods
    4.2.4 Problems of agglomerative hierarchical methods
    4.2.5 Empirical studies of hierarchical agglomerative methods
  4.3 Divisive methods
    4.3.1 Monothetic divisive methods
    4.3.2 Polythetic divisive methods
  4.4 Applying the hierarchical clustering process
    4.4.1 Dendrograms and other tree representations
    4.4.2 Comparing dendrograms and measuring their distortion
    4.4.3 Mathematical properties of hierarchical methods
    4.4.4 Choice of partition – the problem of the number of groups
    4.4.5 Hierarchical algorithms
    4.4.6 Methods for large data sets
  4.5 Applications of hierarchical methods
    4.5.1 Dolphin whistles – agglomerative clustering
    4.5.2 Needs of psychiatric patients – monothetic divisive clustering
    4.5.3 Globalization of cities – polythetic divisive method

Description:
Wiley Series in Probability and Statistics. Cluster Analysis, 5th Edition. Brian S. Everitt • Sabine Landau • Morven Leese • Daniel Stahl