Table Of ContentAN INTRODUCTION TO
MATHEMATICAL
TAXONOMY
G.
DUNN AND
B. S.
EVERITT
An Introduction to
Mathematical
Taxonomy
G.
DUNN
School of Epidemiology & Health Sciences
University of Manchester
13. S.
EVERITT
Institute of Psychiatry
University of London
Dover Publications, Inc.
Mineola, New York
Next, in the potato, we have the scarcely innocent underground
stem of one of a tribe set aside for evil; having the deadly night
shade for its queen, and including the henbane, the witch's
mandrake, and the worst natural curse of modem civilization
tobacco ... Examine the purple and yellow bloom of the com
mon hedge nightshade; you will find it constructed exactly like
some of the forms of the cyclamen; and getting this clue, you
will find at last the whole poisonous and terrible group to be -
sisters of the primulas!
John Ruskin
The Queen o/the Air
Copyright
Copyright © 1982 by Cambridge University Press
All rights reserved.
Bibliographical Note
This Dover edition, first published in 2004, is an unabridged republication of
the work originally published in 1982 by Cambridge University Press as a vol
ume in the series Cambridge Studies in Mathematical Biology.
International Standard Book Number: 0-486-43587-3
Manufactured in the United States of America
Dover Publications, Inc., 31 East 2nd Street, Mineola, N.Y. 11501
CONTENTS
Preface vii
1 Au iDtroduction to the philosopby and aims of numerical
taxonomy 1
l.l Introduction
1.2 Systematics, classification and taxonomy
1.3 The construction of taxonomic hierarchies by traditional
and numerical taxonomy: comparison of methods 2
1.4 The philosophy of taxonomy 4
1.5 Classification and inferences concerning patterns of
evolution 7
1.6 Summary 9
2 Taxonomic characters 11
2.1 Introduction 11
2.2 Number of characters 12
2.3 Type of characters and coding of character states 14
2.3.1 Qualitative characters 14
2.3.2 Quantitative characters 18
2.4 Weighting of characters 20
2.5 Homology of characters 21
2.6 Summary 23
3 The measurement of similarity 2S
3.1 Introduction 25
3.2 Similarity measures for binary characters 25
3.2.1 The simple matching coefficient 26
3.2.2 Jaccard's coefficient 26
3.3 Similarity measures for qualitative characters having more
than two states 29
3.4 Similarity measures for quantitative characters 30
3.5 Measures of dissimilarity and distance 31
3.6· Gowe'r's similarity coefficient 39
3. 7 Simila~~ty and distance between popUlations 41
iv Contents
3.7.1 Qualitative characters Al
3.7.2 Quantitative characters 43
3.8 Summary 45
4 Principal components analysis 46
4.1 Introduction 46
4.2 Principal components analysis-geometrical interpretation 47
4.3 A brief mathematical account of principal components
analysis 49
4.4 Examples 51
4.5 Principal components plots 55
4.6 Factor analysis 57
4.7 Summary 58
5 MultidimeDSiooal scaliog 59
5.1 Introduction 59
5.2 Classical multidimensional scaling 59
5.2.1 Principal coordinates analysis-technical details 61
5.2.2 Principal coordinates analysis-an example 65
5.3 Other methods of multidimensional scaling 68
5.3.1 Non-metric multidimensional scaling-technical details 70
5.3.2 Non-metric multidimensional scaling-examples 71
5.4 Minimum spanning trees 73
5.5 Summary 76
6 Cluster analysis 77
6.1 Introduction 77
6.2 Hierarchical clustering techniques 77
6.2.1 Single-linkage clustering 78
6.2.2 Complete-linkage clustering 80
6.2.3 Group-average clustering 81
6.2.4 Centroid clustering 82
6.3 Properties of hierarchical techniques 85
6.4 Other clustering methods 87
6.4.1 Monothetic divisive clustering 87
6.4.2 Minimization of trace (W) 88
6.4.3 A multivariate mixture model for cluster analysis 89
6.4.4 Jardine and Sibson's K-dend clustering method 89
6.5 An example 91
6.6 The evaluation of results and other problems 94
6.6.1 Measuring clustering tendency 95
6.6.2 Global fit of hierarchy 96
6.6.3 Partitions from a hierarchy 96
6.7 What is a cluster? 101
6.8 Summary 104
Contents v
7 Identification and assignment techniques 106
7.1 Introduction 106
7.2 Diagnostic keys 107
7.2.1 The construction of diagnostic keys 110
7.3 Probabilistic assignment techniques 112
7.3.1 Fisher's linear discriminant function 115
7.3.2 Canonical variate analysis 116
7.3.3 An example of canonical variate analysis 117
7.4 Summary 121
8 The construction of evolutionar~ trees 122
8.1 Introduction 122
8.2 Evolution as a branching process 122
8.3 The principle of minimal evolution 125
8.4 The topology of the tree 127
8.5 Optimization of trees 128
8.6 Reticulate evolution: the problem of hybrids 129
8.7 Gene phylogenies 131
8.8 Summary 136
References 138
Author Index 145
Subject Index 147
PREFACE
Our aim in this text is to provide biologists and others with an
introduction to those mathematical techniques useful in taxonomy.
We hope that the text will be suitable both for undergraduates
following courses in mathematical biology and for research workers
whose interests include classification of particular organisms. The
mathematical level demanded for understanding the text has been
kept deliberately low, involving only a passing knowledge of matrix
algebra and elementary statistics.
There are already a number of books available dealing with the
topics of numerical and mathematical taxonomy, notably those of
Sneath & Sokal, and Jardine & Sibson. This text, however, is
specifically designed as an introduction to the area, and does not
claim to be as comprehensive as·t hose just mentioned. It should,
however, serve as useful preparation for those two works.
Our thanks are due to Dr David Hand for many useful comments
on the text, and to Mrs Bertha Lakey for her careful typing of the
manuscript. Finally we would like to thank all of our colleagues at
the Institute of Psychiatry for providing a stimulating working
environment, and almost constant entertainment during the prepara
tion of this book.
B. S. Everitt
G. Dunn
Institute of Psychiatry
London, November 1980
1
An introduction to the philosophy
and aims of numerical taxonomy
1.1 Introduction
Classification of organisms has been a preoccupation of
biologists since the very first biological investigations. Aristotle, for
example, built up an elaborate system for classifying the species of
the animal kingdom, which began by dividing animals into two main
groups; those having red blood, corresponding roughly to our own
vertebrates, and those lacking it, the invertebrates. He further
subdivided these two groups according to the way in which the young
are produced, whether alive, in eggs, as pupae and so on. Such
classification has always been an essential component of man's
knowledge of the living world. If nothing else, early man must have
been able to realize that many individual objects (whether or not they
would now be classified as 'living') shared certain properties such as
being edible, or poisonous, or ferocious, and so on. A modem
biologist might be tempted to remark that the earliest methods of
classifying animals and plants by biologists were, like those of
prehistoric man, based upon what may now be considered super
ficially similar features, and that although the resulting classifications
could have been useful, for example for communication, they did not
in general imply any 'natural' or 'real' affinity. However, one might
then be tempted to ask what are 'superficially similar features' and
what are 'real' or 'natural' classifications? Such questions and the
attempts to answer them will be discussed in later parts of this
chapter.
1.2 Systematics, classification and taxonomy
Before proceeding further it is necessary to introduce a
number of terms which will be met frequently throughout the rest
of the book. The definitions given here are intentionally brief; the
full extent of the meaning of each term will become apparent during
the remaining chapters.