NEW ANT COLONY OPTIMISATION ALGORITHMS
FOR HIERARCHICAL CLASSIFICATION OF PROTEIN
FUNCTIONS
a thesis submitted to
The University of Kent at Canterbury
in the subject of computer science
for the degree
of doctor of philosophy.
By
Fernando Esteban Barril Otero
January 2010
Abstract
Ant colony optimisation (ACO) is a metaheuristic for solving optimisation problems, inspired by the foraging behaviour of ant colonies. It has been successfully applied to several types of optimisation problems, such as scheduling and routing, and more recently to the discovery of classification rules. The classification task in data mining aims at predicting the value of a given goal attribute for an example, based on the values of a set of predictor attributes for that example. Since real-world classification problems are generally described by both nominal (categorical or discrete) and continuous (real-valued) attributes, classification algorithms are required to cope with both types of attribute. Current ACO classification algorithms are limited to discovering rules that use only nominal attributes to describe the data. Furthermore, they are also unable to cope with more complex types of classification problem, such as hierarchical multi-label classification problems.
This thesis investigates the extension of ACO classification algorithms to overcome the aforementioned limitations. Firstly, a method is proposed to extend the rule construction process of ACO classification algorithms to cope with continuous attributes directly. Four new ACO classification algorithms are presented, together with a comparison between them and well-known classification algorithms from the literature. Secondly, an ACO classification algorithm is presented for the hierarchical problem of protein function prediction, a major type of bioinformatics problem addressed in this thesis. Finally, three different approaches to extending ACO classification algorithms to the more complex case of hierarchical multi-label classification are described, elaborating on the ideas of the proposed hierarchical classification ACO algorithm. These algorithms are compared against state-of-the-art decision tree induction algorithms for hierarchical multi-label classification in the context of protein function prediction.
The computational results of experiments with a wide range of data sets, including challenging protein function prediction data sets with very large numbers of class labels, have shown that the proposed ACO classification algorithms are competitive with well-known classification algorithms from the literature, for both conventional (flat single-label) classification and hierarchical multi-label classification problems. To the best of our knowledge, these algorithms address previously unexplored research areas in the context of ACO classification algorithms, and are therefore original contributions.
Acknowledgements
Firstly, I would like to thank my supervisors Alex Freitas and Colin Johnson
for the opportunity of undertaking this work, and for their excellent supervision,
support and guidance throughout my PhD.
Furthermore, I owe a big thank you to my family, for always being there. A
special thanks to my parents for the opportunities and the support they have
given me in all my endeavours. I am also very grateful to Carin Tunåker for all her help, patience, love and support.
Many friends made my time in Canterbury highly enjoyable. I would like to
thank Abigail Smith, Beatriz Martinez, Ben Robert, Claire Harris, Cristiano Cantore, Dolores Castaneda, Eva Barbazan, Gisele Pappa, Javier Valbuena, Mariola Esteban and Miguel Leon-Ledesma for all the time that we spent together. My thanks also go to my friends and colleagues from the School of Computing at the University of Kent, especially Carlos Silla, Lingfang Du, Mudassar Iqbal, Sebastian Marion and Siddhartha Ghosh.
Finally, I would like to gratefully acknowledge the financial support received
from the European Union’s INTERREG project (Ref. No. 162/025/361) as well
as the continuous financial support received from the School of Computing at the
University of Kent, during the course of my PhD.
Contents
Abstract ii
Acknowledgements iv
List of Tables ix
List of Figures xiii
List of Algorithms xix
1 Introduction 1
1.1 Original Contributions . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Publication List . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Data Mining 9
2.1 Common Data Mining Tasks . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3 Association Rule Learning . . . . . . . . . . . . . . . . . . 12
2.2 The Conventional (Flat) Classification Task . . . . . . . . . . . . 13
2.2.1 Decision Tree Induction . . . . . . . . . . . . . . . . . . . 14
2.2.2 Rule Induction . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.3 Multi-Label Classification . . . . . . . . . . . . . . . . . . 21
2.3 The Hierarchical Classification Task . . . . . . . . . . . . . . . . . 23
2.3.1 Basic Concepts of Hierarchical Classification . . . . . . . . 24
2.3.2 Hierarchical Multi-Label Classification . . . . . . . . . . . 35
2.4 Evaluation Measures for Classification . . . . . . . . . . . . . . . 36
2.5 Evaluation Measures for Hierarchical Classification . . . . . . . . 37
2.5.1 Hierarchical Measures of Precision, Recall and F-measure . 38
2.5.2 Precision-Recall Curves . . . . . . . . . . . . . . . . . . . . 39
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3 Ant Colony Optimisation 43
3.1 The ACO Metaheuristic . . . . . . . . . . . . . . . . . . . . . . . 44
3.1.1 Problem Representation . . . . . . . . . . . . . . . . . . . 46
3.1.2 Building Solutions . . . . . . . . . . . . . . . . . . . . . . 47
3.1.3 Pheromone Trails . . . . . . . . . . . . . . . . . . . . . . . 48
3.2 ACO applied to Classification: Ant-Miner . . . . . . . . . . . . . 49
3.2.1 Construction Graph . . . . . . . . . . . . . . . . . . . . . 51
3.2.2 Rule Construction . . . . . . . . . . . . . . . . . . . . . . 52
3.2.3 Heuristic Information . . . . . . . . . . . . . . . . . . . . . 54
3.2.4 Rule Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2.5 Rule Pruning . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.2.6 Pheromone Trails . . . . . . . . . . . . . . . . . . . . . . . 56
3.2.7 Classifying New Examples . . . . . . . . . . . . . . . . . . 58
3.3 Ant-Miner Extensions . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4 Bioinformatics 64
4.1 Biological Background . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1.1 Proteins . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2 Protein Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2.1 UniProt Knowledgebase . . . . . . . . . . . . . . . . . . . 72
4.2.2 InterPro . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.3 IntAct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3 Protein Functional Classification Schemes . . . . . . . . . . . . . 73
4.3.1 Gene Ontology . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.2 FunCat . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.4 Protein Function Prediction . . . . . . . . . . . . . . . . . . . . . 78
4.4.1 Protein Features as Predictor Attributes . . . . . . . . . . 81
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5 Handling Continuous Attributes in Ant Colony Classification Algorithms 86
5.1 Ant-Miner Coping with Continuous Attributes . . . . . . . . . . . 88
5.1.1 Construction Graph . . . . . . . . . . . . . . . . . . . . . 89
5.1.2 Heuristic Information . . . . . . . . . . . . . . . . . . . . . 89
5.1.3 Rule Construction . . . . . . . . . . . . . . . . . . . . . . 92
5.1.4 Pheromone Updating . . . . . . . . . . . . . . . . . . . . . 94
5.2 Minimum Description Length-based Discretisation . . . . . . . . . 94
5.3 Encoding Attribute Interaction as Pheromone Levels: Associating Pheromones with Edges . . . . . . . . . . . . . . . . . . . . . . . . 98
5.4 Combining Pheromone Associated with Edges and Minimum Description Length-based Discretisation . . . . . . . . . . . . . . . . 103
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6 Computational Results for Ant-Miner Coping with Continuous Attributes 105
6.1 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7 Hierarchical and Multi-Label Ant Colony Classification Algorithms 121
7.1 Hierarchical Classification Ant-Miner . . . . . . . . . . . . . . . . 122
7.1.1 Construction Graphs . . . . . . . . . . . . . . . . . . . . . 124
7.1.2 Rule Construction . . . . . . . . . . . . . . . . . . . . . . 127
7.1.3 Rule Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 129
7.1.4 Rule Pruning . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.1.5 Pheromone Trails . . . . . . . . . . . . . . . . . . . . . . . 132
7.1.6 Heuristic Information . . . . . . . . . . . . . . . . . . . . . 133
7.1.7 Using a Rule List to Classify New Examples . . . . . . . . 136
7.2 Coping with Multi-Label Data . . . . . . . . . . . . . . . . . . . . 136
7.2.1 Multi-Label Rule Consequent . . . . . . . . . . . . . . . . 138
7.2.2 Distance-based Heuristic Information . . . . . . . . . . . . 140
7.2.3 Distance-based Discretisation of Continuous Values . . . . 142
7.2.4 Hierarchical Multi-Label Rule Evaluation . . . . . . . . . . 146
7.2.5 Simplified Rule Pruning . . . . . . . . . . . . . . . . . . . 146
7.3 Pittsburgh-based Approach . . . . . . . . . . . . . . . . . . . . . 147
7.3.1 Extended Sequential Covering Strategy . . . . . . . . . . . 149
7.3.2 Updating Pheromone Values Based on the Rule List Quality 153
7.4 A Baseline Approach for Hierarchical Multi-Label Classification with Ant-Miner: Building One Classifier per Class . . . . . . . . . 153
7.4.1 The Baseline Ant Colony Algorithm . . . . . . . . . . . . . 154
7.4.2 Class-specific Heuristic Information . . . . . . . . . . . . . 157
7.4.3 Class-specific Interval Selection for Continuous Attributes 158
7.4.4 Rule Quality Measure . . . . . . . . . . . . . . . . . . . . 159
7.4.5 Classifying New Examples . . . . . . . . . . . . . . . . . . 160
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
8 Computational Results for Hierarchical and Multi-Label Ant-Miner 163
8.1 Initial Work on Protein Function Prediction—the hAnt-Miner algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.1.1 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . 164
8.1.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . 166
8.1.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . 168
8.2 Hierarchical Multi-Label Protein Function Prediction in Yeast . . 170
8.2.1 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
8.2.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . 171
8.2.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . 173
8.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
9 Conclusions and Future Research 186
9.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
9.2 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
References 193
A Software Availability 207
List of Tables
2.1 The data set for the artificial ‘weather’ problem [97], where each row corresponds to a different ‘Saturday morning’ observation and the ‘Class’ attribute indicates whether or not (‘P’ or ‘N’, respectively) it is a good day to play tennis. . . . . . . . . . . . . . . . . 10
2.2 An example of a multi-label data set, where each example is associated with one or more different class labels. In this example, the predictor attributes are omitted and the class attribute has three different values {sports, politics, entertainment}. . . . . . . . . . . 22
4.1 Summary of biological databases publicly available online, mainly
containing protein information. . . . . . . . . . . . . . . . . . . . 71
4.2 Main (top-level) FunCat categories. All categories, with the exception of 98 and 99, represent a root category of a tree-structured hierarchy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.1 Summary of the classification algorithms used in the experiments. Ant-Miner is described in chapter 3; cAnt-Miner variations are described in chapter 5; J48, JRip and PART are implemented in the Weka workbench [133] and briefly described in chapter 2. . . . . . 106
6.2 Summary of the data sets used in the experiments. The first column of the table gives the data set abbreviation, the second gives the data set name, the third and fourth columns give the number of nominal and continuous attributes respectively, the fifth column gives the number of class labels, the sixth column gives the number of examples in the original data set and the seventh column gives the number of examples in the discrete data set (after the discretisation of continuous attributes and removal of duplicated examples). . . . 107
6.3 Summary of the user-defined parameter values used in Ant-Miner
and cAnt-Miner variations in all data sets. . . . . . . . . . . . . . 109
6.4 Predictive accuracy (mean ± standard deviation) obtained with
the ten-fold cross-validation procedure in the eighteen data sets
by Ant-Miner, cAnt-Miner, cAnt-Miner-MDL and cAnt-Miner2,
respectively. The last row of the table indicates the rank-based
score—the higher the score, the better the ranking—according to
the non-parametric Friedman test [34, 54]. . . . . . . . . . . . . . 111
6.5 Predictive accuracy (mean ± standard deviation) obtained with
the ten-fold cross-validation procedure in the eighteen data sets
by cAnt-Miner2-MDL, J48, JRip and PART, respectively. The
last row of the table indicates the rank-based score—the higher
the score, the better the ranking—according to the non-parametric
Friedman test [34, 54]. . . . . . . . . . . . . . . . . . . . . . . . . 112
6.6 Model size (mean ± standard deviation) obtained with the ten-fold
cross-validation procedure in the eighteen data sets by Ant-Miner,
cAnt-Miner, cAnt-Miner-MDL and cAnt-Miner2, respectively. The
last row of the table indicates the rank-based score—the lower the
score, the better the ranking, since smaller models are preferred—
according to the non-parametric Friedman test [34, 54]. . . . . . . 115
6.7 Model size (mean ± standard deviation) obtained with the ten-
fold cross-validation procedure in the eighteen data sets by cAnt-
Miner2-MDL, J48, JRip and PART, respectively. The last row of
the table indicates the rank-based score—the lower the score, the
better the ranking, since smaller models are preferred—according
to the non-parametric Friedman test [34, 54]. . . . . . . . . . . . . 116
6.8 Summary of the pairwise comparisons in terms of predictive accuracy amongst Ant-Miner and cAnt-Miner variations, conducted in order to evaluate the influence of each individual Ant-Miner extension proposed in chapter 5. For each row, the ‘⊕’ (‘⊖’) symbol indicates that the first algorithm performs better (worse) than the second algorithm, followed by the sum of positive/negative ranks (Score column) and the corresponding p-value, according to the Wilcoxon signed rank test. The significant differences at the 0.01 level are shown in bold. . . . . . . . . . . . . . . . . . . . . . . . . 117