Table Of ContentSoftware Manual
Institute of Bioinformatics, Johannes Kepler University Linz
APCluster
An R Package for Affinity Propagation Clustering
Ulrich Bodenhofer, Johannes Palme, Chrats Melkonian, and Andreas
Kothmeier
Institute of Bioinformatics, Johannes Kepler University Linz
Altenberger Str. 69, 4040 Linz, Austria
apcluster@bioinf.jku.at
Version 1.4.10, May 31, 2022
Institute of Bioinformatics Tel. +43 732 2468 4520
Johannes Kepler University Linz Fax +43 732 2468 4539
A-4040 Linz, Austria http://www.bioinf.jku.at
2
Scope and Purpose of this Document
This document is a user manual for the R package apcluster [2]. It is only meant as a gentle
introduction into how to use the basic functions implemented in this package. Not all features of
the R package are described in full detail. Such details can be obtained from the documentation
enclosedintheRpackage. Furthernotethefollowing: (1)thisisneitheranintroductiontoaffin-
ity propagation nor to clustering in general; (2) this is not an introduction to R. If you lack the
background for understanding this manual, you first have to read introductory literature on these
subjects.
Contents 3
Contents
1 Introduction 4
2 Installation 4
2.1 InstallationviaCRAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Manualinstallationfromsource . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Compatibilityissues. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 GettingStarted 5
4 AdjustingInputPreferences 10
5 Exemplar-basedAgglomerativeClustering 20
5.1 Gettingstarted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.2 Mergingclustersobtainedfromaffinitypropagation . . . . . . . . . . . . . . . . 23
5.3 Detailsonthemergingobjective . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6 LeveragedAffinityPropagation 27
7 SparseAffinityPropagation 30
8 ProcessingBiologicalSequences 33
9 SimilarityMatrices 33
9.1 ThefunctionnegDistMat() . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
9.2 Othersimilaritymeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
9.3 Rectangularsimilaritymatrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
9.4 Definingacustomsimilaritymeasureforleveragedaffinitypropagation . . . . . 43
9.5 Definingacustomsimilaritymeasurethatcreatesasparsesimilaritymatrix . . . 43
10 Miscellaneous 44
10.1 Conveniencevs.efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
10.2 Clusteringnamedobjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
10.3 Computingalabelvectorfromaclusteringresult . . . . . . . . . . . . . . . . . 46
10.4 Customizingheatmaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
10.5 Addingalegendtoplotsofclusteringresults . . . . . . . . . . . . . . . . . . . 49
10.6 Implementationandperformanceissues . . . . . . . . . . . . . . . . . . . . . . 50
11 SpecialNotesforUsersUpgradingfromPreviousVersions 51
11.1 Upgradingfromaversionolderthan1.3.0 . . . . . . . . . . . . . . . . . . . . . 51
11.2 UpgradingtoVersion1.3.3ornewer . . . . . . . . . . . . . . . . . . . . . . . . 51
11.3 UpgradingtoVersion1.4.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
11.4 UpgradingtoVersion1.4.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
12 HowtoCiteThisPackage 52
4 2 Installation
1 Introduction
Affinity propagation (AP) is a relatively new clustering algorithm that has been introduced by
BrendanJ.FreyandDelbertDueck[6].1 Theauthorsthemselvesdescribeaffinitypropagationas
follows:
“Analgorithmthatidentifiesexemplarsamongdatapointsandformsclustersofdata
points around these exemplars. It operates by simultaneously considering all data
point as potential exemplars and exchanging messages between data points until a
goodsetofexemplarsandclustersemerges.”
AP has been applied in various fields recently, among which bioinformatics is becoming in-
creasinglyimportant. FreyandDueckhavemadetheiralgorithmavailableasMatlabcode.1 Mat-
lab,however,isrelativelyuncommoninbioinformatics. Instead,thestatisticalcomputingplatform
R has become a widely accepted standard in this field. In order to leverage affinity propagation
forbioinformaticsapplications,wehaveimplementedaffinitypropagationasanRpackage. Note,
however, that the given package is in no way restricted to bioinformatics applications. It is as
generallyapplicableasFrey’sandDueck’soriginalMatlabcode.1
Starting with Version 1.1.0, the apcluster package also features exemplar-based agglomer-
ative clustering which can be used as a clustering method on its own or for creating a hierarchy
of clusters that have been computed previously by affinity propagation. Leveraged Affinity Prop-
agation, a variant of AP especially geared to applications involving large data sets, has first been
includedinVersion1.3.0.
2 Installation
2.1 InstallationviaCRAN
TheRpackageapcluster(currentversion: 1.4.10)ispartoftheComprehensiveRArchiveNet-
work (CRAN)2. Thesimplest way to installthe package, therefore, is to enterthe following com-
mandintoyourRsession:
install.packages("apcluster")
If you use R on Windows or Mac OS, you can also conveniently use the package installation
menuofyourRGUI.
2.2 Manualinstallationfromsource
Under special circumstances, e.g. if you want to compile the C++ code included in the package
with some custom options, you may prefer to install the package manually from source. To this
end,openthepackage’spageatCRAN3 andthenproceedasfollows:
1https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/
2http://cran.r-project.org/
3https://CRAN.R-project.org/package=apcluster
3 Getting Started 5
1. Downloadapcluster_1.4.10.tar.gzandsaveittoyourharddisk.
2. Openashell/terminal/commandpromptwindowandchangetothedirectorywhereyouput
apcluster_1.4.10.tar.gz. Enter
R CMD INSTALL apcluster_1.4.10.tar.gz
toinstallthepackage.
Notethatthismightrequireadditionalsoftwareonsomeplatforms. WindowsrequiresRtools4 to
beinstalledandtobeavailableinthedefaultsearchpath(environmentvariablePATH).MacOSX
requires Xcode developer tools5 (make sure that you have the command line tools installed with
Xcode).
2.3 Compatibilityissues
AllversionsdownloadablefromCRANhavebeenbuiltusingthelatestversion,R4.1.1. However,
thepackageshouldworkwithoutsevereproblemsonRversions≥3.0.0.
3 Getting Started
Toloadthepackage,enterthefollowinginyourRsession:
library(apcluster)
If this command terminates without any error message or warning, you can be sure that the
packagehasbeeninstalledsuccessfully. Ifso,thepackageisreadyforusenowandyoucanstart
clusteringyourdatawithaffinitypropagation.
Thepackageincludesbothausermanual(thisdocument)andareferencemanual(helppages
foreachfunction). Toviewtheusermanual,enter
vignette("apcluster")
Helppagescanbeviewedusingthehelpcommand. Itisrecommendedtostartwith
help(apcluster)
Affinitypropagationdoesnotrequirethedatasamplestobeofanyspecifickindorstructure.
AP only requiresa similarity matrix, i.e., givenl data samples, this isan l×l real-valued matrix
S, in which an entry S corresponds to a value measuring how similar sample i is to sample j.
ij
APdoesnotrequirethesevaluestobeinaspecificrange. Valuescanbepositiveornegative. AP
doesnotevenrequirethesimilaritymatrixtobesymmetric(although,inmostapplications,itwill
4http://cran.r-project.org/bin/windows/Rtools/
5https://developer.apple.com/technologies/tools/
6 3 Getting Started
be symmetric anyway). A value of −∞ is interpreted as “absolute dissimilarity”. The higher a
value,themoresimilartwosamplesareconsidered.
To get a first impression, let us create a random data set in R2 as the union of two “Gaussian
clouds”:
cl1 <- cbind(rnorm(30, 0.3, 0.05), rnorm(30, 0.7, 0.04))
cl2 <- cbind(rnorm(30, 0.7, 0.04), rnorm(30, 0.4, .05))
x1 <- rbind(cl1, cl2)
plot(x1, xlab="", ylab="", pch=19, cex=0.8)
8
0.
7
0.
6
0.
5
0.
4
0.
3
0.
0.3 0.4 0.5 0.6 0.7
Thepackageapclusteroffersseveraldifferentwaysforclusteringdata. Thesimplestwayisthe
following:
apres1a <- apcluster(negDistMat(r=2), x1)
Inthisexample,thefunctionapcluster()firstcomputesasimilaritymatrixfortheinputdata
x1 using the similarity function passed as first argument. The choice negDistMat(r=2) is the
standardsimilaritymeasureusedinthepapersofFreyandDueck—negativesquareddistances.
Alternatively,onecancomputethesimilaritymatrixbeforehandandcallapcluster()forthe
similaritymatrix(foramoredetaileddescriptionofthedifferences,see10.1):
s1 <- negDistMat(x1, r=2)
apres1b <- apcluster(s1)
The function apcluster() creates an object belonging to the S4 class APResult which is
defined by the present package. To get detailed information on which data are stored in such
objects,enter
3 Getting Started 7
help(APResult)
Thesimplestthingwecandoistoenterthenameoftheobject(whichimplicitlycallsshow())
togetasummaryoftheclusteringresult:
apres1a
##
## APResult object
##
## Number of samples = 60
## Number of iterations = 131
## Input preference = -0.1416022
## Sum of similarities = -0.1955119
## Sum of preferences = -0.2832044
## Net similarity = -0.4787163
## Number of clusters = 2
##
## Exemplars:
## 25 44
## Clusters:
## Cluster 1, exemplar 25:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## 26 27 28 29 30
## Cluster 2, exemplar 44:
## 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
## 53 54 55 56 57 58 59 60
Theapclusterpackageallowsforplottingtheoriginaldatasetalongwithaclusteringresult:
plot(apres1a, x1)
8 3 Getting Started
8
0.
7
0.
6
0.
5
0.
4
0.
3
0.
0.3 0.4 0.5 0.6 0.7
In this plot, each color corresponds to one cluster. The exemplar of each cluster is marked by a
boxandallclustermembersareconnectedtotheirexemplarswithlines.
Aheatmapisplottedwithheatmap():
heatmap(apres1a)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
Intheheatmap,thesamplesaregroupedaccordingtoclusters. Theaboveheatmapconfirmsagain
thattherearetwomainclustersinthedata. Aheatmapcanbeplottedfortheobjectapres1abe-
3 Getting Started 9
causeapcluster(),ifcalledfordataandasimilarityfunction,bydefaultincludesthesimilarity
matrixintheoutputobject(unlessitwascalledwiththeswitchincludeSim=FALSE).Ifthesimi-
laritymatrixisnotincluded(whichisthedefaultifapcluster()hasbeencalledonasimilarity
matrixdirectly),heatmap()mustbecalledwiththesimilaritymatrixassecondargument:
heatmap(apres1b, s1)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
Supposewewanttohavebetterinsightintowhatthealgorithmdidineachiteration. Forthis
purpose,wecansupplytheoptiondetails=TRUEtoapcluster():
apres1c <- apcluster(s1, details=TRUE)
This option tells the algorithm to keep a detailed log about its progress. For example, this
allowsforplottingthethreeperformancemeasuresthatAPusesinternallyforeachiteration:
plot(apres1c)
0
2
−
milarity −4
Si
6 Fitness (overall net similarity)
−
Sum of exemplar preferences
Sum of similarities to exemplars
8
−
0 20 40 60 80 100 120
# Iterations
10 4 Adjusting Input Preferences
Theseperformancemeasuresare:
1. Sumofexemplarpreferences
2. Sumofsimilaritiesofexemplarstotheirclustermembers
3. Netfitness: sumofthetwoformer
For details, the user is referred to the original affinity propagation paper [6] and the supplemen-
tary material published on the affinity propagation Web page.1 We see from the above plot that
the algorithm has not made any change for the last 100 iterations. AP, through its parameter
convits, allows to control for how long AP waits for a change until it terminates (the default is
convits=100). IftheuserhasthefeelingthatAPwillprobablyconvergequickeronhis/herdata
set,alowervaluecanbeused:
apres1c <- apcluster(s1, convits=15, details=TRUE)
apres1c
##
## APResult object
##
## Number of samples = 60
## Number of iterations = 46
## Input preference = -0.1416022
## Sum of similarities = -0.1955119
## Sum of preferences = -0.2832044
## Net similarity = -0.4787163
## Number of clusters = 2
##
## Exemplars:
## 25 44
## Clusters:
## Cluster 1, exemplar 25:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## 26 27 28 29 30
## Cluster 2, exemplar 44:
## 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
## 53 54 55 56 57 58 59 60
4 Adjusting Input Preferences
Apartfromthe similarity matrixitself, themostimportant input parameterofAPisthe so-called
inputpreferencewhichcanbeinterpretedasthetendencyofadatasampletobecomeanexemplar
(see[6]andsupplementarymaterialontheAPhomepage1 foramoredetailedexplanation). This
inputpreferencecaneitherbechosenindividuallyforeachdatasampleoritcanbeasinglevalue
Description:ity propagation nor to clustering in general; (2) this is not an introduction to R. If you Affinity propagation (AP) is a relatively new clustering algorithm that has