Table Of Content

Software Manual Institute of Bioinformatics, Johannes Kepler University Linz APCluster An R Package for Affinity Propagation Clustering Ulrich Bodenhofer, Johannes Palme, Chrats Melkonian, and Andreas Kothmeier Institute of Bioinformatics, Johannes Kepler University Linz Altenberger Str. 69, 4040 Linz, Austria apcluster@bioinf.jku.at Version 1.4.10, May 31, 2022 Institute of Bioinformatics Tel. +43 732 2468 4520 Johannes Kepler University Linz Fax +43 732 2468 4539 A-4040 Linz, Austria http://www.bioinf.jku.at 2 Scope and Purpose of this Document This document is a user manual for the R package apcluster [2]. It is only meant as a gentle introduction into how to use the basic functions implemented in this package. Not all features of the R package are described in full detail. Such details can be obtained from the documentation enclosedintheRpackage. Furthernotethefollowing: (1)thisisneitheranintroductiontoaffin- ity propagation nor to clustering in general; (2) this is not an introduction to R. If you lack the background for understanding this manual, you first have to read introductory literature on these subjects. Contents 3 Contents 1 Introduction 4 2 Installation 4 2.1 InstallationviaCRAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Manualinstallationfromsource . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.3 Compatibilityissues. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3 GettingStarted 5 4 AdjustingInputPreferences 10 5 Exemplar-basedAgglomerativeClustering 20 5.1 Gettingstarted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 5.2 Mergingclustersobtainedfromaffinitypropagation . . . . . . . . . . . . . . . . 23 5.3 Detailsonthemergingobjective . . . . . . . . . . . . . . . . . . . . . . . . . . 26 6 LeveragedAffinityPropagation 27 7 SparseAffinityPropagation 30 8 ProcessingBiologicalSequences 33 9 SimilarityMatrices 33 9.1 ThefunctionnegDistMat() . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 9.2 Othersimilaritymeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 9.3 Rectangularsimilaritymatrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 9.4 Definingacustomsimilaritymeasureforleveragedaffinitypropagation . . . . . 43 9.5 Definingacustomsimilaritymeasurethatcreatesasparsesimilaritymatrix . . . 43 10 Miscellaneous 44 10.1 Conveniencevs.efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 10.2 Clusteringnamedobjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 10.3 Computingalabelvectorfromaclusteringresult . . . . . . . . . . . . . . . . . 46 10.4 Customizingheatmaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 10.5 Addingalegendtoplotsofclusteringresults . . . . . . . . . . . . . . . . . . . 49 10.6 Implementationandperformanceissues . . . . . . . . . . . . . . . . . . . . . . 50 11 SpecialNotesforUsersUpgradingfromPreviousVersions 51 11.1 Upgradingfromaversionolderthan1.3.0 . . . . . . . . . . . . . . . . . . . . . 51 11.2 UpgradingtoVersion1.3.3ornewer . . . . . . . . . . . . . . . . . . . . . . . . 51 11.3 UpgradingtoVersion1.4.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 11.4 UpgradingtoVersion1.4.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 12 HowtoCiteThisPackage 52 4 2 Installation 1 Introduction Affinity propagation (AP) is a relatively new clustering algorithm that has been introduced by BrendanJ.FreyandDelbertDueck[6].1 Theauthorsthemselvesdescribeaffinitypropagationas follows: “Analgorithmthatidentifiesexemplarsamongdatapointsandformsclustersofdata points around these exemplars. It operates by simultaneously considering all data point as potential exemplars and exchanging messages between data points until a goodsetofexemplarsandclustersemerges.” AP has been applied in various fields recently, among which bioinformatics is becoming in- creasinglyimportant. FreyandDueckhavemadetheiralgorithmavailableasMatlabcode.1 Mat- lab,however,isrelativelyuncommoninbioinformatics. Instead,thestatisticalcomputingplatform R has become a widely accepted standard in this field. In order to leverage affinity propagation forbioinformaticsapplications,wehaveimplementedaffinitypropagationasanRpackage. Note, however, that the given package is in no way restricted to bioinformatics applications. It is as generallyapplicableasFrey’sandDueck’soriginalMatlabcode.1 Starting with Version 1.1.0, the apcluster package also features exemplar-based agglomer- ative clustering which can be used as a clustering method on its own or for creating a hierarchy of clusters that have been computed previously by affinity propagation. Leveraged Affinity Prop- agation, a variant of AP especially geared to applications involving large data sets, has first been includedinVersion1.3.0. 2 Installation 2.1 InstallationviaCRAN TheRpackageapcluster(currentversion: 1.4.10)ispartoftheComprehensiveRArchiveNet- work (CRAN)2. Thesimplest way to installthe package, therefore, is to enterthe following com- mandintoyourRsession: install.packages("apcluster") If you use R on Windows or Mac OS, you can also conveniently use the package installation menuofyourRGUI. 2.2 Manualinstallationfromsource Under special circumstances, e.g. if you want to compile the C++ code included in the package with some custom options, you may prefer to install the package manually from source. To this end,openthepackage’spageatCRAN3 andthenproceedasfollows: 1https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/ 2http://cran.r-project.org/ 3https://CRAN.R-project.org/package=apcluster 3 Getting Started 5 1. Downloadapcluster_1.4.10.tar.gzandsaveittoyourharddisk. 2. Openashell/terminal/commandpromptwindowandchangetothedirectorywhereyouput apcluster_1.4.10.tar.gz. Enter R CMD INSTALL apcluster_1.4.10.tar.gz toinstallthepackage. Notethatthismightrequireadditionalsoftwareonsomeplatforms. WindowsrequiresRtools4 to beinstalledandtobeavailableinthedefaultsearchpath(environmentvariablePATH).MacOSX requires Xcode developer tools5 (make sure that you have the command line tools installed with Xcode). 2.3 Compatibilityissues AllversionsdownloadablefromCRANhavebeenbuiltusingthelatestversion,R4.1.1. However, thepackageshouldworkwithoutsevereproblemsonRversions≥3.0.0. 3 Getting Started Toloadthepackage,enterthefollowinginyourRsession: library(apcluster) If this command terminates without any error message or warning, you can be sure that the packagehasbeeninstalledsuccessfully. Ifso,thepackageisreadyforusenowandyoucanstart clusteringyourdatawithaffinitypropagation. Thepackageincludesbothausermanual(thisdocument)andareferencemanual(helppages foreachfunction). Toviewtheusermanual,enter vignette("apcluster") Helppagescanbeviewedusingthehelpcommand. Itisrecommendedtostartwith help(apcluster) Affinitypropagationdoesnotrequirethedatasamplestobeofanyspecifickindorstructure. AP only requiresa similarity matrix, i.e., givenl data samples, this isan l×l real-valued matrix S, in which an entry S corresponds to a value measuring how similar sample i is to sample j. ij APdoesnotrequirethesevaluestobeinaspecificrange. Valuescanbepositiveornegative. AP doesnotevenrequirethesimilaritymatrixtobesymmetric(although,inmostapplications,itwill 4http://cran.r-project.org/bin/windows/Rtools/ 5https://developer.apple.com/technologies/tools/ 6 3 Getting Started be symmetric anyway). A value of −∞ is interpreted as “absolute dissimilarity”. The higher a value,themoresimilartwosamplesareconsidered. To get a first impression, let us create a random data set in R2 as the union of two “Gaussian clouds”: cl1 <- cbind(rnorm(30, 0.3, 0.05), rnorm(30, 0.7, 0.04)) cl2 <- cbind(rnorm(30, 0.7, 0.04), rnorm(30, 0.4, .05)) x1 <- rbind(cl1, cl2) plot(x1, xlab="", ylab="", pch=19, cex=0.8) 8 0. 7 0. 6 0. 5 0. 4 0. 3 0. 0.3 0.4 0.5 0.6 0.7 Thepackageapclusteroffersseveraldifferentwaysforclusteringdata. Thesimplestwayisthe following: apres1a <- apcluster(negDistMat(r=2), x1) Inthisexample,thefunctionapcluster()firstcomputesasimilaritymatrixfortheinputdata x1 using the similarity function passed as first argument. The choice negDistMat(r=2) is the standardsimilaritymeasureusedinthepapersofFreyandDueck—negativesquareddistances. Alternatively,onecancomputethesimilaritymatrixbeforehandandcallapcluster()forthe similaritymatrix(foramoredetaileddescriptionofthedifferences,see10.1): s1 <- negDistMat(x1, r=2) apres1b <- apcluster(s1) The function apcluster() creates an object belonging to the S4 class APResult which is defined by the present package. To get detailed information on which data are stored in such objects,enter 3 Getting Started 7 help(APResult) Thesimplestthingwecandoistoenterthenameoftheobject(whichimplicitlycallsshow()) togetasummaryoftheclusteringresult: apres1a ## ## APResult object ## ## Number of samples = 60 ## Number of iterations = 131 ## Input preference = -0.1416022 ## Sum of similarities = -0.1955119 ## Sum of preferences = -0.2832044 ## Net similarity = -0.4787163 ## Number of clusters = 2 ## ## Exemplars: ## 25 44 ## Clusters: ## Cluster 1, exemplar 25: ## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ## 26 27 28 29 30 ## Cluster 2, exemplar 44: ## 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 ## 53 54 55 56 57 58 59 60 Theapclusterpackageallowsforplottingtheoriginaldatasetalongwithaclusteringresult: plot(apres1a, x1) 8 3 Getting Started 8 0. 7 0. 6 0. 5 0. 4 0. 3 0. 0.3 0.4 0.5 0.6 0.7 In this plot, each color corresponds to one cluster. The exemplar of each cluster is marked by a boxandallclustermembersareconnectedtotheirexemplarswithlines. Aheatmapisplottedwithheatmap(): heatmap(apres1a) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960 Intheheatmap,thesamplesaregroupedaccordingtoclusters. Theaboveheatmapconfirmsagain thattherearetwomainclustersinthedata. Aheatmapcanbeplottedfortheobjectapres1abe- 3 Getting Started 9 causeapcluster(),ifcalledfordataandasimilarityfunction,bydefaultincludesthesimilarity matrixintheoutputobject(unlessitwascalledwiththeswitchincludeSim=FALSE).Ifthesimi- laritymatrixisnotincluded(whichisthedefaultifapcluster()hasbeencalledonasimilarity matrixdirectly),heatmap()mustbecalledwiththesimilaritymatrixassecondargument: heatmap(apres1b, s1) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960 Supposewewanttohavebetterinsightintowhatthealgorithmdidineachiteration. Forthis purpose,wecansupplytheoptiondetails=TRUEtoapcluster(): apres1c <- apcluster(s1, details=TRUE) This option tells the algorithm to keep a detailed log about its progress. For example, this allowsforplottingthethreeperformancemeasuresthatAPusesinternallyforeachiteration: plot(apres1c) 0 2 − milarity −4 Si 6 Fitness (overall net similarity) − Sum of exemplar preferences Sum of similarities to exemplars 8 − 0 20 40 60 80 100 120 # Iterations 10 4 Adjusting Input Preferences Theseperformancemeasuresare: 1. Sumofexemplarpreferences 2. Sumofsimilaritiesofexemplarstotheirclustermembers 3. Netfitness: sumofthetwoformer For details, the user is referred to the original affinity propagation paper [6] and the supplemen- tary material published on the affinity propagation Web page.1 We see from the above plot that the algorithm has not made any change for the last 100 iterations. AP, through its parameter convits, allows to control for how long AP waits for a change until it terminates (the default is convits=100). IftheuserhasthefeelingthatAPwillprobablyconvergequickeronhis/herdata set,alowervaluecanbeused: apres1c <- apcluster(s1, convits=15, details=TRUE) apres1c ## ## APResult object ## ## Number of samples = 60 ## Number of iterations = 46 ## Input preference = -0.1416022 ## Sum of similarities = -0.1955119 ## Sum of preferences = -0.2832044 ## Net similarity = -0.4787163 ## Number of clusters = 2 ## ## Exemplars: ## 25 44 ## Clusters: ## Cluster 1, exemplar 25: ## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ## 26 27 28 29 30 ## Cluster 2, exemplar 44: ## 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 ## 53 54 55 56 57 58 59 60 4 Adjusting Input Preferences Apartfromthe similarity matrixitself, themostimportant input parameterofAPisthe so-called inputpreferencewhichcanbeinterpretedasthetendencyofadatasampletobecomeanexemplar (see[6]andsupplementarymaterialontheAPhomepage1 foramoredetailedexplanation). This inputpreferencecaneitherbechosenindividuallyforeachdatasampleoritcanbeasinglevalue

Description:

ity propagation nor to clustering in general; (2) this is not an introduction to R. If you Affinity propagation (AP) is a relatively new clustering algorithm that has

APCluster - An R Package for Affinity Propagation Clustering - The PDF

61 Pages·2016·0.98 MB·English

by Ulrich Bodenhofer

Checking for file health...

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Download APCluster - An R Package for Affinity Propagation Clustering - The PDF Free - Full Version

by Ulrich Bodenhofer| 2016| 61 pages| 0.98| English

Download APCluster - An R Package for Affinity Propagation Clustering - The by Ulrich Bodenhofer in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About APCluster - An R Package for Affinity Propagation Clustering - The

ity propagation nor to clustering in general; (2) this is not an introduction to R. If you Affinity propagation (AP) is a relatively new clustering algorithm that has

Detailed Information

Author:	Ulrich Bodenhofer
Publication Year:	2016
Pages:	61
Language:	English
File Size:	0.98
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free APCluster - An R Package for Affinity Propagation Clustering - The Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download APCluster - An R Package for Affinity Propagation Clustering - The PDF?

Yes, on https://PDFdrive.to you can download APCluster - An R Package for Affinity Propagation Clustering - The by Ulrich Bodenhofer completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read APCluster - An R Package for Affinity Propagation Clustering - The on my mobile device?

After downloading APCluster - An R Package for Affinity Propagation Clustering - The PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of APCluster - An R Package for Affinity Propagation Clustering - The?

Yes, this is the complete PDF version of APCluster - An R Package for Affinity Propagation Clustering - The by Ulrich Bodenhofer. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download APCluster - An R Package for Affinity Propagation Clustering - The PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.