RESEARCH ARTICLE

Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing

Ahmad Abubaker1,2*, Adam Baharum1, Mahmoud Alrefaei3

1 School of Mathematical Sciences, University Sains Malaysia, 11800 USM Penang, Malaysia, 2 Department of Mathematics & Statistics, Al-Imam Muhammad Ibn Saud Islamic University, P.O. Box 90950, 11623 Riyadh, Saudi Arabia, 3 Departments of Mathematics & Statistics, Jordan University of Science and Technology, Irbid 22110, Jordan

* [email protected]

Abstract

This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, "MOPSOSA". The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets.

OPEN ACCESS

Citation: Abubaker A, Baharum A, Alrefaei M (2015) Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing. PLoS ONE 10(7): e0130995. doi:10.1371/journal.pone.0130995

Editor: Yong Deng, Southwest University, CHINA

Received: December 7, 2014

Accepted: May 27, 2015

Published: July 1, 2015

Copyright: © 2015 Abubaker et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the paper and its Supporting Information files.

Funding: The authors have no support or funding to report.

Competing Interests: The authors have declared that no competing interests exist.

PLOS ONE | DOI:10.1371/journal.pone.0130995 July 1, 2015

Introduction

Data clustering is an important task in the field of unsupervised datasets. The clustering technique distributes the dataset into clusters of similar features [1]. To solve a clustering problem, the number of clusters that fits a dataset must be determined, and the objects for these clusters must be assigned appropriately. The number of clusters may or may not be known, thereby making it difficult to find the best solution to the clustering problem. As such, the clustering problem can be viewed as an optimization problem. This challenge has led to the proposal of many automatic clustering algorithms in previous literature; these algorithms estimate the appropriate number of clusters and appropriately partition a dataset into these clusters without the need to know the actual number of clusters [2–8]. Most of these algorithms rely exclusively on one internal evaluation function (validity index). The validity index has an objective function to evaluate the various characteristics of clusters, which illustrates the clustering quality and accuracy of the clustering solutions [9]. Nevertheless, the single evaluation function is often ineligible to determine the appropriate clusters for a dataset, thus giving an inferior solution [10]. Accordingly, the clustering problem is structured as a multi-objective optimization problem wherein different validity indices can be applied and evaluated simultaneously.
Several automatic multi-objective clustering algorithms have been proposed in the literature to solve the clustering problem. Evolutionary methods entered this area after Handl and Knowles [3] proposed an evolutionary approach called multi-objective clustering with automatic K determination (MOCK). For some of the automatic multi-objective clustering algorithms related to MOCK, readers can refer to [11–13]. A multi-objective clustering technique inspired by MOCK, named VAMOSA, which uses simulated annealing as the underlying optimization strategy together with the point symmetry-based distance, was proposed by Saha and Bandyopadhyay [5].

It is a challenge to deal with various shapes of datasets (hyperspheres, linear, spiral, convex, and non-convex), overlapping datasets, datasets with a small or large number of clusters, and datasets that have objects with small or large dimensions when neither the proper clustering nor the number of clusters is known. Saha and Bandyopadhyay [8] developed two multi-objective clustering techniques (GenClustMOO and GenClustPESA2) by using a simulated annealing-based multi-objective optimization technique and the concept of multiple centers for each cluster, which can deal with different types of cluster structures. GenClustMOO and GenClustPESA2 were compared with MOCK [3], VGAPS [4], K-means (KM) [14], and the single-linkage clustering technique (SL) [15] using numerous artificial and real-life datasets of diverse complexities. However, these algorithms did not give the desired high accuracy in clustering datasets.
The current study proposes an automatic clustering algorithm, namely, hybrid multi-objective particle swarm optimization with simulated annealing (MOPSOSA), which deals with different sizes, shapes, and dimensions of datasets and an unknown number of clusters. The numerical results of the proposed algorithm are shown to be better than those of the GenClustMOO [8] and GenClustPESA2 [8] methods in terms of clustering accuracy (see the Results and Discussions Section). To handle any dataset, determine appropriate clusters, and obtain good solutions with high accuracy, combinatorial particle swarm optimization II [7] is developed to deal with three different cluster validity indices simultaneously. The first cluster validity index is the Davies-Bouldin index (DB-index) [16], which is based on Euclidean distance; the second is the symmetry-based cluster validity index (Sym-index) [4], which is based on point symmetry distance; and the last is a connectivity-based cluster validity index (Conn-index) [17], which is based on short distance. If a particle position does not change, or if the particle moves to a bad position, the MOPSOSA algorithm uses MOSA [18] to improve the search of that particle. The MOPSOSA algorithm also utilizes the KM method [14] to improve the selection of the initial particle positions because of their significance in the overall performance of the search process. The algorithm creates a large number of Pareto optimal solutions through a trade-off between the three different validity indices. Therefore, the idea of sharing fitness [19] is incorporated in the proposed algorithm to maintain diversity in the repository that contains the Pareto optimal solutions. Pareto optimal solutions are important for decision makers to choose from. Furthermore, to comply with the decision-maker requirements, the proposed algorithm utilizes a semi-supervised method [20] to provide a single best solution from the Pareto set. The performance of MOPSOSA is compared with the performances of three automatic multi-objective clustering techniques, namely, GenClustMOO [8], GenClustPESA2 [8], and MOCK [3], and with those of three single-objective clustering techniques, namely, VGAPS [4], KM [14], and SL [15], using 14 artificial and 5 real-life datasets.
The remainder of this paper is structured as follows. Section 2 describes the multi-objective clustering problem; Section 3 presents the proposed MOPSOSA algorithm in detail; Section 4 presents the datasets used in the numerical experiments, the evaluation of clustering quality, and the setting of the parameters for the MOPSOSA algorithm; Section 5 discusses the results; and Section 6 gives concluding remarks.

Clustering Problem

The clustering problem is defined as follows. Consider the dataset $P = \{p_1, p_2, \ldots, p_n\}$, where $p_i = (p_{i1}, p_{i2}, \ldots, p_{id})$ is a feature vector of $d$ dimensions, also referred to as an object, $p_{ij}$ is the feature value of object $i$ at dimension $j$, and $n$ is the number of objects in $P$. The clustering of $P$ is the partitioning of $P$ into $k$ clusters $\{C_1, C_2, \ldots, C_k\}$ with the following properties:

$$\bigcup_{i=1}^{k} C_i = P \tag{1}$$

$$C_i \cap C_j = \emptyset, \quad i \neq j, \quad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, k \tag{2}$$

$$C_i \neq \emptyset, \quad i = 1, 2, \ldots, k \tag{3}$$

The clustering optimization problem with one objective function can be formed as follows: $\min/\max_{C \in \Theta} f(C)$ such that Eqs (1) to (3) are satisfied, where $f$ is the validity index function, $\Theta$ is the feasible solutions set that contains all possible clusterings of the dataset $P$ of $n$ objects into $k$ clusters, $C = \{C_1, C_2, \ldots, C_k\}$, and $k = 2, 3, \ldots, n-1$.

The multi-objective clustering problem for $S$ different validity indices is defined as follows:

$$\min_{C \in \Theta} F(C) = [f_1(C), f_2(C), \ldots, f_S(C)] \tag{4}$$

where $F(C)$ is a vector of $S$ validity indices. Note that there may be no solution that minimizes all the functions $f_i(C)$. Therefore, the aim is to identify the set of all non-dominated solutions.

Definition: Consider $C$ and $C^*$ as two solutions in the feasible solutions set $\Theta$. The solution $C$ is said to be dominated by the solution $C^*$ if and only if $f_i(C^*) \le f_i(C)$ for all $i = 1, \ldots, S$ and $f_i(C^*) < f_i(C)$ for at least one $i$. Otherwise, $C$ is said to be non-dominated by $C^*$.

The Pareto optimal set is the set that includes all non-dominated solutions in the feasible solutions set $\Theta$.
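The dominance definition above translates directly into a short check on objective vectors. The following is a minimal sketch, assuming every objective is minimized as in Eq (4); the function names `dominates` and `pareto_front` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector a dominates b (all objectives minimized).

    a dominates b when a is no worse in every objective and strictly
    better in at least one, matching the definition in the text.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the vectors that no other vector in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the two-objective vectors (1, 4), (2, 2), (3, 3), and (4, 1), only (3, 3) is removed, because (2, 2) is better in both objectives. The quadratic scan is adequate for small repositories; it is a design simplification, not a claim about how the authors implemented their repository.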
The Proposed MOPSOSA Algorithm

Simulated annealing requires more calculation time than particle swarm optimization does [21], because it requires slow variation of the temperature parameter to obtain a global solution [22]. In particle swarm optimization, on the other hand, some of the particles may become stagnant and remain unchanged, especially when the objective functions of the best personal position and the best global position are similar [21]. Such a particle cannot jump out, which in turn causes convergence toward a local solution and the loss of the capability to search for the optimal Pareto set. This phenomenon is a disadvantage in comparison with simulated annealing, which can jump away from a local solution. The proposed MOPSOSA algorithm, as previously mentioned, is a hybrid algorithm that merges the fast calculation and convergence of particle swarm optimization with the capability of simulated annealing to evade local solutions.

The clustering solution $X_i$ is described using label-based integer encoding [23]. Each particle position is a clustering solution. The particle position $X_i^t$ and velocity $V_i^t$ are presented as vectors with $n$ components, $X_i^t = (X_{i1}^t, X_{i2}^t, \ldots, X_{in}^t)$ and $V_i^t = (V_{i1}^t, V_{i2}^t, \ldots, V_{in}^t)$, at time $t$, $i = 1, \ldots, m$, where $n$ is the number of data objects and $m$ is the number of particles (swarm size). The position component $X_{ij}^t \in \{1, \ldots, K_i^t\}$ represents the cluster number of the $j$th object in the $i$th particle, and $V_{ij}^t \in \{0, \ldots, K_i^t\}$ represents the motion of the $j$th object in the $i$th particle, where $K_i^t \in \{K_{\min}, \ldots, K_{\max}\}$ is the number of clusters related to particle $i$ at time $t$ (where $K_{\min}$ and $K_{\max}$ are the minimum and maximum number of clusters, respectively; the default value of $K_{\min}$ is 2, and $K_{\max}$ is $\sqrt{n} + 1$ unless it is manually specified) [24]. The best previous position of the $i$th particle at iteration $t$ is represented as $XP_i^t = (XP_{i1}^t, XP_{i2}^t, \ldots, XP_{in}^t)$. The leader position chosen from the repository of Pareto sets for the $i$th particle at iteration $t$ is represented by $XG_i^t = (XG_{i1}^t, XG_{i2}^t, \ldots, XG_{in}^t)$.

The flowchart in Fig 1 illustrates the general process of the MOPSOSA algorithm. The process of the algorithm is described in the following 11 steps:

Step 1: The algorithm parameters, such as swarm size $m$, number of iterations $Iter$, maximum and minimum numbers of clusters, velocity parameters, initial cooling temperature $T_0$, and $t = 0$, are initialized.

Step 2: The initial particle positions $X_i^t$ (using the KM method [14]), initial velocities $V_i^t = 0$, and initial $XP_i^t = X_i^t$, $i = 1, \ldots, m$, are generated.

Step 3: The objective functions $f_1(X_i^t), \ldots, f_S(X_i^t)$, $i = 1, \ldots, m$, where $S$ is the number of objective functions, are computed. The repository of Pareto sets is filled with all non-dominated $XP_i^t$, $i = 1, \ldots, m$, on a fitness-sharing basis.

Step 4: The leader $XG_i^t$ from the repository of Pareto sets nearest to the current $X_i^t$ is selected. The clusters in $XP_i^t$ and $XG_i^t$ are renumbered on the basis of their similarity to the clusters in $X_i^t$, $i = 1, \ldots, m$.

Step 5: The new $Vnew_i$ and $Xnew_i$, $i = 1, \ldots, m$, are computed using $XG_i^t$, $XP_i^t$, $X_i^t$, and $V_i^t$.

Step 6: The validity of $Xnew_i$, $i = 1, \ldots, m$, is checked, and the correction process is applied if it is not valid.

Step 7: The objective functions $f_1(Xnew_i), \ldots, f_S(Xnew_i)$ and $f_1(X_i^t), \ldots, f_S(X_i^t)$, $i = 1, \ldots, m$, are computed.

Step 8: A dominance check for $Xnew_i$, $i = 1, \ldots, m$, is performed; that is, if $Xnew_i$ is non-dominated by $X_i^t$, then $X_i^{t+1} = Xnew_i$ and $V_i^{t+1} = Vnew_i$; otherwise, the MOSA technique is applied and $X_i^{t+1} = XMOSA_i$ and $V_i^{t+1} = VMOSA_i$, $i = 1, \ldots, m$, where $XMOSA_i$ and $VMOSA_i$ are the particle position and velocity, respectively, obtained by applying the MOSA technique. MOSA is discussed in detail in the MOSA technique section below. Upon completion of the generation of new positions for all particles, the cooling temperature $T_{t+1}$ is updated.

Step 9: The new $XP_i^{t+1}$, $i = 1, \ldots, m$, is identified.

Step 10: The Pareto set repository is updated.

Step 11: $t = t + 1$ is set; if $t \ge Iter$, then the algorithm is stopped and the Pareto set repository contains the Pareto solutions; otherwise, go to step 4.

The following sections elucidate the steps of the MOPSOSA algorithm.
Particle swarm initialization

The initial particles are generally considered one of the success factors in particle swarm optimization because they affect the quality of the solution and the speed of convergence. Hence, the MOPSOSA algorithm employs the KM method to improve the generation of the initial swarm of particles. Fig 2 depicts a flowchart for the generation of $m$ particles. Starting with $i = 1$ and $W = \min\{K_{\max} - K_{\min} + 1, m\}$, if $W = m$, then the $m$ particles are generated by the KM method with the number of clusters $K_i = K_{\min} + i - 1$, $i = 1, \ldots, m$. If $W = K_{\max} - K_{\min} + 1$, then the first $W$ particles are generated by KM with the number of clusters $K_i = K_{\min} + i - 1$, $i = 1, \ldots, W$, and the other particles are generated by KM with the number of clusters $K_i$, $i = W + 1, \ldots, m$, selected randomly between $K_{\min}$ and $K_{\max}$. For each particle, the initial velocity is set to zero, $V_i = 0$, $i = 1, \ldots, m$, and the initial $XP_i$ is equal to the current position $X_i$ for all $i = 1, \ldots, m$.

Fig 1. Flowchart for the proposed MOPSOSA algorithm. doi:10.1371/journal.pone.0130995.g001

Fig 2. Flowchart for initializing particle swarm. doi:10.1371/journal.pone.0130995.g002

Objective functions

The proposed algorithm uses three types of cluster validity indices as objective functions to achieve optimization. These validity indices, DB-index, Sym-index, and Conn-index, apply three different distances, namely, Euclidean distance, point symmetric distance, and short distance, respectively. Each validity index indicates a different aspect of good solutions in clustering problems. These validity indices are described below.

DB-index.

This index was developed by Davies and Bouldin [16]. It is a function of the ratio of the within-cluster scatter (intra-cluster distance) to the between-cluster separation (inter-cluster distance). The scatter within the $i$th cluster $C_i$, denoted $S_{i,q}$, is calculated using Eq (5). The distance between clusters $C_i$ and $C_j$ is denoted by $d_{ij,t}$, which is computed using Eq (6).

$$S_{i,q} = \left( \frac{1}{n_i} \sum_{p \in C_i} \| p - c_i \|_2^q \right)^{1/q} \tag{5}$$

$$d_{ij,t} = \| c_i - c_j \|_t \tag{6}$$

where $n_i = |C_i|$ is the number of objects in cluster $C_i$, $c_i$ is the cluster center of cluster $C_i$ and is defined as $c_i = \frac{1}{n_i} \sum_{p \in C_i} p$, and $q$ and $t$ are positive integer numbers. DB is defined as:

$$DB = \frac{1}{k} \sum_{i=1}^{k} R_{i,qt} \tag{7}$$

where $R_{i,qt} = \max_{j,\, j \neq i} \left\{ \frac{S_{i,q} + S_{j,q}}{d_{ij,t}} \right\}$. A small value of DB means a good clustering result.

Sym-index.

The recently developed point symmetry distance $d_{ps}(p, c)$ is employed in the cluster validity index Sym, which measures the overall average symmetry in connection with the cluster centers [4]. It is defined as follows. Let $p$ be a point; the reflected symmetrical point of $p$ with respect to a specific center $c$ is $2c - p$ and is denoted by $p^*$. Let the $knear$ unique nearest neighbors of $p^*$ be at the Euclidean distances $d_i$, $i = 1, \ldots, knear$. The point symmetric distance is defined as:

$$d_{ps}(p, c) = d_{sym}(p, c) \times d_e(p, c) = \frac{\sum_{i=1}^{knear} d_i}{knear} \times d_e(p, c) \tag{8}$$

where $d_e(p, c)$ is the Euclidean distance between the point $p$ and the center $c$, and $d_{sym}(p, c)$ is a symmetric measure of $p$ with respect to $c$, which is defined as $\sum_{i=1}^{knear} d_i / knear$. In this study, $knear = 2$. The cluster validity function is defined as

$$Sym = \frac{1}{k} \times \frac{1}{\varepsilon_k} \times D_k \tag{9}$$

where $\varepsilon_k = \sum_{i=1}^{k} E_i$, $E_i = \sum_{j=1}^{n_i} d_{ps}^*(p_j^i, c_i)$, $p_j^i$ is the $j$th object of cluster $i$, and $D_k = \max_{i,j=1}^{k} \| c_i - c_j \|$ is the maximum Euclidean distance between two centers among all cluster pairs. Eq (8) is used with a constraint to compute $d_{ps}^*(p_j^i, c_i)$: the $knear$ nearest neighbors of $p_j^*$ and $p_j^i$ should belong to the $i$th cluster, where $p_j^*$ is the reflected point of the point $p_j^i$ with respect to $c_i$. A large value of the Sym-index means that the actual number of clusters and a proper partitioning are obtained.

Conn-index.
The third cluster validity index used in this study was proposed by Saha and Bandyopadhyay [17]; it depends on the notion of cluster connectedness. To compute the Conn-index, the relative neighborhood graph (RNG) [25] of the dataset has to be constructed first. Subsequently, the short distance between two points $x$ and $y$ is denoted by $d_{short}(x, y)$ and is defined as follows:

$$d_{short}(x, y) = \min_{i=1}^{npath} \max_{j=1}^{ned_i} w(ed_j^i) \tag{10}$$

where $npath$ is the number of all paths between $x$ and $y$ in the RNG; $ned_i$ is the number of edges along the $i$th path, $i = 1, \ldots, npath$; $ed_j^i$ is the $j$th edge in the $i$th path, $j = 1, \ldots, ned_i$ and $i = 1, \ldots, npath$; and $w(ed_j^i)$ is the edge weight of the edge $ed_j^i$. The edge weight $w(ed_j^i)$ is equal to the Euclidean distance $d_e(a, b)$ between $a$ and $b$, where $a$ and $b$ are the end points of the edge $ed_j^i$.

The cluster validity index Conn developed by Saha and Bandyopadhyay [17] is defined as follows:

$$Conn = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} d_{short}(p_j^i, m_i)}{n \, \min_{i,j=1,\, i \neq j}^{k} d_{short}(m_i, m_j)} \tag{11}$$

where $m_i$ is the medoid of the $i$th cluster, that is, the point with the minimum average distance to all points in the $i$th cluster: $m_i = p_{minindex}^i$ with $minindex = \arg\min_{t=1}^{n_i} \sum_{j=1}^{n_i} d_e(p_t^i, p_j^i) / n_i$. A minimum value of the Conn-index means that the clusters are internally connected and well separated from each other.

After the particles have been moved to a new position, the three objective functions are computed for each particle in the swarm. The objective functions for a particle position $X$ are $\{DB(X), 1/Sym(X), Conn(X)\}$. The three objectives are minimized simultaneously using the MOPSOSA algorithm.

XP updating

The previous best position of the $i$th particle at iteration $t$ is updated by the non-dominance criterion. $XP_i^t$ is compared with the new position $X_i^{t+1}$. Three cases of this comparison are considered.

• If $XP_i^t$ is dominated by $X_i^{t+1}$, then $XP_i^{t+1} = X_i^{t+1}$.
• If $X_i^{t+1}$ is dominated by $XP_i^t$, then $XP_i^{t+1} = XP_i^t$.
• If $XP_i^t$ and $X_i^{t+1}$ are mutually non-dominated, then one of them is chosen randomly as $XP_i^{t+1}$.

This update is applied to each particle.
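The three cases of the XP update can be sketched as a single function. This is a minimal illustration, assuming each position is stored alongside its already-computed objective vector (DB, 1/Sym, Conn), all minimized; the names `update_personal_best` and `rng` are hypothetical and not taken from the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple

Labels = List[int]
Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_personal_best(
    xp: Labels,
    f_xp: Objectives,
    x_new: Labels,
    f_new: Objectives,
    rng: Optional[random.Random] = None,
) -> Tuple[Labels, Objectives]:
    """Return the next personal best position and its objective vector."""
    rng = rng if rng is not None else random.Random()
    if dominates(f_new, f_xp):
        return x_new, f_new  # case 1: the new position dominates the old best
    if dominates(f_xp, f_new):
        return xp, f_xp  # case 2: the old best dominates the new position
    # case 3: mutually non-dominated, so pick one of the two at random
    return rng.choice([(xp, f_xp), (x_new, f_new)])
```

Passing a seeded `random.Random` instance makes the random tie-break in the third case reproducible, which is convenient when comparing runs.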
Repository updating

The repository is utilized by the MOPSOSA algorithm as a guide for the swarm toward the Pareto front. The non-dominated particle positions are stored in the repository. To preserve the diversity of the non-dominated solutions in the repository, sharing fitness [19] is a good method to control the acceptance of new entries into the repository when it is full. Fitness sharing was used by Lechuga and Rowe [26] in multi-objective particle swarm optimization. In each iteration, the new non-dominated solutions are added to the external repository and the dominated solutions are eliminated. If the number of non-dominated solutions exceeds the size of the repository, the fitness sharing is calculated for all non-dominated solutions, and the solutions that have the largest values of fitness sharing are selected to fill the repository.

Cluster re-numbering

The re-numbering process is designed to eliminate redundant particles that represent the same solution. The proposed MOPSOSA algorithm employs the re-numbering procedure designed by Masoud et al. [7]. This procedure uses a similarity function to measure the degree of similarity between the clusters of two input solutions $X_i^t$ and $XP_i^t$ (or $XG_i^t$). The two clusters that are most similar are matched. Any cluster in $XP_i^t$ (or $XG_i^t$) that is not matched to any cluster in $X_i^t$ uses an unused number in the cluster numbering. The MOPSOSA algorithm uses the similarity function known as the Jaccard coefficient [27], which is defined as follows:

$$Sim(C_j, \dot{C}_k) = \frac{n_{11}}{n_{11} + n_{10} + n_{01}} \tag{12}$$

where $C_j$ is the $j$th cluster in $X_i^t$, $\dot{C}_k$ is the $k$th cluster in $XP_i^t$, $n_{11}$ is the number of objects that exist in both $C_j$ and $\dot{C}_k$, $n_{10}$ is the number of objects that exist in $C_j$ but do not exist in $\dot{C}_k$, and $n_{01}$ is the number of objects that do not exist in $C_j$ but exist in $\dot{C}_k$.

Velocity computation

The MOPSOSA algorithm employs the expressions and operators modified by Masoud et al. [7].
The new velocity for particle $i$ at iteration $t$ is calculated as follows:

$$V_i^{t+1} = (W \otimes V_i^t) \oplus \big( (R_1 \otimes (XP_i^t \ominus X_i^t)) \oplus (R_2 \otimes (XG_i^t \ominus X_i^t)) \big) \tag{13}$$

where $W$, $R_1$, and $R_2$ are vectors of $n$ components with values 0 or 1 that are generated randomly with probabilities $w$, $r_1$, and $r_2$, respectively. The operations $\otimes$, $\oplus$, and $\ominus$ are multiplication, merging, and difference, respectively.

• Difference operator $\ominus$
The difference operation calculates the difference between $X_i^t$ and $XP_i^t$ (or $XG_i^t$). Let $lP_i^t = (lp_{i1}^t, \ldots, lp_{in}^t) = XP_i^t \ominus X_i^t$ and $lG_i^t = (lg_{i1}^t, \ldots, lg_{in}^t) = XG_i^t \ominus X_i^t$ be defined as follows:

$$lp_{ij}^t = \begin{cases} XP_{ij}^t & \text{if } X_{ij}^t \neq XP_{ij}^t \\ 0 & \text{otherwise} \end{cases} \tag{14}$$

$$lg_{ij}^t = \begin{cases} XG_{ij}^t & \text{if } X_{ij}^t \neq XG_{ij}^t \\ 0 & \text{otherwise} \end{cases} \tag{15}$$

• Multiplication operator $\otimes$
The multiplication operator is defined as follows: let $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$ be two vectors of $n$ components; then $A \otimes B = (a_1 b_1, \ldots, a_n b_n)$.

• Merging operator $\oplus$
The merging operator is defined as follows: let $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$ be two vectors of $n$ components; then $C = A \oplus B = (c_1, c_2, \ldots, c_n)$, where

$$c_i = \begin{cases} a_i & \text{if } a_i \neq 0 \text{ and } b_i = 0 \\ b_i & \text{if } a_i = 0 \text{ and } b_i \neq 0 \\ a_i \text{ or } b_i \text{ randomly} & \text{if } a_i \neq 0 \text{ and } b_i \neq 0 \\ 0 & \text{otherwise} \end{cases} \tag{16}$$

Position computation

The MOPSOSA algorithm generates new positions using the definition proposed by Masoud et al. [7]. The new position is generated from the velocity as follows:

$$X_{ij}^{t+1} = \begin{cases} V_{ij}^{t+1} & \text{if } V_{ij}^{t+1} \neq 0 \\ r & \text{otherwise} \end{cases} \tag{17}$$

where $r$ is a random integer in $[1, K_i^t + 1]$ and $K_i^{t+1} < K_{\max}$. This property enables the particle to add new clusters. The previous operators and the differences in the number of clusters of $X_i^t$, $XP_i^t$, and $XG_i^t$ lead to the addition or removal of some of the clusters in the new position $X_i^{t+1}$. Sometimes an empty cluster may result, which makes the particle position invalid. Such an instance is corrected by renumbering the clusters of the particle: the re-numbering process re-encodes the largest cluster number as the smallest unused one.
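The operators in Eqs (13) to (17) act component by component on label vectors, so they can be sketched with plain Python lists. This is an illustrative sketch under stated assumptions, not the authors' implementation: each entry of $W$, $R_1$, and $R_2$ is assumed to equal 1 with probability $w$, $r_1$, and $r_2$ respectively, the upper bound for $r$ is capped at a hypothetical `k_max` argument, and all function names are invented for this example.

```python
import random
from typing import List


def difference(target: List[int], current: List[int]) -> List[int]:
    """Difference operator: keep the target label where it differs, else 0 (Eqs 14, 15)."""
    return [t if t != c else 0 for t, c in zip(target, current)]


def multiply(mask: List[int], vec: List[int]) -> List[int]:
    """Multiplication operator: component-wise product with a 0/1 mask."""
    return [m * v for m, v in zip(mask, vec)]


def merge(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    """Merging operator (Eq 16): take the nonzero entry, or pick randomly if both are nonzero."""
    out = []
    for ai, bi in zip(a, b):
        if ai != 0 and bi != 0:
            out.append(rng.choice((ai, bi)))
        else:
            out.append(ai if ai != 0 else bi)  # yields 0 when both are 0
    return out


def random_mask(n: int, prob: float, rng: random.Random) -> List[int]:
    """0/1 vector whose entries are 1 with probability prob (an assumption, see lead-in)."""
    return [1 if rng.random() < prob else 0 for _ in range(n)]


def new_velocity(v, x, xp, xg, w, r1, r2, rng):
    """Eq (13): merge inertia with the personal-best and leader differences."""
    n = len(x)
    inertia = multiply(random_mask(n, w, rng), v)
    personal = multiply(random_mask(n, r1, rng), difference(xp, x))
    leader = multiply(random_mask(n, r2, rng), difference(xg, x))
    return merge(inertia, merge(personal, leader, rng), rng)


def new_position(v_new: List[int], k_current: int, k_max: int, rng: random.Random) -> List[int]:
    """Eq (17): copy nonzero velocity labels; otherwise draw a label in [1, k_current + 1]."""
    upper = min(k_current + 1, k_max)  # capping at k_max is an assumption of this sketch
    return [vj if vj != 0 else rng.randint(1, upper) for vj in v_new]
```

With all three probabilities set to 1, a zero velocity, $X = (1, 1, 2)$, $XP = (1, 2, 2)$, and $XG = (1, 2, 1)$, the new velocity is $(0, 2, 1)$: the first object keeps a zero velocity because both guides agree with its current label, so its new label is drawn at random. The resulting position may still contain unused cluster numbers and would then go through the re-numbering step described above.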
MOSA technique

The MOSA method [18] is applied in the MOPSOSA algorithm at iteration $t$ for particle $i$ when $X_i^t$ dominates the new position $Xnew_i$. Fig 3 presents the flowchart for the MOSA technique applied in MOPSOSA. The procedure for the MOSA technique is explained in the eight steps below.

Step 1: Let $PSX$ and $PSV$ be two empty sets, let $niter$ be the maximum number of iterations, and set $q = 0$.

Step 2: Evaluate $EXP_q = \prod_{j=1}^{S} \exp\left( -[f_j(Xnew_i) - f_j(X_i^t)]^{+} / T_t \right)$, where the cooling temperature $T_t$ is updated in step 8 of the MOPSOSA algorithm. Generate a uniform random number $u \in (0, 1)$; if $u < EXP_q$, go to step 7. Otherwise, proceed to the next step.

Step 3: Add $Xnew_i$ to $PSX$ and $Vnew_i$ to $PSV$; then $PSX$ and $PSV$ are updated to include only non-dominated solutions.

Step 4: If $q \ge niter$, then choose a solution randomly from $PSX$ as the new particle position $Xnew_i$, with the corresponding velocity $Vnew_i$ from $PSV$, and proceed to step 7. Otherwise, set $q = q + 1$ and generate the new velocity $Vnew_i$ and position $Xnew_i$ from the old position $X_i^t$.

Step 5: Calculate the objective functions $f_1(Xnew_i), \ldots, f_S(Xnew_i)$ and $f_1(X_i^t), \ldots, f_S(X_i^t)$.

Step 6: Perform a dominance check for $Xnew_i$; if $Xnew_i$ is non-dominated by $X_i^t$, then proceed to step 7. Otherwise, go to step 2.

Step 7: The new position and velocity $Xnew_i$ and $Vnew_i$ are accepted as the new generation of $XMOSA_i$ and $VMOSA_i$, respectively: $XMOSA_i = Xnew_i$ and $VMOSA_i = Vnew_i$.

Step 8: Check the validity of $XMOSA_i$, and apply the re-numbering process if it is invalid. Return $XMOSA_i$ and $VMOSA_i$.

Selection of the best solution

In general, a Pareto set containing several non-dominated solutions is provided on the final run of multi-objective problems [28]. Each non-dominated solution introduces a pattern of clustering for the given dataset. The semi-supervised method proposed by Saha and