
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing PDF

23 Pages·2017·4.25 MB·English


RESEARCH ARTICLE

Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing

Ahmad Abubaker1,2*, Adam Baharum1, Mahmoud Alrefaei3

1 School of Mathematical Sciences, University Sains Malaysia, 11800 USM Penang, Malaysia, 2 Department of Mathematics & Statistics, Al-Imam Muhammad Ibn Saud Islamic University, P.O. Box 90950, 11623 Riyadh, Saudi Arabia, 3 Departments of Mathematics & Statistics, Jordan University of Science and Technology, Irbid 22110, Jordan

* [email protected]

Abstract

This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, "MOPSOSA". The proposed algorithm is capable of automatic clustering, which is appropriate for partitioning datasets into a suitable number of clusters. MOPSOSA combines the features of multi-objective particle swarm optimization (PSO) and Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is based on Euclidean distance, the second on the point symmetry distance, and the last on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and the optimal clustering. Computational experiments were carried out on fourteen artificial and five real-life datasets.

OPEN ACCESS

Citation: Abubaker A, Baharum A, Alrefaei M (2015) Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing. PLoS ONE 10(7): e0130995. doi:10.1371/journal.pone.0130995

Editor: Yong Deng, Southwest University, CHINA

Received: December 7, 2014
Accepted: May 27, 2015
Published: July 1, 2015

Copyright: © 2015 Abubaker et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the paper and its Supporting Information files.

Funding: The authors have no support or funding to report.

Competing Interests: The authors have declared that no competing interests exist.

PLOS ONE | DOI:10.1371/journal.pone.0130995 July 1, 2015

Introduction

Data clustering is an important task in unsupervised learning. The clustering technique distributes the dataset into clusters of similar features [1]. To solve a clustering problem, the number of clusters that fits a dataset must be determined, and the objects must be assigned to these clusters appropriately. The number of clusters may or may not be known, which makes it difficult to find the best solution to the clustering problem. As such, the clustering problem can be viewed as an optimization problem. This challenge has led to the proposal of many automatic clustering algorithms in the literature; these algorithms estimate the appropriate number of clusters and partition a dataset into these clusters without the need to know the actual number of clusters [2–8]. Most of these algorithms rely exclusively on one internal evaluation function (validity index). The validity index has an objective function that evaluates various characteristics of clusters, which indicates the quality and accuracy of the clustering solutions [9]. Nevertheless, a single evaluation function is often inadequate to determine the appropriate clusters for a dataset, and thus gives an inferior solution [10]. Accordingly, the clustering problem is structured as a multi-objective optimization problem in which different validity indices can be applied and evaluated simultaneously.
Several automatic multi-objective clustering algorithms have been proposed in the literature to solve the clustering problem. Evolutionary methods appeared in this area after Handl and Knowles [3] proposed an evolutionary approach called multi-objective clustering with automatic K determination (MOCK). For some of the automatic multi-objective clustering algorithms related to MOCK, readers can refer to [11–13]. A multi-objective clustering technique inspired by MOCK and named VAMOSA, which is based on simulated annealing as the underlying optimization strategy and on the point symmetry-based distance, was proposed by Saha and Bandyopadhyay [5].

A remaining challenge is how to deal with various shapes of datasets (hyperspheres, linear, spiral, convex, and non-convex), overlapping datasets, datasets with a small or large number of clusters, and datasets that have objects with small or large dimensions, without being given the proper clustering or knowing the cluster number. Saha and Bandyopadhyay [8] developed two multi-objective clustering techniques (GenClustMOO and GenClustPESA2) by using a simulated annealing-based multi-objective optimization technique and the concept of multiple centers for each cluster, which can deal with different types of cluster structures. GenClustMOO and GenClustPESA2 were compared with MOCK [3], VGAPS [4], K-means (KM) [14], and the single-linkage clustering technique (SL) [15] using numerous artificial and real-life datasets of diverse complexities. However, these algorithms did not give the desired high accuracy in clustering datasets.
The current study proposes an automatic clustering algorithm, namely, hybrid multi-objective particle swarm optimization with simulated annealing (MOPSOSA), which deals with different sizes, shapes, and dimensions of datasets and an unknown number of clusters. The numerical results of the proposed algorithm are shown to be better than those of the GenClustMOO [8] and GenClustPESA2 [8] methods in terms of clustering accuracy (see the Results and Discussions Section). To handle arbitrary datasets, determine appropriate clusters, and obtain good solutions with high accuracy, combinatorial particle swarm optimization II [7] is extended to deal with three different cluster validity indices simultaneously. The first cluster validity index is the Davies-Bouldin index (DB-index) [16], which is based on Euclidean distance; the second is the symmetry-based cluster validity index (Sym-index) [4], which is based on point symmetry distance; and the last is a connectivity-based cluster validity index (Conn-index) [17], which is based on short distance. If a particle position does not change, or if the particle is moved to a bad position, the MOPSOSA algorithm uses MOSA [18] to improve the searching particle. The MOPSOSA algorithm also utilizes the KM method [14] to improve the selection of the initial particle positions because of their significance in the overall performance of the search process. The algorithm creates a large number of Pareto optimal solutions through a trade-off between the three different validity indices. Therefore, the idea of sharing fitness [19] is incorporated in the proposed algorithm to maintain diversity in the repository that contains Pareto optimal solutions. Pareto optimal solutions are important for decision makers to choose from. Furthermore, to comply with decision-maker requirements, the proposed algorithm utilizes a semi-supervised method [20] to provide a single best solution from the Pareto set. The performance of MOPSOSA is compared with the performances of three automatic multi-objective clustering techniques, namely, GenClustMOO [8], GenClustPESA2 [8], and MOCK [3], and with those of three single-objective clustering techniques, namely, VGAPS [4], KM [14], and SL [15], using 14 artificial and 5 real-life datasets.
The remainder of this paper is structured as follows. Section 2 describes the multi-objective clustering problem; Section 3 illustrates the proposed MOPSOSA algorithm in detail; Section 4 presents the datasets used in the numerical experiments, the evaluation of clustering quality, and the setting of the parameters for the MOPSOSA algorithm; Section 5 discusses the results; finally, concluding remarks are given in Section 6.

Clustering Problem

The clustering problem is defined as follows. Consider the dataset $P = \{p_1, p_2, \ldots, p_n\}$, where $p_i = (p_{i1}, p_{i2}, \ldots, p_{id})$ is a feature vector of $d$ dimensions, also referred to as the object, $p_{ij}$ is the feature value of object $i$ at dimension $j$, and $n$ is the number of objects in $P$. The clustering of $P$ is the partitioning of $P$ into $k$ clusters $\{C_1, C_2, \ldots, C_k\}$ with the following properties:

$$\bigcup_{i=1}^{k} C_i = P \tag{1}$$

$$C_i \cap C_j = \emptyset, \quad i \neq j, \quad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, k \tag{2}$$

$$C_i \neq \emptyset, \quad i = 1, 2, \ldots, k \tag{3}$$

The clustering optimization problem with one objective function can be formed as follows: $\min/\max_{C \in \Theta} f(C)$ such that Eqs (1) to (3) are satisfied, where $f$ is the validity index function, $\Theta$ is the feasible solutions set that contains all possible clusterings of the dataset $P$ of $n$ objects into $k$ clusters, $C = \{C_1, C_2, \ldots, C_k\}$, and $k = 2, 3, \ldots, n-1$.

The multi-objective clustering problem for $S$ different validity indices is defined as follows:

$$\min_{C \in \Theta} F(C) = [f_1(C), f_2(C), \ldots, f_S(C)] \tag{4}$$

where $F(C)$ is a vector of $S$ validity indices. Note that there may be no solution that minimizes all the functions $f_i(C)$. Therefore, the aim is to identify the set of all non-dominated solutions.

Definition: Consider $C$ and $C^*$ as two solutions in the feasible solutions set $\Theta$. The solution $C$ is said to be dominated by the solution $C^*$ if and only if $f_i(C^*) \le f_i(C)$ for all $i = 1, \ldots, S$ and $f_i(C^*) < f_i(C)$ for at least one $i$. Otherwise, $C$ is said to be non-dominated by $C^*$.

The Pareto optimal set is the set that includes all non-dominated solutions in the feasible solutions set $\Theta$.
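The dominance definition and the Pareto optimal set can be illustrated with a minimal Python sketch. It assumes every objective is minimized, as in Eq (4); the function names `dominates` and `pareto_set` are hypothetical helpers chosen for this illustration and do not come from the paper.

```python
from typing import List, Sequence


def dominates(f_star: Sequence[float], f: Sequence[float]) -> bool:
    """Return True if objective vector f_star dominates f (all objectives minimized).

    f_star dominates f when it is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(a <= b for a, b in zip(f_star, f))
    strictly_better = any(a < b for a, b in zip(f_star, f))
    return no_worse and strictly_better


def pareto_set(objs: List[Sequence[float]]) -> List[int]:
    """Return the indices of the vectors in objs that no other vector dominates."""
    return [
        i
        for i, f in enumerate(objs)
        if not any(dominates(g, f) for j, g in enumerate(objs) if j != i)
    ]
```

For example, among the objective vectors (1, 4), (2, 2), (3, 3), and (4, 1), only (3, 3) is dominated (by (2, 2)), so the other three form the non-dominated set.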
The Proposed MOPSOSA Algorithm

Simulated annealing requires more calculation time than particle swarm optimization does [21]. The former requires slow variation of the temperature parameter to obtain a global solution [22]. In particle swarm optimization, some of the particles may become stagnant and remain unchanged, especially when the objective functions of the best personal position and the best global position are similar [21]. As such, the particle cannot jump out, which in turn causes convergence toward a local solution and the loss of its capability to search for the optimal Pareto set. This phenomenon is a disadvantage in comparison with simulated annealing, which can jump away from a local solution. The proposed MOPSOSA algorithm, as previously mentioned, is a hybrid algorithm that merges the advantages of fast calculation and convergence in particle swarm optimization with the capability to evade local solutions in simulated annealing.

The clustering solution $X_i$ is described using label-based integer encoding [23]. Each particle position is a clustering solution. The particle position $X_i^t$ and velocity $V_i^t$ are presented as vectors with $n$ components, $X_i^t = (X_{i1}^t, X_{i2}^t, \ldots, X_{in}^t)$ and $V_i^t = (V_{i1}^t, V_{i2}^t, \ldots, V_{in}^t)$, at time $t$, $i = 1, \ldots, m$, where $n$ is the number of data objects and $m$ is the number of particles (swarm size). The position component $X_{ij}^t \in \{1, \ldots, K_i^t\}$ represents the cluster number of the $j$th object in the $i$th particle, and $V_{ij}^t \in \{0, \ldots, K_i^t\}$ represents the motion of the $j$th object in the $i$th particle, where $K_i^t \in \{K_{\min}, \ldots, K_{\max}\}$ is the number of clusters related to particle $i$ at time $t$ (where $K_{\min}$ and $K_{\max}$ are the minimum and maximum number of clusters, respectively; the default value of $K_{\min}$ is 2, and $K_{\max}$ is $\sqrt{n} + 1$ unless it is manually specified) [24]. The best previous position of the $i$th particle at iteration $t$ is represented as $XP_i^t = (XP_{i1}^t, XP_{i2}^t, \ldots, XP_{in}^t)$. The leader position chosen from the repository of Pareto sets for the $i$th particle at iteration $t$ is represented by $XG_i^t = (XG_{i1}^t, XG_{i2}^t, \ldots, XG_{in}^t)$.

The flowchart in Fig 1 illustrates the general process of the MOPSOSA algorithm. The process of the algorithm is described in the following 11 steps:

Step 1: The algorithm parameters, such as swarm size $m$, number of iterations $Iter$, maximum and minimum numbers of clusters, velocity parameters, initial cooling temperature $T_0$, and $t = 0$, are initialized.

Step 2: The initial particle positions $X_i^t$ using the KM method [14], initial velocities $V_i^t = 0$, and initial $XP_i^t = X_i^t$, $i = 1, \ldots, m$, are generated.

Step 3: The objective functions $f_1(X_i^t), \ldots, f_S(X_i^t)$, $i = 1, \ldots, m$, where $S$ is the number of objective functions, are computed. The repository of Pareto sets is filled with all non-dominated $XP_i^t$, $i = 1, \ldots, m$, on a fitness-sharing basis.

Step 4: The leader $XG_i^t$ from the repository of Pareto sets nearest to the current $X_i^t$ is selected. The clusters in $XP_i^t$ and $XG_i^t$ are renumbered on the basis of their similarity to the clusters in $X_i^t$, $i = 1, \ldots, m$.

Step 5: The new $V_i^{new}$ and $X_i^{new}$, $i = 1, \ldots, m$, are computed using $XG_i^t$, $XP_i^t$, $X_i^t$, and $V_i^t$.

Step 6: The validity of $X_i^{new}$, $i = 1, \ldots, m$, is checked, and the correction process is applied if it is not valid.

Step 7: The objective functions $f_1(X_i^{new}), \ldots, f_S(X_i^{new})$ and $f_1(X_i^t), \ldots, f_S(X_i^t)$, $i = 1, \ldots, m$, are computed.

Step 8: A dominance check for $X_i^{new}$, $i = 1, \ldots, m$, is performed; that is, if $X_i^{new}$ is non-dominated by $X_i^t$, then $X_i^{t+1} = X_i^{new}$ and $V_i^{t+1} = V_i^{new}$; otherwise, the MOSA technique is applied and $X_i^{t+1} = X_i^{MOSA}$ and $V_i^{t+1} = V_i^{MOSA}$, $i = 1, \ldots, m$, where $X_i^{MOSA}$ and $V_i^{MOSA}$ are the particle position and velocity, respectively, obtained by applying the MOSA technique. MOSA is discussed in detail in the section MOSA technique below. Upon completion of the generation of new positions for all particles, the cooling temperature $T_{t+1}$ is updated.

Step 9: The new $XP_i^{t+1}$, $i = 1, \ldots, m$, is identified.

Step 10: The Pareto set repository is updated.

Step 11: $t = t + 1$ is set; if $t \ge Iter$, then the algorithm is stopped and the Pareto set repository contains the Pareto solutions; otherwise, go to step 4.

The following sections elucidate the steps of the MOPSOSA algorithm.
Particle swarm initialization

Initial particles are generally considered one of the success factors in particle swarm optimization because they affect the quality of the solution and the speed of convergence. Hence, the MOPSOSA algorithm employs the KM method as a means to improve the generation of the initial swarm of particles. Fig 2 depicts a flowchart for the generation of $m$ particles. Starting with $i = 1$ and $W = \min\{K_{\max} - K_{\min} + 1, m\}$, if $W = m$, then $m$ particles will be generated by the KM method with the number of clusters $K_i = K_{\min} + i - 1$, $i = 1, \ldots, m$. If $W = K_{\max} - K_{\min} + 1$, then the first $W$ particles will be generated by KM with the number of clusters $K_i = K_{\min} + i - 1$, $i = 1, \ldots, W$, and the other particles will be generated by KM with the number of clusters $K_i$, $i = W + 1, \ldots, m$, selected randomly between $K_{\min}$ and $K_{\max}$. For each particle, the initial velocities are selected to be zero, $V_i = 0$, $i = 1, \ldots, m$, and the initial $XP_i$ is equal to the current position $X_i$ for all $i = 1, \ldots, m$.

Fig 1. Flowchart for the proposed MOPSOSA algorithm. doi:10.1371/journal.pone.0130995.g001

Fig 2. Flowchart for initializing particle swarm. doi:10.1371/journal.pone.0130995.g002

Objective functions

The proposed algorithm uses three types of cluster validity indices as objective functions to achieve optimization. These validity indices, DB-index, Sym-index, and Conn-index, apply three different distances, namely, Euclidean distance, point symmetric distance, and short distance, respectively. Each validity index indicates a different aspect of good solutions in clustering problems. These validity indices are described below.

DB-index. This index was developed by Davies and Bouldin [16] and is a function of the ratio of the within-cluster scatter (intra-cluster distance) to the between-cluster separation (inter-cluster distance). The scatter within the $i$th cluster $C_i$, denoted $S_{i,q}$, is calculated using Eq (5). The distance between clusters $C_i$ and $C_j$ is denoted by $d_{ij,t}$, which is computed using Eq (6).
$$S_{i,q} = \left( \frac{1}{n_i} \sum_{p \in C_i} \| p - c_i \|_2^q \right)^{1/q} \tag{5}$$

$$d_{ij,t} = \| c_i - c_j \|_t \tag{6}$$

where $n_i = |C_i|$ is the number of objects in cluster $C_i$, $c_i$ is the cluster center of cluster $C_i$ and is defined as $c_i = \frac{1}{n_i} \sum_{p \in C_i} p$, and $q$ and $t$ are positive integer numbers. DB is defined as:

$$DB = \frac{1}{k} \sum_{i=1}^{k} R_{i,qt} \tag{7}$$

where $R_{i,qt} = \max_{j,\, j \neq i} \left\{ \frac{S_{i,q} + S_{j,q}}{d_{ij,t}} \right\}$. A small value of DB means a good clustering result.

Sym-index. The recently developed point symmetry distance $d_{ps}(p, c)$ is employed in this cluster validity index Sym, which measures the overall average symmetry in connection with the cluster centers [4]. It is defined as follows. Let $p$ be a point; the reflected symmetrical point of $p$ with respect to a specific center $c$ is $2c - p$ and is denoted by $p^*$. Let the $knear$ unique nearest neighbors of $p^*$ be at the Euclidean distances $d_i$, $i = 1, \ldots, knear$. The point symmetric distance is defined as:

$$d_{ps}(p, c) = d_{sym}(p, c) \times d_e(p, c) = \frac{\sum_{i=1}^{knear} d_i}{knear} \times d_e(p, c) \tag{8}$$

where $d_e(p, c)$ is the Euclidean distance between the point $p$ and the center $c$, and $d_{sym}(p, c)$ is a symmetric measure of $p$ with respect to $c$, which is defined as $\sum_{i=1}^{knear} d_i / knear$. In this study, $knear = 2$. The cluster validity function is defined as

$$Sym = \left( \frac{1}{k} \times \frac{1}{\varepsilon_k} \times D_k \right) \tag{9}$$

where $\varepsilon_k = \sum_{i=1}^{k} E_i$, $E_i = \sum_{j=1}^{n_i} d_{ps}^*(p_j^i, c_i)$, $p_j^i$ is the $j$th object of cluster $i$, and $D_k = \max_{i,j=1}^{k} \| c_i - c_j \|$ is the maximum Euclidean distance between two centers among all cluster pairs. Eq (8) is used with a constraint to compute $d_{ps}^*(p_j^i, c_i)$: the $knear$ nearest neighbors of $p_j^*$ and $p_j^i$ should belong to the $i$th cluster, where $p_j^*$ is the reflected point of the point $p_j^i$ with respect to $c_i$. A large value of Sym-index means that the actual number of clusters and the proper partitioning are obtained.

Conn-index.
The third cluster validity index used in this study was proposed by Saha and Bandyopadhyay [17]; it depends on the notion of cluster connectedness. To compute the Conn-index, the relative neighborhood graph (RNG) [25] of the dataset has to be constructed first. Subsequently, the short distance between two points $x$ and $y$ is denoted by $d_{short}(x, y)$ and is defined as follows:

$$d_{short}(x, y) = \min_{i=1}^{npath} \max_{j=1}^{ned_i} w(ed_j^i) \tag{10}$$

where $npath$ is the number of all paths between $x$ and $y$ in the RNG; $ned_i$ is the number of edges along the $i$th path, $i = 1, \ldots, npath$; $ed_j^i$ is the $j$th edge in the $i$th path, $j = 1, \ldots, ned_i$ and $i = 1, \ldots, npath$; and $w(ed_j^i)$ is the edge weight of the edge $ed_j^i$. The edge weight $w(ed_j^i)$ is equal to the Euclidean distance $d_e(a, b)$ between $a$ and $b$, where $a$ and $b$ are the end points of the edge $ed_j^i$.

The cluster validity index Conn developed by Saha and Bandyopadhyay [17] is defined as follows:

$$Conn = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} d_{short}(p_j^i, m_i)}{n \left( \min_{i,j=1,\, i \neq j}^{k} d_{short}(m_i, m_j) \right)} \tag{11}$$

where $m_i$ is the medoid of the $i$th cluster, that is, the point with the minimum average distance to all points in the $i$th cluster, $m_i = p_{minindex}^i$, and $minindex = \arg\min_{t=1}^{n_i} \sum_{j=1}^{n_i} d_e(p_t^i, p_j^i) / n_i$. A small value of Conn-index means that the clusters are internally connected and well separated from each other.

After the particles have been moved to a new position, the three objective functions are computed for each particle in the swarm. The objective functions for a particle position $X$ are $\{DB(X), 1/Sym(X), Conn(X)\}$. The three objectives are minimized simultaneously using the MOPSOSA algorithm.

XP updating

The previous best position of the $i$th particle at iteration $t$ is updated by the non-dominance criterion. $XP_i^t$ is compared with the new position $X_i^{t+1}$. Three cases of this comparison are considered.

• If $XP_i^t$ is dominated by $X_i^{t+1}$, then $XP_i^{t+1} = X_i^{t+1}$.
• If $X_i^{t+1}$ is dominated by $XP_i^t$, then $XP_i^{t+1} = XP_i^t$.
• If $XP_i^t$ and $X_i^{t+1}$ are mutually non-dominated, then one of them is chosen randomly as $XP_i^{t+1}$.

This update occurs for each particle.
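The three XP updating cases can be sketched in a few lines of Python. This is an illustrative sketch only: the objective vectors are assumed to be precomputed and minimized (for example DB, 1/Sym, Conn), and the names `dominates` and `update_personal_best` are hypothetical, not taken from the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_personal_best(
    xp: List[int],
    f_xp: Sequence[float],
    x_new: List[int],
    f_new: Sequence[float],
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], Sequence[float]]:
    """Return the updated personal best position and its objective vector."""
    rng = rng or random.Random()
    if dominates(f_new, f_xp):   # case 1: the new position dominates XP
        return x_new, f_new
    if dominates(f_xp, f_new):   # case 2: XP dominates the new position
        return xp, f_xp
    # case 3: mutually non-dominated, pick one of the two at random
    return rng.choice([(xp, f_xp), (x_new, f_new)])
```

Passing a seeded `random.Random` instance makes the random choice in the third case reproducible, which is convenient when comparing runs.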
Repository updating

The repository is utilized by the MOPSOSA algorithm as a guide for the swarm toward the Pareto front. The non-dominated particle positions are stored in the repository. To preserve the diversity of non-dominated solutions in the repository, sharing fitness [19] is a good method to control the acceptance of new entries into the repository when it is full. Fitness sharing was used by Lechuga and Rowe [26] in multi-objective particle swarm optimization. In each iteration, the new non-dominated solutions are added to the external repository and the dominated solutions are eliminated. If the number of non-dominated solutions exceeds the size of the repository, the fitness sharing is calculated for all non-dominated solutions. The solutions that have the largest values of fitness sharing are selected to fill the repository.

Cluster re-numbering

The re-numbering process is designed to eliminate the redundant particles that represent the same solution. The proposed MOPSOSA algorithm employs the re-numbering procedure designed by Masoud et al. [7]. This procedure uses a similarity function to measure the degree of similarity between the clusters of two input solutions $X_i^t$ and $XP_i^t$ (or $XG_i^t$). The two clusters that are most similar are matched. Any cluster in $XP_i^t$ (or $XG_i^t$) not matched to any cluster in $X_i^t$ will use an unused number in the cluster numbering. The MOPSOSA algorithm uses the similarity function known as the Jaccard coefficient [27], which is defined as follows:

$$Sim(C_j, \hat{C}_k) = \frac{n_{11}}{n_{11} + n_{10} + n_{01}} \tag{12}$$

where $C_j$ is the $j$th cluster in $X_i^t$, $\hat{C}_k$ is the $k$th cluster in $XP_i^t$, $n_{11}$ is the number of objects that exist in both $C_j$ and $\hat{C}_k$, $n_{10}$ is the number of objects that exist in $C_j$ but do not exist in $\hat{C}_k$, and $n_{01}$ is the number of objects that do not exist in $C_j$ but exist in $\hat{C}_k$.

Velocity computation

The MOPSOSA algorithm employs the expressions and operators modified by Masoud et al. [7].
The new velocity for particle $i$ at iteration $t$ is calculated as follows:

$$V_i^{t+1} = (W \otimes V_i^t) \oplus \left( (R_1 \otimes (XP_i^t \ominus X_i^t)) \oplus (R_2 \otimes (XG_i^t \ominus X_i^t)) \right) \tag{13}$$

where $W$, $R_1$, and $R_2$ are vectors of $n$ components with values 0 or 1 that are generated randomly with a probability of $w$, $r_1$, and $r_2$, respectively. The operations $\otimes$, $\oplus$, and $\ominus$ are the multiplication, merging, and difference, respectively.

• Difference operator $\ominus$
The difference operation calculates the difference between $X_i^t$ and $XP_i^t$ (or $XG_i^t$). Let $\lambda P_i^t = (\lambda p_{i1}^t, \ldots, \lambda p_{in}^t) = XP_i^t \ominus X_i^t$ and $\lambda G_i^t = (\lambda g_{i1}^t, \ldots, \lambda g_{in}^t) = XG_i^t \ominus X_i^t$ be defined as follows:

$$\lambda p_{ij}^t = \begin{cases} XP_{ij}^t & \text{if } X_{ij}^t \neq XP_{ij}^t \\ 0 & \text{otherwise} \end{cases} \tag{14}$$

$$\lambda g_{ij}^t = \begin{cases} XG_{ij}^t & \text{if } X_{ij}^t \neq XG_{ij}^t \\ 0 & \text{otherwise} \end{cases} \tag{15}$$

• Multiplication operator $\otimes$
The multiplication operator is defined as follows: let $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$ be two vectors of $n$ components, then $A \otimes B = (a_1 b_1, \ldots, a_n b_n)$.

• Merging operator $\oplus$
The merging operator is defined as follows: let $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$ be two vectors of $n$ components, then $C = A \oplus B = (c_1, c_2, \ldots, c_n)$, where

$$c_i = \begin{cases} a_i & \text{if } a_i \neq 0 \text{ and } b_i = 0 \\ b_i & \text{if } a_i = 0 \text{ and } b_i \neq 0 \\ a_i \text{ or } b_i \text{ randomly} & \text{if } a_i \neq 0 \text{ and } b_i \neq 0 \\ 0 & \text{otherwise} \end{cases} \tag{16}$$

Position computation

The MOPSOSA algorithm employs the definition proposed by Masoud et al. [7] to generate new positions. The new position is generated from the velocity as follows:

$$X_{ij}^{t+1} = \begin{cases} V_{ij}^{t+1} & \text{if } V_{ij}^{t+1} \neq 0 \\ r & \text{otherwise} \end{cases} \tag{17}$$

where $r$ is an integer random number in $[1, K_i^t + 1]$ and $K_i^{t+1} < K_{\max}$. This property enables the particle to add new clusters. The previous operators and the differences in the cluster numbers of $X_i^t$, $XP_i^t$, and $XG_i^t$ lead to the addition or removal of some of the clusters in the new position $X_i^{t+1}$. Sometimes an empty cluster may exist, which leads to an invalid particle position. Such an instance is corrected by re-numbering the clusters of the particle. The re-numbering process works by re-encoding the largest cluster number as the smallest unused one.
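The operators of Eqs (13) to (17) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each 0/1 mask component is assumed to be 1 with the given probability, the random label in Eq (17) is capped at `k_max` (one reading of the constraint on the cluster count), the correction of empty clusters is omitted, and all function names are hypothetical.

```python
import random
from typing import List


def difference(target: List[int], x: List[int]) -> List[int]:
    """Eqs (14)-(15): keep the target label where it differs from x, else 0."""
    return [t if t != c else 0 for t, c in zip(target, x)]


def multiply(a: List[int], b: List[int]) -> List[int]:
    """Component-wise product A (x) B."""
    return [ai * bi for ai, bi in zip(a, b)]


def merge(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    """Eq (16): take the non-zero component; pick randomly if both are non-zero."""
    out = []
    for ai, bi in zip(a, b):
        if ai != 0 and bi != 0:
            out.append(rng.choice((ai, bi)))
        else:
            out.append(ai or bi)  # the non-zero one, or 0 if both are 0
    return out


def mask(n: int, p: float, rng: random.Random) -> List[int]:
    """0/1 vector; each component is 1 with probability p (assumed reading)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def new_velocity(v, x, xp, xg, w, r1, r2, rng):
    """Eq (13): merge the inertia, personal-best, and leader terms."""
    n = len(x)
    inertia = multiply(mask(n, w, rng), v)
    personal = multiply(mask(n, r1, rng), difference(xp, x))
    leader = multiply(mask(n, r2, rng), difference(xg, x))
    return merge(inertia, merge(personal, leader, rng), rng)


def new_position(v_new, k_cur, k_max, rng):
    """Eq (17): copy non-zero velocity labels, else draw a random label.

    The upper bound min(k_cur + 1, k_max) is an assumption of this sketch.
    """
    upper = min(k_cur + 1, k_max)
    return [vj if vj != 0 else rng.randint(1, upper) for vj in v_new]
```

Because a zero velocity component triggers a random label that may equal `k_cur + 1`, a particle can open a new cluster, which matches the remark that this property enables the particle to add new clusters.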
MOSA technique

The MOSA method [18] is applied in the MOPSOSA algorithm at iteration $t$ for particle $i$ in case $X_i^t$ dominates the new position $X_i^{new}$. Fig 3 presents the flowchart for the MOSA technique applied in MOPSOSA. The procedure for the MOSA technique is explained in the eight steps below.

Step 1: Let $PSX$ and $PSV$ be two empty sets, let $niter$ be the maximum number of iterations, and set $q = 0$.

Step 2: Evaluate $EXP_q = \prod_{j=1}^{S} \exp\left( -\left[ f_j(X_i^{new}) - f_j(X_i^t) \right]^+ / T_t \right)$, where the cooling temperature $T_t$ is updated in step 8 of the MOPSOSA algorithm. Generate a uniform random number $u \in (0, 1)$; if $u < EXP_q$, go to step 7. Otherwise, proceed to the next step.

Step 3: Add $X_i^{new}$ to $PSX$ and $V_i^{new}$ to $PSV$; then $PSX$ and $PSV$ are updated to include only non-dominated solutions.

Step 4: If $q \ge niter$, then choose a solution randomly from $PSX$ as the new particle position $X_i^{new}$ and the corresponding velocity $V_i^{new}$ from $PSV$, and proceed to step 7. Otherwise, set $q = q + 1$ and generate the new velocity $V_i^{new}$ and position $X_i^{new}$ from the old position $X_i^t$.

Step 5: Calculate the objective functions $f_1(X_i^{new}), \ldots, f_S(X_i^{new})$ and $f_1(X_i^t), \ldots, f_S(X_i^t)$.

Step 6: Perform a dominance check for $X_i^{new}$; if $X_i^{new}$ is non-dominated by $X_i^t$, then proceed to step 7. Otherwise, go to step 2.

Step 7: The new position and velocity $X_i^{new}$ and $V_i^{new}$ are accepted as the new generation of $X_i^{MOSA}$ and $V_i^{MOSA}$, respectively: $X_i^{MOSA} = X_i^{new}$ and $V_i^{MOSA} = V_i^{new}$.

Step 8: Check the validity of $X_i^{MOSA}$, and apply the re-numbering process if it is invalid. Return $X_i^{MOSA}$ and $V_i^{MOSA}$.

Selection of the best solution

In general, a Pareto set containing several non-dominated solutions is provided on the final run of multi-objective problems [28]. Each non-dominated solution introduces a pattern of clustering for the given dataset. The semi-supervised method proposed by Saha and
