An Algebra to Merge Heterogeneous Classifiers

Philippe J. Giabbanelli (a,*), Joseph G. Peters (b)
(a) University of Cambridge, Cambridge, CB2 0QQ, United Kingdom
(b) School of Computing Science, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
(*) Corresponding author. Phone: +44 (0)1223 330315, Fax: +44 (0)1223 330316. Email addresses: [email protected] (Philippe J. Giabbanelli), [email protected] (Joseph G. Peters)

Abstract

In distributed classification, each learner observes its environment and deduces a classifier. As a learner has only a local view of its environment, classifiers can be exchanged among the learners and integrated, or merged, to improve accuracy. However, the operation of merging is not defined for most classifiers. Furthermore, the classifiers that have to be merged may be of different types in settings such as ad-hoc networks, in which several generations of sensors may be creating classifiers. We introduce decision spaces as a framework for merging possibly different classifiers. We formally study the merging operation as an algebra, and prove that it satisfies a desirable set of properties. The impact of time is discussed for the two main data mining settings. Firstly, decision spaces can naturally be used with non-stationary distributions, such as the data collected by sensor networks, since the impact of a model decays over time. Secondly, we introduce an approach for stationary distributions, such as homogeneous databases partitioned over different learners, which ensures that all models have the same impact. We also present a method that uses storage flexibly to achieve different types of decay for non-stationary distributions. Finally, we show that the algebraic approach developed for merging can also be used to analyze the behaviour of other operators.

Keywords: Model combination; Non-stationary distributions; Unsupervised meta-learning

1. Introduction

A wide variety of systems, such as sensor and peer-to-peer networks, are composed of independent entities. These entities record observations from their local environments, for example as databases of observed tuples. The entities often have to adapt to future changes, but their local views may be too restricted to compute classifiers that can make accurate predictions. Therefore, a global classifier that can be used to predict global trends is often desirable. Entities might not compute a global classifier by exchanging their datasets directly, because of security concerns, or because the large quantity of information could overwhelm limited resources such as local storage or battery power. Thus, the global classifier has to be realized by combining the classifiers deduced by the entities. The need to combine classifiers also arises in other situations: Section 2 presents several examples, and others can be found in the series of annual International Workshops on Multiple Classifier Systems [1].

Different ways have been proposed to realize a global classifier (e.g., ensemble classifiers, meta-learning). These ways have been grouped differently, but a key distinction can be drawn between fusion and selection ([17], p. 106). Intuitively, classifiers are specialized when they are restricted to instances with specific features: when a new instance comes in, the appropriate classifier is thus selected. Alternatively, classifiers may be designed for instances with the same features, and they are then fused. In this paper, we are concerned with the intermediate case: classifiers can have an overlapping feature space, but some may be more competent in parts of that space than others. This is typically reflected in approaches that weight the outputs of the classifiers depending on the instance being classified [17]. To emphasize that our approach is intermediate, we say that we merge classifiers instead of selecting or fusing them.
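To make the distinction concrete, here is a minimal sketch (in Python, not from the paper) contrasting selection, fusion, and the intermediate weighted merging; the two toy classifiers, their competence regions, and the weights are illustrative assumptions.

    # Minimal sketch: selection vs. fusion vs. intermediate weighted merging.
    # The two toy classifiers and their competence regions are assumptions.

    def clf_a(instance):
        return {"Yes": 0.2, "No": 0.8}   # hypothetical model, competent for low ages

    def clf_b(instance):
        return {"Yes": 0.9, "No": 0.1}   # hypothetical model, competent for high ages

    def select(instance):
        # Selection: a single classifier is chosen based on the instance's features.
        return clf_a(instance) if instance["age"] < 50 else clf_b(instance)

    def fuse(instance):
        # Fusion: every classifier contributes equally, regardless of the instance.
        a, b = clf_a(instance), clf_b(instance)
        return {c: (a[c] + b[c]) / 2 for c in a}

    def merge(instance, w_a, w_b):
        # Intermediate case: all classifiers contribute, weighted by their
        # competence in the part of the feature space holding the instance.
        a, b = clf_a(instance), clf_b(instance)
        return {c: (w_a * a[c] + w_b * b[c]) / (w_a + w_b) for c in a}

    print(merge({"age": 42}, w_a=0.7, w_b=0.3))   # ≈ {'Yes': 0.41, 'No': 0.59}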
Merging classifiers is a complicated operation for several reasons. Firstly, classifiers can be of different structures, such as decision trees and support vector machines. Secondly, conflicts arise when classifiers differ in their predictions. Solving these conflicts either uses heuristics (e.g., pure meta-learning [6, 28, 12]), or requires access to the dataset used to train the classifiers, which is undesirable in settings such as sensor networks due to the high cost of transmission. In this paper, we consider the case of meta-learning in which the classifiers are potentially of different types. The concept of meta-learning and the precise types are described in Section 3.

Most of the previously proposed solutions have been application-specific and were validated through experiments, leading researchers to suggest that questions about combining classifiers are (still) open "because we do not yet have a scientific understanding of the classifier combination mechanisms" [16]. Consequently, our approach emphasizes the formal aspects. In Section 4, we introduce decision spaces as an algebraic framework to investigate the behaviour of distributed data mining algorithms in which learners propagate only local models and not local observations. This framework can be used in a large variety of cases since it allows the combination of different models (e.g., decision trees, rule sets, support vector machines with linear kernels) and does not rely on homogeneous observations. In Section 5, we present a heuristic merge operator that solves all conflicts between models without using their observations. We use the algebra to prove that the operator satisfies a set of desirable properties, such as commutativity: when two models are merged, the result does not depend on which model comes first. However, the operator is not associative: if three or more models have to be merged, then the order in which the operator is applied affects the result. This can be a desirable behaviour in settings such as data streams [20], in which the distribution changes over time and the most recently received data is considered to be the most representative of current trends.

In other situations, such as homogeneous distributions, this behaviour is not appropriate. For example, a massive homogeneous database could be partitioned among computation units for the sake of efficiency, and the contribution of each unit to the final result should not depend on the time at which it sends its model. In Section 6, we develop two protocols, or schemes. The first one uses storage flexibly to achieve several types of decay in a data stream setting, and can be used for the case of a homogeneous database. The second scheme uses no storage space and has no decay.

Our formal framework can also be used to characterize other common, yet difficult, problems of data mining. For example, a learner can generate a sequence of models and analyze this sequence to find patterns of changes in the underlying system. In a data stream setting, this is referred to as a blind method operating over a sliding window. In Section 7, we define a reduction operator that reduces a sequence of models to a single model to permit easier analysis of the sequence, and we characterize its properties using an algebraic approach. We also briefly discuss how more complex operators can be defined using compositions of the reduction and merge operators.
2. Applications

2.1. Combining maps

Let us consider that we are monitoring the incidence of a chronic disease in a city. For each part of the city, we want to know whether the incidence is high enough to prompt a specific action. Different groups provide maps of the disease, but these groups might not use the same spatial unit. One group may have divided the city into blocks while another one uses administrative borders. Furthermore, reports may have been released in different months, and administrative borders may have evolved. Thus, we need to merge maps reporting on the incidence with different units. Specifically, we do not have access to individual data (i.e., the data points upon which the maps are based), and we must guarantee that all maps have the same impact in our composite picture. The decision space framework that we propose is able to handle these requirements, and the algebra can be used to prove properties of the composite map. Our framework can also handle the situation in which the monitoring would be done over a long period of time, and more recent maps need to have a higher impact in the composite map. Note that, in our framework, each element of the map has to be a polygon; if a region is delimited by curves then it will have to be approximated.

2.2. Map algebras

The problem of combining maps is classically addressed by geographical information systems (GIS). In the early 1980s, Tomlin proposed to consider maps as two-dimensional arrays and to design script-like languages to manipulate them in GIS [26]. These languages allow operations such as the selection of areas (e.g., School = area distance(SCHOOLS) < 300m), as well as the combination of selected areas using numerical operations. Thus, map algebras are specialized programming languages for manipulating the cells of grids, which now offer the possibility to process maps within GIS using cellular automata or image-processing techniques [7, 22]. While these algebras can be used to combine maps [9], there is no (mathematically proven) control of the impact that each map has on the final combination.

2.3. Wireless sensor networks

Wireless sensor networks [2] are a natural application for our approach. They can be found in applications such as forest fire detection, where temperature sensors measure the current temperature and its rate of change. They are also used to study earthquake activity by measuring the strength and duration of seismic waves in the earth's crust. The measurements are forwarded to collection nodes, which forward the data to a base station for processing. A typical sensor has a modest battery life which can be quickly drained by the transceiver when sending large amounts of data. Thus it is infeasible to forward all of the collected data to the base station. However, sensors have the computing capabilities to perform summaries such as averages and standard deviations which can then be sent to the base station. In our approach, each sensor can construct a classifier based on its local observations, and send it as a summary. This summary is richer than a simple average when the goal is to make predictions. This richness may come at a cost in some classifiers: for example, a model derived using random forest classifiers [5] may be heavier than its training data, and would not be appropriate for wireless sensor networks. However, the model is usually lighter when it simplifies the data (e.g., using pruning in decision trees), and we focus on classifiers that lead to such models. Once a classifier is sent by each sensor, a collection node can merge the classifiers that it receives to form a model of a region before forwarding it toward the base station. The merging of classifiers provides a trade-off between battery consumption and prediction accuracy: it improves the accuracy of predictions compared to sending simple averages, in exchange for an increase in battery consumption. The merging also provides a transparent means to give greater weight to the most recent observations when the network is deployed over a long time.
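To illustrate the intended trade-off, the following sketch (assuming scikit-learn; the data, labels, and thresholds are invented) trains a small pruned decision tree on a sensor's local readings and serializes it, so that the model rather than the raw observations would be transmitted.

    # Illustrative sketch (assumes scikit-learn): a sensor summarizes its local
    # observations as a small pruned decision tree instead of forwarding raw data.
    import pickle
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    readings = rng.uniform([0, -5], [60, 5], size=(10_000, 2))  # (temp, rate of change)
    labels = (readings[:, 0] > 45) & (readings[:, 1] > 1)       # hypothetical "fire risk"

    # Limiting depth and leaf size plays the role of pruning: it keeps the model
    # far smaller than the training data it summarizes.
    model = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50).fit(readings, labels)

    payload = pickle.dumps(model)
    print(len(payload), "bytes for the model vs.", readings.nbytes, "bytes of raw data")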
3. Background

Our work focuses on classifiers. A classifier learns, from a dataset of example instances, how target attributes are determined by the values of predictive attributes. Then, it can address the classification problem of predicting the target attributes of a new instance based on the values of its predictive attributes. Without loss of generality, we explain our framework by focussing on one target attribute, but the operations can be applied for any number of target attributes.

Formally, an instance has the form (a_1, ..., a_m, y) where a_i is the value of the i-th attribute, and y is a class label for this data. For example, suppose that there are two instances (a_1 = 30, y = No) and (a_1 = 70, y = Yes) where attribute a_1 is an age in [0, 125] and the class label is either Yes or No, corresponding to an individual being classified as old or not. The goal is to find the best approximation to the function f(a_1, ..., a_m) = y that determines the label of an unclassified instance given the values of its attributes. Three of the most commonly used techniques to train a classifier, i.e., to deduce an approximation of f given a set of instances, are [15, 14]:

• A Support Vector Machine (SVM) classifies the instances into two classes by separating them with the (m−1)-dimensional hyperplane that leaves the maximum margin between the two classes [25]. It is also possible to obtain non-linear classifications using the kernel method [4].

• One of the most popular classifiers is the decision tree. A decision tree learner [23] applies a divide-and-conquer technique to recursively split the data using the value of an attribute while maximizing a metric such as the information gain or Gini index. Each split is represented as a node and the recursive procedure yields a tree. Thus, a path in the tree corresponds to a set of conditions on the values of the attributes, and leads to a class distribution vector expressing the percentage of instances in each class.

• A classifier can also be a set of rules based on the values of the attributes. This is commonly referred to as a rule set [8]. A rule is a conjunction of conditions on the attributes that results in a class distribution vector, as in a decision tree. An attribute can be repeated at most twice in a rule, to specify a lower bound and an upper bound of an interval.

In meta-learning, each learner builds a classifier using only its own data. For example, each unit in a network of radar sensors can collect data about the movements of targets within its radius. Then, each sensor uses its own data to deduce a classifier such as a decision tree. In such an environment, all sensors may want to have a classifier that is as accurate as possible and thus they exchange their classifiers and merge them. In another setting, one may want to achieve a speedup by using parallel algorithms, and thus a homogeneous database is partitioned among different computation units, either by providing them with subsets of observations or subsets of attributes (vertical partitioning [24]). Then, the units send their classifiers to common sources in charge of merging. In both homogeneous databases and distributed environments, the problem is to merge a set of classifiers to create a single classifier.
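As a minimal sketch of this setting (assuming scikit-learn; the dataset and split are invented), each computation unit below fits its own decision tree on its partition of a homogeneous database, leaving the merge of the resulting models as the open problem.

    # Minimal sketch of the meta-learning setting (assumes scikit-learn):
    # a homogeneous dataset is partitioned among learners, each of which
    # trains its own classifier; only the models would then be exchanged.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(1)
    ages = rng.uniform(0, 125, size=(3_000, 1))
    old = ages[:, 0] >= 65                        # toy version of f(a_1) = y

    partitions = np.array_split(np.arange(len(ages)), 3)   # three learners
    local_models = [
        DecisionTreeClassifier(max_depth=2).fit(ages[idx], old[idx])
        for idx in partitions
    ]
    # Each learner now holds a local approximation of f; merging these models
    # without revisiting `ages` is the problem addressed in the next sections.
    for i, m in enumerate(local_models):
        print(f"learner {i}: predicts age 70 ->", m.predict([[70]])[0])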
A number of researchers have focussed on merging decision trees: it was noted in [3] that "a kind of decision tree induction [that is] efficient in a wide area system employs meta-learning, [in which] each computer induces a decision tree based on its local data and then the different models are merged to form the final tree". For example, it has been proposed [13] to transform decision trees into the sets of rules that they represent and to merge those sets. A rule not in conflict with other rules is kept intact, and otherwise the conflicts are resolved using a heuristic. However, there are two potential problems with the design of the heuristic proposed in [13]. Firstly, the authors argue that conflicts that are not handled by the heuristic are "unlikely if the training sets contain similar distributions of examples from a coherent larger training set", thus the approach is limited to homogeneous databases. Secondly, their algorithm could ask a learner to send all of the data on which there is a conflict in order to perform data mining again, and this prevents uses in settings such as sensor networks.

The requirement that the data needs to be examined in case of conflicts is found in several other approaches. For example, arbiter meta-learning [27] is a technique that merges classifiers in a hierarchical way. This technique has been designed for homogeneous databases, and parts of the data must be propagated during the combination process. Ganti and colleagues [11] focused on a different problem: they examined how one could quantify the difference between two datasets by analyzing their models, which can be used to determine whether a model should be updated. Their concepts are similar to ours in that they considered a model as a geometrical space divided into units, each of which is assigned a value, but combining two models requires scanning the data. The key difference of our framework, introduced in the next section, is its ability to merge potentially heterogeneous classifiers without requiring any data besides the models.

4. A framework: decision spaces

4.1. Introducing the structure

Intuitively, a decision space is an m-dimensional space, in which each dimension corresponds to one of the m attributes. It contains a set of non-overlapping elements which, if they cover all the space, form a partition of the space. A geometrical interpretation of an element is a subspace defined by an m-polytope. Thus, the only requirement for the classifiers considered in this paper is that their elements have to be polytopes, which is the case for SVMs with linear kernels, decision trees, and rule sets. Our study of polytopes is a first step to investigating the theoretical behaviour of pure meta-learning. Further research could remove the restriction to polytopes to generalize our approach to handle other classifiers such as SVMs with non-linear kernels [4]. Informally, non-linear kernels can define shapes in terms of curves whereas polytopes can only use lines.

Each element, or polytope, has a class distribution vector that specifies the percentage of instances within the covered space that are assigned to each class. Determining the percentages by enumerating the instances is straightforward for the three classifiers considered here. An example of a decision space is shown in Figure 1(a): it has two attributes, degree and age, and three elements, each with a class distribution vector of size 2 (with classes Yes and No). Any classifier is capable of producing a label for a given instance, and can be repeatedly prompted for labels over a discrete space. Therefore, any classifier can provide a class distribution vector. In Xu's categorization of the outputs used to combine classifiers, our requirement is the most universal (Type 1) [29]. These concepts are formalized in Definitions 1 and 2.

Definition 1. A decision space is an m-dimensional (product) space D_attr1 × ··· × D_attri × ··· × D_attrm where m is the number of attributes and D_attri is the space covered by the i-th attribute, specified as a bounded poset (i.e., a partially ordered set with a least and a greatest element).

Definition 2. An element of an m-dimensional decision space D is a subspace of D, i.e., an m-dimensional polytope (m-polytope). It is identified by a set of coordinates for each attribute, and has a class distribution vector V with c components, where c is the number of classes. The i-th component of the vector is the percentage of instances in class i, which is obtained by counting all instances that have class i and are within the element's space. Thus, the content of the vector sums to 100%. We will refer to the vector V as the value of the element.
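Definitions 1 and 2 can be rendered directly as a data structure; the sketch below is ours and restricts elements to axis-parallel boxes for brevity, whereas the definitions allow arbitrary m-polytopes.

    # Minimal sketch of Definitions 1 and 2, restricted to axis-parallel boxes
    # (the definitions allow arbitrary m-polytopes). All names are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Element:
        space: dict[str, tuple[float, float]]   # attribute -> (lower, upper) coordinates
        value: dict[str, float]                 # class distribution vector V, sums to 100

    @dataclass
    class DecisionSpace:
        ranges: dict[str, tuple[float, float]]  # D_attri for each attribute (bounded poset)
        elements: list[Element]                 # non-overlapping subspaces

    space = DecisionSpace(
        ranges={"age": (0, 125)},
        elements=[
            Element(space={"age": (0, 65)},   value={"Yes": 10.0, "No": 90.0}),
            Element(space={"age": (65, 125)}, value={"Yes": 80.0, "No": 20.0}),
        ],
    )
    assert all(abs(sum(e.value.values()) - 100) < 1e-9 for e in space.elements)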
Figure 1: Decision space (a) and decision tree (b) representations.

4.2. Conversions

The constraints on the structure of a classifier are used by a data mining algorithm to guide the search. A decision space is a framework and does not result directly from a data mining algorithm, thus its structure is less constrained than classifiers such as decision trees. This ensures that elements from several kinds of classifiers can be converted to elements of a decision space with no loss of information. To convert a classifier into a decision space, the only data required besides the classifier itself are the ranges of attributes for the dataset on which the classifier was trained. These ranges can be deduced in one pass over the dataset by scanning for the maximum and minimum values. The ranges can also be user-supplied, but should not be smaller than what is found in the dataset, for consistency's sake. Thus, given a classifier and the ranges of the attributes, the main task of the conversion is to extract the individual elements, or polytopes. The polytopes for elements of the three classifiers defined in Section 3, in order of increasing constraints on the shapes, are as follows:

(1) The elements in an SVM can have the most general shapes because the data can be assigned into regions.

(2) A rule in a rule set defines an axis-parallel rectangle.

(3) Each path of a decision tree (Figure 1(b)) can be converted to a rule, and this rule defines an axis-parallel rectangle (Figure 1(a)).

A decision tree cannot generate all axis-parallel rectangles. Indeed, a decision tree belongs to the data mining family of divide-and-conquer algorithms that imposes constraints on the search. Intuitively, a cut in the space along the border of an element, either vertical or horizontal, should not cut any element [10]. An example of a set of rules that violates this constraint is given below, and shown in Figure 2.

IF age ≥ 0 and age < 4 and degree ≥ 0 and degree < 2 THEN A
IF age ≥ 4 and age < 6 and degree ≥ 0 and degree < 4 THEN B
IF age ≥ 0 and age < 2 and degree ≥ 2 and degree < 6 THEN C
IF age ≥ 2 and age < 6 and degree ≥ 4 and degree < 6 THEN D
IF age ≥ 2 and age < 4 and degree ≥ 2 and degree < 4 THEN E

Figure 2: A partition of space not allowed by decision trees but allowed by decision spaces.

The elements of rule sets and decision trees can be converted to elements of a decision space using Algorithm 1. The algorithm uses pattern matching. For example, in line 4, attr_k OP_1 = {<, ≤} val_kup is a pattern for which the value of an attribute has to be lower or strictly lower than the value val_kup. If the pattern is found, then attr_k, OP_1, and val_kup are bound to the actual values. For example, a pattern age < 5 will result in the binding attr_k = age, OP_1 = "<", val_kup = 5.

For each rule, the algorithm considers all patterns that specify an upper bound (line 4). If there is also a pattern specifying a lower bound for the same attribute (line 6), then the range of the attribute can be specified. If no pattern is found for the lower bound of an attribute attr_k (line 15), then we use the lower bound of the attribute's range, which we denote min(D_attrk). Finally, if no upper bound is found for attr_k (line 20), we use the upper bound of the attribute's range, which we denote max(D_attrk).

Converting elements from a support vector machine follows a simpler process: each element of the decision space corresponds to exactly one element of the SVM, with the same distribution vector and the same coordinates specifying the space spanned.

Algorithm 1 RulesToSpaces(Rule set R, Attribute ranges D_attr1 × ··· × D_attrm)
Require: Rules are expressed in the following form:
  r := IF attr_1 ≍ val_1 AND ··· AND attr_m ≍ val_m THEN class = X
1.  Decision space S ← ∅                                    // Initially, the decision space is empty
2.  for r ∈ R do                                            // Each rule of the rule set is converted to one element
3.    element e.value ← r.X                                 // The element's value is the one predicted by the rule
4.    P ← all patterns attr_k OP_1 = {<, ≤} val_kup in r    // A rule is a conjunction of patterns, each delimiting a bound
5.    for p ∈ P do                                          // Use each upper bound to find the element's space
6.      if ∃ a pattern attr_k OP_2 = {>, ≥} val_klow in r then  // Check if there is a lower bound on the same variable
7.        if OP_1 is < and OP_2 is > then                   // The two bounds do not include the endpoints
8.          e.space ← e.space ∪ (val_klow, val_kup)
9.        else if OP_1 is < and OP_2 is ≥ then              // Only the lower endpoint of the bound is included
10.         e.space ← e.space ∪ [val_klow, val_kup)
11.       else if OP_1 is ≤ and OP_2 is > then              // Only the upper endpoint of the bound is included
12.         e.space ← e.space ∪ (val_klow, val_kup]
13.       else                                              // Both endpoints of the bound are included
14.         e.space ← e.space ∪ [val_klow, val_kup]
15.     else                                                // We only have an upper bound on the element
16.       if OP_1 is < then
17.         e.space ← e.space ∪ [min(D_attrk), val_kup)
18.       else
19.         e.space ← e.space ∪ [min(D_attrk), val_kup]
20.   if P = ∅ then                                         // If there was no upper bound
21.     P ← all patterns attr_k OP_2 = {>, ≥} val_klow in r // then access the lower bounds
22.     for p ∈ P do
23.       if OP_2 is > then                                 // The lower bound is excluded
24.         e.space ← e.space ∪ (val_klow, max(D_attrk)]
25.       else                                              // The lower bound is included
26.         e.space ← e.space ∪ [val_klow, max(D_attrk)]
27.   S ← S ∪ e                                             // Add the element to the decision space
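A runnable rendering of Algorithm 1 is sketched below under simplifying assumptions: conditions arrive already parsed as (attribute, operator, value) triples rather than being matched textually, and each element's space maps an attribute to an interval with inclusivity flags; all identifiers are ours.

    # Sketch of Algorithm 1 under simplifying assumptions: conditions arrive
    # already parsed as (attribute, operator, value) triples, and an element's
    # space maps each attribute to (low, high, low_included, high_included).

    def rules_to_space(rules, ranges):
        """rules: list of (conditions, value); ranges: attr -> (min, max)."""
        space = []
        for conditions, value in rules:
            elem = {"value": value, "space": {}}
            uppers = {a: (op, v) for a, op, v in conditions if op in ("<", "<=")}
            lowers = {a: (op, v) for a, op, v in conditions if op in (">", ">=")}
            for attr, (op1, up) in uppers.items():        # lines 4-19
                if attr in lowers:                        # both bounds are given
                    op2, low = lowers[attr]
                    elem["space"][attr] = (low, up, op2 == ">=", op1 == "<=")
                else:                                     # no lower bound: use min(D_attr)
                    elem["space"][attr] = (ranges[attr][0], up, True, op1 == "<=")
            for attr, (op2, low) in lowers.items():       # lines 20-26
                if attr not in uppers:                    # no upper bound: use max(D_attr)
                    elem["space"][attr] = (low, ranges[attr][1], op2 == ">=", True)
            space.append(elem)
        return space

    rules = [([("age", ">=", 0), ("age", "<", 4),
               ("degree", ">=", 0), ("degree", "<", 2)], "A")]
    print(rules_to_space(rules, {"age": (0, 8), "degree": (0, 15)}))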
5. Merge operator

5.1. Preliminary definitions

The most fundamental operation on decision spaces is merge. In the following, we use the term "element" to refer to a geometrical space spanned, and "value" to refer to a class distribution vector. Given two decision spaces X and Y, we merge them into Z using the following principles:

Merge principles
(1) If an element (subspace) x ∈ X does not intersect with any element y ∈ Y, then the prediction represented by x has no conflicts and can be added to Z. An element y ∈ Y with no conflicts is treated similarly.
(2) If an element y ∈ Y is strictly contained within an element x ∈ X with the same value (i.e., the same class distribution vector V), then we consider y to be too specialized (as explained in Definition 5) and delete it.
(3) If neither of the first two conditions is satisfied, then the element x ∈ X intersects with at least one element of Y and conflicts must be resolved.

Prior to establishing the algorithm to merge elements, we introduce the formal notation on which it relies, and specify the ways that elements can intersect. From here on, we use the following notation for an element x ∈ X:

• x has a set of attributes A(x).
• Each attribute a ∈ A(x) covers a (one-dimensional) space S(x, a). It can be an interval such as [8, 10], or a union of intervals.
• The size of the space covered by x for the attribute a is denoted |S(x, a)|.
• x has a class distribution vector V(x).

Definition 3. An element x ∈ X subsumes an element y ∈ Y, denoted y ≼ x, if A(x) = A(y) and ∀a ∈ A(x), S(y, a) ⊆ S(x, a).

In other words, an element y is subsumed by an element x when the space covered by y is included in the space covered by x. We denote strict subsumption by ≺ when ∀a ∈ A(x), S(y, a) ⊂ S(x, a). The main property of subsumption is established by Theorem 1.

Theorem 1. Let X and Y be two decision spaces. For each y ∈ Y, there is at most one x ∈ X such that y ≼ x.

Proof. The elements of X and Y partition the spaces formed by X and Y, respectively. Thus, one element (i.e., component of a partition) can be completely included in at most one other element.

Definition 4. The intersection of an element x ∈ X with a decision space Y is denoted x ⊎ Y and is the largest subset of elements I = {y_1, ..., y_n} ⊆ Y such that for each y_i ∈ I, ∃a ∈ (A(x) ∩ A(y_i)) such that S(x, a) ∩ S(y_i, a) ≠ ∅. Subsumption is a special case of intersection.
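Definitions 3 and 4 translate into short predicates; in this sketch (ours), per-attribute spaces are single intervals, although the definitions also allow unions of intervals.

    # Sketch of Definitions 3 and 4 for axis-parallel elements whose
    # per-attribute spaces are plain intervals.

    def subsumes(x, y):
        """y ≼ x: same attributes, and y's space is included in x's on each one."""
        if set(x["space"]) != set(y["space"]):
            return False
        return all(x["space"][a][0] <= y["space"][a][0] and
                   y["space"][a][1] <= x["space"][a][1] for a in x["space"])

    def intersection(x, Y):
        """x ⊎ Y: every element of Y overlapping x on some shared attribute."""
        def overlaps(i1, i2):
            return max(i1[0], i2[0]) < min(i1[1], i2[1])
        return [y for y in Y
                if any(overlaps(x["space"][a], y["space"][a])
                       for a in set(x["space"]) & set(y["space"]))]

    x = {"space": {"age": (0, 8), "degree": (3, 15)}}
    y = {"space": {"age": (7, 9), "degree": (0, 10)}}
    print(subsumes(x, y), intersection(x, [y]) == [y])   # False True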
The first two principles of merging can be handled by the notions in Definitions 3 and 4. For the third principle, we resolve each conflict between two elements x ∈ X and y ∈ Y by creating a new element z for each intersection. A simple approach would be to assign to z a value that is the average of the values of x and y, but it would not take into consideration the spaces covered by x and y (i.e., their regions of competence). Consequently, Definition 5 specifies how to measure the space covered by an element, which we call its specialization (also known as competence). Then, Definition 6 calculates the value of z as a weighted average based on the spaces covered by x and y. It should be noted that Definition 6 is a classical combination scheme, intermediate between classifier fusion and selection (cf. Introduction). In these schemes, all classifiers contribute to the outcome of a given space with weights based on their competence for that space [17].

Definition 5. The specialization of an element x ∈ X is M(x) = (Σ_{a ∈ A(x)} |S(x, a)|) / |A(x)|.

Intuitively, we sum the sizes of the spaces of the attributes characterizing x and we normalize by the number of attributes. Small values of M indicate specialized predictions based on small spaces. Other possible metrics could take into account the number of instances, or the class distribution vector. However, in the case of pure meta-learning investigated in this paper, we cannot use the number of instances or the class distribution vector in a specific part of an element. This could only be done by requiring the original classifier to perform an exhaustive search in its dataset, and repeated use of such an operation could lead to significant overhead. Therefore, such alternative metrics are not considered here.

Definition 6. The value of the intersection of two elements x ∈ X and y ∈ Y based on their specializations is the class distribution vector

V(x ⊗ y) = V(x) × M(x)/(M(x) + M(y)) + V(y) × M(y)/(M(x) + M(y)).

If the elements predict several attributes, then the class distribution vector for each attribute is computed using the above formula. Other formulae for V(x ⊗ y) could be designed to suit application-specific needs. Indeed, both the element's space and its value must be taken into account, but specific applications may require that the specialization be normalized differently. However, care should be taken to ensure that custom formulae satisfy the commutativity, idempotency, and unique identity properties. Indeed, such properties are critical for proving that the merge algorithm behaves appropriately, as will be shown in Section 5.3.

5.2. Merging algorithm

The merge operator ⊗ : (X, Y) ↦ Z is defined in an algorithmic way by Algorithm 2 and is illustrated in Figure 3. First, we apply merge principle (1): all elements with no intersection are added. Then, we apply principle (2): each element x ∈ X that is strictly contained in an element y ∈ Y with the same value is deleted. Finally, principle (3) is applied as follows:

• We consider all possible intersections x ⊎ Y and handle them sequentially.
• For each y ∈ (x ⊎ Y), we create an element z. The space covered by z is the intersection of the spaces covered by x and y, and the value is V(x ⊗ y) (Definition 6).
• There will be a "remainder" when the process is finished if x only has a partial intersection with Y. Thus, we remove from x the space of each z resulting from the intersection, and we add the remainder, if any, to the result.

Algorithm 2 ⊗ : (X, Y) ↦ Z
1.  Z ← new decision space
2.  H ← ∅                                   // Hash map: set of pairs (y, y′) of keys y and associated values y′
3.  for x ∈ X do                            // For each element x
4.    if ∄ y ∈ Y such that x ≺ y and V(x) = V(y) then   // If there is no y that subsumes it with the same value, then process it
5.      if x ⊎ Y = ∅ then                   // If x does not conflict with any element of Y, then add it to the result
6.        Z ← Z ∪ x
7.      else                                // Otherwise, resolve the conflicts
8.        tmp ← x.space                     // The initial space of x is saved
9.        for y ∈ x ⊎ Y do                  // Create an element z to handle each conflict
10.         z ← new element
11.         z.space ← tmp ∩ y.space         // z's space is the one initially shared by x and the conflicting element
12.         z.value ← V(x ⊗ y)              // z's value (class distribution vector) is based on both x's and y's
13.         Z ← Z ∪ z                       // z is added to the resulting decision space
14.         if ∄ (y, y′) ∈ H then           // If we never registered a conflict with y, then register it
15.           H ← H ∪ (y, y.space \ z.space)    // Register that the space of y that conflicted has been handled
16.         else                            // Otherwise, update the previous record
17.           H ← H \ (y, y′)
18.           H ← H ∪ (y, y′ \ z.space)
19.         x.space ← x.space \ z.space     // Also update that some of x's conflicting space was handled
20.       if x.space ≠ ∅ then               // If, after solving the conflicts, x has (non-conflicting) space left, then we keep it
21.         Z ← Z ∪ x
22. for y ∈ Y such that ∄ (y, y′) ∈ H do    // If some elements y have never been found to be in a conflict...
23.   if ∄ x ∈ X such that y ≺ x and V(y) = V(x) then   // ...and they are not subsumed by an element x with the same value
24.     Z ← Z ∪ y                           // then we add them to the result
25. for (y, y′) ∈ H do                      // For each element y that was found to be in a conflict
26.   if y′ ≠ ∅ then                        // If some of this element was free of conflict
27.     Z ← Z ∪ y′                          // then we add it
28. return Z
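The value computed in line 12 of Algorithm 2 follows Definitions 5 and 6; the sketch below (ours) uses plain intervals, and the element coordinates are invented so that the specializations match the worked example that follows (M(x) = 15/2, M(y) = 13/2), not read off Figure 3.

    # Sketch of Definitions 5 and 6 (used in line 12 of Algorithm 2), with
    # per-attribute spaces given as plain intervals. Values are percentages.

    def specialization(x):
        """M(x): mean size of the spaces covered, one per attribute."""
        sizes = [high - low for (low, high) in x["space"].values()]
        return sum(sizes) / len(sizes)

    def merged_value(x, y):
        """V(x ⊗ y): average of V(x) and V(y) weighted by specialization."""
        mx, my = specialization(x), specialization(y)
        return {c: x["value"][c] * mx / (mx + my) + y["value"][c] * my / (mx + my)
                for c in x["value"]}

    # Invented coordinates chosen so that M(x) = 15/2 and M(y) = 13/2,
    # as in the worked example below:
    x = {"space": {"age": (7, 8), "degree": (0, 14)},   # M = (1 + 14) / 2 = 15/2
         "value": {"Yes": 40.0, "No": 60.0}}
    y = {"space": {"age": (3, 9), "degree": (3, 10)},   # M = (6 + 7) / 2 = 13/2
         "value": {"Yes": 0.0, "No": 100.0}}
    print(merged_value(x, y))   # ≈ {'Yes': 21.4, 'No': 78.6}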
After applying all three merge principles for each element x ∈ X, we process the elements y ∈ Y. Merge principles (1) and (2) have to be applied to these elements, but principle (3) can be partially avoided. Indeed, the elements in the intersection of X and Y have already been computed when processing X. We use a hash map H as a cache, in which the element y is used as a key and its corresponding value y′ in the hash map is the space that is updated throughout the process. When an intersection between x and y is found, the part in the intersection is virtually removed from y by changing the value for y in H. After all of the intersections have been computed, H contains the remainder of each element of Y and thus it can be added directly to the result.

Example. In Figure 3, the left decision tree was trained on a dataset with the degree ranging from 0 to 15, and the age ranging from 0 to 8. The right decision tree was trained on a dataset with the degree ranging from 0 to 11, and the age ranging from 0 to 9. The light grey element in the left decision space and the element with the checkerboard pattern in the right decision space intersect. As two elements converted from decision trees only intersect in one contiguous space, the result is the new space z whose age ranges from 7 to 8, and whose degree ranges from 3 to 10. Using Definition 6, the percentage for the class Yes is (15/2 × 40)/(15/2 + 13/2) + (13/2 × 0)/(15/2 + 13/2) ≈ 21, and since there are only two classes in this example, the percentage for the class No is 100 − 21 = 79.

Figure 3: Merging two decision trees by converting them into decision spaces and creating a union decision space.

The intersection of rectangles in m dimensions (for m attributes) can result in spaces that are not rectangles. For example, Figure 4 shows the intersection of two decision spaces X and Y. Decision space X contains four rectangular elements and Y contains one rectangular element, but the intersection of the element of Y with any element of X leaves a non-rectangular remainder of Y. Constraining elements to be rectangles requires a heuristic that can bias the result, since rectangles are only an approximation of the element's actual specialization. We avoid this problem by using polytopes instead of rectangles to provide an exact algebra. The possibility that elements may have to be constrained to simple polytopes (e.g., rectangles) for computational reasons is discussed in the Conclusions.

5.3. Algebraic properties

The goal of a merge operator ⊗ is to merge the information of two decision spaces, resolving any conflicts that arise. Thus, it has to obey a set of algebraic properties in order to be consistent:
