
Parameter estimation under uncertainty with Simulated Annealing applied to an ant colony based probabilistic WSD algorithm

Andon TCHECHMEDJIEV (1), Didier SCHWAB (1), Jérôme GOULIAN (1), Gilles SÉRASSET (1)
(1) Univ. Grenoble-Alpes, LIG-GETALP, France
{andon.tchechmedjiev, didier.schwab, jerome.goulian, gilles.serasset}@imag.fr

ABSTRACT

In this article we propose a method based on simulated annealing for the parameter estimation of probabilistic algorithms, where the solution provided by the algorithm can vary from execution to execution. Such algorithms are often very interesting to solve complex combinatorial problems, yet they involve many parameters that can be difficult to estimate manually due to their randomized output. We evaluated the method by applying it to an Ant Colony Algorithm for WSD. For the evaluation, we used the Semeval 2007 Task 7 corpus. We split the corpus and took in turn one text as a training corpus and the four remaining texts as a test corpus. We tuned the parameters with an increasing number of sentences from the training text in order to estimate the quantity of data necessary to obtain an efficient and general set of parameters. We found that the results greatly depend on the nature of the text: even a very small number of training sentences can lead to good results if the text has the right properties.

RÉSUMÉ (French)

Simulated Annealing based parameter estimation under uncertainty applied to a probabilistic ant colony algorithm.
We propose a method based on Simulated Annealing for the parameter estimation of probabilistic algorithms whose generated solutions vary. Such algorithms are often very interesting for solving complex combinatorial problems, but they require many parameters that can be difficult to estimate manually because of the random nature of the solutions. More specifically, we apply and evaluate this method to estimate the parameters of an Ant Colony Algorithm for word sense disambiguation. For the evaluation, we used the Semeval 2007 Task 7 corpus. We took each text in turn as a training corpus and the four others as a test corpus. We estimate the parameters for an increasing number of sentences to determine how much data is necessary to obtain a general and efficient set of parameter values. We conclude that the quality of the results depends on the nature of the texts. Even small quantities of sentences can be enough to obtain good results, as long as the text has the right properties.

KEYWORDS: Word Sense Disambiguation, Parameter Estimation, Simulated Annealing, Uncertainty, Stochastic Algorithms.

KEYWORDS IN FR: Désambiguïsation Lexicale, Estimation de Paramètres, Recuit Simulé, Incertitude, Algorithmes Stochastiques.

Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology, pages 109–124, COLING 2012, Mumbai, December 2012.

1 Introduction

Word Sense Disambiguation (WSD) is a very difficult yet central task in Natural Language Processing (NLP) that pertains to the labelling of the words of a text with the senses that most closely match the context. Numerous approaches exist to tackle WSD, yet many of these methods have in common a sizeable complexity, reflected by a number of parameters that need to be tuned in order to obtain the best performance. Depending on the number of parameters, it can be very difficult for a human to directly estimate the optimal parameters. Even then, it remains a question of guesswork with no guarantee of success.
This issue is all the more salient with algorithms and models that typically involve many parameters upon which the results greatly depend. Such algorithms for WSD include Neural Networks (Veronis and Ide, 1990), Simulated Annealing (SA) (Cowie et al., 1992), Genetic Algorithms (GA) (Gelbukh et al., 2003) and Ant Colony Algorithms (ACA) (Schwab and Guillaume, 2011), and many other machine learning methods.

In such approaches, the values of the parameters are paramount. They are what make the algorithms work for specific applications such as WSD. Yet, the authors of the aforementioned articles give little insight as to how the parameters are determined. They only provide the "best" values. Although a trial and error approach is a good start when devising an algorithm or tailoring it to a new application, it makes the adaptation and the scalability of such systems an issue. Naturally, for well known and understood algorithms such as Simulated Annealing, there are some theoretical criteria to select the parameters. However, the process has to be repeated for every new setting and can be tedious to perform.

Furthermore, even with few parameters, evaluating all possible parameter configurations is a prohibitively long process. For example, in the case of a stochastic algorithm with 10 parameters that have 20 possible values each, assuming the algorithm needs to be run 100 times (due to its stochastic nature) and that one execution takes on average 30 seconds, the time necessary to evaluate all parameter combinations is $20^{10} \cdot 100 \cdot 30 \approx 3 \cdot 10^{16}$ s, which is almost 1 billion years! Thus, the advantages of considering automated or adaptive heuristic approaches to the estimation of parameters are clear.

In the case of a deterministic algorithm, many combinatorial optimisation methods can be used with little effort. However, due to the complexity of WSD, many approaches are statistical or probabilistic in nature and the use of classical combinatorial optimisation heuristics becomes impossible. Therefore, we adapt the classical simulated annealing algorithm to work for an uncertain objective function based on the ideas of (Painton and Diwekar, 1995), by using standard non-parametric statistical significance tests.
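The cost of exhaustive evaluation mentioned in this introduction can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and the choice of a 365.25-day year are assumptions, not part of the original article.

```python
# Back-of-the-envelope cost of evaluating every parameter combination.
# The helper name and the 365.25-day year are illustrative assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_seconds(n_params, values_per_param, runs_per_config, seconds_per_run):
    """Total run time needed to try every configuration runs_per_config times."""
    configurations = values_per_param ** n_params
    return configurations * runs_per_config * seconds_per_run


# 10 parameters, 20 values each, 100 runs per configuration, 30 s per run.
total_seconds = exhaustive_search_seconds(10, 20, 100, 30)
total_years = total_seconds / SECONDS_PER_YEAR
print(f"{total_seconds:.3e} s, i.e. about {total_years:.3e} years")
```

Under these assumptions the total is a little under one billion years, which is the order of magnitude that motivates a heuristic search.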
First, we present the tools and the metrics for the evaluation of WSD, followed by the description of the Ant Colony Algorithm considered in this article. Then, we briefly survey other approaches for parameter estimation under uncertainty and express the problem formally. Subsequently, we present the general formulation of simulated annealing and different aspects pertaining to it, and then present its adaptation to an uncertain objective function. We then present and analyse our experimental protocol and the results of the experiments. Finally, we conclude on the results and draw some perspectives for future research.

2 Evaluation of WSD Systems

Precision and Recall constitute the standard evaluation metrics for WSD systems. Precision represents the number of correctly disambiguated words among all the attempted words, while Recall represents the number of correctly disambiguated words among all the ambiguous words in the document. It is customary to compute the F1 score between Precision and Recall as $F_1 = \frac{2 \cdot P \cdot R}{P + R}$, in order to have a single evaluation metric that equally captures the information conveyed by both Precision and Recall.

In this work in particular we use the Semeval 2007 Task 7 all-words coarse-grained corpus for the evaluation of the quality of the disambiguation. The all-words task provides a corpus of texts, where in each text all ambiguous words ($T$) need to be disambiguated by tagging them with Wordnet 2.1 sense keys. An algorithm must correctly tag as many ambiguous words as possible ($TP$) depending on the context. A sense is considered correct if the sense key assigned to it matches the sense provided in the gold standard or any of its subsumed senses (hence coarse-grained). If we write the number of incorrectly assigned senses as ($TN$), then, for this particular task, precision can be expressed as $P = \frac{TP}{TP + TN}$ and recall as $R = \frac{TP}{T}$.

The choice of the all-words task in particular is based on the functioning of the algorithm, which is better suited to a globally coherent text than to independent sentences or words. As such, the metrics available for the evaluation are Precision, Recall and F-measure, which leads to some important restrictions on the methods or criteria available for parameter estimation, as will be detailed in the subsequent sections.
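As a small illustration of the metrics of this section, the following sketch computes Precision, Recall and F1 from the counts $TP$, $TN$ and $T$ as defined above. The function names and the example counts are hypothetical; they are not taken from the Semeval scorer.

```python
# Precision, recall and F1 with the notation of Section 2:
# tp = correctly tagged words, tn = incorrectly tagged words (the article's TN),
# t = all ambiguous words in the document. Example counts are made up.

def precision(tp, tn):
    """P = TP / (TP + TN): correct answers among attempted words."""
    attempted = tp + tn
    return tp / attempted if attempted else 0.0


def recall(tp, t):
    """R = TP / T: correct answers among all ambiguous words."""
    return tp / t if t else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


p = precision(tp=150, tn=50)   # 200 words attempted, 150 correct
r = recall(tp=150, t=250)      # 250 ambiguous words in total
f1 = f1_score(p, r)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

When every ambiguous word is attempted, $TP + TN = T$ and the three values coincide.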
3 Context: an ant colony algorithm for WSD

The ant colony algorithm presented by (Schwab and Guillaume, 2011) and (Schwab et al., 2012), on which our work focuses, offers some interesting properties regarding the quality of the solutions, which are on par with state-of-the-art knowledge-rich WSD systems, combined with a fast execution time compared to the latter systems. Furthermore, it is also well suited to working on full texts at a time with a full annotation coverage.

Moreover, its stochastic nature leads to the generation of solutions with variable precision scores, which is usually not a very sought after property in algorithms. However, this variability enables the use of voting strategies that lead to results approaching supervised WSD systems and overcome the elusive (for unsupervised and knowledge-based systems) First Sense Baseline. Due to the quick execution time (in the 60 s to 65 s range), voting strategies can directly be applied while still leading to cumulative execution times for several executions that are lower than those of other systems.

However, those qualities are also a major disadvantage when it comes to selecting and tuning the parameters of the algorithm. Not only are there relatively many, but they can only be determined manually to some extent through painstaking trial and error modifications, with no guarantees of optimality. In addition, the stochastic nature of the algorithm makes it rather impractical to determine whether the improvement resulting from a given parameter change is simply due to random variations of the solution distribution or really to significant variations.

3.1 Parameter model

Let us now review the various parameters of the system, their typical value ranges and meaningful increments. The parameters are summarized in Table 2. There are seven parameters in total, five of which are discrete and represented by integers, and two of which are continuous and represented as positive real numbers between zero and one.

Given the number of parameters, even with some knowledge about how the algorithm works, one can at best choose sensible parameter value ranges in order to limit the combinatorial explosion.
Initially, before the estimation method proposed in this paper, we determined the values of some parameters by making iterative and independent changes. Of course, such an approach is limited, because it is a heuristic based on the assumption that the parameters of the system are independent. For ACA the opposite is in fact true, as it is with other methods.

The values obtained through this process were $\omega = 10$, $E_a = 1$, $E_{max} = 8$, $E_0 = 10$, $\delta_V = 0.9$, $\delta_\varphi = 0.9$, $L_V = 90$ and yielded results around 75% on the Semeval 2007 Task 7 corpus (Schwab and Guillaume, 2011).

Furthermore, given the number of parameters and their value ranges, an exhaustive enumeration is intractable. If we calculate the number of combinations (assuming 0.01 steps for continuous variables), we obtain $60 \times 60 \times 100 \times 55 \times 25 \times 35 \times 100 = 17325 \cdot 10^{8}$ combinations. Knowing that due to the stochastic nature we may need to make at least 50 or 100 executions, we get to $17325 \cdot 10^{9}$ combinations.

Notation          Description                                                                              Value            Exploration granularity
$E_a$             Energy taken by an ant when it arrives on a node                                         1–60             1
$E_{max}$         Maximum quantity of energy an ant can carry                                              1–60             1
$\delta_\varphi$  Evaporation rate of the pheromone between two cycles                                     0.0–1.0          0.01
$E_0$             Initial quantity of energy on each node                                                  5–60             1
$\omega$          Ant life-span                                                                            5–30 (cycles)    1
$L_V$             Odour vector length                                                                      20–200           5
$\delta_V$        Ratio of odour vector components (words) deposited by an ant when it arrives on a node   0.0–1.0          0.01

Table 2: Parameters of the Ant Colony Algorithm and their typical value ranges

Assuming leaps in computational power or heavy parallelism that would make the algorithm take only one second to run, the search for the optimal parameters would still take 549372 years.

For this reason, we are interested in finding an automated way of estimating the optimal parameters. Of course, when dealing with linguistic data, there is no such thing as optimal. By optimal, we merely mean a set of parameters that yield results with F-scores as high as can be achieved.

4 Related work

In the context of this article, when considering a WSD algorithm evaluated as described in Section 2, the result of evaluating the output of the algorithm is simply an F-score percentage.
In order to apply classical estimation theory approaches (Kay, 1993), we would need to either find a model of the posterior PDF, or to empirically estimate it. However, given the size of the search space and the cost of running the algorithm, this would be very much equivalent to making an exhaustive search. Furthermore, the relationship between the values of the parameters and the resulting scores is non-linear. However, there are some models that attempt to provide a probability distribution for precision, recall, and F-measure for information retrieval that could potentially be applied to WSD (Goutte and Gaussier, 2005).

In fact, it is much simpler to treat the estimation as a combinatorial optimisation problem, by attempting to simply maximize the F-score value. In this context, given that we want the ability to handle many parameters that can take many discrete values, we need to consider heuristics for an efficient estimation of complex non-linear multivariate functions.

Popular choices for parameter estimation are Genetic Algorithms (GA) or Simulated Annealing (SA), among others. For instance, GAs have been widely applied, either in a general parameter estimation context, such as in (Sharman and McClurkin, 1989), (Yao and Sethares, 1994) or (Paterakis et al., 1998); or for specific application domains such as robotics (Bolhasani, 2004), applied statistics (Pan et al., 1995), meteorology (Lee et al., 2006), chemistry (Katare et al., 2004) and many others.

For WSD, (Daelemans et al., 2003) and (Decadt et al., 2004) have also proposed to use GAs for joint parameter estimation and feature selection. Although joint optimisation is very interesting, it is application and task specific, and would be difficult to implement in a probabilistic setting.

All the techniques mentioned above are meant to optimise a deterministic objective function. When the objective function itself is uncertain or noisy, instead of a single value, one has a set of observation samples with different values that have to be dealt with. The main issues are how to know what value to consider for the maximisation and how to take the variability into account to avoid effects due to random chance.
One model of stochastic simulated annealing has been proposed in (Painton and Diwekar, 1995) and then further extended in (Kim and Diwekar, 2002), where the evaluation function is sampled several times, and the expected value is incorporated into a modified objective function, with the variance acting as a penalty term.

5 Problem formalization

Before describing the parameter estimation algorithm, it is important to formulate in a more formal and generic way the parameter model of any WSD algorithm with a randomized output.

Let $\vec{\theta} = [\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_n] \in \Theta$ be the parameter model of a WSD algorithm, where $\Theta = \Theta_1 \times \Theta_2 \times \ldots \times \Theta_i \times \ldots \times \Theta_n$ and each $\Theta_i$ is a finite discrete set of integers or a bounded real range.

For example, in the case of the parameter model of the ant colony algorithm, the general form of $\theta$ would be $\theta = \{\omega, E_a, E_{max}, E_0, \delta_V, \delta_\varphi, L_V\}$. The initial set of parameters would be an instance written as $\theta_I = \{10, 1, 8, 10, 0.9, 0.9, 90\}$.

We can represent a deterministic WSD system with a function $\mathrm{WSD}_C$ that, for a given corpus $C$ and a parameter vector $\vec{\theta}$ (see footnote 1), returns an F-score value between 0 and 1:

$$\mathrm{WSD}_C : \Theta \to \{x \in \mathbb{R} \mid 0 \le x \le 1\} \tag{1}$$

$$\theta \mapsto F_{score} \tag{2}$$

Thus, the problem of finding the best $\theta$, written $\theta^*$, can be formalized as follows:

$$\theta^* = \operatorname*{argmax}_{\theta \in \Theta} \mathrm{WSD}_C(\theta) \tag{3}$$

In other words, if we define a sequence $\theta_n$ that enumerates possible parameter combinations in $\Theta$, the problem can be expressed as:

$$\theta^*_n = \begin{cases} \theta_n & \text{if } \mathrm{WSD}_C(\theta_n) > \mathrm{WSD}_C(\theta_{n-1}) \\ \theta_{n-1} & \text{otherwise} \end{cases} \tag{4}$$

However, when the WSD system is probabilistic, there is no unique F-score for a given $\theta_n$: every evaluation of $\mathrm{WSD}_C$ may potentially return a different F-score value. Thus, $\mathrm{WSD}_C$ follows an unknown probability distribution $X$ that depends on $\theta_n$: $\mathrm{WSD}_C(\theta_n) \sim X(\theta_n)$.

A sensible approach in this case would be to follow the model (Fig. 1b) proposed by (Painton and Diwekar, 1995), and to evaluate the objective function $\mathrm{WSD}_C$ several times and then consider the expected value as an estimator of the resulting distribution. However, even if the means are different, there is no guarantee the difference is not due to random chance.
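To make the last point concrete, here is a minimal sketch of sampling a noisy objective several times and using the sample mean as an estimator. The function `noisy_objective` is a hypothetical stand-in for $\mathrm{WSD}_C$ (a toy Gaussian score), not the ant colony algorithm, and the sample size of 50 and the noise level are arbitrary illustrative choices.

```python
import random
import statistics


def noisy_objective(theta, rng):
    """Hypothetical stand-in for WSD_C(theta): a score in [0, 1] with random noise."""
    base = 0.70 + 0.02 * theta          # toy dependence on a single parameter
    return min(1.0, max(0.0, rng.gauss(base, 0.01)))


def sample_scores(theta, n_samples, rng):
    """Evaluate the stochastic objective n_samples times for one configuration."""
    return [noisy_objective(theta, rng) for _ in range(n_samples)]


rng = random.Random(0)                   # fixed seed so the run is repeatable
scores_a = sample_scores(0.0, 50, rng)
scores_b = sample_scores(0.1, 50, rng)

mean_a = statistics.mean(scores_a)
mean_b = statistics.mean(scores_b)
print(f"mean A = {mean_a:.4f} (sd {statistics.stdev(scores_a):.4f})")
print(f"mean B = {mean_b:.4f} (sd {statistics.stdev(scores_b):.4f})")
print(f"difference of means = {mean_b - mean_a:+.4f}")
```

In this toy setting the true gap between the two configurations is much smaller than the noise of a single run, so the sign and size of the observed difference of means can change from one seed to another. This is the situation that calls for a significance test or a variance-aware objective.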
The more general way to deal with this uncertainty about the statistical significance of the difference of expected values is to simply apply a standard statistical test in order to determine the significance level and decide whether or not to go ahead with the comparison. Another, simpler approach, the one retained in fact, is to integrate into the scoring function a penalty factor that directly takes into account the variability of the distributions. This problem will be addressed in more detail in Section 6.3.

Footnote 1: From now on, the vector arrow will be omitted and $\vec{\theta}$ will be written as $\theta$.

6 Simulated Annealing

6.1 General Presentation

Simulated annealing is a stochastic combinatorial optimisation technique originally proposed by (Kirkpatrick et al., 1983). Given a function $f(\theta)$ that takes a vector of discrete parameters $\theta$, Simulated Annealing aims at finding the combination of parameters that maximises or minimises that function.

[Figure 1: Classical optimisation model versus stochastic optimisation model. Panels: (a) Classical optimisation model; (b) Stochastic optimisation model.]

(Kirkpatrick et al., 1983) draw a parallel between statistical mechanics and combinatorial optimisation and apply ideas from the Metropolis-Hastings algorithm (Metropolis et al., 1953) (simulation of the behaviour of many-body systems) to combinatorial optimisation.

The principle of Simulated Annealing is to first start from an initial configuration (combination) of the function parameters $\theta_0$ as well as an initial temperature $T_0$. The initial $\theta_0$ is often chosen randomly, but starting from an already good solution can accelerate the search.

Then, the algorithm iteratively applies a random change operator $R$ to the current configuration in order to obtain a modified configuration $\theta' = R(\theta)$.

The value of the modified configuration $f(\theta')$ is compared to the value of the current configuration $f(\theta)$, such that $\Delta f = f(\theta') - f(\theta)$. An acceptance function $P(\Delta f)$ is then used to determine whether to keep the current configuration or to accept the modified configuration.

In the original SA algorithm, the acceptance function is expressed using the Metropolis criterion and the Boltzmann distribution as shown in Equation 5.
$$P(\Delta f) = \begin{cases} 1 & \text{if } \Delta f < 0 \\ \exp\left(-\frac{\Delta f}{T}\right) & \text{otherwise} \end{cases} \tag{5}$$

With gradient descent or hill climbing search, only configurations that have a higher score are kept, while inferior configurations are systematically rejected. The problem with these approaches lies in the fact that complex functions often have many local minima and maxima, and the algorithm will likely converge on such a minimum or maximum.

Simulated Annealing, on the other hand, has a probability of accepting inferior configurations as a strategy to escape local minima and maxima. The initial temperature is chosen so that at the beginning of the exploration, inferior configurations are almost always kept. After each iteration of the simulation, the temperature decreases and with it the probability of keeping an inferior configuration. As such, the algorithm starts with a wide and coarse exploration of the search space and then gradually converges on a fine-grained exploration of a specific area around a minimum or maximum (hopefully close to the global minimum or maximum).

The convergence criterion for the algorithm can either be when the temperature reaches a threshold called the freezing temperature or when the system is deemed to have stabilized (usually after a number of cycles during which improvements to the objective function are marginal or non-existent).

6.2 Cooling schedule and candidate generation

In SA, the way the temperature decreases is key to the performance of the algorithm. A logarithmic cooling schedule ($T_n = \frac{T_0}{\ln(n)}$) is theoretically sufficient in order to guarantee convergence (Ingber, 1996); however, it is too slow for many problems. A more common cooling schedule is geometric, $T_n = T_0 \cdot \alpha^n$, where $\alpha$ is the cooling factor. However, on the one hand the slope of the cooling curve will be very steep even for $\alpha$ close to one and on the other hand, the curve itself is convex. This means that the slope of the curve will get steep very fast, thus leading to a fast convergence at the cost of optimality.
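The acceptance rule of Equation 5 and the two cooling schedules just described can be sketched as follows. This is an illustrative sketch only: the function names, the initial temperature of 1.0 and the cooling factor of 0.95 are assumptions chosen for the example.

```python
import math


def acceptance_probability(delta_f, temperature):
    """Equation 5: accept with probability 1 when delta_f < 0, else exp(-delta_f / T)."""
    if delta_f < 0:
        return 1.0
    return math.exp(-delta_f / temperature)


def geometric_temperature(t0, alpha, n):
    """Geometric schedule: T_n = T_0 * alpha**n."""
    return t0 * alpha ** n


def logarithmic_temperature(t0, n):
    """Logarithmic schedule: T_n = T_0 / ln(n), defined here for n >= 2."""
    return t0 / math.log(n)


# Compare how fast the two schedules cool and what this does to the
# probability of accepting a configuration that is worse by delta_f = 0.05.
for n in (2, 10, 50, 100):
    t_geo = geometric_temperature(1.0, 0.95, n)
    t_log = logarithmic_temperature(1.0, n)
    print(n,
          f"geometric T={t_geo:.4f} P={acceptance_probability(0.05, t_geo):.4f}",
          f"logarithmic T={t_log:.4f} P={acceptance_probability(0.05, t_log):.4f}")
```

The printed values show the trade-off discussed above: the geometric schedule reaches low temperatures far sooner than the logarithmic one, so the search stops accepting inferior configurations much earlier.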
For this reason, (Ingber, 1996) proposes in his Adaptive Simulated Annealing algorithm to use an inverse exponential cooling schedule that decreases quickly at the start and then slows down. Such a schedule corresponds much better to combinatorial optimisation algorithms, where we want to start with a broad search (accepting many inferior configurations) and then focus on a specific area of the search space.

(Ingber, 1996) proposed the expression in Equations 6 and 7, which we adopted for our implementation. Here, $m$ and $n$ are two scaling factors that can be used to tune the schedule for specific problems, and $D$ is the dimension of the parameter space, that is, $D = |\theta|$.

$$T_k = T_0 \cdot \exp\left(-c_i \cdot k^{\frac{1}{|\theta|}}\right) \tag{6}$$

$$c_i = m \cdot \exp\left(-\frac{n}{D}\right) \tag{7}$$

Furthermore, the traditional SA generates a new candidate configuration by uniformly selecting an index of the vector and by applying a random uniform change to the value, over the full range of possible values for that index. (Ingber, 1996) argues that once the system is colder, there is no point in exploring the full range. Indeed, exploring an increasingly smaller neighbourhood as the temperature lowers and the system converges on a specific area of the search space allows an increase in computational efficiency while having no negative effects on the end result.

Equation 8 presents the probability distribution used for the candidate generation for each parameter, where $u \in U(0, 1)$ is a uniform random number between 0 and 1.

$$y = \operatorname{sign}\left(u - \frac{1}{2}\right) T_k \left[\left(1 + \frac{1}{T_k}\right)^{|2u - 1|} - 1\right] \tag{8}$$

From there, the new value of a parameter $\theta^{(i)}$ of $\theta$ at iteration $k$ can be determined with the expression in Equation 9 for a real parameter and Equation 10 for an integer parameter.

$$\theta^{(i)}_{k+1} = \theta^{(i)}_k + y\left(\theta^{(i)}_{max} - \theta^{(i)}_{min}\right) \tag{9}$$

$$\theta^{(i)}_{k+1} = \theta^{(i)}_k + \left\lfloor y\left(\theta^{(i)}_{max} - \theta^{(i)}_{min}\right) \right\rfloor \tag{10}$$

6.3 Handling uncertainty

As mentioned in Section 5, when dealing with an uncertain objective function, one needs to add an extra loop in the process, in order to generate samples from the evaluation of the objective function $f$ and then to use the expected value as a global objective function.
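As an aside on Section 6.2, the cooling schedule of Equations 6 and 7 and the candidate generation of Equations 8 to 10 can be sketched as below. Several points are assumptions made for this sketch and are not stated in the text: $D$ is taken to be the number of parameters $|\theta|$, the values passed for `m`, `n` and the temperatures are merely illustrative, and candidates that fall outside the allowed range are clamped to its bounds.

```python
import math
import random


def asa_temperature(t0, k, dim, m, n):
    """Equations 6 and 7: T_k = T_0 * exp(-c * k**(1/dim)) with c = m * exp(-n / dim).

    dim plays the role of both |theta| and D (an assumption of this sketch).
    """
    c = m * math.exp(-n / dim)
    return t0 * math.exp(-c * k ** (1.0 / dim))


def asa_step(u, temperature):
    """Equation 8: relative step y in [-1, 1] from a uniform draw u in [0, 1]."""
    sign = 1.0 if u >= 0.5 else -1.0
    return sign * temperature * ((1.0 + 1.0 / temperature) ** abs(2.0 * u - 1.0) - 1.0)


def next_value(current, lower, upper, temperature, rng, integer=False):
    """Equations 9 and 10: move one parameter by y times its range.

    Clamping to [lower, upper] is an assumption; the text does not say how
    out-of-range candidates are handled.
    """
    delta = asa_step(rng.random(), temperature) * (upper - lower)
    if integer:
        delta = math.floor(delta)
    return min(upper, max(lower, current + delta))


rng = random.Random(1)
for k in (0, 1, 10, 100):
    t_k = asa_temperature(t0=0.05, k=k, dim=7, m=1.0, n=-3.0)
    omega = next_value(10, 5, 30, t_k, rng, integer=True)     # an integer parameter
    delta_v = next_value(0.9, 0.0, 1.0, t_k, rng)              # a real parameter
    print(f"k={k:3d} T={t_k:.5f} candidate omega={omega} delta_V={delta_v:.3f}")
```

Because the step size in Equation 8 shrinks with the temperature, late iterations mostly propose values close to the current one, which is the narrowing neighbourhood described above.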
For a given $\theta$, we can build a list of samples $L_s(\theta) = \bigcup_{i=1}^{N} \{f(\theta)\}$ and then consider a modified objective function $\Phi_E$ that uses the expected value of the sample distribution: $\Phi_E(\theta) = \bar{x}_\theta$, where $\bar{x}_\theta = \frac{1}{N} \cdot \sum_{x_i \in L_s(\theta)} x_i$. Then, $\Phi_E$ can be used directly instead of $f$ as the global objective function.

6.4 Adaptive Penalty term

In order to deal with the variability of the distributions, (Painton and Diwekar, 1995) propose to add to the expression of $\Phi_E$ a penalty factor based on the scaled standard deviation $\sigma_\theta = \sqrt{\frac{\sum_{x_i \in L_s(\theta)} (x_i - \bar{x}_\theta)^2}{N}}$, weighted by a quantity that depends on the temperature. More specifically, their expression for $\Phi_E$ is:

$$\Phi_E(\theta) = \bar{x}_\theta + b(T) \cdot \frac{2 \cdot \sigma_\theta}{\sqrt{N}} \tag{11}$$

$$b(T) = \frac{b_0}{k^T} \tag{12}$$

where $b_0$ is a small constant, $k$ is a constant that represents the rate of increase of $b(T)$ and $T$ is the temperature of the system.

Since the temperature is high at the beginning, the search ranges widely over the search space. Accepting non-significant changes then has little adverse effect, while it leads to fewer evaluations of the objective function (which is costly to compute in principle).

6.5 Significance testing

In this article, we take a slightly different approach, by integrating a standard statistical test directly in the Metropolis criterion, and by using $\Phi_E$ as only the expected value. Not only are such tests widely available in existing libraries, but we also avoid the hassle of adding two more constants to the estimation algorithm.

Depending on the properties of the distribution, different tests may be applied. The most widely used test for significance is the unpaired two-sample Student t-test. However, three important prerequisites must be met before the test can be applied.

The distributions of samples in the groups must be normal, the samples in the groups must be independent, and the variances of the groups must be equal. In the case of simulated annealing under uncertainty, there is no guarantee that the distributions all keep these properties. Thus, in order to use the t-test (Press et al., 1988), it would be required to test the hypotheses for every new re-sampling of the objective function. This could be achieved by applying the Shapiro–Wilk test (Shapiro and Wilk, 1965) to test the normality assumption and Levene's test (Levene, 1960) for the equality of variances.
In case the equality of variances criterion is not met, there is an alternative, Welch's t-test (Welch, 1947), that does not make that assumption. However, if the distribution is not normal, there is no other choice but to apply a non-parametric test.

A much simpler alternative is to directly apply a non-parametric significance test, such as the Mann–Whitney U unpaired test, as it makes no assumption about the distribution of the samples. The only requirement is that the values are ordinal; they need not be regularly spaced.

6.5.1 Modified Metropolis criterion

Since the objective function returns a numerical value, the Mann–Whitney U can be applied directly in the Metropolis criterion as follows, with $\Delta\Phi_E = \Phi_E(\theta') - \Phi_E(\theta)$:

$$P(\Delta\Phi_E) = \begin{cases} 1 & \text{if } \Delta\Phi_E < 0 \text{ and } p_\theta < \alpha \\ e^{-\frac{\Delta\Phi_E}{T}} & \text{otherwise} \end{cases} \tag{13}$$

The test is based on the formulation of a null hypothesis that states that the distributions of the two groups of samples are equal. Here, $p_\theta$ is the p-value resulting from the test (the probability of observing a difference at least as extreme as the measured one if the null hypothesis were true) and $\alpha$ a threshold. If the p-value is below the threshold, then we reject the null hypothesis and conclude that the distributions must be different.

For example, with $\alpha = 0.05$ the null hypothesis will be rejected only if $p_\theta < 0.05$, in other words if $(1 - p_\theta) > (1 - \alpha)$; this keeps the probability of a type I error (rejecting a null hypothesis that is actually true) at or below 5%.

The Mann–Whitney U (Mann and Whitney, 1947) is based on the comparison of the ranks of the samples within each group. The first step toward calculating the p-value is to group all the samples together, to sort them in ascending order, and to assign a rank to each value. From the respective ranks of the samples, we can compute $R_\theta$, the sum of ranks for $L_s(\theta)$, and $R_{\theta'}$, the sum of ranks for $L_s(\theta')$. Here, the number of samples for both groups is always equal to $N$.

6.5.2 Computing the p-value

From the sum of ranks, the U statistic can be computed as

$$U = \min\left(R_\theta - \frac{N(N+1)}{2};\ R_{\theta'} - \frac{N(N+1)}{2}\right) = \min\left(R_\theta;\ R_{\theta'}\right) - \frac{N(N+1)}{2}.$$
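The ranking step and the U statistic described above can be sketched in a few lines. The sketch uses the standard form $U = \min(R_\theta; R_{\theta'}) - N(N+1)/2$ for two groups of equal size $N$; the function names are hypothetical and the handling of ties (tied values share their average rank) is an assumption, since the text does not discuss ties.

```python
def rank_sums(samples_a, samples_b):
    """Pool both groups, rank in ascending order, and sum the ranks per group.

    Tied values receive the average of the ranks they span (an assumption).
    """
    pooled = sorted([(x, 0) for x in samples_a] + [(x, 1) for x in samples_b])
    sums = [0.0, 0.0]
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0    # mean of the 1-based ranks i+1 .. j
        for _, group in pooled[i:j]:
            sums[group] += average_rank
        i = j
    return sums[0], sums[1]


def u_statistic(samples_a, samples_b):
    """U = min(R_a, R_b) - N(N+1)/2 for two groups of equal size N."""
    n = len(samples_a)
    if n != len(samples_b):
        raise ValueError("this sketch assumes both groups have the same size N")
    r_a, r_b = rank_sums(samples_a, samples_b)
    return min(r_a, r_b) - n * (n + 1) / 2.0


# Two made-up groups of F-scores: complete separation gives the smallest U.
separated = u_statistic([0.71, 0.72, 0.73], [0.74, 0.75, 0.76])
interleaved = u_statistic([0.71, 0.73, 0.75], [0.72, 0.74, 0.76])
print(separated, interleaved)
```

A small U means that one group occupies mostly the low ranks, which is evidence that the two sample distributions differ; a value near $N^2/2$ means the groups are thoroughly interleaved.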
For a sufficiently large $N$, the U statistic is approximately normally distributed, and after applying a standardization by calculating the z statistic, the p-value can be retrieved from the tails of the standard normal distribution, where $\Phi$ denotes its cumulative distribution function:

$$z = \frac{U - \bar{U}}{\sigma_U}, \qquad \bar{U} = \frac{N^2}{2}, \qquad \sigma_U = \sqrt{\frac{N^2 (2N + 1)}{12}}, \qquad p_\theta = 2 \cdot \left(1 - \Phi(|z|)\right) \tag{14}$$

Despite the apparent complexity of the process of calculating and testing the p-value, the Mann–Whitney U test is rather standard and available in many libraries for various programming languages, including Java, which we used for our implementation.

7 Experimental protocol

To evaluate the variant of simulated annealing considered for the estimation of parameters in an uncertain setting, we used the Semeval 2007 Task 7 annotated corpus. We split the corpus into a training and a test set in order to evaluate the parameter search with the Ant Colony Algorithm for WSD originally proposed in (Schwab and Guillaume, 2011) and then further improved upon in (Schwab et al., 2012). Furthermore, we wanted to determine exactly how much data was necessary to obtain a good set of parameter values.

Normally, in order to obtain a reliable training and thus reliable parameters, it would be necessary to have a somewhat larger test set and to split it into several parts by randomly aggregating words to perform a k-fold cross validation. However, the Ant Colony algorithm is meant to work on a full text as it exploits its structure. Notably, the order of the words and the continuity of sentences are very important. Making cross-validation sets randomly would thus only create very noisy training data. Even just picking sentences at random still compromises the global coherence of the solutions found, and thus the parameters.

This is why we only considered training on successively larger portions of our training corpus. More specifically, the experiments were carried out with 6, 12, 18, 24, 30 and 36 sentences, taken in order from the first sentences to the full text in multiples of 6.
Furthermore, it is apparent that the nature of the training text itself has a large influence on the resulting parameters found. It is necessary to use a text that is as general as possible in terms of the senses it uses and of its theme, as well as lexically varied. Even though k-fold cross validation cannot be applied directly in our experimental setting, we have decided to perform the experiment by taking, in turn, each text as a training corpus, with the remainder as a test corpus. Thus, we may also study the effect of the text itself and its characteristics (type of text, average polysemy, etc.).

The implementation of this uncertain simulated annealing runs the ant colony algorithm and passes the results through to the scorer to obtain the F-score values. For the stochastic loop, for every new candidate configuration, the ant colony algorithm is systematically run 50 times in order to guarantee a sufficient level of statistical significance. The search was left to converge for each successive number of sentences and then each resulting set of parameter values was tested over 100 executions of the ant colony algorithm on the test corpus in order to ascertain the quality of the parameters.

The results for each number of sentences are evaluated in relation to both the classical Lesk baseline and the baseline obtained with the parameters before the estimation (BL). We used standard statistical tests to guarantee the statistical significance of the comparisons.

8 Experiments and Results

8.1 Initial parameters for the simulated annealing estimator

Before starting the experiment, it is important to select the parameters of the simulated annealing-based algorithm we are using. In total there are 4 parameters: $k_{stb}$, the number of cycles with no accepted new configurations before the system is considered to have converged; $T_0$, the initial temperature; and $m$ and $n$, the parameters of the cooling schedule. We are aware that the manual estimation of the parameters for the simulated annealing algorithm contradicts the general orientation of this paper; however, one of the main reasons simulated annealing was chosen is that its behaviour is well known and the parameters can be set heuristically quite effectively.
Indeed, a vast body of work covers the algorithm and can be used to guide the parameter estimation. The main idea is to use SA on algorithms with more complex and/or not so well understood parameter models. In fact, this process could be considered as a form of bootstrapping.

For the number of cycles before convergence, 20 is a good compromise between efficiency and optimality. As for $n$ and $m$, we wanted to have a cooling schedule with the highest possible acceptance probability for subsequent iterations, while approaching 0 after 100 iterations. To this effect, after various experiments, we selected the values $n = -3$ and $m = 1$.

[Figure 2: Classical optimisation model versus stochastic optimisation model. Panels: (a) Choice of the initial temperature; (b) Cooling schedule acceptance probabilities. Axes: probability of acceptance, plotted against cycles and against initial temperature.]

However, the most important parameter is the initial temperature, as it needs to be set to a certain level of probability of acceptance. Of course, since the probability of acceptance depends on the difference between successive scores, the first step toward selecting the initial temperature is to measure the average difference between successive scores. Therefore, it is necessary to run the algorithm while only sampling parameter configurations, without making any decisions about acceptance or convergence. Thus, we ran the algorithm this way and obtained an average F-score delta of 0.57% with a standard deviation of 0.0047.

It is described in the literature that a good value for the initial acceptance probability is 0.8. Thus, to
