
Parameter estimation under uncertainty with Simulated Annealing applied to an ant colony based PDF



Parameter estimation under uncertainty with Simulated Annealing applied to an ant colony based probabilistic WSD algorithm

Andon TCHECHMEDJIEV (1), Didier SCHWAB (1), Jérôme GOULIAN (1), Gilles SÉRASSET (1)
(1) Univ. Grenoble-Alpes, LIG-GETALP, France
{andon.tchechmedjiev, didier.schwab, jerome.goulian, gilles.serasset}@imag.fr

ABSTRACT
In this article we propose a method based on simulated annealing for the parameter estimation of probabilistic algorithms, where the solution provided by the algorithm can vary from execution to execution. Such algorithms are often very interesting for solving complex combinatorial problems, yet they involve many parameters that can be difficult to estimate manually due to their randomized output. We applied and evaluated this parameter estimation method on an Ant Colony Algorithm for WSD. For the evaluation, we used the Semeval 2007 Task 7 corpus. We split the corpus and took in turn one text as a training corpus and the four remaining texts as a test corpus. We tuned the parameters with an increasing number of sentences from the training text in order to estimate the quantity of data necessary to obtain an efficient and general set of parameters. We found that the results greatly depend on the nature of the text: even a very small number of training sentences can lead to good results if the text has the right properties.

RÉSUMÉ (translated from French)
Simulated annealing based parameter estimation under uncertainty applied to a probabilistic ant colony algorithm.
We propose a method based on simulated annealing for estimating the parameters of probabilistic algorithms whose generated solutions vary. These algorithms are often very useful for solving complex combinatorial problems, but they require many parameters that can be difficult to estimate manually because of the random nature of the solutions. More specifically, we apply and evaluate this method to estimate the parameters of an Ant Colony Algorithm for word sense disambiguation. For the evaluation, we used the Semeval 2007 Task 7 corpus. We took each text in turn as the training corpus and the four others as the test corpus. We estimate the parameters for an increasing number of sentences to determine how much data is needed to obtain a general and effective set of parameter values. We conclude that the quality of the results depends on the nature of the texts. Even small quantities of sentences can suffice to obtain good results, provided that the text has the right properties.

KEYWORDS: Word Sense Disambiguation, Parameter Estimation, Simulated Annealing, Uncertainty, Stochastic Algorithms.

Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology, pages 109–124, COLING 2012, Mumbai, December 2012.

1 Introduction
Word Sense Disambiguation (WSD) is a very difficult yet central task in Natural Language Processing (NLP) that pertains to the labelling of the words of a text with the senses that most closely match the context. Numerous approaches exist to tackle WSD, yet many of these methods have in common a sizeable complexity, reflected by a number of parameters that need to be tuned in order to obtain the best performance. Depending on the number of parameters, it can be very difficult for a human to directly estimate the optimal parameters. Even then, it remains a question of guesswork with no guarantee of success.
This issue is all the more salient with algorithms and models that typically involve many parameters upon which the results greatly depend. Such algorithms for WSD include Neural Networks (Veronis and Ide, 1990), Simulated Annealing (SA) (Cowie et al., 1992), Genetic Algorithms (GA) (Gelbukh et al., 2003) and Ant Colony Algorithms (ACA) (Schwab and Guillaume, 2011), and many other machine learning methods.

In such approaches, the values of the parameters are paramount. They are what make the algorithms work for specific applications such as WSD. Yet, the authors of the aforementioned articles give little insight as to how the parameters are determined. They only provide the "best" values. Although a trial and error approach is a good start when devising an algorithm or tailoring it to a new application, it makes the adaptation and the scalability of such systems an issue. Naturally, for well known and understood algorithms such as Simulated Annealing, there are some theoretical criteria to select the parameters. However, the process has to be repeated for every new setting and can be tedious to perform.

Furthermore, even with few parameters, evaluating all possible parameter configurations is a prohibitively long process. For example, in the case of a stochastic algorithm with 10 parameters that have 20 possible values each, assuming the algorithm needs to be run 100 times (due to its stochastic nature) and that one execution takes on average 30 seconds, the time necessary to evaluate all parameter combinations is $20^{10} \times 100 \times 30 \approx 3 \times 10^{16}$ s, which is almost 1 billion years! Thus, the advantages of considering automated or adaptive heuristic approaches to the estimation of parameters are clear.

In the case of a deterministic algorithm, many combinatorial optimisation methods can be used with little effort. However, due to the complexity of WSD, many approaches are statistical or probabilistic in nature and the use of classical combinatorial optimisation heuristics becomes impossible. Therefore, we adapt the classical simulated annealing algorithm to work for an uncertain objective function based on the ideas of (Painton and Diwekar, 1995), by using standard non-parametric statistical significance tests.
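The exhaustive-search estimate given in the introduction (10 parameters, 20 values each, 100 runs per configuration, 30 seconds per run) can be checked with a few lines of arithmetic. This is only an illustrative sketch; the function name and the 365.25-day year are assumptions made here, not part of the original article.

```python
# Back-of-the-envelope cost of an exhaustive parameter search.
# The figures are those quoted in the introduction; the helper name is hypothetical.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def exhaustive_search_seconds(n_params, values_per_param, runs_per_config, seconds_per_run):
    """Total time needed to run every configuration the requested number of times."""
    configurations = values_per_param ** n_params
    return configurations * runs_per_config * seconds_per_run


total_seconds = exhaustive_search_seconds(10, 20, 100, 30)
total_years = total_seconds / SECONDS_PER_YEAR
print(f"{total_seconds:.2e} s, about {total_years:.2e} years")
```

The result lands just under one billion years, which is the order of magnitude the text relies on to motivate a heuristic search.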
First, we present the tools and the metrics for the evaluation of WSD, followed by the description of the Ant Colony Algorithm considered in this article. Then, we briefly survey other approaches for parameter estimation under uncertainty and express the problem formally. Subsequently, we present the general formulation of simulated annealing and different aspects pertaining to it, and then present its adaptation to an uncertain objective function. We then present and analyse our experimental protocol and the results of the experiments. Finally, we conclude on the results and draw some perspectives for future research.

2 Evaluation of WSD Systems
Precision and Recall constitute the standard evaluation metrics for WSD systems.

Precision represents the number of correctly disambiguated words among all the attempted words, while Recall represents the number of correctly disambiguated words among all the ambiguous words in the document. It is customary to compute the F1 score between Precision and Recall as $F_1 = \frac{2 \cdot P \cdot R}{P + R}$, in order to have a single evaluation metric that equally captures the information conveyed by both Precision and Recall.

In this work in particular we use the Semeval 2007 Task 7 all-words coarse-grained corpus for the evaluation of the quality of the disambiguation. The all-words task provides a corpus of texts, where in each text all ambiguous words (T) need to be disambiguated by tagging them with Wordnet 2.1 sense keys. An algorithm must correctly tag as many ambiguous words as possible (TP) depending on the context. A sense is considered correct if the sense key assigned to it matches the sense provided in the gold standard or any of its subsumed senses (hence coarse-grained). If we write the number of incorrectly assigned senses as (TN), then, for this particular task, precision can be expressed as $P = \frac{TP}{TP + TN}$ and recall as $R = \frac{TP}{T}$.

The choice of the all-words task in particular is based on the functioning of the algorithm, which is better adapted to a globally coherent text than to independent sentences or words. As such, the metrics available for the evaluation are Precision, Recall and F-measure, which leads to some important restrictions on the methods or criteria available for parameter estimation, as will be detailed in the subsequent sections.
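The three metrics of this section can be computed directly from the counts defined above. The sketch below follows the section's notation (TP correct tags, TN incorrect tags, T ambiguous words); the function name and the example counts are hypothetical and do not come from the Semeval corpus.

```python
def wsd_scores(correct, incorrect, total_ambiguous):
    """Return (precision, recall, F1) using the section's definitions.

    correct          -- TP, correctly tagged ambiguous words
    incorrect        -- TN in the article's notation, incorrectly tagged words
    total_ambiguous  -- T, all ambiguous words in the text
    """
    attempted = correct + incorrect
    precision = correct / attempted if attempted else 0.0
    recall = correct / total_ambiguous if total_ambiguous else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 80 words attempted out of 100, 60 of them correct.
precision, recall, f1 = wsd_scores(correct=60, incorrect=20, total_ambiguous=100)
print(precision, recall, f1)
```

When every ambiguous word is attempted (TP + TN = T), precision, recall and F1 coincide, which is the situation of a system with full annotation coverage.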
3 Context: an ant colony algorithm for WSD
The ant colony algorithm presented by (Schwab and Guillaume, 2011) and (Schwab et al., 2012), on which our work focuses, offers some interesting properties regarding the quality of the solutions, which are on par with state-of-the-art knowledge-rich WSD systems, combined with a fast execution time compared to the latter systems. Furthermore, it is also well suited to working on full texts at a time with a full annotation coverage.

Moreover, its stochastic nature leads to the generation of solutions with variable precision scores, which is usually not a very sought after property in algorithms. However, this variability enables the use of voting strategies that lead to results approaching supervised WSD systems and overcome the elusive (for unsupervised and knowledge-based systems) First Sense Baseline. Due to the quick execution time (in the 60 s to 65 s range), voting strategies can directly be applied while still leading to cumulative execution times for several executions that are lower than those of other systems.

However, those qualities are also a major disadvantage when it comes to selecting and tuning the parameters of the algorithm. Not only are there relatively many, but they can only be determined manually to some extent through painstaking trial and error modifications, with no guarantees of optimality. In addition, the stochastic nature of the algorithm makes it rather impractical to determine whether the improvement resulting from a given parameter change is simply due to random variations of the solution distribution or really to significant variations.

3.1 Parameter model
Let us now review the various parameters of the system, their typical value ranges and meaningful increments. The parameters are summarized in Table 2. There are seven parameters in total, five of which are discrete and represented by integers and two of which are continuous and represented as positive real numbers between zero and one.

Given the number of parameters, even with some knowledge about how the algorithm works, one can at best choose sensible parameter value ranges in order to limit the combinatorial explosion.
Initially, before the estimation method proposed in this paper, we determined the values of some parameters by making iterative and independent changes. Of course such an approach is limited due to the fact that it is a heuristic based on the assumption that the parameters of the system are independent. For ACA the opposite is in fact true, as it is with other methods.

The values obtained through this process were ω = 10, E_a = 1, E_max = 8, E_0 = 10, δ_V = 0.9, δ_φ = 0.9, L_V = 90 and yielded results around 75% on the Semeval 2007 Task 7 corpus (Schwab and Guillaume, 2011).

Furthermore, given the number of parameters and their value ranges, an exhaustive enumeration is intractable. If we calculate the number of combinations (assuming 0.01 steps for continuous variables), we obtain $60 \times 60 \times 100 \times 55 \times 25 \times 35 \times 100 = 17325 \cdot 10^{8}$ combinations. Knowing that due to the stochastic nature we may need to make at least 50 or 100 executions, we get to $17325 \cdot 10^{9}$ combinations.

Notation | Description | Value | Exploration granularity
E_a | Energy taken by an ant when it arrives on a node | 1–60 | 1
E_max | Maximum quantity of energy an ant can carry | 1–60 | 1
δ_φ | Evaporation rate of the pheromone between two cycles | 0.0–1.0 | 0.01
E_0 | Initial quantity of energy on each node | 5–60 | 1
ω | Ant life-span | 5–30 (cycles) | 1
L_V | Odour vector length | 20–200 | 5
δ_V | Ratio of odour vector components (words) deposited by an ant when it arrives on a node | 0.0–1.0 | 0.01
Table 2: Parameters of the Ant Colony Algorithm and their typical value ranges

Assuming leaps in computational power or heavy parallelism that would make the algorithm take only one second to run, the search for the optimal parameters would still take 549372 years.

For this reason, we are interested in finding an automated way of estimating the optimal parameters. Of course, when dealing with linguistic data, there is no such thing as optimal. By optimal, we merely mean a set of parameters that yields results with F-scores as high as can be achieved.

4 Related work
In the context of this article, when considering a WSD algorithm evaluated as described in Section 2, the output of the evaluation of the algorithm is simply an F-score percentage.
In order to apply classical estimation theory approaches (Kay, 1993), we would need to either find a model of the posterior PDF, or to empirically estimate it. However, given the size of the search space and the cost of running the algorithm, this would be very much equivalent to making an exhaustive search. Furthermore, the relationship between the values of the parameters and the resulting scores is non-linear. However, there are some models that attempt to provide a probability distribution for precision, recall, and F-measure for information retrieval that could potentially be applied to WSD (Goutte and Gaussier, 2005).

In fact, it is much simpler to treat the estimation as a combinatorial optimisation problem, by attempting to simply maximize the F-score value. In this context, given that we want the ability to handle many parameters that can take many discrete values, we need to consider heuristics for an efficient estimation of complex non-linear multivariate functions.

Popular choices for parameter estimation are Genetic Algorithms (GA) or Simulated Annealing (SA), among others. For instance, GAs have been widely applied, either in a general parameter estimation context, such as in (Sharman and McClurkin, 1989), (Yao and Sethares, 1994) or (Paterakis et al., 1998); or for specific application domains such as robotics (Bolhasani, 2004), applied statistics (Pan et al., 1995), meteorology (Lee et al., 2006), chemistry (Katare et al., 2004) and many others.

For WSD, (Daelemans et al., 2003) and (Decadt et al., 2004) have also proposed to use GAs for joint parameter estimation and feature selection. Although joint optimisation is very interesting, it is application and task specific, and would be difficult to implement in a probabilistic setting.

All the techniques mentioned above are meant to optimise a deterministic objective function. When the objective function itself is uncertain or noisy, instead of a single value, one has a set of observation samples with different values that have to be dealt with. The main issues are how to know what value to consider for the maximisation and how to take the variability into account to avoid effects due to random chance.
One model of stochastic simulated annealing has been proposed in (Painton and Diwekar, 1995) and then further extended in (Kim and Diwekar, 2002), where the evaluation function is sampled several times, and the expected value is incorporated into a modified objective function, with the variance acting as a penalty term.

5 Problem formalization
Before describing the parameter estimation algorithm, it is important to formulate in a more formal and generic way the parameter model of any WSD algorithm with a randomized output.

Let $\vec{\theta} = [\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_n] \in \Theta$ be the parameter model of a WSD algorithm, where $\Theta = \Theta_1 \times \Theta_2 \times \ldots \times \Theta_i \times \ldots \times \Theta_n$ and each $\Theta_i$ is a finite discrete set of integers or a bounded real range.

For example, in the case of the parameter model of the ant colony algorithm, the general form of θ would be $\theta = \{\omega, E_a, E_{max}, E_0, \delta_V, \delta_\varphi, L_V\}$. The initial set of parameters would be an instance written as $\theta_I = \{10, 1, 8, 10, 0.9, 0.9, 90\}$.

We can represent a deterministic WSD system with a function $WSD_C$ that, for a given corpus C and a parameter vector $\vec{\theta}$ (see footnote 1), returns an F-score value between 0 and 1:

$$WSD_C : \Theta \rightarrow \{x \in \mathbb{R} \mid 0 \leq x \leq 1\} \qquad (1)$$
$$\theta \mapsto F_{score} \qquad (2)$$

Thus, the problem of finding the best θ, $\theta^*$, can be formalized as follows:

$$\theta^* = \underset{\theta \in \Theta}{\operatorname{argmax}}\; WSD_C(\theta) \qquad (3)$$

In other words, if we define a sequence $\theta_n$ that enumerates possible parameter combinations in Θ, the problem can be expressed as:

$$\theta_n^* = \begin{cases} \theta_n & \text{if } WSD_C(\theta_n) > WSD_C(\theta_{n-1}) \\ \theta_{n-1} & \text{otherwise} \end{cases} \qquad (4)$$

However, when the WSD system is probabilistic, there isn't a unique F-score given for a $\theta_n$, but every evaluation of $WSD_C$ may potentially return a different F-score value. Thus, $WSD_C$ follows an unknown probability distribution X that depends on $\theta_n$: $WSD_C(\theta_n) \sim X(\theta_n)$.

A sensible approach in this case would be to follow the model (Fig. 1b) proposed by (Painton and Diwekar, 1995), and to evaluate the objective function $WSD_C$ several times and then consider the expected value as an estimator of the resulting distribution. However, even if the means are different, there is no guarantee the difference is not due to random chance.
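The idea of evaluating a randomized objective several times and comparing the means can be illustrated with a toy stand-in for WSD_C. Everything in this sketch is hypothetical: `noisy_wsd` is not the ant colony algorithm, its Gaussian noise model is an assumption, and the single parameter `delta` only serves to produce two nearby configurations.

```python
import random
import statistics


def noisy_wsd(theta, rng):
    """Hypothetical stand-in for WSD_C(theta): an F-score in [0, 1] that varies per run."""
    base = 0.70 + 0.05 * theta["delta"]          # assumed deterministic part
    return min(1.0, max(0.0, rng.gauss(base, 0.01)))  # assumed Gaussian noise


def sample_scores(theta, n_runs, rng):
    """Evaluate the stochastic objective n_runs times for one configuration."""
    return [noisy_wsd(theta, rng) for _ in range(n_runs)]


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
theta_a = {"delta": 0.90}
theta_b = {"delta": 0.92}
scores_a = sample_scores(theta_a, 50, rng)
scores_b = sample_scores(theta_b, 50, rng)
mean_a, mean_b = statistics.mean(scores_a), statistics.mean(scores_b)
print(f"mean A = {mean_a:.4f} (sd {statistics.stdev(scores_a):.4f})")
print(f"mean B = {mean_b:.4f} (sd {statistics.stdev(scores_b):.4f})")
```

With a spread of the same order as the gap between the two underlying scores, the sample means alone do not say which configuration is better, which is the difficulty raised at the end of this section.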
The more general way to deal with this uncertainty about the statistical significance of the difference of expected values is to simply apply a standard statistical test in order to determine the significance level and decide whether or not to go ahead with the comparison. Another, simpler approach, the one retained in fact, is to integrate into the scoring function a penalty factor that directly takes into account the variability of the distributions. This problem is addressed in more detail in Section 6.3.

Footnote 1: From now on, the vector arrow will be omitted and $\vec{\theta}$ will be written as θ.

Figure 1: Classical optimisation model (a) versus stochastic optimisation model (b)

6 Simulated Annealing
6.1 General Presentation
Simulated annealing is a stochastic combinatorial optimisation technique originally proposed by (Kirkpatrick et al., 1983). Given a function f(θ) that takes a vector of discrete parameters θ, Simulated Annealing aims at finding the combination of parameters that maximises or minimises that function. (Kirkpatrick et al., 1983) draw a parallel between statistical mechanics and combinatorial optimisation and apply ideas from the Metropolis-Hastings algorithm (Metropolis et al., 1953) (simulation of the behaviour of many-body systems) to combinatorial optimisation.

The principle of Simulated Annealing is to first start from an initial configuration (combination) of the function parameters $\theta_0$ as well as an initial temperature $T_0$. The initial $\theta_0$ is often chosen randomly, but starting from an already good solution can accelerate the search.

Then, the algorithm iteratively applies a random change operator R to the current configuration in order to obtain a modified configuration θ' = R(θ).

The value of the modified configuration f(θ') is compared to the value of the current configuration f(θ), such that Δf = f(θ') − f(θ). An acceptance function P(Δf) is then used to determine whether to keep the current configuration or to accept the modified configuration.

In the original SA algorithm, the acceptance function is expressed using the Metropolis criterion and the Boltzmann distribution as shown in Equation 5.
$$P(\Delta f) = \begin{cases} 1 & \text{if } \Delta f < 0 \\ \exp\left(-\frac{\Delta f}{T}\right) & \text{otherwise} \end{cases} \qquad (5)$$

With gradient descent or hill climbing search, only configurations that have a higher score are kept, while inferior configurations are systematically rejected. The problem with these approaches lies in the fact that complex functions often have many local minima and maxima, and the algorithm will likely converge on such a minimum or maximum.

Simulated Annealing, on the other hand, has a probability of accepting inferior configurations as a strategy for escaping local minima or maxima. The initial temperature is chosen so that at the beginning of the exploration, inferior configurations are almost always kept. After each iteration of the simulation, the temperature decreases and with it the probability of keeping an inferior configuration. As such, the algorithm starts with a wide and coarse exploration of the search space and then gradually converges on a fine-grained exploration of a specific area around a minimum or maximum (hopefully close to the global minimum or maximum).

The convergence criterion for the algorithm can either be when the temperature reaches a threshold called the freezing temperature or when the system is deemed to have stabilized (usually after a number of cycles during which improvements to the objective function are marginal or non-existent).

6.2 Cooling schedule and candidate generation
In SA, the way the temperature decreases is key to the performance of the algorithm. A logarithmic cooling schedule ($T_n = \frac{T_0}{\ln(n)}$) is theoretically sufficient to guarantee convergence (Ingber, 1996); however, it is too slow for many problems. A more common cooling schedule is geometric, $T_n = T_0 \cdot \alpha^n$, where α is the cooling factor. However, on the one hand the slope of the cooling curve will be very steep even for α close to one, and on the other hand the curve itself is convex. This means that the slope of the curve will get steep very fast, thus leading to a fast convergence at the cost of optimality.
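The acceptance rule of Equation 5 and the two schedules just discussed can be compared numerically. This is a minimal sketch: the function names are chosen here for illustration, and the values T_0 = 1 and α = 0.95 are arbitrary assumptions rather than settings used in the article.

```python
import math


def metropolis_acceptance(delta_f, temperature):
    """Equation 5: accept with probability 1 if delta_f < 0, else exp(-delta_f / T)."""
    if delta_f < 0:
        return 1.0
    return math.exp(-delta_f / temperature)


def logarithmic_temperature(t0, n):
    """T_n = T_0 / ln(n); only meaningful for n >= 2."""
    return t0 / math.log(n)


def geometric_temperature(t0, alpha, n):
    """T_n = T_0 * alpha ** n, with cooling factor alpha in (0, 1)."""
    return t0 * alpha ** n


T0, ALPHA = 1.0, 0.95  # arbitrary illustrative values
for n in (2, 10, 50, 100):
    t_log = logarithmic_temperature(T0, n)
    t_geo = geometric_temperature(T0, ALPHA, n)
    # Acceptance probability of a configuration that is worse by 0.05.
    print(n, round(t_log, 4), round(t_geo, 4),
          round(metropolis_acceptance(0.05, t_log), 4),
          round(metropolis_acceptance(0.05, t_geo), 4))
```

Even with a cooling factor close to one, the geometric temperature falls far below the logarithmic one within a hundred cycles, so inferior configurations stop being accepted much earlier.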
For this reason, (Ingber, 1996) proposes in his Adaptive Simulated Annealing algorithm to use an inverse exponential cooling schedule that decreases quickly at the start and then slows down. Such a schedule corresponds much better to combinatorial optimisation algorithms, where we want to start with a broad search (accepting many inferior configurations) and then focus on a specific area of the search space.

(Ingber, 1996) proposed the expression in Equations 6 and 7, which we adopted for our implementation. Here, m and n are two scaling factors that can be used to tune the schedule for specific problems.

$$T_k = T_0 \cdot \exp\left(-c_i \cdot k^{\frac{1}{|\theta|}}\right) \qquad (6)$$
$$c_i = m \cdot \exp\left(-\frac{n}{D}\right) \qquad (7)$$

Furthermore, the traditional SA generates a new candidate configuration by uniformly selecting an index of the vector and by applying a random uniform change on the value, over the full range of possible values for that index. (Ingber, 1996) argues that once the system is colder, there is no point in exploring the full range. Indeed, exploring an increasingly smaller neighbourhood as the temperature lowers and the system converges on a specific area of the search space allows an increase in computational efficiency while having no negative effects on the end result.

Equation 8 presents the probability distribution used for the candidate generation for each parameter, where $u_i \in U(0, 1)$ is a uniform random number between 0 and 1.

$$y = \operatorname{sign}\left(u - \frac{1}{2}\right) T_k \left[\left(1 + \frac{1}{T_k}\right)^{|2u - 1|} - 1\right] \qquad (8)$$

From there, the new value of a parameter $\theta^{(i)}$ of θ at iteration k can be determined with the expression in Equation 9 for a real parameter and Equation 10 for an integer parameter.

$$\theta_{k+1}^{(i)} = \theta_k^{(i)} + y\left(\theta_{max}^{(i)} - \theta_{min}^{(i)}\right) \qquad (9)$$
$$\theta_{k+1}^{(i)} = \theta_k^{(i)} + \left\lfloor y\left(\theta_{max}^{(i)} - \theta_{min}^{(i)}\right) \right\rfloor \qquad (10)$$

6.3 Handling uncertainty
As mentioned in Section 5, when dealing with an uncertain objective function, one needs to add an extra loop in the process, in order to generate samples from the evaluation of the objective function f and then to use the expected value as a global objective function.
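Before the sampling loop is formalised, the cooling schedule and candidate generation of Section 6.2 (Equations 6 to 10) can be sketched as follows. Several choices here are assumptions that the text does not state: D is taken to be the number of parameters |θ|, out-of-range candidates are clamped to the parameter bounds, and all function names are hypothetical.

```python
import math
import random


def temperature(t0, k, m, n, dimension):
    """Equations 6 and 7, assuming D = |theta| = dimension."""
    c = m * math.exp(-n / dimension)
    return t0 * math.exp(-c * k ** (1.0 / dimension))


def generation_step(u, t_k):
    """Equation 8: maps u in [0, 1] to a step y in [-1, 1], concentrated near 0 when t_k is small."""
    sign = 1.0 if u >= 0.5 else -1.0
    return sign * t_k * ((1.0 + 1.0 / t_k) ** abs(2.0 * u - 1.0) - 1.0)


def next_value(current, lower, upper, t_k, rng, integer=False):
    """Equations 9 and 10, followed by clamping to [lower, upper] (clamping is an assumption)."""
    change = generation_step(rng.random(), t_k) * (upper - lower)
    if integer:
        change = math.floor(change)
    return min(upper, max(lower, current + change))


rng = random.Random(1)
for k in (0, 10, 100):
    t_k = temperature(t0=1.0, k=k, m=1.0, n=-3.0, dimension=7)
    ant_life_span = next_value(10, 5, 30, t_k, rng, integer=True)      # integer parameter
    evaporation = next_value(0.9, 0.0, 1.0, t_k, rng, integer=False)   # real parameter
    print(k, round(t_k, 4), ant_life_span, round(evaporation, 4))
```

As the temperature falls, the steps produced by `generation_step` shrink on average, so the search narrows around the current configuration instead of resampling the full range.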
Then, given θ, we can build a list of samples $L_s(\theta) = \bigcup_{i=0}^{N} \{f(\theta)\}$ and then consider a modified objective function $\Phi_E$ that uses the expected value of the sample distribution: $\Phi_E(\theta) = \bar{x}_\theta$, where $\bar{x}_\theta = \frac{1}{N} \cdot \sum_{x_i \in L_s(\theta)} x_i$. $\Phi_E$ can be used directly instead of f as the global objective function.

6.4 Adaptive Penalty term
In order to deal with the variability of the distributions, (Painton and Diwekar, 1995) propose to add to the expression of $\Phi_E$ a penalty factor based on the scaled standard deviation $\sigma_\theta = \sqrt{\frac{\sum_{x_i \in L_s(\theta)} (x_i - \bar{x}_\theta)^2}{N}}$, weighted by a quantity that depends on the temperature. More specifically, the expression for $\Phi_E$ is:

$$\Phi_E(\theta) = \bar{x}_\theta + b(T) \cdot \frac{2 \cdot \sigma_\theta}{\sqrt{N}} \qquad (11)$$
$$b(T) = \frac{b_0}{k^T} \qquad (12)$$

where $b_0$ is a small constant, k is a constant that represents the rate of increase of b(T) and T is the temperature of the system.

Given that the temperature is high at the beginning, the search ranges widely over the search space. Thus, accepting non-significant changes has little adverse effect, although it leads to fewer evaluations of the objective function (which is costly to compute in principle).

6.5 Significance testing
In this article, we take a slightly different approach, by integrating a standard statistical test directly in the Metropolis criterion, and by using $\Phi_E$ as only the expected value. Not only are such tests widely available in existing libraries, but we also avoid the hassle of adding two more constants to the estimation algorithm.

Depending on the properties of the distribution, different tests may be applied. The most widely used test for significance is the unpaired two-sample Student t-test. However, three important prerequisites must be met before the test can be applied.

The distributions of samples in the groups must be normal, the samples in the groups must be independent, and the variances of the groups must be equal. In the case of simulated annealing under uncertainty, there is no guarantee that the distributions all keep these properties. Thus, in order to use the t-test (Press et al., 1988), it would be required to test the hypotheses for every new re-sampling of the objective function. This could be achieved by applying the Shapiro–Wilk test (Shapiro and Wilk, 1965) to test the normality assumption and Levene's test (Levene, 1960) for the equality of variances.
In case the equality of variances criterion is not met, there is an alternative, Welch's t-test (Welch, 1947), that does not make that assumption. However, if the distribution is not normal, there is no other choice but to apply a non-parametric test.

A much simpler alternative is to directly apply a non-parametric significance test, such as the Mann–Whitney U unpaired test, as it makes no assumption about the distribution of the samples. The only requirement is that the values are ordinal and not regularly paced.

6.5.1 Modified Metropolis criterion
Since the objective function returns a numerical value, the Mann–Whitney U test can be applied directly in the Metropolis criterion as such:

$$P(\Delta \Phi_E) = \begin{cases} 1 & \text{if } \Delta \Phi_E < 0 \text{ and } p_\theta < \alpha \\ \exp\left(-\frac{\Delta \Phi_E}{T}\right) & \text{otherwise} \end{cases} \qquad (13)$$

The test is based on the formulation of a null hypothesis that states that the distributions of the two groups of samples are equal. Here, $p_\theta$ is the p-value resulting from the test (the probability, under the null hypothesis, of observing a difference at least as large as the one measured) and α is a threshold. If the p-value is below the threshold, then we reject the null hypothesis and conclude that the distributions must be different.

For example, with α = 0.05 the null hypothesis is rejected only when $p_\theta < 0.05$, which keeps the probability of a type I error below 5%: in other words, $(1 - p_\theta) > (1 - \alpha)$.

The Mann–Whitney U test (Mann and Whitney, 1947) is based on the comparison of the ranks of the samples within each group. The first step toward calculating the p-value is to group all the samples together, to sort them in ascending order, and to assign a rank to each value. From the respective ranks of the samples, we can compute $R_\theta$, the sum of ranks for $L_s(\theta)$, and $R_{\theta'}$, the sum of ranks for $L_s(\theta')$. Here, the numbers of samples for both groups are always equal to N.

6.5.2 Computing the p-value
From the sums of ranks, the U statistic can be computed as

$$U = \min\left(R_\theta - \frac{N(N+1)}{2};\; R_{\theta'} - \frac{N(N+1)}{2}\right) = \min\left(R_\theta;\; R_{\theta'}\right) - \frac{N(N+1)}{2}.$$
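The ranking step and the U statistic described above can be sketched in a few lines. This is an illustration rather than the authors' Java implementation: the function names are hypothetical, tied values are given their average rank (a common convention that the text does not discuss), and the offset subtracted from each rank sum is the standard N(N+1)/2 for two groups of N samples.

```python
def rank_sums(samples_a, samples_b):
    """Pool both groups, rank in ascending order (average rank for ties), sum ranks per group."""
    pooled = sorted([(x, 0) for x in samples_a] + [(x, 1) for x in samples_b])
    sums = [0.0, 0.0]
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for _, group in pooled[i:j]:
            sums[group] += average_rank
        i = j
    return sums[0], sums[1]


def u_statistic(samples_a, samples_b):
    """U = min(R_a, R_b) - N(N+1)/2 for two groups of equal size N."""
    n = len(samples_a)
    if n != len(samples_b):
        raise ValueError("both groups must contain N samples")
    r_a, r_b = rank_sums(samples_a, samples_b)
    return min(r_a, r_b) - n * (n + 1) / 2.0


# Hypothetical F-scores for a current and a candidate configuration.
current = [0.741, 0.748, 0.752, 0.745, 0.739]
candidate = [0.757, 0.761, 0.750, 0.766, 0.759]
print(rank_sums(current, candidate), u_statistic(current, candidate))
```

A U value close to 0 means the two groups barely overlap in rank, while a value near N²/2 means they are thoroughly interleaved.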
For a sufficiently large N, the U statistic is approximately normally distributed, and after applying a standardization by calculating the z statistic, the p-value can be retrieved from the probability density function of the normal distribution:

$$z = \frac{U - \bar{U}}{\sigma_U} \qquad \bar{U} = \frac{N^2}{2} \qquad \sigma_U = \sqrt{\frac{N^2(2N+1)}{12}} \qquad p_\theta = \mathcal{N}(z, \bar{U}, \sigma_U) = \frac{1}{\sqrt{2\pi} \cdot \sigma_U}\, e^{-\frac{1}{2}\left(\frac{z - \bar{U}}{\sigma_U}\right)^2} \qquad (14)$$

Despite the apparent complexity of the process of calculating and testing the p-value, the Mann–Whitney U test is rather standard and available in many libraries for various programming languages, including Java, which we used for our implementation.

7 Experimental protocol
To evaluate the variant of simulated annealing considered for the estimation of parameters in an uncertain setting, we used the Semeval 2007 Task 7 annotated corpus. We split the corpus into a training and a test set in order to evaluate the parameter search with the Ant Colony Algorithm for WSD originally proposed in (Schwab and Guillaume, 2011) and then further improved upon in (Schwab et al., 2012). Furthermore, we want to determine exactly how much data is necessary to obtain a good set of parameter values.

Normally, in order to obtain a reliable training and thus reliable parameters, it would be necessary to have a somewhat larger test set and to split it into several parts by randomly aggregating words to perform a k-fold cross validation. However, the Ant Colony algorithm is meant to work on a full text as it exploits its structure. Notably, the order of the words and the continuity of sentences are very important. Making cross-validation sets randomly would thus only create very noisy training data. Even just picking sentences at random still compromises the global coherence of the solutions found, and thus the parameters.

This is why we only considered training on successively larger portions of our training corpus. More specifically, the experiments were carried out with 6, 12, 18, 24, 30 and 36 sentences, taken in order from the first sentences to the full text in multiples of 6.
Furthermore, it is apparent that the nature of the training text itself has a strong influence on the resulting parameters found. It is necessary to use a text that is as general as possible in terms of the senses it uses and of its theme, as well as lexically varied. Even though k-fold cross validation cannot be applied directly in our experimental setting, we have decided to perform the experiment by taking, in turn, each text as a training corpus, with the remainder as a test corpus. Thus, we may also study the effect of the text itself and its characteristics (type of text, average polysemy, etc.).

The implementation of this uncertain simulated annealing runs the ant colony algorithm and passes the results through to the scorer to obtain the F-score values. For the stochastic loop, for every new candidate configuration, the ant colony algorithm is systematically run 50 times in order to guarantee a sufficient level of statistical significance. The search was left to converge for each successive number of sentences and then each resulting set of parameter values was tested over 100 executions of the ant colony algorithm on the test corpus in order to ascertain the quality of the parameters.

The results for each number of sentences are evaluated in relation to both the classical Lesk baseline and the baseline obtained with the parameters before the estimation (BL). We used standard statistical tests to guarantee the statistical significance of the comparisons.

8 Experiments and Results
8.1 Initial parameters for the simulated annealing estimator
Before starting the experiment, it is important to select the parameters of the simulated annealing-based algorithm we are using. In total there are 4 parameters: $k_{stb}$, the number of cycles with no accepted new configurations before the system is considered to have converged; $T_0$, the initial temperature; and m and n, the parameters of the cooling schedule. We are aware that the manual estimation of the parameters for the simulated annealing algorithm contradicts the general orientation of this paper; however, one of the main reasons simulated annealing was chosen is that its behaviour is well known and the parameters can be set heuristically quite effectively.
Indeed, a vast body of work covers the algorithm and can be used to guide the parameter estimation. The main idea is to use SA on algorithms with more complex and/or not so well understood parameter models. In fact, this process could be considered a form of bootstrapping.

For the number of cycles before convergence, 20 is a good compromise between efficiency and optimality. As for n and m, we wanted to have a cooling schedule with the highest possible probability for subsequent iterations, while approaching 0 after 100 iterations. To this effect, after various experiments, we selected the values n = −3 and m = 1.

Figure 2: (a) Choice of the initial temperature (probability of acceptance against initial temperature); (b) Cooling schedule acceptance probabilities (probability of acceptance against cycles)

However, the most important parameter is the initial temperature, as it needs to be set to a certain level of probability of acceptance. Of course, since the probability of acceptance depends on the difference between successive scores, the first step toward selecting the initial temperature is to measure the average difference between successive scores. Therefore, it is necessary to run the algorithm while only sampling parameter configurations, without making any decisions about acceptance or convergence. Thus, we ran the algorithm this way and obtained an average F-score delta of 0.57% with a standard deviation of 0.0047.

It is described in the literature that a good value for the initial acceptance probability is 0.8. Thus, to
