ENTROPY BASED CLASSIFIER COMBINATION FOR SENTENCE SEGMENTATION

M. Magimai.-Doss (1), D. Hakkani-Tür (1), Ö. Çetin (1), E. Shriberg (1,2), J. Fung (1), N. Mirghafori (1)

(1) International Computer Science Institute, Berkeley, CA, USA
(2) SRI International, Menlo Park, CA, USA

ABSTRACT

We describe recent extensions to our previous work, where we explored the use of individual classifiers, namely boosting and maximum entropy models, for sentence segmentation. In this paper we extend the set of classification methods with the support vector machine (SVM). We propose a new dynamic entropy-based classifier combination approach to combine these classifiers, and compare it with the traditional classifier combination techniques, namely voting, linear regression, and logistic regression. Furthermore, we also investigate the combination of hidden event language models with the output of the proposed classifier combination, and with the output of individual classifiers. Experimental studies conducted on the Mandarin TDT4 broadcast news database show that the SVM as an individual classifier improves over our previous best system. However, the proposed entropy-based classifier combination approach shows the best improvement in F-measure, 1% absolute, and the voting approach shows the best reduction in NIST error rate, 2.7% absolute, when compared to the previous best system.

Index Terms: sentence segmentation, classifier combination, entropy, lexical and prosodic features, hidden event language model

1. INTRODUCTION

Sentence segmentation aims to enrich the unstructured word sequence output of automatic speech recognition (ASR) systems with sentence boundaries, to ease further processing by both humans and machines. For instance, the enriched output of ASR (i.e., with sentence boundaries marked) can be used for downstream processing such as machine translation, question answering, or story/topic segmentation.

In the literature, the task of detecting sentence boundaries is treated as a two-class classification problem, and different classifiers have been investigated. [1] and [2] use a method that combines hidden Markov models (HMM) with N-gram language models containing words and the sentence boundary tags associated with them [3]. This method was later extended with confusion networks in [4]. [5] provides an overview of different classification algorithms (boosting, hidden event language models, maximum entropy models, and decision trees) applied to this task for multilingual broadcast news. Besides the type of classifier, the use of different features has been widely studied. [2, 6] showed how prosodic features can benefit the sentence segmentation task. Investigations on prosodic and lexical features in the context of telephone conversations and broadcast news speech are also presented in [6, 5]. More recently, the use of syntactic features has been studied in [7].

In this paper, we extend our previous work on sentence segmentation for broadcast news in the framework of the DARPA GALE program [5], where different classifiers, namely decision trees, boosting, and maximum entropy models, were investigated. It was found that boosting was the best individual classifier. Furthermore, it was shown that performance can be further improved by combining the individual classifier probabilities with the probabilities estimated from a hidden event language model (HELM).

While our previous work focused on individual classifiers and their combination with the HELM, this paper focuses on combining the individual classifier outputs to improve the performance of sentence segmentation. More specifically, we study the use of the entropy-based classifier combination approach, which has been used successfully for ASR [8]. The entropy-based approach dynamically estimates the weight for each classifier based on its instantaneous output probabilities for each example, and combines the output probabilities of the different classifiers before making the final decision. We compare this approach to a standard classifier decision combination approach, voting, and to static-weight-based classifier combination techniques such as linear regression and logistic regression. The predicted advantage of the entropy-based approach over linear and logistic regression is that it can generalize well to different data sets. Our results on the Mandarin TDT4 broadcast news database validate this prediction. The classifiers used in our studies are boosting, maximum entropy models, and SVM. Finally, we also study the combination of individual classifier output probabilities, as well as of the combined probabilities resulting from the entropy-based approach, with HELM probabilities. The newly proposed dynamic classifier combination method for sentence segmentation further improves performance. When we combine the outcome of classifier combination with HELMs, we obtain the best performance of 55.2% NIST error rate, a 2.7% absolute reduction from the NIST error rate of 57.9% obtained with the previous approach. With this new method, the F-measure also improves from 70.6% to 71.6%.

The rest of the paper is organized as follows. Section 2 formulates the sentence segmentation problem as a classification task, and then describes the different classifiers and classifier combination techniques that have been investigated. Section 3 describes our experimental setup, and the results are presented in Section 4. Finally, in Section 5 we conclude.
2. SENTENCE SEGMENTATION

Sentence segmentation can be cast as a binary boundary classification problem with "sentence boundary" (s) and "non-sentence boundary" (n) as classes [5]. For a given word sequence $\{w_1, \ldots, w_N\}$, the goal is to estimate the classes of the boundaries $\{s_1, \ldots, s_N\}$, where $s_i$, $i = 1, \ldots, N$, is the boundary between $w_i$ and $w_{i+1}$. Usually, this is done by training a classifier to estimate the posterior probability $P(s_i = k | o_i)$, where $k \in \{s, n\}$ and $o_i$ are the feature observations for the word boundary $s_i$.

Ideally, the decision of the classifier is the class with maximum probability $P(s_i = k | o_i)$. However, in a sentence segmentation task the probability for the sentence boundary class, $P(s_i = s | o_i)$, is compared against a threshold. If it is above the threshold, a sentence boundary is hypothesized; otherwise the boundary is marked as a non-sentence boundary. As will be seen later, the optimal threshold differs across classifiers and evaluation metrics.
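To make the decision rule concrete, here is a minimal sketch of the thresholded boundary decision described above. The function name and the example threshold are ours, purely for illustration; in the paper the threshold is tuned per classifier and per metric on development data.

```python
def segment(boundary_posteriors, threshold):
    """Hypothesize sentence boundaries by thresholding P(s_i = s | o_i).

    boundary_posteriors: list of P(s_i = s | o_i), one per word boundary.
    Returns a list of class labels, 's' or 'n', one per boundary.
    """
    return ["s" if p >= threshold else "n" for p in boundary_posteriors]

# Example: three word boundaries, threshold tuned on development data.
labels = segment([0.91, 0.08, 0.55], threshold=0.5)  # -> ['s', 'n', 's']
```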
2.1. Classifiers

In this work, we have used boosting, maximum entropy, and SVM classifiers to estimate $P(s_i | o_i)$.

Boosting is an iterative learning algorithm that combines "weak" base classifiers into a "strong" classifier. At each iteration, a weak classifier is learned so as to minimize the training error, and a different distribution or weighting over the training examples is used to give more emphasis to examples that are often misclassified by the preceding weak classifiers. For this approach we use the BoosTexter tool described in [9], which has the advantage of discriminative training. Moreover, it can deal with a large set of both discrete- and continuous-valued features, and it can handle missing feature values.

Maximum entropy (MaxEnt) models are prominently used in natural language processing [10]. The main advantages of MaxEnt models are discriminative training, the capability to handle a large set of features, flexibility in handling missing feature values, and convergence to a unique global optimum. In standard MaxEnt models, the features are discrete valued; however, the feature set used in our studies also contains continuous-valued features. How best to discretize continuous-valued features for MaxEnt models is a research question in itself; in this work, we discretize each continuous-valued feature by binning it into 10 classes, similar to [5]. We use the openNLP toolkit to train the MaxEnt models [11].

SVMs are used in a wide range of pattern recognition applications [12]. SVMs provide advantages similar to MaxEnt models in terms of discriminative training and the capability to handle a large set of features. In addition, SVMs can handle both discrete-valued and continuous-valued features. In our studies, we normalize the continuous-valued features so as to have zero mean and unit variance on the training data. In our data, there are instants where feature values are missing (for the words that ASR outputs as rejected). In such cases, we introduce a new token for unknown words as a discrete-valued feature; for continuous-valued features, we use the mean value, that is, 0.0. When training the SVM classifier we found that the choice of kernel is not obvious: when using mainly lexical (discrete-valued) features, the best performance is achieved with a linear kernel, whereas when using both lexical and prosodic features (discrete- and continuous-valued), a polynomial kernel of degree 2 yields the best performance. For our studies, we use the SVMlight toolkit [13].

2.2. Classifier Combination

Combining classifiers is a well-researched topic in machine learning and spoken language processing [14, 15, among others]. Some of the most common classifier combination methods in the literature are voting, linear regression, and logistic regression. The general objective of classifier combination is to exploit the complementary information between the classifiers. In a sense, the different classifiers in a combination can be seen as a collection of weak classifiers, each of which can solve some difficult problems the others cannot. The process of combination then involves either combining the decisions of the classifiers, or assigning a weight to each classifier's output evidence (in our case $P(s_i | o_i)$) and combining the evidence so as to reduce the objective error. The weights can be estimated statically, that is, a priori on held-out or development data (for example, linear regression), or dynamically (for example, inverse entropy combination). In this paper, we investigate both.

Let $c \in \{1, \ldots, C\}$ index the different classifiers (in our case $C = 3$) and let $P_c(s_i = k | o_i)$ denote their output probabilities at instant $i$. Given these, we investigate the following classifier combination techniques:

• Inverse entropy
• Voting
• Linear regression
• Logistic regression

2.2.1. Inverse Entropy

Given the instantaneous classifier output probabilities, the idea of inverse entropy combination is to assign large weights to classifiers that are more confident in their decisions and small weights to classifiers that are less confident [8]. The confidence is measured in terms of the entropy $ent_c$ of the classifier output probabilities, that is,

$$ent_c = -\sum_{k=1}^{K} P_c(s_i = k | o_i) \cdot \log_2 P_c(s_i = k | o_i), \quad \forall c \quad (1)$$

where $K$ is the number of output classes, which in our case is 2. The weight $\omega_c$ for each classifier is then estimated as

$$\omega_c = \frac{1 / ent_c}{\sum_{c=1}^{C} 1 / ent_c}, \quad \forall c \quad (2)$$

Having estimated the weight for each classifier, the output probabilities of the classifiers can be combined in two different ways:

1. Sum rule:

$$P(s_i = k | o_i) = \sum_{c=1}^{C} \omega_c \cdot P_c(s_i = k | o_i), \quad \forall k \quad (3)$$

2. Product rule:

$$P(s_i = k | o_i) = \frac{1}{Z} \cdot \prod_{c=1}^{C} P_c(s_i = k | o_i)^{\omega_c}, \quad \forall k \quad (4)$$

where $Z$ is a normalization factor.

The decision about the output class is then made based upon $P(s_i = s | o_i)$, as described above.
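The inverse-entropy combination of (1)-(4) is compact enough to state directly in code. The following NumPy sketch is ours, not the authors' implementation; the function name and the small epsilon guarding against a fully confident, zero-entropy output are assumptions the paper does not discuss.

```python
import numpy as np

def inverse_entropy_combine(probs, rule="sum", eps=1e-12):
    """Combine classifier posteriors with inverse-entropy weights, eqs. (1)-(4).

    probs: array of shape (C, K); row c holds P_c(s_i = k | o_i) for one boundary.
    Returns the combined distribution P(s_i = k | o_i) of shape (K,).
    """
    probs = np.asarray(probs, dtype=float)
    # Eq. (1): entropy of each classifier's output distribution (K = 2 here).
    ent = -np.sum(probs * np.log2(probs + eps), axis=1)
    # Eq. (2): weights proportional to inverse entropy, normalized over classifiers.
    inv = 1.0 / (ent + eps)  # eps avoids division by zero for a 0/1 output
    w = inv / inv.sum()
    if rule == "sum":
        # Eq. (3): weighted sum of the output probabilities.
        return w @ probs
    # Eq. (4): weighted product, renormalized by Z so the result sums to one.
    combined = np.prod(probs ** w[:, None], axis=0)
    return combined / combined.sum()

# Example with C = 3 classifiers and K = 2 classes (s, n): the most confident
# (lowest-entropy) classifier dominates the combination.
p = [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]
print(inverse_entropy_combine(p, "sum"), inverse_entropy_combine(p, "prod"))
```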
2.2.2. Voting

The voting technique looks for agreement among the classifiers' output decisions. Typically, each classifier votes for an output class based on its decision, and the final output class is the one receiving the maximum number of votes. In our case, each classifier has one vote, and classifier $c$ decides its vote by comparing $P_c(s_i = s | o_i)$ to the threshold.

2.2.3. Linear Regression

In linear regression classifier combination, the combined probability for the sentence boundary class, $P(s_i = s | o_i)$, is estimated as a linearly weighted sum of the classifier output probabilities $P_c(s_i = s | o_i)$:

$$P(s_i = s | o_i) = a + \sum_{c=1}^{C} \omega_c \cdot P_c(s_i = s | o_i) \quad (5)$$

where $a$ is a bias term. The bias $a$ and the weights $\omega_c$ are optimized on development data and then frozen for the test data. In our studies, we use the MATLAB function regress to estimate the weights and the constant.

2.2.4. Logistic Regression

The classifier output probabilities can also be combined using logistic regression, by treating them as predictor variables of the logistic function and estimating the combined probability for the sentence boundary class as

$$P(s_i = s | o_i) = \frac{1}{1 + e^{-\left(a + \sum_{c=1}^{C} \omega_c \cdot P_c(s_i = s | o_i)\right)}} \quad (6)$$

where the bias $a$ and the weights $\omega_c$ are estimated on development data, as in linear regression. In our work, we estimate the optimal $a$ and $\omega_c$ on the development data using the Newton-Raphson method [16].
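For contrast with the dynamic inverse-entropy scheme, here is a brief sketch of the three static combiners. The fitting of $a$ and $\omega_c$ (done in the paper with MATLAB's regress function and with Newton-Raphson) is assumed to have already happened; the weights, bias, and threshold below are invented for illustration.

```python
import numpy as np

def vote(p_s, threshold):
    """Majority vote: each classifier compares P_c(s_i = s | o_i) to the threshold."""
    votes = np.asarray(p_s) >= threshold
    return "s" if votes.sum() > len(p_s) / 2 else "n"

def linear_combine(p_s, a, w):
    """Eq. (5): linearly weighted sum with bias a (a, w fit on development data)."""
    return a + np.dot(w, p_s)

def logistic_combine(p_s, a, w):
    """Eq. (6): logistic function of the weighted classifier outputs."""
    return 1.0 / (1.0 + np.exp(-(a + np.dot(w, p_s))))

# p_s holds P_c(s_i = s | o_i) for C = 3 classifiers at one word boundary.
p_s = [0.8, 0.45, 0.6]
print(vote(p_s, 0.5),
      linear_combine(p_s, a=0.05, w=[0.3, 0.3, 0.4]),
      logistic_combine(p_s, a=-1.0, w=[1.2, 0.9, 1.1]))
```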
3. EXPERIMENTAL SETUP

3.1. Data

To evaluate our approach, we have used a subset of the TDT4 Mandarin Broadcast News corpus. Table 1 lists the properties of the training, development, and test sets, which were selected in time order (that is, all shows in the training set precede the development and test set shows). The sentence segmentation experiments were performed on the ASR output that was used in our earlier work [5].

                        Training     Dev      Test
No. of shows                 131      17        17
No. of sentence units     19,643   2,903     2,991
No. of examples          481,419  72,152    77,915

Table 1. Training, development (Dev), and test data sets.

3.2. Feature sets

To study the use of prosodic features, we use two feature sets, namely WP and ALL:

• WP: includes only the words around the boundary, pause duration, and their N-grams.
• ALL: includes WP, prosodic features, speaker turn features, and part-of-speech (POS) tag N-grams.

In addition to the word-based prosodic features related to pitch, pitch slope, energy, and pause duration between words that were used in our earlier work [5], we also use speaker-normalized versions of the pitch and energy features. We used the ICSI-SRI speaker diarization system to divide the audio data into hypothetical speakers [17]. The baselines and ranges for the pitch and energy features were estimated for each speaker and then used as parameters in the normalization. Further, the prosodic features also include turn-based features, which describe the position of a word in relation to the diarization segmentation; the speaker turn features were extracted from the diarization segmentation. We use the POS tag features that were used in [5].

3.3. Evaluation Metrics

We evaluate our systems in terms of F-measure (FM) and NIST error rate (NIST) [18]. If $I$ is the number of insertions (i.e., false positives), $D$ is the number of deletions (i.e., false negatives), and $C$ is the number of correctly recognized sentence boundaries, then:

$$\mathrm{precision} = \frac{C}{C+I}, \qquad \mathrm{recall} = \frac{C}{C+D}$$

$$FM = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \quad (7)$$

$$NIST = \frac{I+D}{C+D} \quad (8)$$

Note that, unlike FM, NIST does not have an upper bound.
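Equations (7) and (8) translate directly into code. The sketch below is ours; labels are 's'/'n' per word boundary, as in Section 2, and the tiny example inputs are invented.

```python
def fm_and_nist(hyp, ref):
    """Compute F-measure (7) and NIST error rate (8) over boundary labels.

    hyp, ref: sequences of 's'/'n' labels, one per word boundary.
    """
    I = sum(h == "s" and r == "n" for h, r in zip(hyp, ref))  # insertions
    D = sum(h == "n" and r == "s" for h, r in zip(hyp, ref))  # deletions
    C = sum(h == "s" and r == "s" for h, r in zip(hyp, ref))  # correct boundaries
    precision, recall = C / (C + I), C / (C + D)
    fm = 2 * precision * recall / (precision + recall)
    nist = (I + D) / (C + D)  # errors per reference boundary; can exceed 100%
    return fm, nist

fm, nist = fm_and_nist(hyp=list("snsn"), ref=list("snns"))
print(f"FM = {fm:.1%}, NIST = {nist:.1%}")
```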
4. RESULTS

Tables 2 and 3 show sentence segmentation performance with the individual classifiers and with the various classifier combination methods, for the development and test sets respectively, using both the WP and ALL feature sets. In all tests, we use the optimal thresholds from the development set when computing the NIST error rate and F-measure on the test set. With the ALL features, SVM performs best of the three classifiers, with a 57.6% NIST error rate and 69.8% F-measure. The inverse entropy (IE) classifier combination method yields the best F-measure on the test set, 70.9%, and the voting method yields the lowest NIST error rate, 56.6%. The performance difference on the test set between the voting and IE methods and the linear and logistic regression methods is statistically significant,(1) with the first two performing better. The MaxEnt classifier performs best on the WP feature set, but worse than boosting and SVM on the ALL feature set. This may be primarily due to the way the discretization of the continuous-valued features is performed.

(1) According to a Z-test with a 95% confidence interval.

                 WP              ALL
             FM     NIST     FM     NIST
Boosting    68.0    64.9    72.6    54.3
MaxEnt      68.5    64.5    69.1    60.3
SVM         67.9    64.9    72.7    52.2
IE (sum)    69.4    61.8    73.6    51.8
IE (prod)   69.2    61.7    73.5    51.8
Voting      69.0    62.4    73.4    51.6
Linear      69.1    61.9    73.6    52.2
Logistic    69.6    61.4    73.9    51.9

Table 2. Sentence segmentation performance on the development set for feature sets WP and ALL. FM and NIST are expressed in %. IE denotes inverse entropy; sum and prod refer to the sum and product combination rules in (3) and (4), respectively. Linear denotes the linear regression technique and Logistic the logistic regression technique. The linear and logistic regression weights were estimated on the development set. The best performance for individual classifiers and for classifier combination is in boldface.

                 WP              ALL
             FM     NIST     FM     NIST
Boosting    64.4    71.5    68.9    61.3
MaxEnt      65.7    70.8    67.3    63.5
SVM         64.7    71.9    69.8    57.6
IE (sum)    66.5    68.2    70.7    57.6
IE (prod)   66.1    68.6    70.9    57.3
Voting      66.4    68.7    70.6    56.6
Linear      64.1    70.2    67.4    60.0
Logistic    64.1    69.7    68.6    59.6

Table 3. Sentence segmentation performance on the test set for feature sets WP and ALL. FM and NIST are expressed in %. Notation as in Table 2; the linear and logistic regression weights are estimated on the development set. The best performance for individual classifiers and for classifier combination is in boldface.

Table 4 shows the performance when we convert the posterior probabilities from each classifier and combination method and use them as state observation likelihoods in the hidden event language model (HELM). The HELM used for this experiment is a 4-gram model, trained on all the text in the training set as well as additional data. We achieve the best NIST error rate of 55.2% on the test set when the combined probabilities are estimated as the class probabilities averaged across the classifiers supporting the voting decision.

                        Dev             Test
                    FM     NIST     FM     NIST
Boosting + HELM    74.1    50.9    70.6    57.9
MaxEnt + HELM      71.5    55.0    69.2    61.1
SVM + HELM         75.5    48.9    71.9    56.7
IE (sum) + HELM    74.3    49.5    71.3    56.3
IE (prod) + HELM   74.5    49.5    71.6    56.1
Avg. Prob + HELM   74.2    49.5    70.8    55.2

Table 4. Results of sentence segmentation studies with the HELM for feature set ALL on the development (Dev) and test sets. FM and NIST are expressed in %. IE (sum): the combined classifier output probabilities are estimated using (3); IE (prod): the combined classifier output probabilities are estimated using (4); Avg. Prob: class probabilities averaged across the classifiers supporting the voting decision. The best performance for individual classifiers and for classifier combination is in boldface.
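The paper does not give implementation details for the HELM stage, so the sketch below is a deliberately simplified stand-in for the idea: a toy event model conditioned only on the previous boundary event replaces the actual 4-gram hidden event LM, and the posterior-to-likelihood conversion divides by the class priors, a standard hybrid-decoding device consistent with "use them as state observation likelihoods". All numbers are invented for illustration.

```python
import numpy as np

def helm_viterbi(post, prior, trans):
    """Viterbi decoding of boundary events given classifier posteriors.

    post:  (N, 2) posteriors P(k | o_i) for k in (n, s) at each boundary.
    prior: (2,)   class priors, used to convert posteriors to scaled likelihoods.
    trans: (2, 2) toy event "LM" P(k_i | k_{i-1}), standing in for the 4-gram HELM.
    """
    loglik = np.log(post / prior)          # posterior -> scaled state likelihood
    logtrans = np.log(trans)
    N = post.shape[0]
    score = np.full((N, 2), -np.inf)
    back = np.zeros((N, 2), dtype=int)
    score[0] = np.log(prior) + loglik[0]
    for i in range(1, N):
        for k in range(2):
            cand = score[i - 1] + logtrans[:, k] + loglik[i, k]
            back[i, k] = int(np.argmax(cand))
            score[i, k] = cand[back[i, k]]
    path = [int(np.argmax(score[-1]))]     # backtrace the best event sequence
    for i in range(N - 1, 0, -1):
        path.append(back[i, path[-1]])
    return ["ns"[k] for k in reversed(path)]

post = np.array([[0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1]])
print(helm_viterbi(post, prior=np.array([0.85, 0.15]),
                   trans=np.array([[0.9, 0.1], [0.7, 0.3]])))
```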
5. CONCLUSIONS

We have described recent extensions to our previous work on sentence segmentation, in which we extend the set of classification methods with SVMs and combine the classification methods with a new, dynamic, entropy-based classifier combination method that had previously been shown to be useful for ASR. We show improvements when we use an extended set of prosodic features, in addition to pause duration, with all classifiers. Furthermore, to model sequence information, we incorporate the combined classifier output into hidden event language models. We show statistically significant improvements in both NIST error rate and F-measure.

Acknowledgments

This work is supported by DARPA under Contract No. HR0011-06-C-0023. The views are those of the authors and do not necessarily represent the views of the funding agency. The authors thank Yang Liu, Matthias Zimmermann, Luke Gottlieb, and Gokhan Tur.

6. REFERENCES

[1] Y. Gotoh and S. Renals, "Sentence boundary detection in broadcast speech transcripts," in Proc. ISCA ITRW Workshop, Paris, 2000.
[2] E. Shriberg, A. Stolcke, D. Hakkani, and G. Tur, "Prosody-based automatic segmentation of speech into sentences and topics," Speech Communication, vol. 31, pp. 127-154, 2000.
[3] A. Stolcke and E. Shriberg, "Automatic linguistic segmentation of conversational speech," in Proc. ICSLP, 1996.
[4] D. Hillard, M. Ostendorf, A. Stolcke, Y. Liu, and E. Shriberg, "Improving automatic sentence boundary detection with confusion networks," in Proc. HLT-NAACL, Boston, MA, 2004.
[5] M. Zimmermann, D. Hakkani-Tür, J. Fung, N. Mirghafori, E. Shriberg, and Y. Liu, "The ICSI+ multi-lingual sentence segmentation system," in Proc. ICSLP, 2006.
[6] Y. Liu, E. Shriberg, A. Stolcke, B. Peskin, J. Ang, D. Hillard, M. Ostendorf, M. Tomalin, P. Woodland, and M. Harper, "Structural metadata research in the EARS program," in Proc. ICASSP, 2005.
[7] B. Roark, Y. Liu, M. Harper, R. Stewart, M. Lease, M. Snover, I. Shafran, B. Dorr, J. Hale, A. Krasnyanskaya, and L. Yung, "Reranking for sentence boundary detection in conversational speech," in Proc. ICASSP, 2006.
[8] H. Misra, H. Bourlard, and V. Tyagi, "New entropy based combination rules in HMM/ANN multi-stream ASR," in Proc. ICASSP, 2003.
[9] R. E. Schapire and Y. Singer, "BoosTexter: A boosting-based system for text categorization," Machine Learning, vol. 39, no. 2-3, pp. 135-168, 2000.
[10] A. Berger, V. J. Della Pietra, and S. A. Della Pietra, "A maximum entropy approach to natural language processing," Computational Linguistics, vol. 22, no. 1, pp. 39-71, 1996.
[11] A. Ratnaparkhi, Maximum Entropy Models for Natural Language Ambiguity Resolution, Ph.D. thesis, University of Pennsylvania, 1998, http://maxent.sourceforge.net/.
[12] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.
[13] T. Joachims, "Making large-scale SVM learning practical," in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, Eds., MIT Press, 1999, http://svmlight.joachims.org/.
[14] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, pp. 226-239, 1998.
[15] M. Karahan, D. Hakkani-Tür, G. Riccardi, and G. Tur, "Combining classifiers for spoken language understanding," in Proc. ASRU, 2003.
[16] A. Agresti, Categorical Data Analysis, John Wiley and Sons, 1990.
[17] C. Wooters, J. Fung, B. Peskin, and X. Anguera, "Towards robust speaker segmentation: the ICSI-SRI fall 2004 diarization system," in Proc. RT-04F Workshop, 2004.
[18] "The Rich Transcription Fall 2003 evaluation plan," http://nist.gov/speech/tests/rt/rt2003/fall/docs/rt03-fall-eval-plan-v9.pdf.
