Automatic Sentence Structure Annotation for Spoken Language Processing PDF

155 Pages·2012·1.22 MB·English
by Dustin Lundring Hillard

Preview Automatic Sentence Structure Annotation for Spoken Language Processing

Automatic Sentence Structure Annotation for Spoken Language Processing

Dustin Lundring Hillard

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Washington
2008

Program Authorized to Offer Degree: Electrical Engineering

University of Washington Graduate School

This is to certify that I have examined this copy of a doctoral dissertation by Dustin Lundring Hillard and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.

Chair of the Supervisory Committee: Mari Ostendorf

Reading Committee: Mari Ostendorf, Jeff Bilmes, Andreas Stolcke

Date:

In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with "fair use" as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to Proquest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted "the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform."

Signature    Date

University of Washington

Abstract

Automatic Sentence Structure Annotation for Spoken Language Processing

Dustin Lundring Hillard

Chair of the Supervisory Committee: Professor Mari Ostendorf, Electrical Engineering

Increasing amounts of easily available electronic data are precipitating a need for automatic processing that can aid humans in digesting large amounts of data. Speech and video are becoming an increasingly significant portion of on-line information, from news and television broadcasts, to oral histories, on-line lectures, or user generated content. Automatic processing of audio and video sources requires automatic speech recognition (ASR) in order to provide transcripts. Typical ASR generates only words, without punctuation, capitalization, or further structure. Many techniques available from natural language processing therefore suffer when applied to speech recognition output, because they assume the presence of reliable punctuation and structure. In addition, errors from automatic transcription also degrade the performance of downstream processing such as machine translation, name detection, or information retrieval. We develop approaches for automatically annotating structure in speech, including sentence and sub-sentence segmentation, and then turn towards optimizing ASR and annotation for downstream applications.

The impact of annotation is explored at the sentence and sub-sentence level. We describe our general approach for predicting sentence segmentation and dealing with uncertainty in ASR. A new ASR system combination approach is described that improves ASR more than any previously proposed methods. The impact of automatic segmentation in machine translation is also evaluated, and we find that optimizing segmentation directly for translation improves translation quality, performing as well (or better than) using reference segmentation. Turning to sub-sentence annotation, we describe approaches for supervised comma detection and unsupervised learning of prosodic structure. The utility of automatic commas is then assessed in the context of information extraction and machine translation. Including commas in information extraction tasks significantly improves performance, especially when punctuation precision and recall are optimized directly for entity and relation extraction. We then also propose approaches for improving translation reordering models with cues from commas and sentence structure.
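The abstract refers to optimizing punctuation precision and recall for a downstream task. The sketch below illustrates only the general trade-off involved, under the assumption that a comma detector emits a posterior probability at each word boundary and that a comma is predicted whenever the posterior meets a threshold. The function name, the posteriors, and the reference labels are hypothetical illustrations, not values or methods taken from the dissertation.

```python
def precision_recall(posteriors, reference, threshold):
    """Score predicted commas against reference commas at one decision threshold."""
    # Predict a comma wherever the (assumed) posterior reaches the threshold.
    predicted = [p >= threshold for p in posteriors]
    true_pos = sum(1 for hyp, ref in zip(predicted, reference) if hyp and ref)
    n_pred = sum(predicted)
    n_ref = sum(reference)
    # By convention here, an empty prediction or reference set scores 1.0.
    precision = true_pos / n_pred if n_pred else 1.0
    recall = true_pos / n_ref if n_ref else 1.0
    return precision, recall


# Hypothetical comma posteriors for six word boundaries, with reference labels.
posteriors = [0.9, 0.2, 0.6, 0.4, 0.8, 0.1]
reference = [True, False, False, True, True, False]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(posteriors, reference, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, a lower threshold favors recall and a higher threshold favors precision. A downstream application could select whichever operating point scores best on its own metric rather than on punctuation accuracy alone.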

Description:
2.4 Representing Uncertainty in Spoken Language Processing
5.1 Sentence Segmentation for Machine Translation