
On Word Alignment Models for Statistical Machine Translation PDF

133 Pages·2011·0.96 MB·English
by Shaojun Zhao

Preview On Word Alignment Models for Statistical Machine Translation

On Word Alignment Models for Statistical Machine Translation

by Shaojun Zhao

Submitted in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

Supervised by Professor Daniel J. Gildea

Department of Computer Science
Arts, Sciences & Engineering
Edmund A. Hajim School of Engineering & Applied Sciences
University of Rochester
Rochester, New York
2011

To my family

Curriculum Vitae

Shaojun Zhao (Chinese name: 赵绍君, English nickname: Sam) was born in the West Zhao Village, where all the people have the same family name, Zhao. This small village is in Tianmen, Hubei Province, P.R. China. When he was ten years old, he moved to a small town, Li Chang, with his father, mother, brother, sister and grandma. He went to Tianmen High School at the age of fifteen, and Nankai University three years later. He was the first one in that village to go to college.

Shaojun studied Physics for two years and Mathematics for another two years at Nankai University, and received a B.S. degree in Mathematics. He went to Peking University (also known as Beijing University) to study Computer Science and Technology, supervised by Professor Xuan Wang. He received an M.S. degree from Peking University.

In 2002, Shaojun worked with Professor Dekang Lin for one month at Microsoft Research Asia, and decided to go to Canada, working as a research staff member for Professor Lin at the University of Alberta.

In 2005, Shaojun came to the University of Rochester to pursue his Ph.D. degree in Computer Science. He was supervised by Professor Daniel Gildea. He received an M.S. degree from the University of Rochester in 2007.

Shaojun worked as an intern for Yahoo (Sunnyvale, 2005) and Google (Mountain View, 2007 and 2008).

Shaojun lives in Redmond, Washington with his wife and son, and works for Amazon in Seattle.

Acknowledgments

This material is based upon research supported by NSF grants IIS-0546554, IIS-0428020, and IIS-0910611. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the above-named organizations.
I would like to thank my advisor, Professor Daniel Gildea, and my committee members, Dr. Dekang Lin, Professor Henry Kautz, Professor Christopher Pal, and Professor David Oakes.

Abstract

Machine translation remains the holy grail of computational linguistics. All statistical machine translation systems are built upon the idea of word alignment. While the field of word alignment has had tremendous progress in the last two decades, it is still in great need of speed and quality improvement.

We designed a fertility hidden Markov model for word alignment, which is dramatically faster than the most widely used IBM Model 4. In fact, our model is even faster and has lower alignment error rate (AER) than the hidden Markov model. An experiment on Chinese-English translation shows that our word alignment model leads to better translation results than IBM Model 4, based on the BLEU metric.

We also designed algorithms that mine massive and high quality bilingual texts for a variety of language pairs from the web using word alignment. The resulting data improved a state-of-the-art machine translation system.

Table of Contents

Curriculum Vitae
Acknowledgments
Abstract
List of Tables
List of Figures
Foreword

I Introduction

II Machine Translation and Word Alignment
  1 Statistical Machine Translation
  2 Word Alignment Models
    2.1 Heuristic Alignment Models
    2.2 Statistical Alignment Models
      2.2.1 IBM model 1
      2.2.2 IBM Model 2
    2.3 Hidden Markov Model for Word Alignment
      2.3.1 The Hidden Markov Model
      2.3.2 Discussion
    2.4 Fertility-Based Models
    2.5 Other Models
      2.5.1 ITG model

III Building New Models for Word Alignment
  3 LBFGS for a Un-normalized Joint Probability Model
    3.1 A Un-normalized Joint Probability Model
    3.2 Parameter Estimation For the UNJP Model Using LBFGS
    3.3 Decoding
    3.4 Experiments
    3.5 Conclusion
  4 LBFGS for IBM Model 1
    4.1 IBM model 1
    4.2 Using LBFGS
    4.3 Complexity analysis
    4.4 Experiments
    4.5 Results and Analysis
    4.6 Discussion
  5 Extending the Hidden Markov Model
    5.1 Introduction
    5.2 Minmax Model
    5.3 Reducing Complexity
    5.4 Experiments
    5.5 Discussion
  6 Adding Fertility to the Hidden Markov Models
    6.1 Motivation
    6.2 Introduction
    6.3 Statistical Word Alignment Models
      6.3.1 Alignment and Fertility
      6.3.2 IBM Model 1 and HMM
    6.4 Fertility Hidden Markov Model
    6.5 Expectation Maximization Algorithm
    6.6 Gibbs Sampling for Fertility HMM
    6.7 Experiments
    6.8 Conclusion

IV Mining Bilingual Texts from the Web
  7 Prior work
    7.1 Mining bilingual lexicons from non-parallel corpora
    7.2 Mining bilingual web pages
    7.3 Mining bilingual sentences from non-parallel corpora
    7.4 Mining bilingual phrases from monolingual corpora
  8 Mining Parenthetical Translations from the Web by Word Alignment
    8.1 Introduction
    8.2 Mining Parenthetical Translations
    8.3 Constructing a Partially Parallel Corpus
      8.3.1 Filtering out non-translations
      8.3.2 Constraining term boundaries
      8.3.3 Length-based trimming
    8.4 Word Alignment
      8.4.1 Dealing with multi-word alignment
      8.4.2 Link scoring
      8.4.3 Bias in the partially parallel corpus
      8.4.4 Capturing syllable-level regularities
    8.5 Experimental Results
      8.5.1 Evaluation with Wikipedia
      8.5.2 Evaluation with term translation requests
      8.5.3 Evaluation with SMT
    8.6 Conclusion

V Conclusion

Bibliography

A Word Alignment Examples
  A.1 IBM Model 1
  A.2 HMM
  A.3 IBM Model 3
  A.4 Fertility HMM
  A.5 IBM Model 4

List of Tables

4.1 Comparison table for LBFGS and GIZA++
5.1 Results of the HMM and the Minmax with different training size
6.1 AER results for IBM1, IBM1F, HMM, IBM3, HMMF, and IBM4
6.2 AER results after symmetrization
8.1 Chinese text preceding Lower Egypt
8.2 Text preceding Channel Spacing
8.3 Other uses of parentheses
8.4 Example prefixes and suffixes with top φ²
8.5 From seven languages to English
8.6 From English to seven languages
8.7 Chinese to English results
8.8 English to Chinese results
8.9 A random sample of non-exact-matches: the extracted translation is too short
8.10 A random sample of non-exact-matches: the extracted translation is too long
8.11 A random sample of non-exact-matches: the extracted translation contains only the last name
8.12 A random sample of non-exact-matches: the extracted term is completely wrong

