On Word Alignment Models for Statistical Machine Translation

by
Shaojun Zhao

Submitted in Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy

Supervised by Professor Daniel J. Gildea

Department of Computer Science
Arts, Sciences & Engineering
Edmund A. Hajim School of Engineering & Applied Sciences

University of Rochester
Rochester, New York
2011

To my family

Curriculum Vitae

Shaojun Zhao (Chinese name: 赵绍君, English nickname: Sam) was born in the West Zhao Village, where all the people have the same family name, Zhao. This small village is in Tianmen, Hubei Province, P.R. China. When he was ten years old, he moved to a small town, LiChang, with his father, mother, brother, sister, and grandma. He went to Tianmen High School at the age of fifteen, and to Nankai University three years later. He was the first person from that village to go to college.

Shaojun studied Physics for two years and Mathematics for another two years at Nankai University, and received a B.S. degree in Mathematics. He went to Peking University (also known as Beijing University) to study Computer Science and Technology, supervised by Professor Xuan Wang. He received an M.S. degree from Peking University.

In 2002, Shaojun worked with Professor Dekang Lin for one month at Microsoft Research Asia, and decided to go to Canada, working as a research staff member for Professor Lin at the University of Alberta.

In 2005, Shaojun came to the University of Rochester to pursue his Ph.D. degree in Computer Science. He was supervised by Professor Daniel Gildea. He received an M.S. degree from the University of Rochester in 2007.

Shaojun worked as an intern for Yahoo (Sunnyvale, 2005) and Google (Mountain View, 2007 and 2008).

Shaojun lives in Redmond, Washington with his wife and son, and works for Amazon in Seattle.

Acknowledgments

This material is based upon research supported by NSF grants IIS-0546554, IIS-0428020, and IIS-0910611. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the above-named organizations.
I would like to thank my advisor, Professor Daniel Gildea, and my committee members, Dr. Dekang Lin, Professor Henry Kautz, Professor Christopher Pal, and Professor David Oakes.

Abstract

Machine translation remains the holy grail of computational linguistics. All statistical machine translation systems are built upon the idea of word alignment. While the field of word alignment has had tremendous progress in the last two decades, it is still in great need of speed and quality improvement.

We designed a fertility hidden Markov model for word alignment, which is dramatically faster than the most widely used IBM Model 4. In fact, our model is even faster and has lower alignment error rate (AER) than the hidden Markov model. An experiment on Chinese-English translation shows that our word alignment model leads to better translation results than IBM Model 4, based on the BLEU metric.

We also designed algorithms that mine massive and high-quality bilingual texts for a variety of language pairs from the web using word alignment. The resulting data improved a state-of-the-art machine translation system.

Table of Contents

Curriculum Vitae
Acknowledgments
Abstract
List of Tables
List of Figures
Foreword

I Introduction

II Machine Translation and Word Alignment
  1 Statistical Machine Translation
  2 Word Alignment Models
    2.1 Heuristic Alignment Models
    2.2 Statistical Alignment Models
      2.2.1 IBM model 1
      2.2.2 IBM Model 2
    2.3 Hidden Markov Model for Word Alignment
      2.3.1 The Hidden Markov Model
      2.3.2 Discussion
    2.4 Fertility-Based Models
    2.5 Other Models
      2.5.1 ITG model

III Building New Models for Word Alignment
  3 LBFGS for a Un-normalized Joint Probability Model
    3.1 A Un-normalized Joint Probability Model
    3.2 Parameter Estimation For the UNJP Model Using LBFGS
    3.3 Decoding
    3.4 Experiments
    3.5 Conclusion
  4 LBFGS for IBM Model 1
    4.1 IBM model 1
    4.2 Using LBFGS
    4.3 Complexity analysis
    4.4 Experiments
    4.5 Results and Analysis
    4.6 Discussion
  5 Extending the Hidden Markov Model
    5.1 Introduction
    5.2 Minmax Model
    5.3 Reducing Complexity
    5.4 Experiments
    5.5 Discussion
  6 Adding Fertility to the Hidden Markov Models
    6.1 Motivation
    6.2 Introduction
    6.3 Statistical Word Alignment Models
      6.3.1 Alignment and Fertility
      6.3.2 IBM Model 1 and HMM
    6.4 Fertility Hidden Markov Model
    6.5 Expectation Maximization Algorithm
    6.6 Gibbs Sampling for Fertility HMM
    6.7 Experiments
    6.8 Conclusion

IV Mining Bilingual Texts from the Web
  7 Prior work
    7.1 Mining bilingual lexicons from non-parallel corpora
    7.2 Mining bilingual web pages
    7.3 Mining bilingual sentences from non-parallel corpora
    7.4 Mining bilingual phrases from monolingual corpora
  8 Mining Parenthetical Translations from the Web by Word Alignment
    8.1 Introduction
    8.2 Mining Parenthetical Translations
    8.3 Constructing a Partially Parallel Corpus
      8.3.1 Filtering out non-translations
      8.3.2 Constraining term boundaries
      8.3.3 Length-based trimming
    8.4 Word Alignment
      8.4.1 Dealing with multi-word alignment
      8.4.2 Link scoring
      8.4.3 Bias in the partially parallel corpus
      8.4.4 Capturing syllable-level regularities
    8.5 Experimental Results
      8.5.1 Evaluation with Wikipedia
      8.5.2 Evaluation with term translation requests
      8.5.3 Evaluation with SMT
    8.6 Conclusion

V Conclusion

Bibliography

A Word Alignment Examples
  A.1 IBM Model 1
  A.2 HMM
  A.3 IBM Model 3
  A.4 Fertility HMM
  A.5 IBM Model 4

List of Tables

4.1 Comparison table for LBFGS and GIZA++
5.1 Results of the HMM and the Minmax with different training size
6.1 AER results for IBM1, IBM1F, HMM, IBM3, HMMF, and IBM4
6.2 AER results after symmetrization
8.1 Chinese text preceding Lower Egypt
8.2 Text preceding Channel Spacing
8.3 Other uses of parentheses
8.4 Example prefixes and suffixes with top φ²
8.5 From seven languages to English
8.6 From English to seven languages
8.7 Chinese to English results
8.8 English to Chinese results
8.9 A random sample of non-exact-matches: the extracted translation is too short
8.10 A random sample of non-exact-matches: the extracted translation is too long
8.11 A random sample of non-exact-matches: the extracted translation contains only the last name
8.12 A random sample of non-exact-matches: the extracted term is completely wrong