Comparable Corpora and Computer-assisted Translation PDF

305 Pages·2014·3.361 MB·English
Preview Comparable Corpora and Computer-assisted Translation

Comparable Corpora and Computer-assisted Translation

To Elia

Series Editor: Narendra Jussien

Estelle Maryline Delpech

First published 2014 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com

© ISTE Ltd 2014

The rights of Estelle Maryline Delpech to be identified as the author of this work have been asserted by her in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2014936484

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library

ISBN 978-1-84821-689-1

Printed and bound in Great Britain by CPI Group (UK) Ltd., Croydon, Surrey CR0 4YY

Table of Contents

ACKNOWLEDGMENTS

INTRODUCTION

PART 1. APPLICATIVE AND SCIENTIFIC CONTEXT

CHAPTER 1. LEVERAGING COMPARABLE CORPORA FOR COMPUTER-ASSISTED TRANSLATION
  1.1. Introduction
  1.2. From the beginnings of machine translation to comparable corpora processing
    1.2.1. The dawn of machine translation
    1.2.2. The development of computer-assisted translation
    1.2.3. Drawbacks of parallel corpora and advantages of comparable corpora
    1.2.4. Difficulties of technical translation
    1.2.5. Industrial context
  1.3. Term alignment from comparable corpora: a state-of-the-art
    1.3.1. Distributional approach principle
    1.3.2. Term alignment evaluation
    1.3.3. Improvement and variants of the distributional approach
    1.3.4. The influence data and parameters on alignment quality
    1.3.5. Limits of the distributional approach
  1.4. CAT software prototype for comparable corpora processing
    1.4.1. Implementation of a term alignment method
    1.4.2. Terminological records extraction
    1.4.3. Lexicon consultation interface
  1.5. Summary

CHAPTER 2. USER-CENTERED EVALUATION OF LEXICONS EXTRACTED FROM COMPARABLE CORPORA
  2.1. Introduction
  2.2. Translation quality evaluation methodologies
    2.2.1. Machine translation evaluation
    2.2.2. Human translation evaluation
    2.2.3. Discussion
  2.3. Design and experimentation of a user-centered evaluation
    2.3.1. Methodological aspects
    2.3.2. Experimentation protocol
    2.3.3. Results
  2.4. Discussion

CHAPTER 3. AUTOMATIC GENERATION OF TERM TRANSLATIONS
  3.1. Introduction
  3.2. Compositional approaches
    3.2.1. Compositional translation principle
    3.2.2. Polylexical units compositional translation
    3.2.3. Monolexical units compositional translation
    3.2.4. Candidate translation filtering
  3.3. Data-driven approaches
    3.3.1. Analogy-based translation
    3.3.2. Rewriting rules learning
    3.3.3. Dealing with morphological variation
  3.4. Evaluation of term translator generation methods
  3.5. Research perspectives

PART 2. CONTRIBUTIONS TO COMPOSITIONAL TRANSLATION

CHAPTER 4. MORPH-COMPOSITIONAL TRANSLATION: METHODOLOGICAL FRAMEWORK
  4.1. Introduction
  4.2. Morpho-compositional translation method
    4.2.1. Scientific positioning
    4.2.2. Definitions and terminology
    4.2.3. Underlying assumptions
    4.2.4. Advantages of the proposed approach for processing comparable corpora
  4.3. Addressed issues and contributions
    4.3.1. Generating fertile translations
    4.3.2. Dealing with diverse morphological structures
    4.3.3. Candidate translations ranking
  4.4. Evaluation methodology
    4.4.1. A priori reference
    4.4.2. A posteriori reference
  4.5. Conclusion

CHAPTER 5. EXPERIMENTAL DATA
  5.1. Introduction
  5.2. Comparable corpora
  5.3. Source terms
  5.4. Reference data for translation generation evaluation
    5.4.1. A priori reference
    5.4.2. A posteriori reference
  5.5. Translation ranking training and evaluation data
  5.6. Linguistic resources
    5.6.1. General language bilingual dictionary
    5.6.2. Thesaurus
    5.6.3. Bound morphemes translation table
    5.6.4. Lexicon for word decomposition
    5.6.5. Morphological families
    5.6.6. Dictionary of cognates
  5.7. Summary

CHAPTER 6. FORMALIZATION AND EVALUATION OF CANDIDATE TRANSLATION GENERATION
  6.1. Introduction
  6.2. Translation generation algorithm
    6.2.1. Decomposition
    6.2.2. Translation
    6.2.3. Recomposition
    6.2.4. Selection
  6.3. Morphological splitting evaluation
  6.4. Translation generation evaluation
    6.4.1. Reference data and evaluation measures
    6.4.2. Model genericity influence
    6.4.3. Linguistic resources influence
    6.4.4. Fallback strategy influence
    6.4.5. Fertile translations influence
    6.4.6. Popular science corpus influence
    6.4.7. Qualitative analysis
  6.5. Discussion
    6.5.1. Findings
    6.5.2. Research perspectives

CHAPTER 7. FORMALIZATION AND EVALUATION OF CANDIDATE TRANSLATION RANKING
  7.1. Introduction
  7.2. Ranking criteria
    7.2.1. Context similarity
    7.2.2. Candidate translation frequency
    7.2.3. Parts-of-speech translation probability
    7.2.4. Components translation mode
  7.3. Criteria combination
    7.3.1. Value standardization
    7.3.2. Linear combination
    7.3.3. Learning-to-rank model
  7.4. Evaluation
    7.4.1. Reference data and evaluation measures
    7.4.2. Bases of comparison
    7.4.3. Results
  7.5. Discussion
    7.5.1. Findings
    7.5.2. Research perspectives

CONCLUSION AND PERSPECTIVES

PART 3. APPENDICES

APPENDIX 1. MEASURES
APPENDIX 2. DATA
APPENDIX 3. COMPARABLE CORPORA LEXICONS CONSULTATION INTERFACE

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
LIST OF EXTRACTS
BIBLIOGRAPHY
INDEX

Acknowledgments

I would like to give Béatrice Daille and Emmanuel Morin my deepest thanks for having supervised and co-supervised this doctoral research. I felt so honored to work and learn by their sides. They both have shown an effective combination of academic rigorousness and pedagogy, which has helped me to progress over the past three years. Béatrice, thank you for suggesting this thesis to me – it was a very enriching experience. Manu, thanks again for being there whenever I needed you.

I extend my warm thanks to Nabil Hathout, Élisabeth Lavault-Olléon, Emmanuel Planas and Michel Simard for honoring me by attending my viva and being part of my committee in spite of their busy schedules. Their constructive comments were especially useful. I am glad to have profited from so many complementary points of view on my work. Special thanks to Michel Simard for coming to Nantes all the way from Canada.

I am particularly thankful to Emmanuel Planas, the former Scientific Director of Lingua et Machina, for trusting me and hiring me as a research engineer five years ago. Otherwise, I would probably not have had the chance to carry out a dissertation with LINA or to work on such a fascinating research subject within such a stimulating industrial environment.

Several people contributed to the work presented in this book. I would first like to thank Claire Lemaire from the University Stendhal in Grenoble, because she was an amazing colleague and co-doctoral candidate; and also for creating the resources for processing and assessing the German language. This would not have been possible without her and I am very thankful.

I would also like to thank Geoffrey Williams and Pierre Zweigenbaum for agreeing to be a part of my doctorate monitoring committee. Their shrewd comments and advice guided me in the right direction during this research project.