
Comparable corpora and computer-assisted translation PDF

305 Pages·2014·2.182 MB·English

Preview Comparable corpora and computer-assisted translation

COGNITIVE SCIENCE AND KNOWLEDGE MANAGEMENT SERIES

Comparable Corpora and Computer-assisted Translation
Estelle Maryline Delpech

Computer-assisted translation (CAT) has always used translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance, when the text belongs to an emerging field. To solve this issue, CAT research has looked into the leveraging of comparable corpora, i.e. a set of texts, in two or more languages, which deal with the same topic but are not translations of one another.

This work had two primary objectives. The first is to assess the input of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second objective is to identify bilingual-lexicon-extraction methods which best match the translators' needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations.

The experiments are carried out on two language pairs (English–French and English–German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability: methodological choices are guided by the needs of the final users. This book is organized in two parts: the first part presents the applicative and scientific context of the research, and the second part is given over to efforts to improve compositional translation.

The research work presented in this book received the PhD Thesis award 2014 from the French association for natural language processing (ATALA).
Estelle Maryline Delpech holds a PhD in Computer Science from the University of Nantes in France, where she specialized in natural language processing and computer-aided translation. She is currently Chief Scientist at Nomao, a web and mobile app search engine company. Her research interests include multilingualism, computational linguistics, information extraction and data integration.

www.iste.co.uk

Series Editor: Narendra Jussien

To Elia

First published 2014 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd, 27-37 St George's Road, London SW19 4EU, UK (www.iste.co.uk)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA (www.wiley.com)

© ISTE Ltd 2014

The rights of Estelle Maryline Delpech to be identified as the author of this work have been asserted by her in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2014936484
British Library Cataloguing-in-Publication Data: A CIP record for this book is available from the British Library
ISBN 978-1-84821-689-1
Printed and bound in Great Britain by CPI Group (UK) Ltd., Croydon, Surrey CR0 4YY

Table of Contents

ACKNOWLEDGMENTS
INTRODUCTION

PART 1. APPLICATIVE AND SCIENTIFIC CONTEXT

CHAPTER 1. LEVERAGING COMPARABLE CORPORA FOR COMPUTER-ASSISTED TRANSLATION
1.1. Introduction
1.2. From the beginnings of machine translation to comparable corpora processing
1.2.1. The dawn of machine translation
1.2.2. The development of computer-assisted translation
1.2.3. Drawbacks of parallel corpora and advantages of comparable corpora
1.2.4. Difficulties of technical translation
1.2.5. Industrial context
1.3. Term alignment from comparable corpora: a state-of-the-art
1.3.1. Distributional approach principle
1.3.2. Term alignment evaluation
1.3.3. Improvement and variants of the distributional approach
1.3.4. The influence of data and parameters on alignment quality
1.3.5. Limits of the distributional approach
1.4. CAT software prototype for comparable corpora processing
1.4.1. Implementation of a term alignment method
1.4.2. Terminological records extraction
1.4.3. Lexicon consultation interface
1.5. Summary

CHAPTER 2. USER-CENTERED EVALUATION OF LEXICONS EXTRACTED FROM COMPARABLE CORPORA
2.1. Introduction
2.2. Translation quality evaluation methodologies
2.2.1. Machine translation evaluation
2.2.2. Human translation evaluation
2.2.3. Discussion
2.3. Design and experimentation of a user-centered evaluation
2.3.1. Methodological aspects
2.3.2. Experimentation protocol
2.3.3. Results
2.4. Discussion

CHAPTER 3. AUTOMATIC GENERATION OF TERM TRANSLATIONS
3.1. Introduction
3.2. Compositional approaches
3.2.1. Compositional translation principle
3.2.2. Polylexical units compositional translation
3.2.3. Monolexical units compositional translation
3.2.4. Candidate translation filtering
3.3. Data-driven approaches
3.3.1. Analogy-based translation
3.3.2. Rewriting rules learning
3.3.3. Dealing with morphological variation
3.4. Evaluation of term translator generation methods
3.5. Research perspectives

PART 2. CONTRIBUTIONS TO COMPOSITIONAL TRANSLATION

CHAPTER 4. MORPH-COMPOSITIONAL TRANSLATION: METHODOLOGICAL FRAMEWORK
4.1. Introduction
4.2. Morpho-compositional translation method
4.2.1. Scientific positioning
4.2.2. Definitions and terminology
4.2.3. Underlying assumptions
4.2.4. Advantages of the proposed approach for processing comparable corpora
4.3. Addressed issues and contributions
4.3.1. Generating fertile translations
4.3.2. Dealing with diverse morphological structures
4.3.3. Candidate translations ranking
4.4. Evaluation methodology
4.4.1. A priori reference
4.4.2. A posteriori reference
4.5. Conclusion

CHAPTER 5. EXPERIMENTAL DATA
5.1. Introduction
5.2. Comparable corpora
5.3. Source terms
5.4. Reference data for translation generation evaluation
5.4.1. A priori reference
5.4.2. A posteriori reference
5.5. Translation ranking training and evaluation data
5.6. Linguistic resources
5.6.1. General language bilingual dictionary
5.6.2. Thesaurus
5.6.3. Bound morphemes translation table
5.6.4. Lexicon for word decomposition
5.6.5. Morphological families
5.6.6. Dictionary of cognates
5.7. Summary

CHAPTER 6. FORMALIZATION AND EVALUATION OF CANDIDATE TRANSLATION GENERATION
6.1. Introduction
6.2. Translation generation algorithm
6.2.1. Decomposition
6.2.2. Translation
6.2.3. Recomposition
6.2.4. Selection
6.3. Morphological splitting evaluation
6.4. Translation generation evaluation
6.4.1. Reference data and evaluation measures
6.4.2. Model genericity influence
6.4.3. Linguistic resources influence
6.4.4. Fallback strategy influence
6.4.5. Fertile translations influence
6.4.6. Popular science corpus influence
6.4.7. Qualitative analysis
6.5. Discussion
6.5.1. Findings
6.5.2. Research perspectives

CHAPTER 7. FORMALIZATION AND EVALUATION OF CANDIDATE TRANSLATION RANKING
7.1. Introduction
7.2. Ranking criteria
7.2.1. Context similarity
7.2.2. Candidate translation frequency
7.2.3. Parts-of-speech translation probability
7.2.4. Components translation mode
7.3. Criteria combination
7.3.1. Value standardization
7.3.2. Linear combination
7.3.3. Learning-to-rank model
7.4. Evaluation
7.4.1. Reference data and evaluation measures
7.4.2. Bases of comparison
7.4.3. Results
7.5. Discussion
7.5.1. Findings
7.5.2. Research perspectives

CONCLUSION AND PERSPECTIVES

PART 3. APPENDICES

APPENDIX 1. MEASURES
APPENDIX 2. DATA
APPENDIX 3. COMPARABLE CORPORA LEXICONS CONSULTATION INTERFACE

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
LIST OF EXTRACTS
BIBLIOGRAPHY
INDEX
