
Statistical Machine Translation PDF

447 Pages·5.391 MB·English

Preview Statistical Machine Translation

Statistical Machine Translation

The field of machine translation has recently been energized by the emergence of statistical techniques, which have brought the dream of automatic language translation closer to reality. This class-tested textbook, authored by an active researcher in the field, provides a gentle and accessible introduction to the latest methods and enables the reader to build machine translation systems for any language pair.

It provides the necessary grounding in linguistics and probabilities, and covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book reports on the latest research and outstanding challenges, and enables novices as well as experienced researchers to make contributions to the field. It is ideal for students at undergraduate and graduate level, or for any reader interested in the latest developments in machine translation.

PHILIPP KOEHN is a lecturer in the School of Informatics at the University of Edinburgh. He is the scientific coordinator of the European EuroMatrix project and is also involved in research funded by DARPA in the USA. He has also collaborated with leading companies in the field, such as Systran and Asia Online. He implemented the widely used decoder Pharaoh, and is leading the development of the open source machine translation toolkit Moses.

Philipp Koehn
School of Informatics
University of Edinburgh

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521874151

© P. Koehn 2010

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2009

ISBN-13 978-0-511-69132-4 eBook (NetLibrary)
ISBN-13 978-0-521-87415-1 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

For Trishann and Phianna

Contents

Preface

I Foundations

1 Introduction
  1.1 Overview
  1.2 History of Machine Translation
  1.3 Applications
  1.4 Available Resources
  1.5 Summary

2 Words, Sentences, Corpora
  2.1 Words
  2.2 Sentences
  2.3 Corpora
  2.4 Summary

3 Probability Theory
  3.1 Estimating Probability Distributions
  3.2 Calculating Probability Distributions
  3.3 Properties of Probability Distributions
  3.4 Summary

II Core Methods

4 Word-Based Models
  4.1 Machine Translation by Translating Words
  4.2 Learning Lexical Translation Models
  4.3 Ensuring Fluent Output
  4.4 Higher IBM Models
  4.5 Word Alignment
  4.6 Summary

5 Phrase-Based Models
  5.1 Standard Model
  5.2 Learning a Phrase Translation Table
  5.3 Extensions to the Translation Model
  5.4 Extensions to the Reordering Model
  5.5 EM Training of Phrase-Based Models
  5.6 Summary

6 Decoding
  6.1 Translation Process
  6.2 Beam Search
  6.3 Future Cost Estimation
  6.4 Other Decoding Algorithms
  6.5 Summary

7 Language Models
  7.1 N-Gram Language Models
  7.2 Count Smoothing
  7.3 Interpolation and Back-off
  7.4 Managing the Size of the Model
  7.5 Summary

8 Evaluation
  8.1 Manual Evaluation
  8.2 Automatic Evaluation
  8.3 Hypothesis Testing
  8.4 Task-Oriented Evaluation
  8.5 Summary

III Advanced Topics

9 Discriminative Training
  9.1 Finding Candidate Translations
  9.2 Principles of Discriminative Methods
  9.3 Parameter Tuning
  9.4 Large-Scale Discriminative Training
  9.5 Posterior Methods and System Combination
  9.6 Summary

10 Integrating Linguistic Information
  10.1 Transliteration
  10.2 Morphology
  10.3 Syntactic Restructuring
