Parallel Text Processing: Alignment and Use of Translation Corpora

416 Pages·2000·14.87 MB·English

Parallel Text Processing

Text, Speech and Language Technology
VOLUME 13

Series Editors
Nancy Ide, Vassar College, New York
Jean Véronis, Université de Provence and CNRS, France

Editorial Board
Harald Baayen, Max Planck Institute for Psycholinguistics, The Netherlands
Kenneth W. Church, AT&T Bell Labs, New Jersey, USA
Judith Klavans, Columbia University, New York, USA
David T. Barnard, University of Regina, Canada
Dan Tufis, Romanian Academy of Sciences, Romania
Joaquim Llisterri, Universitat Autònoma de Barcelona, Spain
Stig Johansson, University of Oslo, Norway
Joseph Mariani, LIMSI-CNRS, France

The titles published in this series are listed at the end of this volume.

Parallel Text Processing
Alignment and Use of Translation Corpora

Edited by
Jean Véronis
Université de Provence, Aix-en-Provence, France

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-90-481-5555-2
ISBN 978-94-017-2535-4 (eBook)
DOI 10.1007/978-94-017-2535-4

Printed on acid-free paper

All Rights Reserved
© 2000 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 2000
No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.

But the LORD came down to see the city and the tower that the men were building. The LORD said, "If as one people speaking the same language they have begun to do this, then nothing they plan to do will be impossible for them. Come, let us go down and confuse their language so they will not understand each other." So the LORD scattered them from there over all the earth, and they stopped building the city.
Genesis 11:9, The Holy Bible: New International Version

Or Yahvé descendit pour voir la ville et la tour que les hommes avaient bâties. Et Yahvé dit : « Voici que tous font un seul peuple et parlent une seule langue, et tel est le début de leurs entreprises ! Maintenant, aucun dessein ne sera irréalisable pour eux. Allons! Descendons! Et là, confondons leur langage pour qu'ils ne s'entendent plus les uns les autres. » Yahvé les dispersa de là sur toute la face de la terre et ils cessèrent de bâtir la ville.
Genèse 11:9, La Bible de Jérusalem

Contents

Foreword
Terminological note
Preface by Martin Kay
Contributors

INTRODUCTION
1. From the Rosetta stone to the information society
   A survey of parallel text processing
   Jean Véronis

ALIGNMENT METHODOLOGY
2. Pattern recognition for mapping bitext correspondence
   I. Dan Melamed
3. Multilingual text alignment
   Aligning three or more versions of a text
   Michel Simard
4. A comprehensive bilingual word alignment system
   Application to disparate languages: Hebrew and English
   Yaacov Choueka, Ehud S. Conley and Ido Dagan
5. A knowledge-lite approach to word alignment
   Lars Ahrenberg, Mikael Andersson and Magnus Merkel
6. From sentences to words and clauses
   Stelios Piperidis, Harris Papageorgiou and Sotiris Boutsis
7. Bracketing and aligning words and constituents in parallel text using Stochastic Inversion Transduction Grammars
   Dekai Wu
8. The translation network
   A model for a fine-grained description of translations
   Diana Santos
9. Parallel text alignment using cross lingual information retrieval techniques
   Christian Fluhr, Frédérique Bisson and Faiza Elkateb
10. Parallel alignment of structured documents
   Laurent Romary and Patrice Bonhomme

APPLICATIONS
11. A statistical view on bilingual lexicon extraction
   From parallel corpora to non-parallel corpora
   Pascale Fung
12. Terminology extraction from parallel technical texts
   Ingeborg Blank
13. Term alignment in use
   Machine-aided human translation
   Eric Gaussier, David Hull and Salah Aït-Mokhtar
14. Automatic dictionary extraction for cross-language information retrieval
   Ralf D. Brown, Jaime G. Carbonell and Yiming Yang
15. Parallel texts in computer-assisted language learning
   John Nerbonne

RESOURCES AND EVALUATION
16. Japanese-English aligned bilingual corpora
   Hitoshi Isahara and Masahiko Haruno
17. Building a parallel corpus of English/Panjabi
   Sukhdave Singh, Tony McEnery and Paul Baker
18. Sharing of translation memory databases derived from aligned parallel text
   Alan K. Melby
19. Evaluation of parallel text alignment systems
   The ARCADE project
   Jean Véronis and Philippe Langlais

Index of terms
Index of authors
Index of languages and writing systems

Foreword

This book evolved from the ARCADE evaluation exercise¹ that started in 1995. The project's goal is to evaluate alignment systems for parallel texts, i.e., texts accompanied by their translation. Thirteen teams from various places around the world have participated so far, and for the first time, some ten to fifteen years after the first alignment techniques were designed, the community has been able to get a clear picture of how alignment systems behave. Several chapters in this book describe the details of competing systems, and the last chapter is devoted to the evaluation protocol and results. The remaining chapters were specially commissioned from researchers who have been major figures in the field in recent years, in an attempt to address a wide range of topics that describe the state of the art in parallel text processing and use.

As I recall in the introduction, the Rosetta stone won eternal fame as the prototype of parallel texts, but such texts are probably almost as old as the invention of writing. Nowadays, parallel texts are electronic, and they are becoming an increasingly important resource for building the natural language processing tools needed in the "multilingual information society" that is currently emerging at an incredible speed. Applications are numerous, and they are expanding every day: multilingual lexicography and terminology, machine and human translation, cross-language information retrieval, language learning, etc.
After the introduction, which gives an overview of the field and places the individual chapters in context (Chapter 1), the book is divided into three parts: alignment methodology (Chapters 2 to 10), applications (Chapters 11 to 15), and resources and evaluation (Chapters 16 to 19). This division was made for clarity's sake, and it is true that each chapter has a main focus that falls into one of these three areas. However, the reader should keep in mind that in many cases the individual chapters address several topics. For example, a chapter focusing on a given application may very well present an improvement in alignment techniques, and describe the corpus resources used or developed for training and evaluating that application.

¹ http://www.up.univ-mrs.fr/~veronis/arcade

I would like to thank the thirty-four authors who contributed to the book, not only for their outstanding papers, but also for the remarkable collective work they have accomplished reviewing colleagues' chapters, helping with references, etc. I have caused them a great deal of trouble with deadlines and formats, and I hope they will forgive me. I would also like to express my gratitude to the many people who read the various drafts of the book and provided very helpful remarks and comments, especially Ken Church and Stig Johansson, as well as the anonymous reviewers who thoroughly read all chapters and helped "harmonize" the volume. Of course, as usual, remaining errors are our own, and especially mine, since as editor I should have caught them. Thanks go also to Joseph Mariani, who encouraged and supported the ARCADE project, which was the starting point of this collection.

I extend my warmest thanks to Martin Kay, who agreed to write a preface to this book.
This is a great honour for us, because Martin is not only one of the fathers of computational linguistics, but also the person who, with his student Martin Röscheisen, designed the first parallel text alignment system in 1987. In his preface, Martin Kay modestly refers to that pioneering work as an exercise with "nothing of importance" in its outcome, but we all now know that as early as 1980 he had already realised how important translation corpora were, as witnessed by his famous memorandum, The proper place of men and machines in translation. His work on alignment was therefore hardly an accident; rather, it reveals one of the many insights Martin has had into what was to become important for the future of computational linguistics. In a few simple words here, Martin highlights the challenges facing us in aligning finer-grained units such as words and clauses, and reminds us in a striking way that (good) translation is not as simple and compositional an activity as we like to pretend when trying to make our systems work.

I am convinced that the future will show that parallel corpora are modern-day Rosetta stones that have helped us better understand translation, that incredibly complex activity, and that, hopefully, they will some day help us perform it on machines as well as humans do, or at least not worse.

Jean Véronis
Aix-en-Provence, March 2000
