Machine Translation
Its Scope and Limits

Yorick Wilks
Department of Computer Science
The University of Sheffield
Regent Court, 211 Portobello Street
Sheffield, S1 4DP, UK
[email protected]

ISBN: 978-0-387-72773-8
e-ISBN: 978-0-387-72774-5
Library of Congress Control Number: 2008931409

© Springer Science+Business Media LLC 2009

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

springer.com

Foreword

This book is a set of essays covering aspects of machine translation (MT) past, present and future. Some have been published before, some are new but, taken together, they are meant to present a coherent account of the state of MT, its evolution up to the present, and its scope for the future. At certain points, “Afterwords” have been added to comment on the, possibly changed, relevance of a chapter at the time of publication.

The argument for reprinting here some older thoughts on MT is an attempt to show some continuity of one researcher’s thoughts, so far as possible, in the welter of argument and dispute that has gone on over decades on how MT is to be done. The book is certainly not intended as a comprehensive history of the field, and these already exist. Nor is any one MT system described or advocated here.
The author has been involved in the production of three quite different systems. The first was a toy semantics-based system at Stanford in 1971, whose code was placed in the Computer Museum in Boston as the first meaning-driven MT system. Later, in 1985, I was involved in New Mexico in ULTRA, a multi-language system with strong semantic and pragmatic features, intended to show that the architecture of the (failed) EUROTRA system could perform better at 1% of what that cost. No comprehensive description of EUROTRA is given here, and the history of that project remains to be written. Lastly, in 1990, I was one of three PIs in the DARPA-funded system PANGLOSS, a knowledge-based system set up in competition with IBM’s CANDIDE system, which became the inspiration for many of the data-driven changes that have overtaken language processing since 1990. None of these systems is being presented here as a solution to MT, for nothing yet is, but only as a test-bed of ideas and performance.

Machine translation is not, as some believe, solved, nor is it impossible, as others still claim. It is a lively and important technology, whose importance in a multi-lingual and information-driven world can only increase, intellectually and commercially. Intellectually, it remains, as it always has been, the ultimate test-bed of all linguistic and language processing theories.

In writing this book, I am indebted to too many colleagues and students to mention, though I must acknowledge joint work with David Farwell (chapters 10 and 14), and Sergei Nirenburg, Jaime Carbonell and Ed Hovy (chapter 8). I also need to thank Lucy Moffatt for much help in its preparation, and Roberta, for everything, as always.

Sheffield, 2008
Yorick Wilks

History Page

Some chapters have appeared in other forms elsewhere:

Chapter 2: Wilks, Y. (1984) Artificial Intelligence and Machine Translation. In S. and W. Sedelow (eds.) Current Trends in the Language Sciences. Amsterdam: North Holland.
Chapter 3: Wilks, Y. (1973) An Artificial Intelligence Approach to Machine Translation. In R. Schank and K. Colby (eds.) Computer Models of Thought and Language. San Francisco: Freeman.

Chapter 4: Wilks, Y. (1992) SYSTRAN: it obviously works, but how much can it be improved? In J. Newton (ed.) Computers and Translation. London: Routledge.

Chapter 7: Wilks, Y. (1994) Developments in machine translation research in the US. In the Aslib Proceedings, Vol. 46 (The Association of Information Management).

Contents

1 Introduction

Part I MT Past

2 Five Generations of MT
3 An Artificial Intelligence Approach to Machine Translation
4 It Works but How Far Can It Go: Evaluating the SYSTRAN MT System

Part II MT Present

5 Where Am I Coming From: The Reversibility of Analysis and Generation in Natural Language Processing
6 What are Interlinguas for MT: Natural Languages, Logics or Arbitrary Notations?
7 Stone Soup and the French Room: The Statistical Approach to MT at IBM
8 The Revival of US Government MT Research in 1990
9 The Role of Linguistic Knowledge Resources in MT
10 The Automatic Acquisition of Lexicons for an MT System

Part III MT Future

11 Senses and Texts
12 Sense Projection
13 Lexical Tuning
14 What Would Pragmatics-Based Machine Translation be Like?
15 Where was MT at the End of the Century: What Works and What Doesn’t?
16 The Future of MT in the New Millennium

References
Index

Part I
MT Past

Chapter 2
Five Generations of MT

Introduction (2007)

There is an ancient Chinese curse that dooms recipients to live in an interesting age, and by those standards MT workers are at present having a bad time. The reason things are interesting at the moment is that there are a number of conflicting claims in the air about how to do MT, and whether it can be, or indeed already has been, done. Such a situation is unstable, and we may confidently expect some kind of outcome, always cheering for the empiricist, in the near future. This chapter was initially published as (Wilks, 1979) as an exploration of the relationship of MT to Artificial Intelligence (AI) in general, and should be seen as providing a snapshot of that intellectual period.

What happened initially was threefold. First, the “brute force” methods for MT, which were thought to have been brought to an end by the ALPAC (1966) Report, have surfaced again, like some coelacanth from the deep, long believed extinct. Such systems were sold for many years under such trade names as LOGOS, XYZYX, SMART, Weidner and SYSTRAN; and the last, and best known, has been used for thirty years by the EU in Paris (Van Slype, 1976) and Luxembourg.

Secondly, some large-scale, more theoretically based, MT projects continued, usually based in universities, and have been tested in use, though sometimes on a scale smaller than that originally envisaged. METEO, for example, in Montreal (Chandioux, 1976), which was to have translated official documents from English to French, is still in use for the translation of the more limited world of weather reports.
Thirdly, workers in natural language in the field known as Artificial Intelligence (AI) began to make distinct claims about the need for their approach if there is ever to be general and high-quality MT (Wilks, 1973a; Charniak, 1973; Schank, 1975a). Small pilot systems illustrating their claims were programmed, but their role in MT discussion was mainly of a theoretical nature.

However, these are not merely three complementary approaches, for they seem to be making different claims, and, unless we take the easy way out and simply define some level of MT appropriate to each of the enterprises, it seems they cannot all be right, and that we may hope for some resolution before too long.