Controlled Document Authoring in a Machine Translation Age PDF

236 Pages·2020·12.353 MB·English

Preview Controlled Document Authoring in a Machine Translation Age

Routledge Studies in Translation Technology

CONTROLLED DOCUMENT AUTHORING IN A MACHINE TRANSLATION AGE

Rei Miyata

This book explains the concept, framework, implementation, and evaluation of controlled document authoring in this age of translation technologies. Machine translation (MT) is routinely used in many situations, by companies, governments, and individuals. Despite recent advances, MT tools are still known to be imperfect, sometimes producing critical errors. To enhance the performance of MT, researchers and language practitioners have developed controlled languages that impose restrictions on the form or length of the source-language text. However, a fundamental, persisting problem is that both current MT systems and controlled languages deal only with the sentence as the unit of processing. To be effective, controlled languages must be contextualised at the document level, consequently enabling MT to generate outputs appropriate for their functional context within the target document.

With a specific focus on Japanese municipal documents, this book establishes a framework for controlled document authoring by integrating various research strands including document formalisation, controlled language, and terminology management. It then presents the development and evaluation of an authoring support system, MuTUAL, that is designed to help non-professional writers create well-organised documents that are both readable and translatable.

The book provides useful insights for researchers and practitioners interested in translation technology, technical writing, and natural language processing applications.

Rei Miyata, PhD, is an Assistant Professor at the Graduate School of Engineering, Nagoya University, Japan.

Routledge Studies in Translation Technology
Series Editor: Chan Sin-wai

This cutting-edge research series examines translation technology and explores the relationships between human beings and machines in translating the written and spoken word. The series welcomes authored monographs and edited collections.

The Future of Translation Technology: Towards a World without Babel
Chan Sin-wai

The Human Factor in Machine Translation
Edited by Chan Sin-wai

Controlled Document Authoring in a Machine Translation Age
Rei Miyata

For more information on this series, please visit www.routledge.com/Routledge-Studies-in-Translation-Technology/book-series/RSITT

First published 2021
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2021 Rei Miyata

The right of Rei Miyata to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

With the exception of the Preface, Chapter 1, and the Bibliography, no part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

The Preface, Chapter 1, and the Bibliography of this book are available for free in PDF format as Open Access from the individual product page at www.routledge.com. They have been made available under a Creative Commons Attribution-NonCommercial-No Derivatives 4.0 license.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
Names: Miyata, Rei, author.
Title: Controlled document authoring in a machine translation age / Rei Miyata.
Description: London; New York: Routledge, 2020. | Series: Routledge studies in translation technology | Includes bibliographical references and index.
Identifiers: LCCN 2020020455
Subjects: LCSH: Machine translating. | Computational linguistics. | Natural language processing (Computer science)
Classification: LCC P308 .M59 2020 | DDC 418/.020285–dc23
LC record available at https://lccn.loc.gov/2020020455

ISBN: 978-0-367-50019-1 (hbk)
ISBN: 978-1-003-04852-7 (ebk)

Typeset in Times New Roman
by Newgen Publishing UK

Contents

List of figures
List of tables
List of abbreviations
Preface
Acknowledgements

PART I Research background
1 Introduction
2 Related work

PART II Controlled document authoring
3 Document formalisation
4 Controlled language
5 CL contextualisation
6 Terminology management

PART III MuTUAL: An authoring support system
7 System development
8 Evaluation of CL violation detection component
9 System usability evaluation

PART IV Conclusion
10 Research findings and outlook

Appendices
Bibliography
Index

Figures

1.1 Personal seal registration procedure (excerpted from the website of Shinjuku City)
1.2 Chapter organisation with research questions
2.1 Functional elements of research papers (Kando, 1997, p. 3)
2.2 Machine translation workflow with human intervention
3.1 Functional elements of municipal procedural documents
3.2 Example of the functional elements in the procedure for personal seal registration (excerpted from a document in CLAIR)
3.3 Example of the functional elements in the procedure for personal seal registration (excerpted from a document in Hamamatsu)
3.4 Analysis of an existing municipal document using our DITA framework
3.5 Model example of the seal registration procedure
4.1 MT quality questionnaire – Step 2 (when [1] or [2] is selected in Step 1)
4.2 Source-readability questionnaire
5.1 Approaches to resolve incompatibilities between CL_st and CL_tt for Steps (*undesirable sentence)
5.2 Revised flow of pre- and post-translation processing for Steps (*undesirable sentence)
6.1 Term registration platform
6.2 The frequency spectrum of terms in municipal corpus (m: frequency class; V(m, N): number of types with frequency m)
6.3 Growth curve of terminologies in municipal corpus
7.1 Modules of MuTUAL
7.2 Task topic template
7.3 CL authoring assistant, MT and back translation (BT)
7.4 Terminology check function
7.5 CL rule selection modal
7.6 CL guideline modal
7.7 Similar text search
9.1 User interface

Tables

2.1 Seven information types defined in Information Mapping (Horn, 1989, pp. 110–111)
2.2 Controlled writing processes and support mechanisms
3.1 Examples of hierarchical levels of websites
3.2 The number of documents in each category of municipal-life information (the number of procedural documents is in parenthesis)
3.3 Specialisation of the DITA Task topic to municipal procedures
4.1 A list of technical writing based CL rules (CL-T)
4.2 Example of source-text rewriting
4.3 A list of rewriting trial based CL rules (CL-R)
4.4 Result categories
4.5 Overall results of MT quality (CL-T)
4.6 Overall results of MT quality (CL-R)
4.7 Improvement in [MT–Useful] category (CL-T)
4.8 Improvement in [MT–Useful] category (CL-R)
4.9 Overall results after optimal rules were selected (CL-T)
4.10 Overall results after optimal rules were selected (CL-R)
4.11 Improvement in Japanese readability (CL-T)
4.12 Improvement in Japanese readability (CL-R)
4.13 Selected optimal rules for two MT systems (common rules are shown in bold)
5.1 Linguistic specification
5.2 The number of MT outputs that realise the desired linguistic forms in the English translation before and after pre-translation processing
5.3 Example MT outputs before and after applying Rule 1 (inserted segment in brackets)
5.4 Example of remaining disconformity to CL_tt
5.5 Example MT outputs before and after applying Rule 2 ‘shiro’ (transformed segment in brackets)
5.6 Example MT outputs before and after applying Rule 2 ‘shinasai’ (transformed segment in brackets)
5.7 Example MT outputs before and after applying Rule 2 ‘shitekudasai’ (transformed segment in brackets)
6.1 Basic statistics of extracted sentences
6.2 The number of terms extracted and their occurrences in the corpus
6.3 The 20 most frequent terms in the corpus (before controlling)
6.4 Example of extracted bilingual term pairs
6.5 Criteria for defining preferred and proscribed terms
6.6 Examination of term variations
6.7 The basic statistics of controlled terminology
6.8 The 20 most frequent controlled terms occurred in the corpus
6.9 Population types E[S] and coverage CR
6.10 Growth rate
6.11 Shift in the coverage ratio (%)
8.1 Results of the benchmark evaluation
9.1 Quantitative aspects to be measured
9.2 CL rules and implementation (with confidence scores)
9.3 After-scenario questionnaire (ASQ)
9.4 System usability scale (SUS)
9.5 Effectiveness for each condition
9.6 Correction rate for each rule (*implemented rule)
9.7 Result of MT quality evaluation (system B)
9.8 Result of MT quality evaluation (system D)
9.9 Example of ST and MT (system D) of different conditions
9.10 Result of ST quality evaluation
9.11 Time efficiency
9.12 Detailed edit log
9.13 Edit distance
9.14 Result of questionnaire ASQ (satisfaction with the task)
9.15 Result of questionnaire SUS (satisfaction with the system)
9.16 Text similarity between participants
9.17 Example of high-text-similarity sentence
9.18 Example of low-text-similarity sentence
9.19 Task time transition (time per sentence, in seconds)
9.20 Task time transition (time per character, in seconds)
