QCRI Machine Translation Systems for IWSLT 16

Nadir Durrani, Fahim Dalvi, Hassan Sajjad, Stephan Vogel
Qatar Computing Research Institute – HBKU
{ndurrani,faimaduddin,hsajjad,svogel}@qf.org.qa

Abstract

This paper describes QCRI's machine translation systems for the IWSLT 2016 evaluation campaign. We participated in the Arabic→English and English→Arabic tracks. We built both phrase-based and Neural machine translation models, in an effort to probe whether the newly emerged NMT framework surpasses the traditional phrase-based systems in Arabic-English language pairs. We trained a very strong phrase-based system including a big language model, the Operation Sequence Model, the Neural Network Joint Model and class-based models, along with different domain adaptation techniques such as MML filtering, mixture modeling and fine-tuning of the NNJM model. However, a Neural MT system, trained by stacking data from different genres through fine-tuning and applying an ensemble over 8 models, beat our very strong phrase-based system by a significant margin of 2 BLEU points in the Arabic→English direction. We did not obtain similar gains in the other direction but were still able to outperform the phrase-based system. We also applied system combination on the phrase-based and NMT outputs.

1. Introduction

We describe QCRI's phrase-based and Neural MT systems. We participated in the Arabic-to-English and English-to-Arabic MT tracks. Our translation engines have historically been based on the phrase-based system trained using the Moses toolkit [1], but during the course of this evaluation, we made a transition towards the newly emerged Neural MT framework [2], using Nematus, a toolkit used by the top performing team [3] during the recent WMT campaign.

An interesting challenge associated with the IWSLT campaign is the problem of domain adaptation. The in-domain data based on TED talks is available in very little quantity compared to the out-domain UN corpus [4], which has previously been found to be harmful when simply concatenated to the training data [5]. In this year's IWSLT, two additional data resources, Opus subtitles [6] and the QED corpus [7], were introduced. The latter was also used as an official test-set. Therefore, apart from exploring phrase-based versus Neural MT, we geared ourselves towards adapting our system towards TED and QED talks in this multi-domain scenario. With these goals in mind we re-explored both model weighting and data filtering techniques in these new data settings. Below we itemize the most successful attributes of our phrase-based system:

• We applied MML-based data selection [8] to the UN and OpenSubtitle data, with the goal of filtering out harmful data.
• We trained OSM models [9] on separate corpora and interpolated them [10] by optimizing perplexity on the tuning-set. We also tried this on OSM models trained on word classes [11].
• We tried the fine-tuning method of training the NNJM model on the out-domain data and fine-tuning it with the in-domain TED data [12].
• We trained big language models using all the English mono data available from the WMT campaign, and the Gigaword corpus for Arabic.

We trained our Neural MT system using the Nematus toolkit. We used bidirectional RNNs for the encoder, 1024 LSTM units, and a word embedding size of 500. Below we itemize what worked when training the neural MT system:

• We trained our baseline model on all of the UN corpus, then continued training with the OpenSubtitles corpus, and finally fine-tuned with the in-domain data.
• We fine-tuned all of our models without freezing any layers in the network, since we had a sufficient amount of data to train on.
• We used dropout when fine-tuning with in-domain data, since it is relatively small compared to the UN and OpenSubtitle data.
• We trained our final models with an ensemble of the last eight models, where each model was fine-tuned with the in-domain data.

Finally, we applied system combination over the outputs of the best Neural MT and phrase-based systems using MEMT [13]. Our efforts were mainly focused towards the AR→EN TED task. In the end we just replicated our best system for the EN→AR direction and the QED task. For our best Neural MT system, we were unable to use an ensemble in the EN→AR direction, since we could not train several comparable models to combine.
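To make the encoder configuration quoted above concrete (a bidirectional recurrent encoder with 1024 LSTM units and 500-dimensional word embeddings), the following is a minimal shape-level sketch. It is not the Nematus implementation: it uses PyTorch, and the vocabulary size and toy batch are illustrative assumptions.

```python
# Shape-level sketch of the encoder configuration stated in the introduction
# (bidirectional recurrent encoder, 1024 units, 500-dim embeddings). This is
# NOT the Nematus implementation; it is a minimal PyTorch analogue written
# only to make the stated dimensions concrete. Vocabulary size and the toy
# batch below are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE = 50_000   # source vocabulary (assumed; see Section 4.2)
EMB_DIM = 500         # word embedding size stated in the paper
HIDDEN = 1024         # recurrent units per direction stated in the paper

class BiRNNEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.rnn = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True, bidirectional=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) integer token ids
        emb = self.embed(src_ids)     # (batch, src_len, 500)
        states, _ = self.rnn(emb)     # (batch, src_len, 2 * 1024)
        return states                 # annotations consumed by the attention/decoder

encoder = BiRNNEncoder()
dummy = torch.randint(0, VOCAB_SIZE, (2, 7))   # toy batch: 2 sentences, 7 tokens each
print(encoder(dummy).shape)                    # torch.Size([2, 7, 2048])
```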
2. Data Settings and Pre/Post Processing

We trained our systems using the data made available through the IWSLT 2016 campaign. This contained two in-domain data sets, the TED talks and the QED corpus [14], and two out-domain data sets, the UN corpus [4] and the OPUS data [6]. The statistics are shown in Table 1. For the language model we trained on the target side of the parallel corpus plus all the available English data from the recent WMT campaign [15], and the GigaWord and OPUS mono corpora for Arabic.

Table 1: Number of Sentences in Parallel Data

          TED     QED     UN      OPUS
  Stats   240K    153K    18.5M   40M

We segmented the Arabic data using both MADAMIRA and Farasa. We found that MADAMIRA [16] performed 0.1 BLEU points better than Farasa [17] (see Table 2) and decided to use it for the competition. We tokenized the English side using the standard Moses tokenizer. For English→Arabic, outputs were detokenized using the MADA detokenizer. Before scoring the output, we normalized it and the reference translations using the QCRI normalizer [5].

Table 2: MADA versus Farasa Tokenization

  Segmentation   ted-11   ted-12   ted-13   ted-14   Avg
  MADA           27.5     30.6     30.4     26.3     28.7
  Farasa         27.4     30.3     30.2     26.4     28.6

3. Phrase-based System

3.1. Baseline Settings

We trained a phrase-based Moses system with the settings described in [18]: a maximum sentence length of 80, fast-align for word alignments [19], an interpolated Kneser-Ney smoothed 5-gram language model [20], a lexicalized reordering model [21], a 5-gram operation sequence model [22] and a 14-gram NNJM model [23], with the baseline settings described in [24]. We used the default distortion limit, 100-best translation options, a phrase length of 5, the monotone-over-punctuation heuristic, and cube-pruning with a limit of 1000 during tuning and 5000 during test. We used k-best batch MIRA [25] for tuning and cased BLEU [26] to measure progress.

3.2. Data Selection

Due to our experience from previous competitions, we were wary of the fact that simply adding the UN data is harmful for the Arabic→English system; we therefore selected data through MML filtering [8]. We selected 2.5%, 3.75%, 5%, 10% and 30% of the UN data and trained the MT pipeline by concatenating the selected data with the in-domain data. We did not include the Opus data (40 million sentences) and the NNJM in these experiments, to get the results quickly. Table 3 shows the results. We found 3.75% (≈685k sentences) to be the optimal threshold. As an alternative to data selection, we tried training in-domain and out-domain phrase-tables separately and using the out-domain phrase-table only as a back-off. The second last row of Table 3 shows the results. While this gave an improvement on top of the baseline system, it was slightly behind MML filtering.

We then tried to find the optimal cut-off on the OPUS data, and selected 20 million sentences (half of the Opus data). Our best systems used 3.75% of the UN data and half of the Opus data. Adding the selected Opus data gave an average improvement of +1.2 BLEU points.

Table 3: Data Selection using MML and Back-off PT

  Percentage         ted-11   ted-12   ted-13   ted-14   Avg
  ID                 27.5     30.6     30.4     26.3     28.7
  2.5%               27.3     30.6     31.5     27.0     29.1
  3.75%              27.1     30.7     31.6     27.2     29.2
  5%                 27.1     30.4     31.4     27.1     29.1
  10%                27.0     30.2     31.5     27.2     29.0
  30%                26.3     29.5     30.9     26.7     28.4
  Full               25.3     28.7     29.4     25.5     27.3
  Back-off PT        27.0     30.4     31.5     27.3     29.1
  3.75% + 1/2 OPUS   28.2     32.4     32.3     28.6     30.4
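The selection criterion behind the MML filtering above is the Moore-Lewis cross-entropy difference: every out-of-domain sentence is scored with an in-domain and an out-of-domain language model, and the fraction that looks most in-domain is kept. The sketch below is a minimal monolingual version of that criterion (the criterion of [8] is applied bilingually), assuming two pre-trained KenLM models; the file paths are placeholders, not the files used in our systems.

```python
# Minimal sketch of Moore-Lewis style (MML) data selection [8]: score each
# out-of-domain sentence by the difference of in-domain and out-of-domain
# language-model cross-entropies and keep the lowest-scoring fraction.
# Assumes two LMs trained beforehand; the paths are placeholders.
import kenlm

lm_in = kenlm.Model("ted.arpa")    # in-domain LM (placeholder path)
lm_out = kenlm.Model("un.arpa")    # out-of-domain LM (placeholder path)

def xent_diff(sentence: str) -> float:
    """Per-word H_in(s) - H_out(s); lower = more in-domain-like."""
    n = max(len(sentence.split()), 1)
    # kenlm.Model.score returns the total log10 probability of the sentence
    return (-lm_in.score(sentence) + lm_out.score(sentence)) / n

with open("un.en") as f:                 # out-of-domain corpus (placeholder)
    scored = sorted((xent_diff(s), s) for s in f)

keep = int(0.0375 * len(scored))         # e.g. the 3.75% cut-off found best above
with open("un.selected.en", "w") as out:
    for _, sent in scored[:keep]:
        out.write(sent)
```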
3.3. Language Model

We trained a bigger language model by using all the available English data from the recent WMT campaign (http://www.statmt.org/wmt16/translation-task.html) and the target side of the parallel data. A Kneser-Ney smoothed 5-gram language model was trained on each sub-corpus individually and then interpolated to minimize perplexity on the target part of the monolingual data. We were able to obtain a gain of +0.5 using the bigger language model. See Table 4.

Table 4: Bigger Language Model

  System     ted-11   ted-12   ted-13   ted-14   Avg
  Baseline   28.2     32.4     32.3     28.6     30.4
  +bigLM     28.3     32.8     33.2     29.2     30.9

3.4. Interpolation of Operation Sequence Models

The OSM model has been a regular feature of the phrase-based pipeline [27] in competition grade systems. It is a joint sequence translation model which integrates reordering. [10] recently found that an OSM model trained on a plain concatenation of data is sub-optimal and can be improved by training OSM models on each domain individually and interpolating them by minimizing perplexity on the in-domain tune-set. Table 5 shows that using the interpolated OSM model (OSMi) instead of the one trained on plain concatenation (OSMc) gives an average improvement of +0.6 BLEU points.

Table 5: Interpolated Operation Sequence Model

  System   ted-11   ted-12   ted-13   ted-14   Avg
  OSMc     28.3     32.8     33.2     29.2     30.9
  OSMi     29.0     33.5     33.8     29.7     31.5
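Sections 3.3 and 3.4 rely on the same operation: choosing linear interpolation weights for the per-corpus models so that perplexity on a held-out tuning set is minimized. The sketch below shows the standard EM re-estimation behind that search, applied to a matrix of per-token sub-model probabilities; the toy numbers stand in for whatever the LM/OSM scorers actually produce and are not values from our models.

```python
# Sketch of the weight search used for interpolating sub-corpus models:
# given each sub-model's probability for every tuning-set token, find mixture
# weights that minimise the perplexity of the linear interpolation.
# Standard EM for mixture weights; the probability matrix is a stand-in for
# the output of the actual LM/OSM scoring tools.
import numpy as np

def interpolation_weights(probs: np.ndarray, iters: int = 50) -> np.ndarray:
    """probs[i, m] = P_m(token_i | history) under sub-model m."""
    n_tokens, n_models = probs.shape
    lam = np.full(n_models, 1.0 / n_models)
    for _ in range(iters):
        mix = probs @ lam                   # interpolated probability per token
        resp = probs * lam / mix[:, None]   # posterior responsibility of each model
        lam = resp.mean(axis=0)             # re-estimate the mixture weights
    return lam

# Toy example: 3 sub-models (e.g. TED, UN, OPUS) scoring 5 tuning tokens.
toy = np.array([[0.20, 0.01, 0.05],
                [0.30, 0.02, 0.04],
                [0.10, 0.05, 0.02],
                [0.25, 0.01, 0.03],
                [0.15, 0.02, 0.06]])
lam = interpolation_weights(toy)
ppl = np.exp(-np.mean(np.log(toy @ lam)))
print(lam, ppl)   # the weights favour the sub-model that best explains the tune set
```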
3.5. NNJM Adaptation

We also explored the award-winning Neural Network Joint Model (NNJM) in our pipeline and tried to adapt it towards the in-domain data. We trained an NNJM model on the UN and Opus data for 25 epochs and then fine-tuned [12] it by running it for 25 more epochs on the in-domain data. Because the data is huge, the entire training took 1.5 months of wall-clock time. Table 6 shows the results. The NNJM model gave a significant improvement (+0.6) on top of the baseline, which does not include it. We found the fine-tuning method to give slight gains (+0.2) when the baseline model was trained on the Opus data. On the contrary, fine-tuning did not help when the model was trained on the UN data.

Table 6: Neural Network Joint Model + Different Adaptation Methods

  System       ted-11   ted-12   ted-13   ted-14   Avg
  Baseline     29.0     33.5     33.8     29.7     31.5
  +NNJM        29.8     34.1     34.4     30.1     32.1
  +FT (UN)     29.7     33.8     33.9     30.2     31.9
  +FT (OPUS)   30.1     34.1     34.6     30.3     32.3
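The two-stage NNJM recipe above (train on the large out-of-domain data, then continue training the same parameters on the in-domain data) can be summarized in a short sketch. The model below is only a schematic feed-forward joint model in the spirit of [23], written in PyTorch; the layer sizes, vocabulary size, learning rates and data loaders are assumptions made for illustration, not the configuration used in our systems.

```python
# Sketch of the NNJM fine-tuning recipe of Section 3.5: train the joint model
# on the large out-of-domain data first, then continue training the same
# parameters on the in-domain TED data. Layer sizes, vocabulary size and the
# (commented-out) data loaders are illustrative assumptions.
import torch
import torch.nn as nn

CONTEXT = 14          # 11 source + 3 target-history tokens, as in a 14-gram NNJM
VOCAB = 32_000        # assumed joint vocabulary size
EMB, HID = 192, 512   # assumed layer sizes

nnjm = nn.Sequential(
    nn.Embedding(VOCAB, EMB), nn.Flatten(),
    nn.Linear(CONTEXT * EMB, HID), nn.Tanh(),
    nn.Linear(HID, VOCAB),
)
loss_fn = nn.CrossEntropyLoss()

def run_epochs(batches, epochs, lr):
    opt = torch.optim.SGD(nnjm.parameters(), lr=lr)
    for _ in range(epochs):
        for ctx, target in batches:   # ctx: (B, 14) token ids, target: (B,) token ids
            opt.zero_grad()
            loss_fn(nnjm(ctx), target).backward()
            opt.step()

# Stage 1: 25 epochs on UN+OPUS batches; Stage 2: 25 more epochs on TED batches,
# reusing the same parameters (this continuation is the "fine-tuning" step).
# run_epochs(out_domain_batches, epochs=25, lr=0.1)
# run_epochs(ted_batches, epochs=25, lr=0.05)   # smaller LR is our assumption
```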
3.6. Class-based Models

We explored the use of automatic word clusters in phrase-based models [11]. We used 50 classes, obtained by running mkcls. The cluster ids were included in the phrase-table. We additionally trained an in-domain language model using word classes and an interpolated OSM on word classes, but we only saw very small improvements from using word classes (Table 7).

Table 7: Using Word Classes

  System        ted-11   ted-12   ted-13   ted-14   Avg
  Baseline      30.1     34.1     34.6     30.3     32.3
  Class-based   30.3     34.2     34.7     30.4     32.4

3.7. Handling Unknown Words

We tried to handle OOV words using drop-oov and through transliteration [28, 29]. The former worked slightly better and was used in the best system (see Table 8). Of course the gains from the two methods are additive, because they address different OOVs, but there is no good way to automatically find which word to drop and which one to transliterate.

Table 8: Handling OOVs

  System            ted-11   ted-12   ted-13   ted-14   Avg
  Baseline          30.3     34.2     34.7     30.4     32.4
  Drop-OOV          30.5     34.2     35.0     30.5     32.6
  Transliteration   30.4     34.2     34.7     30.6     32.5

3.8. Final System

Table 9 shows the incremental progress on the Arabic→English language pair. Our best system included the MML-selected UN and Opus corpora, the big language model, the interpolated OSM and fine-tuned NNJM models. We used the drop-oov option to handle unknown words.

Table 9: Incremental Progress Arabic-to-English System

  System           ted-11   ted-12   ted-13   ted-14   Avg
  Baseline         27.4     30.3     30.2     26.4     28.7
  +Selected UN     27.1     30.7     31.6     27.2     29.2
  +Selected OPUS   28.2     32.4     32.3     28.6     30.4
  +bigLM           28.3     32.8     33.2     29.2     30.9
  +OSMi            29.0     33.5     33.8     29.7     31.5
  +NNJM            29.8     34.1     34.4     30.1     32.1
  +FT (OPUS)       30.1     34.1     34.6     30.3     32.3
  Drop-OOV         30.5     34.2     35.0     30.5     32.6

3.9. English-to-Arabic Systems

We did not do detailed experiments for the English→Arabic direction because of computational limitations, but simply replicated what worked for the Arabic→English direction. Table 10 shows the progress on this language pair. The baseline system (ID) was trained on the TED data and the target side of all the permissible parallel data. In the second row, we added all the parallel data except for the UN. In the third row we additionally added the UN data that we selected in the Arabic→English direction. The additional parallel data gives an average improvement of +1.4 BLEU points. Then we added an NNJM model trained on the in-domain TED data on top of this system to improve it by +0.8. Adding GigaWord and monolingual OPUS data (another 20M sentences other than the target side of the parallel data) gave an improvement of +0.3. Finally we replaced the baseline NNJM with the one trained on OPUS data and fine-tuned with the in-domain data to get our best system.

Table 10: Incremental Progress English-to-Arabic System

  System       ted-11   ted-12   ted-13   ted-14   Avg
  ID           14.8     15.6     16.7     14.5     15.4
  +Parallel    15.5     16.4     18.2     16.3     16.6
  +MML (UN)    15.6     16.3     18.4     16.7     16.8
  +NNJM        16.5     17.4     19.2     17.4     17.6
  +bigLM       16.6     17.6     20.0     17.4     17.9
  +NNJM (FT)   16.7     17.9     20.2     17.7     18.1

3.10. QED Systems

We built the QED systems by simply replicating our setup with the QED corpus as the in-domain data instead of the TED data. We used the same UN data that we selected for our Arabic→English system, therefore our phrase-tables remain the same. The main changes arise when training the adapted OSM and NNJM models. For the NNJM we simply fine-tune with the QED corpus instead of the TED corpus. For the interpolated OSM, we concatenated the TED and QED corpora and built an OSM on it, which is then interpolated with the OSM models trained on the selected UN and Opus data. We used the IWSLT tuning data to get the interpolation weights; this way the OSM sub-model created from the TED+QED corpus gets the best weights. We also retrained the language model in this fashion. We used the tuning weights obtained from our best TED systems and replaced the TED-adapted OSM, NNJM and language models with their QED-adapted variants.

4. Neural Machine Translation

4.1. Pre/Post-processing

We used a similar pre/post-processing pipeline for Neural MT as for our phrase-based systems (Section 2), and additionally applied BPE [30] before training. Our BPE models are trained separately for Arabic and English instead of jointly, since the character set differs between the languages. We limited the number of operations to 59,500, as suggested in [30]. We experimented with BPE models trained on the TED data, and on the concatenation of the TED and out-domain data. We did not see any considerable difference in performance between these models. Thus we used the BPE model trained on the TED data for the experiments reported in this paper.
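A minimal sketch of the subword step described above: learn a separate BPE model per language with roughly 59.5k merge operations and apply it to the training data. It uses the subword-nmt package that accompanies [30]; the file names are placeholders, and the helper signatures should be checked against the installed version.

```python
# Sketch of the BPE step of Section 4.1: one BPE model per language (Arabic and
# English are not jointly segmented here), ~59.5k merge operations, applied to
# the training data before NMT training. File names are placeholders and the
# subword-nmt helper signatures may differ across package versions.
import codecs
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

NUM_OPERATIONS = 59_500

for lang, corpus in [("en", "ted.train.en"), ("ar", "ted.train.ar")]:
    # 1) learn language-specific merge operations on the TED training data
    with codecs.open(corpus, encoding="utf-8") as inp, \
         codecs.open(f"bpe.codes.{lang}", "w", encoding="utf-8") as out:
        learn_bpe(inp, out, NUM_OPERATIONS)

    # 2) apply the learned merges to the training side of that language
    with codecs.open(f"bpe.codes.{lang}", encoding="utf-8") as codes:
        bpe = BPE(codes)
    with codecs.open(corpus, encoding="utf-8") as inp, \
         codecs.open(f"{corpus}.bpe", "w", encoding="utf-8") as out:
        for line in inp:
            out.write(bpe.process_line(line))
```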
4.2. Baseline

We used the default parameters in Nematus to train our systems: a batch size of 80, source and target vocabularies of 50K entries each, 1024 LSTM units, and an embedding layer size of 500. The baseline system was trained using only the TED corpus.

4.3. Fine Tuning on Concatenation versus Out-Domain Data

The best phrase-based systems are usually trained by concatenating in- and out-domain data. On the other hand, deep learning systems are trained on the out-domain data first, and then fine-tuned with in-domain data. We experimented with both strategies. In the interest of time we selected 30% of the UN data using MML filtering (Table 3). We trained two systems, one by concatenating the in-domain data with the selected (30%) UN data and the other on just the selected data. We then fine-tuned both models with the in-domain TED data after running them for 3 epochs. Table 11 shows that fine-tuning a system trained on out-domain data only outperforms the system fine-tuned on the concatenation.

Table 11: Fine Tuning on Out-domain versus Concatenation – Models run for 3 epochs

  System                 ted-11   ted-12   ted-13   ted-14   Avg
  30% + FT (TED)         27.2     31.4     30.8     27.1     29.1
  30% + TED + FT (TED)   27.2     30.8     30.1     25.8     28.5

4.4. Fine-tuning Variants and Dropouts

The default version of Nematus applies fine-tuning by freezing the weights of the embedding layer. The intuition behind freezing a layer is to not allow the weights in that layer to change with additional data. This is sometimes useful when we can learn certain layers better from out-domain data; one such layer in our case is the word embedding layer. We tried a variation in which we do not freeze any layer. This latter variant was found to outperform the default setting (see Table 12).

Dropout is found to be useful in NN training when the training data is small. We experimented with using dropout, but did not find any significant difference. Hence we decided to use it only when fine-tuning with the in-domain data (TED/QED), since both of the other datasets (UN and OPUS) were big and did not pose any risk of overfitting.

Table 12: Fine-tuning with/without freezing the Embeddings

  System         ted-11   ted-12   ted-13   ted-14   Avg
  5%             25.8     29.4     29.3     25.0     27.4
  5% (Frozen)    24.7     27.7     27.4     23.9     25.9
  30%            27.2     30.8     30.1     25.8     28.5
  30% (Frozen)   26.5     30.4     28.9     25.0     27.7
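The fine-tuning variants of this section differ only in which parameters are allowed to move and whether dropout is active. The sketch below expresses that choice in generic PyTorch terms rather than Nematus internals; the parameter-naming convention and the dropout rate are assumptions made for illustration.

```python
# Sketch of the fine-tuning variants compared in Section 4.4: the "frozen"
# variant stops gradients through the embedding tables, while the variant we
# adopted leaves all layers trainable; dropout is switched on only for the
# small in-domain (TED/QED) stage. Generic PyTorch, not Nematus internals.
import torch.nn as nn

def configure_finetuning(model: nn.Module, freeze_embeddings: bool, dropout: float) -> None:
    # Dropout rate: 0.0 for the large UN/OPUS stages, >0 for the in-domain stage.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = dropout
    # Treating parameters whose name contains "embed" as the embedding tables is
    # a naming assumption for this sketch, not the toolkit's real layout. The
    # default Nematus-style recipe would freeze them; our better-performing
    # variant keeps everything trainable (freeze_embeddings=False).
    for name, param in model.named_parameters():
        param.requires_grad = not (freeze_embeddings and "embed" in name)

# Out-of-domain stages (UN, OPUS): configure_finetuning(model, False, 0.0)
# In-domain stage (TED/QED):       configure_finetuning(model, False, 0.2)  # rate assumed
```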
4.5. Data Selection

Since we found data selection useful in the phrase-based system, we also trained our neural systems using 5%, 30% and 100% of the UN data. In these experiments, we concatenated the 5% and 30% selections of the UN data with the in-domain data. To evaluate the most promising models, we trained all of the models until the learning plateaued, and then fine-tuned these models with in-domain data. (Because we were running experiments in parallel, we were not aware at this point that fine-tuning on out-domain data only is the better strategy.) The results are shown in Table 13. Using only 5% of the data proved harmful, and the system did not generalize as well as the other models. The model trained on 30% of the data performed better than the model trained on all the data, by 0.7 BLEU points.

In our subsequent experiments we tried to verify whether this finding holds when we add the OPUS data. We therefore trained two systems by fine-tuning either the 30% selected UN data or the full UN data using OPUS. Here the results flipped, and we found that the model that used all of the UN data performed better (compare the last two rows in Table 13). Therefore, we decided to focus our efforts on the model trained on the entire UN data for all of the following experiments.

Table 13: Data selection

  System             ted-11   ted-12   ted-13   ted-14   Avg
  5% + FT (TED)      25.8     29.4     29.3     25.0     27.4
  30% + FT (TED)     28.4     32.7     32.9     27.8     30.4
  Full + FT (TED)    28.1     32.3     31.6     27.0     29.8
  30% + FT (OPUS)    26.1     30.6     32.5     27.1     29.1
  Full + FT (OPUS)   28.2     31.7     34.3     29.2     30.8

4.6. Ensemble

Ensembling models has been shown to give a consistent boost in performance in past best performing systems [3]. We therefore experimented with several variations. We found the best performing combination by fine-tuning the last eight models of the UN+OPUS system, and then ensembling these eight fine-tuned models. Performance improvements from the ensemble are shown in Table 14. The second row shows the system obtained when we fine-tune our best system from Table 13 with the in-domain TED data; in the last row we apply the ensemble.

Table 14: Ensembling over 8 Fine-tuned Models

  System             ted-11   ted-12   ted-13   ted-14   Avg
  Full + FT (OPUS)   28.2     31.7     34.3     29.2     30.8
  +FT (TED)          31.8     36.2     36.1     30.8     33.7
  Ensemble (8)       32.5     37.0     37.2     31.5     34.6

4.7. Final System

Our final system was trained by first using all of the UN data. We then continued training on the OPUS data. Once learning had plateaued on the OPUS data, we took the last eight models, which were very similar in performance, and fine-tuned each of them using the TED data. We then combined these eight fine-tuned models in an ensemble as our final system. The progress is shown in Table 15. We used the same strategy for the QED systems by fine-tuning the last eight OPUS models with QED data, and combining these in an ensemble.

Table 15: Arabic-to-English NMT System progress

  System      ted-11   ted-12   ted-13   ted-14   Avg
  Baseline    24.0     26.4     25.2     22.4     24.5
  UN          15.9     17.9     20.0     16.3     17.5
  +OPUS       28.2     31.7     34.3     29.2     30.8
  +TED        31.8     36.2     36.1     30.8     33.7
  +Ensemble   32.5     37.0     37.2     31.5     34.6
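At decoding time the ensemble of the eight fine-tuned checkpoints combines the models' per-step output distributions inside the beam search. The sketch below shows only that combination step, using an average of log-probabilities (a geometric mean in probability space) as one common choice; random toy distributions stand in for real model outputs, and the exact averaging used by the toolkit is left open here.

```python
# Sketch of checkpoint ensembling (Sections 4.6 and 4.7): at every decoding
# step each of the eight fine-tuned models produces a distribution over the
# target vocabulary, and the decoder searches over their combination. This
# shows only the per-step combination; averaging log-probabilities is one
# common choice and is an assumption of this sketch.
import numpy as np

def ensemble_step(step_log_probs: list[np.ndarray]) -> np.ndarray:
    """Combine one decoding step's log-distributions across models.

    step_log_probs: one (vocab,) array of log P(next token) per model.
    Returns the combined log-distribution used to rank candidates.
    """
    stacked = np.stack(step_log_probs)   # (n_models, vocab)
    combined = stacked.mean(axis=0)      # geometric mean in probability space
    # renormalise so the result is again a proper log-distribution
    combined -= np.log(np.exp(combined).sum())
    return combined

# Toy usage with 8 random "models" over a 10-word vocabulary:
rng = np.random.default_rng(0)
fake = [np.log(rng.dirichlet(np.ones(10))) for _ in range(8)]
print(ensemble_step(fake).argmax())      # token the ensemble would prefer
```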
4.8. English-to-Arabic Systems

We used the insights gained from our Arabic-to-English experiments to train our English→Arabic systems. Our final model, for both TED and QED, was first trained on all of the UN data, followed by the OPUS data, and finally fine-tuned with the in-domain data. The progress is shown in Table 16.

Table 16: English-to-Arabic NMT System progress

  System   ted-11   ted-12   ted-13   ted-14   Avg
  UN       9.1      9.3      11.2     9.4      9.8
  +OPUS    10.8     11.2     13.4     10.9     11.6
  +TED     17.1     18.9     20.1     17.7     18.5

5. System Combination

We combined the hypotheses produced by our best phrase-based and Neural MT systems. For this purpose we used the Multi-Engine MT system, MEMT [13]. The results are shown in Table 17. We did not gain any substantial improvements using system combination. Small improvements were obtained in the Arabic→English direction, barring test-2012. On the contrary, a significant improvement was obtained only on test-2013 in the English→Arabic direction. Table 18 shows the results on the official test-sets.

Table 17: Results for System Combination

  System           ted-11   ted-12   ted-13   ted-14   Avg
  Arabic→English
  Phrase-based     30.5     34.2     35.0     30.5     32.6
  Neural MT        32.5     37.0     37.2     31.5     34.6
  System Comb      32.8     36.5     37.4     31.7     34.6
  English→Arabic
  Phrase-based     16.7     17.9     20.2     17.7     18.1
  Neural MT        17.1     18.9     20.1     17.7     18.5
  System Comb      16.8     19.1     20.7     17.6     18.6

Table 18: Results on Official Test Sets

  System          ted-15   ted-16   qed-15
  Arabic→English
  Primary         34.1     31.8     28.1
  Contrastive     33.7     31.5     28.1
  English→Arabic
  Primary         19.5     18.4     23.1
  Contrastive     19.5     18.1     22.9

6. Summary

We trained a very strong phrase-based system with state-of-the-art features such as the OSM, the NNJM and a big LM. The system improved greatly by applying domain adaptation. To this end we applied MML-based filtering, interpolated OSM models and fine-tuning of NNJM models. Overall, our phrase-based system achieved a gain of 4 BLEU points on top of the baseline system. We also applied data selection for training our NMT systems. However, these NMT systems quickly overfit and did not perform well. Our experiments showed that the NMT system trained on the full UN data performed best, and the final NMT system made use of all the available out-of-domain data. The training was performed incrementally, starting with the UN data for 50k iterations, fine-tuning on OPUS for 25k more iterations, and then fine-tuning the final model using the TED talks for a few iterations. We simply replicated these settings to train the QED systems. Finally we applied system combination of the two systems using MEMT.

While it is computationally expensive, we found training a neural MT system much simpler than building a competitive phrase-based system, where a lot of sub-components need to be optimized independently to reach the best configuration. In contrast, an NMT system requires the least supervision. Secondly, once a neural system is trained, the effort can easily be reused to adapt the system towards another domain; in this case we simply fine-tuned our UN+OPUS system with the QED corpus. On the contrary, almost all the sub-components of a phrase-based system had to be retrained to adapt the system towards the QED corpus.

7. References

[1] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst, "Moses: Open source toolkit for statistical machine translation," in Proceedings of the Association for Computational Linguistics (ACL'07), Prague, Czech Republic, 2007.
[2] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in ICLR, 2015. http://arxiv.org/pdf/1409.0473v6.pdf
[3] R. Sennrich, B. Haddow, and A. Birch, "Edinburgh neural machine translation systems for WMT 16," in Proceedings of the First Conference on Machine Translation, Berlin, Germany, August 2016, pp. 371–376. http://www.aclweb.org/anthology/W16-2323
[4] M. Ziemski, M. Junczys-Dowmunt, and B. Pouliquen, "The United Nations parallel corpus v1.0," in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 2016.
[5] H. Sajjad, F. Guzmán, P. Nakov, A. Abdelali, K. Murray, F. A. Obaidli, and S. Vogel, "QCRI at IWSLT 2013: Experiments in Arabic-English and English-Arabic spoken language translation," in Proceedings of the 10th International Workshop on Spoken Language Technology (IWSLT-13), December 2013.
[6] P. Lison and J. Tiedemann, "OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles," in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA), May 2016.
[7] A. Abdelali, F. Guzmán, H. Sajjad, and S. Vogel, "The AMARA corpus: Building parallel language resources for the educational domain," in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014.
[8] A. Axelrod, X. He, and J. Gao, "Domain adaptation via pseudo in-domain data selection," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP'11), Edinburgh, United Kingdom, 2011.
[9] N. Durrani, H. Schmid, and A. Fraser, "A joint sequence translation model with integrated reordering," in Proceedings of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT'11), Portland, OR, USA, 2011.
[10] N. Durrani, H. Sajjad, S. Joty, A. Abdelali, and S. Vogel, "Using joint models for domain adaptation in statistical machine translation," in Proceedings of the Fifteenth Machine Translation Summit (MT Summit XV), Florida, USA, AMTA, November 2015.
[11] N. Durrani, P. Koehn, H. Schmid, and A. Fraser, "Investigating the usefulness of generalized word representations in SMT," in Proceedings of the 25th Annual Conference on Computational Linguistics (COLING'14), Dublin, Ireland, 2014, pp. 421–432.
[12] M.-T. Luong and C. D. Manning, "Stanford neural machine translation systems for spoken language domain," in International Workshop on Spoken Language Translation, Da Nang, Vietnam, 2015.
[13] K. Heafield and A. Lavie, "CMU system combination in WMT 2011," in Proceedings of the EMNLP 2011 Sixth Workshop on Statistical Machine Translation, Edinburgh, Scotland, United Kingdom, July 2011, pp. 145–151. http://kheafield.com/professional/avenue/wmt2011.pdf
[14] F. Guzmán, H. Sajjad, S. Vogel, and A. Abdelali, "The AMARA corpus: Building resources for translating the web's educational content," in Proceedings of the 10th International Workshop on Spoken Language Technology (IWSLT-13), December 2013.
[15] O. Bojar, R. Chatterjee, C. Federmann, Y. Graham, B. Haddow, M. Huck, A. Jimeno Yepes, P. Koehn, V. Logacheva, C. Monz, M. Negri, A. Neveol, M. Neves, M. Popel, M. Post, R. Rubino, C. Scarton, L. Specia, M. Turchi, K. Verspoor, and M. Zampieri, "Findings of the 2016 conference on machine translation," in Proceedings of the First Conference on Machine Translation, Berlin, Germany, August 2016, pp. 131–198. http://www.aclweb.org/anthology/W/W16/W16-2301
[16] A. Pasha, M. Al-Badrashiny, M. Diab, A. El Kholy, R. Eskander, N. Habash, M. Pooleery, O. Rambow, and R. M. Roth, "MADAMIRA: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic," in Proceedings of the Language Resources and Evaluation Conference (LREC'14), Reykjavik, Iceland, 2014, pp. 1094–1101.
[17] A. Abdelali, K. Darwish, N. Durrani, and H. Mubarak, "Farasa: A fast and furious segmenter for Arabic," in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, San Diego, California, June 2016, pp. 11–16. http://www.aclweb.org/anthology/N16-3003
[18] A. Birch, M. Huck, N. Durrani, N. Bogoychev, and P. Koehn, "Edinburgh SLT and MT system description for the IWSLT 2014 evaluation," in Proceedings of the 11th International Workshop on Spoken Language Translation (IWSLT '14), Lake Tahoe, CA, USA, 2014.
[19] C. Dyer, V. Chahuneau, and N. A. Smith, "A simple, fast, and effective reparameterization of IBM Model 2," in Proceedings of NAACL'13, 2013.
[20] K. Heafield, "KenLM: Faster and smaller language model queries," in Proceedings of the Sixth Workshop on Statistical Machine Translation, Edinburgh, Scotland, United Kingdom, July 2011, pp. 187–197. http://kheafield.com/professional/avenue/kenlm.pdf
[21] M. Galley and C. D. Manning, "A simple and effective hierarchical phrase reordering model," in Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, Hawaii, October 2008, pp. 848–856. http://www.aclweb.org/anthology/D08-1089
[22] N. Durrani, H. Schmid, A. Fraser, P. Koehn, and H. Schütze, "The Operation Sequence Model – Combining n-gram-based and phrase-based statistical machine translation," Computational Linguistics, vol. 41, no. 2, pp. 157–186, 2015.
[23] J. Devlin, R. Zbib, Z. Huang, T. Lamar, R. Schwartz, and J. Makhoul, "Fast and robust neural network joint models for statistical machine translation," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014.
[24] S. Joty, H. Sajjad, N. Durrani, K. Al-Mannai, A. Abdelali, and S. Vogel, "How to avoid unwanted pregnancies: Domain adaptation using neural network models," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, September 2015.
[25] C. Cherry and G. Foster, "Batch tuning strategies for statistical machine translation," in Proceedings of the 2012 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT'12), Montréal, Canada, 2012.
[26] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), Morristown, NJ, USA, 2002, pp. 311–318.
[27] N. Durrani, A. Fraser, H. Schmid, H. Hoang, and P. Koehn, "Can Markov models over minimal translation units help phrase-based SMT?" in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Sofia, Bulgaria, August 2013, pp. 399–405. http://www.aclweb.org/anthology/P13-2071
[28] H. Sajjad, A. Fraser, and H. Schmid, "A statistical model for unsupervised and semi-supervised transliteration mining," in Proceedings of the Association for Computational Linguistics (ACL'12), Jeju, Korea, 2012.
[29] N. Durrani, H. Sajjad, H. Hoang, and P. Koehn, "Integrating an unsupervised transliteration model into statistical machine translation," in Proceedings of the 15th Conference of the European Chapter of the ACL (EACL 2014), Gothenburg, Sweden, April 2014.
[30] R. Sennrich, B. Haddow, and A. Birch, "Neural machine translation of rare words with subword units," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, August 2016, pp. 1715–1725. http://www.aclweb.org/anthology/P16-1162
