Parsing Universal Dependencies without training

Héctor Martínez Alonso♠  Željko Agić♥  Barbara Plank♣  Anders Søgaard♦
♠ Univ. Paris Diderot, Sorbonne Paris Cité – Alpage, INRIA, France
♥ IT University of Copenhagen, Denmark
♣ Center for Language and Cognition, University of Groningen, The Netherlands
♦ University of Copenhagen, Denmark
[email protected]

arXiv:1701.03163v1 [cs.CL] 11 Jan 2017

Abstract

We propose UDP, the first training-free parser for Universal Dependencies (UD). Our algorithm is based on PageRank and a small set of head attachment rules. It features two-step decoding to guarantee that function words are attached as leaf nodes. The parser requires no training, and it is competitive with a delexicalized transfer system. UDP offers a linguistically sound unsupervised alternative to cross-lingual parsing for UD, which can be used as a baseline for such systems. The parser has very few parameters and is distinctly robust to domain change across languages.

1 Introduction

Grammar induction and unsupervised dependency parsing are active fields of research in natural language processing (Klein and Manning, 2004; Gelling et al., 2012). However, many data-driven approaches struggle with learning relations that match the conventions of the test data; e.g., Klein and Manning reported the tendency of their DMV parser to make determiners the heads of German nouns, which would not be an error if the test data used a DP analysis (Abney, 1987). Even supervised transfer approaches (McDonald et al., 2011) suffer from target adaptation problems when facing word order differences.

The Universal Dependencies (UD) project (Nivre et al., 2015; Nivre et al., 2016) offers a dependency formalism that aims at providing a consistent representation across languages, while enforcing a few hard constraints. The arrival of such treebanks, expanded and improved on a regular basis, provides a new milestone for cross-lingual dependency parsing research (McDonald et al., 2013).

Furthermore, given that UD rests on a series of simple principles like the primacy of lexical heads (cf. Johannsen et al. (2015) for more details), we expect that such a formalism lends itself more naturally to a simple and linguistically sound rule-based approach to cross-lingual parsing. In this paper we present such an approach.

Our system is a dependency parser that requires no training, and relies solely on the explicit part-of-speech (POS) constraints that UD imposes. In particular, UD prescribes that trees are single-rooted, and that function words like adpositions, auxiliaries, and determiners are always dependents of content words, while other formalisms might treat them as heads (De Marneffe et al., 2014). We ascribe our work to the viewpoints of Bender (2009) about the incorporation of linguistic knowledge in language-independent systems.

Contributions. We introduce, to the best of our knowledge, the first unsupervised rule-based dependency parser for Universal Dependencies. Our method goes substantially beyond the existing work on rule-aided unsupervised dependency parsing, specifically by:
i) adapting the dependency head rules to UD-compliant POS relations,
ii) incorporating the UD restriction of function words being leaves,
iii) applying personalized PageRank to improve main predicate identification, and by
iv) making the parsing entirely free of language-specific parameters by estimating adposition attachment direction at runtime.

We evaluate our system on 32 languages¹ in three setups, depending on the reliability of available POS tags, and compare to a multi-source delexicalized transfer system.

¹ Out of 33 languages in UD v1.2. We exclude Japanese because the treebank is distributed without word forms, and hence we cannot provide results on predicted POS.
In addition, we evaluate the systems' sensitivity to domain change for a subset of UD languages for which domain information was retrievable. The results expose a solid and competitive system for all UD languages. Our unsupervised parser compares favorably to delexicalized parsing, while being more robust to domain change.

2 Related work

Cross-lingual learning. Recent years have seen exciting developments in cross-lingual linguistic structure prediction based on transfer or projection of POS and dependencies (Das and Petrov, 2011; McDonald et al., 2011). These works mainly use supervised learning and domain adaptation techniques for the target language.

The first group of approaches deals with annotation projection (Yarowsky et al., 2001), whereby parallel corpora are used to transfer annotations between resource-rich source languages and low-resource target languages. Projection relies on the availability and quality of parallel corpora and source-side taggers and parsers, but also of tokenizers, sentence aligners, and word aligners for sources and targets. Hwa et al. (2005) were the first to project syntactic dependencies, and Tiedemann et al. (2014; 2016) improved on their projection algorithm. The current state of the art in cross-lingual dependency parsing involves leveraging parallel corpora for annotation projection (Ma and Xia, 2014; Rasooli and Collins, 2015).

The second group of approaches deals with transferring source parsing models to target languages. Zeman and Resnik (2008) were the first to introduce the idea of delexicalization: removing lexical features by training and cross-lingually applying parsers solely on POS sequences. Søgaard (2011) and McDonald et al. (2011) independently extended the approach by using multiple sources, requiring uniform POS and dependency representations (McDonald et al., 2013).

Both model transfer and annotation projection rely on a large number of presumptions to derive their competitive parsing models. By and large, these presumptions are unrealistic and exclusive to a group of very closely related, resource-rich Indo-European languages. Agić et al. (2015; 2016) exposed some of these biases in their proposal for realistic cross-lingual tagging and parsing, as they emphasized the lack of perfect sentence- and word-splitting for truly low-resource languages. Further, Johannsen et al. (2016) introduced joint projection of POS and dependencies from multiple sources while sharing the outlook on bias removal in real-world multilingual processing.

Rule-based parsing. Cross-lingual methods, realistic or not, depend entirely on the availability of data: for the sources, for the targets, or most often for both sets of languages. Moreover, they typically do not exploit the constraints placed on linguistic structures by a formalism, and they do so by design.

With the emergence of UD as the practical standard for multilingual POS and syntactic dependency annotation, we argue for an approach that takes a fresh angle on both aspects. Specifically, we propose a parser that i) requires no training data, and in contrast ii) critically relies on exploiting the UD constraints. These two characteristics make our parser unsupervised. Data-driven unsupervised dependency parsing is now a well-established discipline (Klein and Manning, 2004; Spitkovsky et al., 2010a; Spitkovsky et al., 2010b). Still, the performance of these parsers falls far behind the approaches involving any sort of supervision.

Our work builds on the line of research on rule-aided unsupervised dependency parsing by Gillenwater et al. (2010) and Naseem et al. (2010), and also relates to Søgaard's (2012a; 2012b) work. Our parser, however, features two key differences:
i) the usage of PageRank personalization (Lofgren, 2015), and
ii) two-step decoding to treat content and function words differently according to the UD formalism.
Through these differences, even without any training data, we parse nearly as well as a delexicalized transfer parser, and with increased stability to domain change.
3 Method

Our approach does not use any training or unlabeled data. We have used the English treebank during development to assess the contribution of individual head rules, and to tune the PageRank parameters (Sec. 3.1) and function-word directionality (Sec. 3.2). Adposition direction is calculated on the fly at runtime. We refer henceforth to our UD parser as UDP.

Figure 1: Two-step decoding algorithm for UDP.
 1: H = ∅; D = ∅
 2: C = ⟨c1, ..., cm⟩; F = ⟨f1, ..., fm⟩
 3: for c ∈ C do
 4:   if |H| = 0 then
 5:     h = root
 6:   else
 7:     h = argmin_{j ∈ H} {γ(j, c) | δ(j, c) ∧ κ(j, c)}
 8:   end if
 9:   H = H ∪ {c}
10:   D = D ∪ {(h, c)}
11: end for
12: for f ∈ F do
13:   h = argmin_{j ∈ H} {γ(j, f) | δ(j, f) ∧ κ(j, f)}
14:   D = D ∪ {(h, f)}
15: end for
16: return D

3.1 PageRank setup

Our system uses the PageRank (PR) algorithm (Page et al., 1999) to estimate the relevance of the content words of a sentence. PR uses a random walk to estimate which nodes in the graph are more likely to be visited often, and thus it gives higher rank to nodes with more incoming edges, as well as to nodes connected to those. Using PR to score word relevance requires an effective graph-building strategy. We have experimented with the strategies by Søgaard (2012b), such as words being connected to adjacent words, but our system fares best strictly using the dependency rules in Table 1 to build the graph. UD trees are often very flat, and a highly connected graph yields a PR distribution that is closer to uniform, thereby removing some of the difference in word relevance.

We build a multigraph of all words in the sentence covered by the head-dependent rules in Table 1, giving each word an incoming edge for each eligible dependent, i.e., ADV depends on ADJ and VERB. This strategy does not always yield connected graphs, and we use a teleport probability of 0.05 to ensure PR convergence.

Teleport probability is the probability that, in any iteration of the PR calculation, the next active node is randomly chosen, instead of being one of the adjacent nodes of the current active node. See Brin and Page (2012) for more details on teleport probability; the authors refer to one minus the teleport probability as the damping factor. We chose this value incrementally in intervals of 0.01 during development until we found the smallest value that guaranteed PR convergence. A high teleport probability is undesirable, because the resulting stationary distribution can be almost uniform. We did not have to re-adjust this value when running on the actual test data.

The main idea behind our personalized PR approach is the observation that ranking is only relevant for content words.² PR can incorporate a priori knowledge of the relevance of nodes by means of personalization, namely by giving more weight to certain nodes. Intuitively, the higher the rank of a word, the closer it should be to the root node; i.e., the main predicate of the sentence is the node that should have the highest PR, making it the dependent of the root node (Fig. 1, lines 4-5). We use PR personalization to give 5 times more weight (over an otherwise uniform distribution) to the node that is estimated to be the main predicate, i.e., the first verb, or the first content word if there are no verbs.

² ADJ, NOUN, PROPN, and VERB mark content words.
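The graph construction and personalized PR of Sec. 3.1 can be sketched as follows. This is a minimal sketch, not the released implementation: the plain power iteration, the fixed iteration count, and the redistribution of dangling-node mass by personalization are our assumptions; the head rules, the 0.05 teleport probability, and the 5x personalization weight are taken from the text.

```python
# Sketch of Sec. 3.1: build the head-rule multigraph and run personalized
# PageRank. Assumptions (not from the paper): power iteration with a fixed
# iteration count; dangling mass redistributed by the personalization vector.

HEAD_RULES = {  # Table 1, as head POS -> eligible dependent POS
    "ADJ": {"ADV"},
    "NOUN": {"ADJ", "NOUN", "PROPN", "ADP", "DET", "NUM"},
    "PROPN": {"ADJ", "NOUN", "PROPN", "ADP", "DET", "NUM"},
    "VERB": {"ADV", "AUX", "NOUN", "PROPN", "PRON", "SCONJ"},
}
CONTENT = {"ADJ", "NOUN", "PROPN", "VERB"}

def rank_words(tags, teleport=0.05, iters=100):
    """PageRank score per token; one edge dep -> head for every rule match."""
    n = len(tags)
    # out[j] lists every position i whose POS may head the word at j.
    out = [[i for i, h in enumerate(tags)
            if i != j and d in HEAD_RULES.get(h, ())]
           for j, d in enumerate(tags)]
    # Personalization: 5x weight on the first verb, else first content word.
    main = next((i for i, t in enumerate(tags) if t == "VERB"),
                next((i for i, t in enumerate(tags) if t in CONTENT), 0))
    total = n + 4.0
    pers = [(5.0 if i == main else 1.0) / total for i in range(n)]
    rank = pers[:]
    for _ in range(iters):
        new = [teleport * p for p in pers]
        for j in range(n):
            if out[j]:  # spread (1 - teleport) of j's mass over its heads
                for i in out[j]:
                    new[i] += (1 - teleport) * rank[j] / len(out[j])
            else:       # dangling node: redistribute by personalization
                for i in range(n):
                    new[i] += (1 - teleport) * rank[j] * pers[i]
        rank = new
    return rank
```

On the running example of Sec. 4 ("They also had a special connection to some extremists"), this sketch ranks the verb above the two nouns despite their extra incoming edge, reproducing the effect of the personalization described above; the tie between the two equally connected nouns is broken by reading order.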
There are no direction constraints oriknowledge oftherelevance ofnodesbymeans for the rest. Punctuation is a common source of of personalization, namely giving more weight to parsing errors that has very little interest in this certainnodes. setup. While wedo evaluate on all tokens includ- Intuitively, the higher the rank of a word, the ing punctuation, we also apply a heuristic for the closer it should be to the root node, i.e., the main last token in a sentence; if it is a punctuation, we predicate of the sentence is the node that should makeitadependent ofthemainpredicate. 2ADJ,NOUN,PROPN,andVERBmarkcontentwords. 3NOMINAL={NOUN,PROPN,PRON} ADJ −→ ADV then all function words (F) in the second block NOUN −→ ADJ,NOUN,PROPN whereHisnotupdated, thereby ensuring leafness NOUN −→ ADP,DET,NUM forallf ∈ F. Theorderofheadattachmentisnot PROPN −→ ADJ,NOUN,PROPN monotonic wrt. PR between the first and second PROPN −→ ADP,DET,NUM block, andcan yield non-projectivities. Neverthe- VERB −→ ADV,AUX,NOUN less,itstillisaone-passalgorithm. Decodingruns 2 VERB −→ PROPN,PRON,SCONJ inlessthanO(n ),namelyO(n×|C|). However, runningPRincursthemaincomputation cost. Table1: UDdependency rules. 4 Parserrun example This section exemplifies a full run of UDP for 3.3 Decoding the example sentence from the English test data: Fig. 1 shows the tree-decoding algorithm. It has “They also had a special connection to some ex- two blocks, namely a first block (3-11) where we tremists”. assigntheheadofcontentwordsaccordingtotheir PageRank and the constraints of the dependency 4.1 PageRank rules,andasecondblock(12-15)whereweassign Given an input sentence and its POStags, we ob- theheadoffunctionwordsaccordingtotheirprox- tain rank of each word by building a graph using imity, direction of attachment, and dependency headrules andrunning PRonit. Table2provides rules. Thealgorithm requires: the sentence, the POS of each word, the number of incoming edges for each word after building 1. 
ThePR-sortedlistofcontentwordsC. the graph with the head rules from Sec. 3.1, and 2. The set of function words F, sorting is irrel- thepersonalization vectorforPRonthissentence. evantbecause function-head assignations are Note that all nodes have the same personalization inter-independent. weight, except the estimated main predicate, the 3. A set H for the current possible heads, and verb“had”. a set D for the dependencies assigned at each iteration, which we represent as head- Word: They also had a special connection to some extremists dependent tuples(h,d). POS: PRON ADV VERB DET ADJ NOUN ADP DET NOUN Personalization: 1 1 5 1 1 1 1 1 1 4. Asymbolrootfortherootnode. Incomingedges: 0 0 4 0 1 5 0 0 5 5. A function γ(n,m) that gives the linear dis- Table2: Words,POS,personalization, andincom- tancebetweentwonodes. ingedgesfortheexamplesentence. 6. A function κ(h,d) that returns whether the dependency (h,d) has a valid attachment di- Table 4 shows the directed multigraph used for rectiongiventhePOSofthed(cf. Sec.3.2). PR in detail. We can see, e.g., that the four in- 7. A function δ(h,d) that determines whether coming edges for the verb “had” from the two (h,d)islicensedbytherulesinTable1. nouns, plus from the adverb “also” and the pro- The head assignations in lines 7 and 13 read as noun“They”. follow: the head h of a word (either c or f)is the AfterrunningPR,weobtainthefollowingrank- closestelementofthecurrentlistofheads(H)that ingforcontent words: has the right direction (κ) and respects the POS- C = hhad,connection,extremists,speciali dependency rules (δ). These assignations have a Even though the verb has four incoming edges back-off option to ensure the final D is a tree. If and the nouns have five each, the personalization theconditionsdeterminedbyκandδaretoostrict, makestheverbthehighest-ranked word. 
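The ranking then feeds the two-step decoder. The following is a minimal sketch of Fig. 1, with δ and κ passed in as plain predicates and the back-off of Sec. 3.3 folded in; the names and signatures are our own, not the released code, and tie-breaking in the argmin may resolve differently than in the original implementation.

```python
# Sketch of the two-step decoding of Fig. 1. `content` is the PR-sorted
# list of content-word positions, `function` the function-word positions;
# delta(h, d) and kappa(h, d) stand in for the rule and direction checks.
ROOT = -1

def decode(content, function, delta, kappa):
    heads, arcs = [], set()

    def attach(dep):
        # Closest head satisfying delta and kappa, with the back-off of
        # Sec. 3.3: drop delta first, then both constraints.
        cands = ([j for j in heads if delta(j, dep) and kappa(j, dep)]
                 or [j for j in heads if kappa(j, dep)]
                 or heads)
        return min(cands, key=lambda h: abs(h - dep))

    for c in content:                # first block: content words
        h = ROOT if not heads else attach(c)   # single-root constraint
        heads.append(c)
        arcs.add((h, c))
    for f in function:               # second block: heads not updated,
        arcs.add((attach(f), f))     # so function words stay leaves
    return arcs
```

On the running example this yields a single-rooted structure in which every token receives exactly one head, the first-ranked verb is the main predicate, and every function word is a leaf; individual arcs may differ from the hand trace below where several heads are equally licensed.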
4.2 Decoding

Once C is calculated, we can follow the algorithm in Fig. 1 to obtain a dependency parse. Table 3 shows a trace of the algorithm, with C = ⟨had, connection, extremists, special⟩ and F = {They, also, a, to, some}.

Table 3: Algorithm trace for the example sentence. it: iteration number, word: current word, h: assigned head, H: set of possible heads.
it  word        h           H
1   had         root        ∅
2   connection  had         {had}
3   extremists  had         {had, connection}
4   special     connection  {had, connection, extremists}
5   They        had         {had, connection, extremists, special}
6   also        had         ...
7   a           connection  ...
8   to          extremists  ...
9   some        extremists  ...

The first four iterations calculate the heads of content words following their PR, and the following iterations attach the function words in F. Finally, Fig. 2 shows the resulting dependency tree. Full lines are assigned in the first block (content dependents), dotted lines in the second block (function dependents). The edge labels indicate in which iteration the algorithm has assigned each dependency. Note that the algorithm is deterministic for a given input POS sequence: any 10-token sentence with the POS labels shown in Table 2 would yield the same dependency tree.⁴

Figure 2: Example dependency tree predicted by the algorithm (edge labels give the iteration of Table 3 at which each dependency was assigned).

⁴ The resulting trees always pass the validation script in github.com/UniversalDependencies/tools.

5 Experiments

This section describes the data, metrics and comparison systems used to assess the performance of UDP. We evaluate on the test sections of the UD 1.2 treebanks (Nivre et al., 2015) that contain word forms. If there is more than one treebank per language, we use the treebank that has the canonical language name (e.g., Finnish instead of Finnish-FTB). We use standard unlabeled attachment score (UAS) and evaluate on all sentences of the canonical UD test sets.

5.1 Baseline

We compare our UDP system with the performance of a rule-based baseline that uses the head rules in Table 1. The baseline identifies the first verb (or the first content word if there are no verbs) as the main predicate, and assigns heads to all words according to the rules in Table 1. We have selected the set of head rules to maximize precision on the development set, and they do not provide full coverage. The system makes any word not covered by the rules (e.g., a word with a POS such as X or SYM) dependent of either its left or right neighbor, according to the estimated runtime parameter. We report the best head direction and its score for each language in Table 5. This baseline finds the head of each token based on its closest possible head, or on its immediate left or right neighbor if there is no head rule for the POS at hand, which means that this system does not necessarily yield well-formed trees. Each token receives a head, and while the structures are single-rooted, they are not necessarily connected. Note that we do not include results for the DMV model by Klein and Manning (2004), as it has been outperformed by a system similar to ours (Søgaard, 2012b). The usual adjacency baseline for unsupervised dependency parsing, where all words depend on their left or right neighbor, fares much worse than our baseline (20% UAS below on average) even with an oracle pick for the best per-language direction, and we do not report those scores.
5.2 Evaluation setup

Our system relies solely on POS tags. To estimate the quality degradation of our system under non-gold POS scenarios, we evaluate UDP on two alternative scenarios. The first is predicted POS (UDPP), where we tag the respective test set with TnT (Brants, 2000) trained on each language's training set. The second is a naive type-constrained two-POS-tag scenario (UDPN), which approximates a lower bound. We give each word either a CONTENT or a FUNCTION tag, depending on the word's frequency: the 100 most frequent words of the input test section receive the FUNCTION tag.

Finally, we compare our parser UDP to a supervised cross-lingual system (MSD). It is a multi-source delexicalized transfer parser, referred to as multi-dir in the original paper by McDonald et al. (2011). For this baseline we train TurboParser (Martins et al., 2013) on a delexicalized training set of 20k sentences, sampled uniformly from the UD training data excluding the target language. MSD is a competitive and realistic baseline in cross-lingual transfer parsing work. This gives us an indication of how our system compares to standard cross-lingual parsers.

5.3 Results

Table 5 shows that UDP is a competitive system: UDPG is remarkably close to the supervised MSDG system, with an average difference of 6.4%. Notably, UDP even outperforms MSD on one language (Hindi).

More interestingly, on the evaluation scenario with predicted POS we observe that our system drops only marginally (2.2%) compared to MSD (2.7%). In the least robust rule-based setup, the error propagation rate from POS to dependency would be doubled, as either a wrongly tagged head or dependent would break the dependency rules. However, with an average POS accuracy by TnT of 94.1%, the error propagation is 0.37, i.e., each POS error causes 0.37 additional dependency errors. In contrast, for MSD this error propagation is 0.46, thus higher.⁵

⁵ Err. prop. = (E(ParseP) − E(ParseG)) / E(POSP), where E(x) = 1 − Accuracy(x).

Table 5: UAS for the baseline with gold POS (BLG) with direction (L/R) for backoff attachments, UDP with gold POS (UDPG) and predicted POS (UDPP), PR with naive content-function POS (UDPN), and multi-source delexicalized with gold and predicted POS (MSDG and MSDP, respectively). In the original layout, BL values higher than UDPG are underlined, and UDPG values higher than MSDG are in boldface.
Language          BLG     UDPG  MSDG  MSDP  UDPP  UDPN
Ancient Greek     42.2 L  43.4  48.6  46.5  41.6  27.0
Arabic            34.8 R  47.8  52.8  52.6  47.6  41.0
Basque            47.8 R  45.0  51.2  49.3  43.1  22.8
Bulgarian         54.9 R  70.5  78.7  76.6  68.1  27.1
Church Slavonic   53.8 L  59.2  61.8  59.8  59.2  35.2
Croatian          41.6 L  56.7  69.1  65.6  54.5  25.2
Czech             46.5 R  61.0  69.5  67.6  59.3  25.3
Danish            47.3 R  57.9  70.2  65.6  53.8  26.9
Dutch             36.1 L  49.5  57.0  59.2  50.0  24.1
English           46.2 R  53.0  62.1  59.9  51.4  27.9
Estonian          73.2 R  70.0  73.4  66.1  65.0  25.3
Finnish           43.8 R  45.1  52.9  50.4  43.1  21.6
French            47.1 R  64.5  72.7  70.6  62.1  36.3
German            48.2 R  60.6  66.9  62.5  57.0  24.2
Gothic            50.2 L  57.5  61.7  59.2  55.8  34.1
Greek             45.7 R  58.5  68.0  66.4  57.0  29.3
Hebrew            41.8 R  55.4  62.0  58.6  52.8  35.7
Hindi             43.9 R  46.3  34.6  34.5  45.7  27.0
Hungarian         53.1 R  56.7  58.4  56.8  54.8  22.7
Indonesian        44.6 L  60.6  63.6  61.0  58.4  35.3
Irish             47.5 R  56.6  62.5  61.3  53.9  35.8
Italian           50.6 R  69.4  77.1  75.2  67.9  37.6
Latin             49.4 L  56.2  59.8  54.9  52.4  37.1
Norwegian         49.1 R  61.7  70.8  67.3  58.6  29.8
Persian           37.8 L  55.7  57.8  55.6  53.6  33.9
Polish            60.8 R  68.4  75.6  71.7  65.7  34.6
Portuguese        45.8 R  65.7  72.8  71.4  64.9  33.5
Romanian          52.7 R  63.7  69.2  64.0  58.9  32.1
Slovene           50.6 R  63.6  74.7  71.0  56.0  24.3
Spanish           48.2 R  63.9  72.9  70.7  62.1  35.0
Swedish           52.4 R  62.8  72.2  67.2  58.5  25.3
Tamil             41.4 R  34.2  44.2  39.5  32.1  20.3
Average           47.8    57.5  63.9  61.2  55.3  29.9
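The error-propagation figures can be reproduced from the averages in Table 5; the following is a worked check of the formula in footnote 5, using the reported 94.1% average TnT tagging accuracy.

```python
# Footnote 5: Err. prop. = (E(Parse_P) - E(Parse_G)) / E(POS_P),
# with E(x) = 1 - Accuracy(x). Inputs are the Table 5 averages.
def err_prop(parse_p, parse_g, pos_p):
    err = lambda acc: 1.0 - acc
    return (err(parse_p) - err(parse_g)) / err(pos_p)

udp_prop = err_prop(parse_p=0.553, parse_g=0.575, pos_p=0.941)  # UDP: ~0.37
msd_prop = err_prop(parse_p=0.612, parse_g=0.639, pos_p=0.941)  # MSD: ~0.46
```

That is, each tagging error costs UDP about 0.37 extra attachment errors, against 0.46 for MSD, matching the figures reported above.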
For the extreme POS scenario, content vs. function POS (CF), the drop in performance for UDP is very large, but this might be too crude an evaluation setup. Nevertheless, UDP, the simple unsupervised system with PageRank, outperforms the adjacency baselines (BL) by ∼4% on average in the two-tag type-based naive POS scenario. This difference indicates that even with very deficient POS tags, UDP can provide better structures.

6 Discussion

In this section we provide a further error analysis of the UDP parser. We examine the contribution to the overall results of using PageRank to score content words and the behavior of the system across different parts of speech, and we assess the robustness of UDP on text from different domains.

6.1 PageRank contribution

UDP depends on PageRank to score content words, and on two-step decoding to ensure the leaf status of function words. In this section we isolate the contribution of both parts. We do so by comparing the performance of BL, UDP, and UDPNoPR, a version of UDP where we disable PR and rank content words according to their reading order, i.e., the first word in the ranking is the first word to be read, regardless of the specific language's script direction. The baseline BL described in 5.1 already ensures function words are leaf nodes, because they have no listed dependent POS in the head rules. The task of the decoding steps is mainly to ensure the resulting structures are well-formed dependency trees.

If we measure the difference between UDPNoPR and BL, we see that UDPNoPR contributes 4 UAS points on average over the baseline. Nevertheless, the baseline is oracle-informed about the language's best branching direction, a property that UDP does not have. Instead, the decoding step determines head direction as described in Section 3.2. Complementarily, we can measure the contribution of PR by observing the difference between regular UDP and UDPNoPR. The latter scores on average 9 UAS points lower than UDP. These 9 points are caused by the different attachment of content words in the first decoding step.

6.2 Breakdown by POS

UD is a constantly improving effort, and not all v1.2 treebanks have the same level of formalism compliance. Thus, the interpretation of, e.g., the AUX-VERB or DET-PRON distinctions might differ across treebanks. However, we ignore these differences in our analysis and consider all treebanks to be equally compliant.

The root accuracy scores oscillate around an average of 69%, with Arabic and Tamil (26%) and Estonian (93%) as outliers. Given the PR personalization (Sec. 3.1), UDP has a strong bias for choosing the first verb as main predicate. Without personalization, performance drops 2% on average. This difference is consistent even for verb-final languages like Hindi, given that the main verb of a simple clause will be its only verb, regardless of where it appears. Moreover, using PR personalization makes the ranking calculations converge a whole order of magnitude faster.

The bigram heuristic to determine adposition direction succeeds at identifying the predominant pre- or postposition preference for all languages (average ADP UAS of 75%). The fixed direction for the other functional POS is largely effective, with few exceptions; e.g., DET is consistently right-attaching on all treebanks except Basque (average overall DET UAS of 84%, 32% for Basque). These alternations could also be estimated from the data in a manner similar to ADP. Our rules do not make nouns eligible heads for verbs. As a result, the system cannot infer relative clauses. We have excluded the NOUN → VERB rule during development because it makes the hierarchy between verbs and nouns less conclusive.

We have not excluded punctuation from the evaluation. Indeed, the UAS for PUNCT is low (an average of 21%, standard deviation of 9.6), even lower than for the otherwise problematic CONJ. Even though conjunctions are pervasive and identifying their scope is one of the usual challenges for parsers, the average UAS for CONJ is much higher (an average of 38%, standard deviation of 13.5) than for PUNCT. Both POS show large standard deviations, which indicates great variability. This variability can be caused by linguistic properties of the languages or evaluation datasets, but also by differences in annotation convention.

6.3 Cross-domain consistency

Models with fewer parameters are less likely to overfit for a certain dataset. In our case, a system with few, general rules is less likely to make attachment decisions that are very particular to a certain language or dataset. Plank and van Noord (2010) have shown that rule-based parsers can be more stable under domain shift. We explore whether their finding holds for UDP as well, by testing on i) the UD development data as a readily available proxy for domain shift, and ii) manually curated domain splits of select UD test sets.

Table 6: Evaluation across domains. UAS for the baseline with gold POS (BLG), UDP with gold POS (UDPG) and predicted POS (UDPP), and multi-source delexicalized with gold and predicted POS (MSDG and MSDP). English datasets marked with † are in-house annotated. In the original layout, the lowest results per language are underlined, and boldface marks where UDP outperforms MSD.
Language  Domain      BLG   MSDG  UDPG  MSDP  UDPP
Bulgarian bulletin    48.3  67.5  67.4  65.4  61.5
          legal       47.9  76.9  69.2  73.0  68.6
          literature  53.6  74.2  69.0  72.8  66.6
          news        49.3  74.6  70.2  73.0  68.2
          various     51.4  74.2  72.5  72.6  69.5
Croatian  news        41.2  62.4  57.9  61.8  52.2
          wiki        41.9  64.8  55.8  58.2  56.3
English   answers     44.1  61.6  55.9  59.5  53.7
          email       42.8  58.8  52.1  57.1  56.3
          newsgroup   41.7  55.5  49.7  52.9  51.1
          reviews     47.4  66.8  54.9  63.9  52.2
          weblog      43.3  51.6  50.9  49.8  53.8
          magazine†   41.4  60.9  55.6  58.4  53.3
          bible†      38.4  56.2  56.2  56.8  48.6
          questions†  38.7  69.7  55.6  60.5  47.2
Italian   europarl    50.8  64.1  70.6  62.7  69.7
          legal       51.1  67.9  69.0  64.4  67.2
          news        49.4  68.9  67.5  67.0  65.3
          questions   48.7  80.0  77.0  79.1  76.1
          various     49.7  67.8  69.0  65.3  67.6
          wiki        51.8  71.2  68.1  70.3  66.6
Serbian   news        42.8  68.0  58.8  65.6  53.3
          wiki        42.4  68.9  58.8  62.8  55.8
Indeed, the UAS for the PUNCT relation is low.

Development sets  We used the English development data to choose which relations would be included as head rules in the final system (Table 1). It is possible that some of the rules are more befitting for the English data, or for that particular section. However, if we regard the results for UDPG in Table 5, we can see that there are 24 languages (out of 32) for which the parser performs better than for English. This result indicates that the head rules are general enough to provide reasonable parses for languages other than the one chosen for development. If we run UDPG on the development sections of the other languages, we find the results are very consistent: every language scores on average within ±1 UAS with regard to its test section. There is no clear tendency for either section being easier to parse with UDP.

Cross-domain test sets  To further assess cross-domain robustness, we retrieved the domain (genre) splits from the test sections of the UD treebanks where domain information is available as sentence metadata: Bulgarian, Croatian, and Italian. We also include a UD-compliant Serbian dataset which is not part of the UD release, but which is based on the same parallel corpus as Croatian and has the same domain splits (Agić and Ljubešić, 2015). When averaging, we pool Croatian and Serbian together, as they come from the same dataset. For English, we obtained the test data splits matching the sentences from the original distribution of the English Web Treebank. In addition to these already available datasets, we annotated three further datasets to assess domain variation more extensively, namely the first 50 verses of the King James Bible, 50 sentences from a magazine, and 75 sentences from the test split of QuestionBank (Judge et al., 2006). We include the third dataset to evaluate strictly on questions, which we could already do for Italian. While the answers domain in English is made up of text from the Yahoo! Answers forum, only one fourth of its sentences are questions. Note that these three small datasets are not included in the results on the canonical test sections in Table 5.

Language          BLG       MSDG      UDPG      MSDP      UDPP
Bulgarian         50.1±2.4  73.5±3.5  69.7±1.8  71.3±3.3  66.9±3.2
Croatian+Serbian  42.1±0.7  66.0±3.0  57.8±1.4  62.1±3.0  54.4±2.0
English           42.2±2.8  60.1±6.2  53.9±2.5  57.3±4.3  52.0±3.3
Italian           50.3±1.2  70.0±5.4  70.1±3.3  68.1±6.0  68.7±3.9
Average std.      1.8       4.5       2.5       4.2       3.1

Table 7: Average language-wise domain evaluation. We report average UAS and standard deviation per language. The bottom row provides the average standard deviation for each system.

Table 7 summarizes the per-language average score and standard deviation, as well as the macro-averaged standard deviation across languages. UDP has a much lower standard deviation across domains than MSD. This holds across languages. We attribute this higher stability to UDP being developed to satisfy a set of general properties of the UD syntactic formalism, instead of being a data-driven method more sensitive to sampling bias. It holds for both the gold-POS and the predicted-POS setup; the differences in standard deviation are unsurprisingly smaller in the predicted-POS setup. In general, the rule-based UDP is less sensitive to domain shifts than the data-driven MSD counterpart, confirming earlier findings (Plank and van Noord, 2010).

Table 6 gives the detailed scores per language and domain. From the scores we can see that presidential bulletin, legal, and weblogs are amongst the hardest domains to parse. However, the systems often do not agree on which domain is hardest, with the exception of the Bulgarian bulletin. Interestingly, on the Italian data and some of the hardest domains, UDP outperforms MSD, confirming that it is a robust baseline.

6.4 Comparison to full supervision

In order to assess how much information the simple principles in UDP provide, we measure how many gold-annotated sentences are necessary to reach its performance, that is, after which size the treebank provides enough information for training that goes beyond the simple linguistic principles outlined in Section 3. For this comparison we use a first-order non-projective TurboParser (Martins et al., 2013) following the setup of Agić et al. (2016). The supervised parsers require around 100 sentences to reach UDP-comparable performance, namely a mean of 300 sentences and a median of 100 sentences, with Bulgarian (3k), Czech (1k), and German (1.5k) as outliers. The difference between mean and median shows that there is great variance, while UDP provides very constant results, also in terms of POS and domain variation.

7 Conclusion

We have presented UDP, an unsupervised dependency parser for Universal Dependencies (UD) that makes use of personalized PageRank and a small set of head-dependent rules. The parser requires no training data and estimates adposition direction directly from the input. We achieve competitive performance on all but two UD languages, and even beat a multi-source delexicalized parser (MSD) on Hindi. We evaluated the parser on three POS setups and across domains. Our results show that UDP is less affected by deteriorating POS tags than MSD, and is more resilient to domain changes. Given how much of the overall dependency structure can be explained by this fairly simple system, we propose UDP as an additional UD parsing baseline.
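To make the first of these two ingredients concrete, the following is a minimal sketch of ranking content words with personalized PageRank. The window-based word graph, the window size, the damping factor, and the choice to personalize on the first content word are all illustrative assumptions for this sketch, not UDP's actual construction:

```python
# Illustrative sketch: rank content words by "head-likeness" with
# personalized PageRank over a simple word graph. All modeling choices
# below (2-token window graph, damping 0.85, restart mass on the first
# content word) are assumptions for illustration, not UDP's algorithm.

CONTENT_TAGS = {"NOUN", "VERB", "PROPN", "ADJ"}

def personalized_pagerank(tags, damping=0.85, iters=50):
    """Return a score per content-word index; higher = more head-like."""
    nodes = [i for i, t in enumerate(tags) if t in CONTENT_TAGS]
    if not nodes:
        return {}
    # Edges: each content word points to content words within 2 tokens.
    out = {i: [j for j in nodes if j != i and abs(i - j) <= 2] for i in nodes}
    # Personalization vector: restart mass on the first content word
    # (a stand-in for biasing the walk toward likely roots).
    teleport = {i: (1.0 if i == nodes[0] else 0.0) for i in nodes}
    rank = {i: 1.0 / len(nodes) for i in nodes}
    for _ in range(iters):
        rank = {
            i: (1 - damping) * teleport[i]
               + damping * sum(rank[j] / len(out[j]) for j in nodes if i in out[j])
            for i in nodes
        }
    return rank

words = "the dog chased a cat".split()
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
scores = personalized_pagerank(tags)
root = max(scores, key=scores.get)  # index of the most head-like word
```

Taking the arg-max node as the root reflects the intuition that the most central content word heads the sentence; UDP's actual two-step decoding additionally enforces the UD constraint that function words attach as leaves.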
The parser, the in-house annotated test sets, and the domain data splits are made freely available.6

UD is a running project, and the guidelines are bound to evolve over time; indeed, the UD 2.0 guidelines have recently been released. UDP can be augmented with edge labeling for some deterministic labels like case or det, and some further constraints can be incorporated into it. Moreover, the parser gives no special treatment to multiword expressions (which would require a lexicon), coordinations, or proper names. All three kinds of structures receive a flat tree where all words depend on the leftmost one. While coordination attachment is a classical problem in parsing and out of the scope of our work, a proper name sequence can be straightforwardly identified from the part-of-speech tags, and it thus falls into the class of structures predictable using simple heuristics. Finally, our use of PageRank could be expanded to directly score the potential dependency edges instead of words, e.g., by means of edge reification.

Acknowledgments

We thank the anonymous reviewers for their valuable feedback. Héctor Martínez Alonso is funded by the French DGA project VerDi. Barbara Plank thanks the Center for Information Technology of the University of Groningen for the HPC cluster. Željko Agić and Barbara Plank thank the Nvidia Corporation for supporting their research. Anders Søgaard is funded by the ERC Starting Grant LOWLANDS No. 313695.

6 https://github.com/hectormartinez/ud_unsup_parser

References

[Abney 1987] Steven Paul Abney. 1987. The English Noun Phrase in its Sentential Aspect. Ph.D. thesis, Massachusetts Institute of Technology.

[Agić et al. 2016] Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, and Anders Søgaard. 2016. Multilingual Projection for Parsing Truly Low-Resource Languages. Transactions of the Association for Computational Linguistics, 4.

[Agić and Ljubešić 2015] Željko Agić and Nikola Ljubešić. 2015. Universal Dependencies for Croatian (that Work for Serbian, too). In BSNLP.

[Agić et al. 2015] Željko Agić, Dirk Hovy, and Anders Søgaard. 2015. If All You Have is a Bit of the Bible: Learning POS Taggers for Truly Low-Resource Languages. In ACL.

[Bender 2009] Emily M. Bender. 2009. Linguistically Naïve != Language Independent: Why NLP Needs Linguistic Typology. In Proceedings of the EACL 2009 Workshop on the Interaction Between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?

[Brants 2000] Thorsten Brants. 2000. TnT – A Statistical Part-of-Speech Tagger. In ANLP.

[Brin and Page 2012] Sergey Brin and Lawrence Page. 2012. Reprint of: The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks, 56(18).

[Das and Petrov 2011] Dipanjan Das and Slav Petrov. 2011. Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections. In ACL.

[De Marneffe et al. 2014] Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford Dependencies: A Cross-Linguistic Typology. In LREC.

[Gelling et al. 2012] Douwe Gelling, Trevor Cohn, Phil Blunsom, and Joao Graça. 2012. The Pascal Challenge on Grammar Induction. In Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure.

[Gillenwater et al. 2010] Jennifer Gillenwater, Kuzman Ganchev, Joao Graça, Fernando Pereira, and Ben Taskar. 2010. Sparsity in Dependency Grammar Induction. In ACL.

[Hwa et al. 2005] Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping Parsers via Syntactic Projection Across Parallel Texts. Natural Language Engineering, 11(3).

[Johannsen et al. 2015] Anders Johannsen, Héctor Martínez Alonso, and Barbara Plank. 2015. Universal Dependencies for Danish. In International Workshop on Treebanks and Linguistic Theories (TLT14), page 157.

[Johannsen et al. 2016] Anders Johannsen, Željko Agić, and Anders Søgaard. 2016. Joint Part-of-Speech and Dependency Projection from Multiple Sources. In ACL.
[Judge et al. 2006] John Judge, Aoife Cahill, and Josef van Genabith. 2006. QuestionBank: Creating a Corpus of Parse-Annotated Questions. In ACL.

[Klein and Manning 2004] Dan Klein and Christopher Manning. 2004. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency. In ACL.

[Lofgren 2015] Peter Lofgren. 2015. Efficient Algorithms for Personalized PageRank. CoRR, abs/1512.04633.

[Ma and Xia 2014] Xuezhe Ma and Fei Xia. 2014. Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization. In ACL.

[Martins et al. 2013] André F. T. Martins, Miguel Almeida, and Noah A. Smith. 2013. Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers. In ACL.

[McDonald et al. 2011] Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-Source Transfer of Delexicalized Dependency Parsers. In EMNLP.

[McDonald et al. 2013] Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal Dependency Annotation for Multilingual Parsing. In ACL.

[Naseem et al. 2010] Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson. 2010. Using Universal Linguistic Knowledge to Guide Grammar Induction. In EMNLP.

[Nivre et al. 2015] Joakim Nivre, Željko Agić, Maria Jesus Aranzabe, Masayuki Asahara, Aitziber Atutxa, Miguel Ballesteros, John Bauer, Kepa Bengoetxea, Riyaz Ahmad Bhat, Cristina Bosco, Sam Bowman, Giuseppe G. A. Celano, Miriam Connor, Marie-Catherine de Marneffe, Arantza Diaz de Ilarraza, Kaja Dobrovoljc, Timothy Dozat, Tomaž Erjavec, Richárd Farkas, Jennifer Foster, Daniel Galbraith, Filip Ginter, Iakes Goenaga, Koldo Gojenola, Yoav Goldberg, Berta Gonzales, Bruno Guillaume, Jan Hajič, Dag Haug, Radu Ion, Elena Irimia, Anders Johannsen, Hiroshi Kanayama, Jenna Kanerva, Simon Krek, Veronika Laippala, Alessandro Lenci, Nikola Ljubešić, Teresa Lynn, Christopher Manning, Cătălina Mărănduc, David Mareček, Héctor Martínez Alonso, Jan Mašek, Yuji Matsumoto, Ryan McDonald, Anna Missilä, Verginica Mititelu, Yusuke Miyao, Simonetta Montemagni, Shunsuke Mori, Hanna Nurmi, Petya Osenova, Lilja Øvrelid, Elena Pascual, Marco Passarotti, Cenel-Augusto Perez, Slav Petrov, Jussi Piitulainen, Barbara Plank, Martin Popel, Prokopis Prokopidis, Sampo Pyysalo, Loganathan Ramasamy, Rudolf Rosa, Shadi Saleh, Sebastian Schuster, Wolfgang Seeker, Mojgan Seraji, Natalia Silveira, Maria Simi, Radu Simionescu, Katalin Simkó, Kiril Simov, Aaron Smith, Jan Štěpánek, Alane Suhr, Zsolt Szántó, Takaaki Tanaka, Reut Tsarfaty, Sumire Uematsu, Larraitz Uria, Viktor Varga, Veronika Vincze, Zdeněk Žabokrtský, Daniel Zeman, and Hanzhi Zhu. 2015. Universal Dependencies 1.2.

[Nivre et al. 2016] Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, et al. 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).

[Page et al. 1999] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web.

[Plank and van Noord 2010] Barbara Plank and Gertjan van Noord. 2010. Grammar-Driven versus Data-Driven: Which Parsing System Is More Affected by Domain Shifts? In Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground.

[Rasooli and Collins 2015] Mohammad Sadegh Rasooli and Michael Collins. 2015. Density-Driven Cross-Lingual Transfer of Dependency Parsers. In EMNLP.

[Spitkovsky et al. 2010a] Valentin Spitkovsky, Hiyan Alshawi, and Daniel Jurafsky. 2010a. From Baby Steps to Leapfrog: How Less is More in Unsupervised Dependency Parsing. In NAACL.

[Spitkovsky et al. 2010b] Valentin Spitkovsky, Hiyan Alshawi, Daniel Jurafsky, and Christopher Manning. 2010b. Viterbi Training Improves Unsupervised Dependency Parsing. In CoNLL.

[Søgaard 2011] Anders Søgaard. 2011. Data Point Selection for Cross-Language Adaptation of Dependency Parsers. In NAACL.

[Søgaard 2012a] Anders Søgaard. 2012a. Two Baselines for Unsupervised Dependency Parsing. In Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure.

[Søgaard 2012b] Anders Søgaard. 2012b. Unsupervised Dependency Parsing Without Training. Natural Language Engineering, 18(2).
