ebook img

DTIC ADA460498: Converting Dependency Structures to Phrase Structures PDF

6 Pages·0.07 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview DTIC ADA460498: Converting Dependency Structures to Phrase Structures

Converting Dependency Structures to Phrase Structures Fei Xia and Martha Palmer University of Pennsylvania Philadelphia, PA 19104, USA fxia,mpalmer @linc.cis.upenn.edu f g 1. INTRODUCTION (b) In the dependency structure, make the head of each non- Treebanksareoftwotypesaccordingtotheirannotationschemata: head-childdependontheheadofthehead-child. phrase-structureTreebankssuchastheEnglishPennTreebank[8] Figure1showsaphrasestructureintheEnglishPennTreebank and dependency Treebanks such as the Czech dependency Tree- [8]. In addition to the syntactic labels (such as NP for a noun bank[6]. LongbeforeTreebanksweredevelopedandwidelyused phrase),theTreebankalsousesfunctiontags(suchasSBJforthe fornaturallanguageprocessing,therehadbeenmuchdiscussionof subject)forgrammaticalfunctions.Inthisphrasestructure,theroot comparisonbetweendependencygrammarsandcontext-freephrase- node has twochildren: theNPand theVP.Thealgorithm would structuregrammars[5]. Inthispaper,weaddresstherelationship choose theVPasthehead-child andtheNPasanon-head-child, betweendependencystructuresandphrasestructuresfromapracti- and make the head Vinkin of the NP depend on the head join of calperspective;namely,theexplorationofdifferentalgorithmsthat theVPinthedependency structure. Thedependency structureof convertdependencystructurestophrasestructuresandtheevalua- the sentence isshown in Figure 2. A more sophisticated version tionoftheirperformanceagainstanexistingTreebank. Thiswork ofthealgorithm(asdiscussedin[10])takestwoadditionaltables notonlyprovideswaystoconvertTreebanksfromonetypeofrep- (namely,theargumenttableandthetagsettable)asinputandpro- resentationtotheother,butalsoclarifiesthedifferencesinrepre- ducesdependencystructureswiththeargument/adjunctdistinction sentationalcoverageofthetwoapproaches. (i.e.,eachdependent ismarkedinthedependency structureasei- theranargumentoranadjunctofthehead). 2. CONVERTING PHRASE STRUCTURES S TODEPENDENCYSTRUCTURES NP-SBJ VP The notion of head is important in both phrase structures and MD VP dependency structures. Inmany linguistictheoriessuch asX-bar NNP theory and GB theory, each phrase structure has a head that de- Vinken will VB NP PP-CLR NP-TMP terminesthemainpropertiesofthephraseandaheadhasseveral join DT NN NNP CD levelsofprojections; whereasinadependency structurethehead IN NP islinkedtoitsdependents. Inpractice,theheadinformationisex- the board Nov 29 plicitlymarkedinadependency Treebank,butnotalwayssoina as DT JJ NN phrase-structure Treebank. A common way to find the head in a director a phrasestructureistouseaheadpercolationtable,asdiscussedin nonexecutive [7,1]amongothers. Forexample,theentry(SrightS/VP)inthe headpercolationtablesaysthattheheadchild1ofanSnodeisthe Figure1:AphrasestructureinthePennTreebank firstchildofthenodefromtherightwiththelabelSorVP. Once the heads in phrase structures are found, the conversion fromphrasestructurestodependencystructuresisstraightforward, join asshownbelow: (a)Marktheheadchildofeachnodeinaphrasestructure,using Vinken will board as 29 theheadpercolationtable. the director Nov Thehead-child ofanode XPisthechildofthenodeXPthatis 1 theancestoroftheheadoftheXPinthephrasestructure. a nonexecutive Figure2:AdependencytreeforthesentenceinFigure1.Heads aremarkedasparentsoftheirdependentsinanorderedtree. Itisworthnotingthatquiteoftenthereisnoconsensusonwhat the correct dependency structure for a particular sentence should be.TobuildadependencyTreebank,theTreebankannotatorsmust decidewhichworddependsonwhichword;forexample,theyhave . to decide whether the subject Vinken in Figure 1 depends on the Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. 1. REPORT DATE 3. DATES COVERED 2001 2. REPORT TYPE 00-00-2001 to 00-00-2001 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Converting Dependency Structures to Phrase Structures 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION Department of Computer and Information Sciences,University of REPORT NUMBER Pennsylvania,Philadelphia,PA,19104 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S) 11. SPONSOR/MONITOR’S REPORT NUMBER(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release; distribution unlimited 13. SUPPLEMENTARY NOTES 14. ABSTRACT 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18. NUMBER 19a. NAME OF ABSTRACT OF PAGES RESPONSIBLE PERSON a. REPORT b. ABSTRACT c. THIS PAGE 5 unclassified unclassified unclassified Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 modalverbwillorthemainverbjoin. Incontrast,theannotators furtherprojectstoXP.Therearethreetypesofrules,asshownin forphrase-structureTreebanksdonothavetomakesuchdecisions. Figure3(a). Algorithm1,asadoptedin[4,3],strictlyfollowsX- Theusersofphrase-structureTreebankscanmodifytheheadper- bar theory and uses the following heuristic rules to build phrase colationtablestogetdifferentdependencystructuresfromthesame structures: phrasestructure. Inotherwords,phrasestructuresoffermoreflex- ibility than dependency structures with respect to the choices of XP heads. (1) XP -> YP X’ X YP(Spec) X’ Thefeasibilityofusingtheheadpercolationtabletoidentifythe (2) X’ -> X’ WP headsinphrasestructuresdependsonthecharacteristicsofthelan- X’ WP(Mod) guage, theTreebank schema,andthedefinitionofthecorrectde- Y Z W (3) X’ -> X ZP (Spec) (Arg) (Mod) pendency structure. Forinstance,theheadpercolationtablefora X ZP(Arg) strictlyhead-final (orhead-initial)language isveryeasytobuild, and the conversion algorithm works very well. For the English PennTreebank,whichweusedinthispaper,theconversionalgo- (a) rules in (b) d-tree (c) phrase structure rithmworksverywellexceptforthenounphraseswiththeappos- X-bar theory itive construction. For example, the conversion algorithm would choose the appositive the CEO of FNX as the head child of the Figure 3: Rules in X-bar theory and Algorithm 1 (which is phraseJohnSmith,theCEOofFNX,whereasthecorrectheadchild basedonit) shouldbeJohnSmith. Twolevelsofprojectionsforanycategory:anycategoryXhas 3. CONVERTING DEPENDENCY STRUC- twolevelsofprojection:X’andXP. TURES TOPHRASE STRUCTURES Maximal projections for dependents: a dependent Y always projectstoY’thenYP,andtheYPattachestothehead’sprojection. Themaininformationthatispresentinphrasestructuresbutnot Fixed positions of attachment: Dependents are divided into independencystructuresisthetypeofsyntacticcategory(e.g.,NP, three types: specifiers, modifiers, and arguments. Each type of VP,andS);therefore,torecoversyntacticcategories,anyalgorithm dependentattachestoafixedposition,asshowninFigure3(c). that converts dependency structures to phrase structures needs to The algorithm would convert the d-tree in Figure 3(b) to the addressthefollowingquestions: phrase structure inFigure 3(c). Ifahead has multiple modifiers, Projectionsforeachcategory: foracategoryX,whatkindof thealgorithmcoulduseeitherasingleX’orstackedX’[3].Figure projectionscanXhave? 4showsthephrasestructureforthed-treeinFigure2,wherethe Projectionlevelsfordependents:Givenacategory depends algorithmusesasingleX’formultiplemodifiersofthesamehead.2 on a category in a dependency structure, how farYshould projectbeforeitXattachestoX’sprojection? Y VP Attachmentpositions: Givenacategory dependsonacate- gory inadependencystructure,towhatpoYsitiononX’sprojec- NP V’ tioncXhainshouldY’sprojectionattach? Inthissection,wediscussthreeconversionalgorithms,eachof N’ VP V’ PP NP which gives different answers to these three questions. To make NNP V’ VB NP P’ N’ thecomparisoneasy,weshallapplyeachalgorithmtothedepen- dencystructure(d-tree)inFigure2andcomparetheoutputofthe Vinken MD join DTP N’ IN NP NP N’ algorithmwiththephrasestructureforthatsentenceintheEnglish will DT’ NN DTP N’ N’ CD PennTreebank,asinFigure1. as Evaluating these algorithms is tricky because just like depen- DT board DT’ ADJP N’ NNP 29 dency structures there is often no consensus on what the correct the DT ADJ’ NN Nov phrasestructureforasentence shouldbe. Inthispaper, wemea- suretheperformanceofthealgorithmsbycomparingtheiroutput a JJ director with an existing phrase-structure Treebank (namely, the English nonexecutive PennTreebank) becauseofthefollowingreasons: first,theTree- bankisavailabletothepublic,andprovidesanobjectivealthough Figure4: Thephrasestructurebuiltbyalgorithm1forthed- imperfectstandard;second,onegoaloftheconversionalgorithms treeinFigure2 istomakeitpossibletocomparetheperformanceofparsersthat produce dependency structureswiththeones thatproduce phrase 3.2 Algorithm 2 structures. Sincemoststate-of-the-artphrase-structureparsersare evaluated against an existing Treebank, we want to evaluate the Algorithm2,asadoptedbyCollinsandhiscolleagues[2]when conversion algorithms in the same way; third, a potential appli- theyconvertedtheCzechdependency Treebank[6]intoaphrase- cation oftheconversion algorithms istohelp construct aphrase- structure Treebank, produces phrase structures that are as flat as structureTreebankforonelanguage,givenparallelcorporaandthe possible.Itusesthefollowingheuristicrulestobuildphrasestruc- phrase structures inthe other language. One way to evaluate the tures: qualityoftheresultingTreebankistocompareitwithanexisting Onelevelofprojectionforanycategory: Xhasonlyonelevel Treebank. ofprojection:XP. 3.1 Algorithm1 2Tomakethephrasestructuremorereadable,weuseN’andNPas theX’andXPforallkindsofPOStagsfornouns(e.g.,NNP,NN, According toX-bartheory, acategory XprojectstoX’,which andCD).Verbsandadjectivesaretreatedsimilarly. Minimalprojectionsfordependents:Adependent doesnot nounmodifiesaverb(orVP)suchasyesterdayinhecameyester- projectto unlessithasitsowndependents. Y day,thenounalwaysprojectstoNP,butwhenanoun modifiers Fixed pYosPition of attachment: A dependent is a sister of its another noun , projects to NP if is to theNr1ight of headinthephrasestructure.3 (e.g.,inanappNos2itiNve1construction)anditNd1oesnotprojecttoNPN2if Thealgorithmtreatsallkindsofdependentsequally. Itconverts istotheleftof . the pattern in Figure 5(a) to the phrase structure in Figure 5(b). N1AttachmentposNiti2ons: Bothalgorithmsassumethatallthede- NoticethatinFigure5(b),YdoesnotprojecttoYPbecauseitdoes pendentsofthesamedependencytypeattachatthesamelevel(e.g., nothaveitsowndependents. Theresultingphrasestructureforthe inAlgorithm1,modifiersaresistersofX’,whileinAlgorithm2, d-treeinFigure2isinFigure6,whichismuchflatterthantheone modifiersaresistersofX);butinthePTB,thatisnotalwaystrue. producedbyAlgorithm1. For example, an ADVP,which depends on a verb, may attach to eitheranSoraVPinthephrasestructureaccordingtotheposition X XP oftheADVPwithrespecttotheverbandthesubjectoftheverb. Also,innounphrases,leftmodifiers(e.g.,JJ)aresistersofthehead noun,whiletherightmodifiers(e.g.,PP)aresistersofNP. Y Z W Y X ZP WP Forsome applications, thesedifferencesbetween theTreebank (dep) (dep) (dep) andtheoutputoftheconversionalgorithmsmaynotmattermuch, and by nomeans areweimplying thatanexisting Treebank pro- vides the gold standard for what the phrase structures should be. (a) d-tree (b) phrase structure Nevertheless,becausethegoalofthisworkistoprovideanalgo- rithmthathas the flexibilitytoproduce phrase structuresthatare Figure5:TheschemeforAlgorithm2 as close to the ones in an existing Treebank aspossible, we pro- pose anewalgorithmwithsuch flexibility. Thealgorithm distin- guishes two types of dependents: arguments and modifiers. The algorithm also makes use of language-specific information inthe VP formofthreetables: theprojectiontable,theargumenttable,and the modification table. The projection table specifies the projec- NNP MD VB NP PP NP tionsforeachcategory.Theargumenttable(themodificationtable, resp.)liststhetypesofarguments(modifiers,resp)thataheadcan Vinken will join DT NN IN NP NNP CD takeandtheirpositionswithrespecttothehead. Forexample,the DT JJ NN Nov 29 entry intheprojectiontablesaysthataverbcan the board as projecVtto!avVePrb!phraSse, whichinturnprojectstoasentence; the a director entry(P01NP/S)intheargumenttableindicatesthatapreposition nonexecutive cantakeanargumentthatiseitheranNPoranS,andtheargument istotherightofthepreposition;theentry(NPDT/JJPP/S)inthe Figure 6: The phrase structure built by Algorithm 2 for the modificationtablesaysthatanNPcanbemodifiedbyadeterminer d-treeinFigure2 and/oranadjectivefromtheleft,andbyaprepositionphraseora sentencefromtheright. Giventhesetables,weusethefollowingheuristicrulestobuild 3.3 Algorithm3 phrasestructures:5 Oneprojectionchainpercategory:Eachcategoryhasaunique Theprevious twoalgorithmsarelinguisticallysound. Theydo projectionchain,asspecifiedintheprojectiontable. notuseanylanguage-specificinformation,andasaresultthereare Minimalprojectionfordependents: Acategoryprojectstoa severalmajordifferencesbetweentheoutputofthealgorithmsand higherlevelonlywhennecessary. thephrasestructuresinanexistingTreebank,suchasthePennEn- Lowestattachmentposition:Theprojectionofadependentat- glishTreebank(PTB). tachestoaprojectionofitsheadaslowlyaspossible. Projectionsforeachcategory:Bothalgorithmsassumethatthe The last two rules require further explanation, as illustrated in numbersofprojectionsforallthecategoriesarethesame,whereas Figure7. Inthefigure,thenodeXhasthreedependents: YandZ inthePTBthenumberofprojectionsvariesfromheadtohead.For arearguments,andWisamodifierofX.Let’sassumethattheal- example, inthePTB,determiners do notproject, adverbs project gorithmhasbuiltthephrasestructureforeachdependent. Toform only one levelto adverbial phrases, whereas verbs project toVP, thentoS,thentoSBAR.4 the phrase structure for the whole d-tree, the algorithm needs to attachthephrasestructuresfordependentstotheprojectionchain Projection levels for dependents: Algorithm 1 assumes the of the head X. For an argument such as , sup- maximalprojectionsforallthedependents,whileAlgorithm2as- 0 1 k pXos;eXits;p:r:o:Xjectionchainis andtherootoftheZphrase sumesminimalprojections;butinthePTB,thelevelofprojection 0 1 u structure headed by is Z.;TZhe;:a::lZgorithm would findthe low- ofadependentmaydependonseveralfactorssuchasthecategories s est position on tZhe heZad projection chain, such that has a ofthedependentandthehead,thepositionofthedependentwith h respecttothehead,andthedependencytype.Forexample,whena projection Xt thatcanbe anargumentof h(cid:0)1 accordinZg tothe argumenttaZbleand isnolowerthan Xontheprojectionchain t s 3IfadependentYhasitsowndependents,itprojectstoYPandYP for Z.The algorithmZthen makes t aZchildof h in the phrase isasisteroftheheadX;otherwise,YisasisteroftheheadX. structure.NoticethatbasedontheZsecondheuristXicrule(i.e.,mini- 4SissimilartoIP(IPisthemaximalprojection ofINFL)inGB malprojectionfordependents), tdoesnotfurtherprojectto uin theory,soisSBARtoCP(CPisthemaximalprojectionofComp); Z Z therefore,itcouldbearguedthatonlyVPisaprojectionofverbs In theory, the lasttwoheuristicrules may conflicteach otherin 5 in thePTB.Nevertheless, because PTBdoes notmark INFLand somecases.Inthosecases,wepreferthethirdruleoverthesecond. Comp,wetreatSandSBARasprojectionsofverbs. Inpractice,suchconflictingcasesareveryrare,ifexist. thiscasealthough isavalidprojectionof .Theattachmentfor u modifiersissimilaZrexceptthatthealgorithmZusesthemodification Althoughthethreealgorithmsadoptdifferentheuristicrulesto tableinsteadoftheargumenttable.6 build phrase structures, the first two algorithms are special cases of thelastalgorithm; that is, we can design a distinctset ofpro- Xk jection/argument/modification tables for each of the first two al- gorithmssothatrunningAlgorithm3withtheassociatedsetofta- Yr Xj Wn bsalemseforersAullgtsoaristhrumn1ni(nAglAgolgriothrimth2m,r1es(Apelcgtoivrietlhym)w2o,ureldsppercotdivuecley)t.he X Xj-1 Forexample,toproducetheresultsofAlgorithm2withthecode Y Z W Yq Xi m for(aA)lgInortihthempr3o,jethcteiothnretaebtlaeb,leeascshhohuealddbXehcraesaotendlyasonfoellporwojse:ction (Arg) (Arg) (Mod) YP Zu W Xh XP; Y0 Xh-1 Zt Wl ac(abte)gInortyheXairnguamde-tnrteeta,bthlee,nifinaccluadteegboorythYYcaanndbeYaPnaasragrugmumenetnotsf X0 Zs W0 ofX; Z0 (c)Inthemodificationtable, ifacategoryYcanbeamodifier ofacategoryXinad-tree,thenincludebothYandYPassister- (a) d-tree (b) phrase structure modifiersofXP. Figure7:TheschemeforAlgorithm3 4. EXPERIMENTS Sofar,wehavedescribedtwoexistingalgorithmsandproposed anewalgorithmforconverting d-treesintophrase structures. As S explained atthebeginning ofSection3, weevaluated the perfor- NP manceofthealgorithmsbycomparingtheiroutputwithanexist- NP ingTreebank.BecausetherearenoEnglishdependencyTreebanks VP available,wefirstranthealgorithminSection2toproduced-trees NNP NNP CD from the PTB, then applied these three algorithms to the d-trees NP Nov 29 andcomparedtheoutputwiththeoriginalphrasestructuresinthe Vinken MD VB DT NN PP PTB.7TheprocessisshowninFigure9. (e) (a) will join the board IN NP as DT JJ NN head percolation table head projection table (b) (f) argument table argument table a director tagset table modification table (c) nonexecutive new phrase phr-struct => d-trees d-tree => phr-struct structures compare results (d) d-tree (alg1, alg2, alg3) structures Figure8:ThephrasestructureproducedbyAlgorithm3 phrase structures in the PTB ThephrasestructureproducedbyAlgorithm3forthed-treein Figure2isinFigure8.InFigure8,(a)-(e)arethephrasestructures Figure9:Theflowchartoftheexperiment forfivedependentsoftheheadjoin;(f)istheprojectionchainfor thehead. Thearrowsindicatethepositionsoftheattachment. No- TheresultsareshowninTable1,whichuseSection0ofthePTB. ticethattoattach(a)to(f),theNNPVinkenneedstofurtherproject Theprecisionandrecallratesareforunlabelledbrackets. Thelast toNPbecauseaccordingtotheargumenttable,aVPcantakean columnshowstheratioofthenumberofbracketsproducedbythe NP,butnotanNNP,asitsargument. algorithms and the number of brackets in the original Treebank. InthePTB,amodifiereithersister-adjoinsorChomsky-adjoins From the table (especially the last column), itis clear thatAlgo- tothemodifiee. Forexample,inFigure1,theMDwillChomsky- rithm1producesmanymorebracketsthantheoriginalTreebank, adjoinswhereastheNPNov. 29sister-adjoinstotheVPnode. To resultinginahighrecallratebutlowprecisionrate. Algorithm2 accountforthat,wedistinguishthesetwotypesofmodifiersinthe producesveryflatstructures,resultinginalowrecallrateandhigh modificationtableandAlgorithm3isextendedsothatitwouldat- precisionrate. Algorithm3producesroughlythesamenumberof tachChomsky-adjoiningmodifiershigherbyinsertingextranodes. bracketsastheTreebankandhasthebestrecallrate,anditspreci- Toconvertthed-treeinFigure2,thealgorithminsertsanextraVP sionrateisalmostasgoodasthatofAlgorithm2. nodeinthephrasestructureinFigure8andattachestheMDwill The differences between the output of the algorithms and the tothenewVPnode;thefinalphrasestructureproducedbytheal- phrasestructuresinthePTBcomefromfoursources: gorithmisidenticaltotheoneinFigure1. (S1) AnnotationerrorsinthePTB 3.4 Algorithm 1 and 2 as special cases of Al- (S2) ErrorsintheTreebank-specifictablesusedbythealgorithms gorithm3 inSections2and3(e.g.,theheadpercolationtable,thepro- jectiontable,theargumenttable,andthemodificationtable) Note thatonce becomes a child of , other dependents of 6 t h (suchasW)thZatareonthesamesideaXs butarefurtheraway Punctuation marks are not part of the d-trees produced by Lex- 7 Xfrom canattachonlyto orhigheronZtheprojectionchainof Tract. Wewroteasimpleprogramtoattachthemashighaspossi- h . X X bletothephrasestructuresproducedbytheconversionalgorithms. X recall prec no-cross ave test/ improve theperformance oftheconversion algorithms inSection (%) (%) (%) cross gold 3,becausethephrasestructuresproducedbythealgorithmswould Alg1 81.34 32.81 50.81 0.90 2.48 thenhaveemptycategoriesaswell,justlikethephrasestructures Alg2 54.24 91.50 94.90 0.10 0.59 inthePTB.Third,ifasentenceincludesanon-projectiveconstruc- Alg3 86.24 88.72 84.33 0.27 0.98 tionsuchaswh-movementinEnglish,andifthedependency tree didnotincludeanemptycategorytoshowthemovement,travers- Table 1: Performance of three conversion algorithms on the ingthedependencytreewouldyieldthewrongwordorder.8 Section0ofthePTB 5. CONCLUSION (S3) The imperfection of the conversion algorithm in Section 2 Wehave proposed a new algorithm forconverting dependency (whichconvertsphrasestructurestod-trees) structures to phrase structures and compared it with two existing ones. We have shown that our algorithm subsumes the two ex- (S4) Mismatchesbetweentheheuristicrulesusedbythealgorithms isting ones. By using simple heuristic rules and taking as input inSection3andtheannotationschemataadoptedbythePTB certainkindsofTreebank-specificinformationsuchasthetypesof Toestimatethecontributionof(S1)–(S4)tothedifferencesbe- argumentsand modifiersthatahead cantake, ouralgorithmpro- tween the output of Algorithm 3 and the phrase structures in the ducesphrasestructuresthatareveryclosetotheonesinananno- PTB,wemanuallyexaminedthefirsttwentysentencesinSection tatedphrase-structureTreebank;moreover,thequalityofthephrase 0. Out of thirty-one differences in bracketing, seven are due to structuresproducedbyouralgorithmcanbefurtherimprovedwhen (S1),threearedueto(S2),sevenaredueto(S3),andtheremaining moreTreebank-specificinformationisused. Wealsoargueforin- fourteenmismatchesaredueto(S4). cludingemptycategoriesinthedependencystructures. Whilecorrectingannotationerrorstoeliminate(S1)requiresmore humaneffort,itisquitestraightforwardtocorrecttheerrorsinthe 6. ACKNOWLEDGMENTS Treebank-specific tables and therefore eliminate the mismatches This research reported here was made possible by NSF under caused by (S2). For(S3),wementioned inSection2thattheal- GrantNSF-89-20230-15andbyDARPAaspartoftheTranslingual gorithmchosethewrongheadsforthenounphraseswiththeap- InformationDetection,ExtractionandSummarization(TIDES)pro- positive construction. As for (S4), we found several exceptions gramunderGrantN66001-00-1-8915. (asshowninTable2)totheone-projection-chain-per-categoryas- sumption(i.e.,foreachPOStag,thereisauniqueprojectionchain), 7. REFERENCES anassumptionwhichwasusedbyallthreealgorithmsinSection3. Theperformanceoftheconversion algorithmsinSection2and3 [1] M.Collins.ThreeGenerative,LexicalisedModelsfor couldbeimprovedbyusingadditionalheuristicrulesorstatistical StatisticalParsing.InProc.ofthe35thACL,1997. information. For instance, Algorithm 3 in Section 3 could use a [2] M.Collins,J.Hajicˇ,L.Ramshaw,andC.Tillmann.A heuristicrulethatsaysthatanadjective(JJ)projectstoanNPifthe StatisticalParserforCzech.InProc.ofACL-1999,pages JJfollowsthedeterminertheandtheJJisnotfollowedbyanoun 505–512,1999. asinthericharegettingricher,anditprojectstoanADJPinother [3] M.Covington.AnEmpiricallyMotivatedReinterpretationof cases.NoticethatsuchheuristicrulesareTreebank-dependent. DependencyGrammar,1994.ResearchReportAI-1994-01. [4] M.Covington.GBTheoryasDependencyGrammar,1994. mostlikelyprojection otherprojection(s) ResearchReportAI-1992-03. JJ ADJP JJ NP [5] H.Gaifman.DependencySystemsandPhrase-Structure CD! NP CD! QP NP Systems.InformationandControl,pages304–337,1965. VBN! VP S VBN! VP! RRC [6] J.Hajicˇ.BuildingaSyntacticallyAnnotatedCorpus:The NN !NP ! NN !NX !NP PragueDependencyTreebank,1998.IssuesofValencyand VBG! VP S VBG! PP! Meaning(FestschriftforJarmilaPanevova´. ! ! ! [7] D.M.Magerman.StatisticalDecision-TreeModelsfor Table2:Someexamplesofheadswithmorethanoneprojection Parsing.InProc.ofthe33rdACL,1995. chain [8] M.Marcus,B.Santorini,andM.A.Marcinkiewicz.Building Emptycategoriesareoftenexplicitlymarkedinphrase-structures, aLargeAnnotatedCorpusofEnglish:thePennTreebank. buttheyarenotalwaysincludedindependencystructures. Webe- ComputationalLingustics,1993. lievethatincludingemptycategoriesindependencystructureshas [9] O.RambowandA.K.Joshi.Aformallookatdependency manybenefits. First,emptycategoriesareusefulforNLPapplica- grammarsandphrasestructuregrammarswithspecial tionssuchasmachinetranslation.Totranslateasentencefromone considertaionofword-orderphenomena.InL.Wenner, languagetoanother,manymachinetranslationsystemsfirstcreate editor,RecentTrendsinMeaning-TextTheory.John thedependency structureforthesentenceinthesourcelanguage, Benjamin,Amsterdam,Philadelphia,1997. thenproducethedependencystructureforthetargetlanguage,and [10] F.Xia,M.Palmer,andA.Joshi.AUniformMethodof finally generate a sentence in the target language. If the source GrammarExtractionanditsApplications.InProc.ofJoint language(e.g.,ChineseandKorean)allowsargumentdeletionand SIGDATConferenceonEmpiricalMethodsinNatural the target language (e.g., English) does not, it is crucial that the LanguageProcessingandVeryLargeCorpora dropped argument (which is a type of empty category) isexplic- (EMNLP/VLC),2000. itlymarkedinthesourcedependencystructure,sothatthemachine translationsystemsareawareoftheexistenceofthedroppedargu- mentandcanhandlethesituationaccordingly. Thesecondbenefit ofincludingemptycategoriesindependencystructuresisthatitcan Formorediscussionofnon-projectiveconstructions,see[9]. 8

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.