Table Of ContentConverting Dependency Structures to Phrase Structures
Fei Xia and Martha Palmer
University of Pennsylvania
Philadelphia, PA 19104, USA
fxia,mpalmer @linc.cis.upenn.edu
f g
1. INTRODUCTION (b) In the dependency structure, make the head of each non-
Treebanksareoftwotypesaccordingtotheirannotationschemata: head-childdependontheheadofthehead-child.
phrase-structureTreebankssuchastheEnglishPennTreebank[8] Figure1showsaphrasestructureintheEnglishPennTreebank
and dependency Treebanks such as the Czech dependency Tree- [8]. In addition to the syntactic labels (such as NP for a noun
bank[6]. LongbeforeTreebanksweredevelopedandwidelyused phrase),theTreebankalsousesfunctiontags(suchasSBJforthe
fornaturallanguageprocessing,therehadbeenmuchdiscussionof subject)forgrammaticalfunctions.Inthisphrasestructure,theroot
comparisonbetweendependencygrammarsandcontext-freephrase- node has twochildren: theNPand theVP.Thealgorithm would
structuregrammars[5]. Inthispaper,weaddresstherelationship choose theVPasthehead-child andtheNPasanon-head-child,
betweendependencystructuresandphrasestructuresfromapracti- and make the head Vinkin of the NP depend on the head join of
calperspective;namely,theexplorationofdifferentalgorithmsthat theVPinthedependency structure. Thedependency structureof
convertdependencystructurestophrasestructuresandtheevalua- the sentence isshown in Figure 2. A more sophisticated version
tionoftheirperformanceagainstanexistingTreebank. Thiswork ofthealgorithm(asdiscussedin[10])takestwoadditionaltables
notonlyprovideswaystoconvertTreebanksfromonetypeofrep- (namely,theargumenttableandthetagsettable)asinputandpro-
resentationtotheother,butalsoclarifiesthedifferencesinrepre- ducesdependencystructureswiththeargument/adjunctdistinction
sentationalcoverageofthetwoapproaches. (i.e.,eachdependent ismarkedinthedependency structureasei-
theranargumentoranadjunctofthehead).
2. CONVERTING PHRASE STRUCTURES
S
TODEPENDENCYSTRUCTURES
NP-SBJ VP
The notion of head is important in both phrase structures and
MD VP
dependency structures. Inmany linguistictheoriessuch asX-bar NNP
theory and GB theory, each phrase structure has a head that de- Vinken will VB NP PP-CLR NP-TMP
terminesthemainpropertiesofthephraseandaheadhasseveral
join DT NN NNP CD
levelsofprojections; whereasinadependency structurethehead IN NP
islinkedtoitsdependents. Inpractice,theheadinformationisex- the board Nov 29
plicitlymarkedinadependency Treebank,butnotalwayssoina as DT JJ NN
phrase-structure Treebank. A common way to find the head in a director
a
phrasestructureistouseaheadpercolationtable,asdiscussedin nonexecutive
[7,1]amongothers. Forexample,theentry(SrightS/VP)inthe
headpercolationtablesaysthattheheadchild1ofanSnodeisthe Figure1:AphrasestructureinthePennTreebank
firstchildofthenodefromtherightwiththelabelSorVP.
Once the heads in phrase structures are found, the conversion
fromphrasestructurestodependencystructuresisstraightforward, join
asshownbelow:
(a)Marktheheadchildofeachnodeinaphrasestructure,using Vinken will board as 29
theheadpercolationtable.
the director Nov
Thehead-child ofanode XPisthechildofthenodeXPthatis
1
theancestoroftheheadoftheXPinthephrasestructure.
a nonexecutive
Figure2:AdependencytreeforthesentenceinFigure1.Heads
aremarkedasparentsoftheirdependentsinanorderedtree.
Itisworthnotingthatquiteoftenthereisnoconsensusonwhat
the correct dependency structure for a particular sentence should
be.TobuildadependencyTreebank,theTreebankannotatorsmust
decidewhichworddependsonwhichword;forexample,theyhave
. to decide whether the subject Vinken in Figure 1 depends on the
Report Documentation Page Form Approved
OMB No. 0704-0188
Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and
maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information,
including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington
VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it
does not display a currently valid OMB control number.
1. REPORT DATE 3. DATES COVERED
2001 2. REPORT TYPE 00-00-2001 to 00-00-2001
4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER
Converting Dependency Structures to Phrase Structures
5b. GRANT NUMBER
5c. PROGRAM ELEMENT NUMBER
6. AUTHOR(S) 5d. PROJECT NUMBER
5e. TASK NUMBER
5f. WORK UNIT NUMBER
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION
Department of Computer and Information Sciences,University of REPORT NUMBER
Pennsylvania,Philadelphia,PA,19104
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S)
11. SPONSOR/MONITOR’S REPORT
NUMBER(S)
12. DISTRIBUTION/AVAILABILITY STATEMENT
Approved for public release; distribution unlimited
13. SUPPLEMENTARY NOTES
14. ABSTRACT
15. SUBJECT TERMS
16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18. NUMBER 19a. NAME OF
ABSTRACT OF PAGES RESPONSIBLE PERSON
a. REPORT b. ABSTRACT c. THIS PAGE 5
unclassified unclassified unclassified
Standard Form 298 (Rev. 8-98)
Prescribed by ANSI Std Z39-18
modalverbwillorthemainverbjoin. Incontrast,theannotators furtherprojectstoXP.Therearethreetypesofrules,asshownin
forphrase-structureTreebanksdonothavetomakesuchdecisions. Figure3(a). Algorithm1,asadoptedin[4,3],strictlyfollowsX-
Theusersofphrase-structureTreebankscanmodifytheheadper- bar theory and uses the following heuristic rules to build phrase
colationtablestogetdifferentdependencystructuresfromthesame structures:
phrasestructure. Inotherwords,phrasestructuresoffermoreflex-
ibility than dependency structures with respect to the choices of XP
heads. (1) XP -> YP X’
X YP(Spec) X’
Thefeasibilityofusingtheheadpercolationtabletoidentifythe
(2) X’ -> X’ WP
headsinphrasestructuresdependsonthecharacteristicsofthelan-
X’ WP(Mod)
guage, theTreebank schema,andthedefinitionofthecorrectde- Y Z W
(3) X’ -> X ZP
(Spec) (Arg) (Mod)
pendency structure. Forinstance,theheadpercolationtablefora
X ZP(Arg)
strictlyhead-final (orhead-initial)language isveryeasytobuild,
and the conversion algorithm works very well. For the English
PennTreebank,whichweusedinthispaper,theconversionalgo- (a) rules in (b) d-tree (c) phrase structure
rithmworksverywellexceptforthenounphraseswiththeappos-
X-bar theory
itive construction. For example, the conversion algorithm would
choose the appositive the CEO of FNX as the head child of the
Figure 3: Rules in X-bar theory and Algorithm 1 (which is
phraseJohnSmith,theCEOofFNX,whereasthecorrectheadchild
basedonit)
shouldbeJohnSmith.
Twolevelsofprojectionsforanycategory:anycategoryXhas
3. CONVERTING DEPENDENCY STRUC- twolevelsofprojection:X’andXP.
TURES TOPHRASE STRUCTURES Maximal projections for dependents: a dependent Y always
projectstoY’thenYP,andtheYPattachestothehead’sprojection.
Themaininformationthatispresentinphrasestructuresbutnot
Fixed positions of attachment: Dependents are divided into
independencystructuresisthetypeofsyntacticcategory(e.g.,NP,
three types: specifiers, modifiers, and arguments. Each type of
VP,andS);therefore,torecoversyntacticcategories,anyalgorithm
dependentattachestoafixedposition,asshowninFigure3(c).
that converts dependency structures to phrase structures needs to
The algorithm would convert the d-tree in Figure 3(b) to the
addressthefollowingquestions:
phrase structure inFigure 3(c). Ifahead has multiple modifiers,
Projectionsforeachcategory: foracategoryX,whatkindof
thealgorithmcoulduseeitherasingleX’orstackedX’[3].Figure
projectionscanXhave?
4showsthephrasestructureforthed-treeinFigure2,wherethe
Projectionlevelsfordependents:Givenacategory depends algorithmusesasingleX’formultiplemodifiersofthesamehead.2
on a category in a dependency structure, how farYshould
projectbeforeitXattachestoX’sprojection? Y VP
Attachmentpositions: Givenacategory dependsonacate-
gory inadependencystructure,towhatpoYsitiononX’sprojec- NP V’
tioncXhainshouldY’sprojectionattach?
Inthissection,wediscussthreeconversionalgorithms,eachof N’ VP V’ PP NP
which gives different answers to these three questions. To make NNP V’ VB NP P’ N’
thecomparisoneasy,weshallapplyeachalgorithmtothedepen-
dencystructure(d-tree)inFigure2andcomparetheoutputofthe Vinken MD join DTP N’ IN NP NP N’
algorithmwiththephrasestructureforthatsentenceintheEnglish will DT’ NN DTP N’ N’ CD
PennTreebank,asinFigure1. as
Evaluating these algorithms is tricky because just like depen- DT board DT’ ADJP N’ NNP 29
dency structures there is often no consensus on what the correct the DT ADJ’ NN Nov
phrasestructureforasentence shouldbe. Inthispaper, wemea-
suretheperformanceofthealgorithmsbycomparingtheiroutput a JJ director
with an existing phrase-structure Treebank (namely, the English nonexecutive
PennTreebank) becauseofthefollowingreasons: first,theTree-
bankisavailabletothepublic,andprovidesanobjectivealthough Figure4: Thephrasestructurebuiltbyalgorithm1forthed-
imperfectstandard;second,onegoaloftheconversionalgorithms treeinFigure2
istomakeitpossibletocomparetheperformanceofparsersthat
produce dependency structureswiththeones thatproduce phrase
3.2 Algorithm 2
structures. Sincemoststate-of-the-artphrase-structureparsersare
evaluated against an existing Treebank, we want to evaluate the Algorithm2,asadoptedbyCollinsandhiscolleagues[2]when
conversion algorithms in the same way; third, a potential appli- theyconvertedtheCzechdependency Treebank[6]intoaphrase-
cation oftheconversion algorithms istohelp construct aphrase- structure Treebank, produces phrase structures that are as flat as
structureTreebankforonelanguage,givenparallelcorporaandthe possible.Itusesthefollowingheuristicrulestobuildphrasestruc-
phrase structures inthe other language. One way to evaluate the tures:
qualityoftheresultingTreebankistocompareitwithanexisting Onelevelofprojectionforanycategory: Xhasonlyonelevel
Treebank. ofprojection:XP.
3.1 Algorithm1 2Tomakethephrasestructuremorereadable,weuseN’andNPas
theX’andXPforallkindsofPOStagsfornouns(e.g.,NNP,NN,
According toX-bartheory, acategory XprojectstoX’,which andCD).Verbsandadjectivesaretreatedsimilarly.
Minimalprojectionsfordependents:Adependent doesnot nounmodifiesaverb(orVP)suchasyesterdayinhecameyester-
projectto unlessithasitsowndependents. Y day,thenounalwaysprojectstoNP,butwhenanoun modifiers
Fixed pYosPition of attachment: A dependent is a sister of its another noun , projects to NP if is to theNr1ight of
headinthephrasestructure.3 (e.g.,inanappNos2itiNve1construction)anditNd1oesnotprojecttoNPN2if
Thealgorithmtreatsallkindsofdependentsequally. Itconverts istotheleftof .
the pattern in Figure 5(a) to the phrase structure in Figure 5(b). N1AttachmentposNiti2ons: Bothalgorithmsassumethatallthede-
NoticethatinFigure5(b),YdoesnotprojecttoYPbecauseitdoes pendentsofthesamedependencytypeattachatthesamelevel(e.g.,
nothaveitsowndependents. Theresultingphrasestructureforthe inAlgorithm1,modifiersaresistersofX’,whileinAlgorithm2,
d-treeinFigure2isinFigure6,whichismuchflatterthantheone modifiersaresistersofX);butinthePTB,thatisnotalwaystrue.
producedbyAlgorithm1. For example, an ADVP,which depends on a verb, may attach to
eitheranSoraVPinthephrasestructureaccordingtotheposition
X XP oftheADVPwithrespecttotheverbandthesubjectoftheverb.
Also,innounphrases,leftmodifiers(e.g.,JJ)aresistersofthehead
noun,whiletherightmodifiers(e.g.,PP)aresistersofNP.
Y Z W Y X ZP WP
Forsome applications, thesedifferencesbetween theTreebank
(dep) (dep) (dep) andtheoutputoftheconversionalgorithmsmaynotmattermuch,
and by nomeans areweimplying thatanexisting Treebank pro-
vides the gold standard for what the phrase structures should be.
(a) d-tree (b) phrase structure Nevertheless,becausethegoalofthisworkistoprovideanalgo-
rithmthathas the flexibilitytoproduce phrase structuresthatare
Figure5:TheschemeforAlgorithm2 as close to the ones in an existing Treebank aspossible, we pro-
pose anewalgorithmwithsuch flexibility. Thealgorithm distin-
guishes two types of dependents: arguments and modifiers. The
algorithm also makes use of language-specific information inthe
VP
formofthreetables: theprojectiontable,theargumenttable,and
the modification table. The projection table specifies the projec-
NNP MD VB NP PP NP tionsforeachcategory.Theargumenttable(themodificationtable,
resp.)liststhetypesofarguments(modifiers,resp)thataheadcan
Vinken will join DT NN IN NP NNP CD takeandtheirpositionswithrespecttothehead. Forexample,the
DT JJ NN Nov 29 entry intheprojectiontablesaysthataverbcan
the board as projecVtto!avVePrb!phraSse, whichinturnprojectstoasentence; the
a director entry(P01NP/S)intheargumenttableindicatesthatapreposition
nonexecutive cantakeanargumentthatiseitheranNPoranS,andtheargument
istotherightofthepreposition;theentry(NPDT/JJPP/S)inthe
Figure 6: The phrase structure built by Algorithm 2 for the modificationtablesaysthatanNPcanbemodifiedbyadeterminer
d-treeinFigure2 and/oranadjectivefromtheleft,andbyaprepositionphraseora
sentencefromtheright.
Giventhesetables,weusethefollowingheuristicrulestobuild
3.3 Algorithm3 phrasestructures:5
Oneprojectionchainpercategory:Eachcategoryhasaunique
Theprevious twoalgorithmsarelinguisticallysound. Theydo
projectionchain,asspecifiedintheprojectiontable.
notuseanylanguage-specificinformation,andasaresultthereare
Minimalprojectionfordependents: Acategoryprojectstoa
severalmajordifferencesbetweentheoutputofthealgorithmsand
higherlevelonlywhennecessary.
thephrasestructuresinanexistingTreebank,suchasthePennEn-
Lowestattachmentposition:Theprojectionofadependentat-
glishTreebank(PTB).
tachestoaprojectionofitsheadaslowlyaspossible.
Projectionsforeachcategory:Bothalgorithmsassumethatthe
The last two rules require further explanation, as illustrated in
numbersofprojectionsforallthecategoriesarethesame,whereas
Figure7. Inthefigure,thenodeXhasthreedependents: YandZ
inthePTBthenumberofprojectionsvariesfromheadtohead.For
arearguments,andWisamodifierofX.Let’sassumethattheal-
example, inthePTB,determiners do notproject, adverbs project
gorithmhasbuiltthephrasestructureforeachdependent. Toform
only one levelto adverbial phrases, whereas verbs project toVP,
thentoS,thentoSBAR.4 the phrase structure for the whole d-tree, the algorithm needs to
attachthephrasestructuresfordependentstotheprojectionchain
Projection levels for dependents: Algorithm 1 assumes the
of the head X. For an argument such as , sup-
maximalprojectionsforallthedependents,whileAlgorithm2as- 0 1 k
pXos;eXits;p:r:o:Xjectionchainis andtherootoftheZphrase
sumesminimalprojections;butinthePTB,thelevelofprojection 0 1 u
structure headed by is Z.;TZhe;:a::lZgorithm would findthe low-
ofadependentmaydependonseveralfactorssuchasthecategories s
est position on tZhe heZad projection chain, such that has a
ofthedependentandthehead,thepositionofthedependentwith h
respecttothehead,andthedependencytype.Forexample,whena projection Xt thatcanbe anargumentof h(cid:0)1 accordinZg tothe
argumenttaZbleand isnolowerthan Xontheprojectionchain
t s
3IfadependentYhasitsowndependents,itprojectstoYPandYP for Z.The algorithmZthen makes t aZchildof h in the phrase
isasisteroftheheadX;otherwise,YisasisteroftheheadX. structure.NoticethatbasedontheZsecondheuristXicrule(i.e.,mini-
4SissimilartoIP(IPisthemaximalprojection ofINFL)inGB malprojectionfordependents), tdoesnotfurtherprojectto uin
theory,soisSBARtoCP(CPisthemaximalprojectionofComp); Z Z
therefore,itcouldbearguedthatonlyVPisaprojectionofverbs In theory, the lasttwoheuristicrules may conflicteach otherin
5
in thePTB.Nevertheless, because PTBdoes notmark INFLand somecases.Inthosecases,wepreferthethirdruleoverthesecond.
Comp,wetreatSandSBARasprojectionsofverbs. Inpractice,suchconflictingcasesareveryrare,ifexist.
thiscasealthough isavalidprojectionof .Theattachmentfor
u
modifiersissimilaZrexceptthatthealgorithmZusesthemodification Althoughthethreealgorithmsadoptdifferentheuristicrulesto
tableinsteadoftheargumenttable.6 build phrase structures, the first two algorithms are special cases
of thelastalgorithm; that is, we can design a distinctset ofpro-
Xk jection/argument/modification tables for each of the first two al-
gorithmssothatrunningAlgorithm3withtheassociatedsetofta-
Yr Xj Wn bsalemseforersAullgtsoaristhrumn1ni(nAglAgolgriothrimth2m,r1es(Apelcgtoivrietlhym)w2o,ureldsppercotdivuecley)t.he
X
Xj-1 Forexample,toproducetheresultsofAlgorithm2withthecode
Y Z W Yq Xi m for(aA)lgInortihthempr3o,jethcteiothnretaebtlaeb,leeascshhohuealddbXehcraesaotendlyasonfoellporwojse:ction
(Arg) (Arg) (Mod) YP Zu W
Xh XP;
Y0 Xh-1 Zt Wl ac(abte)gInortyheXairnguamde-tnrteeta,bthlee,nifinaccluadteegboorythYYcaanndbeYaPnaasragrugmumenetnotsf
X0 Zs W0 ofX;
Z0 (c)Inthemodificationtable, ifacategoryYcanbeamodifier
ofacategoryXinad-tree,thenincludebothYandYPassister-
(a) d-tree (b) phrase structure modifiersofXP.
Figure7:TheschemeforAlgorithm3 4. EXPERIMENTS
Sofar,wehavedescribedtwoexistingalgorithmsandproposed
anewalgorithmforconverting d-treesintophrase structures. As
S explained atthebeginning ofSection3, weevaluated the perfor-
NP manceofthealgorithmsbycomparingtheiroutputwithanexist-
NP ingTreebank.BecausetherearenoEnglishdependencyTreebanks
VP available,wefirstranthealgorithminSection2toproduced-trees
NNP NNP CD
from the PTB, then applied these three algorithms to the d-trees
NP Nov 29 andcomparedtheoutputwiththeoriginalphrasestructuresinthe
Vinken MD VB DT NN PP PTB.7TheprocessisshowninFigure9.
(e)
(a) will join the board IN NP
as DT JJ NN head percolation table head projection table
(b) (f) argument table argument table
a director tagset table modification table
(c) nonexecutive
new phrase
phr-struct => d-trees d-tree => phr-struct structures compare results
(d) d-tree (alg1, alg2, alg3) structures
Figure8:ThephrasestructureproducedbyAlgorithm3 phrase
structures
in the PTB
ThephrasestructureproducedbyAlgorithm3forthed-treein
Figure2isinFigure8.InFigure8,(a)-(e)arethephrasestructures Figure9:Theflowchartoftheexperiment
forfivedependentsoftheheadjoin;(f)istheprojectionchainfor
thehead. Thearrowsindicatethepositionsoftheattachment. No- TheresultsareshowninTable1,whichuseSection0ofthePTB.
ticethattoattach(a)to(f),theNNPVinkenneedstofurtherproject Theprecisionandrecallratesareforunlabelledbrackets. Thelast
toNPbecauseaccordingtotheargumenttable,aVPcantakean columnshowstheratioofthenumberofbracketsproducedbythe
NP,butnotanNNP,asitsargument. algorithms and the number of brackets in the original Treebank.
InthePTB,amodifiereithersister-adjoinsorChomsky-adjoins From the table (especially the last column), itis clear thatAlgo-
tothemodifiee. Forexample,inFigure1,theMDwillChomsky- rithm1producesmanymorebracketsthantheoriginalTreebank,
adjoinswhereastheNPNov. 29sister-adjoinstotheVPnode. To resultinginahighrecallratebutlowprecisionrate. Algorithm2
accountforthat,wedistinguishthesetwotypesofmodifiersinthe producesveryflatstructures,resultinginalowrecallrateandhigh
modificationtableandAlgorithm3isextendedsothatitwouldat- precisionrate. Algorithm3producesroughlythesamenumberof
tachChomsky-adjoiningmodifiershigherbyinsertingextranodes. bracketsastheTreebankandhasthebestrecallrate,anditspreci-
Toconvertthed-treeinFigure2,thealgorithminsertsanextraVP sionrateisalmostasgoodasthatofAlgorithm2.
nodeinthephrasestructureinFigure8andattachestheMDwill The differences between the output of the algorithms and the
tothenewVPnode;thefinalphrasestructureproducedbytheal- phrasestructuresinthePTBcomefromfoursources:
gorithmisidenticaltotheoneinFigure1.
(S1) AnnotationerrorsinthePTB
3.4 Algorithm 1 and 2 as special cases of Al-
(S2) ErrorsintheTreebank-specifictablesusedbythealgorithms
gorithm3
inSections2and3(e.g.,theheadpercolationtable,thepro-
jectiontable,theargumenttable,andthemodificationtable)
Note thatonce becomes a child of , other dependents of
6 t h
(suchasW)thZatareonthesamesideaXs butarefurtheraway Punctuation marks are not part of the d-trees produced by Lex-
7
Xfrom canattachonlyto orhigheronZtheprojectionchainof Tract. Wewroteasimpleprogramtoattachthemashighaspossi-
h
. X X bletothephrasestructuresproducedbytheconversionalgorithms.
X
recall prec no-cross ave test/ improve theperformance oftheconversion algorithms inSection
(%) (%) (%) cross gold 3,becausethephrasestructuresproducedbythealgorithmswould
Alg1 81.34 32.81 50.81 0.90 2.48 thenhaveemptycategoriesaswell,justlikethephrasestructures
Alg2 54.24 91.50 94.90 0.10 0.59 inthePTB.Third,ifasentenceincludesanon-projectiveconstruc-
Alg3 86.24 88.72 84.33 0.27 0.98 tionsuchaswh-movementinEnglish,andifthedependency tree
didnotincludeanemptycategorytoshowthemovement,travers-
Table 1: Performance of three conversion algorithms on the ingthedependencytreewouldyieldthewrongwordorder.8
Section0ofthePTB
5. CONCLUSION
(S3) The imperfection of the conversion algorithm in Section 2 Wehave proposed a new algorithm forconverting dependency
(whichconvertsphrasestructurestod-trees) structures to phrase structures and compared it with two existing
ones. We have shown that our algorithm subsumes the two ex-
(S4) Mismatchesbetweentheheuristicrulesusedbythealgorithms
isting ones. By using simple heuristic rules and taking as input
inSection3andtheannotationschemataadoptedbythePTB
certainkindsofTreebank-specificinformationsuchasthetypesof
Toestimatethecontributionof(S1)–(S4)tothedifferencesbe- argumentsand modifiersthatahead cantake, ouralgorithmpro-
tween the output of Algorithm 3 and the phrase structures in the ducesphrasestructuresthatareveryclosetotheonesinananno-
PTB,wemanuallyexaminedthefirsttwentysentencesinSection tatedphrase-structureTreebank;moreover,thequalityofthephrase
0. Out of thirty-one differences in bracketing, seven are due to structuresproducedbyouralgorithmcanbefurtherimprovedwhen
(S1),threearedueto(S2),sevenaredueto(S3),andtheremaining moreTreebank-specificinformationisused. Wealsoargueforin-
fourteenmismatchesaredueto(S4). cludingemptycategoriesinthedependencystructures.
Whilecorrectingannotationerrorstoeliminate(S1)requiresmore
humaneffort,itisquitestraightforwardtocorrecttheerrorsinthe 6. ACKNOWLEDGMENTS
Treebank-specific tables and therefore eliminate the mismatches
This research reported here was made possible by NSF under
caused by (S2). For(S3),wementioned inSection2thattheal-
GrantNSF-89-20230-15andbyDARPAaspartoftheTranslingual
gorithmchosethewrongheadsforthenounphraseswiththeap-
InformationDetection,ExtractionandSummarization(TIDES)pro-
positive construction. As for (S4), we found several exceptions
gramunderGrantN66001-00-1-8915.
(asshowninTable2)totheone-projection-chain-per-categoryas-
sumption(i.e.,foreachPOStag,thereisauniqueprojectionchain),
7. REFERENCES
anassumptionwhichwasusedbyallthreealgorithmsinSection3.
Theperformanceoftheconversion algorithmsinSection2and3 [1] M.Collins.ThreeGenerative,LexicalisedModelsfor
couldbeimprovedbyusingadditionalheuristicrulesorstatistical StatisticalParsing.InProc.ofthe35thACL,1997.
information. For instance, Algorithm 3 in Section 3 could use a [2] M.Collins,J.Hajicˇ,L.Ramshaw,andC.Tillmann.A
heuristicrulethatsaysthatanadjective(JJ)projectstoanNPifthe StatisticalParserforCzech.InProc.ofACL-1999,pages
JJfollowsthedeterminertheandtheJJisnotfollowedbyanoun 505–512,1999.
asinthericharegettingricher,anditprojectstoanADJPinother [3] M.Covington.AnEmpiricallyMotivatedReinterpretationof
cases.NoticethatsuchheuristicrulesareTreebank-dependent. DependencyGrammar,1994.ResearchReportAI-1994-01.
[4] M.Covington.GBTheoryasDependencyGrammar,1994.
mostlikelyprojection otherprojection(s) ResearchReportAI-1992-03.
JJ ADJP JJ NP
[5] H.Gaifman.DependencySystemsandPhrase-Structure
CD! NP CD! QP NP
Systems.InformationandControl,pages304–337,1965.
VBN! VP S VBN! VP! RRC
[6] J.Hajicˇ.BuildingaSyntacticallyAnnotatedCorpus:The
NN !NP ! NN !NX !NP
PragueDependencyTreebank,1998.IssuesofValencyand
VBG! VP S VBG! PP!
Meaning(FestschriftforJarmilaPanevova´.
! ! !
[7] D.M.Magerman.StatisticalDecision-TreeModelsfor
Table2:Someexamplesofheadswithmorethanoneprojection
Parsing.InProc.ofthe33rdACL,1995.
chain
[8] M.Marcus,B.Santorini,andM.A.Marcinkiewicz.Building
Emptycategoriesareoftenexplicitlymarkedinphrase-structures, aLargeAnnotatedCorpusofEnglish:thePennTreebank.
buttheyarenotalwaysincludedindependencystructures. Webe- ComputationalLingustics,1993.
lievethatincludingemptycategoriesindependencystructureshas [9] O.RambowandA.K.Joshi.Aformallookatdependency
manybenefits. First,emptycategoriesareusefulforNLPapplica- grammarsandphrasestructuregrammarswithspecial
tionssuchasmachinetranslation.Totranslateasentencefromone considertaionofword-orderphenomena.InL.Wenner,
languagetoanother,manymachinetranslationsystemsfirstcreate editor,RecentTrendsinMeaning-TextTheory.John
thedependency structureforthesentenceinthesourcelanguage, Benjamin,Amsterdam,Philadelphia,1997.
thenproducethedependencystructureforthetargetlanguage,and [10] F.Xia,M.Palmer,andA.Joshi.AUniformMethodof
finally generate a sentence in the target language. If the source GrammarExtractionanditsApplications.InProc.ofJoint
language(e.g.,ChineseandKorean)allowsargumentdeletionand SIGDATConferenceonEmpiricalMethodsinNatural
the target language (e.g., English) does not, it is crucial that the LanguageProcessingandVeryLargeCorpora
dropped argument (which is a type of empty category) isexplic- (EMNLP/VLC),2000.
itlymarkedinthesourcedependencystructure,sothatthemachine
translationsystemsareawareoftheexistenceofthedroppedargu-
mentandcanhandlethesituationaccordingly. Thesecondbenefit
ofincludingemptycategoriesindependencystructuresisthatitcan Formorediscussionofnon-projectiveconstructions,see[9].
8