UUnniivveerrssiittyy ooff PPeennnnssyyllvvaanniiaa SScchhoollaarrllyyCCoommmmoonnss Publicly Accessible Penn Dissertations 2013 MMooddeellss ffoorr IImmpprroovveedd TTrraaccttaabbiilliittyy aanndd AAccccuurraaccyy iinn DDeeppeennddeennccyy PPaarrssiinngg Emily Pitler University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/edissertations Part of the Computer Sciences Commons RReeccoommmmeennddeedd CCiittaattiioonn Pitler, Emily, "Models for Improved Tractability and Accuracy in Dependency Parsing" (2013). Publicly Accessible Penn Dissertations. 912. https://repository.upenn.edu/edissertations/912 This paper is posted at ScholarlyCommons. https://repository.upenn.edu/edissertations/912 For more information, please contact [email protected]. MMooddeellss ffoorr IImmpprroovveedd TTrraaccttaabbiilliittyy aanndd AAccccuurraaccyy iinn DDeeppeennddeennccyy PPaarrssiinngg AAbbssttrraacctt Automatic syntactic analysis of natural language is one of the fundamental problems in natural language processing. Dependency parses (directed trees in which edges represent the syntactic relationships between the words in a sentence) have been found to be particularly useful for machine translation, question answering, and other practical applications. For English dependency parsing, we show that models and features compatible with how conjunctions are represented in treebanks yield a parser with state-of-the-art overall accuracy and substantial improvements in the accuracy of conjunctions. For languages other than English, dependency parsing has often been formulated as either searching over trees without any crossing dependencies (projective trees) or searching over all directed spanning trees. The former sacrifices the ability to produce many natural language structures; the latter is NP-hard in the presence of features with scopes over siblings or grandparents in the tree. This thesis explores alternative ways to simultaneously produce crossing dependencies in the output and use models that parametrize over multiple edges. Gap inheritance is introduced in this thesis and 6 5 quantifies the nesting of subtrees over intervals. The thesis provides O(n ) and O(n ) edge-factored parsing algorithms for two new classes of trees based on this property, and extends the latter to include grandparent factors. This thesis then defines 1-Endpoint-Crossing trees, in which for any edge that is crossed, all other edges that cross that edge share an endpoint. This property covers 95.8% or more of dependency parses across a variety of languages. A crossing-sensitive factorization introduced in this thesis generalizes a commonly used third-order factorization (capable of scoring triples of edges simultaneously). This thesis provides exact dynamic programming algorithms that find the optimal 1-Endpoint-Crossing 4 tree under either an edge-factored model or this crossing-sensitive third-order model in O(n ) time, orders of magnitude faster than other mildly non-projective parsing algorithms and identical to the parsing time for projective trees under the third-order model. The implemented parser is significantly more accurate than the third-order projective parser under many experimental settings and significantly less accurate on none. DDeeggrreeee TTyyppee Dissertation DDeeggrreeee NNaammee Doctor of Philosophy (PhD) GGrraadduuaattee GGrroouupp Computer and Information Science FFiirrsstt AAddvviissoorr Mitchell P. Marcus SSeeccoonndd AAddvviissoorr Sampath Kannan KKeeyywwoorrddss algorithms, dependency, natural language processing, parsing, syntax, trees SSuubbjjeecctt CCaatteeggoorriieess Computer Sciences This dissertation is available at ScholarlyCommons: https://repository.upenn.edu/edissertations/912 MODELSFORIMPROVEDTRACTABILITYANDACCURACYINDEPENDENCY PARSING EmilyPitler ADISSERTATION in ComputerandInformationScience PresentedtotheFacultiesoftheUniversityofPennsylvania in PartialFulfillmentoftheRequirementsforthe DegreeofDoctorofPhilosophy 2013 SupervisorofDissertation Co-SupervisorofDissertation MitchellP.Marcus SampathKannan Professor,ComputerandInformationScience Professor,ComputerandInformationScience GraduateGroupChairperson ValTannen,Professor,ComputerandInformationScience DissertationCommittee: MichaelCollins,Professor,ComputerScience,ColumbiaUniversity ChrisCallison-Burch,AssistantProfessor,ComputerandInformationScience MarkLiberman,Professor,Linguistics BenTaskar,AssociateProfessor,ComputerandInformationScience MODELS FOR IMPROVED TRACTABILITY AND ACCURACY IN DEPENDENCY PARSING COPYRIGHT 2013 EmilyPitler Acknowledgements Mitch Marcus and Sampath Kannan were wonderful advisors and I am truly grateful to both of them for taking me on as a student. Mitch’s patience and unwavering support allowed me the freedom to find this topic for this thesis. His wide-ranging knowledge of both linguistics and computer science was incredibly valuable. Sampath inspired me to strive for clean solutions to problems. His clarity helped me to identify the crucial pieces ofanyresultandgreatlyimprovedthematerialpresentedhere. Iamgratefultomythesiscommittee: ChrisCallison-Burch,MikeCollins,MarkLiber- man, and Ben Taskar. I have benefited from Chris’s advice throughout graduate school, at both Penn and JHU. I have tried to emulate his clear presentation style in both my talks and my writing. Mike’s work in parsing served as inspiration for me for much of the work in this thesis. His detailed questions have both improved this document and pointed the way towards future directions. Mark’s questions at CLunch over the years always brought up subtle but important details. Discussions of treebank representations in this document are largely due to Mark’s influence. Ben helped me identify interesting directions at a for- mative stage in this work. Throughout the process, Ben encouraged me to simultaneously investigateboththeoreticalandempiricalissues. Alargenumberofpeoplecontributedtomyundertakingthisdissertationandfollowing through. Thanks especially to Jerry Berry and Stephen Rose at Thomas Jefferson High School for Science and Technology for introducing me to programming and computer sci- ence; Charles Yang, at Yale and Penn, for introducing me to computational linguistics; DanaAngluinandBrianScassellatiatYaleforencouragingmetopursuegraduateschool; iii and Ken Church, Dekang Lin, and Ani Nenkova for guiding me as I started research. I benefited tremendously from two summers with Ken: he patiently taught me how to write papers and how to approach problems from multiple perspectives. Dekang has been very supportive of me and my research ever since working together at a JHU CLSP Summer Workshop in 2009. Dekang introduced me to research in parsing and applications of pars- ing. Ani Nenkova and Annie Louis were wonderful collaborators. Annie Louis was my constant partner throughout graduate school; it was a pleasure to go through courses, re- search, internships, and other milestones together. Thanks to Katerina Fragkiadaki, Jenny Gillenwater, Arun Raghavan, and David Weiss for providing feedback on drafts and prac- ticetalks;ShaneBergsma,JasonEisner,AravindJoshi,JulieLegate,RyanMcDonald,Slav Petrov,GiorgioSatta,andDavidYarowskyforproductiveconversationsthatimpactedthis thesis; Terry Koo for his assistance with the dpo3 parser; Daniel Zeman for his assistance withtheHamleDTtreebankconversionsoftware;MikeFelkerforremovingadministrative hurdles; Cheryl Hickey for all her support; the students, staff, and volunteers at White- Williams Scholars/Philadelphia Futures for a rewarding time and memories; Drew Hilton, Santosh Nagarakatte, and Arun Raghavan for long games of bridge; Katie Gibson for bak- ingdeliciouscupcakesandfudge;andotherstudentsIhadthepleasureofoverlappingwith at Penn, including Adam Aviv, Alexis Baird, John Blitzer, Chris Casinghino, Erwin Chan, SanjianChen,NikhilDinesh,MarkDredze,RyanGabbard,KuzmanGanchev,Joa˜oGrac¸a, MichaelGreenberg,KaiHong,LiangHuang,MarieJacob,SarahJohnstone,AndrewKing, Alex Kulesza, Junyi Li, Constantine Lignos, Xi Lin, Ellie Pavlick, Alex Roderer, Sudeepa Roy,BenSapp,ParthaPratimTalukdar,AndrewWest,andQiuyeZhao. Iamdeeplygratefultomycollegefriends: Alison,Betsy,Caroline,Charlie,Guy,Illana, Jackie, Joel, Julia, Lekshmi, Michael, and Samarth for their visits and encouragement. Samarth and I were classmates from middle school through undergrad and we wrote our first paper together. I would like to thank my brother Will, my grandparents, and Sudipto fortheirsteadfastbeliefinme. Mostimportantly,Iamthankfultomyparentsfornurturing mycuriosityandfortheirloveandsupport. iv ABSTRACT MODELSFORIMPROVEDTRACTABILITYANDACCURACYINDEPENDENCY PARSING EmilyPitler MitchellP.Marcus SampathKannan Automaticsyntacticanalysisofnaturallanguageisoneofthefundamentalproblemsinnat- ural language processing. Dependency parses (directed trees in which edges represent the syntacticrelationshipsbetweenthewordsinasentence)havebeenfoundtobeparticularly usefulformachinetranslation,questionanswering,andotherpracticalapplications. For English dependency parsing, we show that models and features compatible with how conjunctions are represented in treebanks yield a parser with state-of-the-art overall accuracyandsubstantialimprovementsintheaccuracyofconjunctions. ForlanguagesotherthanEnglish,dependencyparsinghasoftenbeenformulatedasei- thersearchingovertreeswithoutanycrossingdependencies(projectivetrees)orsearching over all directed spanning trees. The former sacrifices the ability to produce many natu- ral language structures; the latter is NP-hard in the presence of features with scopes over siblingsorgrandparentsinthetree. Thisthesisexploresalternativewaystosimultaneouslyproducecrossingdependencies intheoutputandusemodelsthatparametrizeovermultipleedges. Gap inheritance is introduced in this thesis and quantifies the nesting of subtrees over intervals. The thesis provides O(n6) and O(n5) edge-factored parsing algorithms for two new classes of trees based on this property, and extends the latter to include grandparent factors. Thisthesisthendefines1-Endpoint-Crossingtrees,inwhichforanyedgethatiscrossed, all other edges that cross that edge share an endpoint. This property covers 95.8% or more of dependency parses across a variety of languages. A crossing-sensitive factorization in- troduced in this thesis generalizes a commonly used third-order factorization (capable of v scoringtriplesofedgessimultaneously). This thesis provides exact dynamic programming algorithms that find the optimal 1- Endpoint-Crossingtreeundereitheranedge-factoredmodelorthiscrossing-sensitivethird- order model in O(n4) time, orders of magnitude faster than other mildly non-projective parsing algorithms and identical to the parsing time for projective trees under the third- order model. The implemented parser is significantly more accurate than the third-order projectiveparserundermanyexperimentalsettingsandsignificantlylessaccurateonnone. vi Contents Acknowledgements iii 1 Introduction 1 1.1 Background: Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.1 Factorizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.3 ParsingwithUnlabeledDataandRelevantFactorizations . . . . . . 7 1.3 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.1 ProjectiveTrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.2 Arborescences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.3.3 ExistingDefinitionsofMildlyNon-projectiveTrees . . . . . . . . 10 1.3.4 ClassesofTreesProposedinthisThesis . . . . . . . . . . . . . . . 10 1.4 FactorizationsthatFacilitateSearch . . . . . . . . . . . . . . . . . . . . . 12 1.5 ThesisContributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2 Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factoriza- tions 16 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2 ConversiontoDependencyRepresentations . . . . . . . . . . . . . . . . . 18 2.3 ImplicationsofRepresentationsontheScopeofFactorization . . . . . . . 22 2.3.1 Edge-basedScoring . . . . . . . . . . . . . . . . . . . . . . . . . 22 vii
Description: