
UNIVERSITY OF CALIFORNIA, SAN DIEGO Acquiring latent linguistic structure using ... PDF

137 Pages·2014·0.55 MB·English


UC San Diego
UC San Diego Electronic Theses and Dissertations

Title: Acquiring latent linguistic structure using computational models
Permalink: https://escholarship.org/uc/item/0tx98383
Author: Doyle, Gabriel R.
Publication Date: 2014
Peer reviewed | Thesis/dissertation

eScholarship.org
Powered by the California Digital Library
University of California

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Acquiring latent linguistic structure using computational models

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Linguistics

by

Gabriel R. Doyle

Committee in charge:
Professor Roger Levy, Chair
Professor Eric Bakovic
Professor David Barner
Professor Charles Elkan
Professor Andrew Kehler

2014

Copyright Gabriel R. Doyle, 2014. All rights reserved.

The Dissertation of Gabriel R. Doyle is approved and is acceptable in quality and form for publication on microfilm and electronically:

Chair

University of California, San Diego
2014

TABLE OF CONTENTS

Signature Page
Table of Contents
List of Figures
List of Tables
Acknowledgements
Vita
Abstract of the Dissertation
Chapter 1 Introduction
  1.1 Computational Models
  1.2 The learning problem
  1.3 Assessing Computational Models
  1.4 Overview of the models
    1.4.1 Chapter 2: Constraint Acquisition without Phonological Structure
    1.4.2 Chapter 3: Constraint Acquisition with Phonological Structure
    1.4.3 Chapter 4: Multiple-Cue Word Segmentation
    1.4.4 Chapter 5: Burstiness in Topic Models
Chapter 2 Nonparametric learning of phonological constraints in Optimality Theory
  2.1 Introduction
  2.2 Phonology and Optimality Theory
    2.2.1 OT structure
    2.2.2 OT as a weighted-constraint method
    2.2.3 OT in practice
    2.2.4 Learning Constraints
  2.3 The IBPOT Model
    2.3.1 Structure
    2.3.2 Inference
  2.4 Experiment
    2.4.1 Wolof vowel harmony
    2.4.2 Experiment Design
    2.4.3 Results
  2.5 Discussion and Future Work
    2.5.1 Relation to phonotactic learning
    2.5.2 Extending the learning model
  2.6 Conclusion
  2.7 Acknowledgments
Chapter 3 Data-driven acquisition of phonological constraints with underlying phonological structure
  3.1 Introduction
  3.2 Phonological Acquisition
    3.2.1 Constraint-Based Phonology
    3.2.2 Constraint structures and their acquisition
    3.2.3 Previous emergentist models
  3.3 Model design
    3.3.1 General structure
    3.3.2 Constraint grammar and violation profiles
    3.3.3 Inference on M and w
    3.3.4 Inference over the constraint definitions
  3.4 Experiment
    3.4.1 English regular plural morphophonology
    3.4.2 The constraint grammar
    3.4.3 Model parameters
  3.5 Results
    3.5.1 Observed forms
    3.5.2 Predictive behavior
    3.5.3 Violation Profiles and Constraint Definitions
    3.5.4 Experiment Summary
  3.6 Discussion and Future Directions
    3.6.1 Expansion of the emergentist view
    3.6.2 The nature of the underlying representation
    3.6.3 Extending the model
  3.7 Conclusion
Chapter 4 Combining multiple information types in Bayesian word segmentation
  4.1 Introduction
  4.2 Previous work
    4.2.1 Goldwater et al (2006)
    4.2.2 A cognitively-plausible variant
    4.2.3 Other multiple-cue models
  4.3 Model design
    4.3.1 On syllabification and stress
  4.4 Data
  4.5 Experiments
    4.5.1 Parameter setting
    4.5.2 Stress improves performance
    4.5.3 Are isolated words necessary?
    4.5.4 Bounded rationality in human segmentation
  4.6 Future work
  4.7 Conclusion
  4.8 Acknowledgments
Chapter 5 Accounting for burstiness in topic models
  5.1 Introduction
  5.2 Overview of Models
    5.2.1 Latent Dirichlet allocation (LDA)
    5.2.2 Dirichlet compound multinomial (DCM)
    5.2.3 DCMLDA
  5.3 Methods of Inference
  5.4 Experimental Design
  5.5 Empirical Likelihood
  5.6 Results
  5.7 Discussion
  5.8 Acknowledgments
Chapter 6 Conclusion
References

LIST OF FIGURES

Figure 2.1. Tableaux of Wolof input forms.
Figure 2.2. Wolof violation profiles for phonologically standard constraint definitions.
Figure 3.1. Example tree-structures within the RROT constraint CFG.
Figure 4.1. Percentage of runs segmented with the stress bias as bias varies.
Figure 5.1. LDA and DCMLDA graphical models.
Figure 5.2. Mean per-document log-likelihood on the S&P 500 dataset for DCMLDA and fitted LDA models.
Figure 5.3. Mean per-document log-likelihood on the NIPS dataset for DCMLDA and LDA models.

LIST OF TABLES

Table 2.1. IBPOT log-probabilities.
Table 3.1. Rules within the phonological context-free grammar for RROT.
Table 3.2. Phonemes and their feature values.
Table 3.3. RROT log-probabilities.
Table 3.4. RROT predictive probabilities.
Table 3.5. Likely RROT constraint definitions.
Table 4.1. Multiple-cue English corpus stress patterns by types and tokens.
Table 4.2. Precision, recall, and F-score over corpora with and without stress information available.
Table 4.3. Examples of segmenting an artificial language according to transition probabilities (top) or stress bias (bottom).
Table 5.1. Sample topics found by a 20-topic DCMLDA model trained on the S&P 500 dataset.
Table 5.2. Sample topics found by a 20-topic LDA model trained on the S&P 500 dataset.

ACKNOWLEDGEMENTS

There’s a part at the end of Norton Juster’s classic “The Phantom Tollbooth” where the hero has returned from a difficult quest and asks his patrons about a secret that they could not tell him before he finished the quest. His patrons, representing the realms of language and mathematics, reply off-handedly that the task was impossible – “but if we’d told you then, you might not have gone – and, as you’ve discovered, so many things are possible just as long as you don’t know they’re impossible.”

That line stuck with me long before I actually understood it. I think I do now, thanks most prominently to three people. The first two are my parents, Karen & Mike, who always treated it as the most natural thing in the world that someone from a family with a spotty academic record should want to get a doctorate, and did anything they could to help get me there (or wherever else I would have hoped to end up). Their endless support of and belief in me led to this dissertation.

The other person who’s hammered home Juster’s point has been my advisor and committee chair, Roger Levy, who always manages to make it seem that the work you’re trying to do is well within your grasp, even if it isn’t, and convinces you to go a little bit further, even if that’s impossible. I couldn’t have ended up in a better place or with a better advisor.

I owe deep thanks to the rest of my committee as well: Eric Baković, Dave Barner, Charles Elkan, and Andy Kehler – as well as Rachel Mayberry, who was on my original committee before the topic shifted – who never failed to provide ideas, inspirations, and helpful inquisitions along a very winding research path. They were contagiously enthusiastic in discussions even when I was worn out, and their ability to remind me of the philosophical forest when I’d get stuck on trees was essential.

The members, past and present, of the Computational Psycholinguistics Lab are also a big part of this dissertation, through many, many discussions of language and math

Description:
This dissertation investigates the acquisition of latent linguistic structure using computational models, across a variety of linguistic structures and covering both applications and psycholinguistic facets. Chapters 2 and 3 build models for the acquisition of phonological constraints from data,