Theory and Applications of Natural Language Processing

Series Editors:
Graeme Hirst (Textbooks)
Eduard Hovy (Edited volumes)
Mark Johnson (Monographs)

Aims and Scope

The field of Natural Language Processing (NLP) has expanded explosively over the past decade: growing bodies of available data, novel fields of applications, emerging areas and new connections to neighboring fields have all led to increasing output and to diversification of research.

“Theory and Applications of Natural Language Processing” is a series of volumes dedicated to selected topics in NLP and Language Technology. It focuses on the most recent advances in all areas of the computational modeling and processing of speech and text across languages and domains. Due to the rapid pace of development, the diversity of approaches and application scenarios are scattered in an ever-growing mass of conference proceedings, making entry into the field difficult for both students and potential users. Volumes in the series facilitate this first step and can be used as a teaching aid, advanced-level information resource or a point of reference.

The series encourages the submission of research monographs, contributed volumes and surveys, lecture notes and textbooks covering research frontiers on all relevant topics, offering a platform for the rapid publication of cutting-edge research as well as for comprehensive monographs that cover the full range of research on specific problem areas.

The topics include applications of NLP techniques to gain insights into the use and functioning of language, as well as the use of language technology in applications that enable communication, knowledge management and discovery such as natural language generation, information retrieval, question-answering, machine translation, localization and related fields.

The books are available in printed and electronic (e-book) form:
* Downloadable on your PC, e-reader or iPad
* Enhanced by Electronic Supplementary Material, such as algorithms, demonstrations, software, images and videos
* Available online within an extensive network of academic and corporate R&D libraries worldwide
* Never out of print thanks to innovative print-on-demand services
* Competitively priced print editions for eBook customers thanks to MyCopy service http://www.springer.com/librarians/e-content/mycopy

For other titles published in this series, go to www.springer.com/series/8899

Aline Villavicencio, Thierry Poibeau, Anna Korhonen, Afra Alishahi
Editors

Cognitive Aspects of Computational Language Acquisition

Editors

Aline Villavicencio
Institute of Informatics
Federal University of Rio Grande do Sul
Porto Alegre, Brazil

Thierry Poibeau
Ecole Normale Supérieure
Université Sorbonne Nouvelle
LATTICE-CNRS, Paris, France

Anna Korhonen
Computer Laboratory
University of Cambridge
Cambridge, UK

Afra Alishahi
Tilburg center for Cognition and Communication (TiCC)
Tilburg University
Tilburg, The Netherlands

ISSN 2192-032X
ISSN 2192-0338 (electronic)
ISBN 978-3-642-31862-7
ISBN 978-3-642-31863-4 (eBook)
DOI 10.1007/978-3-642-31863-4
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2012954240

© Springer-Verlag Berlin Heidelberg 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Acknowledgements

We would like to acknowledge the support of the labex (cluster of excellence) TransferS (France), the Royal Society and EPSRC grant EP/F030061/1 (UK), CAPES-COFECUB grant 707/11 and CNPq grants 479824/2009-6, 551964/2011-1, 478222/2011-4 and 309569/2009-5 (Brazil).

Contents

Computational Modeling as a Methodology for Studying Human Language Learning
Thierry Poibeau, Aline Villavicencio, Anna Korhonen, and Afra Alishahi
  1 Overview
    1.1 Theoretical Accounts of Language Modularity and Learnability
    1.2 Investigations of Linguistic Hypotheses
  2 Computational Models of Language Learning
    2.1 What to Expect from a Model
    2.2 Modeling Frameworks
    2.3 Research Methods
  3 Impact of Computational Modeling on the Study of Language
  4 This Collection
    4.1 Methods and Tools for Investigating Phonetics and Phonology
    4.2 Classifying Words and Mapping Them to Meanings
    4.3 Learning Morphology and Syntax
    4.4 Linking Syntax to Semantics
  5 Concluding Remarks
  References

Part I Methods and Tools for Investigating Phonetics and Phonology

Phon: A Computational Basis for Phonological Database Building and Model Testing
Yvan Rose, Gregory J. Hedlund, Rod Byrne, Todd Wareham, and Brian MacWhinney
  1 Introduction
  2 The PhonBank Project
    2.1 PhonBank
    2.2 Phon
  3 Phon
    3.1 Project Management
    3.2 Media Linkage and Segmentation
    3.3 Data Transcription
    3.4 Multiple-Blind Transcription and Transcript Validation
    3.5 Transcribed Utterance Segmentation
    3.6 Syllabification Algorithm
    3.7 Alignment Algorithm
  4 Database Query
    4.1 Terminology
    4.2 Executing a Query
    4.3 Creating a Query
    4.4 An Illustrative Example
    4.5 Additional Information
  5 Future Projects
    5.1 Interface for Acoustic Data
    5.2 Extensions of Database Query Functionality
  6 Discussion
  References

Language Dynamics in the Framework of Complex Networks: A Case Study on Self-Organization of the Consonant Inventories
Animesh Mukherjee, Monojit Choudhury, Niloy Ganguly, and Anupam Basu
  1 Introduction
  2 Phonological Inventories: A Primer
  3 Network Model of Consonant Inventories
    3.1 Definition of PlaNet
    3.2 Construction Methodology
  4 Topological Properties of PlaNet
    4.1 Degree Distribution of PlaNet
  5 The Synthesis Model
  6 Interpretation of the Synthesis Model
    6.1 Mathematical Analysis of the Model
    6.2 Linguistic Interpretation of the Model
  7 Dynamics of the Language Families
  8 Conclusion
  Appendix
  References

Part II Classifying Words and Mapping Them to Meanings

From Cues to Categories: A Computational Study of Children’s Early Word Categorization
Fatemeh Torabi Asr, Afsaneh Fazly, and Zohreh Azimifar
  1 Introduction
  2 Related Work
    2.1 Experimental and Corpus-Based Studies
    2.2 Related Computational Models
  3 Overview of This Study
  4 Components of the Categorization Model
    4.1 Categorization Algorithm
    4.2 Cues Used in Categorization
  5 Experimental Setup
    5.1 Corpus
    5.2 Feature Extraction
    5.3 Model Parameters
  6 Discovering Syntactic Categories
    6.1 Evaluation Strategy
    6.2 Novel Word Categorization
  7 Word Categorization and Semantic Prediction
    7.1 Semantic Feature Prediction
    7.2 Simulation of the Brown Experiment
  8 Conclusions and Future Directions
  Appendix
  References

In Learning Nouns and Adjectives Remembering Matters: A Cortical Model
Alessio Plebe, Vivian M. De la Cruz, and Marco Mazzone
  1 Introduction
    1.1 On Learning First Words
    1.2 On Learning Nouns
    1.3 On Learning Adjectives
    1.4 Modeling Noun and Adjective Acquisition
  2 Description of the Model
    2.1 Basic Units of the Model
    2.2 The Visual Pathway
    2.3 Auditory Pathway
    2.4 The Higher Cortical Map
  3 Nouns and Adjectives Acquisition
    3.1 Simulation of Intrinsic and Extrinsic Experience
    3.2 Emergence of Organization in the Lower Maps
    3.3 Representation of Nouns and Adjectives in Model PFC
    3.4 Patterns of Connectivity of Nouns and Adjectives
  4 Conclusions
  References

Part III Learning Morphology and Syntax

Treebank Parsing and Knowledge of Language
Sandiway Fong, Igor Malioutov, Beracah Yankama, and Robert C. Berwick
  1 Introduction: Treebank Parsing and Knowledge of Language
  2 Experimental Methods
    2.1 Parsing Systems Used
    2.2 Training Data, Testing, and Evaluation
  3 Case Study: Parsing Wh-Questions and QuestionBank
    3.1 Augmenting the Training Data
  4 Parsing and Tense: The Case of Read
  5 Case Study: Parsing Passives by Linguistic Regularization
    5.1 Passive Transformations: A Pilot Study
  6 Parsing “Unnatural” Languages?
    6.1 The Experimental Emulation
    6.2 Training, Testing and Results
  7 Discussion and Conclusions
  References

Rethinking the Syntactic Burst in Young Children
Christophe Parisse
  1 Introduction
  2 Assumptions About Children’s Behavior
  3 A Testing Procedure in Three Steps
  4 Analysis 1
  5 Analysis 2
  6 Analysis 3
    6.1 Results and Discussion: Question 1
    6.2 Results and Discussion: Question 2
  7 Analysis 4
  8 Discussion
  Appendix
  References

Part IV Linking Syntax to Semantics

Learning to Interpret Novel Noun-Noun Compounds: Evidence from Category Learning Experiments
Barry J. Devereux* and Fintan J. Costello
  1 Introduction
  2 An Exemplar-Based Account of Compound Interpretation