Probabilistic Linguistics

edited by Rens Bod, Jennifer Hay, and Stefanie Jannedy

The MIT Press
Cambridge, Massachusetts
London, England

© 2003 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was set in Times New Roman on 3B2 by Asco Typesetters, Hong Kong. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Probabilistic linguistics / editors: Rens Bod, Jennifer Hay, Stefanie Jannedy.
p. cm.
“...originated as a symposium on ‘Probability theory in linguistics’ held in Washington, D.C. as part of the Linguistic Society of America meeting in January 2001”--Preface.
“Bradford books.”
Includes bibliographical references and index.
ISBN 0-262-025360-1 (hc.: alk. paper) -- ISBN 0-262-52338-8 (pbk.: alk. paper)
1. Linguistic analysis (Linguistics) 2. Linguistics--Statistical methods. 3. Probabilities. I. Bod, Rens, 1965– II. Hay, Jennifer. III. Jannedy, Stefanie.
P128.P73 P76 2003
410'.1'5192--dc21 2002032165

10 9 8 7 6 5 4 3 2 1

Contents

Preface
Contributors
Chapter 1. Introduction (Rens Bod, Jennifer Hay, and Stefanie Jannedy)
Chapter 2. Introduction to Elementary Probability Theory and Formal Stochastic Language Theory (Rens Bod)
Chapter 3. Probabilistic Modeling in Psycholinguistics: Linguistic Comprehension and Production (Dan Jurafsky)
Chapter 4. Probabilistic Sociolinguistics: Beyond Variable Rules (Norma Mendoza-Denton, Jennifer Hay, and Stefanie Jannedy)
Chapter 5. Probability in Language Change (Kie Zuraw)
Chapter 6. Probabilistic Phonology: Discrimination and Robustness (Janet B. Pierrehumbert)
Chapter 7. Probabilistic Approaches to Morphology (R. Harald Baayen)
Chapter 8. Probabilistic Syntax (Christopher D. Manning)
Chapter 9. Probabilistic Approaches to Semantics (Ariel Cohen)
Glossary of Probabilistic Terms
References
Name Index
Subject Index

Preface

A wide variety of evidence suggests that language is probabilistic. In language comprehension and production, probabilities play a role in access, disambiguation, and generation. In learning, probability plays a role in segmentation and generalization. In phonology and morphology, probabilities play a role in acceptability judgments and alternations. And in syntax and semantics, probabilities play a role in the gradience of categories, syntactic well-formedness judgments, and interpretation. Moreover, probabilities play a key role in modeling language change and language variation.

This volume systematically investigates the probabilistic nature of language for a range of subfields of linguistics (phonology, morphology, syntax, semantics, psycholinguistics, historical linguistics, and sociolinguistics), each covered by a specialist. The probabilistic approach to the study of language may seem opposed to the categorical approach, which has dominated linguistics for over 40 years.
Yet one thesis of this book is that the two apparently opposing views may in fact go very well together: while categorical approaches focus on the endpoints of distributions of linguistic phenomena, probabilistic approaches focus on the gradient middle ground.

This book originated as the symposium “Probability Theory in Linguistics,” held in Washington, D.C., as part of the Linguistic Society of America meeting in January 2001. One outcome of the symposium was the observation that probability theory allows researchers to change the level of magnification when exploring theoretical and practical problems in linguistics. Another was the sense that a handbook on probabilistic linguistics, providing necessary background knowledge and covering the various subfields of language, was badly needed. We hope this book will fill that need.

We expect the book to be of interest to all students and researchers of language, whether theoretical linguists, psycholinguists, historical linguists, sociolinguists, or computational linguists. Because probability theory has not formed part of the traditional linguistics curriculum, we have included a tutorial on elementary probability theory and probabilistic grammars, which provides the background knowledge for understanding the rest of the book. In addition, a glossary of probabilistic terms is given at the end of the book.

We are most grateful to the authors, who have given maximal effort to write the overview chapters on probabilistic approaches to the various subfields of linguistics. We also thank the authors for their contribution to the review process. We are grateful to Michael Brent for his contribution to the original symposium and to Anne Mark for her excellent editorial work. Finally, we would like to thank the editor, Thomas Stone, for his encouragement and help during the processing of this book.

The editors of this book worked on three different continents (with the South Pole equidistant from us all).
We recommend this as a fabulously efficient way to work. The book never slept.

Contributors

R. Harald Baayen
R. Harald Baayen studied linguistics at the Free University of Amsterdam. In 1989, he completed his Ph.D. thesis on statistical and psychological aspects of morphological productivity. Since 1989, he has held postdoctoral positions at the Free University of Amsterdam and at the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands. He is now affiliated with the University of Nijmegen. His research interests include lexical statistics in literary and linguistic corpus-based computing, general linguistics, morphological theory, and the psycholinguistics of morphological processing in language comprehension and speech production. He has published in a variety of international journals, including Language, Linguistics, Computers and the Humanities, Computational Linguistics, Literary and Linguistic Computing, Journal of Quantitative Linguistics, Journal of Experimental Psychology, and Journal of Memory and Language.

Rens Bod
Rens Bod received his Ph.D. from the University of Amsterdam. He is one of the principal architects of the Data-Oriented Parsing model, which provides a general framework for probabilistic natural language processing and which has also been applied to other perceptual modalities. He published his first scientific paper at the age of 15 and is the author of three books, including Beyond Grammar: An Experience-Based Theory of Language. He has also published in the fields of computational musicology, vision science, aesthetics, and philosophy of science. He is affiliated with the University of Amsterdam and the University of Leeds, where he works on spoken language processing and on unified models of linguistic, musical, and visual perception.

Ariel Cohen
Ariel Cohen received his Ph.D. in computational linguistics from Carnegie Mellon University. In 1996, he joined the Department of