
Analyzing Linguistic Data: A Practical Introduction to Statistics Using R

353 pages · 2008 · 4.44 MB · English

Preview Analyzing Linguistic Data: A Practical Introduction to Statistics using R

Statistical analysis is a useful skill for linguists and psycholinguists, allowing them to understand the quantitative structure of their data. This textbook provides a straightforward introduction to the statistical analysis of language data. Designed for linguists with a non-mathematical background, it clearly introduces the basic principles and methods of statistical analysis, using R, the leading computational statistics programming environment. The reader is guided step-by-step through a range of real data sets, allowing them to analyze phonetic data, construct phylogenetic trees, quantify register variation in corpus linguistics, and analyze experimental data using state-of-the-art models. The visualization of data plays a key role, both in the early stages of data exploration and later on when the reader is encouraged to criticize initial models fitted to the data. Containing over 40 exercises with model answers, this book will be welcomed by all linguists wishing to learn more about working with and presenting quantitative data.

The program R is available at http://cran.at.r-project.org/. The data sets and ancillary functions discussed in this book have been brought together in the languageR package, which is available at the same URL.

R. H. Baayen is Professor of Quantitative Linguistics at the University of Alberta, Edmonton. He is author of Word Frequency Distributions (2001), co-editor of Morphological Structure in Language Processing (2003), and has published widely in linguistics and psycholinguistics journals.

Analyzing Linguistic Data
A Practical Introduction to Statistics Using R
R. H. Baayen

Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521882590

© R. H. Baayen 2008

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2008

ISBN-13 978-0-511-38630-5 eBook (EBL)
ISBN-13 978-0-521-88259-0 hardback
ISBN-13 978-0-521-70918-7 paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To Jorn, Corine, Thera, and Tineke

Contents

Preface

1 An introduction to R
  1.1 R as a calculator
  1.2 Getting data into and out of R
  1.3 Accessing information in data frames
  1.4 Operations on data frames
    1.4.1 Sorting a data frame by one or more columns
    1.4.2 Changing information in a data frame
    1.4.3 Extracting contingency tables from data frames
    1.4.4 Calculations on data frames
  1.5 Session management

2 Graphical data exploration
  2.1 Random variables
  2.2 Visualizing single random variables
  2.3 Visualizing two or more variables
  2.4 Trellis graphics

3 Probability distributions
  3.1 Distributions
  3.2 Discrete distributions
  3.3 Continuous distributions
    3.3.1 The normal distribution
    3.3.2 The t, F, and χ² distributions

4 Basic statistical methods
  4.1 Tests for single vectors
    4.1.1 Distribution tests
    4.1.2 Tests for the mean
  4.2 Tests for two independent vectors
    4.2.1 Are the distributions the same?
    4.2.2 Are the means the same?
    4.2.3 Are the variances the same?
  4.3 Paired vectors
    4.3.1 Are the means or medians the same?
    4.3.2 Functional relations: linear regression
    4.3.3 What does the joint density look like?
  4.4 A numerical vector and a factor: analysis of variance
    4.4.1 Two numerical vectors and a factor: analysis of covariance
  4.5 Two vectors with counts
  4.6 A note on statistical significance

5 Clustering and classification
  5.1 Clustering
    5.1.1 Tables with measurements: principal components analysis
    5.1.2 Tables with measurements: factor analysis
    5.1.3 Tables with counts: correspondence analysis
    5.1.4 Tables with distances: multidimensional scaling
    5.1.5 Tables with distances: hierarchical cluster analysis
  5.2 Classification
    5.2.1 Classification trees
    5.2.2 Discriminant analysis
    5.2.3 Support vector machines

6 Regression modeling
  6.1 Introduction
  6.2 Ordinary least squares regression
    6.2.1 Nonlinearities
    6.2.2 Collinearity
    6.2.3 Model criticism
    6.2.4 Validation
  6.3 Generalized linear models
    6.3.1 Logistic regression
    6.3.2 Ordinal logistic regression
  6.4 Regression with breakpoints
  6.5 Models for lexical richness
  6.6 General considerations

7 Mixed models
  7.1 Modeling data with fixed and random effects
  7.2 A comparison with traditional analyses
    7.2.1 Mixed-effects models and quasi-F
    7.2.2 Mixed-effects models and Latin Square designs
    7.2.3 Regression with subjects and items
  7.3 Shrinkage in mixed-effects models
  7.4 Generalized linear mixed models
  7.5 Case studies
    7.5.1 Primed lexical decision latencies for Dutch neologisms
    7.5.2 Self-paced reading latencies for Dutch neologisms
    7.5.3 Visual lexical decision latencies of Dutch eight-year-olds
    7.5.4 Mixed-effects models in corpus linguistics
