Use R!

Advisors: Robert Gentleman · Kurt Hornik · Giovanni Parmigiani

Series Editors: Robert Gentleman, Kurt Hornik, and Giovanni Parmigiani

Albert: Bayesian Computation with R
Bivand/Pebesma/Gómez-Rubio: Applied Spatial Data Analysis with R
Claude: Morphometrics with R
Cook/Swayne: Interactive and Dynamic Graphics for Data Analysis: With R and GGobi
Hahne/Huber/Gentleman/Falcon: Bioconductor Case Studies
Kleiber/Zeileis: Applied Econometrics with R
Nason: Wavelet Methods in Statistics with R
Paradis: Analysis of Phylogenetics and Evolution with R
Peng/Dominici: Statistical Methods for Environmental Epidemiology with R: A Case Study in Air Pollution and Health
Pfaff: Analysis of Integrated and Cointegrated Time Series with R, 2nd edition
Sarkar: Lattice: Multivariate Data Visualization with R
Spector: Data Manipulation with R

Alain F. Zuur · Elena N. Ieno · Erik H.W.G. Meesters

A Beginner’s Guide to R

Alain F. Zuur
Highland Statistics Ltd.
6 Laverock Road
Newburgh
United Kingdom AB41 6FN
[email protected]

Elena N. Ieno
Highland Statistics Ltd.
6 Laverock Road
Newburgh
United Kingdom AB41 6FN
[email protected]

Erik H.W.G. Meesters
IMARES, Institute for Marine Resources & Ecosystem Studies
1797 SH ’t Horntje
The Netherlands
[email protected]

ISBN 978-0-387-93836-3
e-ISBN 978-0-387-93837-0
DOI 10.1007/978-0-387-93837-0
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009929643

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

To my future niece (who will undoubtedly cost me a lot of money)
Alain F. Zuur

To Juan Carlos and Norma
Elena N. Ieno

For Leontine and Ava, Rick, and Merel
Erik H.W.G. Meesters

Preface

The Absolute R Beginner

For whom was this book written?

Since 2000, we have taught statistics to over 5000 life scientists. This sounds a lot, and indeed it is, but with some classes of 200 undergraduate students, numbers accumulate rapidly (although some courses have involved as few as 6 students). Most of our teaching has been done in Europe, but we have also conducted courses in South America, Central America, the Middle East, and New Zealand. Of course teaching at universities and research organisations means that our students may be from almost anywhere in the world. Participants have included undergraduates, but most have been MSc students, postgraduate students, post-docs, or senior scientists, along with some consultants and nonacademics.

This experience has given us an informed awareness of the typical life scientist’s knowledge of statistics. The word “typical” may be misleading, as those scientists enrolling in a statistics course are likely to be those who are unfamiliar with the topic or have become rusty. In general, we have worked with people who, at some stage in their education or career, have completed a statistics course covering such topics as mean, variance, t-test, Chi-square test, and hypothesis testing, and perhaps including half an hour devoted to linear regression.

There are many books available on doing statistics with R. But this book does not deal with statistics, as, in our experience, teaching statistics and R at the same time means two steep learning curves, one for the statistical methodology and one for the R code. This is more than many students are prepared to undertake. This book is intended for people seeking an elementary introduction to R. Obviously, the term “elementary” is vague; elementary in one person’s view may be advanced in another’s.
R contains a high “you need to know what you are doing” content, and its application requires a considerable amount of logical thinking. As statisticians, it is easy to sit in an ivory tower and expect the life scientist to knock on our door and ask to learn our language. This book aims to make that language as simple as possible. If the phrase “absolute beginner” offends, we apologize, but it answers the question: For whom is this book intended?

All authors of this book are Windows users and have limited experience with Linux and with Mac OS. R is also available for computers with these operating systems, and all the R code we present should run properly on them. However, there may be small differences with saving graphs. Non-Windows users will also need to find an alternative to the text editor Tinn-R (Chapter 1 discusses where you can find information on this).

Data Sets Used in This Book

This book uses mainly life science data. Nevertheless, whatever your area of study and whatever your data, the procedures presented will apply. Scientists in all fields need to import data, massage data, make graphs, and, finally, perform analyses. The R commands will be very similar in every case. A 200-page book does not offer a great deal of scope for presenting a variety of data set types, and, in our experience, widely divergent examples confuse the reader. The optimal approach may be to use a single data set to demonstrate all techniques, but this does not make many people happy. Therefore, we have used ecological data sets (e.g., involving plants, marine benthos, fish, birds) and epidemiological data sets.

All data sets used in this book are downloadable from www.highstat.com.
Newburgh, Alain F. Zuur
Newburgh, Elena N. Ieno
Den Burg, Erik H.W.G. Meesters

Acknowledgements

We thank Chris Elphick for the sparrow data; Graham Pierce for the squid data; Monty Priede for the ISIT data; Richard Loyn for the Australian bird data; Gerard Janssen for the benthic data; Pam Sikkink for the grassland data; Alexandre Roulin for the barn owl data; Michael Reed and Chris Elphick for the Hawaiian bird data; Robert Cruikshanks, Mary Kelly-Quinn, and John O’Halloran for the Irish river data; Joaquín Vicente and Christian Gortázar for the wild boar and deer data; Ken Mackenzie for the cod data; Sonia Mendes for the whale data; Max Latuhihin and Hanneke Baretta-Bekker for the Dutch salinity and temperature data; and António Mira and Filipe Carvalho for the roadkill data. The full references are given in the text.

This is our third book with Springer, and we thank John Kimmel for giving us the opportunity to write it. We also thank all course participants who commented on the material.

We thank Anatoly Saveliev and Gema Hernández-Milian for commenting on earlier drafts and Kathleen Hills (The Lucidus Consultancy) for editing the text.

Contents

Preface
Acknowledgements
1 Introduction
  1.1 What Is R?
  1.2 Downloading and Installing R
  1.3 An Initial Impression
  1.4 Script Code
    1.4.1 The Art of Programming
    1.4.2 Documenting Script Code
  1.5 Graphing Facilities in R
  1.6 Editors
  1.7 Help Files and Newsgroups
  1.8 Packages
    1.8.1 Packages Included with the Base Installation
    1.8.2 Packages Not Included with the Base Installation
  1.9 General Issues in R
    1.9.1 Quitting R and Setting the Working Directory
  1.10 A History and a Literature Overview
    1.10.1 A Short Historical Overview of R
    1.10.2 Books on R and Books Using R
  1.11 Using This Book
    1.11.1 If You Are an Instructor
    1.11.2 If You Are an Interested Reader with Limited R Experience
    1.11.3 If You Are an R Expert
    1.11.4 If You Are Afraid of R
  1.12 Citing R and Citing Packages
  1.13 Which R Functions Did We Learn?