
Getting Started with R: An Introduction for Biologists

242 Pages·2017·4.243 MB·English


Getting Started with R
An Introduction for Biologists
Second Edition

ANDREW P. BECKERMAN
DYLAN Z. CHILDS
Department of Animal and Plant Sciences, University of Sheffield

OWEN L. PETCHEY
Department of Evolutionary Biology and Environmental Studies, University of Zurich

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom
© Andrew Beckerman, Dylan Childs, & Owen Petchey 2017
The moral rights of the authors have been asserted
First Edition published in 2012
Second Edition published in 2017
Impression: 1
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data: Data available
Library of Congress Control Number: 2016946804
ISBN 978-0-19-878783-9 (hbk.)
ISBN 978-0-19-878784-6 (pbk.)
DOI 10.1093/acprof:oso/9780198787839.001.0001
Printed and bound by CPI Litho (UK) Ltd, Croydon, CR0 4YY

Contents

Preface
  Introduction to the second edition
  What this book is about
  How the book is organized
  Why R?
  Updates
  Acknowledgements

Chapter 1: Getting and Getting Acquainted with R
  1.1 Getting started
  1.2 Getting R
  1.3 Getting RStudio
  1.4 Let’s play
  1.5 Using R as a giant calculator (the size of your computer)
  1.6 Your first script
  1.7 Intermezzo remarks
  1.8 Important functionality: packages
  1.9 Getting help
  1.10 A mini-practical: some in-depth play
  1.11 Some more top tips and hints for a successful first (and more) R experience
  Appendix 1a Mini-tutorial solutions
  Appendix 1b File extensions and operating systems

Chapter 2: Getting Your Data into R
  2.1 Getting data ready for R
  2.2 Getting your data into R
  2.3 Checking that your data are your data
  2.4 Basic troubleshooting while importing data
  2.5 Summing up
  Appendix Advanced activity: dealing with untidy data

Chapter 3: Data Management, Manipulation, and Exploration with dplyr
  3.1 Summary statistics for each variable
  3.2 dplyr verbs
  3.3 Subsetting
  3.4 Transforming
  3.5 Sorting
  3.6 Mini-summary and two top tips
  3.7 Calculating summary statistics about groups of your data
  3.8 What have you learned ... lots
  Appendix 3a Comparing classic methods and dplyr
  Appendix 3b Advanced dplyr

Chapter 4: Visualizing Your Data
  4.1 The first step in every data analysis: making a picture
  4.2 ggplot2: a grammar for graphics
  4.3 Box-and-whisker plots
  4.4 Distributions: making histograms of numeric variables
  4.5 Saving your graphs for presentation, documents, etc.
  4.6 Closing remarks

Chapter 5: Introducing Statistics in R
  5.1 Getting started doing statistics in R
  5.2 χ² contingency table analysis
  5.3 Two-sample t-test
  5.4 Introducing ... linear models
  5.5 Simple linear regression
  5.6 Analysis of variance: the one-way ANOVA
  5.7 Wrapping up
  Appendix Getting packages not on CRAN

Chapter 6: Advancing Your Statistics in R
  6.1 Getting started with more advanced statistics
  6.2 The two-way ANOVA
  6.3 Analysis of covariance (ANCOVA)
  6.4 Overview: an analysis workflow

Chapter 7: Getting Started with Generalized Linear Models
  7.1 Introduction
  7.2 Counts and rates: Poisson GLMs
  7.3 Doing it wrong
  7.4 Doing it right: the Poisson GLM
  7.5 When a Poisson GLM isn’t good for counts
  7.6 Summary, and beyond simple Poisson regression

Chapter 8: Pimping Your Plots: Scales and Themes in ggplot2
  8.1 What you already know about graphs
  8.2 Preparation
  8.3 What you may want to customize
  8.4 Axis labels, axis limits, and annotation
  8.5 Scales
  8.6 The theme
  8.7 Summing up

Chapter 9: Closing Remarks: Final Comments and Encouragement

General Appendices
  Appendix 1 Data Sources
  Appendix 2 Further Reading
  Appendix 3 R Markdown

Index

Preface

Introduction to the second edition

This is a book about how to use R, an open source programming language and environment for statistics. It is not a book about statistics per se, but a book about getting started using R. It is a book that we hope will teach you how using R can make your life (research career) easier.

Several years ago we published the first edition of this book, aiming to help people move from ‘hearing about R’ to ‘using R’.
We had realized that there were lots of books about exploring data and doing statistics with R, but none specifically designed for people that didn’t have a lot of experience or confidence in using much more than a spreadsheet, people that didn’t have a lot of time, and people that appreciated an engaging and sometimes humorous initial journey into R. The first edition was also designed for people who did know statistics and other packages, but wanted a quick ‘getting started’ guide, because, well, it is hard to get started with R in some ways. Overall, we aimed to make the somewhat steep learning curve more of a walk in the park.

Over the past five years much has changed. Most significantly, R has evolved as a platform for doing data analysis, for managing data, and for producing figures. Other things have not changed. People still seem to need and appreciate help in navigating the process of getting started working with R. Thus, this new version of the book does two things. It retains our focus on helping you get started using R. We love doing this and we’ve been teaching this for 15 years. Not surprisingly, many of you are also finding that this getting-started book is great for undergraduate and graduate teaching. We thank you all for your feedback!

Second, we have substantially revised how we use, and thus suggest you use, R. Our changes and suggestions take advantage of some new and very cool, efficient, and straightforward tools. We think these changes will help you focus even more on your data and questions. This is good.

If you compare this second edition with the first, you will find several differences. We no longer rely on base R tools and graphics for data manipulation and figure making, instead introducing dplyr and ggplot2. We’ve also expanded the set of basic statistics we introduce to you, including new examples of a simple regression and a one-way and a two-way ANOVA, in addition to the old ANCOVA example. Third, we provide an entire new chapter on the generalized linear model. Oh, yes, and we have added an author, Dylan.

WHAT’S SO DIFFERENT FROM THE FIRST EDITION?
We teach a particular workflow for quantitative problem solving: have a clear question, get the right data for that question, inspect and visualize the data, use the visualization to reveal the answer to the question, make a statistical model that reflects your question, check the assumptions of the model, interpret the model to confirm or refute your answer, and clearly and beautifully communicate your answer in a figure.

In R there are many different tools, and combinations of these tools, for accomplishing this workflow. In the first edition of this book we introduced a set of ‘classic’ R tools drawn from the base R installation. These classic tools worked and, importantly, continue to work very well. We taught them in our courses for years. We used them in our research for years. We still use them sometimes. And as you start to use R, and interact with people using R, and perhaps share code, you will find many people using these classic tools and methods.

But the tools and their syntax were designed a long time ago. Many employ a rather idiosyncratic set of symbols and syntax to accomplish tasks. For example, square brackets are used for selecting parts of datasets, and dollar signs for referring to particular variables. Sometimes different tools that perform similar tasks work in very different ways. This makes for rather idiosyncratic instructions that are not so easy for people to read or to remember how to write.

So after much deliberation, and some good experiences, we decided that in this second edition we would introduce a popular and new set of tools contributed by Sir[1] Hadley Wickham and many key collaborators (http://had.co.nz). These new tools introduce a set of quite standardized and coherent syntax and exist in a set of add-on packages; you will learn exactly what these are and how to use them later. And you will also learn some base R. In fact, you will learn a great deal of base R.

We decided to teach this new way of using R because:

• The tools use a more ‘natural language’ that is easier for humans to work with.
• The standardization and coherence among the tools make them easy to learn and use.
• The tools work very well for simple and small problems, but also scale very intuitively and naturally to quite complex and large problems.
• There are tools for every part of the workflow, from data management to statistical analysis and making beautiful graphs.
• Each of us independently migrated to this new set of tools, giving us greater confidence that it’s the way forward. (Well, Andrew was forced a bit.)

Though we are confident that teaching newcomers these new tools is the right thing to do, there are some risks and, in particular, people taught only these new tools may not be able to work easily with people or code using the classic way. Furthermore, some colleagues have questioned the wisdom of teaching this ‘modern’ approach to entry-level students (i.e. those with no or little previous experience with R), especially if taught in the absence of the classic approach (funnily enough, many of these ‘concerned’ colleagues don’t use R at all!). Certainly the risks mentioned above are real, and for that reason we provide a short appendix in Chapter 3 (the chapter on Data management) that links the classic and new methods. The classic way can still sometimes be the best way. And old dogs don’t often agree to learning new tricks.

Another concern voiced asks why we’re teaching ‘advanced R’ at entry level, with the idea that the use of new tools and add-on packages implies ‘advanced’. After all, why wouldn’t the ‘base’ R distribution contain everything an entry-level user needs? Well, it does, but we’ve found the standardization and syntax in the add-on packages to be valuable even for us as seasoned users. And one should not read ‘base’ R distribution as ‘basic’ R distribution, or ‘add-on’ package as ‘advanced’ package. The ‘base’ distribution contains many advanced tools, and many add-on packages contain very basic tools.

We hope you enjoy this new Getting Started with R.

[1] Unofficial knighthood for contributions to making our R-lives so much easier and beautiful.

What this book is about

We love R. We use statistics in our everyday life as researchers and teachers. Sometimes even more: Owen used it to explore the nursing behaviour of his firstborn.
We are first and foremost evolutionary and community ecologists, but over the past 15 years we have developed, first in parallel and then together, an affinity for R. We want to share our 40+ years of combined experience using R to show you how easy, important, and exciting it can be. This book is based on 3–5-day courses we give in various guises around the world. The courses are designed to give students and staff alike a boost up the steep initial learning curve associated with R.

We assume that course participants, and you as readers, already use some spreadsheet, statistical, and graphing programs (such as Excel, SPSS, Minitab, SAS, JMP, Statistica, and SigmaPlot). Most participants, and we hope you, have some grasp of common statistical methods, including the chi-squared test, the t-test, and ANOVA. In return for a few days of their lives, we give participants knowledge about how to easily use R, and R only, to manage data, make figures, and do statistics. R changed our research lives, and many participants agree that it has done the same for them.

The efforts we put into developing the course and this book are, however, minuscule compared with the efforts of the R Core Development Team. Please remember to acknowledge them and package contributors when you use R to analyse and publish your amazing findings.

WHAT YOU NEED TO KNOW TO MAKE THIS BOOK WORK FOR YOU

There are a few things that you need to know to make this book, and our ideas, work for you. Many of you already know how to do most of these things, having been in the Internet age for long enough now, but just to be sure:

1. You need to know how to download things from the Internet. If you use Windows, Macintosh, or Linux, the principles are the same, but the details are different. Know your operating system. Know your browser and know your mouse/trackpad.
2. You need to know how to make folders on your computer and save files to them. This is essential for being organized and efficient.
3. It is useful, though not essential, to understand what a ‘path’ is on your computer. This is the address of a folder or a file (i.e. the path to a file). On Windows, depending on the type you are using, this involves a drive name, a colon (:), and slashes (\ or /). On a Macintosh and Linux/Unix, this requires the names of your hard drive, the name of your home directory, a tilde (~), the names of folders, and slashes (/).
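
To make the idea of a path concrete, here is a minimal sketch in base R. It is not taken from the book: the folder name MyAnalysis and the file name data.csv are hypothetical, chosen only for illustration. It shows how a path is assembled from folder names and how the tilde (~) stands in for your home directory.

```r
# Build a path from its parts. file.path() joins them with forward
# slashes, which R accepts on Windows, Macintosh, and Linux alike.
# "MyAnalysis" and "data.csv" are hypothetical names for illustration.
data_path <- file.path("~", "MyAnalysis", "data.csv")
print(data_path)   # "~/MyAnalysis/data.csv"

# Replace the tilde with the full address of your home directory.
# The result depends on your operating system and user name.
full_path <- path.expand(data_path)
print(full_path)

# Show the folder R is currently working in (also a path).
print(getwd())
```

The file does not need to exist for these lines to run; they only construct and display the address.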
