Basic Statistics With R
Reaching Decisions With Data
Stephen C. Loftus
Division of Science, Technology, Engineering and Math
Sweet Briar College
Sweet Briar, VA, United States
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2022 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-820788-8
For information on all Academic Press publications
visit our website at https://www.elsevier.com/books-and-journals
Publisher: Katey Birtcher
Editorial Project Manager: Alice Grant
Production Project Manager: Beula Christopher
Designer: Patrick C. Ferguson
Typeset by VTeX
To Michelle, to whom I should have listened sooner.
Contents
Biography xv
Preface xvii
Acknowledgments xix
Part I
An introduction to statistics and R
1. What is statistics and why is it important?
1.1 Introduction 3
1.2 So what is statistics? 4
1.2.1 The process of statistics 4
1.2.2 Hypothesis/questions 4
1.2.3 Data collection 5
1.2.4 Data description 5
1.2.5 Statistical inference 5
1.2.6 Theories/decisions 6
1.3 Computation and statistics 6
2. An introduction to R
2.1 Installation 7
2.2 Classes of data 7
2.3 Mathematical operations in R 8
2.4 Variables 9
2.5 Vectors 11
2.6 Data frames 12
2.7 Practice problems 13
2.8 Conclusion 14
Part II
Collecting data and loading it into R
3. Data collection: methods and concerns
3.1 Introduction 17
3.2 Components of data collection 17
3.3 Observational studies 18
3.3.1 Biases in survey sampling 19
3.3.2 Practice problems 21
3.4 Designed experiments 21
3.4.1 Practice problems 23
3.5 Observational studies and experiments: which to use? 23
3.5.1 Practice problems 24
3.6 Conclusion 25
4. R tutorial: subsetting data, random numbers, and selecting a random sample
4.1 Introduction 27
4.2 Subsetting vectors 27
4.3 Subsetting data frames 29
4.4 Random numbers in R 31
4.5 Select a random sample 32
4.6 Getting help in R 33
4.7 Practice problems 33
4.8 Conclusion 35
5. R tutorial: libraries and loading data into R
5.1 Introduction 37
5.2 Libraries in R 37
5.3 Loading datasets stored in libraries 42
5.4 Loading csv files into R 42
5.5 Practice problems 43
5.6 Conclusion 43
Part III
Exploring and describing data
6. Exploratory data analyses: describing our data
6.1 Introduction 47
6.2 Parameters and statistics 47
6.3 Parameters, statistics, and EDA for categorical variables 48
6.3.1 Practice problems 50
6.4 Parameters, statistics, and EDA for a single quantitative variable 51
6.4.1 Statistics for the center of a variable 51
6.4.2 Practice problems 53
6.4.3 Statistics for the spread of a variable 54
6.4.4 Practice problems 56
6.5 Visual summaries for a single quantitative variable 57
6.6 Identifying outliers 59
6.6.1 Practice problems 61
6.7 Exploring relationships between variables 61
6.8 Exploring association between categorical predictor and quantitative response 62
6.8.1 Practice problems 65
6.9 Exploring association between two quantitative variables 65
6.9.1 Practice problems 71
6.10 Conclusion 72
7. R tutorial: EDA in R
7.1 Introduction 73
7.2 Frequency and contingency tables in R 73
7.3 Numerical exploratory analyses in R 74
7.3.1 Summaries for the center of a variable 74
7.3.2 Summaries for the spread of a variable 75
7.3.3 Summaries for the association between two quantitative variables 76
7.4 Missing data 77
7.5 Practice problems 78
7.6 Graphical exploratory analyses in R 78
7.6.1 Scatterplots 78
7.6.2 Histograms 80
7.7 Boxplots 82
7.8 Practice problems 84
7.9 Conclusion 85
Part IV
Mechanisms of inference
8. An incredibly brief introduction to probability
8.1 Introduction 89
8.2 Random phenomena, probability, and the Law of Large Numbers 90
8.3 What is the role of probability in inference? 91
8.4 Calculating probability and the axioms of probability 92
8.5 Random variables and probability distributions 94
8.6 The binomial distribution 95
8.7 The normal distribution 96
8.8 Practice problems 98
8.9 Conclusion 99
9. Sampling distributions, or why exploratory analyses are not enough
9.1 Introduction 101
9.2 Sampling distributions 101
9.3 Properties of sampling distributions and the central limit theorem 105
9.4 Practice problems 107
9.5 Conclusion 107
10. The idea behind testing hypotheses
10.1 Introduction 109
10.2 A lady tasting tea 109
10.3 Hypothesis testing 110
10.3.1 What are we testing? 110
10.3.2 How rare is our data? 112
10.3.3 What is our level of doubt? 113
10.4 Practice problems 115
10.5 Conclusion 115
11. Making hypothesis testing work with the central limit theorem
11.1 Introduction 117
11.2 Recap of the normal distribution 117
11.3 Getting probabilities from the normal distribution 118
11.3.1 Practice problems 119
11.4 Connecting data to p-values 119
11.4.1 Practice problems 124
11.5 Conclusion 125
12. The idea of interval estimates
12.1 Introduction 127
12.2 Point and interval estimates 127
12.3 When intervals are “right” 128
12.4 Confidence intervals 128
12.5 Creating confidence intervals 129
12.6 Interpreting confidence intervals 132
12.7 Practice problems 133
12.8 Conclusion 133
Part V
Statistical inference
13. Hypothesis tests for a single parameter
13.1 Introduction 137
13.2 One-sample test for proportions 138
13.2.1 State hypotheses 138
13.2.2 Set significance level 138
13.2.3 Collect and summarize data 139
13.2.4 Calculate test statistic 139
13.2.5 Calculate p-values 140
13.2.6 Conclude 141
13.2.7 Practice problems 142
13.3 One-sample t-test for means 143
13.3.1 State hypotheses 143
13.3.2 Set significance level 144
13.3.3 Collect and summarize data 144
13.3.4 Calculate test statistic 144
13.3.5 Calculate p-values 145
13.3.6 A brief interlude: the t distribution 146
13.3.7 Conclude 148
13.3.8 Practice problems 150
13.4 Conclusion 150
14. Confidence intervals for a single parameter
14.1 Introduction 151
14.2 Confidence interval for p 151
14.2.1 Practice problems 153
14.3 Confidence interval for μ 153
14.3.1 Practice problems 155
14.4 Other uses of confidence intervals 156
14.4.1 Confidence intervals for p and sample size calculations 156
14.4.2 Practice problems 159
14.4.3 Confidence intervals for μ and hypothesis testing 159
14.4.4 Practice problems 161
14.5 Conclusion 162
15. Hypothesis tests for two parameters
15.1 Introduction 163
15.2 Two-sample test for proportions 164
15.2.1 State hypotheses 164
15.2.2 Set significance level 165
15.2.3 Collect and summarize data 165
15.2.4 Calculate the test statistic 166
15.2.5 Calculate p-values 168
15.2.6 Conclude 169
15.2.7 Practice problems 170
15.3 Two-sample t-test for means 171
15.3.1 State hypotheses 171
15.3.2 Set significance level 172
15.3.3 Collect and summarize data 172
15.3.4 Calculate the test statistic 173
15.3.5 Calculate p-values 175