Using R for Data Analysis in Social Sciences
A Research Project-Oriented Approach

QUAN LI

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2018

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data
Names: Li, Quan, 1966– author.
Title: Using R for data analysis in social sciences: a research project-oriented approach / Quan Li.
Description: New York, NY: Oxford University Press, [2018]
Identifiers: LCCN 2017010031 | ISBN 9780190656225 (pbk.) | ISBN 9780190656218 (hardcover) | ISBN 9780190656232 (updf) | ISBN 9780190656249 (epub)
Subjects: LCSH: Social sciences–Research–Data processing. | Social sciences–Statistical methods. | R (Computer program language)
Classification: LCC H61.3 .L52 2018 | DDC 330.285/5133–dc23
LC record available at https://lccn.loc.gov/2017010031

Paperback printed by WebCom, Inc., Canada
Hardback printed by Bridgeport National Bindery, Inc., United States of America

CONTENTS

List of Figures  ix
List of Tables  xi
Acknowledgments  xiii
Introduction  xv

1. Learn about R and Write First Toy Programs  1
   WHEN TO USE R IN A RESEARCH PROJECT  2
   ESSENTIALS ABOUT R  3
   HOW TO START A PROJECT FOLDER AND WRITE OUR FIRST R PROGRAM  4
   CREATE, DESCRIBE, AND GRAPH A VECTOR: A SIMPLE TOY EXAMPLE  7
   SIMPLE REAL-WORLD EXAMPLE: DATA FROM IVERSEN AND SOSKICE (2006)  23
   CHAPTER 1: R PROGRAM CODE  28
   TROUBLESHOOT AND GET HELP  32
   IMPORTANT REFERENCE INFORMATION: SYMBOLS, OPERATORS, AND FUNCTIONS  34
   SUMMARY  35
   MISCELLANEOUS Q&AS FOR AMBITIOUS READERS  36
   EXERCISES  42

2. Get Data Ready: Import, Inspect, and Prepare Data  43
   PREPARATION  43
   IMPORT PENN WORLD TABLE 7.0 DATASET  45
   INSPECT IMPORTED DATA  49
   PREPARE DATA I: VARIABLE TYPES AND INDEXING  55
   PREPARE DATA II: MANAGE DATASETS  59
   PREPARE DATA III: MANAGE OBSERVATIONS  65
   PREPARE DATA IV: MANAGE VARIABLES  68
   CHAPTER 2 PROGRAM CODE  78
   SUMMARY  85
   MISCELLANEOUS Q&AS FOR AMBITIOUS READERS  86
   EXERCISES  93

3. One-Sample and Difference-of-Means Tests  94
   CONCEPTUAL PREPARATION  95
   DATA PREPARATION  101
   WHAT IS THE AVERAGE ECONOMIC GROWTH RATE IN THE WORLD ECONOMY?  104
   DID THE WORLD ECONOMY GROW MORE QUICKLY IN 1990 THAN IN 1960?  115
   CHAPTER 3 PROGRAM CODE  128
   SUMMARY  133
   MISCELLANEOUS Q&AS FOR AMBITIOUS READERS  133
   EXERCISES  142

4. Covariance and Correlation  143
   DATA AND SOFTWARE PREPARATIONS  143
   VISUALIZE THE RELATIONSHIP BETWEEN TRADE AND GROWTH USING SCATTER PLOT  146
   ARE TRADE OPENNESS AND ECONOMIC GROWTH CORRELATED?  149
   DOES THE CORRELATION BETWEEN TRADE AND GROWTH CHANGE OVER TIME?  154
   CHAPTER 4 PROGRAM CODE  160
   SUMMARY  163
   MISCELLANEOUS Q&AS FOR AMBITIOUS READERS  164
   EXERCISES  168

5. Regression Analysis  170
   CONCEPTUAL PREPARATION: HOW TO UNDERSTAND REGRESSION ANALYSIS  171
   DATA PREPARATION  175
   VISUALIZE AND INSPECT DATA  182
   HOW TO ESTIMATE AND INTERPRET OLS MODEL COEFFICIENTS  185
   HOW TO ESTIMATE STANDARD ERROR OF COEFFICIENT  187
   HOW TO MAKE AN INFERENCE ABOUT THE POPULATION PARAMETER OF INTEREST  188
   HOW TO INTERPRET OVERALL MODEL FIT  190
   HOW TO PRESENT STATISTICAL RESULTS  193
   CHAPTER 5 PROGRAM CODE  194
   SUMMARY  198
   MISCELLANEOUS Q&AS FOR AMBITIOUS READERS  199
   EXERCISES  204

6. Regression Diagnostics and Sensitivity Analysis  206
   WHY ARE OLS ASSUMPTIONS AND DIAGNOSTICS IMPORTANT?  206
   DATA PREPARATION  211
   LINEARITY AND MODEL SPECIFICATION  215
   PERFECT AND HIGH MULTICOLLINEARITY  221
   CONSTANT ERROR VARIANCE  223
   INDEPENDENCE OF ERROR TERM OBSERVATIONS  227
   INFLUENTIAL OBSERVATIONS  240
   NORMALITY TEST  245
   REPORT FINDINGS  247
   CHAPTER 6 PROGRAM CODE  251
   SUMMARY  259
   MISCELLANEOUS Q&AS FOR AMBITIOUS READERS  259
   EXERCISES  262

7. Replication of Findings in Published Analyses  263
   WHAT EXPLAINS THE GEOGRAPHIC SPREAD OF MILITARIZED INTERSTATE DISPUTES? REPLICATION AND DIAGNOSTICS OF BRAITHWAITE (2006)  264
   DOES RELIGIOSITY INFLUENCE INDIVIDUAL ATTITUDES TOWARD INNOVATION? REPLICATION OF BÉNABOU ET AL. (2015)  284
   CHAPTER 7 PROGRAM CODE  295
   SUMMARY  301

8. Appendix: A Brief Introduction to Analyzing Categorical Data and Finding More Data  302
   OBJECTIVE  302
   GETTING DATA READY  303
   DO MEN AND WOMEN DIFFER IN SELF-REPORTED HAPPINESS?  304
   DO BELIEVERS IN GOD AND NON-BELIEVERS DIFFER IN SELF-REPORTED HAPPINESS?  310
   SOURCES OF SELF-REPORTED HAPPINESS: LOGISTIC REGRESSION  313
   WHERE TO FIND MORE DATA  323

References and Readings  327
Index  331

LIST OF FIGURES

1.1 How to Write First Toy Program in R  8
1.2 How to Install Add-on Package  18
1.3 Distribution of Discrete Variable vd$v1: Bar Chart  21
1.4 Distribution of Continuous Variable vd$v1: Boxplot and Histogram  23
1.5 Distribution of Wage Inequality from Iversen and Soskice (2006)  27
1.6 Distribution of PR and Majoritarian Systems from Iversen and Soskice (2006)  27
1.7 RStudio Screenshot  38
2.1 Using View() Function to View Raw Data  50
2.2 Distribution of Variable rgdpl  55
3.1 Types of Errors and Alternative Sampling Distributions  100
3.2 Histogram for Growth  113
3.3 Mean and 95% Confidence Interval for Growth  114
3.4 Mean and 95% Confidence Interval for Growth: 1960 and 1990  127
4.1 Simulated Positive Correlations of Two Random Variables  147
4.2 Scatter Plot of Trade Openness and Economic Growth  148
4.3 Correlation between Trade and Growth over Time  157
4.4 P Value of Correlation between Trade and Growth over Time  159
4.5 Anscombe Quartet Scatter Plot  166
5.1 Original Statistical Results from Frankel and Romer (1999)  174
5.2 Comparing Unlogged and Logged Income per Person  184
5.3 Trade Openness and Log of Income per Person  184
5.4 Coefficients Plot for Model 1  194
5.5 Partial Regression Plot  203
5.6 Explore Pairwise Relationships among Variables  204
6.1 Anscombe Quartet Regressions  210
6.2 Anscombe Quartet Residuals versus Fitted Values Plots  211