
Mathematical Statistics with Resampling and R PDF

541 Pages·2018·11.905 MB·English

Preview Mathematical Statistics with Resampling and R

Mathematical Statistics with Resampling and R
Second Edition
Laura M. Chihara, Carleton College
Tim C. Hesterberg, Google

This second edition first published 2019
© 2019 by John Wiley & Sons, Inc.

Edition History
Mathematical Statistics with Resampling and R, Wiley, 2011

Library of Congress Cataloging-in-Publication Data:
Names: Chihara, Laura, 1957– author. | Hesterberg, Tim, 1959– author.
Title: Mathematical statistics with resampling and R / Laura M. Chihara (Carleton College), Tim C. Hesterberg (Google).
Description: Second edition. | Hoboken, NJ: Wiley, 2019. | Includes bibliographical references and index. |
Identifiers: LCCN 2018008774 (print) | LCCN 2018013587 (ebook) | ISBN 9781119416524 (pdf) | ISBN 9781119416531 (epub) | ISBN 9781119416548 (cloth)
Subjects: LCSH: Resampling (Statistics) | Statistics. | Statistics–Data processing. | Mathematical statistics–Data processing. | R (Computer program language)
Classification: LCC QA278.8 (ebook) | LCC QA278.8 .C45 2018 (print) | DDC 519.5/4–dc23
LC record available at https://lccn.loc.gov/2018008774

Cover design by Wiley
Cover images: Courtesy of Carleton College
Set in 10/12pt Warnock by SPi Global, Pondicherry, India
Printed in the United States of America

Contents

Preface

1 Data and Case Studies
1.1 Case Study: Flight Delays
1.2 Case Study: Birth Weights of Babies
1.3 Case Study: Verizon Repair Times
1.4 Case Study: Iowa Recidivism
1.5 Sampling
1.6 Parameters and Statistics
1.7 Case Study: General Social Survey
1.8 Sample Surveys
1.9 Case Study: Beer and Hot Wings
1.10 Case Study: Black Spruce Seedlings
1.11 Studies
1.12 Google Interview Question: Mobile Ads Optimization
Exercises

2 Exploratory Data Analysis
2.1 Basic Plots
2.2 Numeric Summaries
2.2.1 Center
2.2.2 Spread
2.2.3 Shape
2.3 Boxplots
2.4 Quantiles and Normal Quantile Plots
2.5 Empirical Cumulative Distribution Functions
2.6 Scatter Plots
2.7 Skewness and Kurtosis
Exercises

3 Introduction to Hypothesis Testing: Permutation Tests
3.1 Introduction to Hypothesis Testing
3.2 Hypotheses
3.3 Permutation Tests
3.3.1 Implementation Issues
3.3.2 One-sided and Two-sided Tests
3.3.3 Other Statistics
3.3.4 Assumptions
3.3.5 Remark on Terminology
3.4 Matched Pairs
Exercises

4 Sampling Distributions
4.1 Sampling Distributions
4.2 Calculating Sampling Distributions
4.3 The Central Limit Theorem
4.3.1 CLT for Binomial Data
4.3.2 Continuity Correction for Discrete Random Variables
4.3.3 Accuracy of the Central Limit Theorem∗
4.3.4 CLT for Sampling Without Replacement
Exercises

5 Introduction to Confidence Intervals: The Bootstrap
5.1 Introduction to the Bootstrap
5.2 The Plug-in Principle
5.2.1 Estimating the Population Distribution
5.2.2 How Useful Is the Bootstrap Distribution?
5.3 Bootstrap Percentile Intervals
5.4 Two-Sample Bootstrap
5.4.1 Matched Pairs
5.5 Other Statistics
5.6 Bias
5.7 Monte Carlo Sampling: The “Second Bootstrap Principle”
5.8 Accuracy of Bootstrap Distributions
5.8.1 Sample Mean: Large Sample Size
5.8.2 Sample Mean: Small Sample Size
5.8.3 Sample Median
5.8.4 Mean–Variance Relationship
5.9 How Many Bootstrap Samples Are Needed?
Exercises

6 Estimation
6.1 Maximum Likelihood Estimation
6.1.1 Maximum Likelihood for Discrete Distributions
6.1.2 Maximum Likelihood for Continuous Distributions
6.1.3 Maximum Likelihood for Multiple Parameters
6.2 Method of Moments
6.3 Properties of Estimators
6.3.1 Unbiasedness
6.3.2 Efficiency
6.3.3 Mean Square Error
6.3.4 Consistency
6.3.5 Transformation Invariance∗
6.3.6 Asymptotic Normality of MLE∗
6.4 Statistical Practice
6.4.1 Are You Asking the Right Question?
6.4.2 Weights
Exercises

7 More Confidence Intervals
7.1 Confidence Intervals for Means
7.1.1 Confidence Intervals for a Mean, Variance Known
7.1.2 Confidence Intervals for a Mean, Variance Unknown
7.1.3 Confidence Intervals for a Difference in Means
7.1.4 Matched Pairs, Revisited
7.2 Confidence Intervals in General
7.2.1 Location and Scale Parameters∗
7.3 One-sided Confidence Intervals
7.4 Confidence Intervals for Proportions
7.4.1 Agresti–Coull Intervals for a Proportion
7.4.2 Confidence Intervals for a Difference of Proportions
7.5 Bootstrap Confidence Intervals
7.5.1 t Confidence Intervals Using Bootstrap Standard Errors
7.5.2 Bootstrap t Confidence Intervals
7.5.3 Comparing Bootstrap t and Formula t Confidence Intervals
7.6 Confidence Interval Properties
7.6.1 Confidence Interval Accuracy
7.6.2 Confidence Interval Length
7.6.3 Transformation Invariance
7.6.4 Ease of Use and Interpretation
7.6.5 Research Needed
Exercises

8 More Hypothesis Testing
8.1 Hypothesis Tests for Means and Proportions: One Population
8.1.1 A Single Mean
8.1.2 One Proportion
8.2 Bootstrap t-Tests
8.3 Hypothesis Tests for Means and Proportions: Two Populations
8.3.1 Comparing Two Means
8.3.2 Comparing Two Proportions
8.3.3 Matched Pairs for Proportions
8.4 Type I and Type II Errors
8.4.1 Type I Errors
8.4.2 Type II Errors and Power
8.4.3 P-Values Versus Critical Regions
8.5 Interpreting Test Results
8.5.1 P-Values
8.5.2 On Significance
8.5.3 Adjustments for Multiple Testing
8.6 Likelihood Ratio Tests
8.6.1 Simple Hypotheses and the Neyman–Pearson Lemma
8.6.2 Likelihood Ratio Tests for Composite Hypotheses
8.7 Statistical Practice
8.7.1 More Campaigns with No Clicks and No Conversions
Exercises

9 Regression
9.1 Covariance
9.2 Correlation
9.3 Least-Squares Regression
9.3.1 Regression Toward the Mean
9.3.2 Variation
9.3.3 Diagnostics
9.3.4 Multiple Regression
9.4 The Simple Linear Model
9.4.1 Inference for 𝛼 and 𝛽
9.4.2 Inference for the Response
9.4.3 Comments About Assumptions for the Linear Model
9.5 Resampling Correlation and Regression
9.5.1 Permutation Tests
9.5.2 Bootstrap Case Study: Bushmeat
9.6 Logistic Regression
9.6.1 Inference for Logistic Regression
Exercises

10 Categorical Data
10.1 Independence in Contingency Tables
10.2 Permutation Test of Independence
10.3 Chi-square Test of Independence
10.3.1 Model for Chi-square Test of Independence
10.3.2 2×2 Tables
10.3.3 Fisher’s Exact Test
10.3.4 Conditioning
10.4 Chi-square Test of Homogeneity
10.5 Goodness-of-fit Tests
10.5.1 All Parameters Known
10.5.2 Some Parameters Estimated
10.6 Chi-square and the Likelihood Ratio∗
Exercises

11 Bayesian Methods
11.1 Bayes Theorem
11.2 Binomial Data: Discrete Prior Distributions
11.3 Binomial Data: Continuous Prior Distributions
11.4 Continuous Data
11.5 Sequential Data
Exercises

12 One-way ANOVA
12.1 Comparing Three or More Populations
12.1.1 The ANOVA F-test
12.1.2 A Permutation Test Approach
Exercises

13 Additional Topics
13.1 Smoothed Bootstrap
13.1.1 Kernel Density Estimate
13.2 Parametric Bootstrap
13.3 The Delta Method
13.4 Stratified Sampling
13.5 Computational Issues in Bayesian Analysis
13.6 Monte Carlo Integration
13.7 Importance Sampling
13.7.1 Ratio Estimate for Importance Sampling
13.7.2 Importance Sampling in Bayesian Applications
13.8 The EM Algorithm
13.8.1 General Background
Exercises

Appendix A Review of Probability
A.1 Basic Probability
A.2 Mean and Variance
A.3 The Normal Distribution
A.4 The Mean of a Sample of Random Variables
A.5 Sums of Normal Random Variables
A.6 The Law of Averages
A.7 Higher Moments and the Moment-generating Function

Appendix B Probability Distributions
B.1 The Bernoulli and Binomial Distributions
B.2 The Multinomial Distribution
B.3 The Geometric Distribution
B.4 The Negative Binomial Distribution
B.5 The Hypergeometric Distribution
B.6 The Poisson Distribution
B.7 The Uniform Distribution
B.8 The Exponential Distribution
B.9 The Gamma Distribution
B.10 The Chi-square Distribution
B.11 The Student’s t Distribution
B.12 The Beta Distribution
B.13 The F Distribution
Exercises

Appendix C Distributions Quick Reference
Solutions to Selected Exercises
References
Index

Preface

Mathematical Statistics with Resampling and R is a one-term undergraduate statistics textbook aimed at sophomores or juniors who have taken a course in probability (at the level of, for instance, Ross (2009), Ghahramani (2004), or Scheaffer and Young (2010)) but may not have had any previous exposure to statistics.

What sets this book apart from other mathematical statistics texts is the use of modern resampling techniques – permutation tests and bootstrapping. We begin with permutation tests and bootstrap methods before introducing classical inference methods. Resampling helps students understand the meaning of sampling distributions, sampling variability, P-values, hypothesis tests, and confidence intervals. We are inspired by the textbooks of Wardrop (1995) and Chance and Rossman (2005), two innovative introductory statistics books that also take a nontraditional approach in the sequencing of topics.

We believe the time is ripe for this book. Many faculty have learned resampling and simulation-based methods in graduate school and/or use them in their own work and are eager to incorporate these ideas into a mathematical statistics course. Students and faculty today have access to computers that are powerful enough to perform resampling quickly.

A major topic of debate about the mathematical statistics course is how much theory to introduce. We want mathematically talented students to get excited about statistics, so we try to strike a balance between theory, computing, and applications. We feel that it is important to demonstrate some rigor in developing some of the statistical ideas presented here, but that mathematical theory should not dominate the text. To keep the size of the text reasonable, we omit some topics such as sufficiency and Fisher information (though we plan to make some omitted topics available as supplements on the text web page https://sites.google.com/site/ChiharaHesterberg).

We have compiled the definitions and theorems of the important probability distributions into an appendix (see Appendix B). Instructors who want to prove results on distributional theory can refer to that chapter. Instructors who wish to skip the theory can continue without interrupting the flow of the statistical discussion.

Incorporating resampling and bootstrapping methods requires that students use statistical software. We use R or RStudio because they are freely available (www.r-project.org or rstudio.com), powerful, flexible, and a valuable tool in future careers. One of us works at Google where there is an explosion in the use of R, with more and more nonstatisticians learning R (the statisticians already know it). We realize that the learning curve for R is high, but believe that the time invested in mastering R is worth the effort. We have written some basic materials on R that are available on the website for this text. We recommend that instructors work through the introductory worksheet with the students on the first or second day of the term in a computer lab if possible.

We had some discussion about whether to include examples of R code using various packages (including ggplot2), but we received feedback from colleagues who felt that students should learn to do basic exploratory data analysis and quick diagnostics using base R.
And though some R packages exist that implement some of the bootstrap and permutation algorithms that we teach, we felt that students understand and internalize the concepts better if they are required to write the code themselves. We do provide R scripts or R Markdown files with code on our website, and we may include alternate coding using some of the many R packages available.

Statistical computing is necessary in statistical practice and for people working with data in a wide variety of fields. There is an explosion of data – more and more data – and new computational methods are continuously being developed to handle this explosion. Statistics is an exciting field; dare we even say sexy?¹

Second Edition: Major changes from the first edition include splitting Chapter 3, the introduction to hypothesis testing, in two. The second half which dealt with categorical data and contingency tables is now its own chapter and moved to later in the textbook. We decided to move mention of the 𝛼 = 0.05 significance level from Chapter 2 to Chapter 8 in an attempt to discourage students’ reliance on this threshold. We have also included a new case study using data from Google, plus some discussion on statistical practice. We moved the short chapter on one-way ANOVA available previously on our website into the book; other additions include sections on the bootstrap t test and an introduction to the EM algorithm. We have also added more exercises

¹ Try googling “statistics sexy profession.”
