
Primer For Data Analytics And Graduate Study In Statistics PDF

236 Pages·2020·3.523 MB·English

Preview Primer For Data Analytics And Graduate Study In Statistics

Douglas Wolfe, Grant Schneider
Primer for Data Analytics and Graduate Study in Statistics

Douglas Wolfe, Department of Statistics, Ohio State University, Columbus, OH, USA
Grant Schneider, Upstart Network, Inc., Columbus, OH, USA

ISBN 978-3-030-47478-2
ISBN 978-3-030-47479-9 (eBook)
https://doi.org/10.1007/978-3-030-47479-9

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Douglas Wolfe: To Robert V. Hogg and Allen T. Craig, who fostered in me a passion for mathematical statistics during my graduate studies at the University of Iowa, and to D. Ransom Whitney, who nurtured my statistical career through my early years in academia at The Ohio State University.

Grant Schneider: To my parents, Bill and Judy Schneider, who nurtured my curiosity about the world by answering my countless questions as a kid, even when they didn’t know the answers, and to Aunt Marce, who gave the best, most clear-eyed advice a young man could receive, dispensed over warm Natty Lights.

Preface

For many years, we taught a summer term course for the newly entering students in our graduate programs in Statistics at Ohio State University. It was designed to refresh and/or elevate the level of understanding for the basic background in probability and distributional theory required to be successful in our Master of Applied Statistics, M.S. in Statistics, and Ph.D. programs in Statistics and Biostatistics. Over the years, this proved to be an effective way for undergraduate students from a variety of quantitative backgrounds (particularly domestic students from smaller liberal arts programs) to bridge the transition from general undergraduate studies in a mathematically oriented field to the more career-oriented graduate studies in Statistics.

Another factor that makes this text extremely relevant today is the recent increased interest in the field of data analytics (to be read “statistics” with a small s) as an undergraduate major. We recently started such a program here at Ohio State and within a few years it has attracted more than 100 top students. The job market is extremely strong for graduates with a data analytics undergraduate degree and it also provides an excellent background for those students who wish to continue their studies in a graduate program in statistics or biostatistics (where the job market is also outstanding).

We believe that this text provides the necessary framework for an undergraduate course at a smaller liberal arts college for anyone who is interested in either exploring job opportunities in the data analytics field itself or in attending a graduate program in statistics or biostatistics.
It could also, of course, be used for a bridge course similar to the one we taught in the summer term to our incoming graduate students, or even as a good refresher text for a student who wishes to refresh/better prepare themselves for graduate work in statistics or biostatistics.

Columbus, OH, USA    Douglas Wolfe
Columbus, OH, USA    Grant Schneider

Contents

1 Introduction
2 Basic Probability
    2.1 Random Events and Probability Set Functions
    2.2 Properties of Probability Functions
    2.3 Conditional Probability
    2.4 Exercises
3 Random Variables and Probability Distributions
    3.1 Discrete Random Variables
    3.2 Discrete Random Variables
    3.3 Continuous Random Variables
    3.4 Exercises
4 General Properties of Random Variables
    4.1 Cumulative Distribution Function
        4.1.1 Relationship Between c.d.f. and p.d.f.
        4.1.2 General Properties of a c.d.f. F_X(x)
    4.2 Median of a Probability Distribution
    4.3 Symmetric Probability Distribution
    4.4 Mathematical Expectations
    4.5 Chebyshev’s Inequality
    4.6 Exercises
5 Joint Probability Distributions for Two Random Variables
    5.1 Joint Probability Distributions of Two Variables
        5.1.1 Discrete Variables
        5.1.2 Continuous Variables
    5.2 Marginal Probability Distributions
    5.3 Covariance and Correlation
    5.4 Conditional Probability Distributions
    5.5 Exercises
6 Probability Distribution of a Function of a Single Random Variable
    6.1 Change of Variable Technique
    6.2 Moment Generating Function Technique
    6.3 Distribution Function Technique
    6.4 Exercises
7 Sampling Distributions
    7.1 Simple Random Samples
    7.2 Sampling Distributions
    7.3 General Approaches for Obtaining Sampling Distributions
        7.3.1 Moment Generating Function Technique
        7.3.2 Distribution Function Technique
        7.3.3 Change of Variable Technique
    7.4 Equal in Distribution Approach to Obtaining Properties of Sampling Distributions
    7.5 Exercises
8 Asymptotic (Large-Sample) Properties of Statistics
    8.1 Convergence in Probability
    8.2 Convergence in Distribution
        8.2.1 Convergence of Moment Generating Functions
        8.2.2 Central Limit Theorem (CLT)
        8.2.3 Slutsky’s Theorem
        8.2.4 Delta Method
    8.3 Exercises
Bibliography

Chapter 1
Introduction

Have you ever played the game “Twenty Questions”? Typically, the first questions are used to determine whether the item of interest is a person, place, or thing. Everything in our world (or cosmos, for that matter) is one of these. Once the category is established, the questioning usually proceeds along the line of ascertaining physical or personal properties of the item, such as how big it is, whether it is alive, or whether it is famous, etc. All of these follow-up questions are designed to help the player understand more about the item of interest and, eventually, help the player correctly identify it.

The simple game of “Twenty Questions” is an example of what we all face in our everyday lives. We are constantly trying to learn more about and understand “items” that we encounter in our lives. “How long was that home run?”, “How tall are you or how much do you weigh at a fixed point in time?”, “How many single records did Elvis Presley sell?”, “What was the average daily price of the Apple common stock over the past twelve months?”, “How much did the light from a distant star bend when it passed a large exoplanet on its way to earth?”, “How many calories are in a chocolate milkshake?”, “How intensive was that recent earthquake or volcanic eruption?”, “How much is the carbon dioxide from burning fossil fuels affecting the warming of our atmosphere and our oceans?”, etc. Fortunately, mathematics provides the mechanism for addressing every one of these questions, not only in terms of the actual physical measurements required but also in terms of understanding the scientific concepts and structure that are critical for interpreting these measurements. We use it all the time to routinely measure lengths, areas, speeds, and distances. However, it also plays a major role in more complicated settings, such as determining the necessary rocket speed to put a satellite into earth orbit or to send a probe on an interstellar mission; understanding the bond structure of complicated molecules to facilitate chemical processes for synthesizing new compounds/materials for tackling human diseases; the development of new materials for constructing heat-resistant shields to enable safe space travel; and helping to understand the nature of “dark matter” and “dark energy” and their role in an ever-expanding universe.

© Springer Nature Switzerland AG 2020
D. Wolfe, G. Schneider, Primer for Data Analytics and Graduate Study in Statistics, https://doi.org/10.1007/978-3-030-47479-9_1

Everything discussed in the previous paragraph relates to our attempts to measure (or interpret such measurements) deterministic physical events at a fixed point in time, and there are mathematical approaches to addressing such phenomena. However, many interesting phenomena do not fit into such a deterministic framework. For example, will a coin flip result in heads or tails (or even leaning up against the wall!)? Will a given treatment for lung cancer be effective at treating my lung cancer? Will the price of Apple common stock rise in tomorrow’s market trading? How many miles per gallon will I get with my new car? What effect will a change of diet have on my high blood pressure? How does the amount of sleep I get affect my physical well-being? These questions all relate to random (non-deterministic) events that require a different analytical approach. However, once again mathematics comes to our rescue, as it also provides the necessary structure to facilitate the understanding of such random (i.e., uncertain) events. That area of inquiry is governed by the rules of probability, and we can use these rules to provide the framework in which to study random events.

The first part of this text is devoted to the development of these basic probability rules and discussion of how they can be used to better understand random events, including what to expect from repetitions of a random event and how to interpret the observed outcome of such repetitions. The second portion of the text incorporates these basic probability properties for random events into a more formal expanded structure, called probability distribution theory, that can be used to provide models for analyzing the outcomes from repetitions of random events. These models play a key role in interpreting data obtained from experimental repetitions through the use of statistical inference techniques such as point estimation, confidence intervals, and hypothesis tests, the discussion of which is left as the next intriguing subject for your further exploration!
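
As a small illustration of what "repetitions of a random event" can look like in practice, the following Python sketch simulates repeated flips of a coin and reports the fraction that land heads. It is not taken from the book: the function name `relative_frequency_of_heads`, the fair-coin probability of 0.5, and the seed values are all assumptions made for this example, and the wall-leaning outcome mentioned above is ignored.

```python
import random


def relative_frequency_of_heads(num_flips, seed=0):
    """Simulate num_flips flips of a fair coin; return the fraction of heads.

    Assumption for illustration: each flip is heads with probability 0.5,
    independently of all other flips.
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


if __name__ == "__main__":
    # A single flip is unpredictable, but the fraction of heads tends to
    # settle near 0.5 as the number of repetitions grows.
    for n in (10, 100, 10_000):
        print(n, relative_frequency_of_heads(n))
```

Running the script prints the observed fraction of heads for 10, 100, and 10,000 flips; with few flips the fraction can be far from 0.5, while with many flips it is typically close to it, which is the kind of behavior the probability rules are meant to describe.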
