Statistical Analysis of Empirical Data: Methods for Applied Sciences

278 Pages·2020·6.642 MB·English

Preview Statistical Analysis Of Empirical Data: Methods For Applied Sciences

Scott Pardo

Statistical Analysis of Empirical Data
Methods for Applied Sciences

Global Medical & Clinical Affairs
Ascensia Diabetes Care
Valhalla, NY, USA

ISBN 978-3-030-43327-7
ISBN 978-3-030-43328-4 (eBook)
https://doi.org/10.1007/978-3-030-43328-4

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Researchers and students who rely on empirical investigation must choose statistical methods for analyses, and they are often challenged to justify their choices of statistical methods and analyses.
Furthermore, these researchers or students may have had a single course in statistical methods upon which they rely for helping them make methodological choices. The researcher/student probably has familiarity with some “classical” statistical methods and tools, such as t-tests, ANOVA, and multiple regression. If they are not well-versed in statistical theory, they may have difficulties in making those choices about statistical analyses and can be thrown off-balance by some questions and challenges. Often the challenges come from individuals who may themselves not have a fuller understanding. Thus, questions and challenges are sometimes misplaced, and the researcher may not be well equipped to respond. The challenges may be about sample size (not an uncommon question), where the challenge concerns the representativeness of the sample. However, the challenger may recommend a sample size formula that assumes a random sample from a homogeneous population, as opposed to representing strata or clusters in the population. The researcher may be misled to believe that the sample is inadequate, when in fact it was quite sufficient. Similarly, a predictive model may have a number of predictors, all of which are meaningful, and the challenger claims the model is overparameterized because the Akaike information criterion (AIC) statistic is greater than 2. Why 2? Mallows' Cp, a statistic related to AIC, would ideally be approximately 2 for a simple linear regression. This has no bearing on an “optimal” AIC value for a multiple regression model. The unsuspecting researcher may begin excluding important predictor variables simply to achieve a lower AIC, without considering the degree of prediction error or some means of assessing the quality of predictions using a “test” data set.
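
The point about the "greater than 2" rule can be illustrated with a small sketch. It is not taken from the book: the simulated data, the seed, and the helper names fit_simple_ols, gaussian_aic, and mallows_cp are assumptions made only for this example. It uses the usual textbook definitions, Cp = SSE_p / s^2 - n + 2p and AIC = 2k - 2 log L for a Gaussian model, to show that Cp equals p = 2 for a simple linear regression judged against itself, while the AIC of the same fit shifts by n ln(100) when the response is merely expressed in units ten times smaller.

```python
import math
import random


def fit_simple_ols(x, y):
    """Closed-form least squares for y = b0 + b1*x; returns (b0, b1, sse)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse


def gaussian_aic(sse, n, k):
    """AIC = 2k - 2*logLik for a Gaussian model with ML variance sse/n.

    k counts every estimated parameter, including the error variance.
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi * sse / n) + 1)
    return 2 * k - 2 * log_lik


def mallows_cp(sse_p, p, n, sigma2_full):
    """Mallows' Cp = SSE_p / s^2 - n + 2p, with s^2 taken from the full model."""
    return sse_p / sigma2_full - n + 2 * p


# Simulated data (an assumption for illustration): y = 1 + 2x + noise.
rng = random.Random(0)
n = 50
x = [i / 10 for i in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

_, _, sse = fit_simple_ols(x, y)
p = 2                      # intercept and slope
s2 = sse / (n - p)         # unbiased error variance of the (full) model
cp = mallows_cp(sse, p, n, s2)

aic = gaussian_aic(sse, n, k=p + 1)

# Same data, response recorded in units ten times smaller (values times 10).
y10 = [10 * yi for yi in y]
_, _, sse10 = fit_simple_ols(x, y10)
aic10 = gaussian_aic(sse10, n, k=p + 1)
aic_shift = aic10 - aic    # depends only on n and the unit change

print(f"Cp = {cp:.6f}")
print(f"AIC shift from rescaling y = {aic_shift:.4f}")
print(f"n * ln(100)              = {n * math.log(100):.4f}")
```

Because the absolute value of AIC moves with the sample size and the units of the response, only differences in AIC between candidate models fitted to the same response are interpretable; a fixed cutoff such as 2 is not.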
Sometimes challenges are non-sequitorial, such as “How much confidence do you have in this p-value?” Perhaps more generally, the notion of “confidence” can be misunderstood, and the researcher may find himself or herself confused by statements such as “A larger sample size will give you a higher confidence level.” Other types of misleading challenges can twist the mind of the researcher, such as recommending to double the sample size of the “treatment” group compared to that of the “control” group, or criticizing an ANOVA with categorical factors because the “R² is not high enough.” The researcher may not have a readily accessible answer to these questions or criticisms.

It is possible to find justifications for choices of models and methods, and responses to various questions, in multiple texts, some of which are more mathematical than others. This text, however, is devoted to providing responses, without requiring the researcher to invest considerable time and effort in searching. It is written at a level that anyone with a modicum of mathematical training (say, elementary calculus) should be able to comprehend. A brief review of fundamental concepts of probability and statistics, together with a primer on some concepts in elementary calculus and matrix algebra, is included. Finally, always remember this dictum:

Never underestimate the ability of the ignorant to confuse the knowledgeable.

This work is devoted to helping the reader find the knowledge he or she needs to choose data analytic methods, and refute, correct, and respond to questions posed by those who are more ignorant. Each chapter begins with a conundrum, or a puzzling question relating to a problem in choosing some analysis, or explaining the nature of the analysis. Then the analyses are described, along with some help in justifying the choice of that particular method.

As a text, this work could be used as the backbone in a second course on statistics for researchers or students in biological, medical, social, or physical sciences. Like texts for first courses in statistics, the topics in each chapter could form an entire text.
This book is a survey in the sense that topics are not expounded upon in full detail. However, unlike first-course texts, the chapters either cover issues not usually described in a first-course textbook (e.g., how to interpret the outcome of an acceptance sampling plan) or they introduce a topic that would not normally be covered in a first course (e.g., Models, Models, Everywhere... Model Selection).

The intent is to make this useful for a broad audience, so that examples generally have no particular application associated with them. However, at the beginning of most chapters, there are several questions that might be asked by researchers in various disciplines. Hopefully, these questions will help motivate the reader to learn more.

Those with greater background in mathematics may not need the Appendix material on some “elementary” concepts, and those with some training in probability and mathematical statistics may not need the first and/or second chapter. It is the author’s hope that the material will assist the researcher in choosing statistical methods and models and in justifying those choices.

Valhalla, NY, USA
Scott Pardo

Acknowledgements

The author owes a tremendous debt to his family; his sons, Michael A. Pardo, Ph.D., Yehudah A. Pardo, Ph.D., and Jeremy D. Pardo (Ph.D. candidate), and his wife. The three of you guys gave me over the years so many questions about experimental design and data analyses, and I learned so much from trying to answer them. To my wife, I can never describe the debt I feel, or the thanks I want to give you for all your inspiration, questions, and ideas that you gave me for writing this and all the other works. G-d only knows what I would be without you.

Contents

1 Fundamentals
    General
    Some Probability Concepts
    Some Statistical Concepts
2 Sample Statistics Are NOT Parameters
    Inference, Again
3 Confidence
    What Confidence Does NOT Mean
    Some Example Confidence Intervals
    Summary
4 Multiplicity and Multiple Comparisons
5 Power and the Myth of Sample Size Determination
    Power for Two-Sample T-Test
    Power for F-Tests
    A Final Word
6 Regression and Model Fitting with Collinearity
    Solving Linear Equations
    Ordinary Least Squares (OLS)
    Partial Least Squares
    Ridge Regression
    Least Absolute Shrinkage and Selection Operator (LASSO)
7 Over-Parameterization
    The Basics of the Analysis of Variance (ANOVA)
    Detecting Over-Parameterization
    Some More on Degrees of Freedom
8 Ignoring Error Control Factors and Experimental Design
    Error Control
    Balance and Orthogonality
9 Generalized Linear Models
    To Generalize or Not to Generalize
    Odds and Odds Ratio
    The Logit Transformation
    The Maximum Likelihood Approach to Estimation
    Logistic Regression as Predictive Model
    Poisson Regression
    Overdispersion
    Zero-Inflated Data and Poisson Regression
10 Mixed Models and Variance Components
    Fixed and Random Effects in One Model
11 Models, Models Everywhere... Model Selection
    Stepwise Regression
    Bayesian Model Averaging
    GLMULTI: An Automated Model Selection Procedure
    Neural Networks
    Classification and Regression Trees (CART)
    Random Forests
    Logistic Regression and Model Selection
    In Summary
12 Bayesian Analyses
13 The Acceptance Sampling Game
    Attribute Sampling
    Variables Sampling: Cpk
    The Catch: Why Acceptance Sampling is a Game
    A Subtle Trap: Sample-Based Critical Values
14 Nonparametric Statistics: A Strange Name
    The Rank Transformation
    Permutation Tests
15 Autocorrelated Data and Dynamic Systems
    Autocorrelation
    Time Series: Autoregressive Processes
    Time Series: Moving Average Processes
    A Simple Example
    Non-Stationarity
    A Brief Summary
