
Statistics and Analysis of Scientific Data PDF

318 Pages · 2017 · 4.418 MB · English

Preview Statistics and Analysis of Scientific Data

Massimiliano Bonamente
Statistics and Analysis of Scientific Data
Second Edition

Massimiliano Bonamente
University of Alabama
Huntsville, Alabama, USA

ISSN 1868-4513, ISSN 1868-4521 (electronic)
Graduate Texts in Physics
ISBN 978-1-4939-6570-0, ISBN 978-1-4939-6572-4 (eBook)
DOI 10.1007/978-1-4939-6572-4
Library of Congress Control Number: 2016957885

1st edition: © Springer Science+Business Media New York 2013
2nd edition: © Springer Science+Business Media LLC 2017
© Springer Science+Business Media New York 2017
This Springer imprint is published by Springer Nature
The registered company is Springer Science+Business Media LLC
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.

Preface to the First Edition

Across all sciences, a quantitative analysis of data is necessary to assess the significance of experiments, observations, and calculations. This book was written over a period of 10 years, as I developed an introductory graduate course on statistics and data analysis at the University of Alabama in Huntsville. My goal was to put together the material that a student needs for the analysis and statistical interpretation of data, including an extensive set of applications and problems that illustrate the practice of statistical data analysis.

The literature offers a variety of books on statistical methods and probability theory. Some are primarily on the mathematical foundations of statistics, some are purely on the theory of probability, and others focus on advanced statistical methods for specific sciences. This textbook contains the foundations of probability, statistics, and data analysis methods that are applicable to a variety of fields, from astronomy to biology, business sciences, chemistry, engineering, physics, and more, with equal emphasis on mathematics and applications. The book is therefore not specific to a given discipline, nor does it attempt to describe every possible statistical method. Instead, it focuses on the fundamental methods that are used across the sciences and that are at the basis of more specific techniques that can be found in more specialized textbooks or research articles.

This textbook covers probability theory and random variables, maximum-likelihood methods for single variables and two-variable datasets, and more complex topics of data fitting, estimation of parameters, and confidence intervals. Among the topics that have recently become mainstream, Monte Carlo Markov chains occupy a special role. The last chapter of the book provides a comprehensive overview of Markov chains and Monte Carlo Markov chains, from theory to implementation.

I believe that a description of the mathematical properties of statistical tests is necessary to understand their applicability. This book therefore contains mathematical derivations that I considered particularly useful for a thorough understanding of the subject; the book refers the reader to other sources in case of mathematics that goes beyond that of basic calculus. The reader who is not familiar with calculus may skip those derivations and continue with the applications.

Nonetheless, statistics is necessarily slanted toward applications. To highlight the relevance of the statistical methods described, I have reported original data from four fundamental scientific experiments from the past two centuries: J.J. Thomson’s experiment that led to the discovery of the electron, G. Mendel’s data on plant characteristics that led to the law of independent assortment of species, E. Hubble’s observation of nebulae that uncovered the expansion of the universe, and K. Pearson’s collection of biometric characteristics in the UK in the early twentieth century. These experiments are used throughout the book to illustrate how statistical methods are applied to actual data and are used in several end-of-chapter problems. The reader will therefore have an opportunity to see statistics in action on these classic experiments and several additional examples.

The material presented in this book is aimed at upper-level undergraduate students or beginning graduate students. The reader is expected to be familiar with basic calculus, and no prior knowledge of statistics or probability is assumed. Professional scientists and researchers will find it a useful reference for fundamental methods such as maximum-likelihood fit, error propagation formulas, goodness of fit and model comparison, Monte Carlo methods such as the jackknife and bootstrap, Monte Carlo Markov chains, Kolmogorov-Smirnov tests, and more. All subjects are complemented by an extensive set of numerical tables that make the book completely self-contained.

The material presented in this book can be comfortably covered in a one-semester course and has several problems at the end of each chapter that are suitable as homework assignments or exam questions. Problems are both of theoretical and numerical nature, so that emphasis is equally placed on conceptual and practical understanding of the subject. Several datasets, including those in the four “classic experiments,” are used across several chapters, and the students can therefore use them in applications of increasing difficulty.

Huntsville, AL, USA
Massimiliano Bonamente

Preface to the Second Edition

The second edition of Statistics and Analysis of Scientific Data was motivated by the overall goal to provide a textbook that is mathematically rigorous and easy to read and use as a reference at the same time. Basically, it is a book for both the student who wants to learn in detail the mathematical underpinnings of statistics and the reader who wants to just find the practical description on how to apply a given statistical method or use the book as a reference.
To this end, first I decided that a clearer demarcation between theoretical and practical topics would improve the readability of the book. As a result, several pages (i.e., mathematical derivations) are now clearly marked throughout the book with a vertical line, to indicate material that is primarily aimed at those readers who seek a more thorough mathematical understanding. Those parts are not required to learn how to apply the statistical methods presented in the book. For the reader who uses this book as a reference, this makes it easy to skip such sections and go directly to the main results. At the end of each chapter, I also provide a summary of key concepts, intended for a quick look-up of the results of each chapter.

Secondly, certain existing material needed substantial re-organization and expansion. The second edition is now comprised of 16 chapters, versus ten of the first edition. A few chapters (Chap. 6 on mean, median, and averages, Chap. 9 on multi-variable regression, and Chap. 11 on systematic errors and intrinsic scatter) contain material that is substantially new. In particular, the topic of multi-variable regression was introduced because of its use in many fields such as business and economics, where it is common to apply the regression method to many independent variables. Other chapters originate from re-arranging existing material more effectively. Some of the numerical tables in both the main body and the appendix have been expanded and re-arranged, so that the reader will find it even easier to use them for a variety of applications and as a reference.

The second edition also contains a new classic experiment, that of the measurement of iris characteristics by R.A. Fisher and E. Anderson. These new data are used to illustrate primarily the method of regression with many independent variables. The textbook now features a total of five classic experiments (including G. Mendel’s data on the independent assortment of species, J.J. Thomson’s data on the discovery of the electron, K. Pearson’s collection of data of biometric characteristics, and E. Hubble’s measurements of the expansion of the universe). These data and their analysis provide a unique way to learn the statistical methods presented in the book and a resource for the student and the teacher alike. Many of the end-of-chapter problems are based on these experimental data.

Finally, the new edition contains corrections to a number of typos that had inadvertently entered the manuscript. I am very much indebted to many of my students at the University of Alabama in Huntsville for pointing out these typos to me over the past few years, in particular, to Zachary Robinson, who has patiently gone through much of the text to find typographical errors.

Huntsville, AL, USA
Massimiliano Bonamente

Contents

1 Theory of Probability
1.1 Experiments, Events, and the Sample Space
1.2 Probability of Events
1.2.1 The Kolmogorov Axioms
1.2.2 Frequentist or Classical Method
1.2.3 Bayesian or Empirical Method
1.3 Fundamental Properties of Probability
1.4 Statistical Independence
1.5 Conditional Probability
1.6 A Classic Experiment: Mendel’s Law of Heredity and the Independent Assortment of Species
1.7 The Total Probability Theorem and Bayes’ Theorem

2 Random Variables and Their Distributions
2.1 Random Variables
2.2 Probability Distribution Functions
2.3 Moments of a Distribution Function
2.3.1 The Mean and the Sample Mean
2.3.2 The Variance and the Sample Variance
2.4 A Classic Experiment: J.J. Thomson’s Discovery of the Electron
2.5 Covariance and Correlation Between Random Variables
2.5.1 Joint Distribution and Moments of Two Random Variables
2.5.2 Statistical Independence of Random Variables
2.6 A Classic Experiment: Pearson’s Collection of Data on Biometric Characteristics

3 Three Fundamental Distributions: Binomial, Gaussian, and Poisson
3.1 The Binomial Distribution
3.1.1 Derivation of the Binomial Distribution
3.1.2 Moments of the Binomial Distribution
3.2 The Gaussian Distribution
3.2.1 Derivation of the Gaussian Distribution from the Binomial Distribution
3.2.2 Moments and Properties of the Gaussian Distribution
3.2.3 How to Generate a Gaussian Distribution from a Standard Normal
3.3 The Poisson Distribution
3.3.1 Derivation of the Poisson Distribution
3.3.2 Properties and Interpretation of the Poisson Distribution
3.3.3 The Poisson Distribution and the Poisson Process
3.3.4 An Example on Likelihood and Posterior Probability of a Poisson Variable
3.4 Comparison of Binomial, Gaussian, and Poisson Distributions

4 Functions of Random Variables and Error Propagation
4.1 Linear Combination of Random Variables
4.1.1 General Mean and Variance Formulas
4.1.2 Uncorrelated Variables and the 1/√N Factor
4.2 The Moment Generating Function
4.2.1 Properties of the Moment Generating Function
4.2.2 The Moment Generating Function of the Gaussian and Poisson Distribution
4.3 The Central Limit Theorem
4.4 The Distribution of Functions of Random Variables
4.4.1 The Method of Change of Variables
4.4.2 A Method for Multi-dimensional Functions
4.5 The Law of Large Numbers
4.6 The Mean of Functions of Random Variables
4.7 The Variance of Functions of Random Variables and Error Propagation Formulas
4.7.1 Sum of a Constant
4.7.2 Weighted Sum of Two Variables
4.7.3 Product and Division of Two Random Variables
4.7.4 Power of a Random Variable
4.7.5 Exponential of a Random Variable
4.7.6 Logarithm of a Random Variable
4.8 The Quantile Function and Simulation of Random Variables
4.8.1 General Method to Simulate a Variable
4.8.2 Simulation of a Gaussian Variable

5 Maximum Likelihood and Other Methods to Estimate Variables
5.1 The Maximum Likelihood Method for Gaussian Variables
5.1.1 Estimate of the Mean
5.1.2 Estimate of the Variance
5.1.3 Estimate of Mean for Non-uniform Uncertainties
5.2 The Maximum Likelihood Method for Other Distributions
5.3 Method of Moments
5.4 Quantiles and Confidence Intervals
5.4.1 Confidence Intervals for a Gaussian Variable
5.4.2 Confidence Intervals for the Mean of a Poisson Variable
5.5 Bayesian Methods for the Poisson Mean
5.5.1 Bayesian Expectation of the Poisson Mean
5.5.2 Bayesian Upper and Lower Limits for a Poisson Variable

6 Mean, Median, and Average Values of Variables
6.1 Linear and Weighted Average
6.2 The Median
6.3 The Logarithmic Average and Fractional or Multiplicative Errors
6.3.1 The Weighted Logarithmic Average
6.3.2 The Relative-Error Weighted Average

7 Hypothesis Testing and Statistics
7.1 Statistics and Hypothesis Testing
7.2 The χ² Distribution
7.2.1 The Probability Distribution Function
7.2.2 Moments and Other Properties
7.2.3 Hypothesis Testing
7.3 The Sampling Distribution of the Variance
7.4 The F Statistic
7.4.1 The Probability Distribution Function
7.4.2 Moments and Other Properties
7.4.3 Hypothesis Testing
7.5 The Sampling Distribution of the Mean and the Student’s t Distribution
7.5.1 Comparison of Sample Mean with Parent Mean
7.5.2 Comparison of Two Sample Means and Hypothesis Testing

8 Maximum Likelihood Methods for Two-Variable Datasets
8.1 Measurement of Pairs of Variables
8.2 Maximum Likelihood Method for Gaussian Data
8.3 Least-Squares Fit to a Straight Line, or Linear Regression
8.4 Multiple Linear Regression
8.4.1 Best-Fit Parameters for Multiple Regression
8.4.2 Parameter Errors and Covariances for Multiple Regression
8.4.3 Errors and Covariance for Linear Regression
8.5 Special Cases: Identical Errors or No Errors Available
8.6 A Classic Experiment: Edwin Hubble’s Discovery of the Expansion of the Universe
8.7 Maximum Likelihood Method for Non-linear Functions
8.8 Linear Regression with Poisson Data

9 Multi-Variable Regression
9.1 Multi-Variable Datasets
9.2 A Classic Experiment: The R.A. Fisher and E. Anderson Measurements of Iris Characteristics
9.3 The Multi-Variable Linear Regression
9.4 Tests for Significance of the Multiple Regression Coefficients
9.4.1 T-Test for the Significance of Model Components
9.4.2 F-Test for Goodness of Fit
9.4.3 The Coefficient of Determination

10 Goodness of Fit and Parameter Uncertainty
10.1 Goodness of Fit for the χ²_min Fit Statistic
10.2 Goodness of Fit for the Cash C Statistic
10.3 Confidence Intervals of Parameters for Gaussian Data
10.3.1 Confidence Interval on All Parameters
10.3.2 Confidence Intervals on Reduced Number of Parameters
10.4 Confidence Intervals of Parameters for Poisson Data
10.5 The Linear Correlation Coefficient
10.5.1 The Probability Distribution Function
10.5.2 Hypothesis Testing

11 Systematic Errors and Intrinsic Scatter
11.1 What to Do When the Goodness-of-Fit Test Fails
11.2 Intrinsic Scatter and Debiased Variance
11.2.1 Direct Calculation of the Intrinsic Scatter
11.2.2 Alternative Method to Estimate the Intrinsic Scatter
11.3 Systematic Errors
11.4 Estimate of Model Parameters with Systematic Errors or Intrinsic Scatter

12 Fitting Two-Variable Datasets with Bivariate Errors
12.1 Two-Variable Datasets with Bivariate Errors
12.2 Generalized Least-Squares Linear Fit to Bivariate Data
12.3 Linear Fit Using Bivariate Errors in the χ² Statistic

13 Model Comparison
13.1 The F Test
13.1.1 F-Test for Two Independent χ² Measurements
13.1.2 F-Test for an Additional Model Component
13.2 Kolmogorov–Smirnov Tests
13.2.1 Comparison of Data to a Model
13.2.2 Two-Sample Kolmogorov–Smirnov Test
