STATISTICS FOR PHYSICAL SCIENCES
AN INTRODUCTION

B. R. MARTIN
Department of Physics and Astronomy
University College London

Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo

Academic Press is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands

First edition 2012

Copyright © 2012 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher. Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. You can submit your request online by visiting the Elsevier website at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

Library of Congress Cataloging-in-Publication Data
Application submitted

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

For information on all Academic Press publications visit our website at elsevierdirect.com

Printed and bound in USA
ISBN: 978-0-12-387760-4

Contents

Preface

1. Statistics, Experiments, and Data
   1.1. Experiments and Observations
   1.2. Displaying Data
   1.3. Summarizing Data Numerically
        1.3.1. Measures of Location
        1.3.2. Measures of Spread
        1.3.3. More than One Variable
   1.4. Large Samples
   1.5. Experimental Errors
   Problems 1

2. Probability
   2.1. Axioms of Probability
   2.2. Calculus of Probabilities
   2.3. The Meaning of Probability
        2.3.1. Frequency Interpretation
        2.3.2. Subjective Interpretation
   Problems 2

3. Probability Distributions I: Basic Concepts
   3.1. Random Variables
   3.2. Single Variates
        3.2.1. Probability Distributions
        3.2.2. Expectation Values
        3.2.3. Moment Generating and Characteristic Functions
   3.3. Several Variates
        3.3.1. Joint Probability Distributions
        3.3.2. Marginal and Conditional Distributions
        3.3.3. Moments and Expectation Values
   3.4. Functions of a Random Variable
   Problems 3

4. Probability Distributions II: Examples
   4.1. Uniform
   4.2. Univariate Normal (Gaussian)
   4.3. Multivariate Normal
        4.3.1. Bivariate Normal
   4.4. Exponential
   4.5. Cauchy
   4.6. Binomial
   4.7. Multinomial
   4.8. Poisson
   Problems 4

5. Sampling and Estimation
   5.1. Random Samples and Estimators
        5.1.1. Sampling Distributions
        5.1.2. Properties of Point Estimators
   5.2. Estimators for the Mean, Variance, and Covariance
   5.3. Laws of Large Numbers and the Central Limit Theorem
   5.4. Experimental Errors
        5.4.1. Propagation of Errors
   Problems 5

6. Sampling Distributions Associated with the Normal Distribution
   6.1. Chi-Squared Distribution
   6.2. Student's t Distribution
   6.3. F Distribution
   6.4. Relations Between χ², t, and F Distributions
   Problems 6

7. Parameter Estimation I: Maximum Likelihood and Minimum Variance
   7.1. Estimation of a Single Parameter
   7.2. Variance of an Estimator
        7.2.1. Approximate Methods
   7.3. Simultaneous Estimation of Several Parameters
   7.4. Minimum Variance
        7.4.1. Parameter Estimation
        7.4.2. Minimum Variance Bound
   Problems 7

8. Parameter Estimation II: Least-Squares and Other Methods
   8.1. Unconstrained Linear Least Squares
        8.1.1. General Solution for the Parameters
        8.1.2. Errors on the Parameter Estimates
        8.1.3. Quality of the Fit
        8.1.4. Orthogonal Polynomials
        8.1.5. Fitting a Straight Line
        8.1.6. Combining Experiments
   8.2. Linear Least Squares with Constraints
   8.3. Nonlinear Least Squares
   8.4. Other Methods
        8.4.1. Minimum Chi-Square
        8.4.2. Method of Moments
        8.4.3. Bayes' Estimators
   Problems 8

9. Interval Estimation
   9.1. Confidence Intervals: Basic Ideas
   9.2. Confidence Intervals: General Method
   9.3. Normal Distribution
        9.3.1. Confidence Intervals for the Mean
        9.3.2. Confidence Intervals for the Variance
        9.3.3. Confidence Regions for the Mean and Variance
   9.4. Poisson Distribution
   9.5. Large Samples
   9.6. Confidence Intervals Near Boundaries
   9.7. Bayesian Confidence Intervals
   Problems 9

10. Hypothesis Testing I: Parameters
    10.1. Statistical Hypotheses
    10.2. General Hypotheses: Likelihood Ratios
          10.2.1. Simple Hypothesis: One Simple Alternative
          10.2.2. Composite Hypotheses
    10.3. Normal Distribution
          10.3.1. Basic Ideas
          10.3.2. Specific Tests
    10.4. Other Distributions
    10.5. Analysis of Variance
    Problems 10

11. Hypothesis Testing II: Other Tests
    11.1. Goodness-of-Fit Tests
          11.1.1. Discrete Distributions
          11.1.2. Continuous Distributions
          11.1.3. Linear Hypotheses
    11.2. Tests for Independence
    11.3. Nonparametric Tests
          11.3.1. Sign Test
          11.3.2. Signed-Rank Test
          11.3.3. Rank-Sum Test
          11.3.4. Runs Test
          11.3.5. Rank Correlation Coefficient
    Problems 11

Appendix A. Miscellaneous Mathematics
    A.1. Matrix Algebra
    A.2. Classical Theory of Minima

Appendix B. Optimization of Nonlinear Functions
    B.1. General Principles
    B.2. Unconstrained Minimization of Functions of One Variable
    B.3. Unconstrained Minimization of Multivariable Functions
         B.3.1. Direct Search Methods
         B.3.2. Gradient Methods
    B.4. Constrained Optimization

Appendix C. Statistical Tables
    C.1. Normal Distribution
    C.2. Binomial Distribution
    C.3. Poisson Distribution
    C.4. Chi-squared Distribution
    C.5. Student's t Distribution
    C.6. F Distribution
    C.7. Signed-Rank Test
    C.8. Rank-Sum Test
    C.9. Runs Test
    C.10. Rank Correlation Coefficient

Appendix D. Answers to Odd-Numbered Problems

Bibliography

Index

Preface

Almost all physical scientists (physicists, astronomers, chemists, earth scientists, and others) at some time come into contact with statistics. This is often initially during their undergraduate studies, but rarely is it via a full lecture course. Usually, some statistics lectures are given as part of a general mathematical methods course, or as part of a laboratory course; neither route is entirely satisfactory. The student learns a few techniques, typically unconstrained linear least-squares fitting and analysis of errors, but without necessarily the theoretical background that justifies the methods and allows one to appreciate their limitations. On the other hand, physical scientists, particularly undergraduates, rarely have the time, and possibly the inclination, to study mathematical statistics in detail. What I have tried to do in this book is therefore to steer a path between the extremes of a recipe of methods with a collection of useful formulas, and a detailed account of mathematical statistics, while at the same time developing the subject in a reasonably logical way. I have included proofs of some of the more important results stated in those cases where they are fairly short, but this book is written by a physicist for other physical scientists and there is no pretense to mathematical rigor. The proofs are useful for showing how the definitions of certain statistical quantities and their properties may be used. Nevertheless, a reader uninterested in the proofs can easily skip over these, hopefully to come back to them later. Above all, I have contained the size of the book so that it can be read in its entirety by anyone with a basic exposure to mathematics, principally calculus and matrices, at the level of a first-year undergraduate student of physical science.

Statistics in physical science is principally concerned with the analysis of numerical data, so in Chapter 1 there is a review of what is meant by an experiment, and how the data that it produces are displayed and characterized by a few simple numbers. This leads naturally to a discussion in Chapter 2 of the vexed question of probability: what do we mean by this term and how is it calculated. There then follow two chapters on probability distributions: the first reviews some basic concepts and in the second there is a discussion of the properties of a number of specific theoretical distributions commonly met in the physical sciences. In practice, scientists rarely have access to the whole population of events, but instead have to rely on a sample from which to draw inferences about the population; so in Chapter 5 the basic ideas involved in sampling are discussed. This is followed in Chapter 6 by a review of some sampling distributions associated with the important and ubiquitous normal distribution, the latter more familiar to physical scientists as the Gaussian function. The next two chapters explain how estimates are inferred for individual parameters of a population from sample statistics, using several practical techniques. This is called point estimation. It is generalized in Chapter 9 by considering how to obtain estimates for the interval