Lecture Notes in Physics 941

Luca Lista

Statistical Methods for Data Analysis in Particle Physics

Second Edition

Lecture Notes in Physics
Volume 941

Founding Editors
W. Beiglböck
J. Ehlers
K. Hepp
H. Weidenmüller

Editorial Board
M. Bartelmann, Heidelberg, Germany
P. Hänggi, Augsburg, Germany
M. Hjorth-Jensen, Oslo, Norway
R. A. L. Jones, Sheffield, UK
M. Lewenstein, Barcelona, Spain
H. von Löhneysen, Karlsruhe, Germany
A. Rubio, Hamburg, Germany
M. Salmhofer, Heidelberg, Germany
W. Schleich, Ulm, Germany
S. Theisen, Potsdam, Germany
D. Vollhardt, Augsburg, Germany
J. D. Wells, Ann Arbor, USA
G. P. Zank, Huntsville, USA

The Lecture Notes in Physics

The series Lecture Notes in Physics (LNP), founded in 1969, reports new developments in physics research and teaching, quickly and informally, but with a high quality and the explicit aim to summarize and communicate current knowledge in an accessible way. Books published in this series are conceived as bridging material between advanced graduate textbooks and the forefront of research and to serve three purposes:

• to be a compact and modern up-to-date source of reference on a well-defined topic
• to serve as an accessible introduction to the field to postgraduate students and nonspecialist researchers from related areas
• to be a source of advanced teaching material for specialized seminars, courses and schools

Both monographs and multi-author volumes will be considered for publication. Edited volumes should, however, consist of a very limited number of contributions only. Proceedings will not be considered for LNP.

Volumes published in LNP are disseminated both in print and in electronic formats, the electronic archive being available at springerlink.com. The series content is indexed, abstracted and referenced by many abstracting and information services, bibliographic networks, subscription agencies, library networks, and consortia.
Proposals should be sent to a member of the Editorial Board, or directly to the managing editor at Springer:

Christian Caron
Springer Heidelberg
Physics Editorial Department I
Tiergartenstrasse 17
69121 Heidelberg/Germany
[email protected]

More information about this series at http://www.springer.com/series/5304

Luca Lista
INFN Sezione di Napoli
Napoli, Italy

ISSN 0075-8450
ISSN 1616-6361 (electronic)
Lecture Notes in Physics
ISBN 978-3-319-62839-4
ISBN 978-3-319-62840-0 (eBook)
DOI 10.1007/978-3-319-62840-0
Library of Congress Control Number: 2017948232

© Springer International Publishing AG 2016, 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This book started as a collection of material from a course of lectures on Statistical Methods for Data Analysis I gave to Ph.D. students in physics at the University of Naples Federico II from 2009 to 2017 and was subsequently enriched with material from other seminars and lectures I have been invited to give in recent years.

The aim of the book is to present and elaborate the main concepts and tools that physicists use to analyze experimental data.

An introduction to probability theory and basic statistics is provided mainly as refresher lectures to students who did not take a formal course on statistics before starting their Ph.D. This also gives the opportunity to introduce the Bayesian approach to probability, which is a new topic to many students.

More advanced topics follow, up to recent developments in statistical methods used for particle physics, in particular for data analyses at the Large Hadron Collider.

Many of the covered tools and methods have applications in high-energy physics, but their scope could well be extended to other fields.

A shorter version of the course was presented at CERN in November 2009 as lectures on Statistical Methods in LHC Data Analysis for the ATLAS and CMS experiments. The chapter that discusses discoveries and upper limits was improved after the lectures on the subject I gave in Autrans, France, at the IN2P3 School of Statistics in May 2012. I was also invited to conduct a seminar about Statistical Methods at Gent University, Belgium, in October 2014, which gave me the opportunity to review some of my material and add new examples.

Note to the Second Edition

The second edition of this book reflects the work I did in preparation of the lectures that I was invited to give during the CERN-JINR European School of High-Energy Physics (15–28 June 2016, Skeikampen, Norway). On that occasion, I reviewed, expanded, and reordered my material.
In addition, with respect to the first edition, I added a chapter about unfolding, an extended discussion about the best linear unbiased estimator, and an introduction to machine learning algorithms, in particular artificial neural networks, with hints about deep learning, and boosted decision trees.

Acknowledgments

I am grateful to Louis Lyons who carefully and patiently read the first edition of my book and provided useful comments and suggestions. I would like to thank Eliam Gross for providing useful examples and for reviewing the sections about the look elsewhere effect. I also received useful comments from Vitaliano Ciulli and from Luis Isaac Ramos Garcia.

I considered all feedback I received in the preparation of this second edition.

Napoli, Italy
Luca Lista

Contents

1 Probability Theory
  1.1 Why Probability Matters to a Physicist
  1.2 The Concept of Probability
  1.3 Repeatable and Non-Repeatable Cases
  1.4 Different Approaches to Probability
  1.5 Classical Probability
  1.6 Generalization to the Continuum
    1.6.1 The Bertrand’s Paradox
  1.7 Axiomatic Probability Definition
  1.8 Probability Distributions
  1.9 Conditional Probability
  1.10 Independent Events
  1.11 Law of Total Probability
  1.12 Average, Variance and Covariance
  1.13 Transformations of Variables
  1.14 The Bernoulli Process
  1.15 The Binomial Process
  1.16 Multinomial Distribution
  1.17 The Law of Large Numbers
  1.18 Frequentist Definition of Probability
  References

2 Probability Distribution Functions
  2.1 Introduction
  2.2 Definition of Probability Distribution Function
  2.3 Average and Variance in the Continuous Case
  2.4 Mode, Median, Quantiles
  2.5 Cumulative Distribution
  2.6 Continuous Transformations of Variables
  2.7 Uniform Distribution
  2.8 Gaussian Distribution
  2.9 χ² Distribution
  2.10 Log Normal Distribution
  2.11 Exponential Distribution
  2.12 Poisson Distribution
  2.13 Other Distributions Useful in Physics
    2.13.1 Breit–Wigner Distribution
    2.13.2 Relativistic Breit–Wigner Distribution
    2.13.3 Argus Function
    2.13.4 Crystal Ball Function
    2.13.5 Landau Distribution
  2.14 Central Limit Theorem
  2.15 Probability Distribution Functions in More than One Dimension
    2.15.1 Marginal Distributions
    2.15.2 Independent Variables
    2.15.3 Conditional Distributions
  2.16 Gaussian Distributions in Two or More Dimensions
  References

3 Bayesian Approach to Probability
  3.1 Introduction
  3.2 Bayes’ Theorem
  3.3 Bayesian Probability Definition
  3.4 Bayesian Probability and Likelihood Functions
    3.4.1 Repeated Use of Bayes’ Theorem and Learning Process
  3.5 Bayesian Inference
    3.5.1 Parameters of Interest and Nuisance Parameters
    3.5.2 Credible Intervals
  3.6 Bayes Factors
  3.7 Subjectiveness and Prior Choice
  3.8 Jeffreys’ Prior
  3.9 Reference Priors
  3.10 Improper Priors
  3.11 Transformations of Variables and Error Propagation
  References

4 Random Numbers and Monte Carlo Methods
  4.1 Pseudorandom Numbers
  4.2 Pseudorandom Generators Properties
  4.3 Uniform Random Number Generators
    4.3.1 Remapping Uniform Random Numbers
  4.4 Discrete Random Number Generators
  4.5 Nonuniform Random Number Generators
    4.5.1 Nonuniform Distribution from Inversion of the Cumulative Distribution
    4.5.2 Gaussian Generator Using the Central Limit Theorem
    4.5.3 Gaussian Generator with the Box–Muller Method
  4.6 Monte Carlo Sampling
    4.6.1 Hit-or-Miss Monte Carlo
    4.6.2 Importance Sampling
  4.7 Numerical Integration with Monte Carlo Methods
  4.8 Markov Chain Monte Carlo
  References

5 Parameter Estimate
  5.1 Introduction
  5.2 Inference
  5.3 Parameters of Interest
  5.4 Nuisance Parameters
  5.5 Measurements and Their Uncertainties
    5.5.1 Statistical and Systematic Uncertainties
  5.6 Frequentist vs Bayesian Inference
  5.7 Estimators
  5.8 Properties of Estimators
    5.8.1 Consistency
    5.8.2 Bias
    5.8.3 Minimum Variance Bound and Efficiency
    5.8.4 Robust Estimators
  5.9 Binomial Distribution for Efficiency Estimate
  5.10 Maximum Likelihood Method
    5.10.1 Likelihood Function
    5.10.2 Extended Likelihood Function
    5.10.3 Gaussian Likelihood Functions
  5.11 Errors with the Maximum Likelihood Method
    5.11.1 Second Derivatives Matrix
    5.11.2 Likelihood Scan
    5.11.3 Properties of Maximum Likelihood Estimators
  5.12 Minimum χ² and Least-Squares Methods
    5.12.1 Linear Regression
    5.12.2 Goodness of Fit and p-Value
  5.13 Binned Data Samples
    5.13.1 Minimum χ² Method for Binned Histograms
    5.13.2 Binned Poissonian Fits