AMS / MAA TEXTBOOKS VOL 34 Teaching Statistics Using Baseball Second Edition Jim Albert Teaching Statistics Using Baseball Second Edition Originallypublishedby TheMathematicalAssociationofAmerica,2017. SoftcoverISBN:978-1-4704-6938-2 LCCN:2017931120 Copyright©2017,heldbytheAmericanMathematicalSociety PrintedintheUnitedStatesofAmerica. ReprintedbytheAmericanMathematicalSociety,2022 TheAmericanMathematicalSocietyretainsallrights exceptthosegrantedtotheUnitedStatesGovernment. ⃝1Thepaperusedinthisbookisacid-freeandfallswithintheguidelines establishedtoensurepermanenceanddurability. VisittheAMShomepageathttps://www.ams.org/ 1098765432 272625242322 AMS/MAA TEXTBOOKS VOL 34 Teaching Statistics Using Baseball Second Edition Jim Albert CouncilonPublicationsandCommunications JenniferJ.Quinn,Chair CommitteeonBooks JenniferJ.Quinn,Chair MAATextbooksEditorialBoard StanleyE.Seltzer,Editor BelaBajnok MatthiasBeck RichardE.Bedient OttoBretscher HeatherAnnDye WilliamGreen CharlesR.Hampton JacquelineA.Jensen-Vallin SuzanneLynneLarson JohnLorch VirginiaA.Noonburg SusanF.Pustejovsky JeffreyL.Stuart MAATEXTBOOKS BridgetoAbstractMathematics,RalphW.Oberste-Vorth,AristidesMouzakitis,andBonitaA. Lawrence CalculusDeconstructed:ASecondCourseinFirst-YearCalculus,ZbigniewH.Nitecki CalculusfortheLifeSciences:AModelingApproach,JamesL.CornetteandRalphA.Ackerman Combinatorics:AGuidedTour,DavidR.Mazur Combinatorics:AProblemOrientedApproach,DanielA.Marcus CommonSenseMathematics,EthanD.BolkerandMauraB.Mast ComplexNumbersandGeometry,Liang-shinHahn ACourseinMathematicalModeling,DouglasMooneyandRandallSwift CryptologicalMathematics,RobertEdwardLewand DifferentialGeometryanditsApplications,JohnOprea DistillingIdeas:AnIntroductiontoMathematicalThinking,BrianP.KatzandMichaelStarbird ElementaryCryptanalysis,AbrahamSinkov ElementaryMathematicalModels,DanKalman AnEpisodicHistoryofMathematics:MathematicalCultureThroughProblemSolving,Steven G.Krantz EssentialsofMathematics,MargieHale FieldTheoryanditsClassicalProblems,CharlesHadlock FourierSeries,RajendraBhatia GameTheoryandStrategy,PhilipD.Straffin GeometryIlluminated:AnIllustratedIntroductiontoEuclideanandHyperbolicPlaneGeome- try,MatthewHarvey GeometryRevisited,H.S.M.CoxeterandS.L.Greitzer GraphTheory:AProblemOrientedApproach,DanielMarcus AnInvitationtoRealAnalysis,LuisF.Moreno KnotTheory,CharlesLivingston Learning Modern Algebra: From Early Attempts to Prove Fermat’s Last Theorem, Al Cuoco andJosephJ.Rotman TheLebesgueIntegralforUndergraduates,WilliamJohnston LieGroups:AProblem-OrientedIntroductionviaMatrixGroups,HarrietPollatsek MathematicalConnections:ACompanionforTeachersandOthers,AlCuoco MathematicalInterestTheory,SecondEdition,LeslieJaneFedererVaalerandJamesW.Daniel MathematicalModelingintheEnvironment,CharlesHadlock MathematicsforBusinessDecisionsPart1:ProbabilityandSimulation(electronictextbook), RichardB.ThompsonandChristopherG.Lamoureux MathematicsforBusinessDecisionsPart2:CalculusandOptimization(electronictextbook), RichardB.ThompsonandChristopherG.Lamoureux MathematicsforSecondarySchoolTeachers,ElizabethG.Bremigan,RalphJ.Bremigan,and JohnD.Lorch TheMathematicsofChoice,IvanNiven TheMathematicsofGamesandGambling,EdwardPackel MathThroughtheAges,WilliamBerlinghoffandFernandoGouvea NoncommutativeRings,I.N.Herstein Non-EuclideanGeometry,H.S.M.Coxeter NumberTheoryThroughInquiry,DavidC.Marshall,EdwardOdell,andMichaelStarbird OrdinaryDifferentialEquations:fromCalculustoDynamicalSystems,V.W.Noonburg APrimerofRealFunctions,RalphP.Boas ARadicalApproachtoLebesgue’sTheoryofIntegration,DavidM.Bressoud ARadicalApproachtoRealAnalysis,2ndedition,DavidM.Bressoud RealInfiniteSeries,DanielD.BonarandMichaelKhoury,Jr. TeachingStatisticsUsingBaseball,2ndedition,JimAlbert ThinkingGeometrically:ASurveyofGeometries,ThomasQ.Sibley TopologyNow!,RobertMesserandPhilipStraffin UnderstandingourQuantitativeWorld,JanetAndersenandToddSwanson Contents PrefacetotheFirstEdition ix PrefacetotheSecondEdition xi 1 AnIntroductiontoBaseballStatistics 1 2 ExploringaSingleBatchofBaseballData 13 2.1 LookingatTeams’OffensiveStatistics..................................................... 13 2.2 ATributetoDerekJeter........................................................................ 17 2.3 ATributetoRandyJohnson................................................................... 20 2.4 AnalyzingBaseballAttendance............................................................... 23 2.5 ManagerStatistics:theUseofSacrificeBunts............................................ 26 2.6 Exercises........................................................................................... 28 3 ComparingBatchesandStandardization 45 3.1 AlbertPujolsandMannyRamirez........................................................... 45 3.2 RobinRobertsandWhiteyFord.............................................................. 50 3.3 HomeRuns:AComparisonofFourSeasons.............................................. 54 3.4 SluggingPercentagesareNormal............................................................ 57 3.5 GreatBattingAverages......................................................................... 59 3.6 Exercises........................................................................................... 61 4 RelationshipsBetweenMeasurementVariables 73 4.1 RelationshipsinTeamOffensiveStatistics................................................. 73 4.2 RunsandOffensiveStatistics.................................................................. 78 4.3 MostValuableHittingStatistics.............................................................. 80 4.4 ANewMeasureofOffensivePerformance................................................ 86 4.5 HowImportantisaRun?....................................................................... 88 4.6 BaseballPlayersRegresstotheMean....................................................... 91 4.7 Exercises........................................................................................... 94 vii viii Contents 5 IntroductiontoProbabilityUsingTabletopGames 111 5.1 WhatisChrisDavis’HomeRunProbability?............................................. 111 5.2 BigLeagueBaseball............................................................................. 114 5.3 All-StarBaseball................................................................................. 116 5.4 Strat-O-MaticBaseball......................................................................... 119 5.5 Exercises........................................................................................... 124 6 ProbabilityDistributionsandBaseball 139 6.1 TheBinomialDistributionandHitsperGame............................................ 139 6.2 ModelingRunsScored:GettingonBase................................................... 142 6.3 ModelingRunsScored:AdvancingtheRunnerstoHome.............................. 144 6.4 Exercises........................................................................................... 148 7 IntroductiontoStatisticalInference 155 7.1 AbilityandPerformance........................................................................ 155 7.2 SimulatingaBatter’sPerformanceifHisAbilityisKnown............................ 157 7.3 LearningAboutaBatter’sAbility............................................................ 159 7.4 IntervalEstimatesforAbility.................................................................. 161 7.5 ComparingWadeBoggsandTonyGwynn................................................. 165 7.6 Exercises........................................................................................... 168 8 TopicsinStatisticalInference 175 8.1 SituationalHittingStatisticsforMikeTrout............................................... 176 8.2 ObservedSituationalEffectsforManyPlayers........................................... 178 8.3 ModelingOn-BasePercentagesforManyPlayers........................................ 181 8.4 ModelsforSituationalEffects................................................................. 185 8.5 IsMichaelBrantleyStreaky?.................................................................. 188 8.6 AStreakyDie..................................................................................... 191 8.7 Exercises........................................................................................... 193 9 ModelingBaseballUsingaMarkovChain 211 9.1 IntroductiontoaMarkovChain............................................................... 211 9.2 AHalf-inningofBaseballasaMarkovChain............................................. 215 9.3 UsefulMarkovChainCalculations........................................................... 217 9.4 TheValueofDifferentOn-baseEvents..................................................... 222 9.5 AnsweringQuestionsAboutBaseballStrategy........................................... 224 9.6 Exercises........................................................................................... 225 A AnIntroductiontoBaseball 233 A.1 TheGameofBaseball........................................................................... 233 A.2 OneHalf-InningofBaseball................................................................... 234 A.3 TheBoxscore:AStatisticalRecordofaBaseballGame................................ 235 Bibliography 239 Index 241 Preface to the First Edition Overtwentyyears,thisauthorhasbeenintheenterpriseofteachingintroductorystatisticstoan audiencethatistakingtheclasstosatisfytheirmathematicsrequirement.Thisisachallenging endeavor because the students have little prior knowledge about the discipline of statistics and many of them are anxious about mathematics and computation. Statistical concepts and examplesareusuallypresentedinaparticularcontext.However,oneobstacleinteachingthis introductoryclassisthatweoftendescribethestatisticalconceptsinacontext,suchasmedicine, law, or agriculture that is completely foreign to the undergraduate student. The student has a muchbetterchanceofunderstandingconceptsinprobabilityandstatisticsiftheyaredescribed inafamiliarcontext. Manystudentsarefamiliarwithsportseitherasaparticipantoraspectator.Theyknowofthe popularathletes,suchasTigerWoodsandBarryBonds,andtheyaregenerallyknowledgeable withtherulesofthemajorsports,suchasbaseball,football,andbasketball.Formanystudents sportsisafamiliarcontextinwhichaninstructorcandescribestatisticalthinking. Thegoalofthisbookistoprovideacollectionofexamplesandexercisesapplyingproba- bilityandstatisticstothesportofbaseball.Whybaseballinsteadofothersports? (cid:4) BaseballisthegreatAmericangame.Baseballisgreatinthatithasarichhistoryofteams andplayers,andmanypeoplearefamiliarwiththebasicrulesofthegame.Thepopularityof baseball isreflected by thelarge number of movies thathave been produced about baseball teamsandplayers. (cid:4) Baseball is the most statistical of all sports. Hitters and pitchers are identified by their correspondinghittingandpitchingstatistics.Forexample,BabeRuthisforeveridentifiedby the statistic 60, which was the number of home runs hit in his 1927 season. Bob Gibson is famous for his record low earned run average of 1.12 during the 1968 season. A flood of differentstatisticalmeasuresareusedtorateplayersandsalariesofplayersaredeterminedin partbythesestatistics.Thereisanactiveeffortamongbaseballwriterstolearnmoreabout baseballissuesbyusingstatistics. (cid:4) AwealthofbaseballdataiscurrentlyavailableovertheInternet.Playerandteamhittingand pitchingstatisticscanbeeasilyfound.Comparisonsbetweenplayersofdifferenterascanbe made using a downloadable dataset that gives hitting and pitching data for all players who haveeverplayedprofessionalbaseball. ix