ANALYZING WIMBLEDON

The game of tennis raises many challenging questions for a statistician. Is it true that serving first in a set gives an advantage? Or serving with new balls? Is the seventh game in a set particularly important? Are top players more ‘stable’ than other players? Do real champions win the big points? These, and many other questions, are formulated as ‘hypotheses’ and tested statistically. This book discusses how the outcome of a match can be predicted (even while the match is in progress), which points are important and which are not, how to choose an optimal service strategy, and whether a ‘winning mood’ actually exists in tennis. Aimed at readers with some knowledge of mathematics and statistics, the book uses tennis (Wimbledon in particular) as a vehicle to illustrate the power and beauty of statistical reasoning.

Franc Klaassen is Professor of International Economics at the University of Amsterdam. After obtaining master's degrees in econometrics and economics and a PhD at Tilburg University, he moved to the University of Amsterdam in 1999. Klaassen is a fellow of the Tinbergen Institute and was a visiting fellow at the University of Wisconsin-Madison. His main research interests are the empirical analysis of international economics and finance, fiscal policy, and sports, mainly tennis, on which he has published widely. He is an enthusiastic tennis player and, as a junior, was selected to train with the Royal Dutch Lawn Tennis Association for nine years.

Jan R. Magnus is Emeritus Professor at Tilburg University and Visiting Professor of Econometrics at the VU University Amsterdam. He studied econometrics and philosophy at the University of Amsterdam, where he obtained his PhD in Economics. He worked at the Universities of Amsterdam, Leiden, and British Columbia before moving to the London School of Economics in 1981. In 1996 he was appointed Research Professor of Econometrics at Tilburg University.
Magnus held visiting positions at University of California San Diego, New Economic School of Moscow, European University Institute in Florence, and University of Tokyo, among others. He is author or coauthor of eight books and more than one hundred scientific papers.

Analyzing Wimbledon
The Power of Statistics

Franc Klaassen
Amsterdam School of Economics, University of Amsterdam, The Netherlands

Jan R. Magnus
Department of Econometrics & Operations Research, VU University Amsterdam, The Netherlands

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide.

Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam

Oxford is a registered trademark of Oxford University Press in the UK and certain other countries.

Published in the United States of America by
Oxford University Press
198 Madison Avenue, New York, NY 10016

© Franc Klaassen and Jan R. Magnus 2014

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data
Klaassen, Franc.
Analyzing Wimbledon: the power of statistics / Franc Klaassen, Jan R. Magnus.
pages cm
Includes bibliographical references and index.
ISBN 978-0-19-935595-2 (cloth: alk. paper) – ISBN 978-0-19-935596-9 (paperback: alk. paper)
1. Wimbledon Championships (Wimbledon, London, England) 2. Tennis–Statistics. I. Magnus, Jan R. II. Title.
GV999.K58 2014
796.342094212–dc23
2013026355

1 3 5 7 9 8 6 4 2
Typeset by the authors
Printed in the United States of America on acid-free paper

To my parents (FK)
To Eveline de Jong (JM)

Contents

Preface
Acknowledgements

1 Warming up
    Wimbledon
    Commentators
    An example
    Correlation and causality
    Why statistics?
    Sports data and human behavior
    Why tennis?
    Structure of the book
    Further reading

2 Richard
    Meeting Richard
    From point to game
    The tiebreak
    Serving first in a set
    During the set
    Best-of-three versus best-of-five
    Upsets
    Long matches: Isner-Mahut 2010
    Rule changes: the no-ad rule
    Abolishing the second service
    Further reading

3 Forecasting
    Forecasting with Richard
    Federer-Nadal, Wimbledon final 2008
    Effect of smaller p̄
    Kim Clijsters defeats Venus Williams, US Open 2010
    Effect of larger p̄
    Djokovic-Nadal, Australian Open 2012
    In-play betting
    Further reading

4 Importance
    What is importance?
    Big points in a game
    Big games in a set
    The vital seventh game
    Big sets
    Are all points equally important?
    The most important point
    Three importance profiles
    Further reading

5 Point data
    The Wimbledon data set
    Two selection problems
    Estimators, estimates, and accuracy
    Development of tennis over time
    Winning a point on service unraveled
    Testing a hypothesis: men versus women
    Aces and double faults
    Breaks and rebreaks
    Are our summary statistics too simple?
    Further reading

6 The method of moments
    Our summary statistics are too simple
    The method of moments
    Enter Miss Marple
    Re-estimating p by the method of moments
    Men versus women revisited
    Beyond the mean: variation over players
    Reliability of summary statistics: a rule of thumb
    Filtering out the noise
    Noise-free variation over players
    Correlation between opponents
    Why bother?
    Further reading

7 Quality
    Observable variation over players
    Ranking
    Round, bonus, and malus
    Significance, relevance, and sensitivity
    The complete model
    Winning a point on service
    Other service characteristics
    Aces and double faults
    Further reading

8 First and second service
    Is the second service more important than the first?
    Differences in service probabilities explained
    Joint analysis: bivariate GMM
    Four service dimensions
    Four-variate GMM
    Further reading

9 Service strategy
    The server’s trade-off
    The y-curve
    Optimal strategy: one service
    Optimal strategy: two services
    Existence and uniqueness
    Four regularity conditions for the optimal strategy
    Functional form of y-curve
    Efficiency defined
    Efficiency of the average player
    Observations for the key probabilities: Monte Carlo
    Efficiency estimates