Table Of Content

StatisticalScience 2009,Vol.24,No.2,191–194 DOI:10.1214/09-STS284REJ MainarticleDOI:10.1214/09-STS284 (cid:13)c InstituteofMathematicalStatistics,2009 Rejoinder: Harold Jeffreys’s Theory of Probability Revisited Christian P. Robert, Nicolas Chopin and Judith Rousseau 0 1 Abstract. We are grateful to all discussants of our re-visitation for 0 2 their strong support in our enterprise and for their overall agreement with our perspective. Further discussions with them and other leading n a statisticians showed that the legacy of Theory of Probability is alive J and lasting. 8 1 1. ON BERNARDO’S COMMENTS sure as an invariant loss function, even though the ] E necessity ofscaling this measureandof selecting the We cannot but agree with most issues raised by M subsequentboundbetween acceptance andrejection Professor Bernardo, first and foremost the impor- . remainstrongimpedimentsagainstadoptingthisal- t tantdistinctionbetweentestingandestimation.The a ternative. (The fact that it depends on the sample t multidimensional Jeffreys prior (for estimation) is s size n is certainly a major drawback, although one [ certainlynotformallydefinedwithinTheoryofProb- could reasonably object that the boundbetween ac- ability and the multiplication of cases in the book 2 ceptance and rejection associated with the Bayes v does not help. We alas have no clear explanation as factor should also depend on the sample size.) 8 to why most Jeffreys priors produce proper posteri- 0 ors for all datasets. While Lindley–Jeffreys’s para- 0 2. ON GELMAN’S COMMENTS dox may beupsetting(although it mostly highlights 1 . the discrepancy between the frequentist and the 9 We first apologize to the authors of Gelman et al. Bayesiananswers)we,however,considertheattempt 0 (2001) for not ranking them into the “classics” of 9 to create a testing Jeffreys prior in Section 5.2 as an our first footnote. This choice was, however, delib- 0 interesting if incomplete attempt, concretized much : erate: we wanted to stop short of comparing the v later by Bayarri and Garcia-Donato (2007). We un- most recent textbooks of the late 1990s (excluding i X derstandProfessorBernardo’spointofviewonBayes as well Robert, 1994). At a more foundational level, factors,butstillresistthetemptation tothrowaway r the debate about the choice of a noninformative or a this useful tool, as discussed below in conjunction of a weakly informative prior is endless, hopeless withProfessorLindley’scomments.Wearenonethe- and possibly fruitless, in that (a) there is no way a less sympathetic to the intrinsic discrepancy mea- single perfect noninformative prior can be adopted by one and all except through a formal decision Christian P. Robert is Professor, CEREMADE, from the community to always use Jeffreys prior as Université Paris Dauphine, 75775 Paris cedex 16, a default (in the same way the Black-and-Scholes France e-mail: [email protected]. Nicolas formula is used by financial analysts as a common Chopin is Professor, CREST-ENSAE, INSEE, 92245 Malakoff cedex, France e-mail: [email protected]. ground, not as a representation of real series); and Judith Rousseau is Professor, CEREMADE, Université (b) noninformative and informative priors are not Paris Dauphine, 75775 Paris cedex 16, France e-mail: two well-separated categories, they form a contin- [email protected]. uum.Itseemsthusmorefruitfultotrytobuildmea- suresthatassess of theimpactof agiven prior(or of This is an electronic reprint of the original article the variation of a parameter in a family of priors). published by the Institute of Mathematical Statistics in Statistical Science, 2009, Vol. 24, No. 2, 191–194. This The debate about complexity is more in line with reprint differs from the original in pagination and our views: (a) similar to the notion of a universal typographic detail. noninformative prior, a practical implementation of 1 2 C. P. ROBERT,N.CHOPIN AND J. ROUSSEAU the Ockham’s razor does not exist; and (b) com- of this journal). That Bayesian testing, or any kind plexity is quite a subjective factor. This is not to of testing, remains a source for discussion and fur- say that we reject the Jeffreysian tenet that Bayes ther research is clearly illustrated by the number of factors naturally downweight complex models with commentsandthevariety ofproposalsonthispoint. limited supportfromthedata,sincewesupportthis Finally,thelackofdecisiontheoryisanissuethatwe point,butrather that thesupportfor morecomplex also deplore,in agreement with ProfessorsBernardo models should come from the prior or from a loss and Lindley as well, if not Professor Zellner. function, rather than from the complexity-hungry 4. ON LINDLEY’S COMMENTS likelihood.(Thesocialscientistattitudethatworries about missing some factor could also be questioned Besides so kindly contributing to the discussion as being too optimistic in its belief in models.) Fi- therein, Professor Lindley patiently and helpfully nally, we concede that Bayesian data analysis may enlightened us on the construction and contents of forceustomoveawayfromthe“ideal”standardsset Theory of Probability during the preparation of the byHarold Jeffreys’sTheory of Probability, including paper. We are therefore deeply indebted to him for the reliance on the Bayes factor. Bayesian model sharing so much with us. His comments bring a criticism, indeed a major direction in Gelman et al. unique perspective to the discussion, both from historical and foundational viewpoints. As a witness (2001), is still in its infancy and would correctly re- of the early developments of Theory of Probabil- quire more emphasis in our papers and in our prac- ity,ProfessorLindleyexposes thephilosophical cum tice! As put by Professor Gelman, we need to learn practical reasons for the composition of this book. more from “the failures of a statistical model’s at- The point about Section 3.10 and the integration tempt to capture reality.” overthesamplespacewasmissedinouranalysisbut is indeed crucial in its link with the likelihood prin- 3. ON KASS’ COMMENTS ciplethatdoesnotappearperseinTheory of Proba- We are grateful to Professor Kass for his com- bility. Nowadays, this is certainly themost standard ments that follow a talk given during the Harold example that illustrates how the principle for con- Jeffreys’s Theory of Probability anniversary session structing Jeffreys’s priors may violate the likelihood at the O-Bayes 2009 meeting. Thereis actually very principle (Berger and Wolpert, 1988). (The opposi- little we can disagree with in these comments which tion withdeFinetti’s perspectiveisalso worthnotic- show a deep and scholarly knowledge of Theory of ing, since they approached Bayesian statistics from fundamentally different perspectives, even though Probability andexposeourneedtopursueourstudy their respective books share the same title.) of this profound book. The fact that uncertainty must be analyzed in The connection with geometry was bound to be probabilistic terms is certainly a driving force in part of Professor Kass’ comments and we do agree Theory of Probability and a convincing reason to with the essential feature of looking for orthogonal follow Bayesian ways. Wecompletely agree thatthis parameterisation, a point which, in our awkward formalizationisoneofHaroldJeffreys’sgreatinputs. phrasing, we would relate to the search for a con- Once again, the other fundamental input stressed stant information parameterization. We also appre- both by Professor Lindley and ourselves is the com- ciatetheemphasisonLaplace’sapproximationsthat pleteformalization ofacoherentapproachtotesting permeate the book and provide an early link with via Bayes factors. Professor Lindley is 100% correct Bayesian asymptotics. The epistemological implica- in his assessment of the opposition of this view with tions of Theory of Probability are certainly worth Popper’s and of its persistence (Templeton, 2008): stressing (a point also made by Professor Zellner) rejecting a model based on its “falsity” is only feasi- if only because Harold Jeffreys was first and fore- blewhenconsideringtheavailablealternatives.That most a physicist who developed his own statistical Theory of Probability does not directly dwell on de- tools to deal with his own physics problems. The cisions is clearly a feature of the time, even though specificpointsmadebyProfessorKassaboutthena- Keynes had opened the way a few years earlier, but ture of statistical models would be worth emphasis- thisdidnotpreventaformalizationofBayesiantest- ing during any course in applied and even method- ing procedures that proved itself compatible with ological statistics (as are the central discussions by “0–1” loss functions, thus showing the insight in Erich Lehmann and David Cox in the 1990 volume Harold Jeffreys’s intuitions. 3 REJOINDER 5. ON SENN’S COMMENTS Theory of Probability with the philosophy of science at the time it was written. This is not to say we Given the tone of some earlier comments of Pro- missedtheglobalimpactofTheory of Probability on fessor Senn on the Bayesian paradigm, we must ac- scientific modeling and its definition of induction, a knowledge our pleasant surprise at his conciliatory point already stressed by Professor Kass, because it toneinthesecomments.Thankfully,thebarbedpar- obviously represents the major impact of the book, odyofHarold Jeffreys’smostquotedsentencesome- but the style of the discussions about the axiomatic how re-establishes the balance! We are quite grate- fultoProfessorSennforhislaudatoryremarks,even nature of probability and our lack of background in though we must acknowledge that our copies of this area led us to bypass them to focus on the link Harold Jeffreys’s Theory of Probability are also full with modern Bayesian statistics. (Neither does the of pencil annotations and question marks, and also “proof” of Bayes’ theorem strike us as ultimately that it took two series of lectures to achieve this necessary, once the axiomatic definition of probabil- incomplete state of awareness. We furthermore en- ity is agreed upon.)The coherence of the system for joyed the mention that Harold Jeffreys considered scientific induction presented in Theory of Probabil- Bayesian significance tests as the most important ity is what struck us the most in Theory of Proba- part of Theory of Probability, since this agrees with bility, even though we presumably skimmed too fast both Professor Lindley’s and our perceptions. over this point. We dearly appreciate the further historical de- Asalreadynoted(withadifferenttwist)inthedis- tailsprovidedbyProfessorSenn’scomments,partic- cussion about Professor Gelman’s comments, there ularly in that the exchange between Ronald Fisher is no end to the debate about non-informative pri- and Harold Jeffreys is represented in much less a ors and, while Professor Zellner’s maximal data in- controversial tone that we could have believed! [The formation prior is an interesting alternative to Jef- first author also commented on Berger, Bernardo freys’s, Laplace’s and Haldane’s solutions, there is and Sun (2009) about the particular matter of the no reason to believe the community as a whole will Law of Succession and so we do not need to re- eventually agree upon this point. We obviously ap- peat the comments here.] Similarly, the confusion preciate the derivation of this prior based on a spe- about Bernoulli shows how amateurish is our at- cific information criterion developed by Professor tempt at Science History. We are equally grateful to Zellner. In a historical perspective, it may well be Professor Senn for pointing out Bartlett’s connec- that the notions of “objective” or “noninformative” tion, as we must confess we were not even aware of are not appropriate for the (Statistics of the) mid- it! When reading Bartlett’s comments, we came to 1930s. realize his contribution to the exclusion of improper The conclusion presented by Professor Zellner re- priors for Bayes factors, as analyzed in deeper de- produces Seymour Geisser’s assessment of Theory tails by Bickel and Ghosh (1990). of Probability, for which we are both grateful and in 6. ON ZELLNER’S COMMENTS complete agreement. Unsurprisingly,ProfessorZellner’scomments—that 7. CONCLUSION hedelivered quite enthusiastically duringhis lecture at O-Bayes 2009—are opening new vistas on The- We are most grateful to the contributors for their ory of Probability, while differing from our analysis lively discussions, which illustrate how influential on several points. The first issue is that Theory of Jeffreys’sideasstillaretoday.Maybethemoststrik- Probability was aimed at scientists at large, while ing aspect in Theory of Probability is Harold Jef- wereaditasstatisticians. Thisisunavoidable,given freys’s intuition that a completely coherent system our background and, further, we doubt many non- could be designed for Bayesian analysis, a system statisticians would have the time and the will to go upon which we are still building today. through Theory of Probability. Unfortunately, most of them seem to eschew modern Bayesian introduc- ACKNOWLEDGMENTS tions to the benefit of shorter reviews published in their own discipline. We completely agree with Pro- This reply was written after the first author at- fessor Zellner that we failed to understand the his- tended both the O-Bayes 2009 and MaxEnt 2009 torical undercurrents explaining the connection of conferences. He is grateful to the speakers in the 4 C. P. ROBERT,N.CHOPIN AND J. ROUSSEAU special Harold Jeffreys’s Theory of Probability ses- R. Acad. Cienc. Exactas F´ıs. Nat. Ser. A Mat. 103 125– sion at O-Bayes 2009 and to the participants from 135. MR2535278 Bickel, P. and Ghosh, J. (1990). A decomposition for those conferences who offered comments on the pa- the likelihood ratio statistic and the Bartlett correction— per or simply support for the project. a Bayesian argument. Ann. Statist. 18 1070–1090. MR1062699 Gelman, A., Carlin, J. Stern, H. and Rubin, D. (2001). REFERENCES Bayesian Data Analysis, 2nd ed.Chapman and Hall, New York. Bayarri, M. and Garcia-Donato, G. (2007). Extending Robert, C. (1994). The Bayesian Choice. Springer, New conventionalpriorsfortestinggeneralhypothesesinlinear York.MR1313727 models. Biometrika 94 135–152. MR2367828 Templeton, A. (2008). Statistical hypothesis testing in in- Berger, J. and Wolpert, R. (1988). The Likelihood Prin- traspecific phylogeography: Nested clade phylogeographi- ciple, 2nd ed. IMS,Hayward, CA. calanalysisvs.approximateBayesiancomputation.Molec- Berger, J., Bernardo, J. andSun, D.(2009). Naturalin- ular Ecology 18 319–331. duction: An objective Bayesian approach. RACSAM Rev.