Springer Texts in Statistics
Advisors:
George Casella Stephen Fienberg Ingram Olkin
Springer Texts in Statistics
Athreya/Lahiri: Measure Theory and Probability Theory
Bilodeau/Brenner: Theory of Multivariate Statistics
Brockwell/Davis: An Introduction to Time Series and Forecasting
Carmona: Statistical Analysis of Financial Data in S-PLUS
Chow/Teicher: Probability Theory: Independence, Interchangeability,
Martingales, Third Edition
Christensen: Advanced Linear Modeling: Multivariate, Time Series, and
Spatial Data; Nonparametric Regression and Response Surface
Maximization, Second Edition
Christensen: Log-Linear Models and Logistic Regression, Second Edition
Christensen: Plane Answers to Complex Questions: The Theory of
Linear Models, Second Edition
Davis: Statistical Methods for the Analysis of Repeated Measurements
Dean/Voss: Design and Analysis of Experiments
Dekking/Kraaikamp/Lopuhaä/Meester: A Modern Introduction to
Probability and Statistics
Durrett: Essentials of Stochastic Processes
Edwards: Introduction to Graphical Modeling, Second Edition
Everitt: An R and S-PLUS Companion to Multivariate Analysis
Ghosh/Delampady/Samanta: An Introduction to Bayesian Analysis
Gut: Probability: A Graduate Course
Heiberger/Holland: Statistical Analysis and Data Display; An Intermediate
Course with Examples in S-PLUS, R, and SAS
Jobson: Applied Multivariate Data Analysis, Volume I: Regression and
Experimental Design
Jobson: Applied Multivariate Data Analysis, Volume II: Categorical and
Multivariate Methods
Karr: Probability
Kulkarni: Modeling, Analysis, Design, and Control of Stochastic Systems
Lange: Applied Probability
Lange: Optimization
Lehmann: Elements of Large Sample Theory
Lehmann/Romano: Testing Statistical Hypotheses, Third Edition
Lehmann/Casella: Theory of Point Estimation, Second Edition
Marin/Robert: Bayesian Core: A Practical Approach to Computational
Bayesian Statistics
Nolan/Speed: Stat Labs: Mathematical Statistics Through Applications
Pitman: Probability
Rawlings/Pantula/Dickey: Applied Regression Analysis
Robert: The Bayesian Choice: From Decision-Theoretic Foundations to
Computational Implementation, Second Edition
(Continued after index)
Jean-Michel Marin
Christian P. Robert
Bayesian Core:
A Practical Approach
to Computational
Bayesian Statistics
Jean-Michel Marin
Project Select
INRIA Futurs
Laboratoire de Mathématiques
Université Paris–Sud
91405 Orsay Cedex
France
jean-michel.marin@math.u-psud.fr
Christian P. Robert
CREST-INSEE
and
CEREMADE
Université Paris–Dauphine
75775 Paris Cedex 16
France
xian@ceremade.dauphine.fr
Editorial Board
George Casella
Department of Statistics
University of Florida
Gainesville, FL 32611-8545
USA
Stephen Fienberg
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213-3890
USA
Ingram Olkin
Department of Statistics
Stanford University
Stanford, CA 94305
USA
ISBN 978-0-387-38979-0 e-ISBN 978-0-387-38983-7
Library of Congress Control Number: 2006932972
© 2007 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY
10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are
subject to proprietary rights.
Cover illustration: Artwork of Michel Marin, entitled Pierre de Rosette.
Printed on acid-free paper.
9 8 7 6 5 4 3 2
springer.com
To our most challenging case studies,
Lucas, Joachim and Rachel
Preface
After that, it was down to attitude.
—Ian Rankin, Black & Blue.—
The purpose of this book is to provide a self-contained (we insist!) entry into
practical and computational Bayesian statistics using generic examples from
the most common models for a class duration of about seven blocks that
roughly correspond to 13 to 15 weeks of teaching (with three hours of lectures
per week), depending on the intended level and the prerequisites imposed on
the students. (That estimate does not include practice—i.e., programming
labs—since those may have a variable duration, also depending on the students’
involvement and their programming abilities.) The emphasis on practice
is a strong feature of this book in that its primary audience consists of grad-
uate students who need to use (Bayesian) statistics as a tool to analyze their
experiments and/or datasets. The book should also appeal to scientists in all
fields, given the versatility of the Bayesian tools. It can also be used for a more
classical statistics audience when aimed at teaching a quick entry to Bayesian
statistics at the end of an undergraduate program for instance. (Obviously, it
can supplement another textbook on data analysis at the graduate level.)
The format of the book is of a rather sketchy coverage of the topics, al-
ways backed by a motivated problem and a corresponding dataset (available
on the Website of the course), and a detailed resolution of the inference pro-
cedures pertaining to this problem, sometimes including commented R pro-
grams. Special attention is paid to the derivation of prior distributions, and
operational reference solutions are proposed for each model under study. Ad-
ditional cases are proposed as exercises. The spirit is not unrelated to that of
Nolan and Speed (2000), with more emphasis on the theoretical and method-
ological backgrounds. We originally planned a complete set of lab reports,
but this format would have forced us both to cut on the methodological side
and to increase the description of the datasets and the motivations for their
analysis. The current format is therefore more self-contained (than it would
have been in the lab scenario) and can thus serve as a unique textbook for a
service course for scientists aimed at analyzing data the Bayesian way or as
an introductory course on Bayesian statistics.
A course corresponding to the book has now been taught by both of us for
three years in a second year master’s program for students aiming at a professional
degree in data processing and statistics (at Université Paris Dauphine,
France). The first half of the book was used in a seven-week (intensive) pro-
gram, and students were tested on both the exercises (meaning all exercises)
and their (practical) mastery of the datasets, the stated expectation being
that they should go beyond a mere reproduction of the R outputs presented
in the book. While the students found that the amount of work required by
this course was rather beyond their usual standards (!), we observed that
their understanding and mastery of Bayesian techniques were much deeper
and more ingrained than in the more formal courses their counterparts had
in the years before. In short, they started to think about the purpose of a
Bayesian statistical analysis rather than on the contents of the final test and
they ended up building a true intuition about what the results should look
like, intuition that, for instance, helped them to detect modeling and pro-
gramming errors! In most subjects, working on Bayesian statistics from this
perspective created a genuine interest in the approach and several students
continued to use this approach in later courses or, even better, on the job.
Contrary to usual practice, the exercises are interspersed within the chapters
rather than postponed until the end of each chapter. There are two reasons
for this stylistic choice: First, the results or developments contained in those
exercises are often relevant for upcoming points in the chapter. Second, they
signal to the student (or to any reader) that some pondering over the previous
pages may be useful before moving to the following topic and so may act as
self-checking gateways.
Thanks
We are immensely grateful to colleagues and friends for their help with this
book, in particular, to the following people: François Perron somehow started
us thinking about this book and did a thorough editing of it during a second
visit to Dauphine, helping us to adapt it more closely to North American
audiences. He also adopted Bayesian Core as a textbook in Montréal as soon
as it appeared. Charles Bouveyron provided and explained the vision dataset
of Chapter 8. Jean-François Cardoso provided the cosmological background
data in Chapter 2. George Casella made helpful suggestions on the format
of the book. Gilles Celeux carefully read the manuscript and made numer-
ous suggestions on both content and style. Noel Cressie insisted on a spatial
chapter in the “next” book (even though Chapter 8 may not be what he had
in mind!). Jérôme Dupuis provided capture-recapture slides that have been
recycled in Chapter 5. Arnaud Doucet and Chris Holmes made helpful sug-
gestions during a memorable dinner in Singapore (and, later, Arnaud used
a draft of the book in his class at the University of British Columbia, Van-
couver). Jean-Dominique Lebreton provided the European dipper dataset of
Chapter 5. Gaelle Lefol pointed out the Eurostoxx series as a versatile dataset
for Chapter 7. Kerrie Mengersen collaborated with both of us on a review paper
about mixtures that is related to Chapter 6 (and also gave us plenty of
information about a QTL dataset that we ended up not using). Jim Kay in-
troduced us to the Lake of Menteith dataset. Mike Titterington is thanked
for collaborative friendship over the years and for a detailed set of comments
on the book (quite in tune with his dedicated editorship of Biometrika). We
are also grateful to John Kimmel of Springer for his advice and efficiency, as
well as to two anonymous referees.
Students and faculty members who attended the Finnish MCMC spring
2004 course in Oulanka also deserve thanks both for their dedication and
hard work, and for paving the ground for this book. In particular, the short
introduction to R in Chapter 1 is derived from a set of notes written for this
spring course. Teaching the highly motivated graduate students of Universi-
dad Carlos III, Madrid, a year later, also convinced the second author that
this venture was realistic. Later invitations to teach from this book both at
the University of Canterbury, Christchurch (New Zealand) and at the Universidad
Central de Venezuela, Caracas (Venezuela), were welcome indicators
of its appeal, for which we are grateful to both Dominic Lee and José Léon.
In addition, Dominic Lee and the students of STAT361 at the University of
Canterbury very timely pointed out typos and imprecisions that were taken
into account before the manuscript left for the printer last December. Once
the book was published earlier this year, we quickly got emails from readers
asking about possible typos. We are thus grateful to Jarrett Barber, Hos-
sein Gholami, Dominik Hangartner, Soleiman Khazaei, Petri Koistinen, and
to Fazlollah Lak for pointing out mistakes corrected in the second printing
(and listed on the book Website). (We obviously welcome emails from readers
about potential remaining typos in the current printing.)
This book was written while the first author was on leave as a Chargé de
Recherche in the Unité Futurs of the Institut National de la Recherche en
Informatique et Automatique (INRIA). He is grateful to both INRIA Futurs
and the Université Paris Dauphine for granting him the time necessary to
work on this project in the best possible conditions. The first author is, in
addition, grateful to all his colleagues at the Université Paris Dauphine for
their constant support and to Gilles Celeux from INRIA Futurs for his warm
welcome and his enthusiastic collaboration in the Select project. Finally, the
first author salutes all those without whom this book would never have seen
the light of day: from a scientific point of view, he thinks in particular of Henri
Caussinus and Thierry Dhorne; from a personal point of view, he thanks Carole
Bégué for her invaluable support, and he also thinks of Anne-Marie Dalas and
Anne Marin. Both authors are also grateful to Michel Marin, who designed
the cover of the book.
Parts of this book were also written on trips taken during the sabbatical
leave of the second author: He is grateful to the Comité National des Universités
(CNU) for granting him this leave, and, correlatively, to both the
Department of Statistics, University of Glasgow, Scotland (hence, the Rankin
quotations!), and the Institute for Mathematical Sciences, National University
of Singapore, for their invaluable hospitality. He is also indebted to the University
of Canterbury, Christchurch (New Zealand), for granting him a Visiting
Erskine Fellowship in 2006 to teach out of this book. Special thanks, too, go
to Hotel Altiplanico, San Pedro de Atacama (Chile), for providing albeit too
briefly the ultimate serene working environment! The second author has an
even more special thought for Bernhard K.N.T. Flury, whose data we use in
Chapter 4 and who left us in the summer of 1998 for a never-ending climb
of æthereal via ferratas. And, to conclude, very special thanks to Brigitte,
Joachim, and Rachel for being there, to Denis for the Wednesday-noon interval
training sessions, and to Baptiste for his sometimes vital relays!
Paris, July 15, 2007
Jean-Michel Marin
Christian P. Robert
Contents
Preface
1 User’s Manual
1.1 Expectations
1.2 Prerequisites and Further Reading
1.3 Styles and Fonts
1.4 A Short Introduction to R
1.4.1 R Objects
1.4.2 Probability Distributions in R
1.4.3 Writing New R Functions
1.4.4 Input and Output in R
1.4.5 Administration of R Objects
2 Normal Models
2.1 Normal Modeling
2.2 The Bayesian Toolkit
2.2.1 Bases
2.2.2 Prior Distributions
2.2.3 Confidence Intervals
2.3 Testing Hypotheses
2.3.1 Zero–One Decisions
2.3.2 The Bayes Factor
2.3.3 The Ban on Improper Priors
2.4 Monte Carlo Methods
2.5 Normal Extensions
2.5.1 Prediction
2.5.2 Outliers
3 Regression and Variable Selection
3.1 Linear Dependence
3.1.1 Linear Models