Bayesian Thinking in Biostatistics

CHAPMAN & HALL/CRC Texts in Statistical Science Series

Joseph K. Blitzstein, Harvard University, USA
Julian J. Faraway, University of Bath, UK
Martin Tanner, Northwestern University, USA
Jim Zidek, University of British Columbia, Canada

Recently Published Titles

Practical Multivariate Analysis, Sixth Edition
Abdelmonem Afifi, Susanne May, Robin A. Donatello, and Virginia A. Clark

Time Series: A First Course with Bootstrap Starter
Tucker S. McElroy and Dimitris N. Politis

Probability and Bayesian Modeling
Jim Albert and Jingchen Hu

Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences
Robert B. Gramacy

Statistical Analysis of Financial Data: With Examples in R
James Gentle

Statistical Rethinking: A Bayesian Course with Examples in R and STAN, Second Edition
Richard McElreath

Statistical Machine Learning: A Model-Based Approach
Richard Golden

Randomization, Bootstrap and Monte Carlo Methods in Biology, Fourth Edition
Bryan F. J. Manly, Jorge A. Navarro Alberto

Principles of Uncertainty, Second Edition
Joseph B. Kadane

Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R
Paul Roback, Julie Legler

Bayesian Thinking in Biostatistics
Gary L. Rosner, Purushottam W. Laud, and Wesley O. Johnson

Modern Data Science with R, Second Edition
Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton

Probability and Statistical Inference: From Basic Principles to Advanced Models
Miltiadis Mavrakakis and Jeremy Penzer

For more information about this series, please visit: https://www.crcpress.com/Chapman--HallCRC-Texts-in-Statistical-Science/book-series/CHTEXSTASCI

Bayesian Thinking in Biostatistics

by
Gary L. Rosner
Purushottam W. Laud
Wesley O. Johnson

A CHAPMAN & HALL BOOK

First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

© 2021 Taylor & Francis Group, LLC

CRC Press is an imprint of Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Names: Rosner, Gary L., author. | Laud, Purushottam, 1948- author. | Johnson, Wesley O., author.
Title: Bayesian thinking in biostatistics / Gary L. Rosner, Purushottam W. Laud, Wesley O. Johnson.
Description: First edition. | Boca Raton: CRC Press, 2021.
| Includes bibliographical references and index.
Identifiers: LCCN 2020049870 (print) | LCCN 2020049871 (ebook) | ISBN 9781439800089 (hardcover) | ISBN 9781439800102 (ebook)
Subjects: LCSH: Biometry--Methodology. | Bayesian statistical decision theory.
Classification: LCC QH323.5 .R68 2021 (print) | LCC QH323.5 (ebook) | DDC 570.1/5195--dc23
LC record available at https://lccn.loc.gov/2020049870
LC ebook record available at https://lccn.loc.gov/2020049871

ISBN: 978-1-4398-0008-9 (hbk)

To my wife, Naomi, and my children, Joshua and Molly, for their support, understanding, patience, and, most of all, love. GLR

To Kaye and the boys - Raj, Kavi, Sanjiv and Filip - who have believed since its conception that this project would come to fruition. PWL

To my friend and mentor, Seymour Geisser. I miss him very much. WOJ

Contents

Preface xv

1 Scientific Data Analysis
1.1 Philosophy
1.2 Examples
1.3 Essential Ingredients for Bayesian Analysis
1.3.1 Observables and Design for Their Collection
1.3.2 Unknowns of Scientific Interest
1.3.3 Probability Models and Model Parameters
1.3.4 External Knowledge (or Prior) Distribution
1.4 Recap and Readings

2 Fundamentals I: Bayes' Theorem, Knowledge Distributions, Prediction 11
2.1 Elementary Bayes' Theorem: Simple but Fundamental Probability Computations 12
2.2 Science and Knowledge about Uncertain Quantities 16
2.3 More on Bayes' Theorem: Inference for Model Unknowns 19
2.4 Prediction 27
2.5 Monte Carlo Approximation 30
2.6 Recap and Readings 31
2.7 Exercises 32

3 Fundamentals II: Models for Exchangeable Observations 37
3.1 Overview of Binomial, Normal, Poisson and Exponential Models 38
3.2 Posterior and Predictive Inferences 41
3.2.1 Bernoulli and Binomial Models 41
3.2.2 Normally Distributed Exchangeable Observations 47
3.2.3 Poisson Distributed Exchangeable Count Observations 59
3.2.4 Exchangeable Exponentially Distributed Time-to-Event Observations 61
3.3 *More Flexible Models 65
3.3.1 Mixture Distributions 66
3.3.2 Dirichlet Process Mixtures 67
3.3.3 Computation via DPpackage 69
3.4 Recap and Readings 69
3.5 Exercises 70

4 Computational Methods for Bayesian Analysis 75
4.1 Additional Sampling Distributions 76
4.1.1 Gamma Distributed Data 76
4.1.2 Weibull Distributed Data 76
4.2 Asymptotics: Normal and Laplace Approximations 77
4.3 Approximating Posterior Inferences using Monte Carlo Sampling 79
4.3.1 Random Number Generation 80
4.3.2 Inverse cdf Method 80
4.3.3 Importance Sampling 81
4.3.4 Rejection Sampling 83
4.3.5 Adaptive Rejection Sampling for Log-Concave Densities 85
4.4 Markov Chain Monte Carlo Sampling 87
4.4.1 Gibbs Sampler 89
4.4.2 Metropolis-Hastings Algorithm 92
4.4.3 Slice Sampling 96
4.4.4 Hamiltonian Markov Chain Monte Carlo Sampling 97
4.4.4.1 Overview 97
4.4.4.2 Expanding the Parameter Space 98
4.4.4.3 Basics of Hamiltonian Dynamics 98
4.4.4.4 Approximation to Generate Candidate 100
4.4.5 Convergence Diagnostics 102
4.5 Recap and Readings 111
4.6 Exercises 112

5 Comparing Populations 119
5.1 Comparing Proportions in Bernoulli Populations 119
5.1.1 Prior Distributions for Binomial Proportions 120
5.1.1.1 Reference Priors 121
5.1.1.2 Informative Beta Priors 121
5.1.1.3 Other Priors 124
5.1.2 Effect Measures 125
5.1.3 Cross-Sectional or Cohort Sampling 126
5.1.4 Case-Control Sampling 128
5.2 Comparing Normal Populations 132
5.2.1 Priors for (µi, σi²), i = 1, 2 132
5.2.1.1 Reference Priors 132
5.2.1.2 Informative Priors 134
5.2.2 Posterior Inference for Comparing Populations 135
5.2.3 Prediction 136
5.2.4 DiaSorin Data Analysis 137
5.3 Inferences for Rates 141
5.3.1 Reference Priors 142
5.3.2 Informative Priors 142
5.3.3 Nurses' Health Study Data Analysis 143
5.4 Recap and Readings 144
5.5 Exercises 144

6 Specifying Prior Distributions 149
6.1 Flat Priors 150
6.2 *Jeffreys Priors 152
6.3 Scientifically Informed Priors 154
6.4 *Data Augmentation Priors 159
6.5 Reference Priors 161
6.6 Recap and Readings 163
6.7 Exercises 164

7 Linear Regression 167
7.1 The Linear Regression Model 168
7.1.1 Simple Linear Regression: Single Numeric Predictor 169
7.1.2 From Model to Data Analysis 171
7.1.3 Multiple Covariates: Continuous and Categorical 174
7.1.4 Centering and Standardization of Covariates 178
7.2 Matrix Formulation and an Analytic Posterior Distribution 180
7.2.1 Matrix Notation 180
7.2.2 Posterior Analysis using the Flat Prior 181
7.2.2.1 Deriving the Posterior with the Flat Prior 181
7.2.2.2 Inference with Flat Prior 182
7.3 Priors 184
7.3.1 Flat Prior 185
7.3.2 A Proper Approximation to the Flat Prior 185
7.3.3 Conditionally Conjugate Independence Prior 185
7.3.3.1 Conditional Means Prior for β 187
7.3.3.2 An Informative Prior for τ and σ 189
7.3.4 Partial Information Prior 190
7.4 *Conjugate Priors 192
7.4.1 Generic Normal-Gamma Prior 192
7.4.2 Zellner's g-Prior 194
7.5 Beyond Additivity: Interaction (Effect Modification) 195
7.6 ANOVA 200
7.7 Recap and Readings 205
7.8 Exercises 207

8 Binary Response Regression 215
8.1 Logistic Regression Model 215
8.2 Logistic Regression Model with Interaction 222
8.3 Inference for Regression Coefficients and Their Functions 223
8.3.1 Inference for Lethal Dose α 225
8.4 Prior Distributions 226
8.4.1 Conditional Means Priors 226
8.4.2 Partial Prior Information 229
8.4.3 Low-Information Omnibus Priors 231
8.5 Prediction 232
8.6 Alternatives to Logistic Regression: Other Link Functions 234
8.7 Recap and Readings 236
8.8 Exercises 237

9 Poisson and Nonlinear Regression 241
9.1 The Basic Poisson Regression Model 241
9.2 Poisson-Based More General Models for Count Data 252
9.2.1 Overdispersion 252
9.2.2 Zero-Inflated Poisson Data 254
9.3 Overview of Generalized Linear Model Regression 258
9.4 *Nonlinear Regression 260
9.5 Recap and Readings 264
9.6 Exercises 265

10 Model Assessment 269
10.1 Model Selection Based on Posterior Probabilities and Bayes Factors 270
10.1.1 Choice between Two Models 270
10.1.2 Cautions Regarding Bayes Factors 273
10.1.3 Choosing among Multiple Models 275
10.1.4 Computing Bayes Factors via Sampling 275
10.2 Model Selection Based on Predictive Information Criteria 278
10.2.1 Log Pseudo Marginal Likelihood 280
10.2.2 Akaike Information Criterion 281
10.2.3 Bayesian Information Criterion 282
10.2.4 Widely Applicable Information Criterion 283
10.2.5 Deviance Information Criterion 284
10.2.6 Model Selection in Linear Regression 285
10.2.7 Comments on Information Criteria 287
10.2.8 Statistical versus Practical Import in Model Selection 287
10.3 Model Checking (Model Diagnostics) 289
10.3.1 Classical Checking 289
10.3.2 Box Check 290