Lecture Notes in
Statistics
Edited by J. Berger, S. Fienberg, J. Gani,
K. Krickeberg, I. Olkin, and B. Singer
67
Martin A. Tanner
Tools for Statistical Inference
Observed Data and Data Augmentation Methods
Springer-Verlag
New York Berlin Heidelberg London Paris
Tokyo Hong Kong Barcelona Budapest
Author
Martin A. Tanner
Department of Biostatistics
University of Rochester Medical Center
Rochester, NY 14642, USA
1st Edition 1991
2nd Corrected Printing 1992
3rd Printing 1993
Mathematical Subject Classification: 62F15, 62Fxx, 62Jxx
ISBN-13: 978-0-387-97525-2 e-ISBN-13: 978-1-4684-0510-1
DOI: 10.1007/978-1-4684-0510-1
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation,
broadcasting, reproduction on microfilms or in other ways, and storage in data banks. Duplication
of this publication or parts thereof is only permitted under the provisions of the German Copyright
law of September 9, 1965, in its current version, and a copyright fee must always be paid.
Violations fall under the prosecution act of the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1991
Typesetting: Camera ready by author
47/3140-543210 - Printed on acid-free paper
This material was presented in a course given during the 1990 Spring Semester at the
University of Wisconsin-Madison. I wish to thank the students of that course for their
comments. I especially wish to thank Chris Ritter for working out the logistic regression
example. Thanks to Gloria Scalissi for typing the manuscript. I owe a special debt of
gratitude to Wing H. Wong and W. J. Hall for reading through the manuscript and providing
numerous comments.
This work was supported by the National Institutes of Health grant R01-CA35464.
M.A.T.
Contents
Preface
I. Introduction
A. Problems
B. Techniques
References
II. Observed Data Techniques - Normal Approximation
A. Likelihood/Posterior Density
B. Maximum Likelihood
C. Normal Based Inference
D. The Delta Method
E. Significance Levels
References
III. Observed Data Techniques
A. Numerical Integration
B. Laplace Expansion
1. Moments
2. Marginalization
C. Monte Carlo Methods
1. Monte Carlo
2. Composition
3. Importance Sampling
References
IV. The EM Algorithm
A. Introduction
B. Theory
C. EM in the Exponential Family
D. Standard Errors
1. Direct Computation
2. Missing Information Principle
3. Louis' Method
4. Simulation
5. Using EM Iterates
E. Monte Carlo Implementation of the E-Step
F. Acceleration of EM
References
V. Data Augmentation
A. Introduction
B. Predictive Distribution
C. HPD Region Computations
1. Calculating the Content
2. Calculating the Boundary
D. Implementation
E. Theory
F. Poor Man's Data Augmentation
1. PMDA #1
2. PMDA Exact
3. PMDA #2
G. SIR
H. General Imputation Methods
1. Introduction
2. Hot Deck
3. Simple Residual
4. Normal and Adjusted Normal
5. Nonignorable Nonresponse
a. Mixture Model-I
b. Mixture Model-II
c. Selection Model-I
d. Selection Model-II
I. Data Augmentation via Importance Sampling
1. General Comments
2. Censored Regression
J. Sampling in the Context of Multinomial Data
1. Dirichlet Sampling
2. Latent Class Analysis
References
VI. The Gibbs Sampler
A. Introduction
1. Chained Data Augmentation
2. The Gibbs Sampler
3. Historical Comments
B. Examples
1. Rat Growth Data
2. Poisson Process
3. Generalized Linear Models
C. The Griddy Gibbs Sampler
1. Example
2. Adaptive Grid
References
Index
I. Introduction to Problems & Techniques
A. Problems
We consider four examples as motivation.
Example: Censored Regression Data (Hard Core Missing Data)
The Stanford Heart Transplant Program began in October 1967. The data presented in
Miller (1980) summarize survival time in days after transplant for 184 patients. The cut-off
date for these data was in February 1980. Available information for each patient includes:
survival time, an indication of whether the patient is dead or alive, the age of the patient in
years at the time of transplant and a mismatch score.
Suppose we wish to regress log10(survival) on age. This analysis is complicated by the
presence of censoring. For some patients, we do not have a survival time - we only know
that the person survived beyond the recorded event time.
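To make the role of censoring concrete, the following is a minimal sketch of a log-likelihood for right-censored linear regression. It assumes normally distributed errors, which is an assumption made here for illustration and is not stated in the passage above. The function names, parameter values and toy records are all hypothetical; the records are not the Stanford data.

```python
import math

def normal_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def normal_logsf(z):
    """Log of P(Z > z) for a standard normal Z (fine for moderate z)."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def censored_loglik(beta0, beta1, sigma, ages, log_times, died):
    """Log-likelihood for right-censored normal linear regression.

    died[i] is True when log_times[i] is an observed death time and
    False when the patient was still alive (censored) at that time.
    """
    total = 0.0
    for age, y, d in zip(ages, log_times, died):
        z = (y - beta0 - beta1 * age) / sigma
        if d:
            # Observed time: contributes the log density.
            total += normal_logpdf(z) - math.log(sigma)
        else:
            # Censored time: contributes log P(survival beyond y).
            total += normal_logsf(z)
    return total

# Made-up toy records, for illustration only.
ages = [30.0, 45.0, 52.0, 41.0]
log_times = [2.9, 2.1, 1.5, 3.2]
died = [True, True, True, False]
ll = censored_loglik(3.5, -0.02, 1.0, ages, log_times, died)
print(ll)
```

A censored patient contributes only the probability of surviving past the recorded time, which is why the observed data likelihood does not reduce to ordinary least squares.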
Example: Randomized Response (Missing Data by Design)
Suppose one wishes to survey cocaine usage among a population. Because cocaine usage
is illegal, a respondent may deny such activity when directly questioned about cocaine use.
Alternatively, one may employ a randomized response technique with each participant:
1. Each participant is to flip a coin. The result of the toss is not to be recorded or revealed
to the interviewer.
2. If the toss resulted in a tail and the participant did not use cocaine during the last six
months, then the participant is to answer no. Otherwise, answer yes.
Note that only the respondent knows if "yes" indicates cocaine usage in the last six
months. By not requiring the participant to reveal the true state, it is hoped that response
bias will be diminished. The analysis of the data is complicated by the fact that of the 2 x
2 table,
                      Tails   Heads
does use cocaine        ?       ?
does not use            X       ?
only the count in the lower left cell and the total sample size are available.
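Under the protocol above, a participant answers "no" only when the coin shows a tail and the participant did not use cocaine, so with a fair coin P(no) = (1/2)(1 - p), where p is the usage proportion. Solving for p gives the simple moment estimate 1 - 2X/n. The sketch below simulates the protocol and applies that estimate. The fair coin, the function names and the true proportion of 0.2 are assumptions made for illustration.

```python
import random

def estimate_usage(no_count, n, p_tail=0.5):
    """Moment estimate of the usage proportion from the count of 'no' answers.

    Uses P(no) = p_tail * (1 - p). The estimate can fall below 0 in
    small samples; no truncation is applied here.
    """
    return 1.0 - no_count / (n * p_tail)

def simulate_survey(n, true_usage, rng, p_tail=0.5):
    """Simulate the protocol and return the number of 'no' answers."""
    no_count = 0
    for _ in range(n):
        tail = rng.random() < p_tail
        user = rng.random() < true_usage
        # Only a non-user who tossed a tail answers "no".
        if tail and not user:
            no_count += 1
    return no_count

rng = random.Random(0)
n = 20000
x = simulate_survey(n, 0.2, rng)
print(estimate_usage(x, n))
```

Only X and n enter the estimate, which matches the fact that the other three cells of the table are never observed.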
Example: Latent Class Analysis (Soft Core Missing Data)
The data in the following table represent responses of 3181 participants in the 1972, 1973
and 1974 General Social Survey. The responses are cross-classified by year of study (3 levels)
and a dichotomous response (yes/no) to each of three questions.
Subjects in the 1972-1974 General Social Surveys, Cross-Classified by Year of
Survey and Responses to Three Questions on Abortion Attitudes
            Response  Response  Response  Observed
Year (D)    to A      to B      to C      count
1972        Yes       Yes       Yes         334
            Yes       Yes       No           34
            Yes       No        Yes          12
            Yes       No        No           15
            No        Yes       Yes          53
            No        Yes       No           63
            No        No        Yes          43
            No        No        No          501
1973        Yes       Yes       Yes         428
            Yes       Yes       No           29
            Yes       No        Yes          13
            Yes       No        No           17
            No        Yes       Yes          42
            No        Yes       No           53
            No        No        Yes          31
            No        No        No          453
1974        Yes       Yes       Yes         413
            Yes       Yes       No           29
            Yes       No        Yes          16
            Yes       No        No           18
            No        Yes       Yes          60
            No        Yes       No           57
            No        No        Yes          37
            No        No        No          430
Source: Haberman (1979, p. 559).
All three questions begin:
"Please tell me whether or not you think it should be possible
for a pregnant woman to obtain a legal abortion if"
Question A continues:
she is married and does not want any more children.
Question B continues:
the family has a very low income and cannot afford any more children.
Question C continues:
she is not married and does not want to marry the man.
The traditional latent class model supposes that the four manifest variables are conditionally
independent given a dichotomous unobserved (latent) variable (e.g., the respondent's true
attitude toward abortion, pro or anti). That is, if the value of the dichotomous latent variable
is known for a given participant, knowledge of the participant's response to a given question
provides no further information regarding the responses to either of the other two questions.
In this context, the model characterizes the unobserved (latent) data.
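The conditional independence assumption can be written as a two-component mixture over the latent classes. The notation below is assumed here for illustration and is not taken from the text: X denotes the latent variable, and each pi term denotes a class probability or a conditional response probability.

```latex
P(A=a,\, B=b,\, C=c,\, D=d)
  = \sum_{x=1}^{2} \pi^{X}_{x}\,
    \pi^{A \mid X}_{a \mid x}\,
    \pi^{B \mid X}_{b \mid x}\,
    \pi^{C \mid X}_{c \mid x}\,
    \pi^{D \mid X}_{d \mid x}
```

Each cell probability of the cross-classified table is thus a sum over the two unobserved classes, which is what makes the latent class membership a form of missing data.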
Example: Hierarchical Models (No Missing Data)
The data in the following table represent the weights of 30 young rats measured weekly
for five weeks.
Rat population growth data
       Weight at age                          Weight at age
Rat  x_i1  x_i2  x_i3  x_i4  x_i5      Rat  x_i1  x_i2  x_i3  x_i4  x_i5
  1   151   199   246   283   320       16   160   207   248   288   324
  2   145   199   249   293   354       17   142   187   234   280   316
  3   147   214   263   313   328       18   156   203   243   283   317
  4   155   200   237   272   297       19   157   212   259   307   336
  5   135   188   230   280   323       20   152   203   246   286   321
  6   159   210   252   298   331       21   154   205   253   298   334
  7   141   189   231   275   305       22   139   190   225   267   302
  8   159   201   248   297   338       23   146   191   229   272   302
  9   177   236   285   340   376       24   157   211   250   285   323
 10   134   182   220   260   296       25   132   185   237   286   331
 11   160   208   261   313   352       26   160   207   257   303   345
 12   143   188   220   273   314       27   169   216   261   295   333
 13   154   200   244   289   325       28   157   205   248   289   316
 14   171   221   270   326   358       29   137   180   219   258   291
 15   163   216   242   281   312       30   153   200   244   286   324
x_i1 = 8, x_i2 = 15, x_i3 = 22, x_i4 = 29, x_i5 = 36 days, i = 1, ..., 30.
Source: Gelfand et al. (1989)
While there are no "missing" data, techniques which were originally developed in the
context of "missing" data will be of use in exploring the hierarchical model:
First Stage: $Y_{ij} \sim N(\alpha_i + \beta_i x_{ij}, \sigma^2)$
Second Stage: $(\alpha_i, \beta_i)^T \sim N\{(\alpha_0, \beta_0)^T, \Sigma\}$,
where $i = 1, \ldots, 30$, $j = 1, \ldots, 5$ and $x_{ij}$ is the age in days of the ith rat for measurement j.
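The two-stage structure can be illustrated by simulating from it. The sketch below draws a rat-specific intercept and slope from a bivariate normal population distribution (second stage), then draws weights around each rat's own growth line (first stage). All parameter values and names, including alpha0 and beta0 for the population means, are hypothetical choices for illustration and are not estimates from the text.

```python
import math
import random

AGES = [8, 15, 22, 29, 36]  # measurement ages in days, from the table footnote

def simulate_rats(n_rats, alpha0, beta0, sd_alpha, sd_beta, corr, sigma, rng):
    """Draw weights from the two-stage model.

    Second stage: (alpha_i, beta_i) is bivariate normal with mean
    (alpha0, beta0); the covariance matrix is specified through two
    standard deviations and a correlation.
    First stage: Y_ij is normal with mean alpha_i + beta_i * x_ij.
    """
    rows = []
    for _ in range(n_rats):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Cholesky factor of the 2 x 2 covariance matrix, written out by hand.
        alpha_i = alpha0 + sd_alpha * z1
        beta_i = beta0 + sd_beta * (corr * z1 + math.sqrt(1.0 - corr ** 2) * z2)
        rows.append([rng.gauss(alpha_i + beta_i * x, sigma) for x in AGES])
    return rows

rng = random.Random(1)
# Illustrative values only.
weights = simulate_rats(30, 106.0, 6.2, 10.0, 0.5, 0.0, 6.0, rng)
print(weights[0])
```

Every observation is recorded, yet the rat-specific intercepts and slopes are never seen directly, which is why techniques developed for "missing" data carry over to this setting.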
B. Techniques
A variety of methods are available for the Bayesian or likelihood-based analysis of the
data sets listed in the previous section. In this manuscript, we will distinguish between two
types of methods: observed data and data augmentation methods. In Sections II and III,
the observed data methods will be considered. These methods are applied directly to the
likelihood or posterior of the observed data. As long as one can write down a likelihood
or posterior for the observed data, one can potentially use these techniques for statistical
inference.
[Diagram: observed data methods - ML Estimation, Laplace Expansion, Monte Carlo/Importance Sampling]
The most commonly used observed data method is maximum likelihood estimation. This
approach inherently specifies a normal approximation to the likelihood/posterior density.
The Laplace expansion approach allows for non-quadratic approximations to the
loglikelihood/logposterior. Techniques based on Monte Carlo/Importance Sampling yield iid
observations from the exact likelihood/posterior density.
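As a small illustration of the normal approximation implied by maximum likelihood, the sketch below uses a binomial likelihood. This is an example chosen here and does not come from the text. It compares the exact log-likelihood, measured relative to its maximum, with the quadratic obtained from the curvature at the maximum likelihood estimate. The function names and the counts x = 30, n = 100 are hypothetical.

```python
import math

def binomial_loglik(p, x, n):
    """Binomial log-likelihood in p, dropping the constant term."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)

def normal_approximation(x, n):
    """Return the MLE and the standard deviation implied by the curvature.

    Requires 0 < x < n so that the MLE lies inside (0, 1).
    """
    p_hat = x / n
    # Observed information: minus the second derivative of the
    # log-likelihood, evaluated at p_hat.
    info = x / p_hat ** 2 + (n - x) / (1.0 - p_hat) ** 2
    return p_hat, math.sqrt(1.0 / info)

x, n = 30, 100
p_hat, sd = normal_approximation(x, n)
for p in (0.25, 0.30, 0.35):
    exact = binomial_loglik(p, x, n) - binomial_loglik(p_hat, x, n)
    quad = -0.5 * ((p - p_hat) / sd) ** 2
    print(p, round(exact, 3), round(quad, 3))
```

Near the maximum the two columns agree closely. Away from it they separate, which is the situation the non-quadratic and sampling-based methods are meant to address.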
Sections IV, V and VI consider the data augmentation methods. From a classical point
of view, these data augmentation methods make use of the special "missing" data structure
of the problem. More generally, these methods rely on an augmentation of the data which
simplifies the likelihood/posterior.
[Diagram: data augmentation methods - EM, Louis EM, Poor Man's Data Augmentation, SIR, Data Augmentation, Gibbs Sampler]
The EM algorithm provides the mean of the normal approximation to the likelihood/posterior
density, while the Louis modification specifies the scale. The Poor Man's Data Augmentation
algorithm allows for a non-normal approximation to the likelihood/posterior density.
The Data Augmentation and Gibbs Sampler approaches are iterative algorithms which, under
certain regularity conditions, yield the likelihood/posterior. The SIR algorithm is a
noniterative algorithm based on importance sampling ideas.
References
Cochran, W.G. (1977). Sampling Techniques, New York: Wiley.
Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics, London: Chapman and Hall.
Dempster, A., Laird, N. and Rubin, D.B. (1977). "Maximum Likelihood From Incomplete Data
Via the EM Algorithm", Journal of the Royal Statistical Society, B, 39, 1-38.
Gelfand, A.E., Hills, S.E., Racine-Poon, A. and Smith, A.F.M. (1989). "Illustration of Bayesian
Inference in Normal Data Models Using Gibbs Sampling", Technical Report.
Gelfand, A.E. and Smith, A.F.M. (1990). "Sampling-Based Approaches to Calculating Marginal
Densities", Journal of the American Statistical Association, 85, 398-409.
Goodman, L.A. (1974). "Exploratory Latent Structure Analysis Using Both Identifiable and
Unidentifiable Models", Biometrika, 61, 215-231.
Haberman, S.J. (1979). Analysis of Qualitative Data, New York: Academic Press.
Kass, R.E. and Steffey, D. (1989). Journal of the American Statistical Association, 84, 717-726.
Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis With Missing Data, New York: Wiley.
Louis, T. (1982). "Finding the Observed Information Matrix Using the EM Algorithm",
Journal of the Royal Statistical Society, B, 44, 226-233.
Miller, R. (1980). Survival Analysis, New York: Wiley.
Morris, C. (1983). "Parametric Empirical Bayes Inference: Theory and Applications" (with
discussion), Journal of the American Statistical Association, 78, 47-65.
Ripley, B.D. (1987). Stochastic Simulation, New York: Wiley.