Statistics for Biology and Health
Series Editors
M. Gail, K. Krickeberg, J. Samet, A. Tsiatis, W. Wong
Statistics for Biology and Health
Bacchieri/Cioppa: Fundamentals of Clinical Research
Borchers/Buckland/Zucchini: Estimating Animal Abundance: Closed Populations
Burzykowski/Molenberghs/Buyse: The Evaluation of Surrogate Endpoints
Cook/Lawless: The Statistical Analysis of Recurrent Events
Duchateau/Janssen: The Frailty Model
Everitt/Rabe-Hesketh: Analyzing Medical Data Using S-PLUS
Ewens/Grant: Statistical Methods in Bioinformatics: An Introduction, 2nd ed.
Gentleman/Carey/Huber/Irizarry/Dudoit: Bioinformatics and Computational Biology Solutions Using R and Bioconductor
Hougaard: Analysis of Multivariate Survival Data
Keyfitz/Caswell: Applied Mathematical Demography, 3rd ed.
Klein/Moeschberger: Survival Analysis: Techniques for Censored and Truncated Data, 2nd ed.
Kleinbaum/Klein: Survival Analysis: A Self-Learning Text, 2nd ed.
Kleinbaum/Klein: Logistic Regression: A Self-Learning Text, 2nd ed.
Lange: Mathematical and Statistical Methods for Genetic Analysis, 2nd ed.
Manton/Singer/Suzman: Forecasting the Health of Elderly Populations
Martinussen/Scheike: Dynamic Regression Models for Survival Data
Moyé: Multiple Analyses in Clinical Trials: Fundamentals for Investigators
Nielsen: Statistical Methods in Molecular Evolution
O'Quigley: Proportional Hazards Regression
Parmigiani/Garrett/Irizarry/Zeger: The Analysis of Gene Expression Data: Methods and Software
Proschan/Lan/Wittes: Statistical Monitoring of Clinical Trials: A Unified Approach
Siegmund/Yakir: The Statistics of Gene Mapping
Simon/Korn/McShane/Radmacher/Wright/Zhao: Design and Analysis of DNA Microarray Investigations
Sorensen/Gianola: Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics
Stallard/Manton/Cohen: Forecasting Product Liability Claims: Epidemiology and Modeling in the Manville Asbestos Case
Sun: The Statistical Analysis of Interval-censored Failure Time Data
Therneau/Grambsch: Modeling Survival Data: Extending the Cox Model
Ting: Dose Finding in Drug Development
Vittinghoff/Glidden/Shiboski/McCulloch: Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models
Wu/Ma/Casella: Statistical Genetics of Quantitative Traits: Linkage, Map and QTL
Zhang/Singer: Recursive Partitioning in the Health Sciences
Zuur/Ieno/Smith: Analyzing Ecological Data
Richard J. Cook
Jerald F. Lawless
The Statistical Analysis
of Recurrent Events
Richard J. Cook
Dept. Statistics & Actuarial Science
University of Waterloo, Waterloo, Ontario
200 University Avenue W.
Waterloo N2L 3G1
Canada
rjcook@uwaterloo.ca

Jerald F. Lawless
Dept. Statistics & Actuarial Science
University of Waterloo, Waterloo, Ontario
200 University Avenue W.
Waterloo N2L 3G1
Canada
jlawless@uwaterloo.ca
Series Editors
M. Gail
National Cancer Institute
Rockville, MD 20892
USA

K. Krickeberg
Le Chatelet
F-63270 Manglieu
France

J. Samet
Department of Epidemiology
School of Public Health
Johns Hopkins University
615 Wolfe Street
Baltimore, MD 21205-2103
USA

A. Tsiatis
Department of Statistics
North Carolina State University
Raleigh, NC 27695
USA

W. Wong
Department of Statistics
Stanford University
Stanford, CA 94305-4065
USA
Library of Congress Control Number: 2007929451
ISBN 978-0-387-69809-0 e-ISBN 978-0-387-69810-6
Printed on acid-free paper.
© 2007 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring
Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or
scholarly analysis. Use in connection with any form of information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar methodology now known
or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or
not they are subject to proprietary rights.
9 8 7 6 5 4 3 2 1
springer.com
To Joan and John Cook
To Jill, Kim, and Sarah
Preface
Recurrent event data arise in fields such as medicine and public health, business
and industry, reliability, the social sciences, and insurance. The literature
on the statistical analysis of recurrent events has grown rapidly over the past
twenty years and a variety of models and methods has been developed. This
book provides a comprehensive treatment of the area. We describe impor-
tant models, explain their underlying assumptions and properties, consider
settings where they are appropriate, and discuss in detail how to fit and base
inferences on these models. Parametric, nonparametric, and semiparametric
methods are covered. Many illustrative examples are given, most of which are
taken from health or industrial settings.
This book is intended as a resource for persons interested in the modeling
and analysis of recurrent events and as a text for a graduate course in statistics
or biostatistics. We discuss results and models from stochastic processes in
some detail, and have attempted to present the material in an accessible way
with discussion of model formulation, estimation and inference, and numerous
applications. The importance of model assessment is emphasized. Chapters are
concluded with Problems and Supplements sections which give exercises as
well as extensions to material in the text. An important feature of this book
is the coverage of practical issues such as observation and subject-selection
schemes, the planning of randomized experiments, incomplete data, and the
prediction of future events. Areas needing further methodological development
are also discussed.
Likelihood methods are emphasized as a basis for inference whenever
possible. Estimating function theory is also used, especially for inference about
marginal features when models are not fully specified. Appendix A provides a
summary of relevant material on likelihood and estimating function method-
ology, but familiarity with statistical inference is assumed. Martingale repre-
sentations are used for certain estimating functions, but we do not discuss
asymptotic theory used to rigorously justify large sample results. Our ap-
proach is to indicate clearly the statistical basis of methodology without
dwelling on regularity conditions and detailed proofs of asymptotic results.
Some background in survival analysis is beneficial, inasmuch as many
methods for recurrent events are related to survival analysis and can be im-
plemented with software for that area. Kalbfleisch and Prentice (2002) and
Lawless (2003a) are references with a similar style of presentation to this
book. Books which discuss models for recurrent event data include Cox and
Lewis (1966), Cox and Isham (1980), Daley and Vere-Jones (1988), and other
books on point processes. Andersen et al. (1993) provide a rigorous discus-
sion of models and methods for the analysis of data arising from counting
processes, and emphasize Markov processes. Therneau and Grambsch (2000)
present methods for the analysis of recurrent event data along with applications
using S-PLUS®, R and SAS. Nelson (2003) gives graphical procedures
and simple methods for the analysis of recurrent events based on rate or mean
functions. Other recent books which include some discussion of the analysis
of recurrent event data include Hougaard (2000), Kalbfleisch and Prentice
(2002), Martinussen and Scheike (2006), and Sun (2006). The present book
goes beyond these treatments in the breadth of models addressed and in the
attention paid to practical issues of design and analysis.
The data in examples are analyzed using S-PLUS, although identical code
can be used in R (see www.r-project.org). In most cases there exist analo-
gous procedures in SAS software. Datasets that are available to the public
are listed in Appendix D and are posted at www.stats.uwaterloo.ca/cook-
lawless/book.shtml along with sample code for S-PLUS or R and SAS.
Our interests in statistical methods for recurrent events have developed
from working with several colleagues in various areas of research. We would
like to acknowledge Nancy Heddle (McMaster University), Pierre Major (McMaster
University), and Jeff Robinson (General Motors) for stimulating collaborations
which have led to methodological development in this area. We
also wish to thank colleagues at GlaxoSmithKline Inc., Novartis Pharmaceuticals
Inc., and Bayer Canada Inc. for permission to use the data from clinical
trials in several examples.
We are grateful to the faculty, visiting fellows, graduate students, and
staff at the University of Waterloo who help create a stimulating environment
for research. In particular we would like to acknowledge collaborations in-
volving recurrent events with Jean-Marie Boher, Bingshu Chen, Charmaine
Dean, Daniel Fong, Marc Fredette, Joan Hu, Jack Kalbfleisch, Claude Nadeau,
Edmund Ng, Wei Wei, Grace Yi, and Min Zhan. Mary Lou Dufton and Joan
Hatton provided secretarial assistance in the preparation of this book, for
which we are grateful. We would especially like to thank Ker-Ai Lee, whose
expert statistical programming helped in the preparation of the examples, and
who provides important support to our research.
Much of the work here was developed while the first author held an Investigator
Award from the Canadian Institutes of Health Research and a Canada
Research Chair in Statistical Methods for Health Research, and while the
second author held an Industrial Research Chair co-sponsored by General
Motors Canada and the Natural Sciences and Engineering Research Council
of Canada. This support is gratefully acknowledged.
Finally we would like to thank our wives Alison (R.J.C.) and Liz (J.F.L.)
for their patience and support during the preparation of this book.
University of Waterloo Richard Cook
December 2006 Jerry Lawless
Glossary
The following is a summary of the notation used throughout this book.
• I(A) is the indicator function, equaling 1 if A is true and 0 otherwise
• Pr(A) is the probability of event A
• E(·) denotes expectation, var(·) denotes variance, cov(·) denotes covari-
ance, corr(·) denotes correlation, asvar(·) and ascov(·) denote asymptotic
variance and covariance, respectively
• $m_X(t) = E(\exp(Xt))$ is the moment generating function of $X$
• $\Gamma(a) = \int_0^\infty u^{a-1} \exp(-u)\,du$ is the gamma function, where $a > 0$
• $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the beta function, where $a > 0$ and $b > 0$
• $g(x) \sim o(x)$ means $g(x)/x \to 0$ as $x \to 0$
• The transpose of a matrix $A$ is $A'$
• Vectors are written in column form so, for example, $\theta = (\theta_1, \ldots, \theta_r)'$
• If $g(\theta) = (g_1(\theta), \ldots, g_k(\theta))'$ is a vector of functions, then $\partial g(\theta)/\partial\theta'$ is the
$k \times r$ matrix with $(i,j)$ element $\partial g_i(\theta)/\partial\theta_j$
• $\prod_{[a,b]} \{1 + g(u)\,du\}$ is a product integral; see Section 2.1
• The integral $\int_a^b dG(u)$ is a Riemann–Stieltjes integral; see Section 2.1
• $L(\theta)$, $\ell(\theta)$, $U(\theta)$, $I(\theta)$, and $\mathcal{I}(\theta)$ represent the likelihood, log-likelihood,
score, observed information, and expected information functions, respectively;
see Appendix A
• $\hat{\theta}$ denotes an estimate of the parameter $\theta$
• If $\theta = (\theta_1', \theta_2')'$, then $\tilde{\theta}_1(\theta_2)$ is the profile likelihood estimate of $\theta_1$ for fixed
$\theta_2$
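As a rough numerical illustration of the beta function and product integral notation above, the following Python sketch may help. It is not taken from the book, whose examples use S-PLUS and R. The function names beta and product_integral are hypothetical, and the finite-product approximation relies on the standard fact that, for a continuous integrand g, the product integral over [a,b] equals the exponential of the ordinary integral of g over [a,b].

```python
import math


def beta(a: float, b: float) -> float:
    """Beta function via B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def product_integral(g, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the product integral of {1 + g(u) du} over [a, b].

    The interval is split into n equal pieces of width du, and the factors
    1 + g(u) * du are multiplied together, with g evaluated at midpoints.
    """
    du = (b - a) / n
    result = 1.0
    for k in range(n):
        u = a + (k + 0.5) * du
        result *= 1.0 + g(u) * du
    return result


if __name__ == "__main__":
    # B(2, 3) = Gamma(2) * Gamma(3) / Gamma(5) = 1 * 2 / 24
    print(beta(2.0, 3.0))
    # With a constant g(u) = -0.5 on [0, 2], the product integral should be
    # close to exp(-1), the exponential of the ordinary integral of g.
    print(product_integral(lambda u: -0.5, 0.0, 2.0), math.exp(-1.0))
```

Increasing n brings the finite product closer to the limiting value. The midpoint evaluation is one arbitrary choice among several that give the same limit for continuous g.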