Statistics for Biology and Health
Series Editors: M. Gail, K. Krickeberg, J. Samet, A. Tsiatis, W. Wong

Statistics for Biology and Health
Bacchieri/Cioppa: Fundamentals of Clinical Research
Borchers/Buckland/Zucchini: Estimating Animal Abundance: Closed Populations
Burzykowski/Molenberghs/Buyse: The Evaluation of Surrogate Endpoints
Cook/Lawless: The Statistical Analysis of Recurrent Events
Duchateau/Janssen: The Frailty Model
Everitt/Rabe-Hesketh: Analyzing Medical Data Using S-PLUS
Ewens/Grant: Statistical Methods in Bioinformatics: An Introduction, 2nd ed.
Gentleman/Carey/Huber/Irizarry/Dudoit: Bioinformatics and Computational Biology Solutions Using R and Bioconductor
Hougaard: Analysis of Multivariate Survival Data
Keyfitz/Caswell: Applied Mathematical Demography, 3rd ed.
Klein/Moeschberger: Survival Analysis: Techniques for Censored and Truncated Data, 2nd ed.
Kleinbaum/Klein: Survival Analysis: A Self-Learning Text, 2nd ed.
Kleinbaum/Klein: Logistic Regression: A Self-Learning Text, 2nd ed.
Lange: Mathematical and Statistical Methods for Genetic Analysis, 2nd ed.
Manton/Singer/Suzman: Forecasting the Health of Elderly Populations
Martinussen/Scheike: Dynamic Regression Models for Survival Data
Moyé: Multiple Analyses in Clinical Trials: Fundamentals for Investigators
Nielsen: Statistical Methods in Molecular Evolution
O'Quigley: Proportional Hazards Regression
Parmigiani/Garrett/Irizarry/Zeger: The Analysis of Gene Expression Data: Methods and Software
Proschan/Lan/Wittes: Statistical Monitoring of Clinical Trials: A Unified Approach
Siegmund/Yakir: The Statistics of Gene Mapping
Simon/Korn/McShane/Radmacher/Wright/Zhao: Design and Analysis of DNA Microarray Investigations
Sorensen/Gianola: Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics
Stallard/Manton/Cohen: Forecasting Product Liability Claims: Epidemiology and Modeling in the Manville Asbestos Case
Sun: The Statistical Analysis of Interval-censored Failure Time Data
Therneau/Grambsch: Modeling Survival Data: Extending the Cox Model
Ting: Dose Finding in Drug Development
Vittinghoff/Glidden/Shiboski/McCulloch: Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models
Wu/Ma/Casella: Statistical Genetics of Quantitative Traits: Linkage, Map and QTL
Zhang/Singer: Recursive Partitioning in the Health Sciences
Zuur/Ieno/Smith: Analyzing Ecological Data

Richard J. Cook
Jerald F. Lawless
The Statistical Analysis of Recurrent Events

Richard J. Cook and Jerald F. Lawless
Dept. Statistics & Actuarial Science
University of Waterloo, Waterloo, Ontario
200 University Avenue W.
Waterloo N2L 3G1
Canada

Series Editors
M. Gail, National Cancer Institute, Rockville, MD 20892, USA
K. Krickeberg, Le Chatelet, F-63270 Manglieu, France
J. Samet, Department of Epidemiology, School of Public Health, Johns Hopkins University, 615 Wolfe Street, Baltimore, MD 21205-2103, USA
A. Tsiatis, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
W. Wong, Department of Statistics, Stanford University, Stanford, CA 94305-4065, USA

Library of Congress Control Number: 2007929451
ISBN 978-0-387-69809-0
e-ISBN 978-0-387-69810-6

Printed on acid-free paper.

© 2007 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

springer.com

To Joan and John Cook
To Jill, Kim, and Sarah

Preface

Recurrent event data arise in fields such as medicine and public health, business and industry, reliability, the social sciences, and insurance. The literature on the statistical analysis of recurrent events has grown rapidly over the past twenty years, and a variety of models and methods has been developed. This book provides a comprehensive treatment of the area.
We describe important models, explain their underlying assumptions and properties, consider settings where they are appropriate, and discuss in detail how to fit and base inferences on these models. Parametric, nonparametric, and semiparametric methods are covered. Many illustrative examples are given, most of which are taken from health or industrial settings.

This book is intended as a resource for persons interested in the modeling and analysis of recurrent events and as a text for a graduate course in statistics or biostatistics. We discuss results and models from stochastic processes in some detail, and have attempted to present the material in an accessible way with discussion of model formulation, estimation and inference, and numerous applications. The importance of model assessment is emphasized. Chapters are concluded with Problems and Supplements sections which give exercises as well as extensions to material in the text. An important feature of this book is the coverage of practical issues such as observation and subject-selection schemes, the planning of randomized experiments, incomplete data, and the prediction of future events. Areas needing further methodological development are also discussed.

Likelihood methods are emphasized as a basis for inference whenever possible. Estimating function theory is also used, especially for inference about marginal features when models are not fully specified. Appendix A provides a summary of relevant material on likelihood and estimating function methodology, but familiarity with statistical inference is assumed. Martingale representations are used for certain estimating functions, but we do not discuss asymptotic theory used to rigorously justify large sample results. Our approach is to indicate clearly the statistical basis of methodology without dwelling on regularity conditions and detailed proofs of asymptotic results.
Some background in survival analysis is beneficial, inasmuch as many methods for recurrent events are related to survival analysis and can be implemented with software for that area. Kalbfleisch and Prentice (2002) and Lawless (2003a) are references with a similar style of presentation to this book. Books which discuss models for recurrent event data include Cox and Lewis (1966), Cox and Isham (1980), Daley and Vere-Jones (1988), and other books on point processes. Andersen et al. (1993) provide a rigorous discussion of models and methods for the analysis of data arising from counting processes, and emphasize Markov processes. Therneau and Grambsch (2000) present methods for the analysis of recurrent event data along with applications using S-PLUS®, R and SAS. Nelson (2003) gives graphical procedures and simple methods for the analysis of recurrent events based on rate or mean functions. Other recent books which include some discussion of the analysis of recurrent event data include Hougaard (2000), Kalbfleisch and Prentice (2002), Martinussen and Scheike (2006), and Sun (2006). The present book goes beyond these treatments in the breadth of models addressed and in the attention paid to practical issues of design and analysis.

The data in examples are analyzed using S-PLUS, although identical code can be used in R (see www.r-project.org). In most cases there exist analogous procedures in SAS software. Datasets that are available to the public are listed in Appendix D and are posted at www.stats.uwaterloo.ca/cook-lawless/book.shtml along with sample code for S-PLUS or R and SAS.

Our interests in statistical methods for recurrent events have developed from working with several colleagues in various areas of research. We would like to acknowledge Nancy Heddle (McMaster University), Pierre Major (McMaster University), and Jeff Robinson (General Motors) for stimulating collaborations which have led to methodological development in this area.
We also wish to thank colleagues at GlaxoSmithKline Inc., Novartis Pharmaceuticals Inc., and Bayer Canada Inc. for permission to use the data from clinical trials in several examples.

We are grateful to the faculty, visiting fellows, graduate students, and staff at the University of Waterloo who help create a stimulating environment for research. In particular we would like to acknowledge collaborations involving recurrent events with Jean-Marie Boher, Bingshu Chen, Charmaine Dean, Daniel Fong, Marc Fredette, Joan Hu, Jack Kalbfleisch, Claude Nadeau, Edmund Ng, Wei Wei, Grace Yi, and Min Zhan. Mary Lou Dufton and Joan Hatton provided secretarial assistance in the preparation of this book, for which we are grateful. We would especially like to thank Ker-Ai Lee, whose expert statistical programming helped in the preparation of the examples, and who provides important support to our research.

Much of the work here was developed while the first author held an Investigator Award from the Canadian Institutes of Health Research and a Canada Research Chair in Statistical Methods for Health Research, and while the second author held an Industrial Research Chair co-sponsored by General Motors Canada and the Natural Sciences and Engineering Research Council of Canada. This support is gratefully acknowledged.

Finally we would like to thank our wives Alison (R.J.C.) and Liz (J.F.L.) for their patience and support during the preparation of this book.

University of Waterloo, December 2006
Richard Cook
Jerry Lawless

Glossary

The following is a summary of the notation used throughout this book.
• $I(A)$ is the indicator function, equaling 1 if $A$ is true and 0 otherwise
• $\Pr(A)$ is the probability of event $A$
• $E(\cdot)$ denotes expectation, $\operatorname{var}(\cdot)$ denotes variance, $\operatorname{cov}(\cdot)$ denotes covariance, $\operatorname{corr}(\cdot)$ denotes correlation, and $\operatorname{asvar}(\cdot)$ and $\operatorname{ascov}(\cdot)$ denote asymptotic variance and covariance, respectively
• $m_X(t) = E(\exp(Xt))$ is the moment generating function of $X$
• $\Gamma(a) = \int_0^\infty u^{a-1} \exp(-u)\, du$ is the gamma function, where $a > 0$
• $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the beta function, where $a > 0$ and $b > 0$
• $g(x) \sim o(x)$ means $g(x)/x \to 0$ as $x \to 0$
• The transpose of a matrix $A$ is $A'$
• Vectors are written in column form so, for example, $\theta = (\theta_1, \ldots, \theta_r)'$
• If $g(\theta) = (g_1(\theta), \ldots, g_k(\theta))'$ is a vector of functions, then $\partial g(\theta)/\partial \theta'$ is the $k \times r$ matrix with $(i,j)$ element $\partial g_i(\theta)/\partial \theta_j$
• $\prod_{[a,b]} \{1 + g(u)\, du\}$ is a product integral; see Section 2.1
• The integral $\int_a^b dG(u)$ is a Riemann–Stieltjes integral; see Section 2.1
• $L(\theta)$, $\ell(\theta)$, $U(\theta)$, $I(\theta)$, and $\mathcal{I}(\theta)$ represent the likelihood, log-likelihood, score, observed information, and expected information functions, respectively; see Appendix A
• $\hat{\theta}$ denotes an estimate of the parameter $\theta$
• If $\theta = (\theta_1', \theta_2')'$, then $\tilde{\theta}_1(\theta_2)$ is the profile likelihood estimate of $\theta_1$ for fixed $\theta_2$
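Finite-partition approximations can make the product-integral and Riemann–Stieltjes entries above concrete. The following Python sketch is only an illustration under stated assumptions. It is not the book's code, since the book's examples use S-PLUS/R. The helpers `partition`, `step_function`, `stieltjes_integral`, and `product_integral` are hypothetical names, and the sketch assumes an equally spaced partition with no two jumps of G falling in the same partition cell. When G is continuous (so that dG(u) = g(u)du), the product of the factors {1 + dG(u)} approaches exp{G(b) − G(a)}. When G is a step function, the product reduces to a finite product of (1 + jump) factors.

```python
import math
from typing import Callable, List, Tuple

Func = Callable[[float], float]


def partition(a: float, b: float, m: int) -> List[float]:
    """Equally spaced partition a = u_0 < u_1 < ... < u_m = b (an assumption of this sketch)."""
    return [a + (b - a) * i / m for i in range(m + 1)]


def step_function(jumps: List[Tuple[float, float]]) -> Func:
    """Hypothetical helper: right-continuous G(u) = sum of d_j over jumps (t_j, d_j) with t_j <= u."""
    def G(u: float) -> float:
        return sum(d for t, d in jumps if t <= u)
    return G


def stieltjes_integral(h: Func, G: Func, a: float, b: float, m: int = 100_000) -> float:
    """Riemann-Stieltjes sum: sum_i h(u_i) * [G(u_i) - G(u_{i-1})] over the partition."""
    u = partition(a, b, m)
    g = [G(x) for x in u]  # evaluate G once at each partition point
    return sum(h(u[i]) * (g[i] - g[i - 1]) for i in range(1, m + 1))


def product_integral(G: Func, a: float, b: float, m: int = 100_000) -> float:
    """Finite-product approximation: prod_i {1 + G(u_i) - G(u_{i-1})} over the partition.

    Assumes no two jumps of G share a partition cell; otherwise their factors
    would be merged into a single (1 + d_1 + d_2) term.
    """
    u = partition(a, b, m)
    g = [G(x) for x in u]
    result = 1.0
    for i in range(1, m + 1):
        result *= 1.0 + (g[i] - g[i - 1])
    return result


if __name__ == "__main__":
    # Continuous case: G(u) = -0.5 u, i.e. g(u) = -0.5; compare with exp(-0.5 * (2 - 0)).
    print(product_integral(lambda u: -0.5 * u, 0.0, 2.0), math.exp(-1.0))

    # Step case with hypothetical jumps at 0.3, 1.1, 1.7.
    G = step_function([(0.3, -0.2), (1.1, -0.25), (1.7, -0.5)])
    print(product_integral(G, 0.0, 2.0), stieltjes_integral(lambda u: 1.0, G, 0.0, 2.0))
```

For example, taking G(u) = −λu for a constant λ gives exp{−λ(b − a)}. With the hypothetical jumps −0.2, −0.25, and −0.5, the step-function product is 0.8 × 0.75 × 0.5 = 0.3, while ∫dG(u) = −0.95. The product therefore differs from exp(−0.95), and the two forms agree only when G is continuous.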