Estimation of accelerated failure time models with random effects

by

Yaqin Wang

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Major: Statistics

Program of Study Committee:
Kenneth J. Koehler, Major Professor
Song X Chen
Richard Evans
Heike Hofmann
Terry Therneau

Iowa State University
Ames, Iowa
2006

Copyright © Yaqin Wang, 2006. All rights reserved.

Recommended Citation: Wang, Yaqin, "Estimation of accelerated failure time models with random effects" (2006). Retrospective Theses and Dissertations. 3062. https://lib.dr.iastate.edu/rtd/3062

UMI Number: 3243544

TABLE OF CONTENTS

ABSTRACT
GENERAL INTRODUCTION
  1 Introduction
  2 Cox Proportional Hazards Model with Random Effects
  3 Accelerated Failure Time Models
    3.1 AFT Models
    3.2 Inference for AFT Models
    3.3 AFT Models with Shared Frailty
    3.4 AFT Models with Random Effects
  4 Dissertation Organization
  5 References for General Introduction
ESTIMATION OF ACCELERATED FAILURE TIME MODELS WITH RANDOM EFFECTS
  Abstract
  1 Introduction
  2 Accelerated Failure Time Models with Random Effects
    2.1 AFT Models with Shared Frailty
    2.2 AFT Models with Random Effects
  3 Estimation
    3.1 Approximate Likelihood
    3.2 Asymptotic Properties of Laplace-Based Estimation
      3.2.1 Consistency of the Laplace-Based Estimator
      3.2.2 Asymptotic Normality
    3.3 Estimation
  4 Simulation Studies
    4.1 Description of Simulation I
    4.2 Results of Simulation I
    4.3 Description of Simulation II
    4.4 Results of Simulation II
    4.5 Approximate Grouped Jackknife Estimator
  5 Discussion
  6 References
  Appendix 1 The Accuracy of the Laplace Approximation
  Appendix 2 Programs for AFT Models with Random Effects
    2.1 Algorithm Description
    2.2 Algorithm Testing
AFT MODELS WITH RANDOM EFFECTS FOR CORRELATED SURVIVAL DATA AND AN APPLICATION TO BREAST CANCER FAMILY DATA
  Abstract
  1 Introduction
  2 Minnesota Breast Cancer Family Studies
    2.1 Minnesota Breast Cancer Family Resource
    2.2 Kinship
  3 Mixed Effects Cox Models
  4 Modeling the Breast Cancer Data Using Mixed Effects Cox Models
  5 AFT Models with Random Effects
  6 Modeling the Breast Cancer Data Using AFT Models with Random Effects
  7 Discussion
  8 References
GENERAL CONCLUSIONS

ABSTRACT

Correlated survival data with possible censoring are frequently encountered in survival analysis.
This includes multi-center studies where subjects are clustered by clinical or other environmental factors that influence expected survival time, studies where times to several different events are monitored on each subject, and studies using groups of genetically related subjects. To analyze such data, we propose accelerated failure time (AFT) models based on lognormal frailties. AFT models provide a linear relationship between the log of the failure time and covariates that affect the expected time to failure by contracting or expanding the time scale. These models account for within-cluster association by incorporating random effects with dependence structures that may be functions of unknown covariance parameters. They can be applied to right-, left-, or interval-censored survival data. To estimate model parameters, we consider an approximate maximum likelihood estimation procedure derived from the Laplace approximation. This avoids the computationally intensive methods needed to evaluate the exact log-likelihood, such as MCMC methods or numerical integration, which are not feasible for large data sets. Asymptotic properties of the proposed estimators are established, and small-sample performance is evaluated through several simulation studies. The fixed effects parameters are estimated well, with little absolute bias. Asymptotic formulas tend to underestimate the standard errors for small cluster sizes. Reliable estimates depend on both the number of clusters and cluster size. The methodology is used to analyze data taken from the Minnesota Breast Cancer Family Resource to examine age at onset of breast cancer for women in 426 families.

GENERAL INTRODUCTION

1 Introduction

There are two important classes of regression models for survival data: Cox proportional hazards (PH) models (Cox, 1972) and accelerated failure time (AFT) models (Collett, 2003).
Cox proportional hazards models relate the hazard function to covariates, while AFT models specify a direct relationship between the failure time and covariates. Cox models have been extensively applied in medical research. AFT models are especially useful in industrial applications in which failure is accelerated by thermal, high-voltage or other factors. The theme of this dissertation is the application of accelerated failure time models to correlated survival data.

Traditional applications and development of the proportional hazards and AFT models have relied on the assumption of independent responses from the monitored units that are subject to failure. Correlated survival data with possible censoring, however, are frequently encountered in survival analysis, and models for correlated survival data are receiving increasing attention. Correlated data may arise from multiple observations on the same individuals, for instance, recurrent infections in clinical trials. The lack of independence also appears when observations are clustered. For example, in a multi-center study of kidney transplant survival (Lambert et al., 2004), survival times of patients from the same transplant center were associated, since the transplants might be carried out by the same surgical team. Correlated survival times may also arise when genetically or socially related subjects, such as family members or classmates, are followed until some specific event occurs. Traditional methods of estimation that treat observations as independent are inappropriate for such data.

Various methods have been developed for analysis of correlated observations. One basic approach introduces random effects into models to induce correlations. In survival analysis, such random effects models are commonly referred to as frailty models.
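How a shared random effect induces correlation can be illustrated with a small simulation. The sketch below is not taken from the dissertation: it assumes a deliberately simple setting in which the log failure time of member j of cluster i is mu + b_i + e_ij, with a normal cluster effect b_i and independent normal errors e_ij. The function names, the intercept, and the variance values are all hypothetical choices made for the illustration. Under these assumptions, the correlation between the two log failure times in a cluster should be close to sigma_b^2 / (sigma_b^2 + sigma_e^2).

```python
import math
import random


def simulate_pairs(n_clusters, sigma_b, sigma_e, seed=0):
    """Simulate log failure times for two members of each cluster.

    Assumed model (illustration only): log T_ij = mu + b_i + e_ij,
    where b_i is shared by both members of cluster i.
    """
    rng = random.Random(seed)
    mu = 1.0  # arbitrary intercept chosen for the illustration
    pairs = []
    for _ in range(n_clusters):
        b = rng.gauss(0.0, sigma_b)  # random effect shared within the cluster
        first = mu + b + rng.gauss(0.0, sigma_e)
        second = mu + b + rng.gauss(0.0, sigma_e)
        pairs.append((first, second))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation between the two members of each pair."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


sigma_b, sigma_e = 1.0, 1.0  # assumed standard deviations
observed = pearson(simulate_pairs(5000, sigma_b, sigma_e))
theoretical = sigma_b**2 / (sigma_b**2 + sigma_e**2)
print(f"observed correlation: {observed:.3f}, theoretical: {theoretical:.3f}")
```

Setting `sigma_b` to zero removes the shared effect, and the within-cluster correlation should then be close to zero, which corresponds to the independence assumption of the traditional models.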
Another approach is to use estimation methods developed for independent observations, such as partial likelihood estimation, and then adjust the covariance matrix of the resulting estimators to reflect the correlations. Robust or “sandwich” covariance estimators, or appropriate resampling methods, can be used to obtain consistent estimates of covariance matrices and standard errors. While this approach provides appropriate large-sample inferences, the estimators tend to be inefficient because information provided by the correlations among the survival times is not fully incorporated into the estimating equations. This is a special case of generalized estimating equations. It has the advantage of not requiring a specific model for the joint distribution of the correlated responses, which may be difficult to assess for small or moderate samples. Estimating equations that incorporate information about the correlation structure of the observations can be developed without completely specifying a model for the joint distribution of the observations, and such equations can improve the efficiency of estimators.

By completely specifying joint distributions for correlated observations, maximum likelihood, maximum partial likelihood, or Bayesian estimation methods can be used. Although efficiency may be gained, one practical problem with this approach is that the derivation of the marginal likelihood, or marginal partial likelihood, for the observed data may be intractable. Numerical integration is usually not feasible, and marginal likelihoods, or marginal partial likelihoods, are either evaluated with simulation techniques or approximated. The former may be quite expensive computationally, and the latter is an approximation that may reduce efficiency of estimation.

The concept of frailty initially was used to explain variability due to heterogeneity of members of a population in the context of mortality studies (Vaupel et al., 1979).
Frailties are basically random effects in survival models. Hougaard (1986) examined a shared frailty model with Weibull hazards. Whitmore and Lee (1991) discussed an inverse Gaussian shared frailty model with constant individual hazards. A shared frailty describes some common effects on the members of a cluster. The shared frailty model has gained broad acceptance over the last few years for clustered survival data.

When there are dependencies among observed survival times, traditional partial likelihood estimation for the Cox proportional hazards model that assumes independent responses may not provide reliable inferences. Although parameter estimates are generally consistent, ignoring the dependence of correlated survival data adversely affects the precision of the parameter estimates (Wei, Lin, and Weissfeld, 1989). More importantly, the estimated variances of parameter estimates are biased. Therefore, the Cox proportional hazards model with random effects was proposed to account for such dependence. Many approaches have been developed to estimate parameters in the Cox proportional hazards model with random effects. Next, we briefly review several estimation procedures for this model.

2 Cox Proportional Hazards Model with Random Effects

Let $T_{ij}^*$ denote the event time or survival time for the $j$th ($j = 1, \ldots, n_i$) subject from the $i$th cluster ($i = 1, \ldots, N$), and let $C_{ij}^*$ represent the censoring time. Then the observed time is $T_{ij} = \min(T_{ij}^*, C_{ij}^*)$, and the indicator $\delta_{ij} = I(T_{ij}^* \le C_{ij}^*)$ is 1 if the response time is uncensored and 0 if the response time is censored. Given the random effects, survival times are assumed to be conditionally independent.
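The observation scheme just defined can be sketched in a few lines of Python. This is a minimal illustration rather than code from the dissertation: the function name `observe`, the cluster sizes, and the exponential event and censoring distributions are assumptions chosen only to produce example data.

```python
import random


def observe(event_time, censor_time):
    """Return (T_ij, delta_ij) for one subject under right censoring.

    T_ij is the smaller of the event and censoring times; delta_ij is 1
    when the event is observed (event_time <= censor_time) and 0 otherwise.
    """
    observed_time = min(event_time, censor_time)
    delta = 1 if event_time <= censor_time else 0
    return observed_time, delta


# Hypothetical example: N = 2 clusters with n_1 = 3 and n_2 = 2 subjects.
rng = random.Random(1)
cluster_sizes = [3, 2]
data = []
for i, n_i in enumerate(cluster_sizes, start=1):
    for j in range(1, n_i + 1):
        t_star = rng.expovariate(1.0)  # assumed event-time distribution
        c_star = rng.expovariate(0.5)  # assumed censoring-time distribution
        t_ij, delta_ij = observe(t_star, c_star)
        data.append((i, j, t_ij, delta_ij))

for i, j, t_ij, delta_ij in data:
    print(f"cluster {i}, subject {j}: T = {t_ij:.3f}, delta = {delta_ij}")
```

Each record carries the cluster index i and subject index j alongside the pair (T_ij, delta_ij), which is the form of data the models in this section take as input.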
The hazard function for the $j$th subject from the $i$th cluster of a shared frailty model is given by

$$\lambda_{ij}(t) = \lambda_0(t)\,\omega_i \exp(x_{ij}'\beta) \tag{1}$$

where $\lambda_0$ is the baseline hazard function, $\beta$ is a vector of fixed effects corresponding to covariate vector $x_{ij}$, and the $\omega_i$ are independent, identically distributed random variables with some common density function.

Shared frailty models have some limitations. For example, they cannot accommodate the situation where the frailty is not the same for all the individuals in a cluster. In order to account for more complicated frailty structures, the shared frailty model needs to be extended. The hazard function for a more general mixed-effects proportional hazards model can be defined as

$$\lambda_{ij}(t) = \lambda_0(t) \exp(x_{ij}'\beta + z_{ij}'b_i) \tag{2}$$

where $b_i$ is a vector of random cluster effects associated with individual vectors of covariates $z_{ij}$. The random effects $b_i$ are assumed to be distributed according to some distribution with mean 0 and covariance matrix $D = D(\theta)$, where $\theta$ is a vector of unknown parameters.

Several approaches have been proposed to estimate the parameters of model (2). McGilchrist and Aisbett (1991) and McGilchrist (1993) used a penalized partial likelihood approach to estimate the fixed effects and an approximate residual maximum likelihood (REML) approach to estimate the variance-covariance parameters based on a normal approximation to the distribution of the residuals. They only considered the special case where the random effects are normally distributed with mean zero and diagonal variance-covariance matrix $D$.

In an animal-breeding context, Ducrocq and Casella (1996) introduced a Bayesian approach to estimate the parameters of a special form of model (2) with Weibull baseline hazards and one set of random sire effects with either log-gamma or Gaussian distributions. For those models, the sire effects can be integrated out of the posterior distribution algebraically.
The marginal posterior distribution for the dispersion parameter cannot be obtained algebraically, and a Laplace approximation was considered. Simulation results showed that the estimation procedure performed well when there were few sires and many daughters per sire, but did not always perform well when there were many sires with only a few daughters per sire.

Ripatti and Palmgren (2000) proposed an approximate marginal likelihood approach for a multivariate lognormal frailty model based on a penalized partial likelihood. Their approach allows for more complex frailty dependence structures. The random effects are assumed to be log-normally distributed with positive definite variance-covariance matrix $D(\theta)$. The Laplace approximation was applied to obtain an approximate marginal likelihood because the integral cannot be evaluated analytically. This leads to estimating equations based on a penalized partial likelihood. The estimating procedure is simple, but it tends to underestimate the variance of the estimated fixed effects parameters.

EM-algorithm-based estimation approaches have been applied by several authors. Ripatti, Larsen and Palmgren (2002) developed an estimation procedure based on a Monte Carlo EM algorithm with the aim of obtaining maximum marginal likelihood estimates rather than the approximate marginal likelihood estimates of Ripatti and Palmgren (2000). The frailties are treated as missing data and imputed in the E-step. The expectation in the E-step cannot be solved analytically, so it is approximated by sampling from the conditional distribution of the frailties given the observed data. The M-step maximizes the complete data log-likelihood using the imputed frailties as if they were observed. This procedure alternates