ebook img

Bayesian Semiparametric Estimation of Cancer-specific Age-at-onset Penetrance with Application to Li-Fraumeni Syndrome PDF

1.2 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Bayesian Semiparametric Estimation of Cancer-specific Age-at-onset Penetrance with Application to Li-Fraumeni Syndrome

Bayesian Semiparametric Estimation of Cancer-specific Age-at-onset Penetrance with Application to Li-Fraumeni Syndrome 7 1 0 Seung Jun Shin1, Louise C. Strong2, Jasmina Bojadzieva2, 2 Wenyi Wang2, and Ying Yuan2 n a J Korea University1 and The University of Texas MD Anderson Cancer Center2 0 2 ] Abstract P A Penetrance, which plays a key role in genetic research, is defined as the proportion of individuals t. withthegeneticvariants(i.e.,genotype)thatcauseaparticulartraitandwhohaveclinicalsymptoms a t ofthetrait(i.e.,phenotype). WeproposeaBayesiansemiparametricapproachtoestimatethecancer- s [ specificage-at-onsetpenetranceinthepresenceofthecompetingriskofmultiplecancers. Weemploy 2 a Bayesian semiparametric competing risk model to model the duration until individuals in a high- v 8 riskgroupdevelopdifferentcancers,andaccommodatefamilydatausingfamily-wiselikelihoods. We 5 tackletheascertainmentbiasarisingwhenfamilydataarecollectedthroughprobandsinahigh-risk 5 1 populationinwhichdiseasecasesaremorelikelytobeobserved. Weapplytheproposedmethodtoa 0 . cohortof186familieswithLi-Fraumenisyndromeidentifiedthroughprobandswithsarcomatreated 1 0 at MD Anderson Cancer Center from 1944 to 1982. 7 1 : Keyword: cancer specific age-at-onset penetrance, competing risk, gamma frailty model, family-wise v i likelihood, Li-Fraumeni syndrome X r a 1 Introduction The Li-Fraumeni syndrome (LFS) is a rare disorder that substantially increases the risk of developing severalcancertypes,particularlyinchildrenandyoungadults. Itischaracterizedbyautosomaldominant mutation inheritance with frequent occurrence of several cancer types: soft tissue/bone sarcoma, breast cancer, lung cancer, and other types of cancer that are grouped together as “other cancers” (Nichols et al.; 2001; Birch et al.; 2001). A majority of LFS is caused by germline mutations in the TP53 tumor suppressor gene (Malkin et al.; 1990; Srivastava et al.; 1990). 1 The LFS data that motivate our work are family data collected through patients diagnosed with pediatric sarcoma (i.e., probands) who were treated at MD Anderson Cancer Center from 1944 to 1982 and their extended kindred. The data consist of 186 families, with a total of 3686 subjects. The size of thefamiliesrangesfrom3to717, withthemedianat7. Thisdatasetisthelongestfollowed-upcohortin theworld(followedupfor20-50years), andamongthelargestcollectionofTP53mutationcarriersinall cohortsthatareavailableforLFS.ConsideringtheprevalenceofTP53mutationsinageneralpopulation isaslowas0.0001to0.003,thisdatasetprovidesaspeciallyenrichedcollectionofTP53mutations,which then allow us to characterize its effect on a diverse cancer outcomes. For each subject, the duration until he/she develops cancer is recorded as the primary endpoint. Although it is possible for LFS patients to experience multiple cancers during their lifetime, here, we focus on only the time to the first primary cancersinceonlyafewpatientsrepresentedinthedatabaseexperiencedmultipleprimarycancers. Table 1 shows the cancer-specific summaries for the LFS data. Further descriptions of the data are provided by Lustbader et al. (1992), Strong et al. (1992), and Hwang et al. (2003). The primary objective here is to estimate the cancer-specific age-at-onset penetrance as a measure of theriskofexperiencingaspecificcancerforapersonwithaspecificgenotype(i.e.,TP53mutationstatus). Penetrance, whichplaysacrucialroleingeneticresearch, isdefinedastheproportionofindividualswith the genetic variants (i.e., genotype) that cause a particular trait who also have clinical symptoms of the trait (i.e., phenotype). When the clinical traits of interest are age-dependent (e.g., cancers), it is often more desirable to estimate the age-at-onset penetrance, defined as the probability of disease onset by a certain age, while adjusting for additional covariates if necessary. For the LFS study, the age-at-onset penetrance is defined as the conditional probability of having LFS-related cancers by a certain age given a certain TP53 mutation status. Cox proportional hazard regression models (Gauderman and Faucett; 1997; Wu et al.; 2010, among many others) have been most widely used for this task. Other approaches have included nonparametric estimation Wang et al. (2007) and parametric estimation based on logistic regression (Abel et al.; 1990) or a Weibull model (Hashemian et al.; 2009). Estimatingtheage-at-onsetpenetrancefortheLFSdataischallengingforseveralreasons. First,LFS Table 1: Frequency table for LFS data. Gender Genotype Breast Sarcoma Others Censored Total Unknown 0 11 130 1275 1416 Male Wildtype 0 76 30 295 401 Mutation 0 12 27 9 48 Subtotal 0 99 187 1579 1865 Unknown 39 4 62 1204 1309 Female Wildtype 8 96 17 343 464 Mutation 19 12 7 10 48 Subtotal 66 112 86 1557 1821 Total 66 211 273 3136 3686 2 involves multiple types of cancer, and subjects have simultaneous competing risks of developing multiple typesofcancer. Chatterjeeetal.(2003)proposedapenetranceestimationmethodunderacompetingrisk frameworkforakin-cohortdesign. However,theirmethodisnotdirectlyapplicableifthepedigreesizeis large and/or there is additional genetic information from relatives. Gorfine and Hsu (2011) and Gorfine et al. (2014) proposed frailty-based competing risk models for family data, assuming that genotypes are completely observed for all family members, which is not the case for the LFS data. Second, the genotype (i.e., TP53 mutation status) is not measured for the majority (about 74%) of subjects and the LFS data are clustered in the form of families. Accommodating the missing data and accounting for family or pedigree data structure are statistically and computationally challenging. As shown later, to efficiently utilize the observed genotype data nested in the family structure, we need to marginalize the likelihood over (or integrate out) all possible genotypes for subjects with missing genotype information, and meanwhile take into account the available genotypes in the family under the given pedigree structure. Third, the LFS data are not a random sample, but have been collected through probands diagnosed with sarcoma at young ages. That is, the data oversampled sarcoma patients. Such a sampling scheme inevitablycreatesbias,knownasascertainment bias,andshouldbeproperlyadjustedtoobtainunbiased results. Several likelihood-calibrated models have been developed to correct the ascertainment bias, including the retrospective model (Kraft and Thomas; 2000), the conditional-on-ascertainment variable model (Ewens and Shute; 1986; Pfeiffer et al.; 2001), and the ascertainment-corrected joint model (Kraft and Thomas; 2000; Iversen and Chen; 2005), among others. Toaddressthesechallenges,inthisarticle,wedevelopaBayesiansemiparametricapproachtoestimate the cancer-specific age-at-onset penetrance in the presence of the competing risks of developing multiple cancers. WeemployaBayesiansemiparametriccompetingriskmodeltomodelthetimetodifferenttypes of cancer and introduce the family-wise likelihood to minimize information loss from missing genotypes andharnesstheinformationcontainedinthepedigreestructure. Weemploythepeelingalgorithm(Elston and Stewart; 1971) to evaluate the family-wise likelihood, and utilize the ascertainment-corrected joint model (Kraft and Thomas, 2000) to correct the ascertainment bias. The rest of the article is organized as follows. In Section 2, we define the cancer-specific age-at-onset penetrance and describe our Bayesian semiparametric competing risk model including details about the family-wiselikelihoodandtheascertainmentbiascorrection. InSection3, weprovideanalgorithmtofit the models and carry out a simulation study in Section 4. We apply the proposed methodology to the LFS data in Section 5. Discussions follow in Section 6. 3 2 Model 2.1 Cancer-specific Age-at-onset Penetrance Let G denote a subject’s genotype, and X denote the baseline covariates (e.g., gender). As LFS is autosomal dominant, let G = 1 denote genotype Aa or AA, and G = 0 denote genotype aa, where A and a denote the (minor) mutated and wildtype alleles, respectively. Suppose that K types of cancer are under consideration and compete against each other such that the occurrence of one type of cancer censors the other types of cancer. Let T denote the time to the kth type of cancer, k = 1,...,K, and k define T =min T and Y =min{T,C}, where C is a conditional random censoring time given k∈{1,···,K} k G and X, i.e., T⊥C|G,X. Let D denote the cancer type indicator, with D = k if T = T < C (i.e., k the kth type of cancer that occurs); otherwise, D = 0 (i.e., censored observation). The actual observed time-to-event data are H =(Y,D). Traditionally, when analyzing subjects at risk of developing a single disease, the age-at-onset pene- trance is defined as the probability of having the disease at a certain age given a certain genotype. In ordertostudyLFS,wheresubjectssimultaneouslyhavethe(competing)riskofdevelopingmultipletypes ofcancer,thisstandarddefinitionmustbeextended. Borrowingideasfromthecompetingriskliterature, we define the kth cancer-specific age-at-onset penetrance, denoted by q (t|G,X), as the probability of k havingthekthtypeofcanceratagetpriortodevelopingothercancers(competingrisks),givenaspecific genotype G and additional baseline covariates X if necessary, that is, q (t|G,X)=Pr(T ≤t,D =k|G,X), k =1,··· ,K. (1) k The cancer-specific penetrance q (t|G,X) can be estimated as k (cid:90) t q (t|G,X)= λ (u|G,X)S(u|G,X)du, k =1,··· ,K, (2) k k 0 where Pr(t≤T <t+h,D =k|T >t,G,X) λ (t|G,X)=lim , (3) k h↓0 h and (cid:40) K (cid:41) (cid:88) S(t|G,X)=exp − Λ (t|G,X) , k k=1 (cid:82)t with Λ (t|G,X) = λ (u|G,X)du. In the competing risk literature, λ (t|G,X) and Λ (t|G,X) are k 0 k k k referredtoasthecancer-specifichazardandcancer-specificcumulativehazard,respectively. Wenotethat it may be tempting to define the cancer-specific age-at-onset penetrance function as Pr(T ≤ t|G,X), k whichisanalogoustotheconventionaldefinitionofpenetranceforasingledisease. However,thatquantity 4 is not identifiable (Tsiatis; 1975). Besides cancer-specific penetrance, it is often of practical interest to estimate the overall age-at-onset penetrance, defined as q(t|G,X)=Pr(T ≤t|G,X), (4) which is the probability that a subject has any type of cancer by age t given his/her genotype G and baseline characteristics X, and can be calculated through the cancer-specific penetrance q (t|G,x) using k q(t|G,X)=(cid:80)K q (t|G,X). k=1 k 2.2 Competing Risk Model Let Z=(G,X,G×X)T, with G×X denoting the interaction between G and X. We model the hazard for the kth type of cancer, say λ (t|Z,ξ ), using a frailty model as follows: k i,k λ (t|Z,ξ )=λ (t)ξ exp{βTZ}, k =1,··· ,K, (5) k i,k 0,k i,k k whereβ denotestheregressioncoefficientparametervector;λ (t)isabaselinehazardfunction;andξ k 0,k i,k istheithfamily-specificfrailty(orrandomeffect)usedtoaccountforthewithin-familycorrelationinduced by non-genetic factors that are not included in X. The pedigree information (or genetic relationship) within a family will be incorporated through the family-wise likelihood described in Section 2.3. We assume that ξ follows a gamma distribution, ξ ,··· ,ξ ∼ Gamma(ν ,ν ). Such a gamma frailty i,k 1,k I,k k k has been widely used in frailty models (Duchateau and Janssen; 2007). Under this model, the cancer-specific age-at-onset penetrance can be expressed as (cid:90) t(cid:90) q (t|Z)= λ (u|Z,ξ )S(u|Z,ξ)f(ξ|ν)dξdu k k k 0 ξ∈[0,∞)K (cid:90) t ν = k λ (u)exp{βTZ}S(u|Z)du, (6) (ν −log{S∗(u|Z)}) 0,k k 0 k k (cid:110) (cid:111) where S∗(t|Z)=exp −(cid:82)tλ (u)exp{βTZ}du and S(t|Z)=(cid:81)K S (t|Z) with k 0 k,0 k k=1 k (cid:90) ∞ (cid:26) (cid:90) t (cid:27) S (t|Z)= exp − λ (u|Z,ξ )du f(ξ |ν )dξ k k k k k k 0 0 (cid:20) ν (cid:21)νk = k . ν −log{S∗(t|Z)} k k Becausethepenetrancedependsonthesurvivalfunction,itisimperativetospecifythebaselinehazard λ (t), which appears in (5). To this end, we propose to approximate the cumulative baseline hazard 0,k (cid:82)t Λ (t) = λ (s)ds via Bernstein polynomials (Lorentz; 1953) since Λ (t) is monotone increasing. 0,k 0 0,k 0,k 5 BernsteinpolynomialsarepopularinBayesiannonparametricfunctionestimation,withshaperestrictions due to desired properties such as the optimal shape restriction property (Carnicer and Pen˜a; 1993) and theconvergencepropertyoftheirderivatives(Lorentz;1953). Withoutlossofgenerality,weassumethas been rescaled, e.g., by the largest observed time, such that t∈[0,1]. Now, we have Λ (t) approximated 0,k by Bernstein polynomials of degree M as follows (Chang et al.; 2005). M (cid:18) (cid:19) (cid:88) M Λ (t)≈ ω tl(1−t)M−l, (7) 0,k l,k l l=1 where ω = Λ (l/M) and ω ≤ ··· ≤ ω to ensure that Λ (t) is monotone increasing. Notice l,k 0,k 1,k M,k 0,k that l is running from 1 because of Λ (0)=0. Applying the re-parameterization of γ =ω −ω 0,k l,k l,k l−1,k with ω =0 and γ ≥0,l=1,··· ,M, the right-hand side of (7) can be equivalently rewritten as 0,k l,k (cid:88)M (cid:90) t um(1−u)M −m γ du=γTB (t), (8) m,k Beta(m,M −m+1) k M m=1 0 where B (t) = (B (t,1),··· ,B (t,M))T, with B (t,m) being the distribution function of the beta M M M M distribution evaluated at the value of t with parameters m and M −m+1, and γ =(γ ,··· ,γ )T k 1,k M,k (Curtis and Ghosh; 2011). Therefore, it follows that λ (t)≈γTb (t), (9) 0,k k M where b (t) = (b (t,1),··· ,b (t,M))T and b (t,m) = ∂B (t,m)/∂t (i.e., associated beta density). M M M M M Finally, the frailty model (5) can be written as λ (t|Z;β ,γ ,ξ )={γTb (t)}ξ exp{βTZ}. (10) k k k i,k k M i,k k Theproposednonparametricbaselinehazardmodel(9)ismoreflexiblethanparametricmodels, such as exponential and Weibull models, without imposing a restrictive parametric structure on the shape of the baseline hazard. Compared to the piecewise constant hazard model, our approach produces a smooth estimate of hazard and also avoids selection of knots, which is often subjective. The numerical comparison of different baseline models is provided in Section 5.5 and Appendix Section C. 2.3 Family-wise Likelihood Letiindexthefamilyandj indextheindividualwithinthefamily,wherei=1,··· ,I,andj =1,··· ,n . i For the ith family, let H = (H ,··· ,H )T denote the observed time-to-cancer data, and G and i i1 ini i,obs G respectivelydenotetheobservedandmissingpartsofgenotypedataG ,i.e.,G =(G ,G ). i,mis i i i,obs i,mis 6 Conditional on frailty ξ =(ξ ,··· ,ξ , the likelihood of H for the ith family can be expressed as i i,1 i,K) i (cid:88) Pr(H |G ,X ,θ,ξ )= Pr(G |G )Pr(H |G ,X ,θ,ξ ), (11) i i,obs i i i,mis i,obs i i i i Gi,mis∈Si where S denotes a set of all the possible combinations of genotypes G conditional on G . In this i i,mis i,obs family-wiselikelihood,thefirsttermontherighthandside,i.e.,Pr(G |G ),describesthegenotype i,mis i,obs relationship among family members, which follows the law of Mendelian inheritance. This term enables us to incorporate the pedigree information into the likelihood and capture the genetic correlation among family members beyond the shared frailty ξ . The second term on the right hand side of family-wise i likelihood (11), i.e., Pr(H |G ,X ,θ,ξ ), is the complete-data likelihood (without missing genotypes), i i i i given by Pr(H |G ,X ,θ,ξ )∝ (cid:89)ni (cid:89)K {λ (Y |Z ,θ,ξ )}∆ijkexp{−Λ (Y |Z ,θ,ξ )}, (12) i i i i k ij ij i k ij ij i j=1k=1 with ∆ = 1 if D = k and 0 otherwise (Prentice et al.; 1978; Maller and Zhou; 2002) and θ = ijk ij {(βT,γT):k =1,··· ,K}. k k Evaluation of the family-wise likelihood (11) is challenging because the cardinality of the set S i exponentially increases with the number of genotype-unknown subjects. To tackle this challenge, we propose to use Elston-Stewart’s peeling algorithm (Elston and Stewart; 1971; Lange and Elston; 1975; Fernando et al.; 1993), described as follows. For notational brevity, we suppress the subscript i and the conditional arguments except genotypes. LetH=(H ,H ,H )T, whereH isaphenotypeofanarbitrarilychosenjthindividualinthefamily, j− j j+ j which we call a pivot member; and H and H respectively represent phenotypes of ancestors and j− j+ offsprings of the pivot member j. Notice that such decomposition of H is always possible when there is no loop in the pedigree, which we implicitly assume throughout this article. In addition, H ,H , j− j+ andH areconditionallyindependentgivenG , sinceallphenotypesareconditionallyindependentgiven j j genotypes (and frailty). Following the idea of Fernando et al. (1993), the family-wise likelihood (11) can be equivalently rewritten as (cid:88) Pr(H|G )= Pr(G |G )Pr(H|G ,G ) obs j obs j obs Gj (cid:34) (cid:35) (cid:88) = Pr(G |G ) Pr(H |G ,G )·Pr(H |G ,G )·Pr(H |G ,G ) j obs j j obs j− j obs j+ j obs (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) Gj pivot ancestor offspring (cid:34) (cid:35) (cid:88) =Pr(H |G+ )·Pr(H |G− )· Pr(G |G )Pr(H |G ) (13) j+ obs j− obs j obs j j Gj 7 where G+ and G− respectively denote the observed genotype vectors within the ancestors and the obs obs offspringsofthepivotmember. NoticethatPr(H |G+ )andPr(H |G− )canbeevaluatedinexactly j+ obs j− obs the same manner as Pr(H|G ). The conditional Pr(G |G ) can be easily computed from the law of obs j obs Mendelian inheritance. The peeling algorithm computes (11) by recursively applying (13) until further decompositionofHisnotpossible. AnillustrativeexampleofthepeelingalgorithmisgiveninAppendix Section A. The peeling algorithm shares a similar spirit with the popular divide-and-conquer algorithm, and its computational complexity is approximately linear, O(nlog(n)). Another approach to handle the missing genotypes in the family-wise likelihood is the data augmentation method, which “imputes” the missing genotypes at each iteration of the Markov Chain sampler. This approach avoids marginalizing the family-wise likelihood over all possible missing genotypes, thus may potentially improve the speed of model fitting. It warrants further investigation. 2.4 Ascertainment Bias Correction For studies of rare diseases, such as LFS, ascertainment bias is inevitable when family data are collected through probands in high-risk populations in which disease cases are more likely to be observed. Let A denote the ascertainment indicator variable, such that A = 1 if the ith family is ascertained and i i 0 otherwise. In the LFS data, a family is ascertained and included in the sample only if the proband is diagnosed with sarcoma. Thus, the likelihood of the ith family collected in the FLS data is not Pr(H |X ,G ,θ,ξ ) as given in (11), but i i i,obs i Pr(H |X ,G ,θ,ξ ,A =1). i i i,obs i i These two quantities are related as Pr(H |X ,G ,θ,ξ )Pr(A =1|H ,X ,G ,θ,ξ ) Pr(H |X ,G ,θ,ξ ,A =1)= i i i,obs i i i i i,obs i . i i i,obs i i Pr(A =1|X ,G ,θ,ξ ) i i i,obs i In practice, the ascertainment decision is often made on the basis of the proband’s phenotype H i1 in a deterministic way, e.g., only sarcoma patients (and their families) are selected and included in the sample. Hence, the above equation can be simplified to Pr(H |X ,G ,θ,ξ ) Pr(H |X ,G ,θ,ξ ,A =1)∝ i i i,obs i . (14) i i i,obs i i Pr(A =1|θ,ξ ) i i This means that the ascertainment bias can be corrected by inverse-probability weighting the likelihood 8 by the corresponding ascertainment probability, which is given by (cid:88) Pr(A =1|θ,ξ )= Pr(A =1|H )Pr(H |θ,ξ ). (15) i i i i1 i1 i Hi1 In the LFS data, as a family is ascertained only if the proband is diagnosed with sarcoma (coded as D =2), it follows Pr(A =1|Y ,D =2)=1 and Pr(A =1|Y ,D (cid:54)=2)=0. i i1 i1 i i1 i1 Therefore, recalling H =(Y ,D ), the ascertainment probability (15) is given by i1 i1 i1 Pr(A =1|θ,ξ )=Pr(Y ,D =2|θ,ξ ) i i i1 i1 i (cid:88)(cid:88) = Pr(Y ,D =2|G,X,θ,ξ )Pr(G)Pr(X) i1 i1 i X G (cid:40) K (cid:41) (cid:88)(cid:88) (cid:88) = λ (Y |Z,β ,γ ,ξ )·exp − Λ (Y |Z,β ,γ ,ξ ) Pr(G)Pr(X), (16) 2 i1 2 2 i,2 k i1 k k i,k X G k=1 where Pr(G) is the prevalence of genotype G, which can be calculated on the basis of the mutated allele frequency φ , i.e., Pr(G = 0) = (1 − φ )2 and Pr(G = 1) = 1 − (1 − φ )2. Typically, φ can be A A A A estimatedfrompopulation-baseddatasources; theprevalenceofagermlinep53mutationintheWestern population is known to be φ = 0.0006 (Lalloo et al.; 2003). In the LFS data, X is gender and we set A Pr(X =0)=Pr(X =1)=0.5, where X =0 denotes male and X =1 denotes female. The ascertainment-bias-corrected likelihood of the entire data of I mutually independent families is given by the product of (14) I Pr(H|G ,X,θ,ξ,A)∝(cid:89)Pr(Hi|Gi,obs,Xi,θ,ξi), (17) obs Pr(A =1|θ,ξ ) i=1 i i where H=(H ,··· ,H ), G =(G ,··· ,G ) and A=(A ,··· ,A ). i I obs 1,obs I,obs 1 I 3 Prior and Posterior Sampling We use an independent normal prior for β , i.e., β ∼ N(0,σ2I), where 0 and I denote a zero vector k k and an identity matrix, respectively, and we set a large value of σ for vague priors. For the nonnegative parameter γ ,m = 1,··· ,M,k = 1,··· ,K for the baseline hazard, we use the noninformative flat m,k prior. We assign ν , k =1,··· ,K, the independent vague gamma prior Gamma(0.01,0.01). See Section k 5.6 for the results of the sensitivity analysis of γ and ν . For the choice of M, a large value provides m,k k moreflexibilitytomodeltheshapeofthebaselinehazard,butatthecostofincreasingthecomputational 9 burden. Gelfand and Mallick (1995) suggest that a small value of M works well for most applications. We set M =5 in the analysis. Let Pr(θ) and Pr(ν) denote the prior distributions of θ and ν, respectively. The joint posterior distribution of ν, ξ and θ is given by Pr(θ,ξ,ν|H,G ,X,A)∝Pr(H|G ,X,θ,ξ,A)·Pr(θ)·Pr(ξ|ν)·Pr(ν). obs obs We employ the random walk Metropolis-Hastings algorithm within Gibbs sampler to sample the posterior distribution. We generate 100,000 posterior samples in total and take every fifth sample for thinning after discarding the first 10,000 samples for burn-in. We implement the Markov chain Monte Carlo (MCMC)algorithmin R,whichtakes aboutthree secondsper singleMCMC iteration. We observe that the physical computing time is approximately linear, corresponding to the number of families, I, regardless of the family size n . i 4 Simulation Weconductasimulationstudytoevaluatetheperformanceoftheproposedmethod. Supposethatthere aretwocompetingcancers,indicatedbyD =1and2,respectively. Wesimulate200familytrios(mother, father,andoneoffspring)thatarecollectedthroughoffsprings(probands)withthesecondtypeofcancers (i.e., D =2), as follows: 1. Wesimulateoffspring’sdatabyfirstsimulatingoffspring’sgenotypeG∼Bernoulli(0.0001). Given G, we then simulate his/her true time to cancer, T ,k = 1,2, from the following cancer-specific k hazard model: λ (t|G)=λ (t)exp(β G), k =1,2, (18) k k k with β = 4,β = 10 and λ (t) = 0.1,λ (t) = 0.0005. We choose these simulation parameters 1 2 1 2 such that the second type of cancer (i.e., D =2) is rare with the prevalence of about 0.0003, while that of the first type of cancer is about 0.05. To mimic the ascertainment process of the LFS data, only offsprings with D = 2 are selected and included in the sample as probands. We repeat the above procedure until 200 probands are collected. We simulate random censoring time C from Exponential(2). 2. Given probands’ data, we generate genotypes of their parents as follows. If a proband is a non- carrier(G=0), his/herparentsaresettobenon-carriers; otherwise, werandomlyselectoneofthe parents as a carrier and set the another as a non-carrier. Given the genotypes, the time to cancer 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.