Biostatistics (2005), 6, 4, pp. 576–589
doi:10.1093/biostatistics/kxi028
Advance Access publication on April 28, 2005

Lung cancer rate predictions using generalized additive models

MARK S. CLEMENTS∗
National Centre for Epidemiology and Population Health, The Australian National University, Canberra, ACT 0200, Australia
[email protected]

BRUCE K. ARMSTRONG
School of Public Health, The University of Sydney, Sydney, Australia

SURESH H. MOOLGAVKAR
Fred Hutchinson Cancer Research Center, Seattle, WA, USA

SUMMARY

Predictions of lung cancer incidence and mortality are necessary for planning public health programs and clinical services. It is proposed that generalized additive models (GAMs) are practical for cancer rate prediction. Smooth equivalents for classical age-period, age-cohort, and age-period-cohort models are available using one-dimensional smoothing splines. We also propose using two-dimensional smoothing splines for age and period. Variance estimation can be based on the bootstrap. To assess predictive performance, we compared the models with a Bayesian age-period-cohort model. Model comparison used cross-validation and measures of predictive performance for recent predictions. The models were applied to data from the World Health Organization Mortality Database for females in five countries. Model choice between the age-period-cohort models and the two-dimensional models was equivocal with respect to cross-validation, while the two-dimensional GAMs had very good predictive performance. The Bayesian model performed poorly due to imprecise predictions and the assumption of linearity outside of observed data. In summary, the two-dimensional GAM performed well. The GAMs make the important prediction that female lung cancer rates in these countries will be stable or begin to decline in the future.

Keywords: Age-period-cohort model; Bayesian; Lung cancer; Trends.

1. INTRODUCTION

Predictions of future cancer incidence and mortality rates, defined here as rate projections, are necessary for planning public health programs and clinical services, including clinical training (Hakulinen, 1996).
As a motivating example, lung cancer is the second most common site for cancer incidence and the most common site for cancer deaths in the United States (Ries et al., 2003).

∗ To whom correspondence should be addressed.
© The Author 2005. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].

One challenge for the prediction of lung cancer rates is the complex and rapid pattern of change over the past century. For the United States, lung cancer incidence rates for males reached a peak during the mid 1980s, while rates for females may have reached a peak by the late 1990s (Weir et al., 2003). The rates display a strong association with age and the age pattern has changed between cohorts and between periods. Such rate patterns can largely be explained by marked changes in smoking behavior (Burns et al., 1997). Importantly, lung cancer rates are associated with cumulative smoking exposure, so that future lung cancer rates are predictable and are expected to follow a smooth curvilinear pattern. However, there is a paucity of historical smoking data, so that lung cancer predictions are commonly based solely on mortality or incidence rates.

Simple rate models, such as the age-drift (Clayton and Schifflers, 1987a) and linear power models (Dyba et al., 1997; Møller et al., 2003), provide useful mechanisms for rate projections of linear or log-linear trends, particularly for rare cancers. Modeling issues arise when data exhibit curvilinear trends or age-period-cohort interactions. For modeling curvilinear trends, alternative approaches include the use of polynomial terms, which may suffer from rapid changes at the ends, or regression splines, which require the specification of knots (Heuer, 1997).
For age-period-cohort interactions, classical age-period-cohort models are well suited to investigating the descriptive epidemiology for a disease (Holford, 1983; Clayton and Schifflers, 1987b). However, they are less suitable for cancer projections, requiring specification of the parametric forms for projected period and cohort effects outside of observed data (Osmond, 1985; Møller et al., 2003). Additionally, classical models often use data in 5-year periods and 5-year age groups to approximate the following cohorts through time, hence aggregating across data that are typically available by single-year periods and 5-year age groups.

Smooth changes in rates suggest the use of a continuous time rate modeling approach using modern regression methods (Keiding, 1990). Such models can be interpreted geometrically as fitting a smooth surface through rates over the Lexis diagram, where projections are estimated by extending the rate surface out in time. Models to implement this approach include kernel smoothing (Keiding, 1990), local likelihood regression (Loader, 1999), and generalized additive models (GAMs) (Hastie and Tibshirani, 1990; Wood, 2003).
GAMs have been used to describe trends in age-standardized cancer rates (Boyle et al., 2003) and for back-calculation of HIV incidence using two-dimensional smoothers (Marschner and Bosch, 1998). These models have received little attention for modeling age-period-cohort cancer data, although the application has been suggested by several authors (Bashir and Estève, 2001; Berzuini and Clayton, 1994; Heuer, 1997). The limited attention may relate to two technical hurdles. First, the models require the specification of smoothing parameters, where previous implementations have either required the user to investigate manually the possible values using cross-validation (Hastie and Tibshirani, 1990) or have been computationally expensive (Gu, 2002). Second, an analysis of age-period-cohort data typically requires variance estimates for weighted sums of correlated estimates, such as age-standardized rates and total population counts (Hakulinen and Dyba, 1994), demanding that variance estimation use either linear approximations such as the delta method or resampling methods such as the bootstrap.

Fortunately, efficient methods for automatic smoothing parameter estimation for thin-plate smoothing splines have recently been developed and implemented (Wood, 2000, 2003). Moreover, intensive calculations such as bootstrapping are now feasible. Given these recent developments, we propose that GAMs provide a useful mechanism for estimating rate projections.

In the remainder of the paper, data sources are first described, then methods are introduced for using GAMs for rate projections, including variance estimation using the bootstrap. The models will be compared with a Bayesian age-period-cohort model, with model comparisons using both cross-validation and measures of predictive performance. Following an application of the methods to lung cancer mortality data from the World Health Organization (WHO) Mortality Database, there is a brief discussion.

2. DATA SOURCES

The primary data source was the WHO Mortality Database with URL http://www3.who.int/whosis/menu.cfm?path=mort. Data were extracted for population estimates and for cancer deaths for the lung, bronchus, and trachea. Data were available by country, by single calendar year, and by 5-year age groups for females aged 25–84 years. The countries included in the analysis were Australia (1952–2001), New Zealand (1951–2000), Sweden (1952–2001), United Kingdom (1950–1999), and the United States (1951–2000).

Age and period were represented by the midpoint of an age group and a period interval, respectively. Birth cohorts were calculated by the difference between period and age. For age standardization, the revised WHO world population was used as the standard population (Ahmad et al., 2002).

3. GENERALIZED ADDITIVE MODELS

Let the observed number of event counts be represented by $y$, with age represented by $a$, period by $p$, and cohort by $c$ ($= p - a$). Following Brillinger (1986), we assumed that the observed event counts followed a Poisson distribution. Let the person-years of exposure estimated by the annual population be represented by $n$. Moreover, let the predicted mean for the number of event counts be $\mu$. Let $\{w_a\}$ be a set of positive-valued standard weights indexed over age groups $a$, such that $\sum_a w_a = 1$. For a given period $p$, the predicted age-standardized rate will be

$$\mathrm{ASR}_p = \sum_a w_a \frac{\mu_{ap}}{n_{ap}}.$$

In order to model for the rate, the logs of the person-years of exposure were included as offsets in the regression model. In the following, let $g_j(x)$ and $g_j(x, y)$ represent one-dimensional and two-dimensional smoothing functions, respectively.

The proposed approach is to use a class of age-period-cohort models using GAMs. Let Model A + P be a GAM with smoothed age and period, with the expected value following the relationship $\log(\mu_{ap}) = \log(n_{ap}) + g_1(a) + g_2(p)$. Similarly, let Model A + C be a GAM smoothing for age and cohort, defined by $\log(\mu_{ac}) = \log(n_{ac}) + g_1(a) + g_2(c)$, and let Model A + P + C be a GAM smoothing for age, period, and cohort, defined by $\log(\mu_{apc}) = \log(n_{apc}) + g_1(a) + g_2(p) + g_3(c)$.

The definition for cohort is based on an age interval of 5 years and a period interval of 1 year. Holford (1983) described a 'saw-tooth' or cyclic pattern that can be generated from age-period-cohort modeling of data based on unequal intervals. To resolve this problem, we can assume smoothness within the 5-year age interval and fit splines or another smoother for cohort and period effects (Heuer, 1997).

Models A + P and A + C are smoothed equivalents of the age-period and age-cohort models described by Clayton and Schifflers (1987a), which assume proportionality of the age-specific rates across periods and cohorts, respectively. Model A + P + C is a smooth equivalent of the age-period-cohort model described by Clayton and Schifflers (1987b). Finally, let Model A * P be a GAM defined using a two-dimensional smoothing function, such that $\log(\mu_{ap}) = \log(n_{ap}) + g_1(a, p)$. For flexible determination of the smoothing, the models will be estimated using penalized likelihood. Fitting of the models involves determination of one or two smoothing parameters together with the spline parameters. Thin-plate smoothing splines were used for the functions $g_i$ (Green and Silverman, 1994; Wood, 2003). We want to find the function $g$, dependent upon the unpenalized likelihood $l(\cdot)$, unknown parameters $\theta$, and smoothing parameters $\lambda_j$, that maximizes the penalized log-likelihood

$$l(\theta) - \frac{1}{2} \sum_j \lambda_j J(g_j),$$

with the summation indexed by $j$ over the smoothers. For univariate thin-plate splines, $J(g) = \int \{g''(t)\}^2 \, dt$. The resulting thin-plate spline takes the form

$$g(t) = \sum_i \delta_i |t - t_i|^3 + b_1 + b_2 t,$$

where $\delta_i$ and $b_k$ are constants, subject to the identifiability constraints that $\sum_i \delta_i = \sum_i \delta_i t_i = 0$.

Note that these splines are equivalent to natural cubic smoothing splines. Outside of observed data, $g''(t) = 0$, hence, $g(t)$ and $\log[\mu(t)]$ will be linear. As a consequence, predictions for a given age from Models A + P, A + C, and A + P + C will be linear on a log scale for later periods and cohorts.
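The linearity of the univariate thin-plate spline outside the observed data can be checked numerically. The following is a minimal sketch, not the authors' implementation: the knots, the coefficients `delta`, and the helper names are hypothetical, chosen only so that the two identifiability constraints hold. It evaluates $g(t)$ directly and uses a second difference as a stand-in for $g''(t)$.

```python
def spline_1d(t, knots, delta, b1=0.0, b2=0.0):
    """Evaluate g(t) = sum_i delta_i * |t - t_i|**3 + b1 + b2 * t."""
    cubic_part = sum(d * abs(t - k) ** 3 for d, k in zip(delta, knots))
    return cubic_part + b1 + b2 * t


def second_difference(f, t, h=1.0):
    """Discrete analogue of the second derivative of f at t."""
    return f(t - h) - 2.0 * f(t) + f(t + h)


# Hypothetical knots and coefficients (illustrative values only).
knots = [0.0, 1.0, 2.0, 3.0]
delta = [1.0, -2.0, 1.0, 0.0]

# Identifiability constraints: sum(delta_i) = 0 and sum(delta_i * t_i) = 0.
assert sum(delta) == 0.0
assert sum(d * k for d, k in zip(delta, knots)) == 0.0


def g(t):
    return spline_1d(t, knots, delta, b1=0.5, b2=-1.0)


inside = second_difference(g, 1.0)   # within the knot range: curvature present
outside = second_difference(g, 6.0)  # beyond the last knot: no curvature
print(inside, outside)
```

Beyond the last knot every term $|t - t_i|^3$ equals $(t - t_i)^3$, and the two constraints cancel the cubic and quadratic parts, which is why the extrapolated curve is a straight line on the log scale.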
For the two-dimensional thin-plate smoothing spline,

$$J(g) = \iint_{\mathbb{R}^2} \left[ \left( \frac{\partial^2 g}{\partial x^2} \right)^2 + 2 \left( \frac{\partial^2 g}{\partial x \, \partial y} \right)^2 + \left( \frac{\partial^2 g}{\partial y^2} \right)^2 \right] dx \, dy,$$

so that, writing $\mathbf{t} = [x, y]^T$, the thin-plate spline then takes the form

$$g(\mathbf{t}) = \sum_i \delta_i \, \eta(\lVert \mathbf{t} - \mathbf{t}_i \rVert) + (b_1 + b_2 x + b_3 y),$$

where $\delta_i$ and $b_k$ are constants, $\lVert \cdot \rVert$ denotes the Euclidean norm,

$$\eta(r) = \begin{cases} \dfrac{1}{16\pi} r^2 \log(r^2), & \text{for } r > 0, \\ 0, & \text{for } r = 0, \end{cases}$$

and we impose the identifiability constraints that $\sum_i \delta_i = \sum_i \delta_i x_i = \sum_i \delta_i y_i = 0$.

For fixed $x$, such as for a given age, we note that $g(\mathbf{t})$ as a function of $y$ will have curvature that is between that for $y$ and $y^2 \log(y)$. Moreover, $g$ will tend to be linear in $y$ when $y$ is large because $\lim_{y \to \infty} \partial^2 g / \partial y^2 = 0$. Predictions from these models are globally based, in the sense that all of the observed data are used in estimating the predictions.

The models were implemented using the mgcv package (Wood, 2000, 2003) in R (Ihaka and Gentleman, 1996). The smoothing parameters $\lambda_j$ were automatically estimated using unbiased risk estimation, which is equivalent to minimizing Akaike's information criterion. Similar results were obtained when the smoothing parameters were estimated using generalized cross-validation. Given potential concerns about numerical stability, we rescaled age, period, and cohort by a factor of 0.001 and found consistent parameter estimates. Attention to the default convergence criterion may be required when applying the gam package by Trevor Hastie.

3.1 Bootstrap estimation

As estimated total counts and age-standardized rates are a weighted sum of correlated age-specific rates, interval estimation should take the correlation structure into account (Hakulinen and Dyba, 1994). The bootstrap provides a flexible method for estimation of confidence intervals for the mean and prediction intervals for the GAMs (Davison and Hinkley, 1997). The parametric bootstrap has been used previously for estimating confidence intervals for age-period-cohort models (Robertson and Boyle, 1998).
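The age-standardized rate that these intervals target is the weighted sum $\mathrm{ASR}_p = \sum_a w_a \mu_{ap} / n_{ap}$ from Section 3. As a small illustration of that formula only, the sketch below computes it for a single period. The function name and all numbers are hypothetical; they are not WHO standard weights or data from the paper.

```python
def age_standardized_rate(weights, expected_counts, person_years):
    """Return sum_a w_a * mu_a / n_a for one period.

    weights         -- standard population weights w_a, summing to 1
    expected_counts -- predicted mean counts mu_a for each age group
    person_years    -- person-years of exposure n_a for each age group
    """
    if not (len(weights) == len(expected_counts) == len(person_years)):
        raise ValueError("all inputs must have one entry per age group")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    return sum(
        w * mu / n
        for w, mu, n in zip(weights, expected_counts, person_years)
    )


# Hypothetical three-age-group example (illustrative values only).
weights = [0.5, 0.3, 0.2]
expected_counts = [10.0, 40.0, 90.0]
person_years = [100_000.0, 80_000.0, 60_000.0]

asr = age_standardized_rate(weights, expected_counts, person_years)
print(f"ASR per 100,000 person-years: {asr * 1e5:.1f}")
```

Because the same fitted surface supplies every $\mu_{ap}$, the terms of this sum are correlated, which is the reason the text recommends resampling rather than summing age-specific variances.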
In outline, a nonparametric bootstrap was implemented using resampling of model-based residuals with an adjustment for bias due to smoothing. The smoothing parameters were reestimated at each bootstrap iteration, taking some account of uncertainty in the parameters (Hastie et al., 2001, p. 231). The data were initially fitted with smoothing parameters estimated by minimizing the unbiased risk estimator. The simulated outcomes were calculated by resampling residuals from an undersmoothed fit added to oversmoothed means. The under- and oversmoothing take some account of the bias-precision trade-off inherent in nonparametric regression.

Let the predicted means from the initial fit be $\hat{\mu}_i$ with smoothing parameters $\hat{\lambda}_j$. Let $\tilde{\mu}_i$ be the mean from an oversmoothed fit using smoothing parameters $2\hat{\lambda}_j$ and let $\check{\mu}_i$ be the mean from an undersmoothed fit using smoothing parameters of $\hat{\lambda}_j / 2$. The factor of 2 in the over- and undersmoothing is as suggested by Davison and Hinkley (1997, p. 365).

For each simulation, the Pearson standardized residuals $r_i$ can be defined using the undersmoothed means and hat values $h_i$, such that $r_i = (y_i - \check{\mu}_i) / \sqrt{\check{\mu}_i (1 - h_i)}$ are scaled to have mean zero. The simulated responses $y_i^*$ combine the oversmoothed mean and the resampled residuals $r_i^*$ such that $y_i^* = \max[0, \tilde{\mu}_i + r_i^* \sqrt{\tilde{\mu}_i}]$.

The model was fitted to the simulated responses using unbiased risk estimation, including estimation of the smoothing parameter at each simulation. Mean predictions for covariates $x$ were obtained from $\exp[\log \hat{\mu}(x) + \log \tilde{\mu}_i^*(x) - \log \tilde{\mu}_i(x)]$.

Heteroscedasticity of residuals required stratified resampling of residuals for expected counts less than 10 (Davison and Hinkley, 1997). For the A + P and A + C models, which often had poor model fits and large estimated residuals, the bootstrap simulations were based on the A * P model. Similar results were obtained when we used standardized deviance residuals. The bootstrap was implemented in R using the boot package written by Angelo Canty and described by Davison and Hinkley (1997). Three hundred bootstrap iterations were used.

4. A BAYESIAN AGE-PERIOD-COHORT MODEL

We would like to compare the predictive performance of GAMs with the performance of a model standard. Although Møller et al. (2003) provide a thorough review of model classes for rate prediction, we would expect that most of the additive models presented would perform poorly with lung cancer rate data because they take no account of age-period-cohort interactions. Alternatively, the Bayesian age-period-cohort rate model proposed by Berzuini and Clayton (1994) is of interest for three reasons. First, the model provides an alternative smooth analog to the classical age-period-cohort models, taking account of potential interactions between the three time parameters. Second, Bray (2002) found that the model performed well compared with additive models for a range of cancers, based on plug-in estimates of deviance for validation data. Third, the Bayesian model formulation suggests a predictive approach to model selection, where parameter uncertainty can be included in model comparisons (Gelman et al., 1995). In practice, this involves averaging a model fit parameter over the posterior distribution. Such an approach would be useful in a likelihood-based setting, possibly through the use of the bootstrap (see Hastie et al., 2001, p. 235).

The Bayesian age-period-cohort model is a generalization of parametric age-period-cohort modeling, using an autoregressive smoothing of the separate effects of age, period, and cohort. The formulation proposed by Berzuini and Clayton (1994) used second-order autoregressive smoothing, with effects for cohort and period being linear outside of observed data. For a discussion of the autoregressive priors, see Bashir and Estève (2001) and Bray (2002).

An adjustment of the model proposed by Berzuini and Clayton (1994) was required for implementing the cross-validation. For each iteration of the cross-validation, there may be no data within a stratum for an age, a period, or a cohort, suggesting the use of an undirected prior for early effects. If we let
If we let {β ,b=1,...,B}representtheparametersforage,period,orcohort,thenweassumedthat b (cid:13) (cid:14) β +β β1|β2,β3 ∼ N(2β2−β3,σβ2), β2|β1,β3 ∼ N 1 3,σβ2 , 2 βb|β1,...,βb−1 ∼ N(2βb−1−βb−2,σβ2), b>2. Ratepredictionsusinggeneralizedadditivemodels 581 Predictions were calculated by pushing out the period and cohort terms using the same priors. Gamma- distributedpriorswereassignedtotheprecisionparameters,withshapeandscaleparametersof0.001.To assessthechoiceofthesepriors,wealsousedgamma-distributedpriorswithshapeandscaleparameters of0.5and0.0005,respectively,consistentwithalargemeanandvariance,andfoundverysimilarvalues forthemeansandfitstatistics.ThemodelwasimplementedusingWinBUGSinterfacedwithR.Forthe mainmodelfit,weusedaMonteCarloMarkovchain(MCMC)lengthof10000.Theinterfaceallowed the use of existing functions for cross-validation, averaging two sets of 10-fold validation with a chain lengthforeachMCMCfitof2000(seenextsection). 5. MODELCOMPARISONSANDPREDICTIVEPERFORMANCE OurinterestisincomparingGAMsfittedusingmaximumlikelihoodwithaBayesianage-period-cohort modelwheretheposteriordistributionisevaluatedusingMCMCmethods.Predictiveperformancebased on out-of-data model validation for cancer predictions was discussed recently by Møller et al. (2003). Theauthorsdefinedrecentshort-termpredictionsasfittingmodelstoobserveddataexcludingthelast10 yearsofobservation,withthepredictionsbeingforthelast5yearsofobservation. Wedefinethedevianceas D(y,µ)=−2log{p(y|θ)}+2log{p(y|µ= y)}, where p(·) denotes probability, θ is the set of model parameters, and µ = E(y), hence µ = µ(θ). For independentPoissonobservationsindexedbyi,thedevianceis (cid:2) D(y,µ)=−2 [(y −µ )−y log(y /µ )], i i i i i i withtheconventionthat0log0 = 0.Ifallofthedataareusedforprediction,withtheestimatordenoted µˆ = µ(E(θ|y)) = µ(θ¯), then the plug-in deviance for fitted data can be defined as D(y,µˆ). This definition for deviance is similar to that used for classical GAMs. 
If the data are divided into a training set, denoted by subscript $t$, and an assessment (or test) set, denoted by subscript $a$, then the plug-in deviance for assessment data can be defined as $D(y_a, \hat{\mu}_t)$, where $\hat{\mu}_t = \mu(E(\theta \mid y_t))$. Note that the plug-in approaches make no adjustment for parameter uncertainty.

As is natural in a Bayesian setting, the predictive approach takes expectations over the posterior distribution, such that the expected deviance for fitted data is defined as $E_{\theta \mid y} D(y, \mu(\theta))$. Similarly, the expected deviance for assessment data, defined as $E_{\theta \mid y_t} D(y_a, \mu(\theta))$, provides an out-of-data measure of predictive performance, taking account of parameter uncertainty.

Information theoretic criteria related to Akaike's information criterion take the form

$$D(y, \hat{\mu}) + 2p^*,$$

where $p^*$ is the number of model parameters. Such criteria are asymptotically equivalent to cross-validation and historically had not been used for Bayesian models. Spiegelhalter et al. (2002) recently described such an approach for Bayesian models, defining a measure $p_D$ that provides an estimate of the number of parameters being the difference between the expected and plug-in deviance for fitted data, such that

$$p_D = E_{\theta \mid y}[D(y, \mu(\theta))] - D(y, \mu(E(\theta \mid y))) = \overline{D(y, \mu)} - D(y, \mu(\bar{\theta})).$$

The authors also introduced the Deviance Information Criterion (DIC), defined as

$$\mathrm{DIC} = D(y, \hat{\mu}) + 2 p_D.$$

K-fold cross-validation is a related approach, where the data are randomly partitioned into $K$ sets of similar size, indexed by $k$, with model fitting on all of the data outside of set $k$ together with an assessment on set $k$. Typically, $K$ takes values such as 5 or 10. If $\hat{\mu}_{-k}$ is defined as the estimator based on fitting the model on data outside of set $k$, $\hat{\mu}_{-k,k}$ is $\hat{\mu}_{-k}$ restricted to set $k$, $y_k$ are the outcomes in set $k$, and $p_k$ is the proportion of the data in set $k$, then the K-fold cross-validation deviance with bias correction is defined as

$$\mathrm{CV}_K = \sum_{k=1}^{K} D(y_k, \hat{\mu}_{-k,k}) + D(y, \hat{\mu}) - \sum_{k=1}^{K} p_k D(y, \hat{\mu}_{-k})$$

(Davison and Hinkley, 1997, p. 295). Note that K-fold cross-validation can be computationally expensive, requiring that the model be refitted $K$ times.
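The definitions of $p_D$ and the DIC above can be illustrated with posterior draws. This is a minimal sketch under a simplifying assumption that is not taken from the paper: each draw of $\theta$ is a vector of log-means, one per observation, so that $\mu(\theta) = \exp(\theta)$. The function names and the draws are hypothetical.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance with the convention 0 * log(0) = 0."""
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        log_term = y_i * math.log(y_i / mu_i) if y_i > 0 else 0.0
        total += (y_i - mu_i) - log_term
    return -2.0 * total


def dic_from_draws(y, theta_draws):
    """Return (plug-in deviance, p_D, DIC) from posterior draws.

    Assumption for illustration: each draw is a list of log-means,
    so mu(theta) = exp(theta) element by element.
    """
    n_draws = len(theta_draws)
    # Expected deviance: average of D(y, mu(theta)) over the draws.
    expected = sum(
        poisson_deviance(y, [math.exp(t) for t in draw])
        for draw in theta_draws
    ) / n_draws
    # Plug-in deviance: D evaluated at the posterior mean of theta.
    theta_bar = [
        sum(draw[i] for draw in theta_draws) / n_draws
        for i in range(len(y))
    ]
    plug_in = poisson_deviance(y, [math.exp(t) for t in theta_bar])
    p_d = expected - plug_in
    return plug_in, p_d, plug_in + 2.0 * p_d


# Hypothetical counts and two hypothetical posterior draws.
y = [4, 7]
draws = [[math.log(3.0), math.log(6.0)], [math.log(5.0), math.log(8.0)]]
plug_in, p_d, dic = dic_from_draws(y, draws)
print(plug_in, p_d, dic)
```

Under this log-mean parameterization the deviance is convex in $\theta$, so the expected deviance is never below the plug-in deviance and $p_D$ is non-negative. Other parameterizations need not share that property.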
The theoretical properties for cross-validation are better understood than the DIC, so that cross-validation was considered the primary measure of fit for within-data predictions.

In the following, the models will be assessed using cross-validation, DIC and expected deviance for assessment data. The intention is to compare the use of cross-validation with the DIC, and to compare cross-validation with the expected deviance for assessment data.

5.1 Bootstrap estimation

The bootstrap can be used to apply a predictive approach to frequentist models. A bootstrap estimate for $p_D$ can be calculated using

$$p_D = E^*[D(y^*, \mu(E^*(\theta \mid y^*))) - D(y^*, \mu(\theta \mid y^*))],$$

where $E^*(\cdot)$ denotes expectation under the bootstrap. This approach is motivated by the bootstrap estimation of the extended information criterion (EIC) described in Konishi and Kitagawa (1996). Similarly, for assessment data, we are interested in a bootstrap estimator with lower variance than the naïve bootstrap estimator $E^*[D(y_a, \mu(\theta \mid y_t^*))]$. By noting that

$$E^*[D(y_a, \mu(\theta \mid y_t^*))] - D(y_a, \hat{\mu}_t) \approx E^*[D(y_a^*, \mu(E^*(\theta \mid y_t^*))) - D(y_a^*, \mu(\theta \mid y_t^*))],$$

then a bootstrap estimate for the expected deviance for assessment data can be calculated using

$$D(y_a, \hat{\mu}_t) + E^*[D(y_a^*, \mu(E^*(\theta \mid y_t^*))) - D(y_a^*, \mu(\theta \mid y_t^*))].$$

The bootstrap simulations for $y_t^*$ and $y_a^*$ were based on fitting a model to all of the observed data. As an empirical validation, we found that estimates of $p_D$ were similar to the estimated degrees of freedom from the trace of the estimated smoother matrix, and that the means for the expected deviance for assessment data were similar to that for the naïve bootstrap estimator.

5.2 Model assessment using data with different aggregations

Cancer mortality data are commonly reported by single-year calendar periods and 5-year age groups; however, the Bayesian models require age and period to be available by similar lengths of time, such as 5-year groups, in order for discrete changes in cohort to be sensibly defined. For assessment between models, models were fitted to the data using model-specific data aggregation, and then predictions and deviance were estimated for data by 5-year periods and 5-year age groups.
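Putting single-year results on the common 5-year scale amounts to summing counts within period blocks before the deviance is computed. The sketch below shows one way that step might look for a single age group. The function name, the block alignment via `start`, and the example counts are assumptions made for illustration.

```python
def aggregate_periods(years, counts, start, width=5):
    """Sum yearly counts into consecutive blocks of `width` years.

    years  -- calendar years, one per count
    counts -- observed or predicted counts for one age group
    start  -- first year of the first block (assumed alignment)
    Returns a dict mapping each block's first year to its total.
    """
    blocks = {}
    for year, count in zip(years, counts):
        if year < start:
            raise ValueError("year precedes the first block")
        block_start = start + ((year - start) // width) * width
        blocks[block_start] = blocks.get(block_start, 0.0) + count
    return dict(sorted(blocks.items()))


# Hypothetical yearly predicted counts for one age group.
years = list(range(1991, 2001))
predicted = [float(c) for c in range(1, 11)]
five_year = aggregate_periods(years, predicted, start=1991)
print(five_year)
```

Applying the same aggregation to observed and predicted counts gives inputs on a matching scale for the deviance measures, so that a model fitted to single-year data and one fitted to 5-year data are scored on the same cells.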
For the GAMs, predictions were based on bootstrapping and model estimation for single-year calendar periods, with deviance estimates based on results aggregated up to 5-year calendar periods. In particular, the estimated degrees of freedom $p_D$ was calculated from the aggregated results. For comparison of cross-validation deviance between models, the GAMs were sampled using blocks of 5-year calendar periods by 5-year age groups, with the models fitted for single-year calendar periods, and estimates for model assessment used aggregated results.

6. RESULTS

6.1 Descriptive epidemiology

Lung cancer mortality rates for U.S. females across the Lexis diagram are presented in Figure 1. The peak rates at given ages, represented by vertical lines, varied by period and by cohort. For future trends, most of the younger age groups are expected to exhibit declining rates, while the rates at oldest ages could conceivably rise further before reaching their ultimate peak.

6.2 Model fits

The fits for the different models are summarized in Table 1. As expected from classical analyses of lung cancer rates using age-period-cohort models, the age-cohort models performed better than the age-period models (Clayton and Schifflers, 1987a). The age-period and age-cohort models generally had worse model fits than the other three models, with Sweden as a notable exception.

Model selection based on cross-validation and the DIC did not strongly support either of the age-period-cohort models or the A * P model. For recent short-term projections, the plug-in deviances for GAM A + P + C and A * P were less than those for the Bayes A + P + C for most comparisons, except for Sweden. This indicates that the mean predictions were closer to the observed data for the GAMs. Taking account of parameter uncertainty using the predictive deviance for recent short-term projections, the GAMs performed considerably better for all comparisons, including models for Sweden. For the GAMs, the predictive deviance was lower for the A * P model for four of the five comparisons, where the GAM A + P + C model was better for Sweden.
For males (results not shown), it was found that the A * P model had the lowest predictive deviance for recent short-term predictions across all five countries.

6.3 Rate projections

For a graphical assessment of the models, age-standardized rates based on models fitted to data excluding the last 10 years of data were compared with observed data (see Figure 2). Model predictions and 95% confidence intervals (or credible bounds) for Bayesian age-period-cohort models and GAM A * P are shown by 5-year and single-year periods, respectively. The confidence intervals for the means are tight for both models within fitted data. However, the confidence intervals for the Bayesian model are very much wider outside of observed data, becoming moderately uninformative for 5–10 years outside of fitted data.

The GAM A * P estimates were in better agreement with the validation data than the Bayesian estimates for all five countries, except Sweden. The Bayesian age-period-cohort predictions were above observed rates. The GAM estimates tended to be lower than estimates from the Bayes A + P + C model, showing greater curvature. For Sweden, the rates predicted by the GAM A * P model declined more quickly than the observed rates.

Fig. 1. Age- and period-specific lung cancer mortality rates, U.S.A., females aged 25–84 years, 1951–2000.

Table 1. Model comparisons between GAMs and Bayesian age-period-cohort models for lung cancer mortality, by country, for females aged 25–84 years†‡

Within data:

Country          Model             Plug-in deviance    p_D     DIC      CV_10
U.S.A.           GAM A + P                    46686     26    46737     71614
                 GAM A + C                     5857     34     5925     13117
                 GAM A + P + C                  433     42      518       847
                 GAM A * P                      114     88      289       597
                 Bayes A + P + C                404     38      480       929
United Kingdom   GAM A + P                    13063     20    13103     18869
                 GAM A + C                      632     27      686      1180
                 GAM A + P + C                  252     35      322       471
                 GAM A * P                      146     69      284       626
                 Bayes A + P + C                242     36      314       539
Australia        GAM A + P                      965     17      999      1350
                 GAM A + C                      245     21      287       403
                 GAM A + P + C                   98     26      150       153
                 GAM A * P                       92     47      186       204
                 Bayes A + P + C                 92     30      152       165
Sweden           GAM A + P                      357     14      385       482
                 GAM A + C                      152     18      188       219
                 GAM A + P + C                  109     22      154       160
                 GAM A * P                       80     41      161       149
                 Bayes A + P + C                102     28      158       158
New Zealand      GAM A + P                      257     13      284       335
                 GAM A + C                      173     16      205       232
                 GAM A + P + C                  107     21      148       147
                 GAM A * P                       89     32      152       155
                 Bayes A + P + C                103     23      149       149

[Recent projections: the plug-in deviance and predictive deviance columns are not legible in this copy.]

† Countries are ordered by decreasing population size.
‡ All measures of fit are for data aggregated to 5-year periods. Measures include, in order, the plug-in deviance for fitted data, model complexity estimated using p_D, the DIC, 10-fold cross-validation CV_10, the plug-in deviance for recent projections, and the predictive deviance for recent projections.

Fig. 2. Assessment of recent short-term predictions for lung cancer mortality rates, by country, for females aged 25–84 years.