ebook img

A Framework for Individualizing Predictions of Disease Trajectories by Exploiting Multi-Resolution Structure PDF

1.4 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview A Framework for Individualizing Predictions of Disease Trajectories by Exploiting Multi-Resolution Structure

A Framework for Individualizing Predictions of Disease Trajectories by Exploiting Multi-Resolution Structure PeterSchulam SuchiSaria Dept. ofComputerScience Dept. ofComputerScience JohnsHopkinsUniversity JohnsHopkinsUniversity 6 Baltimore,MD21218 Baltimore,MD21218 1 [email protected] [email protected] 0 2 n a Abstract J 1 For many complex diseases, there is a wide variety of ways in which an indi- 2 vidual can manifest the disease. The challenge of personalized medicine is to developtoolsthatcanaccuratelypredictthetrajectoryofanindividual’sdisease, ] L which can in turn enable clinicians to optimize treatments. We represent an in- M dividual’sdiseasetrajectoryasacontinuous-valuedcontinuous-timefunctionde- scribing the severity of the disease over time. We propose a hierarchical latent t. variablemodelthatindividualizespredictionsofdiseasetrajectories. Thismodel a sharesstatisticalstrengthacrossobservationsatdifferentresolutions–thepopula- t s tion,subpopulationandtheindividuallevel.Wedescribeanalgorithmforlearning [ populationandsubpopulationparametersoffline,andanonlineprocedurefordy- 2 namicallylearningindividual-specificparameters. Finally,wevalidateourmodel v on the task of predicting the course of interstitial lung disease, a leading cause 4 ofdeathamongpatientswiththeautoimmunediseasescleroderma. Wecompare 7 ourapproachagainststate-of-the-artanddemonstratesignificantimprovementsin 6 predictiveaccuracy. 4 0 . 1 Introduction 1 0 Incomplex,chronicdiseasessuchasautism,lupus,andParkinson’s,thewaythediseasemanifests 6 mayvarygreatlyacrossindividuals[1].Forexample,inscleroderma,thediseaseweuseasarunning 1 exampleinthiswork,individualsmaybeaffectedacrosssixorgansystems—thelungs,heart,skin, : gastrointestinaltract,kidneys,andvasculature—tovaryingextents[2]. Foranysingleorgansystem, v someindividualsmayshowrapiddeclinethroughoutthecourseoftheirdisease,whileothersmay i X show early decline but stabilize later on. Often in such diseases, the most effective drugs have r strongside-effects. Withtoolsthatcanaccuratelypredictanindividual’sdiseaseactivitytrajectory, a clinicianscanmoreaggressivelytreatthoseatgreatestriskearly,ratherthanwaitinguntilthedisease progresses to a high level of severity. To monitor the disease, physicians use clinical markers to quantify severity. In scleroderma, for example, PFVC is a clinical marker used to measure lung severity. The task of individualized prediction of disease activity trajectories is that of using an individual’sclinicalhistorytopredictthefuturecourseofaclinicalmarker;inotherwords,thegoal is to predict a function representing a trajectory that is updated dynamically using an individual’s previousmarkersandindividualcharacteristics. Predictingdiseaseactivitytrajectoriespresentsanumberofchallenges. First,therearemultiplela- tentfactorsthatcauseheterogeneityacrossindividuals. Onesuchfactoristheunderlyingbiological mechanism driving the disease. For example, two different genetic mutations may trigger distinct diseasetrajectories(e.g. asinFigures1aand1b). Ifwecoulddivideindividualsintogroupsaccord- ing to their mechanisms—or disease subtypes (see e.g. [3, 4, 5, 6])—it would be straightforward tofitseparatemodelstoeachsubpopulation. Inmostcomplexdiseases,however,themechanisms are poorly understood and clear definitions of subtypes do not exist. If subtype alone determined trajectory,thenwecouldclusterindividuals. However,otherunobservedindividual-specificfactors 1 such as behavior and prior exposures affect health and can cause different trajectories across indi- vidualsofthesamesubtype. Forinstance,achronicsmokerwilltypicallyhaveunhealthylungsand somayhaveatrajectorythatisconsistentlylowerthananon-smoker’s,whichwemustaccountfor usingindividual-specificparameters. Anindividual’strajectorymayalsobeinfluencedbytransient factors—e.g. an infection unrelated to the disease that makes it difficult to breath (similar to the “dips”inFigure1corthethirdrowinFigure1d). Thiscancausemarkervaluestotemporarilydrop, andmaybehardtodistinguishfromdiseaseactivity. Weshowthatthesefactorscanbearrangedin a hierarchy (population, subpopulation, and individual), but that not all levels of the hierarchy are observed. Finally,thefunctionaloutcomeisarichtarget,andthereforemorechallengingtomodel thanscalaroutcomes. Inaddition,themarkerdataisobservedincontinuous-timeandisirregularly sampled,makingcommonlyuseddiscrete-timeapproachestotimeseriesmodeling(orapproaches thatrelyonimputation)notwellsuitedtothisdomain. Relatedwork. Themajorityofpredictivemodelsinmedicineexplainvariabilityinthetargetout- come by conditioning on observed risk factors alone. However, these do not account for latent sources of variability such as those discussed above. Further, these models are typically cross- sectional—theyusefeaturesfromdatameasuredupuntilthecurrenttimetopredictaclinicalmarker oroutcomeatafixedpointinthefuture. Asanexample,considerthemortalitypredictionmodelby Leeetal.[7],wherelogisticregressionisusedtointegratefeaturesintoapredictionabouttheprob- abilityofdeathwithin30daysforagivenpatient. Topredicttheoutcomeatmultipletimepoints, typically separate models are trained. Moreover, these models use data from a fixed-size window, ratherthanagrowinghistory. Researchers in the statistics and machine learning communities have proposed solutions that ad- dressanumberoftheselimitations. MostrelatedtoourworkisthatbyRizopoulos[8],wherethe focus is on making dynamical predictions about a time-to-event outcome (e.g. time until death). Theirmodelupdatespredictionsovertimeusingallpreviouslyobservedvaluesofalongitudinally recorded marker. Besides conditioning on observed factors, they account for latent heterogeneity acrossindividualsbyallowingforindividual-specificadjustmentstothepopulation-levelmodel— e.g. for a longitudinal marker, deviations from the population baseline are modeled using random effectsbysamplingindividual-specificinterceptsfromacommondistribution. Othercloselyrelated work by Proust-Lima et al. [9] tackle a similar problem as Rizopoulos, but address heterogeneity usingamixturemodel. Another common approach to dynamical predictions is to use Markov models such as order-p autoregressive models (AR-p), HMMs, state space models, and dynamic Bayesian networks (seee.g. in[10]). Althoughsuchmodelsnaturallymakedynamicpredictionsusingthefullhistory by forward-filtering, they typically assume discrete, regularly-spaced observation times. Gaussian processes (GPs) are a commonly used alternative for handling continuous-time observations—see Robertsetal. [11]forarecentreviewofGPtimeseriesmodels. SinceGaussianprocessesarenon- parametric generative models of functions, they naturally produce functional predictions dynami- callybyusingtheposteriorpredictiveconditionedontheobserveddata. MixturesofGPshavebeen appliedtomodelheterogeneityinthecovariancestructureacrosstimeseries(e.g. [12]),howeveras notedinRobertsetal.,appropriatemeanfunctionsarecriticalforaccurateforecastsusingGPs. In ourwork,anindividual’strajectoryisexpressedasaGPwithahighlystructuredmeancomprising population, subpopulation and individual-level components where some components are observed andothersrequireinference. Morebroadly,multi-levelmodelshavebeenappliedinmanyfieldstomodelheterogeneouscollec- tionsofunitsthatareorganizedwithinahierarchy[13]. Forexample,inpredictingstudentgrades overtime, individualswithinaschoolmayhaveparameterssampledfromtheschool-levelmodel, andtheschool-levelmodelparametersinturnmaybesampledfromacounty-specificmodel. Inour setting,thehierarchicalstructure—whichindividualsbelongtothesamesubgroup—isnotknowna priori. Similarideasarestudiedinmulti-tasklearning,whererelationshipsbetweendistinctpredic- tiontasksareusedtoencouragesimilarparameters. Thishasbeenappliedtomodelingtrajectories by treating predictions at each time point as a separate task and enforcing similarity between sub- models close in time [14]. This approach is limited, however, in that it models a finite number of times. Others, more recently, have developed models for disease trajectories (see [15, 16] and referenceswithin)butthesefocusonretrospectiveanalysistodiscoverdiseaseetiologyratherthan dynamicalprediction. Schulametal.[16]incorporatedifferencesintrajectoriesduetosubtypesand individual-specificfactors. Webuilduponthisworkhere. Finally,recommendersystemsalsoshare 2 (e) (d) Subtype B-spline coefficients2Rdz w~g �~g G Subtype marginal model coefficients Rqz M 2 z i (a) Subtype indicator 1,...,G 2{ } Smuobdteypl fee amtuarregsinal Rqz ~xiz yij tij 2 N Shytrpuecrt-upraerda mnoeisteer sGP ↵ fi i (b) ~bi ⇢~i ~xip Structured noise function RR 2 Individual model coefficients2Rd` ⌃b ⇤ Individual model covariance matrix2Rd`⇥d` (c) Population model features-to-cPooepffiuclaietniotn m maopd2el Rcodpe⇥ffiqcpients 2Rdp Pfeoaptuurleastio2n mRoqdpel Figure 1: Plots (a-c) show example marker trajectories. Plot (d) shows adjustments to a population and subpopulation fit (row 1). Row 2 makes an individual-specific long-term adjustment. Row 3 makes short- termstructurednoiseadjustments. Plot(e)showstheproposedgraphicalmodel. Levelsinthehierarchyare color-coded.Modelparametersareenclosedindashedcircles.Observedrandomvariablesareshaded. informationacrossindividualswiththeaimoftailoringpredictions(seee.g. [17,18,19]), butthe taskisotherwisedistinctfromours. Contributions. We propose a hierarchical model of disease activity trajectories that directly ad- dressescommon—latentandobserved—sourcesofheterogeneityincomplex,chronicdiseasesus- ingthreelevels:thepopulationlevel,subpopulationlevel,andindividuallevel.Themodeldiscovers thesubpopulationstructureautomatically,andinfersindividual-levelstructureovertimewhenmak- ing predictions. In addition, we include a Gaussian process as a model of structured noise, which is designed to explain away temporary sources of variability that are unrelated to disease activity. Together, thesefourcomponentsallowindividualtrajectoriestobehighlyheterogeneouswhilesi- multaneously sharing statistical strength across observations at different “resolutions” of the data. Whenmakingpredictionsforagivenindividual,weuseBayesianinferencetodynamicallyupdate ourposteriorbeliefoverindividual-specificparametersgiventheclinicalhistoryandusetheposte- riorpredictivetoproduceatrajectoryestimate. Finally,weevaluateourapproachbydevelopinga state-of-the-arttrajectorypredictiontoolforlungdiseaseinscleroderma. Wetrainourmodelusing alarge,nationaldatasetcontainingindividualswithsclerodermatrackedover20yearsandcompare ourpredictionsagainstalternativeapproaches. Wefindthatourapproachyieldssignificantgainsin predictiveaccuracyofdiseaseactivitytrajectories. 2 DiseaseTrajectoryModel We describe a hierarchical model of an individual’s clinical marker values. The graphical model is shown in Figure 1e. For each individual i, we use N to denote the number of observed mark- i ers. We denote each individual observation using y and its measurement time using t where ij ij j ∈ {1,...,Ni}. Weuse(cid:126)yi ∈ RNi and(cid:126)ti ∈ RNi todenoteallofindividuali’smarkervaluesand measurementtimesrespectively. Inthefollowingdiscussion,Φ(t )denotesacolumn-vectorcon- ij tainingabasisexpansionofthetimet andweuseΦ(cid:0)(cid:126)t (cid:1) = [Φ(t ),...,Φ(t )](cid:62) todenotethe ij i i1 iNi matrixcontainingthebasisexpansionofpointsin(cid:126)t ineachofitsrows. Wemodelthejthmarker i valueforindividualiasanormallydistributedrandomvariablewithameanassumedtobethesum offourterms: apopulationcomponent,asubpopulationcomponent,anindividualcomponent,and astructurednoisecomponent:   y ∼N Φ (t )(cid:62)Λ (cid:126)x +Φ (t )(cid:62)β(cid:126) +Φ (t )(cid:62)(cid:126)b + f (t ) ,σ2. (1) ij  p ij ip z ij zi (cid:96) ij i i ij  (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (A)population (B)subpopulation (C)individual (D)structurednoise The four terms in the sum serve two purposes. First, they allow for a number of different sources ofvariationtoinfluencetheobservedmarkervalue,whichallowsforheterogeneitybothacrossand within individuals. Second, they share statistical strength across different subsets of observations. The population component shares strength across all observations. The subpopulation component 3 sharesstrengthacrossobservationsbelongingtosubgroupsofindividuals. Theindividualcompo- nentsharesstrengthacrossallobservationsbelongingtothesameindividual. Finally,thestructured noisesharesinformationacrossobservationsbelongingtothesameindividualthataremeasuredat similartimes. Predictinganindividual’strajectoryinvolvesestimatinghersubtypeandindividual- specificparametersasnewclinicaldatabecomesavailable1. Wedescribeeachofthecomponentsin detailbelow. Populationlevel. Thepopulationmodelpredictsaspectsofanindividual’sdiseaseactivitytrajec- toryusingobservedbaselinecharacteristics(e.g. genderandrace),whicharerepresentedusingthe featurevector(cid:126)x . Thissub-modelisshownwithintheorangeboxinFigure1e. Hereweassume ip thatthiscomponentisalinearmodelwherethecoefficientsareafunctionofthefeatures(cid:126)xip ∈Rqp. Thepredictedvalueofthejthmarkerofindividualimeasuredattimet isshowninEq. 14(A), ij whereΦp(t)∈Rdp isabasisexpansionoftheobservationtimeandΛ∈Rdp×qp isamatrixusedas alinearmapfromanindividual’scovariates(cid:126)xip tocoefficientsρi ∈ Rdp. Atthislevel,individuals withsimilarcovariateswillhavesimilarcoefficients. ThematrixΛislearnedoffline. Subpopulation level. We model an individual’s subtype using a discrete-valued latent variable z ∈{1,...,G}, where G is the number of subtypes. We associate each subtype with a unique i diseaseactivitytrajectoryrepresentedusingB-splines,wherethenumberandlocationoftheknots and the degree of the polynomial pieces are fixed prior to learning. These hyper-parameters de- termine a basis expansion Φz(t) ∈ Rdz mapping a time t to the B-spline basis function values at thattime. Trajectoriesforeachsubtypeareparameterizedbyavectorofcoefficientsβ(cid:126)g ∈ Rdz for g ∈ {1,...,G}, which are learned offline. Under subtype z , the predicted value of marker y i ij measured at time t is shown in Eq. 14(B). This component explains differences such as those ij observedbetweenthetrajectoriesinFigures1aand1b. Inmanycases,featuresatbaselinemaybe predictive of subtype. For example, in scleroderma, the types of antibody an individual produces (i.e. the presence of certain proteins in the blood) are correlated with certain trajectories. We can improvepredictiveperformancebyconditioningonbaselinecovariatestoinferthesubtype. Todo this, we use a multinomial logistic regression to define feature-dependent marginal probabilities: zi | (cid:126)xiz ∼ Mult(π1:G((cid:126)xiz)),whereπg((cid:126)xiz) ∝ ew(cid:126)g(cid:62)(cid:126)xiz. Wedenotetheweightsofthemultinomial regressionusingw(cid:126) ,wheretheweightsofthefirstclassareconstrainedtobe(cid:126)0toensuremodel 1:G identifiability. Theremainingweightsarelearnedoffline. Individual level. This level models deviations from the population and subpopulation models us- ing parameters that are learned dynamically as the individual’s clinical history grows. Here, we parameterizetheindividualcomponentusingalinearmodelwithbasisexpansionΦ(cid:96)(t) ∈ Rd(cid:96) and individual-specificcoefficients(cid:126)bi ∈Rd(cid:96).Anindividual’scoefficientsaremodeledaslatentvariables withmarginaldistribution(cid:126)b ∼ N((cid:126)0,Σ ). Forindividuali,thepredictedvalueofmarkery mea- i b ij suredat time t isshown inEq. 14(C). This componentcanexplain, forexample, differences in ij overallhealthduetoanunobservedcharacteristicsuchaschronicsmoking,whichmaycauseatyp- icallylowerlungfunctionthanwhatispredictedbythepopulationandsubpopulationcomponents. SuchanadjustmentisillustratedacrossthefirstandsecondrowsofFigure1d. Structured noise. Finally, the structured noise component f captures transient trends. For ex- i ample, an infection may cause an individual’s lung function to temporarily appear more restricted than it actually is, which may cause short-term trends like those shown in Figure1c and the third row of Figure1d. We treat f as a function-valued latent variable and model it using a Gaus- i sian process with zero-valued mean function and Ornstein-Uhlenbeck (OU) covariance function: K (t ,t )=a2exp(cid:8)−(cid:96)−1|t −t |(cid:9). The amplitude a controls the magnitude of the structured OU 1 2 1 2 noisethatweexpecttoseeandthelength-scale(cid:96)controlsthelengthoftimeoverwhichweexpect thesetemporarytrendstooccur. TheOUkernelisidealformodelingsuchdeviationsasitisboth mean-revertinganddrawsfromthecorrespondingstochasticprocessareonlyfirst-ordercontinuous, whicheliminateslong-rangedependenciesbetweendeviations[20]. Applicationsinotherdomains mayrequiredifferentkernelstructuresmotivatedbypropertiesofthenoiseinthetrajectories. 1Themodelfocusesonpredictingthelong-termtrajectoryofanindividualwhenleftuntreated. Inmany chronicconditions,asisthecaseforscleroderma,drugsonlyprovideshort-termrelief(accountedforinour modelbytheindividual-specificadjustments). Iftreatmentsthatalterlong-termcourseareavailableandcom- monlyprescribed,thentheseshouldbeincludedwithinthemodelasanadditionalcomponentthatinfluences thetrajectory. 4 2.1 Learning Objective function. To learn the parameters of our model Θ = {Λ,w(cid:126) ,β(cid:126) ,Σ ,a,(cid:96),σ2}, we 1:G 1:G b maximizetheobserved-datalog-likelihood(i.e. theprobabilityofallindividual’smarkervalues(cid:126)y i given measurement times(cid:126)t and features {(cid:126)x ,(cid:126)x }). This requires marginalizing over the latent i ip iz variables{z ,(cid:126)b ,f }foreachindividual. Thisyieldsamixtureofmultivariatenormals: i i i G P ((cid:126)y |X ,Θ)= (cid:88)π ((cid:126)x )N (cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17), (2) i i zi iz i p i ip z i zi i i zi=1 where K(t ,t ) = Φ (t )(cid:62)Σ Φ (t ) + K (t ,t ) + σ2I(t = t ). The observed-data 1 2 (cid:96) 1 b (cid:96) 2 OU 1 2 1 2 log-likelihood for all individuals is therefore: L(Θ)=(cid:80)M logP ((cid:126)y |X ,Θ). A more detailed i=1 i i derivationisprovidedinthesupplement. Optimizing the objective. To maximize the observed-data log-likelihood with respect to Θ, we partitiontheparametersintotwosubsets. Thefirstsubset,Θ ={Σ ,α,(cid:96),σ2},containsvaluesthat 1 b parameterize the covariance function K(t ,t ) above. As is often done when designing the ker- 1 2 nelofaGaussianprocess,weuseacombinationofdomainknowledgetochoosecandidatevalues and model selection using observed-data log-likelihood as a criterion for choosing among candi- dates [20]. The second subset, Θ ={Λ,w(cid:126) ,β(cid:126) }, contains values that parameterize the mean 2 1:G 1:G ofthemultivariatenormaldistributioninEquation16. Welearntheseparametersusingexpectation maximization(EM)tofindalocalmaximumoftheobserved-datalog-likelihood. Expectationstep. Allparametersrelatedto(cid:126)b andf arelimitedtothecovariancekernelandare i i not optimized using EM. We therefore only need to consider the subtype indicators z as unob- i servedintheexpectationstep. Becausez isdiscrete,itsposterioriscomputedbynormalizingthe i jointprobabilityofz and(cid:126)y . Letπ∗ denotetheposteriorprobabilitythatindividualihassubtype i i ig g ∈{1,...,G},thenwehave π∗ ∝π ((cid:126)x )N (cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17). (3) ig g iz i p i ip z i g i i Maximization step. In the maximization step, we optimize the marginal probability of the soft assignments under the multinomial logistic regression model with respect to w(cid:126) using gradient- 1:G basedmethods. Tooptimizetheexpectedcomplete-datalog-likelihoodwithrespecttoΛandβ(cid:126) , 1:G we note that the mean of the multivariate normal for each individual is a linear function of these parameters. HoldingΛfixed,wecanthereforesolveforβ(cid:126) inclosedformandviceversa. Weuse 1:G ablockcoordinateascentapproach,alternatingbetweensolvingforΛandβ(cid:126) untilconvergence. 1:G Becausetheexpectedcomplete-datalog-likelihoodisconcavewithrespecttoallparametersinΘ , 2 eachmaximizationstepisguaranteedtoconverge. Weprovideadditionaldetailsinthesupplement. 2.2 Prediction Our prediction yˆ(t(cid:48)) for the value of the trajectory at time t(cid:48) is the expectation of the marker y(cid:48) i i i under the posterior predictive conditioned on observed markers (cid:126)y measured at times(cid:126)t thus far. i i Thisrequiresevaluatingthefollowingexpression: yˆ(t(cid:48))= (cid:88)G (cid:90) (cid:90) E(cid:104)y(cid:48) |z ,(cid:126)b ,f ,t(cid:48)(cid:105)P (cid:16)z ,(cid:126)b ,f |(cid:126)y ,X ,Θ(cid:17)df d(cid:126)b (4) i i i i i i i i i i i i i zi=1 Rd(cid:96) RNi (cid:124) (cid:123)(cid:122) (cid:125)(cid:124) (cid:123)(cid:122) (cid:125) predictiongivenlatentvars. posterioroverlatentvars. (cid:104) (cid:105) =E∗ Φ (t(cid:48))(cid:62)Λ(cid:126)x +Φ (t(cid:48))(cid:62)β(cid:126) +Φ (t(cid:48))(cid:62)(cid:126)b +f (t(cid:48)) (5) zi,(cid:126)bi,fi p i ip z i zi (cid:96) i i i i β(cid:126)i∗(Eq.32) (cid:126)b∗i (Eq.36) f∗(t(cid:48))(Eq.43) (cid:122) (cid:125)(cid:124) (cid:123) (cid:122) (cid:125)(cid:124) (cid:123) i i (cid:104) (cid:105) (cid:104) (cid:105) (cid:122) (cid:125)(cid:124) (cid:123) = Φ (t(cid:48))(cid:62)Λ(cid:126)x +Φ (t(cid:48))(cid:62)E∗ β(cid:126) +Φ (t(cid:48))(cid:62)E∗ (cid:126)b + E∗ [f (t(cid:48))] , (6) (cid:124)p i(cid:123)(cid:122) ip(cid:125) (cid:124)z i (cid:123)(cid:122)zi zi(cid:125) (cid:124)(cid:96) i (cid:123)(cid:122)(cid:126)bi i(cid:125) (cid:124)fi (cid:123)(cid:122)i i(cid:125) populationprediction subpopulationprediction individualprediction structurednoiseprediction whereE∗ denotesanexpectationconditionedon(cid:126)y ,X ,Θ. InmovingfromEq. 29to30,wehave i i writtentheintegralasanexpectationandsubstitutedtheinnerexpectationwiththemeanofthenor- maldistributioninEq.14. FromEq. 30to31,weuselinearityofexpectation. Eqs. 32,36,and43 5 1 2 4 1 2 4 (a) Pr()=0.57 Pr()=0.71 Pr()=0.60 (b) 80 Pr()=0.18 Pr()=0.21 Pr()=0.39 80 60 ●●●●●●●●● ●●●●●●●●● ●●●●●●●●● 60 ●●●●●●●●● ●●●●●●●●● ●●●●●●●●● ●● ●● ●● ●● ●● ●● 40 ●●●●● ●●●●● ●●●●● 40 ●●●●● ●●●●● ●●●●● 0.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.0 1 2 4 1 2 4 100 (c) 100 (d) Pr()=0.56 Pr()=0.40 80 80 60 ●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●● ●● ● ● ● ● 60 ●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●● ●● ● ● ● ● 40 40 Pr()=0.53 Pr()=0.54 Pr()=0.99 Pr()=0.54 20 Pr()=0.26 Pr()=0.46 Pr()=0.01 20 Pr()=0.46 0.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.0 Years Since First Seen Figure 2: Plots (a) and (c) show dynamic predictions using the proposed model for two individuals. Red markersareunobserved. Blueshowsthetrajectorypredictedusingthemostlikelysubtype,andgreenshows thesecondmostlikely. Plot(b)showsdynamicpredictionsusingtheB-splineGPbaseline. Plot(d)shows predictionsmadeusingtheproposedmodelwithoutindividual-specificadjustments. belowshowhowtheexpectationsinEq.31arecomputed. Anexpandedversionofthesestepsare providedinthesupplement. Computingthepopulationpredictionisstraightforwardasallquantitiesareobserved. Tocompute thesubpopulationprediction,weneedtocomputethemarginalposterioroverz ,whichweusedin i theexpectationstepabove(Eq.19). Theexpectedsubtypecoefficientsaretherefore (cid:16) (cid:17) β(cid:126)∗ (cid:44) (cid:80)G π∗ β(cid:126) . (7) i zi=1 izi zi Tocomputetheindividualprediction,notethatbyconditioningonz ,theintegraloverthelikelihood i withrespecttof andthepriorover(cid:126)b formthelikelihoodandpriorofaBayesianlinearregression. i i LetK =K ((cid:126)t ,(cid:126)t )+σ2I,thentheposteriorover(cid:126)b conditionedonz is: f OU i i i i P (cid:16)(cid:126)b |z ,(cid:126)y ,X ,Θ(cid:17)∝N (cid:16)(cid:126)b |0,Σ (cid:17)N (cid:16)(cid:126)y |Φ Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) +Φ (cid:0)(cid:126)t (cid:1)(cid:126)b , K (cid:17). (8) i i i i i b i p ip z i zi (cid:96) i i f JustasinEq.16, wehaveintegratedoverf movingitseffectfromthemeanofthenormaldistri- i bution to the covariance. Because the prior over(cid:126)b is conjugate to the likelihood on the right side i ofEq.33,theposteriorcanbewritteninclosedformasanormaldistribution(seee.g. [10]). The meanoftheleftsideofEq.33istherefore (cid:104) (cid:105)−1(cid:104) (cid:16) (cid:17)(cid:105) Σ−1+Φ ((cid:126)t )(cid:62)K−1Φ ((cid:126)t ) Φ ((cid:126)t )(cid:62)K−1 (cid:126)y −Φ ((cid:126)t )Λ(cid:126)x −Φ ((cid:126)t )β(cid:126) , (9) b (cid:96) i f (cid:96) i (cid:96) i f i p i ip z i zi Tocomputetheunconditionalposteriormeanof(cid:126)b wetaketheexpectationofEq.9withrespectto i theposterioroverz . Eq.9islinearinβ(cid:126) ,sowecandirectlyreplaceβ(cid:126) withitsmean(Eq.32): i zi zi (cid:104) (cid:105)−1(cid:104) (cid:16) (cid:17)(cid:105) (cid:126)b∗ (cid:44) Σ−1+Φ ((cid:126)t )(cid:62)K−1Φ ((cid:126)t ) Φ ((cid:126)t )(cid:62)K−1 (cid:126)y −Φ ((cid:126)t )Λ(cid:126)x −Φ ((cid:126)t )β(cid:126)∗ . (10) i b (cid:96) i f (cid:96) i (cid:96) i f i p i ip z i i Finally,tocomputethestructurednoiseprediction,notethatconditionedonz and(cid:126)b ,theGPprior i i and marker likelihood (Eq. 14) form a standard GP regression (see e.g. [20]). The conditional posterioroff (t(cid:48))isthereforeaGPwithmean i i K (t(cid:48),(cid:126)t )(cid:2)K ((cid:126)t ,(cid:126)t )+σ2I(cid:3)−1(cid:16)(cid:126)y −Φ ((cid:126)t )Λ(cid:126)x −Φ ((cid:126)t )β(cid:126) −Φ ((cid:126)t )(cid:126)b (cid:17). (11) OU i i OU i i i p i ip z i zi (cid:96) i i Tocomputetheunconditionalposteriorexpectationoff (t(cid:48)), wenotethattheexpressionaboveis i i linearinz and(cid:126)b andsotheirexpectationscanbepluggedintoobtain i i f∗(t(cid:48))(cid:44)K (t(cid:48),(cid:126)t )(cid:2)K ((cid:126)t ,(cid:126)t )+σ2I(cid:3)−1(cid:16)(cid:126)y −Φ ((cid:126)t )Λ(cid:126)x −Φ ((cid:126)t )β(cid:126)∗−Φ ((cid:126)t )(cid:126)b∗(cid:17). (12) i OU i i OU i i i p i ip z i i (cid:96) i i 6 3 Experiments Wedemonstrateourapproachbybuildingatooltopredictthelungdiseasetrajectoriesofindivid- uals with scleroderma. Lung disease is currently the leading cause of death among scleroderma patients,andisnotoriouslydifficulttotreatbecausetherearefewpredictorsofdeclineandthereis tremendousvariabilityacrossindividualtrajectories[21]. Clinicianstracklungseverityusingper- centofpredictedforcedvitalcapacity(PFVC),whichisexpectedtodropasthediseaseprogresses. In addition, demographic variables and molecular test results are often available at baseline to aid prognoses. WetrainandvalidateourmodelusingdatafromtheJohnsHopkinsSclerodermaCenter patientregistry,whichisoneofthelargestintheworld. Toselectindividualsfromtheregistry,we used the following criteria. First, we include individuals who were seen at the clinic within two yearsoftheirearliestscleroderma-relatedsymptom. Second,weexcludeallindividualswithfewer thantwoPFVCmeasurementsaftertheirfirstvisit. Finally,weexcludeindividualswhoreceiveda lungtransplant. Thedatasetcontains672individualsandatotalof4,992PFVCmeasurements. Forthepopulationmodel,weuseconstantfunctions(i.e. observedcovariatesadjustanindividual’s intercept). The population covariates ((cid:126)x ) are gender, African American race, and indicators of ip ACA and Scl-70 antibodies—two proteins believed to be connected to scleroderma-related lung disease. Notethatallfeaturesarebinary. ForthesubpopulationB-splines, wesetboundaryknots at0and25years(themaximumobservationtimeinourdatasetis23years),usetwointeriorknots that divide the time period from 0-25 years into three equally spaced chunks, and use quadratics as the piecewise components. These B-spline hyperparameters (knots and polynomial degree) are also used for all baseline models. We select G = 9 subtypes using BIC. The covariates in the subtypemarginalmodel((cid:126)x )arethesameusedinthepopulationmodel. Fortheindividualmodel, iz weuselinearfunctions. Forthehyper-parametersΘ = {Σ ,α,(cid:96),σ2}wesetΣ tobeadiagonal 1 b b covariance matrix with entries [16,10−2] along the diagonal, which correspond to intercept and slopevariancesrespectively. Finally, wesetα = 6, (cid:96) = 2, andσ2 = 1usingdomainknowledge; weexpecttransientdeviationstolastaround2yearsandtochangePFVCbyaround±6units. Baselines. First,tocompareagainsttypicalapproachesusedinclinicalmedicinethatconditionon baselinecovariatesonly(e.g. [22]),wefitaregressionmodelconditionedonallcovariatesincluded in(cid:126)x above. ThemeanisparameterizedusingB-splinebases(Φ(t))as: iz (cid:16) (cid:17) yˆ|(cid:126)x =Φ(t)(cid:62) β(cid:126) +(cid:80) x β(cid:126) +(cid:80) x x β(cid:126) . (13) iz 0 xiin(cid:126)xiz i i xi,xjinpairsof(cid:126)xiz i j ij The second baseline is similar to [8] and [23] and extends the first baseline by accounting for individual-specific heterogeneity. The model has a mean function identical to the first baseline and individualizes predictions using a GP with the same kernel as in Equation 16 (using hyper- parameters as above). Another natural approach is to explain heterogeneity by using a mixture modelsimilarto[9]. However,amixturemodelcannotadequatelyexplainawayindividual-specific sourcesofvariabilitythatareunrelatedtosubtypeandthereforefailstorecoversubtypesthatcapture canonicaltrajectories(wediscussthisindetailinthesupplementalsection).Therecoveredsubtypes fromthefullmodeldonotsufferfromthisissue. Tomakethecomparisonfairandtounderstandthe extenttowhichtheindividual-specificcomponentcontributestowardspersonalizingpredictions,we createamixturemodel(Proposedw/nopersonalization)wherethesubtypesarefixedtobethesame asthoseinthefullmodelandtheremainingparametersarelearned. Notethatthisversiondoesnot containtheindividual-specificcomponent. Evaluation. Wemakepredictionsafterone, two, andfouryearsoffollow-up. Errorsaresumma- rized within four disjoint time periods: (1,2], (2,4], (4,8], and (8,25] years2. To measure error, we use the absolute difference between the prediction and a smoothed version of the individual’s observedtrajectory. Weestimatemeanabsoluteerror(MAE)using10-foldCVatthelevelofindi- viduals(i.e. allofanindividual’sdataisheld-out),andtestforstatisticallysignificantreductionsin errorusingaone-sided, pairedt-test. Forallmodels, weusetheMAPestimateoftheindividual’s trajectory.Inthemodelsthatincludesubtypes,thismeansthatwechoosethetrajectorypredictedby themostlikelysubtypeundertheposterior. Althoughthisdiscardsinformationfromtheposterior, inourexperiencecliniciansfindthischoicetobemoreinterpretable. Qualitativeresults. InFigure2wepresentdynamicallyupdatedpredictionsfortwopatients(one perrow,dynamicupdatesmovelefttoright).Bluelinesindicatethepredictionunderthemostlikely subtype and green lines indicate the prediction under the second most likely. The first individual 2Aftertheeighthyear,databecomestoosparsetofurtherdividethistimespan. 7 Predictionsusing1yearofdata Model (1,2] %Im. (2,4] %Im. (4,8] %Im. (8,25] %Im. B-splinewithBaselineFeats. 12.78 12.73 12.40 12.14 B-spline+GP 5.49 7.70 9.67 10.71 Proposed 5.26 ∗7.04 8.6 10.17 12.12 Proposedw/nopersonalization 6.17 7.12 9.38 12.85 Predictionsusing2yearsofdata B-splinewithBaselineFeats. 12.73 12.40 12.14 B-spline+GP 5.88 8.65 10.02 Proposed ∗5.48 6.8 ∗7.95 8.1 9.53 Proposedw/nopersonalization 6.00 8.12 11.39 Predictionsusing4yearsofdata B-splinewithBaselineFeats. 12.40 12.14 B-spline+GP 6.00 8.88 Proposed ∗5.14 14.3 ∗7.58 14.3 Proposedw/nopersonalization 5.75 9.16 Table1:MAEofPFVCpredictionsforthetwobaselinesandtheproposedmodel.Boldnumbersindicatebest performanceacrossmodels(∗isstat.significant).“%Im.”reportspercentimprovementovernextbest. (Figure2a)isa50-year-old,whitewomanwithScl-70antibodies,whicharethoughttobeassociated withactivelungdisease. Withinthefirstyear,herdiseaseseemsstable,andthemodelpredictsthis course with 57% confidence. After another year of data, the model shifts 21% of its belief to a rapidlydecliningtrajectory;likelyinpartduetothesuddendipinyear2. Wecontrastthiswiththe behavioroftheB-splineGPshowninFigure2b,whichhaslimitedcapacitytoexpressindividualized long-termbehavior.Weseethatthemodeldoesnotadequatelyadjustinlightofthedownwardtrend betweenyearsoneandtwo. Toillustratethevalueofincludingindividual-specificadjustments,we nowturntoFigures2cand2d(whichplotpredictionsmadebytheproposedmodelwithandwithout personalization respectively). This individual is a 60-year-old, white man that is Scl-70 negative, which makes declining lung function less likely. Both models use the same set of subtypes, but whereasthemodelwithoutindividual-specificadjustmentdoesnotconsidertherecoveringsubtype to be likely until after year two, the full model shifts the recovering subtype trajectory downward towardstheman’sinitialPFVCvalueandidentifythecorrecttrajectoryusingasingleyearofdata. Quantitativeresults. Table1reportsMAEforthebaselinesandtheproposedmodel. Wenotethat afterobservingtwoormoreyearsofdata,ourmodel’serrorsaresmallerthanthetwobaselines(and statisticallysignificantlysoinallbutonecomparison). AlthoughtheB-splineGPimprovesoverthe firstbaseline,theseresultssuggestthatbothsubpopulationandindividual-specificcomponentsen- ablemoreaccuratepredictionsofanindividual’sfuturecourseasmoredataareobserved.Moreover, bycomparingtheproposedmodelwithandwithoutpersonalization,weseethatsubtypesaloneare not sufficient and that individual-specific adjustments are critical. These improvements also have clinicalsignificance. Forexample,individualswhodropbymorethan10PFVCarecandidatesfor aggressiveimmunosuppressivetherapy. Outofthe7.5%ofindividualsinourdatawhodeclineby morethan10PFVC,ourmodelpredictssuchadeclineattwicethetrue-positiverateoftheB-spline GP(31%vs. 17%)andwithalowerfalse-positiverate(81%vs. 90%). 4 Conclusion We have described a hierarchical model for making individualized predictions of disease activity trajectories that accounts for both latent and observed sources of heterogeneity. We empirically demonstrated that using all elements of the proposed hierarchy allows our model to dynamically personalize predictions and reduce error as more data about an individual is collected. Although our analysis focused on scleroderma, our approach is more broadly applicable to other complex, heterogeneousdiseases[1]. Examplesofsuchdiseasesincludeasthma[3], autism[4], andCOPD [5]. Thereareseveralpromisingdirectionsforfurtherdevelopingtheideaspresentedhere. First,we observedthatpredictionsarelessaccurateearlyinthediseasecoursewhenlittledataisavailableto learntheindividual-specificadjustments.Toaddressthisshortcoming,itmaybepossibletoleverage time-dependentcovariatesinadditiontothebaselinecovariatesusedhere.Second,thequalityofour predictionsdependsupontheallowedtypesofindividual-specificadjustmentsencodedinthemodel. More sophisticated models of individual variation may further improve performance. Moreover, approaches for automatically learning the class of possible adjustments would make it possible to applyourapproachtonewdiseasesmorequickly. 8 References [1] J.Craig. Complexdiseases:Researchandapplications. NatureEducation,1(1):184,2008. [2] J.Varga, C.P.Denton, andF.M.Wigley. Scleroderma: FromPathogenesistoComprehensiveManage- ment. SpringerScience&BusinessMedia,2012. [3] J.Lo¨tvalletal. Asthmaendotypes:anewapproachtoclassificationofdiseaseentitieswithintheasthma syndrome. JournalofAllergyandClinicalImmunology,127(2):355–360,2011. [4] L.D.Wiggins,D.L.Robins,L.B.Adamson,R.Bakeman,andC.C.Henrich. Supportforadimensional viewofautismspectrumdisordersintoddlers.Journalofautismanddevelopmentaldisorders,42(2):191– 200,2012. [5] P.J. Castaldi et al. Cluster analysis in the copdgene study identifies subtypes of smokers with distinct patternsofairwaydiseaseandemphysema. Thorax,2014. [6] S.SariaandA.Goldenberg. Subtyping: WhatIsItandItsRoleinPrecisionMedicine. IEEEIntelligent Systems,30,2015. [7] D.S.Lee,P.C.Austin,J.L.Rouleau,P.PLiu,D.Naimark,andJ.V.Tu.Predictingmortalityamongpatients hospitalizedforheartfailure: derivationandvalidationofaclinicalmodel. Jama,290(19):2581–2587, 2003. [8] D.Rizopoulos. Dynamicpredictionsandprospectiveaccuracyinjointmodelsforlongitudinalandtime- to-eventdata. Biometrics,67(3):819–829,2011. [9] C.Proust-Limaetal. Jointlatentclassmodelsforlongitudinalandtime-to-eventdata:Areview. Statisti- calMethodsinMedicalResearch,23(1):74–90,2014. [10] K.P.Murphy. Machinelearning:aprobabilisticperspective. MITpress,2012. [11] S.Roberts,M.Osborne,M.Ebden,S.Reece,N.Gibson,andS.Aigrain. Gaussianprocessesfortime- seriesmodelling. PhilosophicalTransactionsoftheRoyalSocietyA:Mathematical,PhysicalandEngi- neeringSciences,371(1984):20110550,2013. [12] J.Q.Shi,R.Murray-Smith,andD.M.Titterington.Hierarchicalgaussianprocessmixturesforregression. Statisticsandcomputing,15(1):31–41,2005. [13] A.GelmanandJ.Hill. Dataanalysisusingregressionandmultilevel/hierarchicalmodels. Cambridge UniversityPress,2006. [14] H. Wang et al. High-order multi-task feature learning to identify longitudinal phenotypic markers for alzheimer’sdiseaseprogressionprediction.InAdvancesinNeuralInformationProcessingSystems,pages 1277–1285,2012. [15] J.RossandJ.Dy. Nonparametricmixtureofgaussianprocesseswithconstraints. InProceedingsofthe 30thInternationalConferenceonMachineLearning(ICML-13),pages1346–1354,2013. [16] P.F.Schulam,F.M.Wigley,andS.Saria. Clusteringlongitudinalclinicalmarkertrajectoriesfromelec- tronichealthdata: Applicationstophenotypingandendotypediscovery. InProceedingsoftheTwinty- NinthAAAIConferenceonArtificialIntelligence,2015. [17] B.M.Marlin. Modelinguserratingprofilesforcollaborativefiltering. InAdvancesinneuralinformation processingsystems,2003. [18] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of thestate-of-the-artandpossibleextensions. KnowledgeandDataEngineering, IEEETransactionson, 17(6):734–749,2005. [19] D.Sontag,K.Collins-Thompson,P.N.Bennett,R.W.White,S.Dumais,andB.Billerbeck. Probabilistic modelsforpersonalizingwebsearch. InProceedingsofthefifthACMinternationalconferenceonWeb searchanddatamining,pages433–442.ACM,2012. [20] C.E.RasmussenandC.K.Williams. GaussianProcessesforMachineLearning. TheMITPress,2006. [21] Y.Allanoreetal. Systemicsclerosis. NatureReviewsDiseasePrimers,2015. [22] D. Khanna et al. Clinical course of lung physiology in patients with scleroderma and interstitial lung disease: analysisofthesclerodermalungstudyplacebogroup. Arthritis&Rheumatism,63(10):3078– 3085,2011. [23] J.Q.Shi,B.Wang,E.J.Will,andR.M.West.Mixed-effectsgaussianprocessfunctionalregressionmodels withapplicationtodose–responsecurveprediction. Stat.Med.,31(26):3165–3177,2012. 9 Supplement A ExpectationMaximizationforDiseaseTrajectoryModel A.1 Objective We include the derivation of the EM objective for convenience. Recall that the model for marker y given ij parametersΘandlatentvariables{z ,(cid:126)b ,f }is i i i   y ∼NΦ (t )(cid:62)Λ(cid:126)x +Φ (t )(cid:62)β(cid:126) +Φ (t )(cid:62)(cid:126)b + f (t ) ,σ2. (14) ij  p ij ip z ij zi (cid:96) ij i i ij  (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (A)population (B)subpopulation (C)individual (D)structurednoise Let X = {(cid:126)t ,(cid:126)x ,(cid:126)x } denote individual i’s observation times and features, then marginalizing the joint i i ip iz likelihoodgivesus P((cid:126)y |X ,Θ) i i = (cid:88)G P(z |X ,Θ)(cid:90) P(cid:16)(cid:126)b |Θ(cid:17)(cid:90) P(f |Θ)P(cid:16)(cid:126)y |z ,(cid:126)b ,f ,X ,Θ(cid:17)df d(cid:126)b (15) i i i i i i i i i i i zi=1(cid:124)Mult.reg(cid:123)re(cid:122)ssionprio(cid:125)r Rd(cid:96) (cid:124) (cid:123)(cid:122) (cid:125) RNi (cid:124) GP(cid:123)p(cid:122)rior (cid:125)(cid:124) (cid:123)(cid:122) (cid:125) Normalprior Eq.14 = (cid:88)G π ((cid:126)x )N(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17). (16) zi iz i p i ip z i zi i i zi=1 MovingfromEq.15toEq.16,weevaluatetheinnermostintegralusingthefactthattheGPprioroverf is i conjugatetoEq.14yieldinganewmultivariatenormal[20]. ToevaluatethenextintegralinEq.15,weagain havethatthenormalpriorover(cid:126)b isconjugatetothemultivariatenormalobtainedbymarginalizingoverf , i i whichgivesusthemultivariatenormalshowninEq.16wherethecovariancefunctionisdefinedas K(t ,t )=Φ (t )(cid:62)Σ Φ (t )+K (t ,t )+σ2I(t =t ). (17) 1 2 (cid:96) 1 b (cid:96) 2 OU 1 2 1 2 Weseethattheobserved-datalog-likelihoodforindividualiisdefinedbyamixtureofmultivariatenormals where each subtype is associated with a class in the mixture. The mixing probabilities are defined by the multinomiallogisticregression.Themeanofthemultivariatenormalisdefinedbythepopulationandsubpop- ulationmodels,andthecovarianceisdefinedbytheindividualandstructurednoisemodels.Theobserved-data log-likelihoodforallindividualsistherefore L(Θ)=(cid:88)M log(cid:34)(cid:88)G π ((cid:126)x )N(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17)(cid:35). (18) zi iz i p i ip z i zi i i i=1 zi=1 A.2 ExpectationStep Allparametersrelatedto(cid:126)b andf arelimitedtothecovariancekernelandarethereforenotoptimizedusing i i EM.Wethereforeonlyneedtoconsiderthemechanismindicatorsz asunobservedintheexpectationstep. i Becausez isdiscrete,itsposteriorissimplycomputedusingthejointprobabilityofz and(cid:126)y .Letπ∗ denote i i i ig theposteriorprobabilitythatindividualihasmechanismg∈{1,...,G},thenwehave π∗ ∝π ((cid:126)x )N(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17). (19) ig g iz i p i ip z i g i i A.3 MaximizationStep Inthemaximizationstep,wemaximizetheexpectedcomplete-dataloglikelihoodwithrespecttoΘ . Wecan 2 writethecomplete-dataloglikelihoodas L (Θ )=(cid:88)M logπ ((cid:126)x )+logN(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17) c 2 zi iz i p i ip z i zi i i i=1 =(cid:88)M logπ ((cid:126)x )+(cid:88)M logN(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17). (20) zi iz i p i ip z i zi i i i=1 i=1 (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) Subtypemarginal Markerconditional Theexpectationistakenwithrespecttotheposterioroverz . i 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.