ebook img

A Bayesian view of doubly robust causal inference PDF

0.39 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview A Bayesian view of doubly robust causal inference

A Bayesian view of doubly robust causal inference O.Saarela∗, L.R.Belzile†AND D.A.Stephens‡ Abstract Incausalinferenceconfoundingmaybecontrolledeitherthroughregressionadjustmentinanout- comemodel,orthroughpropensityscoreadjustmentorinverseprobabilityoftreatmentweighting,or 7 1 both. Thelatterapproaches,whicharebasedonmodellingofthetreatmentassignmentmechanism 0 andtheirdoublyrobustextensionshavebeendifficulttomotivateusingformalBayesianarguments; 2 inprinciple,forlikelihood-basedinferences,thetreatmentassignmentmodelcanplaynopartininfer- encesconcerningtheexpectedoutcomesifthemodelsareassumedtobecorrectlyspecified. Onthe n otherhand,forcingdependencybetweentheoutcomeandtreatmentassignmentmodelsbyallowing a J theformertobemisspecifiedresultsinlossofthebalancingpropertyofthepropensityscoresandthe lossofanydoublerobustness.Inthispaper,weexplainintheframeworkofmisspecifiedmodelswhy 5 1 doublyrobustinferencescannotarisefrompurelylikelihood-basedarguments,anddemonstratethis throughsimulations. AsanalternativetoBayesianpropensityscoreanalysis,weproposeaBayesian ] posterior predictive approach for constructing doubly robust estimation procedures. Our approach E appropriatelydecouplestheoutcomeandtreatmentassignmentmodelsbyincorporatingtheinverse M treatmentassignmentprobabilitiesinBayesiancausalinferencesasimportancesamplingweightsin MonteCarlointegration. . t A revised version of this article has been accepted for publication in Biometrika, published by a t OxfordUniversityPress. s Saarela,O.,Belzile,L.R.andD.A.Stephens. ABayesianviewofdoublyrobustcausalinference, [ Biometrika(2016),103(3):667-681.doi:10.1093/biomet/asw025. 1 v 1. INTRODUCTION 3 9 In causal inference contexts, confounding is most often controlled by one of two approaches: first, by 0 conditioningontheconfoundersinaregressionmodelfortheoutcomeinanoutcomeregressionmodel; 4 secondly,bymodellingthetreatmentassignmentmechanismtoobtainso-calledpropensityscorevalues, 0 and then using these scores to construct strata within the observed sample, or a pseudo-sample from . 1 a hypothetical population, within which treatment assignment is not confounded. This pseudo-sample 0 can be obtained via inverse probability of treatment weighting of the original sample, analogously to 7 surveysamplingprocedures. Theoutcomeregressionadjustmentmethodrequirescorrectspecification 1 : of the regression function in order to obtain consistent inference; this may be achieved in practice us- v ingflexibleregressionstrategiesandcomplexfunctionsoftypicallyalargenumberofcovariates. The i X propensityscoreadjustmentmethodsfocusprincipallyonthespecificationofthetreatmentassignment r model, which may be similarly flexible or complex. Under either approach, sufficient control of con- a foundingisthereforedependentonpossiblyunverifiablemodellingassumptions. Thishasmotivatedthe developmentofdoublyrobustmethodsinwhichbothmodelcomponentsarespecified,butonlyoneof themneedstobecorrectlyspecifiedtosufficientlycontrolforconfounding. Adjustments that depend on the propensity score using regression or reweighting are not easy to interpretfromtheBayesianperspective,sinceBayesianinferencesarenaturallybasedonmodellingof the outcome, with modelling of the treatment assignment playing no role in inference relating to the outcome/treatment relationship (Robins and Ritov, 1997). Gustafson (2012) attempted a Bayesian in- terpretation as a compromise between a saturated outcome model and a parametric one; however, the treatmentassignmentmodeldidnotfeatureinthisinterpretation. Scharfsteinetal.(1999)andBangand ∗Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, Ontario M5T 3M7, Canada. [email protected] †École Polytechnique Fédérale de Lausanne, EPFL-SB-MATHAA-STAT, Station 8, CH-1015 Lausanne, Switzerland. [email protected] ‡Department of Mathematics and Statistics, McGill University, Montreal, Quebec H3A 2K6, Canada. [email protected] 1 2 O.SAARELA,L.R.BELZILED.A.STEPHENS Robins(2005)pointedoutaconnectionbetweenadoublyrobustestimatorandamodel-basedestima- tor; the two are equivalent if the outcome model in the latter features the so called clever covariate, a particularfunctionofthepropensityscore. Separatelyfromtheabovedevelopments,therehasbeenabodyofworkstudyingBayesianversions ofpropensityscoreadjustmenttocontrolforconfounding(Hoshino,2008;McCandlessetal.,2009a,b, 2010,2012;An,2010;KaplanandChen,2012;Zigleretal.,2013;ChenandKaplan,2015;Grahametal., 2015). Theseapproachesrequireoneoftwokindsofcompromises;theyeitherforceaparametrization whichmakestheoutcomeandtreatmentassignmentmodelsdependent,thuslosingthebalancingprop- ertyofthepropensityscore,orcutsuchafeedbackinwhichcasetheinferenceproceduresarenolonger Bayesian. Because of these difficulties, some authors (e.g. Achy-Brou et al., 2010, p. 828) have been contenttofixthepropensityscorestotheirbestestimatesinmodel-basedinferences,withoutattempting tojointlyestimatethetwomodelcomponents. Inanalternativeapproach,Wangetal.(2012)andZigler andDominici(2014)havesuggestedconnectingtheoutcomeandtreatmentassignmentmodelsthrough thepriordistributioninordertoincorporatetheuncertaintyinconfounderselection. Hereinwedonot consider model uncertainty, but rather, concentrate on inferences with a priori specified outcome and treatmentassignmentmodels. ThepurposeofthispaperistoclarifythetheoreticalandpracticalmotivationsforBayesianpropen- sityscoreadjustment,andtherelationshipsbetweenthedifferentmethodsproposedforthis,whichhave notbeenfullyexploredpreviously.Weaddresstheseinthecontextofdoubleadjustmentforboththepo- tentialconfoundersandthepropensityscore,andarguethattheproblemcannotbeproperlyunderstood withoutconsideringitintheframeworkofmisspecifiedmodels. ToprovideanalternativetoBayesian propensity score adjustment, we propose deriving Bayesian versions of various inverse probability of treatmentweightedestimators,includinginverseprobabilityoftreatmentweightedoutcomeregression and the semi-parametric double robust estimator, through posterior predictive expectations, with the weightsintroducedasimportancesamplingweightsinMonteCarlointegration. 2. PRELIMINARIES 2. Notationandassumptions Forsimplicity,weconsiderthecaseofasinglebinarytreatment,anddeferdiscussionofthelongitudinal, multipleexposureandcontinuouscasestotheDiscussion. LettherandomvectorsX representasetof i pre-treatment covariate measurements, Z a binary treatment allocation indicator, and Y an outcome i i forindividuali,measuredaftersufficienttimehaspassedsinceadministeringthetreatment. Weadopt, for convenience, the standard potential outcome (or counterfactual) framework: for individual i, the observedoutcomeisrelatedtothetwopossiblepotentialoutcomes(Y ,Y )byY =(1−Z )Y + 0i 1i i i 0i Z Y . We assume that X includes a sufficient set of covariates to control for confounding in the i 1i i sense that Z ⊥⊥ (Y ,Y )|X (cf. ignorable treatment assignment, Rosenbaum and Rubin, 1983, i 0i 1i i p. 43). The propensity score e(X ) ≡ pr(Z = 1|X ) has the balancing property Z ⊥⊥ X |e(X ), i i i i i i whichalsoimpliesthat(Y ,Y )⊥⊥Z |e(X )–thisscalarscoreisthereforeusefulincontrollingfor 0i 1i i i confounding. IfthecovariatespaceX ishigh-dimensional,inpracticethetaskofcontrollingforconfoundingoften i involves some covariate selection; here one can either select for the features that predict the outcome, orthetreatmentassignment. Torepresentthis,letS andB representsomeaprioriselectedsubsetsof i i the all observed features X , so that the latter can be partitioned as X = (S ,R ) or X = (B ,C ), i i i i i i i where possibly R = ∅ and C = ∅. If the selected set of features S captures all relevant prognostic i i i information, then Y ⊥⊥ X |S (Hansen, 2008). For the remainder of the paper, we consider the 0i i i strongercondition(i)(Y ,Y )⊥⊥X |S ,whichrequiresthatS alsocapturesallrelevantinformation 0i 1i i i i aboutpossibleeffectmodification. InthiscaseS issufficienttocontrolforconfounding,sincefromthepropertiesofconditionalinde- i pendence(Dawid,1979)itfollowsthat(Y ,Y ) ⊥⊥ Z |S . If,ontheotherhand,theselectedsetof 0i 1i i i featuresB hasthebalancingproperty(ii)Z ⊥⊥ X |B ,itissufficienttocontrolforconfounding. We i i i i areinterestedinestimationprocedureswhicharevalidwheneither(Y ,Y ) ⊥⊥ X |S (butpossibly 0i 1i i i Z (cid:54)⊥⊥X |B )orZ ⊥⊥X |B (butpossibly(Y ,Y )(cid:54)⊥⊥X |S ). i i i i i i 0i 1i i i Ourparameterofinterestisanaveragecausalcontrastsuchas E(Y )−E(Y )=E {E(Y |X )}−E {E(Y |X )}. 1i 0i Xi 1i i Xi 0i i Interest then lies in the identifiability of the average potential outcomes based on the observed data. preprint,versionofOctober29th,2015 Author’sOriginalVersion ABayesianviewofdoublyrobustcausalinference, 3 Whenthe‘nounmeasuredconfounder’assumptionholds,itfollowsthat(e.g.HernánandRobins,2006, p. 43) (cid:90) E(Y )−E(Y )= {E(Y |Z =1,x )−E(Y |Z =0,x )}p(x )dx . 1i 0i i i i i i i i i xi 2. Bayesianmodelformulationforoutcomeregression AnyBayesianmodelspecificationisconstructedviadeFinetti’srepresentationforexchangeablerandom sequencesv ≡(x ,y ,z )(seeforexampleBernardoandSmith,1994,Chap. 4). The(subjective)joint i i i i distributionforobservationsv ≡(v ,...,v )maythenberepresentedby 1 n (cid:90) (cid:40) n (cid:41) (cid:89) p(v)= p(y |z ,x ;φ)p(z |x ;γ)p(x ;ψ) π (φ,γ,ψ)dφdγdψ, (1) i i i i i i 0 φ,γ,ψ i=1 implying the existence of the parametric models and the prior density π (φ,γ,ψ). Since the repre- 0 sentation theorem is not constructive, that is, does not specify the models implicit in (1), in practice inferencesaboutagivenfinite-dimensionalparametrizationinvolvestheoftenimplicitassumptionthat p(y |z ,x ;φ )=f(y |z ,x ),p(z |x ;γ )=f(z |x )andp(x ;ψ )=f(x ),where(φ ,γ ,ψ )is i i i 0 i i i i i 0 i i i 0 i 0 0 0 thelimitingvalueoftheposteriorπ (φ,γ,ψ)≡p(φ,γ,ψ|v)inthesenseofe.g.vanderVaart(1998,p. n 139),andwherethefsrepresentthe‘true’limiting(sampling)distributions. Wemightfurtherassume thattheparametersareaprioriindependent, sothatπ (φ,γ,ψ) = π (φ)π (γ)π (ψ), inwhichcaseit 0 0 0 0 italsofollowsalsothattheposteriorfactorizesasπ (φ,γ,ψ)=π (φ)π (γ)π (ψ)(e.g.Gelmanetal., n n n n 2004,p. 354–355). Themarginaldistributionp(x ;ψ)caninpracticebespecifiednonparametricallyandestimatedusing i theempiricalcovariatedistribution,leadingtoaBayesianestimatoroftheaveragecausalcontrast, (cid:90) 1 (cid:88)n {m(1,x ;φ)−m(0,x ;φ)}π (φ)dφ, (2) n i i n φ i=1 whereπ (φ)∝(cid:81)n p(y |z ,x ;φ)π (φ)dφ,andm(z,x;φ)≡(cid:82) yp(y|z,x;φ)dy. InSupplementary n i=1 i i i 0 Appendix1,weshowthat(2)canbemotivatedwithouttheuseofpotentialoutcomesnotationthrough posterior predictive expectations for a new observation under a hypothetical completely randomized setting. Theestimator(2)istheBayesianversionofthewell-knowndirectstandardizationorg-formula. We return to it in Section 6, but note here that it does not feature the treatment assignment model; rather, theposteriorpredictiveapproachforestimatingthemarginalcausalcontrastdependsentirelyoncorrect specificationofthedistributionp(y |z ,x ;φ),orinamoment-basedrepresentation,m(z,x;φ). Before i i i discussingBayesianalternativesto(2),webrieflyreviewsomecommonlyusedfrequentistapproaches forcombiningoutcomeregressionandpropensityscoreadjustment. 3. FREQUENTIST APPROACHES FOR COMBINING OUTCOME REGRESSION AND PROPENSITY SCORE ADJUSTMENT 3. Includingpropensityscoresintooutcomeregression Because of the balancing property of the propensity score, it is tempting to specify a propensity score e(b ;γ)≡pr(Z =1|b ;γ)anduseastatisticalmodelsuchasp{y |z ,e(b ),s ;φ},inthehopethat,if i i i i i i i theprognosticmodelisismisspecified,adjustingforthepropensityscorewouldstillsufficientlycontrol for any residual confounding. For simplicity, we take the parameters φ to specify also the functional dependencebetweenthepropensityscoreandtheoutcome;tomodelthisdependency,itisadvisableto useflexibleformulationssuchassplines(e.g.ZhangandLittle,2009). Usingsuchanoutcomemodel, themarginalcausalcontrastwouldthenbeestimatedby n 1 (cid:88)(cid:104) (cid:110) (cid:111) (cid:110) (cid:111)(cid:105) n m 1,e(bi;γ(cid:98)),si;φ(cid:98) −m 0,e(bi;γ(cid:98)),si;φ(cid:98) , (3) i=1 (cid:82) where m{z,e(b;γ),s;φ} ≡ yp{y|z,e(b;γ),s;φ}dy, and where φ(cid:98)and γ(cid:98) are maximum likelihood estimators for the parameters in the outcome regression and treatment assignment model, respectively. The motivation for such a double adjustment is that it is sufficient to control for confounding if either Author’sOriginalVersion preprint,versionofOctober29th,2015 4 O.SAARELA,L.R.BELZILED.A.STEPHENS condition (i) or (ii) of Section 2.1 applies. We summarize this property in the following theorem (a proofinSupplementaryAppendix2;inthenotationallconvergenciesareinprobabilityunlessotherwise stated). THEOREM1. Estimator(3)isconsistentiftheoutcomemodeliscorrectlyspecifiedinthesensethat (cid:82) m{z,e(b;γ ),s;φ } = yf(y|z,e(b),s)dy, the parameters in this can be consistently estimated so 0 0 that φ(cid:98) → φ0, the treatment assignment model is correctly specified in the sense that p(zi|bi;γ0) = f(z |b )andγ →γ ,andeither(i)holdstrue,or(ii)holdstrue. i i (cid:98) 0 Theestimator(3)maybeconsidered‘doublyrobust’intermsofthecovariateselectioninthesense that only one of the sets S and B needs to be correctly specified, although it still always relies on a i i correctparametricspecificationofthemodelfortheexpectedoutcome,conditionalon{Z ,e(B ),S }. i i i 3. Theclevercovariateandaugmentedoutcomeregression The estimator discussed in the previous section did not specify which function of the propensity score e(B ) should be added to the regression model. Scharfstein et al. (1999, p. 1141–1142) and Bang i andRobins(2005,p. 964–965)drewaconnectionbetweenpropensityscoreregressionadjustmentand doublyrobustestimatorsoftheform n 1 (cid:88)yizi−{zi−e(bi;γ(cid:98))}m(1,si;φ(cid:98)) n e(b ;γ) i (cid:98) i=1 n − 1 (cid:88)yi(1−zi)−[(1−zi)−{1−e(bi;γ(cid:98))}]m(0,si;φ(cid:98)), n 1−e(b ;γ) i (cid:98) i=1 whichcanbeequivalentlyrepresentedas n1 (cid:88)n (cid:26)e(bzi;γ) − 1−1e−(bzi;γ)(cid:27)(cid:110)yi−m(zi,si;φ(cid:98))(cid:111)+ n1 (cid:88)n (cid:26)m(1,si;φ(cid:98))−m(0,si;φ(cid:98))(cid:27). (4) i (cid:98) i (cid:98) i=1 i=1 On considering the score equation derived from a regression of Y on Z and S with mean function i i i m(z,s;φ),thisformsuggestsincorporatingthederivedcovariatec(z ,b ) = z /e(b )−(1−z )/{1− i i i i i e(b )}(termedthe‘clevercovariate’byRoseandvanderLaan,2008,p. 8)additivelyintotheoutcome i regression,thatis,forexample m(z,s;φ)=φ +φ z+φ(cid:62)s+φ c(z,b). (5) 0 1 2 3 Thefirsttermin(4)isthenzerothroughthemaximumlikelihoodscoreequation, leavingonlythelast termwhichisthemodelbasedestimatorofthemarginaltreatmenteffect. Thuswiththeclevercovariate in the outcome model, the doubly robust estimator is equivalent to the model-based estimator. In the specialcaseofmodel(5),thisbecomes 1 (cid:88)n (cid:110) (cid:111) 1 (cid:88)n (cid:26) 1 1 (cid:27) n m(1,xi;φ(cid:98))−m(0,xi;φ(cid:98)) =φ(cid:98)1+φ(cid:98)3n e(b ;γ) − 1−e(b ;γ) . i (cid:98) i (cid:98) i=1 i=1 A potential drawback of using this covariate is that it may lead to extreme variability for the resulting meandifferenceestimator,evencomparedtoinverseprobabilityoftreatmentweightedestimatorsofthe form n (cid:26) (cid:27) 1 (cid:88) yizi − yi(1−zi) . (6) n e(b ;γ) 1−e(b ;γ) i (cid:98) i (cid:98) i=1 Toseewhythisisthecase,thedistributionofc(z ,b )initselfcanbeveryskewedtotheright,butthis i i becomesevenmorepronouncedinthemodel-basedestimatorwheretheclevercovariatehastoevaluated at both c(1,b ) and c(0,b ) for each i = 1,...,n. In contrast, the inverse probability of treatment i i weightedestimator(6)involvesonlytheprobabilitiesoftreatmentsthatwereactuallyassigned. Eq. (4) is doubly robust in a stronger, semi-parametric, sense than (3); it does not require correct specificationoftheoutcomemodel,ifthetreatmentassignmentmodeliscorrectlyspecified.Theapprox- imateBayesiandoublerobustapproachproposedbyGrahametal.(2015)involvedreplacingm(z,x;φ) preprint,versionofOctober29th,2015 Author’sOriginalVersion ABayesianviewofdoublyrobustcausalinference, 5 in(2) withalinear predictoraugmentedwith theclevercovariate. Wetake thistobe aspecialcase of thetwo-stepBayesianmethodstobediscussedinSection4,andthusdonotseparatelyconsideritinthe presentpaper. However,inSection6.2wewillshowhowtheform(4)maybederivedthroughposterior predictiveexpectationsandimportancesampling. 3. Inverseprobabilityoftreatmentweightedoutcomeregression Yetanotherestimatorforthemarginalcausalcontrastis n 1 (cid:88)(cid:110) (cid:111) n E(Y1i|si;φ(cid:98))−E(Y0i|si;φ(cid:98)) , (7) i=1 wheretheparametersφinthemodelforthepotentialoutcomes(Y ,Y )areestimatedusinganinverse 1i 0i probabilityoftreatmentweightedestimatingfunctionl(φ)=(cid:80)n l (φ),where i=1 i 1 l (φ)=(cid:88)1 logp(yai|si;φ). i {zi=a} pr(Z =a|b ) i i a=0 Thecorrespondingestimatingequationisu(φ)=(cid:80)n u (φ)=0,wherethepseudo-scorefunctionis i=1 i 1 ∂ logp(y |s ;φ) u (φ)=(cid:88)1 ∂φ ai i . (8) i {zi=a} pr(Z =a|b ) i i a=0 Herethetreatmentassignmentprobabilitiespr(Z =a|b )wouldinpracticebereplacedwithestimates i i pr(Z = a|b ;γ). To motivate the use of (8) for the parameter estimation, we present the following i i (cid:98) theorem(aproofgiveninSupplementaryAppendix2). THEOREM2. Theestimatingequationu(φ)=0isunbiasedifthemodelforthepotentialoutcomes iscorrectlyspecifiedinthesensethatp(y |s ;φ )=f(y |s ),andeither(i)or(ii)holdstrue. ai i 0 ai i If we further assume the consistency of the estimator for φ, as well as consistency of γ when the (cid:98) weightsarecorrectlyspecified,italsofollowsthat(7)consistentlyestimatesthemarginalcausalcontrast. In Section 6.1 we demonstrate that an estimator of the form (7) can also be motivated from Bayesian arguments. 4. TWO-STEP ESTIMATION WHEN THE PROPENSITY SCORE IS UNKNOWN In observational settings the function e(B ) is unknown and has to be estimated. When using an es- i timator of the form (3), a central question from a Bayesian perspective then is how the uncertainty in the estimation of the parameters γ is incorporated in the inference of the marginal causal contrast. A ‘Bayesian’approachcouldbemotivatedbywritingtheposteriorpredictiveexpectationas (cid:90) (cid:90) m(a,e(b ;γ),s ;φ)p(x ;ψ)π (φ|γ)π (γ)π (ψ)dx dφdγdψ, (9) i i i n n n i ψ,γ,φ xi where n n (cid:89) (cid:89) π (φ|γ)∝ p{y |z ,e(b ;γ),s ;φ}π (φ) and π (γ)∝ p(z |b ;γ)π (γ). n i i i i 0 n i i 0 i=1 i=1 The integrals of the form (9) could be evaluated by Monte Carlo integration, by forward sampling first from π (γ) and given the current value γ, from the conditional posterior π (φ|γ). However, a n n concernrelatedtosuchanapproachisthattheproductoftheposteriordistributionsπ (φ|γ)andπ (γ) n n in (9) does not necessarily correspond to any well defined joint posterior, except in the case when the outcomeiscorrectlyspecifiedinthesense(i),inwhichcaseitdoesnotdependonthepropensityscore. Incontrast,forthecorrectlyspecifiedmodelsin(1)weindeedhavethatp(φ|v)p(φ|x,z)=p(φ,γ|v). Asaresult,theaboveoutlinedtwo-stepapproachisnotproperBayesian,andwouldhavetobeevaluated onitsfrequency-basedproperties. To give an example of a situation where the two-step Bayesian approach does not result in correct frequency-basedinferences,wecanconsiderthespecialcaseoftheoutcomemodelm{z,e(b;γ),s;φ}= Author’sOriginalVersion preprint,versionofOctober29th,2015 6 O.SAARELA,L.R.BELZILED.A.STEPHENS φ +φ z +φ(cid:62)s+φ(cid:62)g{e(b;γ)}, where g is, for example, some appropriate spline basis transforma- 0 1 2 3 tion of the propensity score. With this model the estimator based on (9) for the average causal con- (cid:82) trastE(Y )−E(Y )reducesto φ π (φ|γ)π (γ)dφdγ,thatis,toanestimatoroftheposterior 1i 0i φ,γ 1 n n meanEΓ|X,Z{E(Φ1|v;γ)}.Thisestimatorinturncanbeapproximatedwith(cid:80)mj=1φ(cid:98)1(γ(j))/m,where (γ(1),...,γ(m))isaMonteCarlosamplefromπ (γ)(cf.KaplanandChen,2012,p. 589). Wenotefirst n thatthisestimatorhasthesameasymptoticdistributionasφ(cid:98)1(γ(cid:98)),wherethetreatmentassignmentmodel parametershavebeenfixedtotheirmaximumlikelihoodestimates(seeSupplementaryAppendix3). In Supplementary Appendix 3 we further show that avar{φ(cid:98)1(γ(cid:98))} ≤ avar{φ(cid:98)1(γ0)}, where φ(cid:98)1(γ0) is the estimator given the ‘true’ propensity scores. Thus, with the propensity score adjusted outcome model specification, a variance adjustment due to estimating the propensity scores should reduce the asymp- toticvarianceoftheresultingtreatmenteffectestimatorcomparedtoahypotheticalsituationwherethe truepropensityscoresareknown(cf.HenmiandEguchi,2004). Incontrast,KaplanandChen(2012,p. 592)andGrahametal.(2015,p. 11)proposevarianceestimatorsbasedonthevariancedecomposition formula var(Φ |v)=E {var(Φ |v;γ)}+var {E(Φ |v;γ)}, (10) 1 Γ|X,Z 1 Γ|X,Z 1 whichappearstoaddafurthervariancecomponent. Anexplanationforthediscrepancyisthatwiththe correctly specified models in the representation (1), we have that p(φ |v;γ) = p(φ |v), and the the 1 1 secondvariancecomponentbecomeszero.Ontheotherhand,ifthemodelsaremisspecified,theproduct formlikelihoodintherepresentation(1)doesnotapplyinthefirstplace. Thisillustratesthedifficulty in applying Bayesian procedures in the context of incompatible models. Even though this is routinely doneinthecontextofmultipleimputation(e.g.Rubin,1996),andoftenproducesreasonableresults,in thepresentcontextthereislittlemotivationtouseanapproachwhichintroducesanadditionalvariance component to the posterior variance, given that estimation of the propensity scores should reduce the varianceofthetreatmenteffectestimator. Wefurtherdiscussthisdiscrepancyinthefollowingsection. 5. JOINT ESTIMATION OF OUTCOME AND TREATMENT ASSIGNMENT MODELS Even though the outcome Y can obviously be predictive of the individual treatment assignment Z , i i the outcomes are not informative of the treatment assignment mechanism (a proof in Supplementary Appendix2): THEOREM3. If the outcome model is correctly specified, the outcomes are non-informative of the parameterscharacterizingthetreatmentassignmentprocess. Insuchacasethetreatmentassignmentmodelplaysnopartintheinferences,sincethecorresponding posterior predictive estimator is (2). However, the Bayesian propensity score approach proposed by McCandless et al. (2009a) specifies a parametrization making the outcome and treatment assignment models dependent and estimates the parameters jointly. More recently, Zigler et al. (2013) suggested thatasimilarapproachcouldbeusedtoobtainaBayesiananaloguetodoublyrobustinferences. Such an approach can be understood by assuming that there exists a de Finetti parametrization (φ∗,γ∗) for which (cid:90) (cid:40) n (cid:41) (cid:89) p(v)= p(y ,z |x ;φ∗,γ∗)p(x ;ψ) π (φ∗)π (γ∗)π (ψ)dφ∗dγ∗dψ, i i i i 0 0 0 φ∗,γ∗,ψ i=1 wherep(y ,z |x ;φ∗,γ∗) = p{y |z ,e(b ;γ∗),s ;φ∗}p(z |b ;γ∗). Comparedto(φ,γ)in(1),neither i i i i i i i i i φ∗orγ∗retainstheoriginalinterpretation,butnowthereisawelldefinedjointposteriordistribution n (cid:89) π (φ∗,γ∗)∝ [p{y |z ,e(b ;γ∗),s ;φ∗}p(z |b ;γ∗)]π (φ∗)π (γ∗). n i i i i i i 0 0 i=1 Inferencescouldnowbebasedontheposteriorpredictiveexpectations (cid:90) (cid:90) m{a,e(b ;γ∗),s ;φ∗}p(x ;ψ)π (φ∗,γ∗)π (ψ)dx dφ∗dγ∗dψ. (11) i i i n n i φ∗,γ∗,ψ xi Atfirstsight,(11)wouldseemmorenaturalthan(9),sincethespecification(11)doesnotmakeuseof incompatiblemodels. However,nowthequantitiese(b ;γ∗)donotpossessthebalancingpropertiesof i preprint,versionofOctober29th,2015 Author’sOriginalVersion ABayesianviewofdoublyrobustcausalinference, 7 propensityscores,andthusitwouldbedifficulttoshowwhether(11)wouldhavethe‘doublerobustness’ propertyoftheestimator(3). Toaddressthelackofbalance,McCandlessetal.(2010)suggestedaGibbssamplertypeapproach similartothatofLunnetal.(2009)tocutthefeedbackfromtheoutcomemodel,bysuccessivelydraw- ing from the conditional posteriors π (γ) and π (φ|γ) to approximate the joint posterior of (φ∗,γ∗). n n However, as discussed in the previous Section, these posteriors are incompatible and generally such a sampling procedure is not guaranteed to converge to any well defined joint distribution. In fact, if the conditionalposteriorscanbesampleddirectly,orifthesecondsamplingstepisallowedtoconvergeto thecorrespondingconditionaldistribution,theinferencesbasedontheformulations(9)and(11)willbe equivalent. Tosumup,tryingtorecoverfullyprobabilisticinferencesthroughsamplingfromajointposteriorof theoutcomeandtreatmentassignmentmodelparameterslosesthebalancingpropertyofthepropensity scores,andconsequently,thepropertiesoftheresultingestimatorwouldbedifficulttoestablish. Onthe other hand, cutting the feedback in an attempt to recover the balancing property would mean that the inferences are no longer based on well-defined posterior distributions. Thus, in the following section, followingtheapproachoutlinedinSaarelaetal.(2015b),weformulatealternativeBayesianestimators thatarenotbasedonBayesianpropensityscoreadjustment. 6. POSTERIOR PREDICTIVE INFERENCES WITH IMPORTANCE SAMPLING 6. Inverseprobabilityoftreatmentweightedoutcomeregression It has been recognized by various authors (e.g. Røysland, 2011; Chakraborty and Moodie, 2013) that inverseprobabilityoftreatmentweightingcanbemotivatedthroughachangeofprobabilitymeasures, or equivalently, importance sampling. However, as far as we know, before Saarela et al. (2015b) this approachhasnotbeenusedtoformulateBayesiancausalinferences. Herewearguethatthisapproach canbeusedtoresolvetheparadoxesdiscussedinSections(4)and(5).Wefollowtheposteriorpredictive reasoning of Supplementary Appendix 1, but rather than trying to directly predict a new observation undertheexperimentalsettingE,weconsiderfirstthetaskofparameterestimationunderthisregime.For thispurpose,aBayesestimatorfortheoutcomemodelparameterscanbeconstructedbymaximizingthe expectedutilityE {l(φ;V )|v}withrespecttoφ,wherethelog-likelihoodl(φ;v )≡logp(y |z ,s ;φ) E i i i i i takes the role of a parametric utility function, and the expectation is over a predicted new observation v =(y ,z ,x ),i∈/ {1,...,n}. i i i i Let further ξ be a set of parameters characterizing the entire data-generating mechanism under the observationalregimeO. Wecanfurtherwritetheexpectationas E {l(φ;V )|v}=E [E {l(φ;V )|v;ξ}], E i Ξ|V E i where, following Walker (2010, p. 26–27), we can consider the lower dimensional decision of maxi- mizingtheexpectedutilityE {l(φ;V )|v;ξ}withrespecttoφconditionalonξ. Withaknownregime E i E andthestabilityassumptiondiscussedinSupplementaryAppendix1,argmax E {l(φ;V )|v;ξ}is φ E i a deterministic function of ξ. Thus, the uncertainty represented by the posterior distribution p (ξ|v) O thenalsoreflectstheuncertaintyonφ,providingmeanstoconstructaposteriordistributionforφ. This proceedsasfollows;theinnerexpectationcanbewrittenas (cid:90) E [l(φ;V )|v;ξ]= l(φ;v )p (v |v;ξ)dv E i i E i i vi (cid:90) p (v |v;ξ) = l(φ;v ) E i p (v |v;ξ)dv i p (v |v;ξ) O i i vi O i (cid:90) p (y |z ,x ,v;ξ)p (z )p (x |v;ξ) = l(φ;v ) E i i i E i E i p (v |v;ξ)dv i p (y |z ,x ,v;ξ)p (z |x ,v;ξ)p (x |v;ξ) O i i vi O i i i O i i O i (cid:90) p (z ) = l(φ;v ) E i p (v |v;ξ)dv , i p (z |x ,v;ξ) O i i vi O i i whereinthelastequalitywemadeuseofthestabilityassumptionofSupplementaryAppendix1. Inthe last form we can replace the predictive distribution under O with the Bayesian bootstrap specification p (v |v;ξ) = (cid:80)n ξ δ (v ), where ξ ≡ (ξ ,...,ξ ) and Ξ|v ∼ Dirichlet(1,...,1), as in the O i k=1 k vk i 1 n weightedlikelihoodbootstrapofNewtonandRaftery,1994. Denotingw (ξ)≡p (z )/p (z |x ,v;ξ), i E i O i i Author’sOriginalVersion preprint,versionofOctober29th,2015 8 O.SAARELA,L.R.BELZILED.A.STEPHENS theexpectedutilitynowbecomes (cid:90) n n (cid:88) (cid:88) E [l(φ;V )|v;ξ]= l(φ;v )w (ξ) ξ δ (v )dv = w (ξ)ξ l(φ;v ), (12) E i i i k vk i i k k k vi k=1 k=1 thatis,aweightedlog-likelihood,motivatingtheestimator n (cid:88) φ(cid:98)(ξ)≡argmax wk(ξ)ξkl(φ;vk). φ k=1 An approximate posterior distribution for φ under E can now be constructed by repeatedly sampling the weight vectors from Ξ|v ∼ Dirichlet(1,...,1), and recalculating φ(cid:98)(ξ) at each realization. This approachofcreatingamappingbetweenthenon-parametricspecificationandaparametrizationrelevant toinferencesisanalogoustoChamberlainandImbens(2003),butaddstheimportancesamplingweights totheDirichletweightsinordertomakeinferencesacrosstheobservationalandexperimentalregimes. The weights w (ξ) function as importance sampling weights in Monte Carlo integration; they add i variabilitytotheestimation,butinthepresentcontextprovidesomeprotectiontowardsmisspecification oftheoutcomemodel,inthesenseofSection3.3. Theabovedidnotyetaddresshowtocalculatethese weights;inprinciplethesearefullydeterminedbythecurrentrealizationofξ underthenon-parametric specification, but in practice parametric model specifications are needed for smoothing purposes, and weneedawaytolinktheξ andthetreatmentassignmentmodelparametersγ. Forthispurposeγ itself can be estimated through the weighted likelihood bootstrap since this readily gives the deterministic function linking the two parametrizations; thus in (12) we choose w (ξ) = p (z )/p {z |b ;γ(ξ)}, i E i O i i (cid:98) where γ(ξ) ≡ argmax (cid:80)n ξ logp(z |b ;γ). The probabilities p (z ) are given by the chosen (cid:98) γ k=1 k k k E i regimeE thatistheobjectofinference;inpracticetheestimationismostefficientwhenwechoosethe targetregimetobeascloseaspossibletotheobservedregimeO;thiscanbeachievedbyfixingp (z ) E i to the marginal treatment assignment probabilities under O, which would result in the usual kind of stabilizedinverseprobabilityoftreatmentweightsusedinmarginalstructuralmodelling(Robinsetal., 2000;Hernánetal.,2001;ColeandHernán,2008). Themarginalcausalcontrastmaynowbeestimatedthroughtheexpectations (cid:90) (cid:90) n (cid:88) EΞ|V{EE(Yi|Zi =a,v;ξ)}= m{a,si;φ(cid:98)(ξ)} ξkδsk(si)pO(ξ|v)dsidξ ξ si k=1 (cid:90) n (cid:88) = ξkm{a,sk;φ(cid:98)(ξ)}pO(ξ|v)dξ, (13) ξk=1 whereweusedthenon-parametricspecificationp(s |v;ξ)=(cid:80)n ξ δ (s ),andwhereagainp (ξ|v) i k=1 k sk i O is replaced with the uniform Dirichlet distribution. Since all uncertainty is contained in the parameter vectorξ,aposteriordistributionforthepredictivemeanormeandifferencecanbeconstructedasbefore throughresampling. Thepointestimatorgivenby(13)isthedirectBayesiananalogueof(7),wherethe outcomemodelwasestimatedusinginverseprobabilityoftreatmentweightedregression. Infact,ifwe fix ξ = 1/n, k = 1,...,n, instead of considering these as unknown parameters, the two estimators k are equivalent. Thus, we conjecture that the estimator given by (13) has a similar ‘double robustness’ propertyas(7). WedemonstratethisthroughsimulationsinSection7,butbeforethat,weshowhowthe semi-parametricdoublyrobustestimator(4)canbemotivatedasaposteriorpredictiveexpectation. 6. Doublyrobustestimation In Supplementary Appendix 4 we show that under the non-parametric specification in terms of ξ, the posteriorpredictivecausalcontrastmaybewrittenas E (Y |Z =1,v;ξ)−E (Y |Z =0,v;ξ) E i i E i i n (cid:26) (cid:27) =(cid:88)ξ {y −m(z ,x ;ξ)} zk − 1−zk k i k k pr (Z =1|x ,v;ξ) pr (Z =0|x ,v;ξ) k=1 O k k O k k n (cid:88) + ξ {m(1,x ;ξ)−m(0,x ;ξ)}, (14) k k k k=1 preprint,versionofOctober29th,2015 Author’sOriginalVersion ABayesianviewofdoublyrobustcausalinference, 9 whichcorrespondstoformulation(4). Sincethenon-parametricspecificationplacesnorestrictionson theconditionaldistributions, inpractice, toobtainanestimator, thenon-parametricallyspecifiedquan- tities m(z ,x ;ξ) and pr (Z = a|x ,v;ξ) would have to be replaced with the parametric versions k k O k k m{zk,sk;φ(cid:98)(ξ)} and prO{Zk = a|bk;γ(cid:98)(ξ)}, estimated using the weighted likelihood bootstrap, as in the previous section. It is straightforward to see that if the outcome model is correctly specified, the expression(14)reducestothemodel-basedestimator(thesecondadditiveterm). Thisagainreflectsthe factthatifwebelieveinouroutcomemodel,thetreatmentassignmentmodeldoesnotplayapartinthe inferences. However,aBayesianmightwanttouseanestimatoroftheform(14)ifbeingrestrictedby twoparametricconstraints,intermsofφandγ,butnotknowingwhichoneoftheseiscorrect. Ifeither oneoftheparametricspecificationsiscorrect,(14)willstillreducetotheposteriorpredictivemeandif- ference,thenaturalBayesianestimator. Wesummarizethispropertyinthefollowingtheorem(aproof inSupplementaryAppendix2). THEOREM4. Theestimatorobtainedbysubstitutingtheparametricspecificationsm{zk,sk;φ(cid:98)(ξ)} andpr {Z =a|b ;γ(ξ)}intoexpression(14)isequivalenttotheposteriorpredictivemeandifference O k k (cid:98) ifeitheroneofthesemodelsiscorrectlyspecified. Aposteriordistributionforthemeandifferencecanbegeneratedasintheprevioussectionthrough resamplingoftheparametervectorsξandrecalculating(14)ateachrealization;wewillillustratethisin thefollowingsection. 7. SIMULATION STUDY Above we have made a distinction between model misspecification due to omission of relevant co- variates, and misspecification of the parametric functional relationship between the outcome and the covariates, and noted that all the estimators discussed in Section 3 should be ‘doubly robust’ against theformertypeofmisspecification. However, inpracticetheconsequencesofthesetwo types ofmis- specification will often be similar; they result in residual confounding. Therefore, herein we investi- gate how the different estimators perform when the covariate sets S and B are not only created by i i a partitioning, but also a transformation of the x ’s. For this purpose, we simulated X ∼ N(0,1), i ij j = 1,...,4, and transformed these as c = |x |/(1−2/π)1/2. The true treatment assignment ij ij andoutcomemechanismswerespecifiedasZ |x ∼ Bernoulli(expit{0.4c +0.4x +0.8x })and i i i1 i2 i3 Y |z ,x ∼ N(z − c − x − x ,1), respectively. For estimation, we considered two scenarios: i i i i i1 i2 i4 (I) s ≡ (x ,x ,x ) and b ≡ (c ,x ,x ) (misspecified outcome model and correctly specified i i1 i2 i3 i i1 i2 i4 treatment assignment model), and (II) s ≡ (c ,x ,x ) and b ≡ (x ,x ,x ) (correctly specified i i1 i2 i3 i i1 i2 i4 outcomemodelandmisspecifiedtreatmentassignmentmodel). WeareinterestedinthemarginalcausalcontrastE(Y )−E(Y )=1.Toestimatethis,weapplied i1 i0 the Bayesian estimators discussed in Sections 4, 5, and 6. In the two-step estimation we attempted bothforwardsamplingfromtheposteriordistributions,andthevariancedecompositionformula(10). In theformer, insteadofMarkovchainMonteCarlo, weappliednormalapproximationsfortheposterior distributions,oftheformΦ|v;γ ∼N{φ(cid:98)(γ),S},whereφ(cid:98)(γ)isthemaximumlikelihoodestimateandS itsestimatedvariance-covariancematrix. TheposteriordistributionΓ|x,z wasapproximatedusingthe weightedlikelihoodbootstrap.Inthejointestimation,weagainusedanormalapproximation,centeredat thejointmaximumlikelihoodestimates(φ(cid:98),γ(cid:98)),andthevariance-covariancematrixgivenbytheinverse oftheobservedinformationatthemaximumlikelihoodpoint. Inbothtwo-stepandjointestimation,the fittedmodelswerespecifiedasm{z ,e(b ;γ),s ;φ}=φ +φ z +φ(cid:62)s +φ(cid:62)g{e(b ;γ)},wheregis i i i 0 1 i 2 i 3 i acubicpolynomialbasis, ande(b ;γ) = expit(γ +γ(cid:62)b ). Intheimportancesampling(IS)estimator i 0 1 i proposed in Section 6.1, and in the importance sampling/doubly robust estimator (IS/DR) of Section 6.2, the fitted treatment assignment model was the same, with the outcome model specified through m(z ,s ;φ)=φ +φ z +φ(cid:62)s . i i 0 1 i 2 i ForcomparisontotheBayesianestimators,wealsocalculatenaiveunadjustedcomparison(naive), outcome regression adjusted for covariates s (adjusted), the standard inverse probability of treatment i weighted estimator (6) (IPTW), the semi-parametric doubly robust estimator (4) (DR), the clever co- variate version of this (CC), the inverse probability of treatment weighted outcome regression based estimator(7)(OR/IPTW),aswellaspropensityscoreadjustedoutcomeregressionbasedestimator(3) (OR/PS). For IPTW, DR, CC, and OR/IPTW, the standard errors were estimated through the standard frequentistnonparametricbootstrap(Efron, 1979). ForOR/PS,todemonstrate thevarianceestimation Author’sOriginalVersion preprint,versionofOctober29th,2015 10 O.SAARELA,L.R.BELZILED.A.STEPHENS issues discussed in Section 4, we calculated both observed information based standard errors, and the adjustedsandwichtypestandarderrorsdiscussedinSupplementaryAppendix3. The results over 1000 replications are shown in Table 1. Under scenario (I), all estimators except naiveandadjustedcancorrectforconfounding,althoughthejointestimationapproachproducesaslight bias.Intermsofefficiency,theestimatorsbasedonpropensityscoreadjustedoutcomeregressionarethe best,withtheinverseprobabilityoftreatmentweightingbasedestimatorslosingslightly. Asdiscussed inSection3.2, theclevercovariateapproachresultsinhighervariabilitycomparedtotheotherdoubly adjustedestimators.Intermsofvarianceestimation,thecomparisonbetweentheunadjustedandadjusted standarderrorsforOR/PSsuggeststhatunderthissimulationsettingestimationofthepropensityscores substantially reduces the variance, and not adjusting for this results in overcoverage. The resampling based variance estimators adjust for this automatically. However, the two-step approach to variance estimationperformspoorly;asdemonstratedinSupplementaryAppendix3,thetwo-steppointestimator has the same asymptotic variance as the other OR/PS estimators, but the two-step variance estimators unnecessarilyaddafurthervariancecomponent. Thus,thesimulationresultssupportthediscussionin Sections 4 and 5; the two-step and joint estimation approaches are difficult to justify from Bayesian arguments,anddonotseemtoprovidepracticaladvantagesintermsoftheirfrequency-basedproperties. Ontheotherhand, theimportancesamplingbasedBayesianestimatorsproduce verysimilarresultsto theOR/IPTWandDRestimators,respectively. Underscenario(II),alltheestimatorsexceptIPTWareunbiased, whichisexpectedbasedontheir previouslydiscussedtheoreticalproperties. Whentheoutcomemodeliscorrectlyspecified,thereisalso verylittledifferenceintheefficiencyofthevariousestimators. Table1: Estimatesforthemarginalcausalcontrast(truevalue=1)over1000simulationrounds Point Relative MC Coverage Scenario Estimator SD SE estimate bias(%) error (%) (I) Naive 0.347 −65.3 0.128 0.128 0.004 0.3 Adjusted 0.667 −33.3 0.091 0.092 0.003 3.6 IPTW 1.001 0.1 0.135 0.138 0.005 94.4 OR/PS(obs.information) 0.997 −0.3 0.071 0.095 0.002 99.3 OR/PS(adj.sandwich) 0.997 −0.3 0.071 0.073 0.002 95.3 DR 0.998 −0.2 0.087 0.088 0.003 94.2 Clevercovariate 1.026 2.6 0.110 0.110 0.003 93.0 OR/IPTW 0.990 −1.0 0.082 0.083 0.003 93.9 Two-step(forwardsampling) 0.999 −0.1 0.071 0.113 0.002 99.8 Two-step(variancedecomposition) 0.995 −0.5 0.071 0.112 0.002 99.8 Jointestimation 1.046 4.6 0.071 0.071 0.002 89.5 Importancesampling 0.991 −0.9 0.083 0.080 0.003 93.6 Importancesampling/DR 0.997 −0.3 0.087 0.086 0.003 93.9 (II) Naive 0.347 −65.3 0.128 0.128 0.004 0.3 Adjusted 0.997 −0.3 0.065 0.067 0.002 95.3 IPTW 0.629 −37.1 0.133 0.131 0.005 20.3 OR/PS(obs.information) 0.997 −0.3 0.071 0.071 0.002 95.4 OR/PS(adj.sandwich) 0.997 −0.3 0.071 0.071 0.002 95.4 DR 0.999 −0.1 0.074 0.074 0.003 95.9 Clevercovariate 0.999 −0.1 0.075 0.075 0.003 95.9 OR/IPTW 0.999 −0.1 0.074 0.073 0.003 95.7 Two-step(forwardsampling) 1.000 0.0 0.071 0.072 0.002 96.5 Two-step(variancedecomposition) 0.997 −0.3 0.071 0.071 0.002 95.5 Jointestimation 0.997 −0.3 0.071 0.071 0.002 95.4 Importancesampling 0.999 −0.1 0.074 0.072 0.003 95.2 Importancesampling/DR 0.999 −0.1 0.074 0.073 0.003 95.5 Thecolumnscorrespondtoestimator,meanpointestimate,relativebias,MonteCarlostandarddeviation(SD),mean standarderrorestimate(SE),MonteCarlo(MC)error(batchmeans)ofthemeanpointestimate,and95%confidence intervalcoverageprobability.Thetwoscenarioscorrespondto(I)misspecifiedoutcomemodelandcorrectlyspeci- fiedtreatmentassignmentmodel,and(II)correctlyspecifiedoutcomemodelandmisspecifiedtreatmentassignment model. preprint,versionofOctober29th,2015 Author’sOriginalVersion

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.