Sufficient Covariate, Propensity Variable and Doubly Robust Estimation

by  Hui Guo

Hui Guo, Philip Dawid and Giovanni Berzuini

arXiv:1501.07761v1 [math.ST] 30 Jan 2015

Abstract Statistical causal inference from observational studies often requires adjustment for a possibly multi-dimensional variable, where dimension reduction is crucial. The propensity score, first introduced by Rosenbaum and Rubin, is a popular approach to such reduction. We address causal inference within Dawid's decision-theoretic framework, where it is essential to pay attention to sufficient covariates and their properties. We examine the role of a propensity variable in a normal linear model. We investigate both population-based and sample-based linear regressions, with adjustments for a multivariate covariate and for a propensity variable. In addition, we study the augmented inverse probability weighted estimator, involving a combination of a response model and a propensity model. In a linear regression with homoscedasticity, a propensity variable is proved to provide the same estimated causal effect as multivariate adjustment. An estimated propensity variable may, but need not, yield better precision than the true propensity variable. The augmented inverse probability weighted estimator is doubly robust and can improve precision if the propensity model is correctly specified.
Hui Guo
Centre for Biostatistics, Institute of Population Health, The University of Manchester, Jean McFarlane Building, Oxford Road, Manchester M13 9PL, UK, e-mail: [email protected]

Philip Dawid
Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, UK, e-mail: [email protected]

Giovanni Berzuini
Department of Brain and Behavioural Sciences, University of Pavia, Pavia, Italy, e-mail: [email protected]

1 Introduction

Causal effects can be identified from well-designed experiments, such as randomised controlled trials (RCTs), because treatment assignment is entirely unrelated to subjects' characteristics, both observed and unobserved. Suppose there are two treatment arms in an RCT: a treatment group and a control group. Then the average causal effect (ACE) can simply be estimated as the difference in mean outcome between the two groups in the observed data. However, randomised experiments, although ideal and to be conducted whenever possible, are not always feasible. For instance, to investigate whether smoking causes lung cancer, we cannot randomly force a group of subjects to smoke. Moreover, it may take years or longer for the disease to develop. Instead, a retrospective case-control study may have to be considered. The task of drawing causal conclusions, however, becomes problematic, since the subjects in the two groups will rarely be similar; e.g., the lifestyles of smokers might differ from those of non-smokers. Thus, we are unable to "compare like with like", the classic problem of confounding in observational studies, which may require adjusting for a suitable set of variables (such as age, sex, health status, diet).
Otherwise, the relationship between treatment and response will be distorted, leading to biased inferences. In general, linear regression, matching or subclassification is used for adjustment. If there are multiple confounders, identifying two individuals with very similar values of all confounders simultaneously would be cumbersome or impossible, especially for matching and subclassification. It is therefore sensible to replace all the confounders by a scalar variable. The propensity score [22] is a popular dimension reduction approach in a variety of research fields.

2 Framework

The aim of statistical causal inference is to understand and estimate a "causal effect", and to identify scientific and, in principle, testable conditions under which the causal effect can be identified from observational studies. The philosophical nature of "causality" is reflected in the diversity of its statistical formalisations, as exemplified by three frameworks:

1. Rubin's potential response framework [24, 25, 26] (also known as Rubin's causal model), based on counterfactual theory;
2. Pearl's causal framework [16, 17], richly developed from graphical models;
3. Dawid's decision-theoretic framework [6, 7], based on decision theory and probabilistic conditional independence.

In Dawid's framework, causal relations are modelled entirely by conditional probability distributions. We adopt it throughout this chapter to address causal inference; the assumptions required are, at least in principle, testable.

Let X, T and Y denote, respectively, a (typically multivariate) confounder, the treatment, and the response (or outcome). For simplicity, Y is a scalar and X a multi-dimensional variable. We assume that T is binary: 1 (treatment arm) and 0 (control arm). Within Dawid's framework, a non-stochastic regime indicator variable $F_T$, taking values $\emptyset$, 0 and 1, is introduced to denote the treatment assignment mechanism operating. This divides the world into three distinct regimes, as follows:

1. $F_T = \emptyset$: the observational (idle) regime. In this regime, the value of the treatment T is passively observed and treatment assignment is determined by Nature.
2. $F_T = 1$: the interventional treatment regime, i.e., treatment T is set to 1 by manipulation.
3. $F_T = 0$: the interventional control regime, i.e., treatment T is set to 0 by manipulation.

For example, in an observational study of custodial sanctions, our interest is in the effect of a custodial sanction, as compared to probation (non-custodial sanction), on the probability of re-offence. Then $F_T = \emptyset$ denotes the actual observational regime under which the data were collected; $F_T = 1$ is the (hypothetical) interventional regime that always imposes imprisonment; and $F_T = 0$ is the (hypothetical) interventional regime that always imposes probation. Throughout, we assume full compliance and no dropouts, i.e., each individual actually takes whichever treatment they are assigned to. Then we have a joint distribution $P_f$ of all relevant variables in each regime $F_T = f$ ($f = 0, 1, \emptyset$).

In the decision-theoretic framework, causal assumptions are construed as assertions that certain marginal or conditional distributions are common to all regimes. Such assumptions can be formally expressed as properties of conditional independence, where this notion is extended to allow non-stochastic variables such as $F_T$ [4, 5, 7]. For example, the "ignorable treatment assignment" assumption in Rubin's causal model (RCM) [22] can be expressed as

$$Y \perp\!\!\!\perp F_T \mid T, \qquad (1)$$

read as "Y is independent of $F_T$ given T". However, this condition is most likely inappropriate in observational studies, where randomisation is absent.

A causal effect is defined as the difference in response produced by manipulating the treatment, and so it involves only the interventional regimes. In particular, the population-based average causal effect (ACE) of the treatment is defined as

$$\mathrm{ACE} := E(Y \mid F_T = 1) - E(Y \mid F_T = 0), \qquad (2)$$

or alternatively¹,

$$\mathrm{ACE} := E_1(Y) - E_0(Y). \qquad (3)$$

Without further assumptions, ACE is, by its definition, not identifiable from the observational regime.

¹ For convenience, the values of the regime indicator $F_T$ are presented as subscripts.
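To see concretely why ACE cannot be recovered from the observational regime alone, the sketch below specifies two hypothetical toy models, each described by its observational joint distribution of (T, Y) and its interventional means $E_t(Y)$. Both models, including the hidden binary variable U in the first one, are invented for illustration: they agree exactly in the observational regime yet have different ACEs, so no function of $P_\emptyset$ alone can return ACE.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def model_a_observational():
    """Hypothetical model A: a hidden U ~ Bernoulli(1/2) sets both T = U and Y = U.

    Returns the observational joint distribution P_emptyset(T = t, Y = y).
    """
    joint = {}
    for u in (0, 1):
        t, y = u, u
        joint[(t, y)] = joint.get((t, y), 0) + HALF
    return joint


def model_a_interventional_mean(t):
    """E_t(Y) in model A: setting T = t leaves Y = U unchanged, so E_t(Y) = P(U = 1)."""
    return HALF


def model_b_observational():
    """Hypothetical model B: T ~ Bernoulli(1/2) and Y = T, with no hidden variable."""
    joint = {}
    for t in (0, 1):
        y = t
        joint[(t, y)] = joint.get((t, y), 0) + HALF
    return joint


def model_b_interventional_mean(t):
    """E_t(Y) in model B: Y copies whatever value T is set to."""
    return Fraction(t)


def ace(interventional_mean):
    """ACE := E_1(Y) - E_0(Y), computed from the interventional regimes only."""
    return interventional_mean(1) - interventional_mean(0)


# Identical observational regimes...
assert model_a_observational() == model_b_observational()
# ...but different average causal effects.
print(ace(model_a_interventional_mean), ace(model_b_interventional_mean))  # 0 1
```

In model A the association between T and Y is produced entirely by U, while in model B it is entirely causal; the observational data cannot tell the two apart.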
3 Identification of ACE

Suppose the joint distribution of $(F_T, T, Y)$ is known and satisfies (1). Is ACE identifiable from data collected in the observational regime? Note that (1) states that the distribution of Y given T = t is the same whether t is observed in the observational regime $F_T = \emptyset$ or set in the interventional regime $F_T = t$. As discussed, this assumption would not be satisfied in observational studies; thus a direct comparison of the responses in the two treatment groups cannot be interpreted as the causal effect when the data are observational.

Definition 1. The "face-value average causal effect" (FACE) is defined as:

$$\mathrm{FACE} := E_\emptyset(Y \mid T = 1) - E_\emptyset(Y \mid T = 0). \qquad (4)$$

It would hardly be true that FACE = ACE, as we would not expect the conditional distribution of Y given T = t to be the same in every regime. In fact, identification of ACE from observational studies requires, on the one hand, adjusting for confounders and, on the other hand, an interplay of distributional information between different regimes. One can make no further progress unless some properties are satisfied.

3.1 Strongly sufficient covariate

Rigorous conditions must be investigated so as to identify ACE.

Definition 2. X is a covariate if:

Property 1. $X \perp\!\!\!\perp F_T$.

That is, the distribution of X is the same in any regime, be it observational or interventional. In most cases, X consists of attributes determined prior to the treatment, for example, blood types and genes.

Definition 3. X is a sufficient covariate for the effect of treatment T on response Y if, in addition to Property 1, we have

Property 2. $Y \perp\!\!\!\perp F_T \mid (X, T)$.

Property 2 requires that the distribution of Y, given X and T, is the same in all regimes. It can also be described as "strongly ignorable treatment assignment, given X" [22]. We assume that readers are familiar with the concept and properties of directed acyclic graphs (DAGs). Properties 1 and 2 can then be represented by means of a DAG as in Fig. 1. The dashed arrow from X to T indicates that T is partially dependent on X, i.e., the distribution of T depends on X in the observational regime, but not in the interventional regime where $F_T = t$.
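A small exact calculation shows how FACE in (4) can differ from ACE even when X is a sufficient covariate. In the hypothetical model below, X is binary with the same distribution in every regime (Property 1), the mean of Y given (X, T) is shared by all regimes (Property 2), and only the observational regime lets T depend on X. All probabilities are made-up values chosen for illustration.

```python
from fractions import Fraction

# Hypothetical toy model; every number here is an illustrative assumption.
P_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}           # Property 1: same in all regimes
P_T1_GIVEN_X = {0: Fraction(1, 5), 1: Fraction(4, 5)}  # used in the observational regime only
MEAN_Y = {                                              # E(Y | X=x, T=t), common to all regimes (Property 2)
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(5, 10), (1, 1): Fraction(6, 10),
}


def interventional_mean(t):
    """E_t(Y): in regime F_T = t every unit has T = t, while X keeps its distribution."""
    return sum(P_X[x] * MEAN_Y[(x, t)] for x in P_X)


def observational_mean_given_t(t):
    """E_emptyset(Y | T = t): average MEAN_Y over the observational law of X given T = t."""
    weight = {}
    for x in P_X:
        p_t = P_T1_GIVEN_X[x] if t == 1 else 1 - P_T1_GIVEN_X[x]
        weight[x] = P_X[x] * p_t
    total = sum(weight.values())
    return sum(weight[x] * MEAN_Y[(x, t)] for x in P_X) / total


ace = interventional_mean(1) - interventional_mean(0)
face = observational_mean_given_t(1) - observational_mean_given_t(0)
print(ace, face)  # 1/10 17/50
```

Treated units in this toy model tend to have X = 1, which carries a higher baseline mean of Y, so FACE (17/50) overstates the effect, which is 1/10 within each stratum of X.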
[Fig. 1 Sufficient covariate: DAG on nodes X, $F_T$, T and Y]

Definition 4. X is a strongly sufficient covariate if, in addition to Properties 1 and 2, we have

Property 3. $P_\emptyset(T = t \mid X) > 0$ with probability 1, for t = 0, 1.

Property 3 requires that, for any X = x, both treatment and control groups are observed in the observational regime.

Lemma 1. Suppose X is a strongly sufficient covariate. Then, considered as joint distributions for (Y, X, T), $P_t$ is absolutely continuous with respect to $P_\emptyset$ (denoted by $P_t \ll P_\emptyset$), for t = 0 and t = 1. That is, for every event A determined by (X, T, Y),

$$P_\emptyset(A) = 0 \implies P_t(A) = 0. \qquad (5)$$

Equivalently, if an event A occurs with probability 1 under the measure $P_\emptyset$, then it occurs with probability 1 under the measure $P_t$ (t = 0, 1).

Proof. Property 2, expressed equivalently as $(Y, X, T) \perp\!\!\!\perp F_T \mid (X, T)$, asserts that there exists a function w(X, T) such that

$$P_f(A \mid X, T) = w(X, T)$$

almost surely (a.s.) in each regime $f = 0, 1, \emptyset$. Let $P_\emptyset(A) = 0$. Then a.s. $[P_\emptyset]$,

$$0 = P_\emptyset(A \mid X) = w(X, 1)\,P_\emptyset(T = 1 \mid X) + w(X, 0)\,P_\emptyset(T = 0 \mid X).$$

Since both terms on the right are non-negative, Property 3 gives, for t = 0, 1,

$$w(X, t) = 0 \qquad (6)$$

a.s. $[P_\emptyset]$. As w(X, t) is a function of X, it follows by Property 1 that (6) holds a.s. $[P_t]$. Consequently,

$$w(X, T) = 0 \quad \text{a.s. } [P_t], \qquad (7)$$

since, a.s. $[P_t]$, T = t and hence w(X, T) = w(X, t) for any bounded function w. Then by (7),

$$P_t(A) = E_t\{P_t(A \mid X, T)\} = E_t\{w(X, T)\} = 0.$$

Lemma 2. For any integrable $Z \preceq (Y, X, T)$², and any versions of the conditional expectations,

$$E_t(Z \mid X) = E_t(Z \mid X, T) \quad \text{a.s. } [P_t]. \qquad (8)$$

² The symbol $\preceq$ is interpreted as "a function of".

Proof. Let j(X, T) be an arbitrary but fixed version of $E_t(Z \mid X, T)$. Then j(X, T) = j(X, t) a.s. $[P_t]$, and j(X, t) serves as a version of $E_t(Z \mid X, T)$ under $[P_t]$. So

$$E_t(Z \mid X) = E_t\{j(X, T) \mid X\} = E_t\{j(X, t) \mid X\} = j(X, t) \quad \text{a.s. } [P_t].$$

Thus j(X, t) is a version of $E_t(Z \mid X)$ under $[P_t]$, and (8) follows.

Since $E_t(Z \mid X)$ is a function of X, by Property 1, j(X, t) is a version of $E_t(Z \mid X)$ in any regime. Let g(X, T) be some arbitrary but fixed version of $E_\emptyset(Z \mid X, T)$.

Theorem 1. Suppose that X is a strongly sufficient covariate.
Then for any integrable $Z \preceq (Y, X, T)$, and with notation as above,

$$j(X, t) = g(X, t) \qquad (9)$$

almost surely in any regime.

Proof. By Property 2, there exists a function h(X, T) which is a common version of $E_f(Z \mid X, T)$ under $[P_f]$ for $f = 0, 1, \emptyset$. Then h(X, T) serves as a version of $E_\emptyset(Z \mid X, T)$ under $[P_\emptyset]$, and as a version of $E_t(Z \mid X, T)$ under $[P_t]$. As j(X, T) is a version of $E_t(Z \mid X, T)$,

$$j(X, T) = h(X, T) \quad \text{a.s. } [P_t],$$

and consequently

$$j(X, t) = h(X, t) \quad \text{a.s. } [P_t].$$

Since j(X, t) and h(X, t) are functions of X, by Property 1,

$$j(X, t) = h(X, t) \quad \text{a.s. } [P_f] \qquad (10)$$

for $f = 0, 1, \emptyset$. We also have that g(X, T) = h(X, T) a.s. $[P_\emptyset]$, and so, by Lemma 1, a.s. $[P_t]$. Then g(X, t) = h(X, t) a.s. $[P_t]$, where g(X, t) and h(X, t) are both functions of X. By Property 1,

$$g(X, t) = h(X, t) \quad \text{a.s. } [P_f] \qquad (11)$$

for $f = 0, 1, \emptyset$. Thus (9) holds by (10) and (11).

3.2 Specific causal effect

Let X be a covariate.

Definition 5. The specific causal effect of T on Y, relative to X, is

$$\mathrm{SCE} := E_1(Y \mid X) - E_0(Y \mid X).$$

We annotate SCE as $\mathrm{SCE}_X$ to express it as a function of X, and write SCE(x) to indicate that X takes the specific value x. Because it is defined in the interventional regimes, SCE has a direct causal interpretation, i.e., SCE(x) is the average causal effect in the subpopulation with X = x.

Although we do not assume the existence of potential responses, when this assumption is made we might proceed as follows. Take X to be the pair $\mathbf{Y} = (Y(1), Y(0))$ of potential responses, which is assumed to satisfy Property 1. Then $E_t(Y \mid \mathbf{Y}) = Y(t)$, and consequently

$$\mathrm{SCE}_{\mathbf{Y}} = Y(1) - Y(0),$$

which is the definition of the "individual causal effect" (ICE) in Rubin's causal model. Thus, although the formalisations of causality are different, SCE in Dawid's decision-theoretic framework can be regarded as a generalisation of ICE in Rubin's causal model.
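Under strong sufficiency, Theorem 1 (together with Lemma 2) says that $E_t(Y \mid X)$ agrees almost surely with the observational regression $E_\emptyset(Y \mid X, T = t)$, so SCE(x) can be estimated from observational data alone. The sketch below assumes a discrete X and uses a plain difference in means within each stratum; the function names and the (x, t, y) record layout are hypothetical. It also performs an empirical check of Property 3, refusing to estimate SCE(x) in a stratum where one arm is unobserved, and averages the stratum estimates over the observed distribution of X, an empirical version of the identity $\mathrm{ACE} = E_\emptyset(\mathrm{SCE}_X)$.

```python
from collections import defaultdict


def estimate_sce(records):
    """Estimate SCE(x) = E_1(Y | X=x) - E_0(Y | X=x) from observational (x, t, y) records.

    Assumes X is a discrete, strongly sufficient covariate, so that by Theorem 1
    E_t(Y | X=x) equals the observational mean of Y among units with X = x, T = t.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, t, y in records:
        sums[(x, t)] += y
        counts[(x, t)] += 1
    sce = {}
    for x in sorted({x for x, _, _ in records}):
        # Empirical counterpart of Property 3: both arms must be observed at X = x.
        if counts[(x, 0)] == 0 or counts[(x, 1)] == 0:
            raise ValueError(f"stratum X={x!r} lacks a treatment arm; SCE(x) is not estimable")
        sce[x] = sums[(x, 1)] / counts[(x, 1)] - sums[(x, 0)] / counts[(x, 0)]
    return sce


def estimate_ace(records):
    """Average the stratum effects over the empirical observational distribution of X."""
    sce = estimate_sce(records)
    freq = defaultdict(int)
    for x, _, _ in records:
        freq[x] += 1
    n = len(records)
    return sum(freq[x] / n * sce[x] for x in sce)


records = [("a", 1, 5.0), ("a", 0, 3.0), ("a", 1, 7.0), ("b", 0, 1.0), ("b", 1, 2.0)]
print(estimate_sce(records))  # {'a': 3.0, 'b': 1.0}
print(estimate_ace(records))
```

Stratum means are the simplest estimator of $E_\emptyset(Y \mid X = x, T = t)$; with a continuous or high-dimensional X one would replace them by a regression model, which is exactly where the dimension reduction discussed below becomes useful.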
We can easily prove that, for any covariate X, $\mathrm{ACE} = E(\mathrm{SCE}_X)$, where the expectation may be taken in any regime. Indeed, by Property 1,

$$E_\emptyset\{E_t(Y \mid X)\} = E_t\{E_t(Y \mid X)\} = E_t(Y)$$

for t = 0, 1. Thus, by subtraction, $\mathrm{ACE} = E_f(\mathrm{SCE}_X)$ for any regime $f = 0, 1, \emptyset$, and therefore the subscript f can be dropped. Hence, ACE is identifiable from observational data so long as $\mathrm{SCE}_X$ is identifiable from observational data. If X is a strongly sufficient covariate, then by Theorem 1, $E_t(Y \mid X)$ is identifiable from the observational regime. It follows that $\mathrm{SCE}_X$ can be estimated from data collected purely in the observational regime. Then ACE, expressed as

$$\mathrm{ACE} = E_\emptyset(\mathrm{SCE}_X), \qquad (12)$$

is identifiable from the observational joint distribution of (X, T, Y). Formula (12) is Pearl's "back-door formula" [17], because, by the property of modularity, P(X) is the same with or without intervention on T and thus can be taken as the distribution of X in the observational regime.

3.3 Dimension reduction of strongly sufficient covariate

Suppose X is a multi-dimensional strongly sufficient covariate. The adjustment process might be simplified if we could replace X by some reduced variable $V \preceq X$ with fewer dimensions, so long as V is itself a strongly sufficient covariate. Now, since V is a function of X, Properties 1 and 3 automatically hold for V. We thus only need to ensure that V satisfies Property 2, that is,

$$Y \perp\!\!\!\perp F_T \mid (V, T). \qquad (13)$$

Since two arrows emanate from X in Fig. 1, possible reductions may naturally be considered on the pathways from X to T and from X to Y. Indeed, the following theorem gives two alternative sufficient conditions for (13) to hold. However, (13) can still hold without these conditions.

Theorem 2. Suppose X is a strongly sufficient covariate and $V \preceq X$. Then V is a strongly sufficient covariate if either of the following conditions is satisfied:

(a) Response-sufficient reduction:

$$Y \perp\!\!\!\perp X \mid (V, F_T = t), \qquad (14)$$

or

$$Y \perp\!\!\!\perp X \mid (V, T, F_T = \emptyset), \qquad (15)$$

for t = 0, 1. Condition (14) indicates that, in each interventional regime, X contributes nothing towards predicting Y once we know V. In other words, as long as V is observed, X need not be observed to make inference on Y.
Condition (15), in turn, states that in the observational regime, knowing X is of no value for predicting Y once V and T are known.

(b) Treatment-sufficient reduction:

$$T \perp\!\!\!\perp X \mid (V, F_T = \emptyset). \qquad (16)$$

That is, in the observational regime, treatment does not depend on X once we condition on V.

Proofs of the above reductions were provided in [9]. An alternative proof of (b) can be carried out graphically [9], which results in a DAG as in Fig. 2³, from which (16) and (13) can be read directly.

[Fig. 2 Treatment-sufficient reduction: DAG on nodes X, V, $F_T$, T and Y]

³ The hollow arrowhead, pointing from X to V, is used to emphasise that V is a function of X.

A graphical approach to (a) does not work, since Property 3 is required. However, while not serving as a proof, Fig. 3 conveniently embodies the conditional independencies of Properties 1 and 2 and the trivial property $V \perp\!\!\!\perp T \mid (X, F_T)$, as well as (13).

[Fig. 3 Response-sufficient reduction: DAG on nodes X, V, $F_T$, T and Y]

4 Propensity analysis

Here we further discuss the treatment-sufficient reduction, which does not involve the response. This brings in the concept of a propensity variable: a minimal treatment-sufficient covariate, for which we investigate the unbiasedness and precision of the estimator of ACE. The asymptotic precision of the estimated ACE, as well as the variation of the estimate in the actual data, will also be analysed. In a simple normal linear model applied for covariate adjustment, two cases are considered: homoscedasticity and heteroscedasticity. A non-parametric approach, subclassification, will also be applied, allowing different covariance matrices of X in the two treatment arms. The estimated ACE obtained by adjusting for multivariate X and by adjusting for a scalar propensity variable will then be compared theoretically and through simulations [9].

4.1 Propensity score and propensity variable

The propensity score (PS), first introduced by Rosenbaum and Rubin, is a balancing score [22].
Regarded as a useful tool to reduce bias and increase precision, it is a very popular approach to causal effect estimation. The PS matching (or subclassification) method, widely used in various research fields, exploits the property of conditional (within-stratum) exchangeability, whereby individuals with the same value of PS (or belonging to a group with similar values of PS) are taken as comparable or exchangeable. We will, however, mainly focus on the application of PS within a linear regression. The definitions of the balancing score and PS given below are borrowed from [22].

Definition 6. A balancing score b(X) is a function of X such that, in the observational regime⁴, the conditional distribution of X given b(X) is the same for both treatment groups. That is,

$$X \perp\!\!\!\perp T \mid (b(X), F_T = \emptyset).$$

⁴ Rosenbaum and Rubin do not define the balancing score and the PS explicitly for observational studies, although they do aim to apply the PS approach in such studies.

It has been shown that adjusting for a balancing score rather than X results in an unbiased estimate of ACE, under the assumption of strongly ignorable treatment assignment [22]. One can trivially choose b(X) = X, but it is more constructive to find a balancing score that is a many-to-one function.

Definition 7. The propensity score, denoted by $\Pi$, is the probability of being assigned to the treatment group given X in the observational regime:

$$\Pi := P_\emptyset(T = 1 \mid X).$$

We shall use the symbol $\pi$ to denote a particular realisation of $\Pi$. By (16) and Definitions 6 and 7, we assert that PS is the coarsest balancing score. For a subject i, PS is assumed to lie strictly between 0 and 1, i.e., $0 < \pi_i < 1$. Those with the same value of PS are equally likely to be allocated to the treatment group (or, equivalently, to the control group), which gives observational studies a randomised-experiment-like property based on the measured X. This is because the characteristics of the two groups with the same or similar PS are "balanced".
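Definition 6 can be checked directly on a toy discrete distribution. The sketch below uses a made-up four-valued covariate X with made-up propensity values, computes $P_\emptyset(X = x \mid b(X), T = t)$ exactly for both arms, and compares them: the propensity score and the trivial choice b(X) = X pass, while an arbitrary coarsening that pools units with different propensities fails.

```python
from fractions import Fraction

# Hypothetical discrete covariate: observational distribution of X and its propensity score.
P_X = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}
PS = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(3, 4), 3: Fraction(3, 4)}


def conditional_x(score, value, t):
    """P_emptyset(X = x | score(X) = value, T = t) for every x in that stratum."""
    weights = {}
    for x, px in P_X.items():
        if score(x) == value:
            p_t = PS[x] if t == 1 else 1 - PS[x]
            weights[x] = px * p_t
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def is_balancing(score):
    """True if X is independent of T given score(X) in the observational regime."""
    return all(
        conditional_x(score, v, 1) == conditional_x(score, v, 0)
        for v in {score(x) for x in P_X}
    )


print(is_balancing(lambda x: PS[x]))   # True: the propensity score itself
print(is_balancing(lambda x: x))       # True: the trivial choice b(X) = X
print(is_balancing(lambda x: x >= 1))  # False: pools units with PS 1/4 and 3/4
```

Within a stratum where the propensity is constant, the factor $P_\emptyset(T = t \mid X)$ cancels on normalisation, which is why X ends up with the same conditional distribution in both arms.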
Because of this balancing property, the scalar PS serves as a proxy for the multi-dimensional variable X, and thus it is sufficient to adjust for the former instead of the latter. In observational studies, PS is generally unknown, because we do not know exactly which components of X have an impact on T and how the treatment is associated with them. However, we can estimate PS from the observational data.

PS analysis for causal inference is based on a sequence of two stages:

Stage 1: PS estimation. PS is estimated from the observed T and X, normally by a logistic regression of T on X for a binary treatment. Note that the response Y is irrelevant at this stage. Because we can estimate PS without observing Y, there is no harm in searching for an "optimal" regression model of T on X by repeated trials.

Stage 2: Adjusting for PS. Various adjustment approaches have been developed, e.g., linear regression. If we are unclear about the conditional distribution of Y given T and PS, non-parametric adjustment such as matching or subclassification could be applied instead.

Although two alternatives for dimension reduction have been provided, in practice the treatment-sufficient reduction may be the more convenient one in many cases. For example, certain values of the response may occur rarely and only after long observation periods following treatment. In addition, it may sometimes be tricky to determine a "correct" form for a regression model of Y on X, T and $F_T$. By swapping the positions of X and T, Equation (16) can be re-expressed as

$$X \perp\!\!\!\perp T \mid (V, F_T = \emptyset), \qquad (17)$$

which states that the observational distribution of X given V is the same for both treatment arms. That is to say, V is a balancing score for X.

The treatment-sufficient condition (b) can be equivalently interpreted as follows. Consider the family $\mathcal{Q} = \{Q_0, Q_1\}$ consisting of the observational distributions of X for the two groups T = 0 and T = 1. Then Equation (16), re-expressed as (17), says that V is a sufficient statistic (in the usual Fisherian sense [8]) for this family. In par-
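The two-stage PS analysis described above can be illustrated with a small simulation in the spirit of the normal linear model discussed in this section. The sketch below is not the authors' code, and every choice in it is an assumption: the data-generating values (arm means, regression coefficients, a true effect δ = 2), the sample size and the seed are hypothetical; Stage 1 fits the logistic regression by plain Newton-Raphson; and Stage 2 adjusts, by ordinary least squares, for the estimated linear predictor (the logit of the estimated PS, a one-to-one function of it, used here as an estimated propensity variable). For comparison it also regresses Y on T and the full X. Drawing T first and then X given T, with a common covariance in both arms, is only a convenient way to obtain a joint distribution whose PS is exactly logistic in X; it says nothing about causal order. Under these assumptions E(Y | V, T) is linear in (T, V) with T-coefficient δ, so both estimates should land near 2.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols(rows, y):
    """Least-squares coefficients via the normal equations (adequate for a few columns)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def logistic_fit(rows, t, iters=10):
    """Stage 1: logistic regression of t on rows, fitted by Newton-Raphson."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, ti in zip(rows, t):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (ti - p) * r[i]
                for j in range(k):
                    hess[i][j] += w * r[i] * r[j]
        step = solve(hess, grad)  # Newton step: (X'WX)^{-1} X'(t - p)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def simulate(n, delta, seed=0):
    """Hypothetical data: T ~ Bernoulli(1/2), X | T ~ N(mu_T, I_2), Y linear in (T, X)."""
    rng = random.Random(seed)
    mu = {0: (0.0, 0.0), 1: (0.5, 1.0)}
    data = []
    for _ in range(n):
        t = 1 if rng.random() < 0.5 else 0
        x1 = rng.gauss(mu[t][0], 1.0)
        x2 = rng.gauss(mu[t][1], 1.0)
        y = 1.0 + delta * t + 1.5 * x1 - 1.0 * x2 + rng.gauss(0.0, 1.0)
        data.append((x1, x2, t, y))
    return data


def two_stage_estimates(data):
    """Return the T-coefficient after adjusting for the estimated logit PS, and for X itself."""
    xs = [[1.0, x1, x2] for x1, x2, _, _ in data]
    ts = [t for _, _, t, _ in data]
    ys = [y for _, _, _, y in data]
    b = logistic_fit(xs, ts)                                    # Stage 1 (Y is not used)
    v = [sum(bi * xi for bi, xi in zip(b, row)) for row in xs]  # estimated logit of the PS
    coef_ps = ols([[1.0, t, vi] for t, vi in zip(ts, v)], ys)   # Stage 2: Y on (1, T, V)
    coef_x = ols([[1.0, t, row[1], row[2]] for t, row in zip(ts, xs)], ys)  # Y on (1, T, X)
    return coef_ps[1], coef_x[1]


print(two_stage_estimates(simulate(8000, delta=2.0, seed=1)))
```

The linear algebra is written out in plain Python only to keep the sketch dependency-free; in practice a numerical library would be used for both stages. Note that Stage 1 never touches Y, matching the remark that the PS model can be tuned without looking at the response.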
