Estimation of causal effects with multiple treatments: a review and new ideas Michael J. Lopez∗, and Roee Gutman DepartmentofMathematics,815N.Broadway,SaratogaSprings,NY12866e-mail:[email protected]. DepartmentofBiostatistics,BoxG-S121-7,121S.MainStreet,Providence,RI02912e-mail:roee [email protected]. Abstract: 7 Thepropensityscoreisacommontoolforestimatingthecausaleffectofabinarytreatmentinobser- 1 vational data. In this setting, matching, subclassification, imputation, or inverse probability weighting on thepropensityscorecanreducetheinitialcovariatebiasbetweenthetreatmentandcontrolgroups.With 0 morethantwotreatmentoptions,however,estimationofcausaleffectsrequiresadditionalassumptionsand 2 techniques,theimplementationsofwhichhavevariedacrossdisciplines.Thispaperreviewscurrentmeth- n ods,anditidentifiesandcontraststhetreatmenteffectsthateachoneestimates.Additionally,wepropose a possible matching techniques for use with multiple, nominal categorical treatments, and use simulations J to show how such algorithms can yield improved covariate similarity between those in the matched sets, relative the pre-matched cohort. To sum, this manuscript provides a synopsis of how to notate and use 9 causalmethodsforcategoricaltreatments. 1 Keywords and phrases: causal inference, propensity score, multiple treatments, matching, observational ] data. E M . t 1. Introduction a t s [ The primary goal of many scientific applications is to identify the causal effect of exposure T ∈ {t ,...,t } on 1 Z 2 outcome Y. Randomized experiments are the gold standard for estimating a causal relationship, however, they v 2 are sometimes infeasible due to logistical, ethical, or financial considerations. Further, randomized experiments 3 1 may not be as generalizable as observational studies due to the restricted population used in the experiments. 5 0 Whenassignmenttotreatmentisnotrandomized,thosethatreceiveonelevelofthetreatmentmaydifferfrom . 1 those that receive another with respect to covariates, X, that may also influence the outcome. For example, in a 0 7 studyestimatingthecausaleffectsofneighborhoodchoiceonemployment,personswholiveindeprivedneighbor- 1 : v hoodsdifferfromthosewholiveinprivilegedonesonavarietyofcharacteristics,suchassocioeconomicstatusand i X education levels (Hedman and Van Ham, 2012). As such, it may be difficult to distinguish between neighborhood r a effectsandthedifferencesbetweensubjectswhichexistedbeforetheychosetheirneighborhoods.Insuchsettings, establishing causes and effects requires more sophisticated statistical tools and additional assumptions. Methods such as matching (Dehejia and Wahba, 2002), subclassification (Rosenbaum and Rubin, 1984), weighting(Robins,HernanandBrumback,2000),andimputations(GutmanandRubin,2015)havebeenproposed to adjust for the differences in X across the exposure groups. These approaches attempt to obtain covariate balance across treatment groups, where balance refers to equality in the distributions of X. By ensuring that the ∗MJLopezwassupportedbytheNationalInstituteforHealth(IMSDGrant#R25GM083270)andtheNationalInstituteonAging (NIAgrant#F31AG046056) †R Gutman was partially supported through a Patient-Centered Outcomes Research Institute (PCORI) Award ME-1403-12104. Disclaimer:Allstatementsinthisreport,includingitsfindingsandconclusions,aresolelythoseoftheauthorsanddonotnecessarily representtheviewsofthePatient-CenteredOutcomesResearchInstitute(PCORI),itsBoardofGovernorsorMethodologyCommittee. 1 Lopez&Gutman/Matchingwithmultipletreatments 2 distribution of units receiving different treatments are similar on average, these methods attempt to reproduce a randomized trial, thus reducing the effects of treatment assignment bias on causal estimates. When X is a scalar, it is relatively straight-forward to perform matching (Rubin, 1976). However, it is more complex to match, subclassify, or weight when X is composed of many covariates. With a binary treatment, matching,subclassification,weighting,andimputationusingthepropensityscorehavebeenproposedforestimat- ing causal effects from observational studies with binary treatment (Rosenbaum and Rubin, 1983; Stuart, 2010; GutmanandRubin,2015).Propensityscoreisdefinedastheprobabilityofreceivingthetreatmentconditionalon a set of observed covariates. It has been shown in theory (Rubin and Thomas, 1996) and practice (D’Agostino, 1998; Caliendo and Kopeinig, 2008) that under certain assumptions, matching on propensity scores results in unbiased unit-level estimates of the treatment’s causal effect (Rosenbaum and Rubin, 1983). Generalizationsandapplicationsofpropensityscoremethodsformultipletreatments,however,remainscattered intheliterature,inlargepartbecausetheadvancedtechniquesareunfamiliarandinaccessible.Ourfirstgoalisto provideaunifyingterminologythatwillenableresearcherstocoalesceandcompareexistingmethods.Oursecond goal is to describe current methods for estimating causal effects with multiple treatments, with a specific focus on approaches for nominal categorical exposures (e.g., a comparison of pain-killers Motrin, Advil, and Tylenol). Wecontrastthesemethods’assumptionsanddefinethecausaleffectstheyeachattempttoestimate.Indoingso, potential pitfalls in the commonly used practice of applying binary propensity score tools to multiple treatments are identified. Third, we explain the elevated importance of defining a common support region when studying multiple treat- ments, where differences in the implementation of certain approaches can vary the causal estimands as well as change the study population to which inference is generalizable. Our final goal is to provide a technique for gen- erating matched sets when there are more than two treatments that addresses some of the pitfalls of the current methods, as well as to compare the performance of the new and previously proposed algorithms in balancing covariates’ distributions using extensive simulation analysis. The remainder of Section 1 introduces the notation and identifies existing causal methods for multiple treat- ments. Section 2 proposes a new algorithm for matching with multiple treatments. Section 3 uses simulations to contrast the new and previously proposed approaches for generating well-matched subgroups of subjects. Section 4 discusses and concludes with a set of practical recommendations. 1.1. Notation for binary treatment Ournotationisbasedonthepotentialoutcomesframework,originallyproposedbyNeymanforrandomizedbased inference, and extended by Rubin to observational studies and Bayesian analysis, also known as the Rubin Causal Model (RCM) (Splawa-Neyman, Dabrowska and Speed, 1990 [1923]; Rubin, 1975; Holland, 1986). Let Y , X , i i andT betheobservedoutcome,setofcovariates,andbinarytreatmentassignment,respectively,foreachsubject i Lopez&Gutman/Matchingwithmultipletreatments 3 i = 1,..., N, with N ≤ N, where N is the population size which is possibly infinite. With T ∈ T, let T be the i treatment space. For a binary treatment, T = {t ,t }, and let n and n be the number of subjects receiving 1 2 t1 t2 treatments t and t , respectively. 1 2 The RCM relies on the Stable Unit Treatment Value Assumption (SUTVA) to define the potential outcomes Y (t ) and Y (t ), which would have been observed had unit i simultaneously received t and t , respectively i 1 i 2 1 2 (Rubin,1980).SUTVAspecifiesnointerferencebetweensubjectsandnohiddentreatmentversions,entailingthat the set of potential outcomes for each subject does not vary with the treatment assignment of others. Because each individual receives only one treatment at a specific point in time, only Y (t ) or Y (t ) is observed for each i 1 i 2 subject, which is known as the fundamental problem of causal inference (Holland, 1986). Two commonly used estimands for describing super-population effects are the population average treatment effect, PATE , and the population average treatment effect among those receiving t , PATT . t1,t2 1 t1,t2 PATE = E[Y (t )−Y (t )] (1) t1,t2 i 1 i 2 PATT = E[Y (t )−Y (t )|T =t ] (2) t1,t2 i 1 i 2 i 1 LettingI(T =t )betheindicatorfunctionforanindividualreceivingtreatmentt ,PATE andPATT i 1 1 t1,t2 t1,t2 are generally approximated by the sample average treatment effects. N 1 X SATE = (Y (t )−Y (t )) (3) t1,t2 N i 1 i 2 i=1 N 1 X SATT = (Y (t )−Y (t ))×I(T =t ) (4) t1,t2 n i 1 i 2 i 1 t1 i=1 Because only one of the potential outcomes is observed for every unit, an important piece of information to estimate (3) and (4) is the assignment mechanism, P(T|Y(t ),Y(t ),X), where T = {T }, Y(t ) = 1 2 i 1 {Y (t )},Y(t )={Y (t )}andX ={X }(ImbensandRubin,2015).Threecommonlymaderestrictionsofthe i 1 2 i 2 i assignment mechanism are individualistic, probabilistic, and unconfoundedness (Imbens and Rubin, 2015). In the super population, a random sample of N units automatically results in an individualistic assignment mechanism. A super-population probabilistic assignment mechanism entails that 0 < f (t |Y (0),Y (1),X ,φ) < 1 T|Y(0),Y(1),X 1 i i i for each possible X ,Y (0) and Y (1), where φ is a vector of parameters controlling this distribution. i i i Finally, a super-population assignment mechanism is unconfounded if f (t|y ,y ,x,φ)=f (t|x,φ) ∀ y ,y ,x,φ and t∈{0,1}. T|Y(0),Y(1),X 0 1 T|X o 1 For notational convenience, we will drop φ throughout. Underanindividualisticassignmentmechanism,thecombinationofaprobabilisticandunconfoundedtreatment assignmenthasbeenreferredtobothasstrongunconfoundednessandstrongignorability(Stuart,2010).Theclass Lopez&Gutman/Matchingwithmultipletreatments 4 t2 t1 t2 t1 t2 t1 Scenarioa Scenariob Scenarioc Fig 1: Three scenarios of covariate overlap for binary treatment: shaded areas represent subjects included in a matched analysis ofassignmentmechanismsthatareindividualistic,probabilistic,andunconfounded,butwhosecontroldoesnotlie inthehandsofaninvestigator,arereferredtoasregularassignmentmechanisms,andaremostcommonlyidentified with observational data. Weaker versions of unconfoundedness are sufficient for some estimation techniques and estimands (Imbens, 2000), and are discussed in Section 1.5.4. Let e (X) = P(T = t |X) be the propensity score (PS), and let eˆ (X) be the estimated PS, tra- t1,t2 1 t1,t2 ditionally calculated using logistic or probit regression. If treatment assignment is regular, then it is possible to estimate unbiased unit-level causal effects between those at different treatment assignments with equal PS’s (RosenbaumandRubin,1983).Propensityscoresareoftenusedforeithermatching,inverseprobabilityweighting or subclassification to estimate (3) and (4). 1.1.1. Description of estimands It is useful to describe how estimands are affected by the distribution of X in treatment groups t and t . Figure 1 2 1 shows different sets of overlap in the covariates’ distributions between those receiving t and t . Each circle 1 2 in Figure 1 represents a hypothetical distribution of X among those exposed to each treatment, allowing for an infinitesimally small number of units outside of it. For example, each circle could represent the 99th percentiles of a two-dimensional multivariate normal distribution. In Figure 1, shaded regions correspond to the distribution of covariates in a population of interest, S . When 0<P(T =t |X =x∗)<1 ∀ x∗∈S , PATT reflects the t1 2 t1 t1,t2 ATT of those treated on t (Figure 1, Scenario a). 1 In Scenario b of Figure 1, PATT also intends to reflect the ATT of those receiving t . However, there t1,t2 1 exists an x∗ ∈ S such that P(T = t |X = x∗) ≈ 0. Thus, the assignment mechanism is not regular and it is t1 2 impossibletoapproximatePATT withoutmakingunassailableassumptionsduetoindividualswithcovariates t1,t2 lying outside the intersection of the two treatment groups. One advice to handle this issue is to use a common support region, where those with either X or eˆ (X) (t1,t2) beyondtherangeofX oreˆ (X)ofthosereceivingtheothertreatmentareexcludedfromtheanalysisphase (t1,t2) (DehejiaandWahba,1998;Crumpetal.,2009).Adifferentadvicetoreducedifferencesbetweenmatchedsubjects is by using a caliper matching procedure, and dropping units without eligible matches with similar eˆ (X) in t1,t2 Lopez&Gutman/Matchingwithmultipletreatments 5 the other group (Caliendo and Kopeinig, 2008; Stuart, 2010). With either of these advices, the treatment effect only generalizes to those receiving t who were eligible to be treated with treatment t (i.e., the intersection of 1 2 the treatment groups in Figure 1, Scenario c). Let E be an indicator for subject i having a propensity score 1i within the common support of eˆ (X). Defensible estimands of interest are now (t1,t2) PATE = E[Y (t )−Y (t )|E =1] (5) E1(t1,t2) i 1 i 2 1i PATT = E[Y (t )−Y (t )|E =1,T =t ]. (6) E1(t1,t2) i 1 i 2 1i i 1 Although estimands (5) and (6) share the same common support, they may differ if the covariates’ distributions of the treated in E differ from that of the control. 1i 1.2. Notation for multiple treatments The choice of estimands grows with increasing treatment options. Let T = {t ,t ,...,t } be the treatment 1 2 Z support for Z total treatments, with Y ={Y (t ),Y (t ),...,Y (t )} the set of potential outcomes for subject i. i i 1 i 2 i Z To define potential outcomes and estimate treatment effects with multiple treatments, our assumptions are expanded as follows. First, the SUTVA expands across a subject’s vector of potential outcomes. Second, a reg- ular treatment assignment mechanism requires that individualistic, probabilistic, and unconfoundedness hold for multiple exposures. Like in the binary case, a random sample of N units from an infinite super-population results in an individualistic assignment mechanism. Assignment mechanisms are super-population probabilistic if 0 < f (t|Y (t ),...,Y (t ),X ,φ) < 1 ∀ t∈{t ,...t }, T|Y(t1)...Y(tZ),X i 1 i Z i 1 Z for each possible X ,Y (t ),...,Y (t ). With multiple treatments, a super-population unconfounded assignment i i 1 i Z mechanism requires that f (t|y ,...y ,x,φ)=f (t|x,φ) ∀ y ,...,y ,x,φ and t∈{t ,...,t }. T|Y(t1),...,Y(tZ),X t1 tZ T|X t1 tZ 1 Z We first present a broad definition of the possible contrasts that may be of interest with multiple treatments. Definew andw astwosubgroupsoftreatmentssuchthatw ,w ⊆T andw ∩w =∅.Next,let|w |and|w | 1 2 1 2 1 2 1 2 be the cardinality of w and w , respectively. Possible estimands of interest are PATE and PATT , 1 2 w1,w2 w1|w1,w2 where (cid:20)P Y (t) P Y (t)(cid:21) PATE = E t∈w1 i − t∈w2 i , (7) w1,w2 |w | |w | 1 2 (cid:20)P Y (t) P Y (t) (cid:21) PATT = E t∈w1 i − t∈w2 i T ∈w . (8) w1|w1,w2 |w | |w | i 1 1 2 In (7) and (8), the expectation is over all units, i=1,...,N, and the summation is over the potential outcomes of a specific unit. Lopez&Gutman/Matchingwithmultipletreatments 6 An example of when (7) and (8) are scientifically meaningful is in a setting with two conventional and three atypical antipsychotic drugs, where physicians first choose drug type (conventional or atypical) before choosing an exact prescription (Tchernis, Horvitz-Lennon and Normand, 2005). In this case, an investigator could be interested in the general treatment effect between conventional treatments, w = {t ,t }, and atypical ones, 1 1 2 w ={t ,t ,t }, and an estimand of interest could be PATE =E[Yi(t1)+Yi(t2) − Yi(t3)+Yi(t4)+Yi(t5)]. 2 3 4 5 w1,w2 2 3 The most traditional estimands with multiple treatments contrast all treatments using simultaneous pairwise comparisons, where w and w are each composed of one treatment. Using equation (7), there are (cid:0)Z(cid:1) possible 1 2 2 PATE’s of interest. It is important to note that pairwise PATE’s are transitive. Formally, for w = {t }, 1 1 w ={t }, and w ={t }, PATE −PATE =PATE . 2 2 3 3 w1,w3 w1,w2 w2,w3 For reference group w = {t }, researchers are commonly interested in Z − 1 pairwise PATT’s, one for 1 1 each of the treatments which the reference group did not receive (McCaffrey et al., 2013). In order to compare amongtheZ−1treatments,thePATT’sshouldalsobetransitive,suchthatPATT −PATT = w1|w1,w3 w1|w1,w2 PATT . This property generally does not extend when conditioning on a population eligible for different w1|w2,w3 treatmentgroups.Forexample,unlessthesuperpopulationsofthosereceivingtreatmentsw andw areidentical, 1 2 PATT −PATT is generally not equal to PATT . w1|w1,w2 w2|w2,w3 w1|w1,w3 Fortheremainderofthemanuscript,weassumethatpairwisecontrastsbetweentreatmentsaretheestimands of interest, so that |w |=|w |=...=|w |=1. 1 2 z 1.3. The generalized propensity score The generalized propensity score (GPS), r(t,X)=Pr(T =t|X = x), extends the PS from a binary treatment setting to the multiple treatment setting (Imbens, 2000; Imai and Van Dyk, 2004). With a binary treatment, knowing e (X) is equivalent to knowing 1−e (X). Thus, two individuals t1,t2 t1,t2 with the same PS are also identical with respect to their probability of receiving t . Conditioning with multiple 2 treatments, however, often must be done on a vector of GPS’s, defined as R(X) = (r(t ,X),...,r(t ,X)), or 1 Z a function of R(X) (Imai and Van Dyk, 2004). Two individuals with the same r(t,X) for treatment t may have differing R(X)’s. For example, for T = {t ,t ,t }, let R(X ),R(X ), and R(X ) be the GPS vectors for subjects i, j, and k, respectively, where 1 2 3 i j k T =t , T =t , and T =t , with i 1 j 2 k 3 R(X )=(0.30,0.60,0.10), i R(X )=(0.30,0.35,0.35), j R(X )=(0.30,0.10,0.60). k Even though r(t ,X ) = r(t ,X ) = r(t ,X ) = 0.30, because r(t ,X ) 6= r(t ,X ) 6= r(t ,X ) and 1 i 1 j 1 k 2 i 2 j 2 k r(t ,X ) 6= r(t ,X ) 6= r(t ,X ), differences in outcomes between these subjects would generally not provide 3 i 3 j 3 k Lopez&Gutman/Matchingwithmultipletreatments 7 unbiased causal effect estimates (Imbens, 2000). In part due to this limitation, Imbens (2000) called individual matchingless’well-suited’tomultipletreatmentsettings.OnlyunderthescenarioofR(X )=R(X )=R(X ) i j k would contrasts in the outcomes of subjects i, j, and k provide unbiased unit-level estimates of the causal effects between all three treatments (Imbens, 2000; Imai and Van Dyk, 2004). For nominal treatment, the multinomial logistic and the multinomial probit models have been proposed to estimate R(X), and for ordinal treatment, the proportional odds model has been suggested (Imbens, 2000; Imai and Van Dyk, 2004). Alternatively, researchers have also used models designed for binary outcomes to estimate R(X), including logistic and probit regression models on different subsets of subjects receiving each pair of treatments. Although a multinomial model is more intuitive, in practice, Lechner (2002) identified correlation coefficientsofroughly0.99whencomparingtheconditionaltreatmentassignmentprobabilitiesfromasetofbinary probit models to those from a multinomial probit. As another option, McCaffrey et al. (2013) used generalized boostedmodelstoindependentlyestimateP(I (t)|X),whereI (t)={1 if T =t, 0 otherwise}.Theprobabilities i i i estimatedusinggeneralizedboostedmodelsmaynotadduptounity.Toaddressthisissue,McCaffreyetal.(2013) proposed an additional procedure that selects one treatment as a holdout and estimates P(I (t)|X) using the i estimatedoddsratiosoftheprobabilityofbeingassignedtoeachtreatmentversustheprobabilityofbeingassigned totheholdouttreatment.Thechoiceoftheholdouttreatmentmayresultindifferentestimatedprobabilities,and because it relies on binary estimation of subsamples of the population, it may not be able to adjust for the entire R(X).Inourreviewbelow,wespecifythemodelthatisusedtoestimatethetreatmentassignmentprobabilities suggested by each method. 1.4. Ordinal treatments With ordinal treatments, such as scales (e.g. never - sometimes - always) or doses (e.g. low - medium - high), it is sometimes possible to condition on a scalar balancing score in place of conditioning on a vector. This can be donebyestimatingtheassignmentmechanismasafunctionofX usingtheproportionaloddsmodel(McCullagh, 1980), such that (cid:18) (cid:19) P(T <t) log i =θ −βTX ,t=1,...,Z−1. (9) P(T ≥t) t i i Letting βT = (β ,..β )T, Joffe and Rosenbaum (1999) and Imai and Van Dyk (2004) showed that after using 1 p this model for the assignment mechanism, differences in outcomes between units with different exposure levels but equal βTX scores can provide unbiased unit-level estimates of causal effects at that βTX. The balancing property of βTX can be used to match or subclassify subjects receiving different levels of an ordinalexposure.Luetal.(2001)usednon-bipartitematchingtoformmatchedsetsbasedonafunctionofβTX and the relative distance between exposure levels. While this method does not specify an exact causal estimand, Lopez&Gutman/Matchingwithmultipletreatments 8 it is used for testing the hypothesis of whether or not a dose-response relationship exists between T and Y (see Armstrong,JagolinzerandLarcker(2010);Frank,AkreshandLu(2010);Snodgrassetal.(2011)tonameafew). Imai and Van Dyk (2004); Zanutto, Lu and Hornik (2005); Yanovitzky, Zanutto and Hornik (2005) and Lopez and Gutman (2014) used Equation (9) to estimate treatment assignment by subclassifying subjects with similar βTX values. After subclassification on βTX, the distribution of X across treatments is roughly equivalent for units in the same subclass. Unbiased causal effects can be estimated within each subclass, and aggregated across subclasses using a weighted average to estimate either PATE’s or PATT’s (Zanutto, Lu and Hornik, 2005). LopezandGutman(2014)foundthatcombiningregressionadjustmentwithsubclassificationyieldedmoreprecise estimates. Adifferentstrategyforestimatingthecausaleffectsofordinalexposuresistodichotomizethetreatmentusing apre-specifiedcutoffandbinarypropensityscoremethods(Chertow,NormandandMcNeil,2004;Davidsonetal., 2006;Schneeweissetal.,2007).Thisproceduremayresultinalossofinformation,asallsubjectsononesideofthe cutoff are treated as having the same exposure level, and could violate the component of SUTVA which requires no hidden treatment. Royston, Altman and Sauerbrei (2006) identified a loss of power, residual confounding of the treatment assignment mechanism, and possible bias in estimates as the results of dichotomization. Moreover, dichotomizationmakesidentificationofanoptimalexposurelevelimpossible.Thus,matchingorsubclassifications methodswhichmaintainallexposurelevelswhilebalancingonβTX arepreferredforcausalinferencewithordinal exposures (Imai and Van Dyk, 2004). Inverse probability weighting can also be used to estimate causal effects from ordinal treatments (Imbens, 2000). 1.5. Nominal treatments Nominal treatments do not follow a specific order. Thus, it is harder to identify a ‘sensible’ function that reduces R(X) to a scalar. Several methods have been proposed to estimate causal effects with multiple treatments from observational data. We provide an overview of these methods and explicate on their assumptions and estimands. 1.5.1. Series of binomial comparisons Lechner (2001, 2002) estimated PATT’s between multiple treatments using a series of binary comparisons (SBC). SBC implements binary propensity score methods within each of the (cid:0)Z(cid:1) pairwise population subsets. 2 For example, a treatment effect comparing t to t uses only subjects receiving either t or t , ignoring subjects 1 2 1 2 that received t . Lechner advocates matching on either eˆ (X), estimated using logistic or probit regression, 3 (t1,t2) or rˆ(t ,X)/(rˆ(t ,X)+rˆ(t ,X)), where rˆ(t ,X) and rˆ(t ,X) are estimated using a multinomial regression 1 1 2 1 2 model. Figure2(Scenariod)depictstheuniquecommonsupportregionsforZ =3whenusingSBC,wheretreatment Lopez&Gutman/Matchingwithmultipletreatments 9 effects reflect different subsets of the population. Let e (X,T = t ) and e (X,T = t ) be the (t1,t2) 1 (t1,t2) 2 vectorofallbinarypropensityscoresamongsubjectsreceivingt andt ,respectively.WedefineE (t ,t )asthe 1 2 2i 1 2 indicator for subject i having a binary propensity score for treatments t and t within the common support: 1 2 E (t ,t )= 1 if e(t1,t2)(Xi)∈e(t1,t2)(X,T = t1)∩e(t1,t2)(X,T = t2) 2i 1 2 0 if otherwise SBC estimates the causal effect of treatment t versus treatment t , among those on t , as 1 2 1 PATT =E[Y (t )−Y (t )|T =t ,E (t ,t )=1]. (10) E2(t1|t1,t2) i 1 i 2 i 1 2i 1 2 EachpairwisetreatmenteffectfromSBCgeneralizesonlytosubjectseligibleforthatspecificpairoftreatments, as opposed to those eligible for all treatments. Such pairwise treatment effects are not transitive, and cannot generallyinformwhichtreatmentisoptimalwhenappliedtotheentirepopulation.Forexample,PATT E2(t1|t1,t2) andPATT maygeneralizetoseparatesubsetsofunitswhoreceivedt (i.e.,thesuperpopulationwhere E2(t1|t1,t3) 1 E (t |t ,t )=1 could differ from the super population where E (t |t ,t )=1). 2i 1 1 2 2i 1 1 3 Despite this major limitation, versions of SBC have been applied in economics, politics, and public health (Bryson, Dorsett and Purdon, 2002; Dorsett, 2006; Levin and Alvarez, 2009; Drichoutis, Lazaridis and Nayga Jr, 2005; Kosteas, 2010). 1.5.2. Common referent matching Withthreetreatments,Rassenetal.(2011)proposedcommonreferentmatching(CRM)tocreatesetswithone individual from each treatment type. For T = {t ,t ,t }, the treatment t such that n = min{n ,n ,n }, 1 2 3 1 t1 t1 t2 t3 is used as the reference group. CRM is composed of 3 steps. (1) Among those receiving each pair of treatments, {t ,t } or {t ,t }, logistic 1 2 1 3 or probit regression is used to estimate e (X) and e (X), respectively; (2) Using 1:1 matching, pairs of t1,t2 t1,t3 units receiving t or t are matched using eˆ (X) and pairs of units receiving t or t are matched using 1 2 t1,t2 1 3 eˆ (X); (3) These two cohorts are used to construct 1:1:1 matched triplets using the patients receiving t who t1,t3 1 were matched to both a unit receiving t and a unit receiving t , along with their associated matches. Matched 2 3 pairs from treatments t and t are discarded if the unit receiving t was not matched with a unit on treatment 1 3 1 t , and pairs of units receiving t and t are discarded when there is no match for the reference unit to a unit 2 1 2 receiving t . 3 Let E be the indicator for having two pairwise binary PS’s within their respective common supports, such 3i that 1 if E2i(t1,t2)=1 and E2i(t1,t3)=1 E = 3i 0 if otherwise. Lopez&Gutman/Matchingwithmultipletreatments 10 t1 t2 t1 t2 t3 t3 Scenariod Scenarioe Fig 2: Two scenarios of eligible subjects with three treatments: shaded areas represent subjects included in a matched analysis CRM attempts to estimate the following treatment effects: PATT = E[Y (t )−Y (t )|T =t ,E =1] E3(t1|t1,t2) i 1 i 2 i 1 3i PATT = E[Y (t )−Y (t )|T =t ,E =1] E3(t1|t1,t3) i 1 i 3 i 1 3i PATT = E[Y (t )−Y (t )|T =t ,E =1] E3(t1|t2,t3) i 2 i 3 i 1 3i PATT is the average difference in the potential outcomes of receiving treatments t and t among the E3(t1|t2,t3) 2 3 population of subjects who received t . 1 Rassen et al. (2011) relied on common sampling variance estimates produced by the SAS statistical software (SAS Institute Inc., 2003) to make inference. These estimates may underestimate the sampling variance, because they ignore the variability induced by the matching procedure. The next section will explain the possible issues that arise from CRM and similar procedures. 1.5.3. Interlude: Binary PS applications to multiple treatments The following hypothetical example with Z =3 illustrates issues with the implementation of binary PS tools, as in SBC and CRM, when there are multiple treatments. LetX =(cid:0)x1i(cid:1)beavectorofcovariatesforsubjecti,andwewillassumethatX |T =t∼N(µ ,1),whereµ i x2i i i t t isa2x1meanvectorand1isthe2x2identitymatrix.Further,weletµ =(0,0),µ =(0,a),andµ =(a,0). 1 2 3 An arbitrary linear combination of X can be expressed as the sum of components along the standardized linear discriminant, Z, and orthogonal to it, W (Rubin and Thomas, 1992a). Matching on the true or estimated propensity score does not introduce any bias in W when X |T follows a multivariate normal distribution. In i i addition,aftermatching,W willhavethesameexpectedsecondmoment(RubinandThomas,1992b).Specifically, √ √ when matching treatment 1 to treatment 2 with a = 2, Z = (cid:0)0(cid:1)0X / 2 = 2X and W = X . After 12 2 1 2 12 1 matching, Rubin and Thomas (1992b) showed that