ArCo: An Artificial Counterfactual Approach for Aggregate Data∗

Carlos Carvalho (PUC-Rio), Ricardo Masini (PUC-Rio), Marcelo C. Medeiros (PUC-Rio)

February 2015 (Preliminary version; please do not circulate without permission)

Abstract

We consider a new method to conduct counterfactual analysis with aggregate data when a "treated" unit suffers a shock or an intervention, such as a policy change. The proposed approach is based on the construction of an artificial counterfactual from a pool of "untreated" peers, and is inspired by different branches of the literature, such as the Synthetic Control method, Global Vector Autoregressive models, the econometrics of structural breaks, and counterfactual analysis based on macro-econometric and panel data models. We derive an asymptotically Gaussian estimator for the average effect of the intervention and present a collection of companion hypothesis tests. We also discuss finite-sample properties and conduct a detailed Monte Carlo experiment.

JEL Codes: C22, C23, C32, C33.

Keywords: counterfactual analysis, comparative studies, ArCo, artificial counterfactual, synthetic control, policy evaluation, intervention, structural break, panel data, factor models.

∗We thank seminar participants at the 2014 NBER-NSF Time Series Conference, SBE 2014, and the PUC-Rio Student Workshop for valuable suggestions. Any mistakes are our own. Emails: [email protected], [email protected], [email protected].

1 Introduction

In this paper we propose a new method to carry out counterfactual analysis with aggregate data to evaluate the impact of interventions such as, for example, policy changes, the start of a new government, or outbreaks of wars, just to name a few possible cases. Our approach is especially useful in situations where there is a single "treated" unit and no available "controls". The goal of the proposed method is the construction of an artificial counterfactual based on observed data from a pool of peers. Our approach is a generalization of the work of Hsiao et al. (2012) and also shares some roots with the Synthetic Control (SC) method pioneered by Abadie and Gardeazabal (2003), as well as with the work of Pesaran and Smith (2012). Nevertheless, the proposed procedure differs from prior methods in several dimensions, as will become clear in the next paragraphs.

Causality is a major topic of empirical research in Economics. Usually, causal statements with respect to the adoption of a given treatment (intervention) rely on the construction of counterfactuals based on the outcomes from a group of individuals not affected by the treatment. Notwithstanding, definitive cause-and-effect statements are usually hard to formulate given the constraints that economists face in collecting data. However, in micro-econometrics there have been major advances in the literature, and the estimation of treatment effects is now part of the toolbox of applied economists; see, for example, Angrist et al. (1996), Angrist and Imbens (1994), and Heckman and Vytlacil (2005).

On the other hand, for aggregated data, the econometric tools have evolved at a much slower pace, and much of the work has focused on simulating counterfactuals from structural models. However, in recent years, some authors have proposed new techniques, inspired partially by the developments in micro-econometrics, that are able, under some assumptions, to conduct counterfactual analysis with aggregate (macro) data.
Hsiao et al. (2012) put forward a simple panel data method to estimate counterfactuals, which the authors used to study the impact of the economic and political integration of Hong Kong with mainland China on Hong Kong's economy. Zhang et al. (2014) applied the same techniques of Hsiao et al. (2012) to evaluate the impact of the Canada-US Free Trade Agreement (FTA) on Canada's GDP, labour productivity, and unemployment. Abadie and Gardeazabal (2003) used the SC method to investigate the effects of terrorism on the GDP of the Basque Country, while Abadie et al. (2010) and Abadie et al. (2014) applied the same techniques to measure, respectively, the effects on consumption of a large-scale tobacco control program in California and the economic impact of the 1990 German reunification on West Germany. Pesaran et al. (2007) and Dubois et al. (2009) used the Global Vector Autoregressive (GVAR) framework put forward by Pesaran et al. (2004) and Dees et al. (2007) to study the effects of the launching of the Euro. Pesaran and Smith (2012) studied the effects of quantitative easing (QE) in the United Kingdom with a new methodology partly inspired by GVAR methods. Finally, Angrist et al. (2013) considered a new semi-parametric method to measure monetary policy effects.

This paper fits into the literature on dynamic treatment effects and counterfactual analysis for aggregate data. We propose the Artificial Counterfactual (ArCo) method and contribute to the literature in several directions. First, we construct an artificial counterfactual ("control" unit) by generalizing the panel data approach of Hsiao et al. (2012). Our proxy "control" unit is built as a possibly nonlinear function of a pool of peers, and the proposed method is able to simultaneously test for effects on different variables as well as on multiple moments of a set of variables, such as the mean and the variance. This can be of special interest in Finance, where the goal could be to measure the effects of different corporate governance policies on the returns and risk of firms (Johnson et al., 2000; Mitton, 2002). More importantly, we show that the limiting distribution of the test as the time dimension increases is standard. Second, inspired by the literature on structural breaks, we show that our test statistic does not change when the exact time of the intervention is unknown and has to be estimated from the data; see Bai (1997a,b), Bai and Perron (1998), and Hansen (2000). This is of crucial importance in the presence of anticipation effects. Third, we also derive a test for the case of multiple interventions. Finally, we construct a test to check for possible contamination effects among units. Our theoretical results have been derived under the hypothesis that the time dimension ($T$) grows but the number of peers ($n$) and the number of observed variables for each peer ($q$) are fixed quantities. However, we discuss the finite-sample bias of our estimators and propose a bootstrap procedure to improve performance in very small samples. A thorough Monte Carlo experiment is conducted in order to evaluate the small-sample performance of the ArCo methodology.

Apart from this Introduction, the paper is organized as follows. In Section 2 we present the ArCo method and discuss the conditional model used in the first step of the methodology. We also discuss the similarities and differences with respect to other alternatives recently proposed in the literature.
In Section 3 we derive the asymptotic properties of the ArCo estimator for the average causal effect under two distinct cases: the case where the time of the intervention is known with certainty and the case where the date of the intervention is unknown. Finite-sample properties of the estimator are discussed in Section 4. The test for the null hypothesis of no causal effect is developed in Section 5. Extensions to multiple interventions and possible contamination effects are described in Section 6. In Section 7 we discuss some potential sources of bias in the ArCo method, and Section 8 presents some discussion on the implementation of the method. A detailed Monte Carlo study is conducted in Section 9. Finally, Section 10 concludes. All proofs are relegated to a technical appendix.

2 The Artificial Counterfactual Estimator

2.1 Definitions

Suppose we have $n$ units (countries, states, municipalities, firms, etc.) indexed by $i \in I \equiv \{1,\dots,n\}$. For each unit and for every time period $t = 1,\dots,T$, we observe a vector of variables $x_{it} \in \mathbb{R}^q$. Furthermore, assume that an intervention took place in unit $i_0 \in I$, and only in unit $i_0$, at time $T_0 = \lfloor \lambda_0 T \rfloor$, where $\lambda_0 \in (0,1)$. We are interested in the potential effects of this intervention on $y_{i_0t}$ for $t \geq T_0$, where $y_{i_0t}$ is some transformation of the variables $x_{i_0t}$. Without loss of generality, assume that the dimensions of $y_{i_0t}$ and $x_{i_0t}$ are the same.

More formally, write $y_t = (y_{1t}',\dots,y_{i_0t}',\dots,y_{nt}')' = g(x_t)$ as a vector function of $x_t = (x_{1t}',\dots,x_{i_0t}',\dots,x_{nt}')' \in \mathbb{R}^{nq}$ and define the following null hypothesis of interest:

$$H_0: \Delta \equiv \mathbb{E}\left(y_{i_0t} - y_{i_0t}^*\right) = 0, \quad \text{for } t \geq T_0, \qquad (1)$$

where $y_{i_0t}^*$ is the counterfactual, i.e., what $y_{i_0t}$ would have been like had there been no intervention.

The choice of the transformation $g(\cdot)$ will depend on which moment of the data the econometrician is interested in testing for possible effects of the intervention. In other words, the goal will be to test for a break in a set of unconditional moments of the data and check whether this break is solely due to the intervention or has other (global) causes (confounding effects). Typical choices for $g(x_t)$ are:

$$g(x_t) = \begin{cases} x_t & \text{for testing changes in the mean} \\ \operatorname{vech}\left[(x_t - \bar{x})(x_t - \bar{x})'\right] & \text{for testing changes in the covariances} \\ \operatorname{vec}\left[(x_t - \bar{x})(x_{t+k} - \bar{x})'\right] & \text{for testing changes in the auto-covariances} \end{cases}$$

where $\bar{x} = \frac{1}{T_0-1}\sum_{t=1}^{T_0-1} x_t$.¹

¹When testing for breaks in the variances it is not necessary to take the whole matrix of covariances. Instead, $y_t = \operatorname{diag}\left[(x_t - \bar{x})(x_t - \bar{x})'\right]$.

It is clear that $\Delta$ in (1) cannot be directly computed, as $y_{i_0t}^*$ is not observed. We construct a proxy variable for $y_{i_0t}$ based on the Artificial Counterfactual (ArCo) method. Our approach is inspired by the Synthetic Control (SC) methodology of Abadie and Gardeazabal (2003), the counterfactual analysis of Pesaran and Smith (2012), and the literature on Global VARs; see, for example, Pesaran et al. (2004), Dees et al. (2007), and Pesaran et al. (2007). From Abadie and Gardeazabal (2003) we borrow the idea of writing the proxy (artificial) counterfactual as a function of data from a pool of donors, i.e., the other $n-1$ observed units in our dataset, which we assume are not "treated". However, contrary to the SC method, and following Pesaran and Smith (2012), we make use of a very general model for the time-series dynamics of the data to construct the counterfactual.
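To fix ideas, here is a minimal sketch of the transformations $g(\cdot)$ above in Python, assuming the data are stored as a $T \times nq$ NumPy array with rows indexed by time; the function names and array layout are ours and purely illustrative, and the mean $\bar{x}$ uses only pre-intervention observations, as in the definition above.

```python
import numpy as np

def vech(m):
    """Stack the lower-triangular entries of a symmetric matrix into a vector."""
    return m[np.tril_indices_from(m)]

def g_transform(x, T0, moment="mean", k=1):
    """Map the raw data x (T x nq, row t = x_t) into y_t = g(x_t).

    T0 is the first post-intervention row, so x[:T0 - 1] is the
    pre-intervention sample used for the mean x_bar.
    """
    x_bar = x[:T0 - 1].mean(axis=0)
    xc = x - x_bar                                  # centered observations
    if moment == "mean":
        return x.copy()                             # y_t = x_t
    if moment == "covariance":                      # y_t = vech[(x_t - x_bar)(x_t - x_bar)']
        return np.array([vech(np.outer(r, r)) for r in xc])
    if moment == "autocovariance":                  # y_t = vec[(x_t - x_bar)(x_{t+k} - x_bar)']
        return np.array([np.outer(xc[t], xc[t + k]).ravel()
                         for t in range(len(x) - k)])
    raise ValueError(f"unknown moment: {moment}")
```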
Although we do not explicitly write a Global VAR model, we assume that the variables for all units follow an infinite-order vector moving average with a common factor structure to capture possible correlation among different units. Let $Y_{-i_0t} = \left(y_{-i_0t}',\dots,y_{-i_0,t-p}'\right)'$, where $y_{-i_0t}$ denotes the elements of $y_t$ which do not belong to unit $i_0$. Accordingly, the key idea is to write the artificial counterfactual as

$$\widehat{y}_{i_0t}^* = \begin{cases} y_{i_0t} & \text{for } t < T_0 \\ G(Y_{-i_0t};\theta) & \text{for } t \geq T_0, \end{cases} \qquad (2)$$

where $G: \mathbb{R}^{(n-1)pq} \to \mathbb{R}^q$ is a possibly nonlinear function of $Y_{-i_0t}$ indexed by the parameter vector $\theta \in \Theta$, and $\Theta$ is a metric space. In principle we leave the function $G(\cdot;\cdot)$ unspecified. We also do not restrict $\Theta$ to be a subset of the Euclidean space, as we may allow for semi-parametric estimation; see, for example, the discussion in Pötscher and Prucha (1997).

Consider the following assumption about the data generating process (DGP) of $x_t$:

Assumption 1. The vector $x_t = (x_{i_0t}', x_{-i_0t}')'$ follows the stochastic process described below.

"Treated" unit. For $t < T_0$:
$$x_{i_0t} - \mu_{i_0,1} = \Psi_{\infty,i_0,1}(L)\,\varepsilon_{i_0t} \qquad (3)$$
$$\varepsilon_{i_0t} = \Lambda_{i_0,1} f_t + \eta_{i_0t}, \qquad (4)$$
where $\mathbb{E}(\eta_{i_0t}\eta_{i_0t}') = R_{i_0,1}$.

For $t \geq T_0$:
$$x_{i_0t} - \mu_{i_0,2} = \Psi_{\infty,i_0,2}(L)\,\varepsilon_{i_0t} \qquad (5)$$
$$\varepsilon_{i_0t} = \Lambda_{i_0,2} f_t + \eta_{i_0t}, \qquad (6)$$
where $\mathbb{E}(\eta_{i_0t}\eta_{i_0t}') = R_{i_0,2}$.

"Control" units. For all $t$:
$$x_{-i_0t} - \mu_{-i_0} = \Psi_{\infty,-i_0}(L)\,\varepsilon_{-i_0t} \qquad (7)$$
$$\varepsilon_{-i_0t} = \Lambda_{-i_0} f_t + \eta_{-i_0t}, \qquad (8)$$
where $\mathbb{E}(\eta_{-i_0t}\eta_{-i_0t}') = R_{-i_0}$.

Furthermore, $f_t \in \mathbb{R}^f$ is a vector of common unobserved factors which satisfies $\mathbb{E}(f_t) = 0$, $\mathbb{E}(f_tf_t') = Q$, and $\mathbb{E}(f_tf_j') = 0$ for all $t \neq j$. The error $\eta_t = (\eta_{i_0t}', \eta_{-i_0t}')' \in \mathbb{R}^{nq}$ is such that $\mathbb{E}(\eta_t) = 0$, $\mathbb{E}(\eta_t\eta_t') = R_1$ for $t < T_0$, and $\mathbb{E}(\eta_t\eta_t') = R_2$ for $t \geq T_0$. Both $R_1$ and $R_2$ are assumed to be block diagonal. Additionally, $\mathbb{E}(\eta_t\eta_j') = 0$ for all $t \neq j$, and $\mathbb{E}(\eta_t f_j') = 0$ for all $t,j$. $Q$, $R_1$, and $R_2$ are positive semi-definite covariance matrices ($R_1$ and $R_2$ of dimension $nq \times nq$). $\Lambda_{i_0,1}$, $\Lambda_{i_0,2}$, and $\Lambda_{-i_0}$ are matrices of factor loadings. Finally, $L$ is the lag operator and $\Psi_{\infty,i}(L) = (I_q + \psi_{1i}L + \psi_{2i}L^2 + \cdots)$ is such that $\sum_{j=0}^{\infty}\psi_{ji}^2 < \infty$ for all $i = 1,\dots,n$; $I_q$ is the identity matrix.

The DGP in Assumption 1 is fairly general and nests several models: by the multivariate Wold decomposition and under mild conditions, any second-order stationary vector process can be written as an infinite-order vector moving average process; see Niemi (1979). Furthermore, under a modern macroeconomics perspective, reduced forms of Dynamic Stochastic General Equilibrium (DSGE) models are written as vector autoregressive moving average (VARMA) processes, which, in turn, are nested in the general specification in Assumption 1 (Fernández-Villaverde et al., 2007; An and Schorfheide, 2007). The condition that $R_1$ and $R_2$ are block diagonal is not restrictive, as the correlation among units is captured by the common factor structure. It is also important to stress that, for the "treated" unit, the intervention may cause changes in any subset of the parameters that define the DGP. On the other hand, the DGP for the peers does not suffer any parameter change.
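For intuition, and as scaffolding for Monte Carlo exercises in the spirit of Section 9, the snippet below simulates a deliberately simplified special case of Assumption 1: a finite MA(1) filter in place of the infinite-order $\Psi_{\infty}(L)$, a single common factor, $q = 1$ variable per unit, and an intervention that shifts only the mean of unit $i_0$. All parameter values are ours and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, lam0 = 10, 200, 0.5              # units, sample size, break fraction
T0 = int(np.floor(lam0 * T))           # intervention date T0 = floor(lam0 * T)

f = rng.normal(size=T)                          # common factor f_t, E(f_t) = 0
Lam = rng.uniform(0.5, 1.5, size=n)             # factor loadings Lambda_i
eta = rng.normal(scale=0.5, size=(T, n))        # idiosyncratic shocks (block-diagonal R)
eps = f[:, None] * Lam + eta                    # eps_t = Lambda f_t + eta_t

psi1 = 0.4                                      # MA(1) truncation of Psi_inf(L)
x = eps.copy()
x[1:] += psi1 * eps[:-1]                        # x_t = eps_t + psi1 * eps_{t-1}

i0, delta = 0, 1.0                              # treated unit and true effect
x[T0:, i0] += delta                             # level shift from T0 onwards, unit i0 only
```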
Consider the following identifying assumption.

Assumption 2. Define $G(Y_{-i_0t};\theta_0) = \mathbb{E}\left(y_{i_0t} \mid Y_{-i_0t}\right)$ for the pre-intervention period and $\widetilde{G}(Y_{-i_0t};\widetilde{\theta}_0) = \mathbb{E}\left(y_{i_0t}^* \mid Y_{-i_0t}\right)$ for the post-intervention period, for some $(\theta_0', \widetilde{\theta}_0')' \in \Theta \times \Theta$. Then $\widetilde{G}(\cdot;\cdot) = G(\cdot;\cdot)$ and $\widetilde{\theta}_0 = \theta_0$.

Assumption 2 is a key identifiability condition, as it allows the econometrician to estimate the counterfactual, $y_{i_0t}^*$, from the pool of controls $y_{-i_0t}$ using the same conditional expectation model used during the pre-intervention period.

Lemma 1. Let $m_t(\theta_0) = y_{i_0t} - G(Y_{-i_0t};\theta_0)$ for $t \geq T_0$. Under Assumption 2,
$$\mathbb{E}\left[m_t(\theta_0)\right] = \mathbb{E}(y_{i_0t}) - \mathbb{E}(y_{i_0t}^*).$$

The function $m_t(\theta_0)$ is the basis used to construct the ArCo method. Set $\widehat{\theta}_{T_0}$ as an estimator of $\theta_0$ based on the first $T_0 - 1$ observations.

Definition 1. The Artificial Counterfactual (ArCo) estimator is defined as
$$\widehat{\Delta}_{i_0}(\lambda_0) \equiv \widehat{\Delta}_{i_0}\left[\lambda_0, \widehat{\theta}(\lambda_0)\right] = \frac{1}{\lfloor(1-\lambda_0)T\rfloor}\sum_{t=T_0}^{T} \widehat{m}_{t,T_0}, \qquad (9)$$
where $\widehat{m}_{t,T_0} \equiv m_t(\widehat{\theta}_{T_0})$ for $t \geq T_0$.

At this point the following remarks are important.

Remark 1. The ArCo estimator in (9) is defined under the assumption that $\lambda_0$ is known. However, in some cases the exact time of the intervention might be unknown due, for example, to anticipation effects. On the other hand, the effects of a policy change may take some time to be noticed. Although the main results in Section 3 are derived under the assumption of known $\lambda_0$, we later show they are still valid when $\lambda_0$ is estimated in a first step.

Remark 2. Assumption 1 allows only for a single break. Nevertheless, we extend the analysis to the multiple-break case in Section 6.2.

Remark 3. In most applications the intervention exists for sure (the outbreak of a war, a new government, a different policy, a new law, etc.) and, differently from comparative studies with micro data, it is usually an idiosyncratic change in unit $i_0$. For example, only in unit $i_0$ was the new law enforced or a new government installed. On the other hand, the effects of such interventions are unknown and the main goal is to discriminate the actual effects from other confounding factors. Nevertheless, in Section 6.3 we put forward a test that can be used to detect contamination (spillover) effects, and in Section 7 we discuss possible sources of bias in our methodology.

2.2 Estimation of the Conditional Model

As discussed above, the ArCo method requires an estimator of the parameter vector $\theta_0$. Consider the following conditional model:
$$y_{i_0t} = G(Y_{-i_0t};\theta_0) + \nu_{i_0t} \quad \text{for } t < T_0. \qquad (10)$$

Although Assumption 2 states that $G(Y_{-i_0t};\theta_0)$ is the expectation of $y_{i_0t}$ conditional on $Y_{-i_0t}$, i.e., that the conditional mean is correctly specified, the derivations in this section follow the approach in Pötscher and Prucha (1997), where model (10) could be potentially misspecified. The estimator $\widehat{\theta}_{T_0}$ is defined as
$$\widehat{\theta}_{T_0} \equiv \arg\min_{\theta\in\Theta} V_{T_0}(y_1,\dots,y_{T_0-1};\theta) = \arg\min_{\theta\in\Theta} \frac{1}{T_0-1}\sum_{t=1}^{T_0-1} v_t(\theta),$$
where $v_t(\theta) = \nu_{i_0t}(\theta)'\nu_{i_0t}(\theta) \in \mathbb{R}$ and $\nu_{i_0t}(\theta) = y_{i_0t} - G(Y_{-i_0t};\theta)$. In the present framework, let $\theta_0$ be defined as
$$\theta_0 = \arg\min_{\theta\in\Theta} \mathbb{E}\left[V_{T_0}(y_1,\dots,y_{T_0-1};\theta)\right].$$
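Combining Definition 1 with the least-squares problem above, a minimal end-to-end sketch for the common linear case with $p = 0$ (so $Y_{-i_0t} = y_{-i_0t}$ and $G$ is an intercept plus a linear combination of the peers) could look as follows. The function name and array conventions are ours; y can be the array x from the simulation sketch above or the output of g_transform.

```python
import numpy as np

def arco_estimate(y, i0, T0):
    """ArCo estimator with a linear conditional model and p = 0.

    y  : (T, n) array of (transformed) observations, one column per unit
    i0 : column index of the treated unit
    T0 : first post-intervention row
    Returns (delta_hat, theta_hat, counterfactual path for t >= T0).
    """
    peers = np.delete(y, i0, axis=1)                 # the n - 1 "untreated" units
    Z = np.column_stack([np.ones(len(y)), peers])    # intercept + peers

    # First step: least squares on the pre-intervention sample only
    theta_hat, *_ = np.linalg.lstsq(Z[:T0], y[:T0, i0], rcond=None)

    # Second step: artificial counterfactual and average post-intervention gap
    y_star = Z[T0:] @ theta_hat                      # G(Y_{-i0,t}; theta_hat), t >= T0
    m_hat = y[T0:, i0] - y_star                      # m_t(theta_hat)
    return m_hat.mean(), theta_hat, y_star           # the mean of m_hat is (9)
```

On the simulated data above, arco_estimate(x, i0=0, T0=T0)[0] should recover the injected shift delta up to sampling error.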
We now present some fairly weak assumptions to ensure the consistency of the estimator above.

Assumption 3. Assume that:

(a) $\Theta$ is a compact metric space.

(b) $G(\cdot;\theta)$ is Borel measurable on $\mathcal{Y} \subset \mathbb{R}^{(n-1)pq}$ for each $\theta \in \Theta$.

(c) $G(Y_{-i_0t};\cdot)$ is equicontinuous on $\Theta \times \mathcal{Y}$.

(d) The process $\{x_t\}$ is $\ell_0$-approximable by some $\alpha$-mixing basis process. Furthermore, the family of transformation functions $g(x_t)$, where $g(\cdot): \mathcal{X} \to \mathcal{G}$, is equicontinuous in $\mathcal{X}$, where $\mathcal{X} \subseteq \mathbb{R}^{nq}$ and $\mathcal{G} \subseteq \mathbb{R}^k$, $k \geq 1$.

(e) $\sup_t |G(\cdot;\theta)| < \infty$ for all $\theta \in \Theta$.

(f) For some $\gamma > 0$ and as $T \to \infty$,
$$\frac{1}{T_0-1}\sum_{t=1}^{T_0-1} \mathbb{E}\left[\sup_{\Theta}\left|G(Y_{-i_0t};\theta)\right|^{2+2\gamma}\right] < \infty, \qquad \frac{1}{T_0-1}\sum_{t=1}^{T_0-1} \mathbb{E}\left|y_{i_0t}\right|^{\gamma} < \infty.$$

(g) $G(Y_{-i_0t};\theta_1) = G(Y_{-i_0t};\theta_2)$ if, and only if, $\theta_1 = \theta_2$, for all $(\theta_1,\theta_2) \in \Theta \times \Theta$.

Assumption 3(a) is not restrictive: even if $\Theta$ is not compact, we can follow the discussion in Section 4.3 of Pötscher and Prucha (1997) and replace $\Theta$ by the compact subset $\Theta^* = [\theta_0 - M, \theta_0 + M]$, where $M > 0$ is a constant. In this case we need the extra assumption that $\mathbb{E}\left\{\sup_{\Theta^*}\left[G(Y_{-i_0t};\theta) - G(Y_{-i_0t};\theta_0)\right]^2\right\} < \infty$. Assumptions 3(b)-(c) depend on the choice of the function $G(\cdot;\cdot)$ but are trivially handled. Assumption 3(d) can be replaced by assumptions on the distribution of $\{\eta_t\}$ in Assumption 1. For instance, if $\{\eta_t\}$ is an independent and identically distributed sequence of Gaussian random vectors, $\{x_t\}$ will be $\alpha$-mixing and Assumption 3(d) is redundant. The assumption on the transformation function $g(\cdot)$ is important in order for $\{y_t\}$ to inherit the "fading memory" properties of $\{x_t\}$. Assumptions 3(e)-(f) can be relaxed depending on the choice of $G(\cdot;\cdot)$ and the correctness of the approximating model (10). Finally, Assumption 3(g) states the identification condition.

The following proposition states the consistency result.

Proposition 1. Under Assumptions 1 and 3, $\widehat{\theta}_{T_0} \xrightarrow{p} \theta_0$ as $T \to \infty$.

Remark 4. Although the model is conditioned on $p$ lags of $y_{-i_0t}$, a much more parsimonious model can be obtained by setting $p = 0$. In this case the errors will be potentially autocorrelated and a Feasible Generalized Nonlinear Least Squares (FGNLS) estimator could be used instead. The analysis below does not depend upon which estimator one chooses to estimate $\theta_0$, as long as it is consistent.

Remark 5. A typical choice for the conditional model is to define $G(Y_{-i_0t};\theta)$ as a linear function of $Y_{-i_0t}$. In this case, the conditional expectation could be potentially misspecified and model (10) would be just a linear projection. However, all the results in this paper would still hold as long as the conditional expectations in Assumption 2 are replaced by the linear projection operator and the conditions in Assumption 3 continue to hold.

Remark 6. $G(Y_{-i_0t};\theta)$ could also be a linear combination of some dictionary of functions representing a sieve approximation to the conditional expectation.
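As one concrete reading of Remark 6, the dictionary can be a polynomial basis. The sketch below is our construction, not one proposed in the paper: it expands each peer series into powers, so that $G$ is nonlinear in the peers while remaining linear in $\theta$.

```python
import numpy as np

def poly_dictionary(peers, degree=2):
    """Polynomial sieve: intercept plus powers 1..degree of each peer series.

    peers : (T, n-1) array of untreated-unit observations
    Returns a (T, 1 + (n-1)*degree) design matrix.
    """
    cols = [np.ones(len(peers))]
    for d in range(1, degree + 1):
        cols.append(peers ** d)         # elementwise powers of every peer column
    return np.column_stack(cols)
```

Substituting this matrix for the design matrix Z inside arco_estimate leaves the two-step procedure otherwise unchanged.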
2.3 A Comparison with the Literature

The papers most similar to ours are Hsiao et al. (2012) and Zhang et al. (2014). In these papers the authors also write a panel factor model for the "treated" and "untreated" units. However, in their case a single variable is observed for each unit, i.e., $x_{it} \in \mathbb{R}$. They construct their counterfactual for $x_{i_0t}$ based on a linear regression of $x_{i_0t}$ on $x_{-i_0t} \in \mathbb{R}^{n-1}$. Hence, their $G(\cdot;\cdot): \mathbb{R}^{n-1} \to \mathbb{R}$ is linear and is assumed to be the true conditional expectation. Furthermore, their asymptotic results rely on correct specification of the conditional model and on the factor structure. In our view, the ArCo approach is a generalization of the nice ideas in Hsiao et al. (2012) and Zhang et al. (2014) for the following reasons: (1) our proofs are substantially different and much more general than the ones in Hsiao et al. (2012); (2) we consider a multivariate setting where we can simultaneously test for effects of the intervention on a set of variables; (3) the model used to construct the counterfactual can be a nonlinear and misspecified model; and (4) as shown in the following sections, our method remains valid when the date of the intervention is unknown, and we also develop tests for multiple interventions.

Although both the ArCo and the SC methods construct an artificial counterfactual as a function of observed variables for a pool of peers ("untreated" units), the two approaches differ in several dimensions. First, the SC method relies on a convex combination of peers to construct the counterfactual, whereas the ArCo solution is a general nonlinear function. Even when the function $G(\cdot;\cdot)$ is chosen to be linear, the method does not impose any restriction on the parameters of the linear combination. It is clear, however, that the ArCo requires the estimation of a parameter vector of higher dimension than the SC alternative. Nevertheless, in cases where the number of observations is small, restrictions on the parameter vector can easily be imposed. For example, we can restrict each equation in (10) to be indexed by the same set of parameters. Second, the weights in the SC method (equivalent to our vector $\theta$) are estimated using time averages of the observed variables for each peer. Therefore, all the time-series dynamics are removed and the weights are determined in a purely cross-sectional setting. In some applications of the SC method, the number of observations used to estimate the weights is much lower than the number of parameters to be determined. For example, in Abadie and Gardeazabal (2003) the authors have 13 observations to estimate 16 parameters. A similar issue also appears in Abadie et al. (2010, 2014). Third, the SC method was designed to evaluate the effects of the intervention on a single variable. In order to evaluate the effects on a vector of variables, the method has to be applied several times. The ArCo methodology can be directly applied to a vector of variables of interest. Fourth, our inferential procedure is not based on permutation tests.

With respect to the methodology put forward by Pesaran and Smith (2012), the major difference is that the authors construct the artificial counterfactual based on variables that belong to the "treated" unit; they do not rely on a pool of "untreated" peers. Their key assumption is that a subset of variables of unit $i_0$ is invariant to the intervention. Although in some specific cases this could be a reasonable hypothesis, in a general framework it is quite restrictive. Therefore, their method is like the ArCo with $G(Y_{-i_0t};\cdot)$ replaced by $G(z_{i_0t};\cdot)$, where $z_{i_0t}$ is a subset of $y_{i_0t}$ which is not affected by the intervention.

Recently, Angrist et al. (2013) proposed a semiparametric method to evaluate the effects of monetary policy based on the so-called policy propensity score. Similar to Pesaran and Smith (2012), the authors rely only on information on unit $i_0$ and no donor pool is available. As before, this is a major difference from our approach. Furthermore, their methodology
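To make the first difference concrete, the sketch below contrasts the two weighting schemes on pre-intervention data: simplex-constrained weights in the spirit of the SC method, fitted here to the pre-intervention trajectories by projected gradient (a simplification of the original SC optimization, which also matches on covariates), versus the unrestricted linear ArCo fit. The solver and names are ours, purely illustrative.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def sc_style_weights(P, y, iters=5000):
    """Simplex-constrained least squares: min ||y - P w||^2 s.t. w in the simplex.

    P : (T0, n-1) pre-intervention peer observations; y : (T0,) treated unit.
    """
    lr = 1.0 / np.linalg.norm(P, 2) ** 2        # step size from the Lipschitz constant
    w = np.full(P.shape[1], 1.0 / P.shape[1])   # start from equal weights
    for _ in range(iters):
        w = project_simplex(w - lr * P.T @ (P @ w - y))
    return w

# SC-style: weights are nonnegative and sum to one.
# ArCo (linear case): coefficients are unrestricted and include an intercept:
# theta = np.linalg.lstsq(np.column_stack([np.ones(len(P)), P]), y, rcond=None)[0]
```

The constrained weights trade flexibility for a lower-dimensional, interpretable combination, which is exactly the tension discussed in the paragraph above.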
