Intersection Bounds, Robust Bayes, and Updating Ambiguous Beliefs

Toru Kitagawa*
CeMMAP and Department of Economics, UCL

Preliminary Draft, November 2011

Abstract

This paper develops multiple-prior Bayesian inference for a set-identified parameter whose identified set is constructed as an intersection of two identified sets. We formulate an econometrician's practice of "adding an assumption" as "updating ambiguous beliefs." Among the several ways to update ambiguous beliefs proposed in the literature, we consider the Dempster-Shafer updating rule (Dempster (1968) and Shafer (1976)) and the full Bayesian updating rule (Fagin and Halpern (1991) and Jaffray (1992)), and argue that the Dempster-Shafer updating rule, rather than the full Bayesian updating rule, better matches an econometrician's common adoption of the analogy principle (Manski (1988)) in the context of intersection bound analysis.

Keywords: Intersection Bounds, Multiple Priors, Bayesian Robustness, Dempster-Shafer Theory, Updating Ambiguity, Imprecise Probability, Gamma-minimax, Random Set.

JEL Classification: C12, C15, C21.

*Email: [email protected]. Financial support from the ESRC through the ESRC Center for Microdata Methods and Practice (CEMMAP) (grant number RES-589-28-0001) is gratefully acknowledged.

1 Introduction

The intersection bound analysis proposed by Manski (1990, 2003) provides a way to aggregate identifying information for a common parameter of interest by taking the intersection of multiple identified sets. This way of constructing the identified set introduces a new identification scheme in econometrics, and it has been applied to a wide range of empirical studies, e.g., Manski and Pepper (2000) and Blundell, Gosling, Ichimura, and Meghir (2007).
Recently, Chernozhukov, Lee, and Rosen (2009) have developed estimation and inference for intersected identified sets from the classical perspective, including asymptotically valid confidence intervals for intersected identified sets.

In this paper, we analyze inference and decision for this class of partially identified models from the multiple-prior Bayes perspective. Kitagawa (2011) develops a framework of multiple-prior Bayes analysis that can explicitly take into account the robustness/agnosticism pursued in partial identification analysis. This paper extends the approach of Kitagawa (2011) to intersection bound analysis, paying special attention to the following questions. Is there a multiple-prior Bayes (subjective probability) formulation that induces the operation of intersecting multiple identified sets? If so, what kind of subjective-probability-based reasoning do we have to invoke? We approach these questions by modelling an econometrician's practice of "imposing an assumption" that leads him to intersect multiple identified sets as "updating ambiguous beliefs." Among the several ways to update ambiguous beliefs proposed in the literature, we consider the Dempster-Shafer updating rule (Dempster (1968) and Shafer (1976)) and the full Bayesian updating rule (Fagin and Halpern (1991) and Jaffray (1992)). Our main finding is that the Dempster-Shafer updating rule, rather than the full Bayesian updating rule, leads to an aggregation rule of intersecting multiple (random) identified sets, and it replicates well the econometrician's common adoption of the analogy principle (Manski (1988)) in the context of intersection bound analysis. This result replicates the belief function analysis of Dempster (1967a) and Shafer (1973), who deduce an aggregation rule for ambiguous information as intersecting random sets.
Also, the axiomatic analysis of updating ambiguous beliefs by Gilboa and Schmeidler (1993) provides a lucid multiple-prior interpretation of the Dempster-Shafer updating rule, which also readily applies to our econometric framework. Given these earlier results, we consider the contributions of this paper to be (i) to clarify a link between the aggregation rule in Dempster-Shafer belief function analysis and the growing literature on inference in partially identified models, and (ii) to provide decision theorists with some outside-the-lab evidence that the econometrician's common way of updating ambiguous beliefs is in line with the Dempster-Shafer updating rule rather than the full Bayesian updating rule.

To keep a tight focus on our theoretical development, we develop our analysis along the following simple model of missing data with an instrumental variable (Manski (1990, 2003)). Suppose a survey aims at inferring the population distribution of a binary variable $Y \in \{1, 0\}$ (e.g., employed or not). In the data, not all the sampled subjects respond to the survey, and the response indicator of a sampled subject is denoted by $D \in \{1, 0\}$: $D = 1$ if $Y$ is observed and $D = 0$ if $Y$ is missing. Suppose that the survey is conducted in two modes, say, either by e-mail or by phone. Let us indicate the survey mode by a binary random variable $Z \in \{1, 2\}$: $Z = 1$ if the individual is surveyed by e-mail and $Z = 2$ if he/she is surveyed by phone. If the survey modes are randomized, then it is reasonable to assume that the survey mode indicator $Z$ is independent of the underlying outcome $Y$. Associated with this exogeneity restriction, we consider using $Z$ as an instrumental variable in the following manner.
Consider
\[
\Pr(Y=1 \mid Z=1) = \Pr(Y=1, D=1 \mid Z=1) + \Pr(Y=1 \mid D=0, Z=1)\Pr(D=0 \mid Z=1),
\]
\[
\Pr(Y=1 \mid Z=2) = \Pr(Y=1, D=1 \mid Z=2) + \Pr(Y=1 \mid D=0, Z=2)\Pr(D=0 \mid Z=2).
\]
On the right hand side of the first equation, the data let us consistently estimate $\Pr(Y=1, D=1 \mid Z=1)$ and $\Pr(D=0 \mid Z=1)$, while the data are silent about the distribution of missing outcomes $\Pr(Y=1 \mid D=0, Z=1)$. Hence, without any assumptions on $\Pr(Y=1 \mid D=0, Z=1)$, all we can say about $\Pr(Y=1 \mid Z=1)$ given complete knowledge of the distribution of the data is
\[
\Pr(Y=1 \mid Z=1) \in \left[\Pr(Y=1, D=1 \mid Z=1),\; \Pr(Y=1, D=1 \mid Z=1) + \Pr(D=0 \mid Z=1)\right]. \tag{1.1}
\]
Similarly, without any assumptions on $\Pr(Y=1 \mid D=0, Z=2)$, it holds that
\[
\Pr(Y=1 \mid Z=2) \in \left[\Pr(Y=1, D=1 \mid Z=2),\; \Pr(Y=1, D=1 \mid Z=2) + \Pr(D=0 \mid Z=2)\right]. \tag{1.2}
\]
The instrument exogeneity restriction, $Y \perp Z$, then plays the role of combining these two bounds: $\Pr(Y=1 \mid Z=1) = \Pr(Y=1 \mid Z=2) = \Pr(Y=1)$ implies that the parameter of interest $\Pr(Y=1)$ must lie within the intersection of the two bounds (1.1) and (1.2),
\[
\max_{z \in \{1,2\}} \Pr(Y=1, D=1 \mid Z=z) \;\le\; \Pr(Y=1) \;\le\; \min_{z \in \{1,2\}} \left[\Pr(Y=1, D=1 \mid Z=z) + \Pr(D=0 \mid Z=z)\right]. \tag{1.3}
\]

We use the intersection of the two identified sets as a procedure to aggregate two independent pieces of set-identifying information about the common parameter. As far as identification is concerned, it is fine to assume that complete knowledge of the distribution of the data is available, and the operation of intersecting the two identified sets then corresponds to an application of Boolean logic. With a finite number of observations, however, it is not obvious how we can extrapolate the identification scheme of intersection bounds to the finite sample situation, where we wish to use the language of probabilistic judgement for the parameter of interest.
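The bounds (1.1)-(1.3) are simple enough to compute directly. The following sketch, with hypothetical population values for the observable probabilities, forms each instrument-specific worst-case bound and intersects them; an empty intersection signals that the exogeneity restriction is refuted (a point revisited below).

```python
def worst_case_bounds(p_y1d1, p_d0):
    """Bounds (1.1)/(1.2): Pr(Y=1|Z=z) lies in
    [Pr(Y=1,D=1|Z=z), Pr(Y=1,D=1|Z=z) + Pr(D=0|Z=z)]."""
    return p_y1d1, p_y1d1 + p_d0

def intersection_bounds(obs):
    """Bound (1.3): intersect the per-instrument bounds.
    obs maps z -> (Pr(Y=1,D=1|Z=z), Pr(D=0|Z=z)); returns None if the
    intersection is empty (the restriction is refuted)."""
    bounds = [worst_case_bounds(a, b) for a, b in obs.values()]
    lo = max(b[0] for b in bounds)
    hi = min(b[1] for b in bounds)
    return (lo, hi) if lo <= hi else None

# hypothetical population values: z=1 (e-mail), z=2 (phone)
obs = {1: (0.30, 0.40), 2: (0.45, 0.10)}
print(intersection_bounds(obs))   # approximately (0.45, 0.55)
```

Here the e-mail bound $[0.30, 0.70]$ is tightened by the phone bound $[0.45, 0.55]$, illustrating how the second instrument value sharpens the identified set.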
The main goal of this paper is to answer this question, focusing on a set of beliefs that represents ambiguity about the missing data as well as ambiguous belief in the imposed restriction.

As emphasized in Manski (1990), another interesting feature of the intersection bounds is their refutability property: if the intersection bounds turn out to be empty, we can refute the imposed restriction on which the operation of intersection relies. The above intersection bounds (1.3) possess the refutability property, i.e., there exists a distribution of the data that makes the intersected bounds empty. This paper also investigates how to incorporate ambiguity about the imposed restriction into posterior inference for the parameter of interest. In the above missing data example, the parameter of interest $\Pr(Y=1)$ is well-defined no matter whether the exogeneity restriction is correctly specified or not. From the single-prior Bayesian point of view, it is natural to incorporate uncertainty about the validity of the exogeneity restriction into posterior inference by utilizing Bayesian model averaging. This paper explores how to extend standard Bayesian model averaging to the multiple-prior set-up. We derive and analyze the class of model-averaged posteriors, and discuss how to use it for the subsequent inference and decision for the parameter of interest.

The rest of the paper is organized as follows. In Section 2, we introduce our analytical framework. Section 3 provides the main result of the paper; the Dempster-Shafer updating rule and the full Bayesian updating rule are implemented and compared. In Section 4, we analyze point estimation and set inference for the set-identified parameter using the class of beliefs updated by the Dempster-Shafer rule. Model averaging with ambiguous beliefs on the imposed assumption is discussed in Section 5, and Section 6 concludes. Proofs and lemmas are provided in Appendix A.
2 Multiple-Prior Framework: Preparation

2.1 Setup and Notation

We lay out the framework of our analysis, focusing on the missing data example given in the Introduction. We divide the population of study into two subpopulations indexed by the value of an assigned binary instrument, e.g., a subpopulation to be surveyed by e-mail and another to be surveyed by phone. Observations are randomly sampled from each of these. We use the subscript $j = 1, 2$ to index each subpopulation, and we denote the likelihood function of each sample by $p(X_j \mid \theta_j)$, $\theta_j \in \Theta_j$, where $X_j = (Y_{ji} D_{ji}, D_{ji} : i = 1, \dots, n_j)$ denotes observations generated from subpopulation $j$ (the assigned instrument is $Z_i = j$) and $\theta_j$ is an unknown parameter vector for subpopulation $j$. The size of the sample generated from subpopulation $j$ is denoted by $n_j$. A specification of $\theta_j$ must meet the following two requirements: (i) it pins down a distribution of the data in subpopulation $j$, and (ii) $\theta_j$ pins down the value of a parameter on which a cross-population restriction is imposed. In the missing data example, $\theta_j$ can be specified as follows; for $j = 1, 2$,
\[
\theta_j = \left(\theta_{ydj} : y = 1, 0;\; d = 1, 0\right) \in \Theta_j, \quad \text{where } \theta_{ydj} = \Pr(Y = y, D = d \mid Z = j),
\]
and $\Theta_j$ is the four-dimensional probability simplex. A cross-population restriction will be imposed on the conditional mean of $Y$ given $Z$, $\eta_j \equiv \Pr(Y = 1 \mid Z = j)$, $j = 1, 2$, which is clearly determined by $\theta_j$: $\eta_j = h_j(\theta_j) = \theta_{10j} + \theta_{11j} \in [0, 1]$.

Non-identification of $\theta_j$ is defined formally by observational equivalence: $\theta_j$ and $\theta'_j$ are observationally equivalent if $p(X_j \mid \theta_j) = p(X_j \mid \theta'_j)$ for every $X_j$ (e.g., Rothenberg (1971) and Kadane (1974)).
Observational equivalence implies that there exists a reduction of parameters $g_j : \Theta_j \to \Phi_j$ such that the likelihood satisfies $p(X_j \mid \theta_j) = \hat{p}(X_j \mid g_j(\theta_j))$. The reduced-form parameters in subpopulation $j$, also called sufficient parameters in the statistics literature (e.g., Barankin (1960), Dawid (1979)), $\phi_j \equiv g_j(\theta_j) \in \Phi_j$, are defined by a function of $\theta_j$ that maps each observationally equivalent class of $\theta_j$ to a point in another parameter space $\Phi_j$. In the example of missing data with an instrumental variable, the observed data likelihood conditional on the instrument is written as a function of
\[
\phi_j = \left(\phi_{11j}, \phi_{01j}, \phi_{mis,j}\right) \equiv \left(\Pr(Y=1, D=1 \mid Z=j),\; \Pr(Y=0, D=1 \mid Z=j),\; \Pr(D=0 \mid Z=j)\right),
\]
for $j = 1, 2$, so
\[
\phi_j = g_j(\theta_j) = \left(\theta_{11j}, \theta_{01j}, \theta_{10j} + \theta_{00j}\right) \in \Phi_j
\]
are the reduced-form parameters in subpopulation $j$, where $\Phi_j$ is the three-dimensional probability simplex. Denote the inverse image of $g_j(\cdot)$ by $\Gamma_j : \Phi_j \rightrightarrows \Theta_j$. In the missing data example, it is written as
\[
\Gamma_j(\phi_j) = \left\{\theta_j \in \Theta_j : \theta_{11j} = \phi_{11j},\; \theta_{01j} = \phi_{01j},\; \theta_{10j} + \theta_{00j} = \phi_{mis,j}\right\}.
\]
$\left\{\Gamma_j(\phi_j) : \phi_j \in \Phi_j\right\}$ partitions $\Theta_j$ into regions on each of which the likelihood for $\theta_j$ is flat irrespective of the observations $X_j$.
Note that, by construction, $\Gamma_j(\phi_j) \ne \emptyset$.

Define the identified set for $\eta_j = h_j(\theta_j)$ as the range of $h_j(\theta_j)$ when the domain of $\theta_j$ is given by $\Gamma_j(\phi_j) \subset \Theta_j$:
\[
H_j(\phi_j) = \left\{h_j(\theta_j) : \theta_j \in \Gamma_j(\phi_j)\right\} \subset \mathcal{H}.
\]
In the missing data example, we have $H_j(\phi_j) = \left[\phi_{11j},\; \phi_{11j} + \phi_{mis,j}\right]$ and $\mathcal{H} = [0, 1]$. This is identical to Manski's bounds (Manski (1989)). In what follows, we use the following short-hand notations: $\theta \equiv (\theta_1, \theta_2) \in \Theta \equiv \Theta_1 \times \Theta_2$, $\phi \equiv (\phi_1, \phi_2) \in \Phi \equiv \Phi_1 \times \Phi_2$, $\eta = h(\theta) \equiv (h_1(\theta_1), h_2(\theta_2)) = (\eta_1, \eta_2) \in [0,1]^2$, $X \equiv (X_1, X_2)$, $\Gamma(\phi) \equiv [\Gamma_1(\phi_1) \times \Gamma_2(\phi_2)] \subset \Theta$, and $H(\phi) \equiv [H_1(\phi_1) \times H_2(\phi_2)] \subset [0,1]^2$.

In the missing data example, the parameter of ultimate interest is the marginal distribution of $Y$, $\Pr(Y=1)$. We shall denote it by $\tau \in [0,1]$, which relates to $\eta$ by
\[
\tau = \lambda \eta_1 + (1 - \lambda)\eta_2,
\]
where $\lambda = \Pr(Z = 1)$. In our development of inference for $\tau$, we ignore estimation of $\lambda$ and assume it is known.
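The maps $g_j$, $h_j$, and the identified set $H_j(\phi_j)$ in the missing data example can be written down in a few lines. The sketch below is a minimal illustration; the ordering of the components of $\theta_j$ as $(\theta_{11j}, \theta_{10j}, \theta_{01j}, \theta_{00j})$ and the function names are conventions chosen here, not notation from the paper.

```python
def g(theta):
    """Reduced-form map g_j: theta_j = (th11, th10, th01, th00)
    -> phi_j = (phi_11j, phi_01j, phi_mis,j)."""
    th11, th10, th01, th00 = theta
    return th11, th01, th10 + th00

def h(theta):
    """eta_j = h_j(theta_j) = Pr(Y=1|Z=j) = th10 + th11."""
    th11, th10, th01, th00 = theta
    return th11 + th10

def H(phi):
    """Identified set H_j(phi_j): the range of h_j over Gamma_j(phi_j),
    i.e. the interval [phi_11j, phi_11j + phi_mis,j]."""
    phi11, phi01, phi_mis = phi
    return phi11, phi11 + phi_mis

theta = (0.30, 0.20, 0.30, 0.20)   # hypothetical point in the simplex
lo, hi = H(g(theta))
assert lo <= h(theta) <= hi        # h(theta) always lies in its identified set
```

The final assertion reflects the defining property of the identified set: every $\theta_j$ in the equivalence class $\Gamma_j(g_j(\theta_j))$ maps into $H_j(g_j(\theta_j))$.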
2.2 Multiple Priors and Posterior Lower and Upper Probabilities

We assume that the two samples are independent in the sense that the likelihood of the entire data $X = (X_1, X_2)$ is written as
\[
p(X \mid \theta) = p(X_1 \mid \theta_1)\, p(X_2 \mid \theta_2)
= \hat{p}(X_1 \mid g_1(\theta_1))\, \hat{p}(X_2 \mid g_2(\theta_2))
= \hat{p}(X_1 \mid \phi_1)\, \hat{p}(X_2 \mid \phi_2)
\equiv \hat{p}(X \mid \phi).
\]
Consider standard Bayesian inference based on $\mu_\theta$, a single prior distribution on $\Theta$. Given $\mu_\theta$, the mapping between the parameter $\theta$ and the reduced-form parameters $\phi = (g_1(\theta_1), g_2(\theta_2))$ yields $\mu_\phi$, a unique prior distribution of the reduced-form parameters $\phi \in \Phi$. The relationship between $\mu_\theta$ and $\mu_\phi$ is written as, for every measurable subset $B \subset \Phi$,
\[
\mu_\phi(B) = \mu_\theta(\Gamma(B)),
\]
where $\Gamma(B) = \bigcup_{\phi \in B} \Gamma(\phi)$. In the presence of reduced-form parameters (sufficient parameters), the posterior distribution of $\theta$, denoted by $F_{\theta|X}(\cdot)$, is obtained as (see, e.g., Barankin (1960), Dawid (1979), Poirier (1998))
\[
F_{\theta|X}(A) = \int_\Phi \mu_{\theta|\phi}(A \mid \phi)\, dF_{\phi|X}(\phi), \quad A \subset \Theta, \tag{2.1}
\]
where $F_{\phi|X}(\cdot)$ is the posterior distribution of the reduced-form parameter $\phi$ and $\mu_{\theta|\phi}(\cdot \mid \phi)$ is the conditional prior of $\theta$ given $\phi$ implied by the initial specification of $\mu_\theta$. The expression (2.1) shows that the conditional prior for $\theta$ given $\phi$ is never updated by the data; only the prior information on the reduced-form parameters is updated, because the value of the likelihood depends only on $\phi$.
Therefore, the shape of the posterior for $\theta$ remains sensitive to the shape of $\mu_{\theta|\phi}(\cdot \mid \phi)$ implied by the specification of $\mu_\theta$, no matter how many observations are available in the data. This sensitivity of the posterior of $\theta$ to the conditional prior $\mu_{\theta|\phi}(\cdot \mid \phi)$ is also carried over to the posteriors of $\eta = (h_1(\theta_1), h_2(\theta_2))$ and $\tau$ if they are set-identified.

Ambiguity stemming from the lack of identification of $\theta_1$ and $\theta_2$ can be modelled in the robust Bayesian framework by introducing multiple priors. A class of priors for $\theta$ that is suitable to our context consists of the class of $\mu_\theta$ that allows for an arbitrary conditional prior $\mu_{\theta|\phi}(\cdot \mid \phi)$. Formally, given a single prior $\mu_\phi$ for the reduced-form parameters, we can formulate such a class of priors as
\[
\mathcal{M}(\mu_\phi) = \left\{\mu_\theta : \mu_\theta(\Gamma(B)) = \mu_\phi(B) \text{ for all measurable } B \subset \Phi\right\}.
\]
This class of priors is indexed by $\mu_\phi$, meaning that the analysis requires a single prior for the reduced-form parameters $\phi$. In other words, we admit a single belief for the distribution of the data. A rationale for this is that fear of misspecification or lack of prior knowledge is less severe here, since we know the prior for $\phi$ will be well updated by the data. We can in principle adopt the existing selection rules for non-informative priors, such as the Jeffreys prior, the reference prior, and the empirical Bayes rule, in order to specify a "reasonably objective" prior for the reduced-form parameters $\phi$ (see Kitagawa (2011) for further discussion).
We use the Bayes rule to update each prior $\mu_\theta \in \mathcal{M}(\mu_\phi)$ and obtain the class of posteriors of $\theta$, which we denote by $\mathcal{F}_{\theta|X}$.$^1$ We marginalize each posterior of $\theta$ in $\mathcal{F}_{\theta|X}$ to form the class of posteriors of $\eta$. We denote the thus-constructed class of posteriors of $\eta$ by $\mathcal{F}_{\eta|X}$,
\[
\mathcal{F}_{\eta|X} \equiv \left\{F_{\eta|X} : F_{\eta|X}(\cdot) = F_{\theta|X}\left(h(\theta) \in \cdot \mid X\right),\; \mu_\theta \in \mathcal{M}(\mu_\phi)\right\}, \tag{2.2}
\]
where $F_{\eta|X}$ is a posterior distribution of $\eta \in [0,1]^2$. Note that $\mathcal{F}_{\eta|X}$ depends on the prior for $\phi$ through the specification of the prior class $\mathcal{M}(\mu_\phi)$, although our notation does not make this explicit.

We summarize the class of posteriors $\mathcal{F}_{\eta|X}$ by its lower envelope and upper envelope, the so-called posterior lower probability and posterior upper probability: for $D \subset [0,1]^2$,
\[
\text{posterior lower probability:} \quad F_{\eta|X*}(D) \equiv \inf_{F_{\eta|X} \in \mathcal{F}_{\eta|X}} F_{\eta|X}(D),
\]
\[
\text{posterior upper probability:} \quad F^*_{\eta|X}(D) \equiv \sup_{F_{\eta|X} \in \mathcal{F}_{\eta|X}} F_{\eta|X}(D).
\]
Note that the posterior lower probability and the upper probability in general have the conjugation property, $F_{\eta|X*}(D) = 1 - F^*_{\eta|X}(D^c)$, which we will frequently refer to in our analysis. Since the prior class $\mathcal{M}(\mu_\phi)$ is designed to represent the collection of prior knowledge (assumptions) that will never be updated by the data, we can interpret the value of the posterior lower probability as "the posterior probability of $\eta \in D$ being at least $F_{\eta|X*}(D)$ irrespective of the unrevisable prior knowledge." The posterior upper probability is interpreted similarly, replacing "at least" in the previous statement with "at most."

The next theorem provides closed form expressions for $F_{\eta|X*}(\cdot)$ and $F^*_{\eta|X}(\cdot)$ and a list of their analytical properties.

Theorem 2.1 (i) For a measurable subset $D \subset [0,1]^2$,
\[
F_{\eta|X*}(D) = F_{\phi|X}\left(\left\{\phi : H(\phi) \subset D\right\}\right),
\]
\[
F^*_{\eta|X}(D) = F_{\phi|X}\left(\left\{\phi : H(\phi) \cap D \ne \emptyset\right\}\right).
\]
(ii) $F_{\eta|X*}(\cdot)$ is supermodular and $F^*_{\eta|X}(\cdot)$ is submodular, i.e., for measurable $D_1, D_2 \subset [0,1]^2$,
\[
F_{\eta|X*}(D_1 \cup D_2) + F_{\eta|X*}(D_1 \cap D_2) \ge F_{\eta|X*}(D_1) + F_{\eta|X*}(D_2),
\]
\[
F^*_{\eta|X}(D_1 \cup D_2) + F^*_{\eta|X}(D_1 \cap D_2) \le F^*_{\eta|X}(D_1) + F^*_{\eta|X}(D_2).
\]
Proof. For a proof of (i), see Theorem 3.1 in Kitagawa (2011). $F_{\eta|X*}(D)$ and $F^*_{\eta|X}(D)$ are the containment and capacity functionals of random closed sets induced by the posterior distribution of $\phi$, so their supermodularity and submodularity are implied by the Choquet Theorem (see, e.g., Molchanov (2005)).

---
$^1$ For the given specification of the prior class, the prior-by-prior updating rule and the Dempster-Shafer updating rule (the maximum likelihood updating rule) produce the same class of posteriors for $\theta$. This is because
\[
\int p(X \mid \theta)\, d\mu_\theta = \int \hat{p}(X \mid \phi)\, d\mu_\phi
\]
holds and, therefore, the probability of observing the sample is identical for any $\mu_\theta \in \mathcal{M}(\mu_\phi)$. Hence, the prior class $\mathcal{M}(\mu_\phi)$ never shrinks even when we apply the Dempster-Shafer (maximum likelihood) updating rule. This phenomenon is in accordance with the condition for dynamic consistency (the rectangularity property of a prior class) discovered by Epstein and Schneider (2003).
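Theorem 2.1(i) suggests a direct Monte Carlo computation: draw $\phi$ from its posterior, form the random interval $H(\phi)$, and count containments and hits. The sketch below does this for a single subpopulation, assuming a multinomial reduced-form likelihood with a flat Dirichlet prior (a natural conjugate choice, though the paper does not commit to it); the cell counts are hypothetical. It also checks the conjugation property $F_{\eta|X*}(D) = 1 - F^*_{\eta|X}(D^c)$ numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical cell counts for (Y=1,D=1), (Y=0,D=1), (D=0) in one subpopulation,
# with a flat Dirichlet(1,1,1) prior on the reduced-form parameter phi_j
counts = np.array([30, 40, 30])
phi = rng.dirichlet(counts + 1, size=20_000)   # posterior draws of phi_j

lo = phi[:, 0]                   # H_j(phi_j) = [phi_11j, phi_11j + phi_mis,j]
hi = phi[:, 0] + phi[:, 2]

c = 0.5                          # event D = {eta_j <= 1/2}
lower = np.mean(hi <= c)         # F_*(D): posterior prob. that H(phi) is inside D
upper = np.mean(lo <= c)         # F^*(D): posterior prob. that H(phi) hits D
assert lower <= upper

# conjugation: F_*(D) = 1 - F^*(D^c), with D^c = (1/2, 1]
upper_comp = np.mean(hi > c)     # H(phi) intersects (1/2, 1] iff hi > 1/2
assert abs(lower - (1.0 - upper_comp)) < 1e-12
```

The gap between `upper` and `lower` is the posterior probability that the random interval straddles the boundary of $D$, which is exactly the ambiguity left by set identification.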
Statement (i) of this theorem says that the posterior lower and upper probabilities correspond to the containment functional and the capacity functional of random closed sets (rectangles) in $[0,1]^2$. In the Dempster-Shafer theory, such functionals are called a belief function and a plausibility function, respectively. In the context of partially identified models, the thus-constructed lower and upper probabilities represent the posterior probability law of the identified set of $\eta$ induced by the posterior distribution of the identified parameters in the model. Our multiple-prior framework based on the prior class $\mathcal{M}(\mu_\phi)$ highlights a seamless link among random set theory, the Dempster-Shafer theory, and set-identified models in econometrics.

3 Updating Ambiguous Posterior Beliefs

So far, there has been no discussion of how to impose assumptions on $\eta$. Assumptions to be imposed on $\eta$ can in general be represented as a subset $D_A \subset [0,1]^2$, referred to as an assumption subset. For instance, the instrument exogeneity restriction in the missing data example specifies $D_A = \{\eta : \eta_1 = \eta_2\}$, which is the 45-degree line in $[0,1]^2$. We interpret "imposing an assumption on $\eta$" as updating the class of posteriors $\mathcal{F}_{\eta|X}$ with a conditioning set given by the assumption subset $D_A$. What we get after updating $\mathcal{F}_{\eta|X}$ is another class of posteriors for $\eta$. By marginalizing each posterior of $\eta$ in the updated class to the ultimate parameter of interest $\tau = \lambda \eta_1 + (1-\lambda)\eta_2$, we obtain the updated class of posteriors for $\tau$, which we denote by $\mathcal{F}_{\tau|X,D_A}$. Our goal is to summarize the thus-constructed $\mathcal{F}_{\tau|X,D_A}$ by its lower probability and use it for posterior inference on $\tau$.

The literature has proposed several ways to update a class of probability measures (see, e.g., Gilboa and Marinacci (2011) for a survey).
As of this date, however, there does not appear to be general agreement on which updating rule should be preferred to the others. In this paper, we shall focus on two major updating rules: the Dempster-Shafer updating rule (Dempster (1967), Shafer (1973)), synonymously called the maximum likelihood updating rule (Gilboa and Schmeidler (1993)), and the full Bayesian updating rule (Fagin and Halpern (1991) and Jaffray (1992)).$^2$ We do not intend to provide a normative argument on which updating rule should be applied in this context; rather, we compare them in order to argue which updating rule agrees with the common adoption of the analogy principle (Manski (1988)) in the context of intersection bound analysis.

Given the assumption subset $D_A$, the class of posteriors for $\tau$ updated by the Dempster-Shafer updating rule has the following form: write $\tau = t(\eta) = \lambda \eta_1 + (1-\lambda)\eta_2 \in [0,1]$ and let $t^{-1}(\cdot)$ be its inverse image,
\[
\mathcal{F}^{DS}_{\tau|X,D_A} \equiv \left\{F_{\tau|X,D_A}(\cdot) = \frac{F_{\eta|X}\left(t^{-1}(\cdot) \cap D_A\right)}{F_{\eta|X}(D_A)} : F_{\eta|X} \in \mathcal{F}^*_{\eta|X}\right\},
\]
where $\mathcal{F}^*_{\eta|X}$ is the class of posteriors of $\eta$ defined by
\[
\mathcal{F}^*_{\eta|X} \equiv \left\{F_{\eta|X} \in \mathcal{F}_{\eta|X} : F_{\eta|X} \in \arg\max_{F_{\eta|X} \in \mathcal{F}_{\eta|X}} F_{\eta|X}(D_A)\right\}.
\]
On the other hand, the updated class of posteriors for $\tau$ with the full Bayesian updating rule is written as
\[
\mathcal{F}^{FB}_{\tau|X,D_A} \equiv \left\{F_{\tau|X,D_A}(\cdot) = \frac{F_{\eta|X}\left(t^{-1}(\cdot) \cap D_A\right)}{F_{\eta|X}(D_A)} : F_{\eta|X} \in \mathcal{F}_{\eta|X}\right\},
\]
where $\mathcal{F}_{\eta|X}$ is as defined in (2.2). A comparison of these definitions highlights the difference between the Dempster-Shafer updating rule and the full Bayesian updating rule.

---
$^2$ Axiomatizations of these updating rules are given by Gilboa and Schmeidler (1993) for the Dempster-Shafer updating rule and by Pires (2002) for the full Bayesian updating rule.
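The contrast between the two updating rules can be made concrete with a toy, finite stand-in for the posterior class (the paper's class is continuous; the discrete version below, with $\eta$ on the grid $\{0,1\}^2$ and made-up probabilities, is only illustrative). Dempster-Shafer first discards the posteriors that do not maximize the probability of $D_A$ and then conditions; full Bayesian updating conditions every member.

```python
from fractions import Fraction as Fr

# Toy posterior class for eta on the grid {0,1}^2; each posterior is a dict
# {eta: prob}.  Assumption subset D_A = {eta : eta_1 = eta_2} (the diagonal).
posteriors = [
    {(0, 0): Fr(1, 2), (1, 1): Fr(1, 2)},   # assigns probability 1 to D_A
    {(0, 0): Fr(1, 4), (0, 1): Fr(3, 4)},   # assigns probability 1/4 to D_A
]

def mass_on_DA(post):
    return sum(p for eta, p in post.items() if eta[0] == eta[1])

def condition_on_DA(post):
    """Standard Bayes rule with conditioning set D_A (None if mass is zero)."""
    m = mass_on_DA(post)
    return {eta: p / m for eta, p in post.items() if eta[0] == eta[1]} if m else None

# Full Bayesian updating: condition every member of the class on D_A.
fb_class = [q for q in map(condition_on_DA, posteriors) if q is not None]

# Dempster-Shafer updating: first keep only the posteriors putting maximal
# probability on D_A, then condition those.
best = max(map(mass_on_DA, posteriors))
ds_class = [condition_on_DA(p) for p in posteriors if mass_on_DA(p) == best]

assert len(ds_class) == 1 and len(fb_class) == 2
```

The final assertion shows the mechanism: Dempster-Shafer retains only the "expert" most optimistic about $D_A$, whereas full Bayesian updating keeps the conditioned opinions of both.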
The Dempster-Shafer rule first reduces the class of posteriors $\mathcal{F}_{\eta|X}$ by discarding every $F_{\eta|X}$ that fails to put maximal belief on the assumption subset $D_A$, and subsequently applies the standard Bayes rule with conditioning set $D_A$ to each of the remaining ones. By contrast, the full Bayesian updating rule retains all the posteriors $F_{\eta|X} \in \mathcal{F}_{\eta|X}$, irrespective of the value $F_{\eta|X}$ puts on $D_A$, and applies the standard Bayes rule to all members of $\mathcal{F}_{\eta|X}$.

With the metaphor of multiple experts used in Gilboa and Marinacci (2011), the difference between the two updating rules in our context can be illustrated as follows. Consider a situation in which we consult multiple experts about their opinion on $\tau$. Assume they all agree on the single prior for $\phi$, but each of them has a different belief about the non-identified part of the model, i.e., $\mu_{\theta|\phi}$ differs among them. If we decide to "assume $D_A$" and obtain updated opinions by applying the Dempster-Shafer rule, what we actually do is collect the updated beliefs of only those experts who are most optimistic about $D_A$, and, on the other hand, completely ignore the opinions of the rest of the experts. Such a stringent selection of the multiple experts corresponds to the reduction of the posterior class to $\mathcal{F}^*_{\eta|X}$. If we "assume $D_A$" and apply the full Bayesian updating rule, we ask the opinions (conditional on $D_A$ being true) of all the experts, no matter how much they believe in $D_A$.

The next theorem is the main result of this paper, providing the lower probabilities of $\mathcal{F}^{DS}_{\tau|X,D_A}$ and $\mathcal{F}^{FB}_{\tau|X,D_A}$.

Theorem 3.1 Let $D_A$ be the assumption subset corresponding to the instrument exogeneity restriction, $D_A = \{\eta : \eta_1 = \eta_2\}$, and let $\mathcal{F}_{\eta|X}$ be as obtained in (2.2). Denote the intersected identified set by
\[
H_\cap(\phi) \equiv \left[H_1(\phi_1) \cap H_2(\phi_2)\right] \subset [0,1].
\]