Reduced-Variance Estimation with Intractable Likelihoods

Antonietta Mira
IDIDS, Università Svizzera Italiana, Switzerland
DISAT, Insubria University, Como, Italy
Joint with N. Friel (UCD, Dublin) and C. Oates (UTS, Sydney)
Warwick - CRiSM, April 22, 2016

Overview of talk
- PART I: Zero Variance MCMC for intractable problems
- PART II: Reduced Variance MCMC for doubly intractable problems

Monte Carlo Integration
Let $\mu_g$ be the expected value of a function $g : \mathbb{R}^d \to \mathbb{R}$ under $\pi$:
$$\mu_g := E_\pi[g(X)] = \frac{\int_{\mathbb{R}^d} g(x)\,\pi(x)\,dx}{\int_{\mathbb{R}^d} \pi(x)\,dx}$$
and let $X_1,\dots,X_N$ be a sequence of iid draws from $\pi$.
An unbiased estimator of $\mu_g$ is
$$\hat{\mu}_g := \frac{1}{N}\sum_{i=1}^N g(X_i)$$
with variance
$$V(\hat{\mu}_g) = \frac{1}{N}\,\sigma_g^2,$$
where $\sigma_g^2$ is the variance of $g$ under $\pi$. Therefore, if $g$ has finite variance, $\hat{\mu}_g$ is a consistent estimator of $\mu_g$.
If drawing iid from $\pi$ is difficult, build an ergodic Markov chain $\{X_n\}$ that is stationary with respect to $\pi$.

MCMC Integration
Theorem (Tierney 1996). Suppose the ergodic Markov chain $\{X_n\}$, with stationary distribution $\pi$, and a real-valued function $g$ satisfy one of the following conditions:
- the chain is uniformly ergodic and $E_\pi[g(X)^2] < \infty$;
- the chain is geometrically ergodic and $E_\pi[|g(X)|^{2+\epsilon}] < \infty$ for some $\epsilon > 0$.
Then
$$\lim_{N\to\infty} N\,V(\hat{\mu}_g) = E_\pi\big[(g(X_0)-\mu_g)^2\big] + 2\sum_{k=1}^{\infty} E_\pi\big[(g(X_k)-\mu_g)(g(X_0)-\mu_g)\big] = \sigma_g^2\Big[1 + 2\sum_{k=1}^{\infty}\rho_g(k)\Big] = \sigma_g^2\,\tau_g = V(g,P)$$
is well defined, non-negative and finite, and
$$\sqrt{N}\,(\hat{\mu}_g - \mu_g) \xrightarrow{L} N(0, \sigma_g^2\,\tau_g).$$
- Delayed rejection strategy → reduce $\tau_g$ by modifying $P$
- Zero variance strategy → reduce $\sigma_g^2$ by modifying $g$

Control Variate Method in MC
In MC simulation, control variates are used to reduce the variance of MC estimators. Assume $Z$ is a random variable with known mean, correlated with $g(X)$:
$$E(Z) = 0, \qquad \mathrm{Cov}(g(X),Z) = \sigma_{g,Z} \neq 0.$$
By exploiting the correlation between $g(X)$ and $Z$, we can build new unbiased estimators of $\mu_g$ with lower variance. Define
$$\tilde{g}(X) := g(X) + aZ, \qquad a \in \mathbb{R}.$$
Obviously
$$\mu_{\tilde{g}} := E[\tilde{g}(X)] = \mu_g, \qquad \sigma_{\tilde{g}}^2 = \sigma_g^2 + a^2\sigma_Z^2 + 2a\,\sigma_{g,Z}.$$
By minimizing $\sigma_{\tilde{g}}^2$ with respect to $a$, it can be shown that the optimal choice of $a$ is
$$a = -\frac{\sigma_{g,Z}}{\sigma_Z^2},$$
which reduces the variance to $\big(1-\rho_{g,Z}^2\big)\,\sigma_g^2$. Therefore
$$\hat{\mu}_{\tilde{g}} := \frac{1}{N}\sum_{i=1}^N \tilde{g}(X_i)$$
is a new unbiased estimator of $\mu_g$, with variance
$$V(\hat{\mu}_{\tilde{g}}) = \frac{1}{N}\,\sigma_{\tilde{g}}^2 = \frac{1}{N}\big(1-\rho_{g,Z}^2\big)\,\sigma_g^2 \le \frac{1}{N}\,\sigma_g^2 = V(\hat{\mu}_g).$$

First order ZV for MCMC
Under regularity conditions, the score has zero mean:
$$z(x) := -\frac{1}{2}\nabla \ln \pi(x).$$
Use it as a control variate!
$$\tilde{g}(X) = g(X) - \frac{1}{2}\Delta_x[P(x)] + \nabla_x[P(x)] \cdot z(x),$$
where $P(x)$ is a polynomial in $x$. If $P(x)$ is a first-degree polynomial:
$$\tilde{g}(X) = g(X) + a^T z(x).$$
The optimal value of $a$ is $a = -\Sigma_{zz}^{-1}\sigma_{zg}$, where $\Sigma_{zz} = E(zz^T)$ and $\sigma_{zg} = E(zg)$.
The optimal $a$ is estimated using the existing MCMC simulation.
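Read as an algorithm, the first-degree case is an ordinary control-variate correction computed from the chain output. Below is a minimal NumPy sketch (not part of the original slides; the function name `zv1_estimate` and the toy Gaussian check are illustrative assumptions) that estimates $a = -\Sigma_{zz}^{-1}\sigma_{zg}$ empirically and returns the ZV-corrected estimate of $E_\pi[g(X)]$.

```python
import numpy as np

def zv1_estimate(g_vals, scores):
    """First-degree ZV estimate of E_pi[g] from existing MCMC output (illustrative sketch).

    g_vals : (N,) values g(X_i) along the chain.
    scores : (N, d) rows z(X_i) = -0.5 * grad log pi(X_i).
    """
    N = len(g_vals)
    Sigma_zz = scores.T @ scores / N          # empirical Sigma_zz = E[z z^T]
    sigma_zg = scores.T @ g_vals / N          # empirical sigma_zg = E[z g] (E[z] = 0 under pi)
    a = -np.linalg.solve(Sigma_zz, sigma_zg)  # optimal coefficient a = -Sigma_zz^{-1} sigma_zg
    g_tilde = g_vals + scores @ a             # g~(X_i) = g(X_i) + a^T z(X_i)
    return g_tilde.mean(), g_tilde, a

if __name__ == "__main__":
    # Toy check with pi = N(0, 1) and g(x) = x, so z(x) = -0.5 * (-x) = x / 2.
    # The optimal coefficient is a = -2 and the ZV estimator has (essentially) zero variance.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000)           # iid draws stand in for MCMC output here
    est, g_tilde, a = zv1_estimate(x, (x / 2).reshape(-1, 1))
    print(est, a, g_tilde.var())              # ~0.0, [-2.], ~0.0
```

Note that nothing about the sampler changes: the correction is pure post-processing of draws that are already available, which is what "estimated using the existing MCMC simulation" refers to.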
Second order ZV for MCMC
If $P(x)$ is a second-degree polynomial:
$$\tilde{g}(X) = g(X) - \tfrac{1}{2}\mathrm{tr}(B) + (a + Bx)^T z = g(X) + g^T y,$$
where (with a slight abuse of notation) $g$ now denotes the coefficient vector, and $g$ and $y$ are column vectors with $\tfrac{1}{2}d(d+3)$ elements:
- $g := [a^T\ b^T\ c^T]^T$, where $b := \mathrm{diag}(B)$ and $c$ is a column vector with $\tfrac{1}{2}d(d-1)$ elements; the element $ij$ of the matrix $B$ (for $i \in \{2,\dots,d\}$ and $j < i$) is the element $\tfrac{1}{2}(2d-j)(j-1)+(i-j)$ of the vector $c$.
- $y := [z^T\ u^T\ v^T]^T$, where $u := x * z - \tfrac{1}{2}\mathbf{1}$ ("$*$" = Hadamard product, $\mathbf{1}$ = vector of ones) and $v$ is a column vector with $\tfrac{1}{2}d(d-1)$ elements; $x_i z_j + x_j z_i$ (for $i \in \{2,\dots,d\}$ and $j < i$) is the element $\tfrac{1}{2}(2d-j)(j-1)+(i-j)$ of the vector $v$.
With a polynomial of order 2 we have $\tfrac{1}{2}d(d+3)$ control variates.
With a polynomial of order $p$, we get $\binom{d+p}{p} - 1$ control variates.

Probit Model
$$y_i \mid x_i \sim \mathcal{B}(1, p_i), \qquad p_i = \Phi(x_i^T\beta),$$
where $\beta \in \mathbb{R}^d$ is the vector of parameters of the model. The likelihood function is
$$l(\beta \mid y, X) \propto \prod_{i=1}^n \big[\Phi(x_i^T\beta)\big]^{y_i}\big[1-\Phi(x_i^T\beta)\big]^{1-y_i}.$$
With flat priors, the Bayesian estimator of each parameter $\beta_k$ is $E_\pi[\beta_k \mid y, X]$, $k = 1,2,\dots,d$.
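To make the probit example concrete, here is a sketch (again not from the slides) assuming a flat prior, so the posterior score equals the likelihood score, a design matrix `X` (n × d), binary responses `y`, and posterior draws `beta_draws` (N × d) from any MCMC sampler; all names are hypothetical. It evaluates $z(\beta) = -\tfrac{1}{2}\nabla_\beta \ln l(\beta \mid y, X)$ at every draw and applies the first-degree ZV correction to each coordinate $g(\beta) = \beta_k$.

```python
import numpy as np
from scipy.stats import norm

def probit_score(beta, X, y):
    """Gradient of the probit log-likelihood (= posterior score under a flat prior)."""
    eta = X @ beta
    # Per-observation derivative of the log-likelihood with respect to eta_i = x_i^T beta
    w = np.where(y == 1,
                 norm.pdf(eta) / norm.cdf(eta),
                 -norm.pdf(eta) / norm.cdf(-eta))
    return X.T @ w

def zv_probit_means(beta_draws, X, y):
    """First-degree ZV estimates of the posterior means E[beta_k | y, X], k = 1..d."""
    Z = np.array([-0.5 * probit_score(b, X, y) for b in beta_draws])  # z at every draw
    N, d = Z.shape
    Sigma_zz = Z.T @ Z / N
    zv_means = np.empty(d)
    for k in range(d):                        # g(beta) = beta_k, one coordinate at a time
        g = beta_draws[:, k]
        a = -np.linalg.solve(Sigma_zz, Z.T @ g / N)
        zv_means[k] = (g + Z @ a).mean()
    return zv_means
```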
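The quadratic case from the second-order ZV slide can be sketched in the same style and used in place of the linear correction above: for each draw one assembles the $\tfrac{1}{2}d(d+3)$-dimensional vector $y = [z^T\ u^T\ v^T]^T$ and fits its coefficient vector from the existing simulation, exactly as in the linear case. This is again a sketch with hypothetical names; since the empirical normal equations can be ill-conditioned, a least-squares solve is used here.

```python
import numpy as np

def zv2_features(x, z):
    """Assemble the 0.5*d*(d+3) second-order control variates y = [z, u, v] for one draw."""
    d = len(x)
    u = x * z - 0.5                           # Hadamard product x*z minus one half
    v = [x[i] * z[j] + x[j] * z[i]            # cross terms, j < i
         for j in range(d) for i in range(j + 1, d)]
    return np.concatenate([z, u, np.asarray(v)])

def zv2_estimate(g_vals, draws, scores):
    """Second-degree ZV estimate of E_pi[g]; coefficients fitted from the chain itself."""
    Y = np.array([zv2_features(x, z) for x, z in zip(draws, scores)])
    N = len(g_vals)
    Sigma_yy = Y.T @ Y / N                    # empirical E[y y^T]
    sigma_yg = Y.T @ g_vals / N               # empirical E[y g]
    # Least-squares solve of the normal equations; equals -Sigma_yy^{-1} sigma_yg
    # when Sigma_yy is well conditioned
    coef, *_ = np.linalg.lstsq(Sigma_yy, -sigma_yg, rcond=None)
    g_tilde = g_vals + Y @ coef
    return g_tilde.mean(), g_tilde
```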