The Annals of Statistics
2008, Vol. 36, No. 6, 2818–2849
DOI: 10.1214/08-AOS619
© Institute of Mathematical Statistics, 2008

FLEXIBLE COVARIANCE ESTIMATION IN GRAPHICAL GAUSSIAN MODELS

By Bala Rajaratnam, Hélène Massam and Carlos M. Carvalho

Stanford University, York University and University of Chicago

In this paper, we propose a class of Bayes estimators for the covariance matrix of graphical Gaussian models Markov with respect to a decomposable graph G. Working with the W_{P_G} family defined by Letac and Massam [Ann. Statist. 35 (2007) 1278–1323], we derive closed-form expressions for Bayes estimators under the entropy and squared-error losses. The W_{P_G} family includes the classical inverse of the hyper inverse Wishart but has many more shape parameters, thus allowing for flexibility in differentially shrinking various parts of the covariance matrix. Moreover, using this family avoids recourse to MCMC, often infeasible in high-dimensional problems. We illustrate the performance of our estimators through a collection of numerical examples where we explore frequentist risk properties and the efficacy of graphs in the estimation of high-dimensional covariance structures.

Received June 2007; revised February 2008. Supported by NSERC Discovery Grant A8946. AMS 2000 subject classifications: 62H12, 62C10, 62F15. Key words and phrases: covariance estimation, Gaussian graphical models, Bayes estimators, shrinkage, regularization. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2008, Vol. 36, No. 6, 2818–2849. This reprint differs from the original in pagination and typographic detail.

1. Introduction. In this paper we consider the problem of estimation of the covariance matrix Σ of an r-dimensional graphical Gaussian model. Since the work of Stein [35], the problem of estimating Σ is recognized as highly challenging. In recent years, the availability of high-throughput data from genomic, finance and marketing (among other) applications has pushed this problem to an extreme where, in many situations, the number of samples (n) is often much smaller than the number of parameters. When n < r the sample covariance matrix S is not positive definite, but even when n > r, the eigenstructure tends to be systematically distorted unless r/n is extremely small (see [12, 35]). Numerous papers have explored better alternative estimators for Σ (or Σ^{-1}) in both the frequentist and Bayesian frameworks (see [4, 8, 9, 15, 16, 17, 19, 25, 26, 29, 35, 37]). Many of these estimators give substantial risk reductions compared to the sample covariance estimator S in small sample sizes. A common underlying property of many of these estimators is that they are shrinkage estimators in the sense of James–Stein [19, 34]. In particular, the Bayesian approach often yields estimators which "shrink" toward a structure associated with a prespecified prior. One of the first papers to exploit this idea is [4], which shows that if the prior used on Σ^{-1} is the standard conjugate, that is, a Wishart distribution, then for an appropriate choice of the shape (or shrinkage) and scale hyperparameters, the posterior mean for Σ is a linear combination of S and the prior mean (see Section 3.1). It is easy to show [see (3.16)] that the eigenvalues of such estimators are also shrinkage estimators of the eigenvalues of Σ.
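As a quick illustration of the eigenstructure distortion mentioned above, the following small simulation (our own, not from the paper) draws n observations from N_r(0, I): every true eigenvalue is 1, yet the spectrum of S spreads out unless r/n is very small.

```python
# A small simulation (ours) of the distortion of the sample spectrum:
# sampling from N_r(0, I), all true eigenvalues equal 1, but the
# eigenvalues of S spread far from 1 unless r/n is tiny.
import numpy as np

rng = np.random.default_rng(0)
r = 50
for n in (60, 200, 5000):
    Z = rng.standard_normal((n, r))      # n draws from N_r(0, I)
    S = Z.T @ Z / n                      # sample covariance matrix
    eig = np.linalg.eigvalsh(S)
    print(f"n={n:5d}  r/n={r/n:5.3f}  spectrum in [{eig.min():.2f}, {eig.max():.2f}]")
```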
More recently, for high-dimensional complex datasets with r often larger than n, regularization methods have been proposed which impose structure on the estimators through zeros in the covariance or the precision matrix (see [2, 18, 30]). The idea of imposing zeros in the precision matrix is not new, however, and was introduced in [12] in a pioneering paper on covariance selection models, which are particular cases of graphical Gaussian models. Graphical Gaussian models have proven to be excellent tools for the analysis of complex high-dimensional data where dependencies between variables are expressed by means of a graph [3, 21].

In this paper we combine the regularization approach given by graphical models with the Bayesian approach of shrinking toward a structure. Through a decision-theoretic approach, we derive Bayes estimators of the covariance and precision matrices under certain priors and given loss functions, such that the precision matrix has a given pattern of zeros. Indeed, we work within the context of graphical Gaussian models Markov with respect to a decomposable graph G. Restricting ourselves to decomposable graphs allows us to use the family of inverse W_{P_G} Wishart distributions [27] as priors for Σ. This is a family of conjugate prior distributions for Σ^{-1} which includes the Wishart when G is complete (i.e., when the model is saturated) and the inverse of the hyper inverse Wishart, the current standard conjugate prior for Σ^{-1}, when the model is Markov with respect to G decomposable. A potentially restrictive feature of the inverse of the hyper inverse Wishart (and the Wishart) is the fact that it has only one shape parameter. The family of W_{P_G} Wishart distributions considered here has three important characteristics. First, it has k+1 shape parameters, where k is the number of cliques in G. Second, it forms a conjugate family with an analytically explicit normalizing constant. Third, the Bayes estimators can be obtained in closed form.

In Section 2, we give some fundamentals of graphical models. In Section 3, we recall the properties of the W_{P_G} family and its inverse, the IW_{P_G}, and we derive the mathematical objects needed for our estimators, that is, the explicit expression for the mean of the IW_{P_G}. Parallel to the development of the IW_{P_G}, we present in Section 4 a noninformative reference prior for Σ (and the precision matrix Ω). While offering an objective procedure that avoids the specification of hyperparameters, the reference prior also allows for closed-form posterior estimation, as the posterior for Σ remarkably falls within the IW_{P_G} family. In Section 5, we derive the Bayes estimator under two commonly used loss functions adapted to graphical models and the priors considered in Sections 3 and 4. Finally, in Sections 6 and 7 we compare the performance of our estimators in a series of high-dimensional examples.

2. Preliminaries. Let G = (V, E) be an undirected graph with vertex set V = {1,...,r} and edge set E. Vertices i and j are said to be neighbors in G if (i,j) ∈ E. Henceforth in this paper, we will assume that G is decomposable [24], so that a perfect order of the cliques is available. For (C_1,...,C_k) in a perfect order, we use the notation H_1 = R_1 = C_1, while for j = 2,...,k we write

    H_j = C_1 ∪ ··· ∪ C_j,   R_j = C_j \ H_{j-1},   S_j = H_{j-1} ∩ C_j.

The S_j, j = 2,...,k, are the minimal separators of G. Some of these separators can be identical. We let k′ ≤ k−1 denote the number of distinct separators and ν(S) denote the multiplicity of S, that is, the number of j such that S_j = S.
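The history/residual/separator bookkeeping above is easy to mechanize. A minimal sketch (our own, for a hypothetical perfect order of cliques):

```python
# Given cliques of a decomposable graph listed in a perfect order,
# compute the histories H_j, residuals R_j and separators S_j.
cliques = [{1, 2, 3}, {2, 3, 4}, {4, 5}]   # hypothetical perfect order

H = [set(cliques[0])]          # H_1 = C_1
R = [set(cliques[0])]          # R_1 = C_1
S = [set()]                    # placeholder: S_1 is not defined/used
for C in cliques[1:]:
    S.append(C & H[-1])        # S_j = H_{j-1} ∩ C_j
    R.append(C - H[-1])        # R_j = C_j \ H_{j-1}
    H.append(H[-1] | C)        # H_j = C_1 ∪ ... ∪ C_j

for j, (h, rj, s) in enumerate(zip(H, R, S), start=1):
    print(f"j={j}: H={sorted(h)}  R={sorted(rj)}  S={sorted(s)}")
```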
Generally, we will denote by 𝒞 the set of cliques of a graph G and by 𝒮 its set of separators.

An r-dimensional Gaussian model is said to be Markov with respect to G if, for any edge (i,j) not in E, the ith and jth variables are conditionally independent given all the other variables. Such models are known as covariance selection models [12] or graphical Gaussian models (see [24, 36]). Without loss of generality, we can assume that these models have mean zero and are characterized by the parameter set P_G of positive definite precision (or inverse covariance) matrices Ω such that Ω_{ij} = 0 whenever the edge (i,j) is not in E. Equivalently, if we denote by M the linear space of symmetric matrices of order r, by M⁺_r ⊂ M the cone of positive definite (abbreviated > 0) matrices, by I_G the linear space of symmetric incomplete matrices x with missing entries x_{ij}, (i,j) ∉ E, and by κ : M → I_G the projection of M into I_G, the parameter set of the Gaussian model can be described as the set of incomplete matrices Σ = κ(Ω^{-1}), Ω ∈ P_G. Indeed it is easy to verify that the entries Σ_{ij}, (i,j) ∉ E, are such that

(2.1) \Sigma_{ij} = \Sigma_{i, V\setminus\{i,j\}} \, \Sigma^{-1}_{V\setminus\{i,j\}, V\setminus\{i,j\}} \, \Sigma_{V\setminus\{i,j\}, j},

and are therefore not free parameters of the Gaussian models. We are therefore led to consider the two cones

(2.2) P_G = \{y \in M^+_r \mid y_{ij} = 0, (i,j) \notin E\},

(2.3) Q_G = \{x \in I_G \mid x_{C_i} > 0, i = 1, \dots, k\},

where P_G ⊂ Z_G and Q_G ⊂ I_G, and where Z_G denotes the linear space of symmetric matrices with zero entries y_{ij}, (i,j) ∉ E.

Grone et al. [14] proved the following:

Proposition 2.1. When G is decomposable, for any x in Q_G there exists a unique x̂ in M⁺_r such that for all (i,j) in E we have x_{ij} = x̂_{ij} and such that x̂^{-1} is in P_G.

This defines a bijection between P_G and Q_G:

(2.4) \varphi : y = (\hat{x})^{-1} \in P_G \mapsto x = \varphi(y) = \kappa(y^{-1}) \in Q_G,

where κ denotes the projection of M into I_G. If, for any complete subset A ⊆ V, x_A = (x_{ij})_{i,j \in A} is a matrix and we denote by (x_A)^0 = (x_{ij})_{i,j \in V} the matrix such that x_{ij} = 0 for (i,j) ∉ A × A, then the explicit expression of x̂^{-1} is

(2.5) y = \hat{x}^{-1} = \sum_{C \in \mathcal{C}} ((x_C)^{-1})^0 - \sum_{S \in \mathcal{S}} \nu(S) ((x_S)^{-1})^0.

For (x,y) ∈ I_G × Z_G, we define the notion of trace as follows:

(2.6) \operatorname{tr}(xy) = \langle x, y \rangle = \sum_{(i,j) \in E} x_{ij} y_{ij}.

Note that for x ∈ Q_G and y ∈ P_G, ⟨x, y⟩ = tr(x̂y), where tr(x̂y) is defined in the classical way. In the sequel, we will also need the following. If, for y ∈ P_G, we write y = σ̂^{-1} with σ ∈ Q_G, we have, for x ∈ Q_G, the two formulas

(2.7) \langle x, \hat{\sigma}^{-1} \rangle = \sum_{C \in \mathcal{C}} \langle x_C, \sigma_C^{-1} \rangle - \sum_{S \in \mathcal{S}} \nu(S) \langle x_S, \sigma_S^{-1} \rangle,

(2.8) \det \hat{x} = \frac{\prod_{C \in \mathcal{C}} \det x_C}{\prod_{S \in \mathcal{S}} (\det x_S)^{\nu(S)}}.

The graphical Gaussian model Markov with respect to G is therefore the family of distributions

    N_G = \{N_r(0, \hat{\Sigma}), \Sigma \in Q_G\} = \{N_r(0, \hat{\Sigma}), \Omega = \hat{\Sigma}^{-1} \in P_G\}.

In this paper, we will study various estimators of Σ ∈ Q_G and Ω ∈ P_G. We will write mle and mle_g for "maximum likelihood estimate" in the saturated model and in the graphical model, respectively. Also, in this paper, we will use the general symbol θ̃ to denote an estimator of θ rather than the traditional θ̂, as the notation θ̂ has been reserved for the completion process (see Proposition 2.1). The mle_g, Ω̃_g, for the parameter Ω ∈ P_G in N_G is well known (see [24], page 138).
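Before turning to the mle_g, here is a small numerical sketch (ours, on a hypothetical four-vertex graph with cliques {0,1,2} and {1,2,3}, separator {1,2} and missing edge (0,3)) of the completion in Proposition 2.1 via formula (2.5): the assembled y has a zero in the missing position, and its inverse reproduces the clique blocks of x.

```python
# Completion via (2.5): y = xhat^{-1} is built from zero-padded inverses
# of clique and separator submatrices of an incomplete x in Q_G.
import numpy as np

def pad(block, idx, r):
    """Zero-pad a submatrix to size r x r: the operation (.)^0 above."""
    out = np.zeros((r, r))
    out[np.ix_(idx, idx)] = block
    return out

rng = np.random.default_rng(1)
r = 4
cliques, seps = [[0, 1, 2], [1, 2, 3]], [[1, 2]]   # nu(S) = 1 here

A = rng.standard_normal((r, r))
full = A @ A.T + r * np.eye(r)     # SPD; x = kappa(full) keeps graph entries

# Formula (2.5): sum of padded clique inverses minus padded separator inverses.
y = sum(pad(np.linalg.inv(full[np.ix_(c, c)]), c, r) for c in cliques) \
  - sum(pad(np.linalg.inv(full[np.ix_(s, s)]), s, r) for s in seps)

xhat = np.linalg.inv(y)            # the completion of Proposition 2.1
print("y[0,3] =", y[0, 3])         # zero: y lies in P_G
c = [0, 1, 2]
print("clique block preserved:", np.allclose(xhat[np.ix_(c, c)], full[np.ix_(c, c)]))
```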
If Z_i, i = 1,...,n, is a sample from the N_r(0, Σ) distribution in N_G, if we write U = \sum_{i=1}^n Z_i Z_i^t and S = U/n, and if n > max_{C ∈ 𝒞} |C|, then Ω̃_g exists and is equal to

(2.9) \tilde{\Omega}_g = \sum_{C \in \mathcal{C}} ((S_C)^{-1})^0 - \sum_{S \in \mathcal{S}} \nu(S) ((S_S)^{-1})^0,

where clearly S as a subscript, or S in ν(S), refers to the separator, while the remaining S's refer to the sample covariance matrix. If we assume that the graph is saturated, then clearly the mle is Ω̃ = S^{-1}.

Finally, we need to recall some standard notation for various block submatrices: for x ∈ Q_G, the x_{C_j}, j = 1,...,k, are well defined, and for j = 2,...,k it will be convenient to use the following:

(2.10) x_{\langle j \rangle} = x_{S_j}, \quad x_{[ji} = x_{R_j, S_j} = x^t_{\langle j]}, \quad x_{[j]} = x_{R_j}, \quad x_{[j]\cdot} = x_{[j]} - x_{[ji} x^{-1}_{\langle j \rangle} x_{\langle j]},

where x_{⟨j⟩} ∈ M⁺_{s_j}, x_{[j]·} ∈ M⁺_{c_j−s_j} and x_{[ji} ∈ L(R^{c_j−s_j}, R^{s_j}), the set of linear applications from R^{c_j−s_j} to R^{s_j}. We will also use the notation x_{[12i} and x_{[1]·} for

    x_{[12i} = x_{C_1 \setminus S_2, S_2} x^{-1}_{S_2}  and  x_{[1]·} = x_{C_1 \setminus S_2 \cdot S_2} = x_{C_1 \setminus S_2} - x_{C_1 \setminus S_2, S_2} x^{-1}_{S_2} x_{S_2, C_1 \setminus S_2}.

3. Flexible conjugate priors for Σ and Ω. When the Gaussian model is saturated, that is, G is complete, the conjugate prior for Ω, as defined by Diaconis and Ylvisaker [13] (henceforth abbreviated DY), is the Wishart distribution. The induced prior for Σ is then the inverse Wishart IW_r(p, θ) with density

(3.1) IW_r(p, \theta; dx) = \frac{|\theta|^p}{\Gamma_r(p)} \, |x|^{-p-(r+1)/2} \exp(-\langle \theta, x^{-1} \rangle) \, 1_{M^+_r}(x) \, dx,

where p > (r−1)/2 is the shape parameter, Γ_r(p) the multivariate gamma function (as given on page 61 of [31]) and θ ∈ M⁺_r is the scale parameter.

As we have seen in the previous section, when G is not complete, M⁺_r is no longer the parameter set for Σ or the parameter set for Ω. The DY conjugate prior for Σ ∈ Q_G was derived by [11] and called the hyper inverse Wishart (HIW). The induced prior for Ω ∈ P_G was derived by [32] and we will call it the G-Wishart. The G-Wishart and the hyper inverse Wishart are certainly defined on the right cones, but they essentially have the same type of parametrization as the Wishart, with a scale parameter θ ∈ Q_G and a one-dimensional shape parameter δ.

3.1. The W_{P_G} distribution and its inverse. Letac and Massam [27] introduced a new family of conjugate priors for Ω ∈ P_G with a (k+1)-dimensional shape parameter, thus leading to a richer family of priors for Ω, and therefore for Σ = κ(Ω^{-1}) ∈ Q_G through the induced prior. It is called the type II Wishart family. Here, we prefer to call it the family of W_{P_G}-Wishart distributions in order to emphasize that it is defined on P_G. Details of this distribution can be found in Section 3 of [27]. We will first recall here some of its main features and then derive some new properties we shall need later in this paper. Let α and β be two real-valued functions on the collections 𝒞 and 𝒮 of cliques and separators, respectively, such that α(C_i) = α_i, β(S_j) = β_j with β_i = β_j if S_i = S_j. Let c_i = |C_i| and s_i = |S_i| denote the cardinality of C_i and S_i, respectively. The family of W_{P_G}-Wishart distributions is the natural exponential family generated by the measure H_G(α, β, φ(y)) ν_G(dy) on P_G, where φ(y) is as defined in (2.4) and where, for x ∈ Q_G,

(3.2) H_G(\alpha, \beta; x) = \frac{\prod_{C \in \mathcal{C}} (\det x_C)^{\alpha(C)}}{\prod_{S \in \mathcal{S}} (\det x_S)^{\nu(S)\beta(S)}},

(3.3) \nu_G(dy) = H_G\bigl(\tfrac{1}{2}(c+1), \tfrac{1}{2}(s+1); \varphi(y)\bigr) \, 1_{P_G}(y) \, dy.

The parameters (α, β) are in the set B such that the normalizing constant is finite for all θ ∈ Q_G and such that it factorizes into the product of H_G(α, β; θ) and a function Γ_{II}(α, β) of (α, β) only, given below in (3.5).
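The function H_G in (3.2) is straightforward to evaluate numerically. A minimal sketch (helper names and example ours):

```python
# H_G(alpha, beta; x) from (3.2): a ratio of powered clique and separator
# determinants. Only clique/separator blocks of x are touched, so an
# incomplete x in Q_G poses no difficulty.
import numpy as np

def H_G(alpha, beta, x, cliques, seps, mult):
    num = np.prod([np.linalg.det(x[np.ix_(c, c)]) ** a
                   for a, c in zip(alpha, cliques)])
    den = np.prod([np.linalg.det(x[np.ix_(s, s)]) ** (m * b)
                   for b, s, m in zip(beta, seps, mult)])
    return num / den

# For x = I_4 all determinants are 1, hence H_G = 1:
print(H_G([-2.5, -2.5], [-2.0], np.eye(4), [[0, 1, 2], [1, 2, 3]], [[1, 2]], [1]))
```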
The set B is not known completely, but we know that B ⊇ ∪_P B_P where, if, for each perfect order P of the cliques of G, we write J(P,S) = {j = 2,...,k | S_j = S}, then B_P is the set of (α, β) such that:

1. \sum_{j \in J(P,S)} (\alpha_j + \tfrac{1}{2}(c_j - s_j)) - \nu(S)\beta(S) = 0, for all S different from S_2;
2. -\alpha_q - \tfrac{1}{2}(c_q - s_q - 1) > 0 for all q = 2,...,k, and -\alpha_1 - \tfrac{1}{2}(c_1 - s_2 - 1) > 0;
3. -\alpha_1 - \tfrac{1}{2}(c_1 - s_2 + 1) - \gamma_2 > \tfrac{s_2 - 1}{2}, where \gamma_2 = \sum_{j \in J(P,S_2)} \bigl(\alpha_j - \beta_2 + \tfrac{c_j - s_2}{2}\bigr).

As can be seen from the conditions above, the parameters β(S), S ∈ 𝒮, are linked to the α(C), C ∈ 𝒞, by k′−1 linear equalities and various linear inequalities, and therefore B contains the set B_P of dimension at least k+1, for each perfect order P. We can now give the formal definition of the W_{P_G} family.

Definition 3.1. For (α, β) ∈ B, the W_{P_G}-Wishart family of distributions is the family F_{(α,β),P_G} = {W_{P_G}(α, β, θ; dy), θ ∈ Q_G} where

(3.4) W_{P_G}(\alpha, \beta, \theta; dy) = e^{-\langle \theta, y \rangle} \frac{H_G(\alpha, \beta; \varphi(y))}{\Gamma_{II}(\alpha, \beta) \, H_G(\alpha, \beta; \theta)} \, \nu_G(dy)

and

(3.5) \Gamma_{II}(\alpha, \beta) = \pi^{((c_1 - s_2)s_2 + \sum_{j=2}^k (c_j - s_j)s_j)/2} \, \Gamma_{s_2}\Bigl[-\alpha_1 - \frac{c_1 - s_2}{2} - \gamma_2\Bigr] \, \Gamma_{c_1 - s_2}(-\alpha_1) \prod_{j=2}^k \Gamma_{c_j - s_j}(-\alpha_j).

We can also, of course, define the inverse W_{P_G}(α, β, θ) distribution as follows. If Y ∼ W_{P_G}(α, β, θ), then X = φ(Y) ∼ IW_{P_G}(α, β, θ) with distribution on Q_G given by (see (3.8) in [27])

(3.6) IW_{P_G}(\alpha, \beta, \theta; dx) = \frac{e^{-\langle \theta, \hat{x}^{-1} \rangle} H_G(\alpha, \beta; x)}{\Gamma_{II}(\alpha, \beta) \, H_G(\alpha, \beta; \theta)} \, \mu_G(dx),

(3.7) \text{where } \mu_G(dx) = H_G\bigl(-\tfrac{1}{2}(c+1), -\tfrac{1}{2}(s+1); x\bigr) \, 1_{Q_G}(x) \, dx.

The hyper inverse Wishart is a special case of the IW_{P_G} distribution for

(3.8) \alpha_i = -\frac{\delta + c_i - 1}{2}, \quad i = 1, \dots, k, \qquad \beta_i = -\frac{\delta + s_i - 1}{2}, \quad i = 2, \dots, k,

which are all functions of the same one-dimensional parameter δ. It is traditional to denote the hyper inverse Wishart, that is, this particular IW_{P_G}, as the HIW(δ, θ), and this is the notation we will use in Section 6.

Corollary 4.1 of [27] states that the IW_{P_G} is a family of conjugate distributions for the scale parameter Σ in N_G; more precisely, we have:

Proposition 3.1. Let G be decomposable and let P be a perfect order of its cliques. Let (Z_1,...,Z_n) be a sample from the N_r(0, Σ) distribution with Σ ∈ Q_G. If the prior distribution on 2Σ is IW_{P_G}(α, β, θ) with (α, β) ∈ B_P and θ ∈ Q_G, the posterior distribution of 2Σ, given nS = \sum_{i=1}^n Z_i Z_i^t, is IW_{P_G}(α − n/2, β − n/2, θ + κ(nS)), where α − n/2 = (α_1 − n/2, ..., α_k − n/2) and β − n/2 = (β_2 − n/2, ..., β_k − n/2) are such that (α − n/2, β − n/2) ∈ B_P and θ + κ(nS) ∈ Q_G, so that the posterior distribution is well defined. Equivalently, we may say that if the prior distribution on ½Ω is W_{P_G}(α, β, θ), then the posterior distribution of ½Ω is W_{P_G}(α − n/2, β − n/2, θ + κ(nS)).

For the expression of the Bayes estimators we will give in Section 5, we need to know the explicit expression of the posterior mean of Ω and Σ when the prior on Ω is the W_{P_G} or, equivalently, when the prior on Σ is the IW_{P_G}. The mean of the W_{P_G} can be immediately obtained by differentiation of the cumulant generating function, since the W_{P_G} family is a natural exponential family. From (3.4), from Corollary 3.1 and from (4.25) in [27], we easily obtain the posterior mean for Ω = Σ̂^{-1} as follows.

Proposition 3.2. Let S and Ω be as in Proposition 3.1; then the posterior mean of Ω, given nS, is

(3.9) E(\Omega \mid S) = -2\Biggl[\sum_{j=1}^k \Bigl(\alpha_j - \frac{n}{2}\Bigr) \bigl(((\theta + \kappa(nS))_{C_j})^{-1}\bigr)^0 - \sum_{j=2}^k \Bigl(\beta_j - \frac{n}{2}\Bigr) \bigl(((\theta + \kappa(nS))_{S_j})^{-1}\bigr)^0\Biggr].
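Since (3.9) is closed-form, the posterior mean of Ω can be assembled directly with no MCMC. A hedged numerical sketch (graph, data and hyperparameters all our own; shapes chosen as the HIW-type values (3.8) with δ = 3):

```python
# Closed-form posterior mean of Omega via (3.9), with theta = I. Since
# only clique/separator blocks of theta + kappa(nS) are inverted, using
# the full matrix theta + nS below is equivalent to the projection.
import numpy as np

def pad(block, idx, r):
    out = np.zeros((r, r))
    out[np.ix_(idx, idx)] = block
    return out

rng = np.random.default_rng(3)
r, n, delta = 4, 50, 3
cliques, seps = [[0, 1, 2], [1, 2, 3]], [[1, 2]]

# (3.8): the HIW(delta, theta) shapes as a particular IW_PG.
alpha = [-(delta + len(c) - 1) / 2 for c in cliques]
beta  = [-(delta + len(s) - 1) / 2 for s in seps]

Z = rng.standard_normal((n, r))
T = np.eye(r) + Z.T @ Z            # theta + kappa(nS)

# (3.9): closed-form posterior mean of Omega given nS.
E_omega = -2 * (
    sum((a - n / 2) * pad(np.linalg.inv(T[np.ix_(c, c)]), c, r)
        for a, c in zip(alpha, cliques))
  - sum((b - n / 2) * pad(np.linalg.inv(T[np.ix_(s, s)]), s, r)
        for b, s in zip(beta, seps))
)
print("E(Omega|S)[0,3] =", E_omega[0, 3])   # zero for the missing edge (0,3)
```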
Since the IW_{P_G} is not an exponential family, its expected value is not as straightforward to derive. It is given in the following theorem.

Theorem 3.1. Let X be a random variable on Q_G such that X ∼ IW_{P_G}(α, β, θ) with (α, β) ∈ B_P and θ ∈ Q_G; then E(X) is given by (3.10)–(3.14):

(3.10) E(x_{\langle 2 \rangle}) = \frac{\theta_{\langle 2 \rangle}}{-(\alpha_1 + (c_1 - s_2)/2 + \gamma_2) - (s_2 + 1)/2} = \frac{\theta_{\langle 2 \rangle}}{-(\alpha_1 + (c_1 + 1)/2 + \gamma_2)},

(3.11) E(x_{C_1 \setminus S_2, S_2}) = \frac{\theta_{C_1 \setminus S_2, S_2}}{-(\alpha_1 + (c_1 + 1)/2 + \gamma_2)},

(3.12) E(x_{C_1 \setminus S_2}) = \frac{\theta_{[1]\cdot}}{-(\alpha_1 + (c_1 - s_2 + 1)/2)} \Bigl(1 - \frac{s_2}{2(\alpha_1 + (c_1 + 1)/2 + \gamma_2)}\Bigr) + \frac{\theta_{C_1 \setminus S_2, S_2} \, \theta_{\langle 2 \rangle}^{-1} \, \theta_{S_2, C_1 \setminus S_2}}{-(\alpha_1 + (c_1 + 1)/2 + \gamma_2)},

and for j = 2,...,k,

(3.13) E(x_{[ji}) = E(x_{[ji} x_{\langle j \rangle}^{-1}) \, E(x_{\langle j \rangle}) = \theta_{[ji} \theta_{\langle j \rangle}^{-1} E(x_{\langle j \rangle}),

(3.14) E(x_{[j]}) = \frac{\theta_{[j]\cdot}}{-(\alpha_j + (c_j - s_j + 1)/2)} \Bigl(1 + \tfrac{1}{2}\operatorname{tr}\bigl(\theta_{\langle j \rangle}^{-1} E(x_{\langle j \rangle})\bigr)\Bigr) + \theta_{[ji} \theta_{\langle j \rangle}^{-1} E(x_{\langle j \rangle}) \theta_{\langle j \rangle}^{-1} \theta_{\langle j]}.

The proof is rather long and technical and is given in the Appendix. Let us note here that (3.10)–(3.14) can also be written in a closed-form expression of the Choleski type, that is, E(X) = T^t D T with T lower triangular and D diagonal, where the shape parameters (α, β) are solely contained in D. We do not give it here for the sake of brevity.

The important consequence of this theorem is that, from (3.10)–(3.14), we can rebuild E(X) ∈ Q_G. Indeed, by definition of Q_G, E(X) is made up first of E(X_{C_1}), which is given by (3.10), (3.11) and its transpose, and (3.12), and then, successively, of the jth "layer": E(X_{[ji}) and its transpose, and E(X_{[j]}), for each j = 2,...,k. These are immediately obtained from (3.13) and (3.14) since, by definition, S_j ⊆ H_{j−1}, and therefore the quantity E(X_{⟨j⟩}) is a sub-block of E(X_{H_{j−1}}) and has therefore already been obtained in the first j−1 steps. We can therefore now deduce the posterior mean of Σ when the prior is IW_{P_G}(α, β, θ).

Corollary 3.1. Let S and Σ be as in Proposition 3.1; then the posterior mean for Σ, when the prior distribution on 2Σ is IW_{P_G}(α, β, θ), is given by (3.10)–(3.14) where X is replaced by 2Σ, θ is replaced by θ + κ(nS) and the (α_i, β_i)'s are replaced by the α_i − n/2, β_i − n/2's.

3.2. Shrinkage by layers and the choice of the scale parameter θ. When we use the IW_{P_G}(α, β, θ) as a prior distribution for the scale parameter Σ, we have to make a choice for the shape hyperparameters (α, β) and the scale hyperparameter θ. When G is complete, the IW_{P_G}(α, β, θ) becomes the regular inverse Wishart IW(p, θ) as given in (3.1). When G is decomposable and one uses the hyper inverse Wishart HIW(δ, θ), in the absence of prior information it is traditional to take θ equal to the identity or a multiple of the identity and δ small, such as 3, for example (see [21]).

The scale parameter, however, can play an important role if we have some prior knowledge of the structure of the covariance matrix (see [4]) and we are interested in "shrinking" the posterior mean of Σ toward a given target. In the saturated case, for a sample of size n from the N(0, Σ) distribution with a Wishart W(ν/2, (νD)^{-1}) prior on Ω = Σ^{-1}, the posterior mean of Σ is

(3.15) E(\Sigma \mid S) = \frac{\nu D + nS}{\nu + n - r - 1}.

First, we note that when n is held fixed and ν is allowed to grow, the posterior mean tends toward D, while if ν is held fixed and n is allowed to grow, the estimator tends toward S. Next, let us consider the eigenvalues of the posterior mean. If we take D = l̄I, where l̄ is the average of the eigenvalues l_1,...,l_r of the mle S, then it is easy to see that the eigenvalues g_i, i = 1,...,r, of E(Σ|S) are

(3.16) g_i = \frac{\nu \bar{l} + n l_i}{\nu - (r+1) + n},

nearly a weighted average of l̄ and l_i.
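A quick numerical check of (3.16) with made-up numbers (ours): the posterior spectrum is pulled toward l̄ and has a smaller span than the sample spectrum.

```python
# Eigenvalue shrinkage in (3.16): each g_i is close to a weighted
# average of l_i and lbar, compressing the spread of the spectrum.
import numpy as np

l = np.array([0.2, 0.7, 1.1, 4.0])     # hypothetical sample eigenvalues
r, n, nu = len(l), 40, 20              # note nu > r + 1 here
lbar = l.mean()

g = (nu * lbar + n * l) / (nu - (r + 1) + n)   # formula (3.16)
print("sample eigenvalues   :", l, "  span:", np.ptp(l))
print("posterior eigenvalues:", np.round(g, 3), "  span:", round(np.ptp(g), 3))
```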
Some simple algebra will show that for l_i < l̄ we always have l_i < g_i, and that for i such that l_i > l̄, that is, for C_i = l_i/l̄ > 1, we will have g_i < l_i whenever ν > (C_i/(C_i − 1))(r + 1). Since, in order for the prior to be proper, we must have ν > r − 1, we see that this condition is very weak as long as C_i/(C_i − 1) is close to 1. When the condition ν > r − 1 is satisfied, the eigenvalues of the posterior mean are shrunk toward l̄, and the span of the eigenvalues of E(Σ|S) is smaller than the span of the eigenvalues of S, which generally can be used to correct the instability of S. (We note that if C_i = l_i/l̄ is sufficiently large, C_i/(C_i − 1) will be sufficiently close to 1, and if l_i/l̄ is close to 1, then there is really no need to shrink the eigenvalues.)

In Section 5 we show that our Bayes estimators can be expressed in terms of the posterior means of Σ and Ω with the IW_{P_G} and the W_{P_G}, respectively, as priors. One would like to be able to prove properties for the eigenvalues of our estimators similar to those of the posterior mean under the Wishart in the saturated case. This is beyond the scope of this paper. However, we observe in the numerical examples given in Sections 6 and 7 that the eigenvalues of our estimators do have shrinkage properties. With this motivation, in Sections 6 and 7 we will use IW_{P_G} priors with θ such that the prior mean of Σ is the identity, as well as with θ equal to the identity. Thus, we first derive θ so that E(Σ) = ½E(IW_{P_G}(α, β, θ)) = I, and then we will argue that our estimators can be viewed as shrinkage estimators in the sense of shrinkage toward structure.

Lemma 3.1. Let Σ ∈ Q_G be such that 2Σ ∼ IW_{P_G}(α, β, θ) for given (α, β) ∈ A. In order to have E(Σ) = I, it is sufficient to choose θ as a diagonal matrix with diagonal elements equal to

\theta_{ll} = -2\Bigl(\alpha_1 + \frac{c_1 - s_2 + 1}{2}\Bigr)\Bigl(1 - \frac{s_2}{2(\alpha_1 + (c_1 + 1)/2 + \gamma_2)}\Bigr)^{-1} \quad \text{for } l \in [1],

\theta_{ll} = -2\Bigl(\alpha_1 + \frac{c_1 - s_2}{2} + \gamma_2\Bigr) - (s_2 + 1) \quad \text{for } l \in \langle 2 \rangle,

\theta_{ll} = -2\Bigl(\alpha_j + \frac{c_j - s_j + 1}{2}\Bigr)\Bigl(1 + \tfrac{1}{2}\operatorname{tr}\bigl(\theta_{\langle j \rangle}^{-1} E(x_{\langle j \rangle})\bigr)\Bigr)^{-1} \quad \text{for } l \in [j],\ j = 2, \dots, k.

The proof is immediate from (3.10)–(3.14).

Let us now argue that one of our estimators (to be derived in Section 5), Ω̃_{L_1}^{W_{P_G}}, equal to the inverse of the completion of the posterior mean E(Σ|S) of Σ when Σ ∼ IW_{P_G}(α, β, θ), can be viewed as a shrinkage estimator. It follows from Theorem 4.4 of [27] that, when the prior on Σ is the IW_{P_G}(α, β, θ), Σ_{[i]·} ∼ IW_{c_i−s_i}(−α_i, θ_{[i]·}) as defined in (3.1). Then, since
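To illustrate Lemma 3.1 above, here is a minimal sketch (ours) for a hypothetical two-clique graph with HIW-type shape parameters from (3.8); with δ = 3 all three cases reduce to θ_ll = 1, that is, θ = I.

```python
# Lemma 3.1 on a two-clique graph: the diagonal theta for which the
# IW_PG prior mean of Sigma is the identity. Shapes are the HIW-type
# values (3.8) with delta = 3 (a hypothetical choice for illustration).
c1, c2, s2 = 3, 3, 2                       # clique and separator sizes
delta = 3
a1 = -(delta + c1 - 1) / 2                 # alpha_1, from (3.8)
a2 = -(delta + c2 - 1) / 2                 # alpha_2
b2 = -(delta + s2 - 1) / 2                 # beta_2
g2 = a2 - b2 + (c2 - s2) / 2               # gamma_2 (single separator)

# Case l in [1]:
th_1 = -2 * (a1 + (c1 - s2 + 1) / 2) / (1 - s2 / (2 * (a1 + (c1 + 1) / 2 + g2)))
# Case l in <2>:
th_s2 = -2 * (a1 + (c1 - s2) / 2 + g2) - (s2 + 1)
# Case l in [2]: E(x_<2>) = 2I by construction, so
# (1/2) tr(theta_<2>^{-1} E(x_<2>)) = s2 / th_s2.
th_2 = -2 * (a2 + (c2 - s2 + 1) / 2) / (1 + s2 / th_s2)

print(th_1, th_s2, th_2)                   # -> 1.0 1.0 1.0, i.e. theta = I
```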
