arXiv:0701206v1 [math.ST] 7 Jan 2007

SOME NOTES ON IMPROVING UPON THE JAMES-STEIN ESTIMATOR

By Yuzo Maruyama
The University of Tokyo

We consider estimation of a multivariate normal mean vector under sum of squared error loss. We propose a new class of smooth estimators, parameterized by α, dominating the James-Stein estimator. The estimator for α = 1 corresponds to the generalized Bayes estimator with respect to the harmonic prior. When α goes to infinity, the estimator converges to the James-Stein positive-part estimator. Thus the class of our estimators is a bridge between the admissible estimator (α = 1) and the inadmissible estimator (α = ∞). Although the estimators possess quasi-admissibility, which is a weaker optimality than admissibility, the problem of determining whether or not the estimator for α > 1 is admissible is still open.

AMS 2000 subject classifications: Primary 62C15; secondary 62C10.
Keywords and phrases: The James-Stein estimator, quasi-admissibility.

1. Introduction. Let X be a random variable having the p-variate normal distribution N_p(θ, I_p). We consider the problem of estimating the mean vector θ by δ(X) relative to quadratic loss, so that every estimator is evaluated by the risk function

R(θ, δ) = E_θ[‖δ(X) − θ‖²] = ∫_{R^p} ‖δ(x) − θ‖² (2π)^{−p/2} exp(−‖x − θ‖²/2) dx.

The usual estimator X, with constant risk p, is minimax for any dimension p. It is also admissible when p = 1 and 2, as shown in [1] and [13], respectively. [13] showed, however, that when p ≥ 3 there exists an estimator dominating X among the class of estimators equivariant with respect to the orthogonal transformation group, which have the form

(1.1) δ_φ(X) = (1 − φ(‖X‖²)/‖X‖²)X.

[7] succeeded in giving an explicit form of an estimator improving on X,

δ_JS(X) = (1 − (p − 2)/‖X‖²)X,

which is called the James-Stein estimator. More generally, a large class of estimators better than X has been proposed in the literature. The strongest tool for this purpose is [14]'s identity, as follows.

Lemma 1.1 ([14]). If Y ∼ N(µ, 1) and h(y) is any differentiable function such that E[|h′(Y)|] < ∞, then

(1.2) E[h(Y)(Y − µ)] = E[h′(Y)].

By using the identity (1.2), the risk function of an estimator of the form δ_g(X) = X + g(X) = (X₁ + g₁(X), ..., X_p + g_p(X))′ is written as

R(θ, δ_g) = E_θ[‖δ_g(X) − θ‖²]
= E_θ[‖X − θ‖²] + E_θ[‖g(X)‖²] + 2 Σ_{i=1}^p E_θ[(X_i − θ_i) g_i(X)]
= p + E_θ[‖g(X)‖²] + 2 Σ_{i=1}^p E_θ[∂g_i(X)/∂x_i],

where g_i(x) is assumed to be differentiable and E|(∂/∂x_i) g_i(X)| < ∞ for i = 1, ..., p. Since the statistic R̂(δ(X)) given by

(1.3) p + ‖g(X)‖² + 2 Σ_{i=1}^p ∂g_i(X)/∂x_i

does not depend on the unknown parameter θ and satisfies E_θ[R̂(δ(X))] = R(θ, δ), it is called the unbiased estimator of risk. Clearly, if

(1.4) ‖g(x)‖² + 2 Σ_{i=1}^p ∂g_i(x)/∂x_i ≤ 0

for any x, then δ_g(X) dominates X.

To make the structure of estimators improving on X more comprehensible, we consider the class of orthogonally equivariant estimators of the form given in (1.1). Assigning g(x) = −φ(‖x‖²)x/‖x‖² in (1.4), the condition becomes

(1.5) φ(w)(2(p − 2) − φ(w))/w + 4φ′(w) ≥ 0

for w = ‖x‖². The inequality (1.5) is satisfied, for example, if φ(w) is monotone nondecreasing and within [0, 2(p − 2)] for all w ≥ 0. Since φ(w) = c for 0 < c < 2(p − 2) satisfies (1.5),

(1.6) (1 − c/‖X‖²)X

with 0 < c < 2(p − 2) dominates X.
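As a quick numerical illustration of this dominance (a Monte Carlo sketch of ours, not part of the paper; the choices p = 5, θ = (0.5, ..., 0.5)′ and the replication count are arbitrary):

    import numpy as np

    # Compare the risk of X with that of the James-Stein estimator,
    # i.e. (1.6) with c = p - 2, by Monte Carlo.
    rng = np.random.default_rng(0)
    p, n_rep = 5, 200_000
    theta = np.full(p, 0.5)
    X = rng.normal(theta, 1.0, size=(n_rep, p))    # X ~ N_p(theta, I_p)

    w = np.sum(X**2, axis=1)                       # w = ||X||^2
    delta_js = (1.0 - (p - 2) / w)[:, None] * X    # James-Stein estimator

    risk_x = np.mean(np.sum((X - theta)**2, axis=1))          # close to p
    risk_js = np.mean(np.sum((delta_js - theta)**2, axis=1))  # below p
    print(f"risk(X) ~ {risk_x:.3f}, risk(JS) ~ {risk_js:.3f}")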
The estimator δ_JS, that is, (1.6) with c = p − 2, is the best within the class (1.6): the risk function of (1.6) is p + c(c − 2(p − 2))E_θ[‖X‖^{−2}], which is minimized at c = p − 2. Note, however, that when ‖x‖² < p − 2 the James-Stein estimator over-shrinks and changes the sign of each component of X. The James-Stein positive-part estimator

δ^+_JS(X) = max(0, 1 − (p − 2)/‖X‖²)X

eliminates this drawback and dominates the James-Stein estimator. We notice here that the technique for proving the inadmissibility of δ_JS does not come from the Stein identity or the unbiased estimator of risk given in (1.3). The risk difference between δ_JS and δ_φ is given by

R(δ_JS, θ) − R(δ_φ, θ) = E[−(φ(‖X‖²) − p + 2)²/‖X‖² + 4φ′(‖X‖²)],

but φ^+_JS(w) = min(w, p − 2), which yields the James-Stein positive-part estimator, does not satisfy the inequality

(1.7) −(φ(w) − p + 2)² + 4wφ′(w) ≥ 0

for all w ≥ 0. In Section 2, we will show that there is no φ which satisfies (1.7) for every w ≥ 0; that is, the Stein identity itself is not useful for finding estimators dominating δ_JS. We call this optimality of the James-Stein estimator quasi-admissibility. We will explain the concept and give a sufficient condition for quasi-admissibility in Section 2.

In spite of this difficulty, some estimators which dominate the James-Stein estimator have been given by several authors. [11] and [6] considered the class of estimators of the form δ_LK(X) = (1 − φ_LK(‖X‖²)/‖X‖²)X, where

φ_LK(w) = p − 2 − Σ_{i=1}^n a_i w^{−b_i}

with a_i ≥ 0 for all i and 0 < b_1 < b_2 < ··· < b_n. For example, for n = 1 they both showed that δ_LK(X) with 0 < b_1 < (p − 2)/4 and a_1 = 2b_1 2^{b_1} Γ(p/2 − b_1 − 1)/Γ(p/2 − 2b_1 − 1) is superior to the James-Stein estimator. [10] gave two estimators which shrink toward the ball with center 0, δ^i_KT(X) = (1 − φ^i_KT(‖X‖²)/‖X‖²)X for i = 1, 2, where

φ¹_KT(w) = 0 for w ≤ r², and φ¹_KT(w) = p − 2 − Σ_{i=1}^{p−2} (r/w^{1/2})^i for w > r²;
φ²_KT(w) = 0 for w ≤ {(p − 1)/(p − 2)}² r², and φ²_KT(w) = p − 2 − r/(w^{1/2} − r) for w > {(p − 1)/(p − 2)}² r².

They showed that when r is sufficiently small, these two estimators dominate the James-Stein estimator. However, these estimators are not appealing because they are inadmissible. In our setting, the estimation of a multivariate normal mean, [3] showed that any admissible estimator must be generalized Bayes. Since the shrinkage factor (1 − φ_LK(w)/w) becomes negative for some w, and φ^i_KT for i = 1, 2 fail to be analytic, neither δ_LK nor δ^i_KT can be generalized Bayes or admissible.

In general, when we propose an estimator (δ*, say) dominating a certain inadmissible estimator, it is extremely important to find it among admissible estimators; otherwise the even harder problem of finding an estimator improving on δ* immediately arises. To the best of our knowledge, the sole admissible estimator dominating the James-Stein estimator is [8]'s estimator δ_K(X) = (1 − φ_K(‖X‖²)/‖X‖²)X, where

(1.8) φ_K(w) = p − 2 − 2 exp(−w/2) / ∫₀¹ λ^{p/2−2} exp(−wλ/2) dλ.

This estimator is generalized Bayes with respect to the harmonic prior density ‖θ‖^{2−p}, which was originally suggested by [14]. Its only fault, however, is that it does not improve upon δ_JS(X) at ‖θ‖ = 0. (See Section 3 for the details.) Shrinkage estimators like (1.1) make use of the vague prior information that ‖θ‖ is close to 0, and it goes without saying that we would like a significant improvement of risk when this prior information is accurate. Though δ_K(X) is an admissible generalized Bayes estimator and thus smooth, it yields no improvement over δ_JS(X) at the origin ‖θ‖ = 0. On the other hand, δ^+_JS(X) improves on δ_JS(X) significantly at the origin, but it is not analytic and is thus inadmissible by [3]'s complete class theorem. Therefore a more challenging open problem is to find admissible estimators dominating the James-Stein estimator, especially at ‖θ‖ = 0.
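Since (1.8) involves only a one-dimensional integral, φ_K is easy to evaluate numerically. The sketch below (ours; scipy and the sampled values of w are illustrative assumptions) exhibits the behavior just described: φ_K(w) increases from 0 at w = 0 toward p − 2:

    import numpy as np
    from scipy.integrate import quad

    def phi_K(w, p):
        # phi_K(w) = p - 2 - 2 exp(-w/2) / int_0^1 lam^(p/2-2) exp(-w*lam/2) dlam, cf. (1.8)
        integral, _ = quad(lambda lam: lam**(p / 2 - 2) * np.exp(-w * lam / 2), 0.0, 1.0)
        return p - 2 - 2.0 * np.exp(-w / 2) / integral

    p = 5
    for w in [0.0, 1.0, 5.0, 20.0, 100.0]:
        print(f"phi_K({w:g}) = {phi_K(w, p):.4f}")  # rises from 0 toward p - 2 = 3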
In this paper, we consider the class of estimators

(1.9) δ_α(X) = [∫₀¹ (1 − λ) λ^{α(p/2−1)−1} exp(−‖X‖²αλ/2) dλ / ∫₀¹ λ^{α(p/2−1)−1} exp(−‖X‖²αλ/2) dλ] X

for α ≥ 1. In Section 3, we show that δ_α with α ≥ 1 improves on the James-Stein estimator and that it attains a strict risk improvement at ‖θ‖ = 0. Furthermore, we see that δ_α approaches δ^+_JS as α goes to ∞. Since δ_α with α = 1 corresponds to δ_K, the class of δ_α with α ≥ 1 is a bridge between δ_K, which is admissible, and δ^+_JS, which is inadmissible. Although we show that δ_α with α > 1 is quasi-admissible, a notion introduced in Section 2, we have no idea about its admissibility at this stage.

2. Quasi-admissibility. In this section, we introduce the concept of quasi-admissibility and give a sufficient condition for it. We deal with a reasonable class of estimators of the form

(2.1) δ_m = X + ∇ log m(‖X‖²) = {1 + 2m′(‖X‖²)/m(‖X‖²)}X,

where ∇ is the differential operator (∂/∂x₁, ..., ∂/∂x_p)′ and m is a positive function. If m(w) = w^{−c} for c > 0, (2.1) becomes a James-Stein type estimator of the form (1.6). Any (generalized) Bayes estimator with respect to a spherically symmetric measure π also has the form (2.1), because it is written as

∫_{R^p} θ exp(−‖x − θ‖²/2) π(dθ) / ∫_{R^p} exp(−‖x − θ‖²/2) π(dθ)
= x + ∫_{R^p} (θ − x) exp(−‖x − θ‖²/2) π(dθ) / ∫_{R^p} exp(−‖x − θ‖²/2) π(dθ)
= x + ∇ ∫_{R^p} exp(−‖x − θ‖²/2) π(dθ) / ∫_{R^p} exp(−‖x − θ‖²/2) π(dθ)
= x + ∇ log ∫_{R^p} exp(−‖x − θ‖²/2) π(dθ)
= x + ∇ log m_π(‖x‖²),

where m_π(‖x‖²) = ∫_{R^p} exp(−‖x − θ‖²/2) π(dθ). [2] and [5] called an estimator of the form (2.1) pseudo-Bayes and quasi-Bayes, respectively. If, for a given m, there exists a nonnegative measure ν which satisfies

(2.2) m(‖x‖²) = ∫_{R^p} exp(−‖θ − x‖²/2) ν(dθ),

the estimator of the form (2.1) is truly generalized Bayes. However, it is often difficult to determine whether or not m has such an exact integral form.
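As a concrete check of the pseudo-Bayes form (a minimal sketch of ours; the helper names are hypothetical), taking m(w) = w^{(2−p)/2} in (2.1), so that 2m′(w)/m(w) = −(p − 2)/w, reproduces the James-Stein estimator exactly:

    import numpy as np

    def delta_m(x, m, dm):
        # Pseudo-Bayes form (2.1): delta_m(x) = {1 + 2 m'(w)/m(w)} x with w = ||x||^2.
        w = x @ x
        return (1.0 + 2.0 * dm(w) / m(w)) * x

    p = 5
    x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
    m = lambda w: w ** ((2 - p) / 2)                       # m(w) = w^((2-p)/2)
    dm = lambda w: ((2 - p) / 2) * w ** ((2 - p) / 2 - 1)  # its derivative

    print(delta_m(x, m, dm))               # via the pseudo-Bayes form
    print((1 - (p - 2) / (x @ x)) * x)     # James-Stein form: identical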
Suppose that δ_{m,k}(X) = δ_m(X) + k(‖X‖²)X is a competitor of δ_m. Substituting g(x) = ∇ log m(‖x‖²) = 2x m′(‖x‖²)/m(‖x‖²) and g(x) = 2x m′(‖x‖²)/m(‖x‖²) + k(‖x‖²)x in (1.3), respectively, we obtain the unbiased estimators of risk

(2.3) R̂(δ_m) = p − 4w (m′(w)/m(w))² + 4p m′(w)/m(w) + 8w m″(w)/m(w)

for w = ‖x‖², and R̂(δ_{m,k}) = R̂(δ_m) + ∆(m, mk), where

(2.4) ∆(m, mk) = 4w (m′(w)/m(w)) k(w) + 2p k(w) + 4w k′(w) + w k²(w).

If there exists a k such that ∆(m, mk) ≤ 0 for all w ≥ 0 with strict inequality for some w₀, then δ_m is inadmissible, since then R(θ, δ_{m,k}) ≤ R(θ, δ_m) for all θ with strict inequality for some θ. If no such k exists, δ_m is said to be quasi-admissible. Hence quasi-admissibility is a weaker optimality than admissibility. We now state a sufficient condition for quasi-admissibility. The idea is originally from [4], but that paper is not readily accessible. See also [12], where quasi-admissibility is called permissibility.

Theorem 2.1. The estimator of the form (2.1) is quasi-admissible if

∫₀¹ w^{−p/2} m(w)^{−1} dw = ∞ as well as ∫₁^∞ w^{−p/2} m(w)^{−1} dw = ∞.

Proof. We have only to show that R̂(δ_{m,k}) ≤ R̂(δ_m) for all w, that is, ∆(m, mk) ≤ 0 for all w, implies k(w) ≡ 0. Let M(w) = w^{p/2} m(w) and h(w) = M(w)k(w). Then we have

∆(m, mk) = 4w h′(w)/M(w) + w h²(w)/M²(w) = (4w h²(w)/M(w)) {−(d/dw)(1/h(w)) + 1/(4M(w))}.

First we show that h(w) ≥ 0 for all w ≥ 0. Suppose to the contrary that h(w₀) < 0 for some w₀. Then h(w) < 0 for all w ≥ w₀, since wherever h is negative 1/h must be increasing, so h cannot return to zero. For all w > w₀, the inequality

(2.5) (d/dw)(1/h(w)) ≥ 1/(4M(w))

must be satisfied. Integrating both sides of (2.5) from w₀ to w* leads to

1/h(w*) − 1/h(w₀) ≥ (1/4) ∫_{w₀}^{w*} M^{−1}(t) dt.

As w* → ∞, the right-hand side of the above inequality tends to infinity by assumption, which yields a contradiction since the left-hand side is less than −1/h(w₀). Thus we have h(w) ≥ 0 for all w. Similarly, using the divergence of ∫₀¹ w^{−p/2} m(w)^{−1} dw, we can show that h(w) ≤ 0 for all w. It follows that h(w) is zero for all w, which implies k(w) ≡ 0. This completes the proof.

Combining Theorem 2.1 and [3]'s sufficient condition for admissibility, we see that a quasi-admissible estimator of the form (2.1) is admissible if it is truly generalized Bayes, that is, if there exists a nonnegative measure ν such that

(2.6) m(‖x‖²) = ∫_{R^p} exp(−‖x − θ‖²/2) ν(dθ).

However, for a given m it is often quite difficult to determine whether m has such an integral form. Furthermore, even if we find that m does not have an integral form like (2.6), that is, the estimator is inadmissible, it is generally very difficult to find an estimator dominating the inadmissible estimator. The function m for the James-Stein estimator is m(w) = w^{(2−p)/2}, which satisfies the assumptions of Theorem 2.1.

Corollary 2.1. The James-Stein estimator is quasi-admissible.

In this case it is not difficult to find an estimator dominating δ_JS, by taking the positive part. But it is not easy to find a large class of estimators dominating the James-Stein estimator. In Section 3, we introduce an elegant sufficient condition for domination over δ_JS proposed by [9].

3. A class of quasi-admissible estimators improving upon the James-Stein estimator. In this section, we introduce [9]'s sufficient condition for improving upon the James-Stein estimator and propose a class of smooth quasi-admissible estimators satisfying it.

[9] showed that if lim_{w→∞} φ(w) = p − 2, then the difference of the risk functions of δ_JS and δ_φ can be written as

(3.1) R(θ, δ_JS) − R(θ, δ_φ) = 2 ∫₀^∞ φ′(w) {φ(w) − (p − 2) + 2f_p(w;λ)/∫₀^w y^{−1} f_p(y;λ) dy} (∫₀^w y^{−1} f_p(y;λ) dy) dw,

where λ = ‖θ‖² and f_p(y;λ) denotes the density of the noncentral chi-square distribution with p degrees of freedom and noncentrality parameter λ. Moreover, by the inequality

f_p(w;λ)/∫₀^w y^{−1} f_p(y;λ) dy ≥ f_p(w)/∫₀^w y^{−1} f_p(y) dy,

where f_p(y) = f_p(y;0), which can be shown by the correlation inequality, we have

R(θ, δ_JS) − R(θ, δ_φ) ≥ 2 ∫₀^∞ φ′(w)(φ(w) − φ₀(w)) (∫₀^w y^{−1} f_p(y;λ) dy) dw,

where

φ₀(w) = p − 2 − 2f_p(w)/∫₀^w y^{−1} f_p(y) dy.

Since φ₀ = φ_K given in (1.8), by an integration by parts, and lim_{w→∞} φ_K(w) = p − 2, we have the following result.

Theorem 3.1 ([9]). If φ(w) is nondecreasing and within [φ_K(w), p − 2] for all w ≥ 0, then δ_φ(X) of the form (1.1) dominates the James-Stein estimator.

The assumption of the theorem above is satisfied by φ_K and φ^+_JS. By (3.1), the risk difference at ‖θ‖ = 0 between δ_JS and δ_φ, where lim_{w→∞} φ(w) = p − 2, is given by

(3.2) 2 ∫₀^∞ φ′(w)(φ(w) − φ_K(w)) (∫₀^w y^{−1} f_p(y) dy) dw.

Hence δ_K does not improve upon δ_JS(X) at ‖θ‖ = 0, although δ_K(X) is an admissible generalized Bayes estimator and thus smooth. On the other hand, δ^+_JS(X) improves on δ_JS(X) significantly at the origin, but it is not analytic and is thus inadmissible by [3]'s complete class theorem. Therefore a more challenging problem is to find admissible estimators dominating the James-Stein estimator, especially at ‖θ‖ = 0.
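The improvement at the origin is easy to quantify numerically. In the Monte Carlo sketch below (ours; p = 5 and the replication count are arbitrary), the risk of δ_JS at θ = 0 equals 2, which follows from the risk expression in Section 1 together with E[‖X‖^{−2}] = 1/(p − 2) at θ = 0, while the positive-part estimator does noticeably better:

    import numpy as np

    # Risks of the James-Stein estimator and its positive part at theta = 0.
    rng = np.random.default_rng(1)
    p, n_rep = 5, 200_000
    X = rng.normal(0.0, 1.0, size=(n_rep, p))   # theta = 0
    w = np.sum(X**2, axis=1)

    shrink_js = 1.0 - (p - 2) / w               # may be negative (over-shrinkage)
    shrink_pp = np.maximum(0.0, shrink_js)      # positive part

    risk_js = np.mean(w * shrink_js**2)         # ||shrink * X - 0||^2 = shrink^2 * w
    risk_pp = np.mean(w * shrink_pp**2)
    print(f"risk at origin: JS ~ {risk_js:.3f} (exact value 2), JS+ ~ {risk_pp:.3f}")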
In this paper, we propose the class of estimators

(3.3) δ_α(X) = (1 − φ_α(‖X‖²)/‖X‖²)X,

where

φ_α(w) = w ∫₀¹ λ^{α(p/2−1)} exp(−wαλ/2) dλ / ∫₀¹ λ^{α(p/2−1)−1} exp(−wαλ/2) dλ
= p − 2 − 2 exp(−wα/2) / (α ∫₀¹ λ^{α(p/2−1)−1} exp(−wαλ/2) dλ)
(3.4) = p − 2 − 2 / (α ∫₀¹ (1 − λ)^{α(p/2−1)−1} exp(wαλ/2) dλ).

The main theorem of this paper is as follows.

Theorem 3.2.
1. δ_α(X) dominates δ_JS(X) for α ≥ 1.
2. The risk of δ_α(X) for α > 1 at ‖θ‖ = 0 is strictly less than the risk of the James-Stein estimator at ‖θ‖ = 0.
3. δ_α(X) approaches the positive-part James-Stein estimator as α tends to infinity, that is, lim_{α→∞} δ_α(X) = δ^+_JS(X).

Clearly δ₁(X) = δ_K(X). The class of δ_α with α ≥ 1 is thus a bridge between δ_K, which is admissible, and δ^+_JS, which is inadmissible.

Proof. [Part 1] We shall verify that φ_α(w) for α ≥ 1 satisfies the assumptions of Theorem 3.1. Applying a Taylor expansion to a part of (3.4), we have

(α/2) ∫₀¹ (1 − λ)^{α(p/2−1)−1} exp(wαλ/2) dλ = Σ_{i=0}^∞ w^i Π_{j=0}^i (p − 2 + 2j/α)^{−1} = ψ(α, w), say,

so that φ_α(w) = p − 2 − 1/ψ(α, w). As ψ(α, w) is increasing in w, φ_α(w) is increasing in w. As lim_{w→∞} ψ(α, w) = ∞ for any α ≥ 1, it is clear that lim_{w→∞} φ_α(w) = p − 2. To show that φ_α(w) ≥ φ_K(w) = φ₁(w) for α ≥ 1, we have only to check that ψ(α, w) is increasing in α, which is easily verified because the coefficient of each term of ψ(α, w) is increasing in α. This proves part 1.

[Part 2] Since φ_α for α > 1 is strictly greater than φ_K and strictly increasing in w, the risk difference between δ_α for α > 1 and δ_JS at ‖θ‖ = 0, which is given in (3.2), is strictly positive.

[Part 3] Since ψ(α, w) is increasing in α, it converges to

Σ_{i=0}^∞ (w/(p − 2))^i (p − 2)^{−1}

by the monotone convergence theorem as α goes to infinity. Considering the two cases w < p − 2 and w ≥ p − 2 separately, we obtain lim_{α→∞} φ_α(w) = w if w < p − 2 and lim_{α→∞} φ_α(w) = p − 2 otherwise. This completes the proof.

The estimator δ_α is expressed as X + ∇ log m_α(‖X‖²), where

m_α(w) = {∫₀¹ λ^{α(p/2−1)−1} exp(−(αw/2)λ) dλ}^{1/α}.

Since m_α(w) ∼ w^{(2−p)/2} for sufficiently large w, by a Tauberian theorem, and 0 < m_α(0) < ∞, we have the following result by Theorem 2.1.

Corollary 3.1. δ_α(X) is quasi-admissible for p ≥ 3.

Needless to say, we are extremely interested in determining whether or not δ_α(X) for α > 1 is admissible. Since δ_α(X) with α > 1 is quasi-admissible, it is admissible if it is generalized Bayes, that is, if there exists a measure ν which satisfies

∫_{R^p} exp(−‖θ − x‖²/2) ν(dθ) = (∫₀¹ λ^{α(p/2−1)−1} exp(−α‖x‖²λ/2) dλ)^{1/α}.

We have no idea so far how to construct such a measure ν. Even if we were to find that no such ν exists, which would imply that δ_α is inadmissible, it would still be very difficult to find an estimator dominating δ_α for α > 1.
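The series ψ(α, w) in the proof of part 1 also gives a convenient way to compute φ_α(w) = p − 2 − 1/ψ(α, w). The sketch below (ours; the truncation tolerance and the grid of test points are arbitrary) displays the monotone approach of φ_α to φ^+_JS(w) = min(w, p − 2) asserted in part 3:

    def phi_alpha(w, p, alpha, tol=1e-12, max_terms=100_000):
        # psi(alpha, w) = sum_{i>=0} w^i prod_{j=0}^{i} (p - 2 + 2j/alpha)^(-1),
        # truncated once the terms are negligible; phi_alpha = p - 2 - 1/psi.
        term = 1.0 / (p - 2)   # i = 0 term
        psi = term
        for i in range(1, max_terms):
            term *= w / (p - 2 + 2 * i / alpha)
            psi += term
            if term < tol * psi:
                break
        return p - 2 - 1.0 / psi

    p = 5
    for alpha in [1, 2, 10, 100]:
        row = [round(phi_alpha(w, p, alpha), 3) for w in (1.0, 3.0, 10.0)]
        print(alpha, row)   # values increase with alpha toward min(w, p - 2) = (1.0, 3.0, 3.0)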
References.

[1] Blyth, C. R. (1951). On minimax statistical decision procedures and their admissibility. Ann. Math. Statist. 22, 22–42.
[2] Bock, M. E. (1988). Shrinkage estimators: pseudo-Bayes rules for normal mean vectors. In Statistical Decision Theory and Related Topics IV, Vol. 1 (West Lafayette, Ind., 1986). Springer, New York, 281–297.
[3] Brown, L. D. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Statist. 42, 855–903.
[4] Brown, L. D. (1988). The differential inequality of a statistical estimation problem. In Statistical Decision Theory and Related Topics IV, Vol. 1 (West Lafayette, Ind., 1986). Springer, New York, 299–324.
[5] Brown, L. D. and Zhang, L. H. (2006). Estimators for Gaussian models having a block-wise structure. Mimeo. Available at http://www-stat.wharton.upenn.edu/~lbrown/teaching/Shrinkage/.
