
Variational formulas for the power of the binary hypothesis testing problem with applications

Nir Elkayam and Meir Feder
Department of Electrical Engineering - Systems, Tel-Aviv University, Israel
Email: [email protected], [email protected]

Abstract: Two variational formulas for the power of the binary hypothesis testing problem are derived. The first is given as the Legendre transform of a certain function, and the second, induced from the first, is given in terms of the Cumulative Distribution Function (CDF) of the log-likelihood ratio. One application of the first formula is an upper bound on the power of the binary hypothesis testing problem in terms of the Rényi divergence. The second formula provides a general framework for proving asymptotic and non-asymptotic expressions for the power of the test utilizing corresponding expressions for the CDF of the log-likelihood ratio. The framework is demonstrated in the central limit regime (i.e., for non-vanishing type I error) and in the large deviations regime.

I. INTRODUCTION

A classical problem in statistics and information theory is the binary hypothesis testing problem, where two distributions, P and Q, are given. For each test there are two types of errors, namely the miss-detection (type I) and the false-alarm (type II) errors. According to the Neyman-Pearson lemma, the optimal test is based on thresholding the likelihood ratio between P and Q. The behavior of the optimal tradeoff between the two types of errors has been studied in both the asymptotic and non-asymptotic regimes, and in both the central limit regime and the large deviations regime. Knowledge of the optimal tradeoff turns out to be useful for recent studies in finite blocklength information theory, e.g., in channel coding [1, Section III.E], [2], in data compression [3], and more.

Consider two probability measures P and Q on a sample space W (throughout this paper we assume discrete probability measures). This paper provides two variational formulas for the power (or the optimal tradeoff) of the binary hypothesis testing problem between P and Q. The first is given as the Legendre transform of the convex function (of λ):

    f(λ) = λ − Σ_{w∈W} min(Q(w), λP(w)),

and the second, derived from the first, is given as a function of the CDF of the log-likelihood ratio between P and Q with respect to P.

We use the first formula to derive a general upper bound on the power of any binary hypothesis testing problem in terms of the Rényi divergence between P and Q. The second formula leads to a general framework for proving asymptotic and non-asymptotic bounds on the power of the binary hypothesis testing problem by plugging in any approximation of the CDF of the log-likelihood ratio. The error term in the CDF approximation leads to a corresponding error term in the power of the binary hypothesis testing problem. Specifically, by using the Berry-Esséen theorem we get an approximation of the CDF up to an additive term, which results in an additive error term in the corresponding type I error. By using a large deviations approximation of the CDF, we get an approximation within a multiplicative term, which results in a multiplicative error term in the corresponding type I error.

II. VARIATIONAL FORMULAS FOR THE BINARY HYPOTHESIS TESTING PROBLEM

Recall some general (and standard) definitions for the optimal performance of a binary hypothesis test between two probability measures P and Q on W:

    β_α(P,Q) = min_{P_{Z|W} : Σ_{w∈W} P(w) P_{Z|W}(1|w) ≥ α}  Σ_{w∈W} Q(w) P_{Z|W}(1|w),    (1)

where P_{Z|W} : W → {0,1} is any randomized test. The minimum is guaranteed to be achieved by the Neyman-Pearson lemma. Thus, β_α(P,Q) gives the minimum probability of error under hypothesis Q given that the probability of error under hypothesis P is not larger than 1 − α. The quantity β denotes the power of the test at significance level 1 − α.

Recall that the optimal test is

    P_{Z|W}(1|w) = 1_{{Q(w)/P(w) < λ}} + δ · 1_{{Q(w)/P(w) = λ}},

where λ, δ are tuned so that Σ_{w∈W} P(w) P_{Z|W}(1|w) = α.

Lemma 1. The following variational formula holds:

    β_α(P,Q) = max_λ ( Σ_{w∈W} min(Q(w), λP(w)) − λ(1−α) ).    (2)

Moreover,

    β_α(P,Q) = Σ_{w∈W} min(Q(w), λP(w)) − λ(1−α)    (3)

if and only if

    P{w : Q(w)/P(w) < λ} ≤ α ≤ P{w : Q(w)/P(w) ≤ λ}.    (4)
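Lemma 1 is easy to check numerically. The following sketch (ours, not part of the paper) computes β_α once by the randomized Neyman-Pearson test described above and once by the variational formula (2); since the objective in (2) is piecewise linear and concave in λ, its maximum is attained at λ = 0 or at a breakpoint λ = Q(w)/P(w), so a finite candidate set suffices. The function names and the example pair (P, Q) are ours.

```python
def beta_np(P, Q, alpha):
    """Exact beta_alpha via the randomized Neyman-Pearson test: set
    P_{Z|W}(1|w) = 1 on the outcomes with the smallest ratio Q(w)/P(w),
    randomizing on the boundary so that sum_w P(w) P_{Z|W}(1|w) = alpha."""
    order = sorted(range(len(P)),
                   key=lambda w: Q[w] / P[w] if P[w] > 0 else float("inf"))
    beta = mass = 0.0
    for w in order:
        if mass >= alpha:
            break
        frac = min(P[w], alpha - mass) / P[w] if P[w] > 0 else 0.0
        beta += frac * Q[w]
        mass += frac * P[w]
    return beta

def beta_variational(P, Q, alpha):
    """Formula (2); the max over lambda is taken over the breakpoints."""
    lams = {Q[w] / P[w] for w in range(len(P)) if P[w] > 0} | {0.0}
    return max(sum(min(Q[w], lam * P[w]) for w in range(len(P)))
               - lam * (1 - alpha) for lam in lams)

P, Q, alpha = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], 0.8
print(beta_np(P, Q, alpha), beta_variational(P, Q, alpha))  # both give 0.5
```

For this example both routines return 0.5, and the maximizing λ = 1 indeed satisfies condition (4): P{Q/P < 1} = 0.5 ≤ 0.8 ≤ P{Q/P ≤ 1} = 0.8.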
The next lemma presents another variational formula, in which the power of the binary hypothesis testing problem is expressed in terms of the CDF of the log-likelihood ratio.

Lemma 2. Let F(z) = Pr_P{ log(P(w)/Q(w)) ≤ z } denote the CDF of the log-likelihood ratio with respect to the distribution P. Then:

    β_{1−α}(P,Q) = max_R ( ∫_R^∞ F(z) e^{−z} dz − e^{−R} α ).    (5)

Moreover,

    β_{1−α}(P,Q) = ∫_R^∞ F(z) e^{−z} dz − e^{−R} α    (6)

if and only if

    P{w : log(P(w)/Q(w)) < R} ≤ α ≤ P{w : log(P(w)/Q(w)) ≤ R}.    (7)

The proof of Lemmas 1 and 2 appears in Appendix A. The following equality may facilitate the evaluation of the expression in (5):

    ∫_R^∞ F(z) e^{−z} dz = F(R) e^{−R} + E_P( e^{−log(P(w)/Q(w))} 1_{{log(P(w)/Q(w)) > R}} ).    (8)

This relation follows by integration by parts with the Riemann-Stieltjes integral, and it holds for both discrete and continuous distributions. Its proof is straightforward and omitted due to lack of space.

It is interesting to note that for R such that F(R) = α, (6), (7) and (8) imply:

    β_{1−α}(P,Q) = E_P( e^{−log(P(w)/Q(w))} 1_{{log(P(w)/Q(w)) > R}} )
                 = E_Q( 1_{{log(P(w)/Q(w)) > R}} )
                 = Q{w : log(P(w)/Q(w)) > R},    (9)

which gives intuition and serves as a sanity check for the proposed formulas.

III. APPLICATIONS

A. An upper bound on β_{1−α}(P,Q) in terms of the Rényi divergence

Let:

    g_s(P,Q) ≜ log( Σ_w P(w)^s Q(w)^{1−s} ).

For 0 ≤ s ≤ 1:

    min(Q(w), λP(w)) ≤ Q(w)^{1−s} (λP(w))^s,

so that Σ_w min(Q(w), λP(w)) ≤ λ^s e^{g_s}, and:

    β_{1−α}(P,Q) = max_λ ( Σ_{w∈W} min(Q(w), λP(w)) − λα )
                 ≤ max_λ ( λ^s e^{g_s} − λα )
                 = e^{g_s} max_λ ( λ^s − λ α e^{−g_s} ).

Taking the log:

    log β_{1−α}(P,Q) ≤ g_s + log( max_λ ( λ^s − λ α e^{−g_s} ) )
        (a)≤ g_s + ( s log(α e^{−g_s}) + h_b(s) ) / (s−1)
          = g_s/(1−s) − (s/(1−s)) log α − h_b(s)/(1−s),

where (a) follows by elementary calculus (here h_b(s) is the standard binary entropy function in nats, i.e., h_b(s) = −s log(s) − (1−s) log(1−s)). Note that g_s/(s−1) = D_s(P‖Q) is the Rényi divergence. Taking α = e^{−r} and optimizing over s:

    log β_{1−e^{−r}}(P,Q) ≤ inf_{0≤s≤1} ( −D_s(P‖Q) + (s/(1−s)) r + h_b(s)/(s−1) ),    (10)

which gives a bound on the error exponent.
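As a numerical illustration of this bound (our sketch, not from the paper), the code below evaluates the right-hand side of the last display on an s-grid for an n-fold product pair (Pⁿ, Qⁿ), where exponent-type bounds are most informative, and compares it with the exact log β_{1−α} computed via formula (2). All function names and the example parameters are ours.

```python
import math
from itertools import product

def log_beta_exact(P, Q, alpha):
    # beta_{1-alpha}(P,Q) = max_lambda ( sum_w min(Q, lambda*P) - lambda*alpha )
    lams = {q / p for p, q in zip(P, Q) if p > 0} | {0.0}
    return math.log(max(sum(min(q, lam * p) for p, q in zip(P, Q)) - lam * alpha
                        for lam in lams))

def renyi_upper_bound(P, Q, alpha, grid=200):
    # log beta_{1-alpha} <= min_{0<s<1} ( g_s - s*log(alpha) - h_b(s) ) / (1-s)
    best = float("inf")
    for i in range(1, grid):
        s = i / grid
        g = math.log(sum(p ** s * q ** (1 - s)
                         for p, q in zip(P, Q) if p > 0 and q > 0))
        hb = -s * math.log(s) - (1 - s) * math.log(1 - s)
        best = min(best, (g - s * math.log(alpha) - hb) / (1 - s))
    return best

P1, Q1, n, alpha = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], 6, 0.1
Pn = [math.prod(t) for t in product(P1, repeat=n)]
Qn = [math.prod(t) for t in product(Q1, repeat=n)]
print(log_beta_exact(Pn, Qn, alpha))     # exact log-power
print(renyi_upper_bound(Pn, Qn, alpha))  # upper bound, always >= the exact value
```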
B. Normal approximations

Let G(z) be another CDF (e.g., Gaussian) approximating F(z) with an additive approximation error, i.e.,

    G(z) − d_l ≤ F(z) ≤ G(z) + d_h

for all z. Denote by

    β_{1−α}(F) ≜ max_R ( ∫_R^∞ F(z) e^{−z} dz − e^{−R} α )    (11)

the power of the binary hypothesis test in terms of the CDF of the log-likelihood ratio. Then:

    β_{1−α}(F) = β_{1−α}(P,Q)
        = max_R ( ∫_R^∞ F(z) e^{−z} dz − e^{−R} α )
        ≤ max_R ( ∫_R^∞ (G(z) + d_h) e^{−z} dz − e^{−R} α )
        = max_R ( ∫_R^∞ G(z) e^{−z} dz + d_h e^{−R} − e^{−R} α )
        = max_R ( ∫_R^∞ G(z) e^{−z} dz − e^{−R} (α − d_h) )
        = β_{1−α+d_h}(G),

and similarly:

    β_{1−α}(P,Q) ≥ β_{1−α−d_l}(G).

Let G now be the CDF of a Gaussian distribution approximating F. In this case β_{1−α}(G) can be evaluated explicitly. Specifically, let L(w) = log(P(w)/Q(w)) and assume L ~ N(D,V) under P, i.e., the log-likelihood ratio is distributed normally. Then:

    G(z) = Pr_P{w : log(P(w)/Q(w)) ≤ z}
         = Pr_P{w : L(w) ≤ z}
         = Pr_P{w : (L(w) − D)/√V ≤ (z − D)/√V}
         = Φ( (z − D)/√V ),

where Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt. The optimal R = R_α is given by equation (7): Φ((R_α − D)/√V) = α, i.e.,

    R_α = D + √V Φ^{−1}(α).    (12)

We can use (9):

    E_P( e^{−log(P(w)/Q(w))} 1_{{log(P(w)/Q(w)) > R}} )
        = ∫_R^∞ e^{−z} (1/√(2πV)) e^{−(z−D)²/(2V)} dz
        (a)= e^{−D} ∫_{R−D}^∞ e^{−z} (1/√(2πV)) e^{−z²/(2V)} dz
          = e^{−D} ∫_{R−D}^∞ (1/√(2πV)) e^{−(z+V)²/(2V) + V/2} dz
          = e^{−D+V/2} ∫_{R−D+V}^∞ (1/√(2πV)) e^{−z²/(2V)} dz
          = e^{−D+V/2} Φ^C( (R − D + V)/√V ),    (13)

where (a) follows by the change of variables z → z − D, and Φ^C(x) = 1 − Φ(x) is the Q-function of the Gaussian distribution. Plugging (12) into (13) we get:

    E_P( e^{−log(P(w)/Q(w))} 1_{{log(P(w)/Q(w)) > R_α}} ) = e^{−D+V/2} Φ^C( Φ^{−1}(α) + √V ).

Using the following approximation of Φ^C(t) [4, Formula 7.1.13]:

    √(2/π) e^{−t²/2} / ( t + √(t²+4) ) ≤ Φ^C(t) ≤ √(2/π) e^{−t²/2} / ( t + √(t²+8/π) ),    (14)

we get for t ≫ 0 (e.g., when V ≫ 0):

    Φ^C(t) ≈ √(2/π) e^{−t²/2}/(2t) = (1/√(2π)) e^{−t²/2}/t,

and so:

    e^{−D+V/2} Φ^C( Φ^{−1}(α) + √V )
        ≈ e^{−D+V/2} (1/√(2π)) e^{−(Φ^{−1}(α)+√V)²/2} / ( Φ^{−1}(α) + √V )
        ≈ e^{−D − √V Φ^{−1}(α)} / √V,

where ≈ denotes equality up to a multiplicative constant term.
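The closed-form Gaussian power (12)-(13) and its large-V approximation are straightforward to evaluate; the sketch below (ours, using only the Python standard library) does exactly that. The inputs D, V, α are free parameters and the chosen values are arbitrary.

```python
import math
from statistics import NormalDist

def gaussian_power(D, V, alpha):
    """beta_{1-alpha} = exp(-D + V/2) * Phi^C(Phi^{-1}(alpha) + sqrt(V)), eq. (13)
    with R = R_alpha from eq. (12). For very large V this product under/overflows
    in floating point; a log-domain version would be needed in that regime."""
    nd = NormalDist()
    t = nd.inv_cdf(alpha) + math.sqrt(V)
    return math.exp(-D + V / 2) * (1.0 - nd.cdf(t))

def gaussian_power_asymptotic(D, V, alpha):
    """The approximation e^{-D - sqrt(V) Phi^{-1}(alpha)} / sqrt(V), valid up to
    a multiplicative constant when Phi^{-1}(alpha) + sqrt(V) >> 0."""
    return math.exp(-D - math.sqrt(V) * NormalDist().inv_cdf(alpha)) / math.sqrt(V)

print(gaussian_power(10.0, 9.0, 0.1))             # ~1.7e-4
print(gaussian_power_asymptotic(10.0, 9.0, 0.1))  # same order of magnitude
```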
For the block memoryless hypothesis testing problem, the CDF of the log-likelihood ratio can be approximated via the Berry-Esséen theorem [5, Theorem 1.2]. In this paper we omit the details and refer the interested reader to the monograph [5]. Note also that the results above strengthen the existing results on the finite blocklength source coding problem, as it is an instance of the binary hypothesis testing problem [5, Section 3.2].

C. Large deviation regime

In this section we outline the application of the variational formula (5) to the analysis of the large deviation regime of the binary hypothesis testing problem. This might simplify the proof of the formula for the error exponent of the power of the binary hypothesis testing problem with general sources [6, Theorem 2.1]. The setting is as follows. A sequence of distribution pairs (P_n, Q_n) is given, and the limit (if it exists) of

    E_n(r) = −(1/n) log β_{1−e^{−nr}}(P_n, Q_n)    (15)

is of interest. Let

    F_n(z) = Pr_{P_n}{ w : log(P_n(w)/Q_n(w)) ≤ z }    (16)

and assume that we have a large deviation approximation of F_n(z), i.e.,

    e^{−nE_1(z) − nδ_n^l} ≤ F_n(nz) ≤ e^{−nE_1(z) + nδ_n^h}.    (17)

The sequences δ_n^l, δ_n^h → 0 as n → ∞ represent the multiplicative approximation error. We refer the reader to [7] for the theorems of Cramér and Gärtner-Ellis showing when such approximations exist. Let:

    f_n(r,R) = n ∫_R^∞ e^{−nE_1(z)} e^{−nz} dz − e^{−n(R+r)}.    (18)

Then:

    β_{1−e^{−nr}}(P_n, Q_n)
        = max_R ( ∫_{nR}^∞ F_n(z) e^{−z} dz − e^{−nR} e^{−nr} )
        (a)= max_R ( n ∫_R^∞ F_n(nz) e^{−nz} dz − e^{−n(R+r)} )
          ≤ max_R ( n ∫_R^∞ e^{−nE_1(z) − nz + nδ_n^h} dz − e^{−n(R+r)} )
          = e^{nδ_n^h} max_R ( n ∫_R^∞ e^{−nE_1(z) − nz} dz − e^{−n(R+r+δ_n^h)} )
          = e^{nδ_n^h} max_R ( f_n(r + δ_n^h, R) ),    (19)

where (a) comes from the change of variables z = nz'. The lower bound follows similarly:

    β_{1−e^{−nr}}(P_n, Q_n) ≥ e^{−nδ_n^l} max_R ( f_n(r − δ_n^l, R) ).    (20)

Let:

    E_{2,n}(r) = −(1/n) log max_R ( f_n(r,R) ).    (21)

Combining (21), (19) and (20):

    −δ_n^h + E_{2,n}(r + δ_n^h) ≤ E_n(r) ≤ δ_n^l + E_{2,n}(r − δ_n^l).    (22)

So the analysis of E_{2,n}(r) is of interest. Differentiating f_n(r,R) with respect to R:

    (d/dR) f_n(r,R) (a)= −n e^{−nE_1(R)} e^{−nR} + n e^{−n(R+r)} = n e^{−nR} ( e^{−nr} − e^{−nE_1(R)} ),

where (a) follows by differentiation under the integral sign [8] (by the Leibniz rule, d/dx ∫_{a(x)}^{b(x)} f(x,t) dt = f(x,b(x)) b'(x) − f(x,a(x)) a'(x) + ∫_{a(x)}^{b(x)} f_x(x,t) dt). The optimal R therefore satisfies:

    r = E_1(R).    (23)

The asymptotic analysis of E_{2,n}(r) can be carried out using the Laplace method of integration, which leads to the asymptotic behavior of E_n(r). We omit the details due to space limitations.
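A minimal numerical sketch of this machinery (ours, not from the paper): E1 below is a hypothetical rate function standing in for a Cramér/Gärtner-Ellis exponent, and the δ_n error terms are ignored. The code solves the optimality condition (23) by bisection and evaluates E_{2,n}(r) of (21) by maximizing a numerically integrated f_n(r,R) over a grid around that point.

```python
import math

def E1(z):  # hypothetical convex, decreasing rate function (an assumption)
    return max(0.0, 1.0 - z) ** 2

def f_n(r, R, n, zmax=5.0, steps=2000):
    # f_n(r,R) = n * int_R^inf exp(-n(E1(z)+z)) dz - exp(-n(R+r)),  eq. (18),
    # with the integral truncated at zmax and evaluated by the midpoint rule.
    h = (zmax - R) / steps
    integral = h * sum(math.exp(-n * (E1(R + (k + 0.5) * h) + R + (k + 0.5) * h))
                       for k in range(steps))
    return n * integral - math.exp(-n * (R + r))

def solve_R(r, lo=0.0, hi=1.0, iters=60):
    # Bisection for r = E1(R), eq. (23); assumes E1 is continuous and
    # decreasing on [lo, hi] with E1(lo) > r > E1(hi).
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if E1(mid) > r else (lo, mid)
    return (lo + hi) / 2

r = 0.25
Rstar = solve_R(r)  # = 0.5 for this E1
for n in (20, 50, 100):
    # E_{2,n}(r) from eq. (21), maximizing over a grid around R*
    best = max(f_n(r, Rstar + 0.01 * k, n) for k in range(-20, 40))
    print(n, -math.log(best) / n)
```

For this particular E1 and r, a Laplace-method evaluation of (18) suggests the printed exponents should settle near R* + r = 0.75 as n grows, which the output lets one check.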
IV. SUMMARY

Two variational formulas for the power of the binary hypothesis testing problem were derived. The formulas can provide tight bounds on the power of the test, e.g., in terms of the Rényi divergence. Furthermore, a framework for approximating the power of the optimal test was proposed: any approximation of the CDF of the log-likelihood ratio results in a corresponding approximation of the power of the binary hypothesis testing problem. The approximating CDF should be simple enough to allow the calculation of an approximate (or exact) power of the approximating problem. Specifically, we have shown that for the Gaussian approximation the exact power of the test can be calculated. In the large deviation regime, the power of the binary hypothesis testing problem can also be approximated using the proposed framework by utilizing Laplace's method of integration.

APPENDIX A
PROOFS

Proof of Lemma 1:

Proof of (3): Let λ, δ be the thresholds of the optimal test, and let:

    A = {w : Q(w)/P(w) < λ},
    B = {w : Q(w)/P(w) = λ}.

Then:

    α = P(A) + δP(B)    (24)

and:

    β = Q(A) + δQ(B).    (25)

Multiplying (24) by λ, subtracting from (25), and using Q(B) = λP(B):

    β − λα = Q(A) − λP(A).

On the other hand:

    Σ_{w∈W} min(Q(w), λP(w)) = Σ_{w∈A} Q(w) + Σ_{w∈A^c} λP(w)
        = Q(A) + λ(1 − P(A))
        = Q(A) − λP(A) + λ
        = β − λα + λ.

Thus:

    β = Σ_{w∈W} min(Q(w), λP(w)) − λ(1−α).

Proof of the max formula (smaller λ): Note that the optimal λ satisfies:

    P{w : Q(w)/P(w) ≥ λ} ≥ 1−α ≥ P{w : Q(w)/P(w) > λ}.    (26)

Let λ_1 < λ:

    Σ_{w∈W} min(Q(w), λ_1 P(w)) − Σ_{w∈W} min(Q(w), λP(w))
        = Σ_{w : λ_1 P(w) < Q(w) < λP(w)} (λ_1 P(w) − Q(w)) + (λ_1 − λ) Σ_{w : λP(w) ≤ Q(w)} P(w)
        (a)≤ (λ_1 − λ) Σ_{w : λP(w) ≤ Q(w)} P(w)
          = (λ_1 − λ) P{w : Q(w)/P(w) ≥ λ}
        (b)≤ (λ_1 − λ)(1−α),

where (a) follows from λ_1 P(w) − Q(w) < 0 on the first sum, and (b) follows from λ_1 − λ < 0 and P{w : Q(w)/P(w) ≥ λ} ≥ 1−α. Rearranging the terms:

    Σ_{w∈W} min(Q(w), λ_1 P(w)) − λ_1(1−α) ≤ Σ_{w∈W} min(Q(w), λP(w)) − λ(1−α).

If λ_1 does not satisfy condition (4), then:
- If P{w : Q(w)/P(w) ≤ λ_1} < P{w : Q(w)/P(w) < λ}, we are finished, because there exists w_0 with P(w_0) > 0, Q(w_0)/P(w_0) < λ and Q(w_0)/P(w_0) > λ_1, which gives a strict inequality in (a) above.
- If P{w : Q(w)/P(w) ≤ λ_1} = P{w : Q(w)/P(w) < λ}, then P{w : Q(w)/P(w) ≤ λ_1} < α, so P{w : Q(w)/P(w) < λ} < α, which leads to a strict inequality in (b) above.

Proof of the max formula (greater λ): For λ_1 > λ we have:

    Σ_{w∈W} min(Q(w), λ_1 P(w))
        = Q{w : Q(w) ≤ λP(w)} + Q{w : λP(w) < Q(w) ≤ λ_1 P(w)} + λ_1 P{w : Q(w) > λ_1 P(w)}
        (a)≤ Q{w : Q(w) ≤ λP(w)} + λ_1 P{w : λP(w) < Q(w) ≤ λ_1 P(w)} + λ_1 P{w : Q(w) > λ_1 P(w)}
          = Q{w : Q(w) ≤ λP(w)} + λ_1 P{w : Q(w) > λP(w)},

where (a) follows by upper bounding Q(w) with λ_1 P(w). Since Σ_{w∈W} min(Q(w), λP(w)) = Q{w : Q(w) ≤ λP(w)} + λ P{w : Q(w) > λP(w)}:

    Σ_{w∈W} min(Q(w), λ_1 P(w)) − Σ_{w∈W} min(Q(w), λP(w))
        ≤ (λ_1 − λ) P{w : Q(w) > λP(w)}
        ≤ (λ_1 − λ)(1−α),

since λ_1 − λ > 0 and P{w : Q(w) > λP(w)} ≤ 1−α by (26). Rearranging the terms:

    Σ_{w∈W} min(Q(w), λ_1 P(w)) − λ_1(1−α) ≤ Σ_{w∈W} min(Q(w), λP(w)) − λ(1−α).

If λ_1 does not satisfy condition (4), then P{w : Q(w)/P(w) < λ} < P{w : Q(w)/P(w) < λ_1} and we are finished, because there exists w_0 with P(w_0) > 0, Q(w_0)/P(w_0) ≥ λ and Q(w_0)/P(w_0) < λ_1. If Q(w_0)/P(w_0) > λ this gives a strict inequality in (a) above; if all such mass sits at Q(w)/P(w) = λ, then α < P{w : Q(w)/P(w) ≤ λ} and the last inequality is strict instead.

Proof of Lemma 2:

    Σ_{w∈W} min(Q(w), λP(w)) = Σ_{w∈W} λP(w) min( Q(w)/(λP(w)), 1 ) = λ E_P( min( Q(w)/(λP(w)), 1 ) ),

where we take c/0 = ∞ for c ≥ 0 and 0 · ∞ = 0 in order to handle the case P(w) = 0 as well. In [2, Lemma 2] we proved that for any positive random variable U:

    E( min( e^R · U, 1 ) ) = e^R ∫_R^∞ F(z) e^{−z} dz,

where F(z) = Pr{ −log U ≤ z }. Here we take U(w) = Q(w)/P(w) and get:

    E_P( min( e^R Q/P, 1 ) ) = e^R ∫_R^∞ F(z) e^{−z} dz,

where:

    F(z) = Pr_P{ w : −log(Q(w)/P(w)) ≤ z } = Pr_P{ w : log(P(w)/Q(w)) ≤ z }.

Taking λ = e^{−R} we get:

    Σ_{w∈W} min(Q(w), λP(w)) = λ E_P( min( Q/(λP), 1 ) ) = e^{−R} E_P( min( e^R Q/P, 1 ) ) = ∫_R^∞ F(z) e^{−z} dz.

Plugging back into (3) and writing 1−α instead of α:

    β_{1−α}(P,Q) = max_λ ( Σ_{w∈W} min(Q(w), λP(w)) − λα ) = max_R ( ∫_R^∞ F(z) e^{−z} dz − e^{−R} α ).

The optimal λ satisfies (4). Rewriting the condition in terms of R, after some algebra we get:

    P{w : log(P(w)/Q(w)) < R} ≤ α ≤ P{w : log(P(w)/Q(w)) ≤ R}.    (27)

REFERENCES

[1] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307-2359, 2010.
[2] N. Elkayam and M. Feder, "Achievable and converse bounds over a general channel and general decoding metric," arXiv preprint arXiv:1411.0319, 2014. [Online]. Available: http://www.eng.tau.ac.il/~elkayam/FiniteBlockLen.pdf
[3] V. Kostina and S. Verdú, "Fixed-length lossy compression in the finite blocklength regime," IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3309-3338, 2012.
[4] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. Courier Corporation, 1964, no. 55.
[5] V. Y. Tan, "Asymptotic estimates in information theory with non-vanishing error probabilities," Foundations and Trends in Communications and Information Theory, 2014.
[6] T. S. Han, "Hypothesis testing with the general source," IEEE Transactions on Information Theory, vol. 46, no. 7, pp. 2415-2427, 2000.
[7] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. Springer Science & Business Media, 2009, vol. 38.
[8] H. Flanders, "Differentiation under the integral sign," American Mathematical Monthly, pp. 615-627, 1973.
